Copyright © 2005 International Business Machines®. All Rights Reserved.
Licensed under the Common Public License (CPL) version 1.0.
The goal of this project is to produce an FCode Tokenizer that can both be used in-house and be presented to third-party vendors, VARs and the like, as a professional-quality tool for their use, without adversely affecting IBM's Intellectual-Property rights.
We are using, as a starting basis, the Open-Source tokenizer that is available from the OpenBIOS project.
We expect to be able to deliver this tool to our vendors by returning our modifications to the OpenBIOS project, whence it can be obtained openly by anyone.
This document describes those features and how they are used.
There will be a brief overview of Error Detection and other messages.
There will be some examples at the end.
A FATAL condition is sufficient cause to immediately stop activity. It is usually (but not always) a symptom of a system failure, rather than a result of User input.
An ERROR occurs as a result of User input. It is a condition sufficient to make the run a failure, but not to stop activity. Unless the -i ("Ignore Errors") Command-Line option has been specified, the production of a Binary Output file will be suppressed and the Tokenizer will exit with a non-zero status if an Error Message has been issued.
A WARNING is issued for a condition that, while not necessarily an error, might be something to avoid. E.g., a deprecated feature, or a feature that might be incompatible with other Standard tokenizers.
An ADVISORY message is issued for a condition that is a response to User input, and where processing continues unchanged, but it is nonetheless (in this author's opinion) worthwhile to give the User a "heads-up" to make sure that what you got is what you wanted. ADVISORY messages are only displayed when the verbose Command-Line option is selected.
A User-generated MESSAGE -- unsurprisingly -- is a message generated by the User via any of the directives supported for that purpose.
A TRACE-NOTE is issued if the Trace-Symbols feature is activated, whenever a symbol on the Trace-List is either created or invoked.
Each message of the above types is accompanied by the name of the source file and the line number in which the condition that triggered it was detected, as well as the current position to which data is being generated into the Binary Output. If a PCI Header is in effect, the position relative to the end of that PCI Header will also be shown; this is to maintain consistency with the "offsets" displayed by the DeTokenizer.
The Tokenizer typically runs through to completion of the source file, displaying an ERROR message for each error it encounters (i.e., it does not "bail" after the first error). If the -i ("Ignore Errors") Command-Line option has been specified, the Tokenizer will attempt to generate binary output past each error, as far as is feasible, and will produce a Binary Output file. While this practice is not recommended, the author acknowledges that it might be useful in some limited circumstances.
At the end of its run, the Tokenizer will print a tally of the number of each type of message that was generated.

Character-case is preserved in string sequences and in the assignment of names of headered definitions, but is ignored for purposes of name-matching. Case-sensitivity of filenames, of course, is dependent on the Host Operating System.
This Tokenizer supports a pair of Special-Feature Flags that will enable the User to over-ride the preservation of character-case in the assignment of names of headered definitions.
Verbose -- print additional messages (including Advisories) during tokenization.
Ignore Errors. Generate a Binary Output even if errors were reported.
Direct the Binary Output (FCode result of Tokenization) to the named file instead of to the default-named file. This option is not valid when multiple input files are named.
FLoad List -- Collect the names of floaded files into an FLoad-List File. The names collected are in the same form as they were presented in the fload statements.
The name of the FLoad-List File is derived from the name of the Binary Output File, by replacing its extension with .fl or, if the Binary Output File name had no extension, merely appending the extension .fl . The Binary Output File name used for this purpose is either the one specified on the Command Line, or the one created by default.
Dependency List -- Collect the fully-resolved pathnames of floaded and ENCODEd files into a Dependency-List File. The names collected are in the form that is presented to the Host Operating System: Shell Environment Variables and related expressions will be fully expanded, and the directory within the Include-List in which the file was found will be attached.
The name of the Dependency-List File will be the same as that of the FLoad-List File, except that its extension will be .P instead of .fl
The name of the Missing-Files-List file will be the same as that of the FLoad-List File except that its extension will be .fl.missing instead of .fl
The Missing-Files-List file will not be created if all of the files are read successfully.
If the name of the Binary Output File is changed by a directive embedded within the Tokenization Source File, that will not alter the names of the FLoad-List, Dependency List or Missing-Files-List files.
This Tokenizer supports the notion of an Include-List. The User creates the Include-List by specifying a number of -I directory pairs on the Command-Line. All file-reads, whether for an fload command or an encode-file directive, will involve a search for the named file through the directories of the Include-List, in the order they were supplied on the Command-Line.
If no Include-List is created, file-reads are relative to the Current Working Directory. If an Include-List is created, file-reads are restricted to the directories within it. For the Current Working Directory to be included in the file-search, it must be specified explicitly; -I. will accomplish that quite effectively.
This Tokenizer supports the notion of a "Trace-List". The User creates the Trace-List by specifying a number of -T <symbol> pairs on the Command-Line.
When a name is defined, whether as an FCode, an alias, a Macro or anything else, either in normal tokenization mode or "Tokenizer Escape"-mode, if it matches a symbol that has been added to the Trace List, a Trace Note Message will be issued indicating that a definition of that name has been created. Subsequent Trace Note Messages will be issued when the definition of that name is invoked.
This "Trace-Symbols" feature can be helpful during maintenance of Legacy code, for instance, when multiple symbols carry the same name.
Define a Command-Line Symbol. Optionally, assign a value to it. If you wish the "value" to contain spaces or quotes, you can accomplish that using the shell escape conventions. This sequence may be repeated. Once a Symbol is defined on the command-line, it stays in effect for the duration of the entire batch of tokenizations (i.e., if there are multiple input files named on the command line). Command-Line Symbols can be tested for purposes of Conditional Tokenization, or their assigned values can be Evaluated.
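As a brief sketch of how a Command-Line Symbol might be tested (the directive spellings [ifdef] and [endif] are assumed here for illustration; the actual Condition-Tester directives are described in a later section, and the Symbol name is hypothetical):

```forth
\ Assumes a Symbol named DEBUG-BUILD was defined on the Command-Line.
[ifdef] DEBUG-BUILD
   ." This is a debugging build" cr
[endif]
```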
The Tokenizer recognizes a specific set of Special-Feature Flag-names; each is associated with a specific non-Standard variant behavior. Pass the Flag-name as an argument to the -f switch to enable the behavior; to disable it, precede the Flag-name with the optional string No .
The settings of the Special-Feature Flags can also be changed or displayed from within the Source Input File.
The Special-Feature Flags are all initially set to be enabled, except where noted.
The Flag-names and their associated Special-Features are as follows:
Support IBM-style Local Values ("LV"s). Initially disabled.
Allow Semicolon for Local Values Separator ("Legacy").
Display a Warning Message when Semicolon is used as the Local Values Separator.
Allow ABORT" macro.
ABORT" with implicit IF ... THEN
Use -2 THROW , rather than ABORT, in an Abort" phrase
Allow "\ (Quote-Backslash) to interrupt string parsing.
Allow \ (Backslash) to interrupt hex-sequence parsing within a string.
Allow the C-style String-Escape pairs \n \t and \xx\ to be treated as special characters in string parsing.
Over-ride occurrences of the Standard directive headerless in the Source with -- effectively -- headers to make all definitions have a header. Occurrences of the directive external will continue to behave in the Standard manner. Initially disabled.
All definitions will be made as though under the external directive; occurrences of either Standard directive headerless or headers in the Source will be over-ridden. This Special-Feature Flag will also over-ride the Always-Headers Special-Feature Flag in the event that both have been specified. Initially disabled.
Also, the pseudo-Flag-name help will cause a list of the Flag-names and their associated Special-Features to be printed.
The use of some of these flags is illustrated in Example #2
The directive tokenizer[ behaves as specified in Section C.3.1 of the IEEE-1275 Standard: it saves the current tokenizer numeric conversion radix, sets the radix to sixteen (hexadecimal) and enters “tokenizer-escape” mode. Likewise, the directive ]tokenizer restores the radix and resumes the Tokenizer’s normal behavior.
For convenience and compatibility with IBM's source-base, the directives f[ and f] are synonyms for tokenizer[ and ]tokenizer respectively. In addition, the variant ]f is a synonym for f] .
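As a brief illustration of these directives (the constant name and value are chosen for illustration only), an excursion into "tokenizer-escape" mode might look like this:

```forth
tokenizer[                 \ enter "tokenizer-escape" mode; radix is now hex
   4000 constant buf-size  \ a tokenization-time constant; 4000 is hexadecimal
]tokenizer                 \ radix restored; normal tokenization resumes
```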
The numeric conversion radix can be changed during “Tokenizer-Escape” mode by the use of the standard directives hex , decimal and octal . These will always change the numeric conversion radix in “tokenizer-escape” mode; even if “tokenizer-escape” mode was entered in the middle of a colon-definition, they will not issue an FCode sequence. And, as per the Standard, the numeric conversion radix will be restored when the Tokenizer returns to "Normal" mode.
This Tokenizer supports the emit-byte command as specified in the Section cited above. In order to be able to do that, the “tokenizer-escape” mode needs to be able to support a tokenization-time data stack, and, indeed, it does.
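A minimal sketch of emit-byte usage, taking its argument from the tokenization-time data stack:

```forth
tokenizer[
   h# ff emit-byte   \ place a single 0xFF byte into the Binary Output
   d# 16 emit-byte   \ place a 0x10 byte
]tokenizer
```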
In “tokenizer-escape” mode, a string representing a number (optionally preceded by one of the directives h# , d# or o# ) causes that number to be pushed to the stack, where it is available for use by any of several other commands as follows:
Note that this Tokenizer supports additional directives for manipulating the FCode-token-number assignment counter.
Define <Name> as a named constant with the value that was on the stack. <Name> will be known within “tokenizer-escape” mode but will not be recognized during normal tokenization. When <Name> is invoked, its value will be pushed onto the stack. Two named constants are pre-defined: true and false .
Emit the value that was on the stack as an FCode token, either one or two bytes long, depending on the value. Report an Error if the number supplied on the stack is outside the legal range specified by the IEEE-1275 Standard. Since this is sort of a "cheat", issue an Advisory if the operation is successful.
In addition to the above, “Tokenizer-escape” mode recognizes a limited number of FORTH-compatible named constants and operations, as follows:
These are different from the corresponding words in "Normal" mode, which would compile an FCode token. In “Tokenizer-escape” mode, they initiate an immediate action within the Tokenization process.
This Tokenizer supports a number of directives that are not specified by the Standard, but which serve functions as follows:
The "True" segment immediately follows the Condition-Tester, and is ended either by the "False" segment switcher or by the Conditional-block terminator. If the "False" segment switcher is present, it introduces the "False" segment, which is ended by the Conditional-block terminator. The three delimiters (in reverse order) are:
If the Conditional-block in question was nested within another one, resume conditional processing at the level of the enclosing segment. When the outermost Conditional-block is exited, resume normal processing.
While it is a requirement that the Conditional-block terminator be contained within the same Input File as the Condition-Tester and its optional "False" segment switcher, the body of a Conditional Segment may contain separate fload directives, which will be processed or ignored in accordance with the prevailing Condition. An illustration can be seen in Example #4.
There are several synonyms for the Conditional-block terminator:
Reverses the sense of the condition and introduces the "False" segment.
If the Condition-Test resulted in "TRUE", then the first segment -- the "True" segment -- was processed and the "False" segment will be ignored. Conversely, if the Condition-Test resulted in "FALSE", then the first segment -- the "True" segment -- was ignored and the "False" segment will be processed.
There are three synonyms for this function:
A number that was placed on the stack in “Tokenizer-Escape” mode will be consumed and tested. If the number was non-zero, the Condition-Test will result in "TRUE", and the first segment -- the "True" segment -- will be processed and the "False" segment, if present, will be ignored. Conversely, if the number was zero, the Condition-Test will result in "FALSE", and the first segment -- the "True" segment -- will be ignored and the "False" segment, if present, will be processed.
There is only one word for this function:
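A sketch of this form of Conditional Tokenization (the directive spellings [if] , [else] and [then] are assumed here for illustration; the message texts are arbitrary):

```forth
f[ h# 0 f]    \ push a zero onto the stack in "Tokenizer-Escape" mode
[if]
   ." the True segment -- ignored here"
[else]
   ." the False segment -- processed here"
[then]
```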
A name-string must follow the directive on the same line. A search for that name is conducted through the word-list of the mode -- i.e., "Tokenizer Escape" mode versus "Normal" mode -- and the Scope -- i.e., "Global" versus "Current Device-Node" -- in which the Tokenizer is currently operating.
If the directive is for the existence of the definition, and the name is found, or if the directive is for the non-existence of the definition, and the name is not found, the Condition-Test will result in "TRUE". Otherwise, the Condition-Test results in "FALSE".
(Note the variants with and without a final 's'.)
A pair of Condition-Tester directives are supported that will test, respectively, for definition or non-definition of a named Symbol. Their operation is similar to that of the Tests for existence or non-existence of a Definition, in that a name-string must follow the directive on the same line, but they are different in the matter of where the search for the name-string is conducted: these directives search the list of Command-Line Symbols. The relation between the type of directive and the result of the search is also similar to that of the Tests for existence or non-existence of a Definition.
When a definition is created whose name duplicates that of an existing definition, default behavior is to issue a WARNING notification.
Intentionally creating such a duplicate-named definition is called "overloading" the name. This may be required, for instance, to supplant future invocations of the named function with a version that incorporates the earlier instance of the function of the same name and supplies additional behavior.
When this is intentional, no warning should be issued. This Tokenizer supports a directive called overload which the User may invoke to bypass the duplicate-name test for the duration of only one definition that follows it (as contrasted with suspending the test globally). An illustration of its use may be seen in Example 4.
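A sketch of overloading a name, per the description above (the definition names and bodies are illustrative):

```forth
: probe  ( -- )  ." probing base device" cr ;
\ Supplant probe with a version that incorporates the earlier one;
\ overload suppresses the duplicate-name WARNING for this one definition.
\ Inside the new body, "probe" still refers to the earlier definition.
overload : probe  ( -- )  probe  ." probing extras" cr ;
```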
The Standard allows Comments and Strings that extend across multiple lines, terminated by their specific delimiter. In addition, this Tokenizer allows User-generated Messages and Local-Values Declarations that, likewise, extend across multiple lines. Because of the potential for a cascade of errors that can be caused by a missing delimiter, this Tokenizer issues a WARNING notification whenever such an entity is encountered. Example #5 shows an occasion where this might be helpful.
When this is intentional, no warning should be issued. This Tokenizer supports a directive called multi-line which the User may invoke to suppress the notification for the next multi-line item that follows it.
The User may wish to create a collection of utility functions that will be available to all the device-nodes in a multiple-node driver. Directly accessing such functions across device-nodes can be risky if any component uses instance data. However, if the use of instance data is precluded, such access can be performed safely.
This Tokenizer supports a pair of directives for setting the Scope of Definitions. The one called global-definitions will cause all subsequent definitions -- Aliases and Macros as well as FCode -- to be entered into the same vocabulary -- referred to as the "core" -- from which all the Standard functions are drawn. The other, called device-definitions , will cause the Scope of Definitions to revert to the current device-node. The use of these directives is illustrated in Example #4.
While Global Scope is in effect, the use of the word instance will not be allowed, nor will definitions made in any device-node be recognized. Conversely, an attempt to enter Global Scope while instance is in effect will be reported as an Error condition.
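A sketch of the Scope directives in use (the definition names are illustrative; note that, per the restriction above, the Global definition must not use instance data):

```forth
global-definitions
: common-util  ( n -- n+1 )  1+ ;            \ available to all device-nodes
device-definitions
: node-method  ( -- )  5 common-util drop ;  \ local to the current node
```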
The Symbol name must appear on the same line as the directive.
If the Symbol name is not found or no value has been assigned to it, a WARNING will be issued.
Otherwise, the associated value will be interpreted as though it were source-code.
The synonyms for the directive to evaluate a Command-Line Symbol's assigned value are:
The FLITERAL directive can be invoked from either "Tokenizer Escape" mode or normal tokenization mode, but the number to be emitted must have been put on top of the Stack during "Tokenizer Escape" mode.
The [fcode-time] directive produces a string that consists of the current time, formatted as hh:mm:ss ZZZ (where ZZZ represents the local Time-Zone). An illustration is included in Example 4.
The [function-name] directive, when invoked in "Normal" mode, produces an in-line string that consists of the name of the function (colon-definition) most recently -- or currently being -- defined. It will persist until a new colon-definition is started. This can be useful for embedding interpretation-time or run-time debugging-statements into or after functions' definitions. In "Tokenizer-Escape" mode, it will display the function name in more detail as a User-generated MESSAGE. Example 5 illustrates its use in "Normal" mode, and has a small illustration of its use in "Tokenizer-Escape" mode.
The [line-number] directive produces an in-line numeric literal giving the Line-Number within the current Input File.
These directives can only be invoked in "Normal" mode; they are not supported -- or needed -- in "Tokenizer-Escape" mode, because any User-generated MESSAGE will print that information. An illustration is included in Example 5.
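A sketch of embedding a run-time debugging statement with these directives, in "Normal" mode (the definition name is illustrative; the in-line string left by [function-name] is printed with type , and the literal from [line-number] with . ):

```forth
: my-probe  ( -- )
   ." Entering " [function-name] type
   ."  near line " [line-number] .
;
```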
An attempt to issue the pci-header directive after FCode output has begun will be reported as an Error condition.
The PCI Header cannot be completed until after the FCode Binary image has been completely tokenized. The directive to complete the PCI Header should be issued at the end of the process. Synonyms for this directive are:
The default setting for this field is 1.
Ordinarily, the "Revision Level of the Vendor's ROM" field of the PCI Header follows the general PCI Standard convention of "Little-Endian" byte-order. However, some organizations' legacy code saves this field in Big-Endian order. To accommodate the need for compatibility, this Tokenizer supports a Special-Feature Flag called Big-End-PCI-Rev-Level. Its default is no, i.e., to follow the PCI Standard.
These directives can be invoked from either "Tokenizer Escape" mode or normal tokenization mode, but the number to be examined must have been put on top of the Stack during "Tokenizer Escape" mode.
The default setting for this field is TRUE.
By default, the name of the Binary Output File is derived from the name of the Source Input File, by replacing its extension with .fc , or, if the Input File name had no extension, merely appending the extension .fc
The name of the Binary Output File may be specified on the Command Line, as noted above.
The name of the Binary Output File may also be specified by a directive embedded within the Tokenization Source File. Synonyms for the directive are save-image and save-img ; the name follows after the directive on the same line.
The directive can be invoked from either "Tokenizer Escape" mode or normal tokenization mode. It does not cause an immediate action (i.e., saving the Binary Image), but merely alters the name of the file to which the image will be saved when tokenization is completed.
Note that this directive does not change the name of the FLoad-List File, if one has been specified.
These directives can be invoked from either "Tokenizer Escape" mode or normal tokenization mode:
In addition, the commands ." (Dot-Quote) and .( (Dot-Paren), which have Standard meanings in normal tokenization mode, will, when invoked in "Tokenizer Escape" mode, collect text in their Standard manner and output it as a User-generated MESSAGE:
Also, as noted earlier, the [fcode-date] and [fcode-time] directives, when invoked in "Tokenizer Escape" mode, will display the current date and time, respectively, as a User-generated MESSAGE:
The User is hereby admonished to exercise caution when using this directive. Not all combinations are meaningful, and automated error-checking is not feasible. An Advisory Message will be issued to remind the User of the change.
In order to protect against unintended collisions in FCode-token numbers, which can cause severe errors at run-time, this Tokenizer will report an Error the first time an FCode-number assignment overlaps FCodes that were assigned in a different range.
Because the programmer may choose to "recycle" FCode-token-numbers intentionally, a directive is supported that clears the records of FCode-number assignments and resets the FCode-token-number assignment counter to its initial value. Contrast this with the sequence:
F[ h# 800 next-fcode F]
which merely re-initializes the value of the FCode-token-number assignment counter, but does not prevent the first assignment of this FCode-number from being regarded as overlapping an earlier assignment and therefore being reported as an Error.
The FCode-token-number assignment counter and the records of previous FCode-number assignments will also be automatically re-initialized when a new PCI-Image block is started.
Symbol names defined in "Tokenizer Escape" mode persist throughout the entire run of the Tokenizer.
The directive Reset-Symbols enables the User to delete symbol-definitions at any other time, should that be required. Reset-Symbols is sensitive to the mode in which the Tokenizer is operating: invoked during "Tokenizer Escape" mode, it will cause only the definitions made under "Tokenizer Escape" mode to be deleted; during "Normal" mode, only those made under "Normal" mode. Example #1 contains an invocation in "Tokenizer Escape" mode.
In this section, we will discuss:
Certain commands and directives expect a token to appear on the same line, and will report an Error condition if no token appears on the same line:
If the token is present, but cannot be converted to a number using the appropriate radix, i.e., if the conversion fails, the Tokenizer will issue a WARNING, ignore the directive and attempt to process the token as ordinary input. If the name is not known, the "ordinary processing" will catch the error.
An exception is made for a word defined as a variable : Although such words are not a valid target for to, many platforms' firmware will execute the sequence correctly; this Tokenizer will issue a WARNING.
instance : nonsense dup swap drop ;
variable boombah
the colon-definition of nonsense is unaffected by the occurrence of instance , which will, instead, be applied to the defining-word variable (which defines boombah ) on a later line.
Since instance would typically be followed immediately by the defining-word to which it is intended to apply, a sequence like the above can reasonably be presumed to be a likely error. This Tokenizer will issue a WARNING when an inapplicable defining-word is encountered while instance is in effect, and another WARNING when the dangling instance is finally applied to a valid definer.
The Standard says nothing about what should occur if instance has not been applied by the time a device-node is "finish"ed or a new device-node is started. In most Platforms' implementations of the FCode Interpreter, the instance state will remain in effect. This Tokenizer will issue a WARNING the first time a device-node is changed while an instance remains unresolved.
Also, instance cannot be allowed when Global Scope is in effect, as explained elsewhere.
In normal tokenization mode the alias directive behaves as specified in the Standard, i.e., it creates a new command with the exact behavior of an existing command. Any occurrence of the new command will cause the assigned FCode of the old command to be generated, with all the implications that follow from that.
The Standard, further, states that: "In FCode source, alias cannot be called from within a colon definition." However, this Tokenizer can handle that, and issues a WARNING message when it does so.
The Standard does not specify the behavior of alias in "Tokenizer Escape" mode. This Tokenizer will allow alias commands issued in "Tokenizer Escape" mode to take effect in that mode, and, furthermore, aliases to words that are recognized in either mode may be created in either mode and will be recognized in either mode.
Generally speaking, an alias definition will take on the Scope that is current at the time it is made: If Device Scope is in effect, the new name -- even if it is an alias to a word that has Global scope -- will only be accessible from the current device-node. An ADVISORY message will be issued for this condition.
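A sketch of alias in normal tokenization mode (the new name and the demonstration definition are illustrative; the Standard order is alias new-name old-name ):

```forth
alias bump 1+      \ "bump" now generates the FCode token assigned to 1+
: use-it  ( n -- n+1 )  bump ;
```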
Common practice -- widely enough used as to merit consideration as an undocumented de-facto standard -- has been to recognize a set of letters and to translate them into special characters when they immediately follow the " .
This Tokenizer translates String-Escape Quoted-pairs as follows:
"n      New-Line
"l      New-Line
"r      Carriage-Return
"t      Horizontal Tab
"f      Form-Feed
"b      Backspace
"!      Bell
"^L     Quote-Caret followed by a letter is translated as "Control"-the-letter.
"Other  Any unrecognized character following the " is taken verbatim.
""      The way to embed a double-quote into a string is to escape it with itself. (This is a special instance of the preceding rule.)
"(      As was mentioned above, Quote-Open-Parenthesis begins parsing a hex-sequence as per Section A.2 of the Standard under the description of the " operator. Details will be discussed below.
"\      Quote-Backslash permits insertion of remarks into the middle of a string definition; it will interrupt string parsing, causing the remainder of the line, together with any whitespace that might begin the new line, to be ignored. Because this feature is not in usual practice, the User can disable it by invoking the noString-remark-escape Special-Feature Flag. If this feature is disabled, the Backslash following the " will be taken verbatim; the Backslash, the text following it on the remainder of the line, the new-line and the whitespace on the next line -- all of which would otherwise have been ignored -- will be included in the string parsing and incorporated into the result.
The Standard makes no mention of what to do when a string reaches a new-line before its termination. This Tokenizer continues parsing and includes the new-line, together with any whitespace that might begin the new line, in the result.
This Tokenizer also supports an option -- not in usual practice, hence listed separately here -- that permits insertion of remarks in the middle of a hex-sequence in a string. The occurrence of a single \ (backslash) character will interrupt hex-sequence parsing, causing the remainder of the line, together with any whitespace that might begin the next line, to be ignored. The User can disable this feature by invoking the noHex-remark-escape Special-Feature Flag.
If the Hex-Remark-Escape feature is disabled, the Backslash will be treated as an ordinary nonhexadecimal character, and Hex-Sequence parsing will proceed. Any hexadecimal characters on the remainder of the line -- which would otherwise have been ignored -- will be recognized and incorporated into the result.
Specifically, \n and \t are translated into New-Line and Horizontal Tab respectively.
If the Backslash is followed by other characters, the Tokenizer will attempt to read them as a digit-string, using the current base, and create a numeric byte. The numeric sequence ends with the first non-numeric character (which is treated as a delimiter and consumed, unless it's a double-quote, in which case it's allowed to terminate the string or apply whatever action is triggered by the character following it). If the value represented by the numeric sequence exceeds the size of a byte, its low-order byte will be used and a WARNING will be issued.
If the first character after the backslash was non-numeric, the character will be used literally (and a WARNING will be issued). As in C, the backslash can be used to escape itself.
The User can disable this feature by invoking the noC-Style-string-escape Special-Feature Flag.
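Assuming the C-Style-string-escape Special-Feature Flag is enabled (its default), a string might be written (the text is arbitrary):

```forth
" column1\tcolumn2\nsecond line"
\ \t becomes a Horizontal Tab and \n a New-Line in the resulting string
```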
To complicate matters, there are two distinct styles in which this macro is used in FCode drivers, "Apple" style and "Sun" style:
In Sun Style, the sequence, in the Source, would look like this:
<Condition> ABORT" Message text"
Semantically, it would mean that if the <Condition> is true, the Message text would be printed and a -2 THROW would be performed; conversely, if the <Condition> is false, the Message text would be bypassed and execution would continue with the next token after.
The sequence could be translated into FCode as a macro like this:
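The expansion itself is not reproduced at this point in the document; based on the semantics just described, a plausible reconstruction of the macro's effect (not the literal FCode bytes emitted) is:

```forth
\ <Condition> ABORT" Message text"   acts approximately like:
<Condition> IF  ." Message text"  -2 THROW  THEN
```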
In Apple Style, the Source supplies the surrounding IF ... THEN . The action of the ABORT" command is to leave the Message text on the stack and perform the -2 THROW unconditionally, with the expectation that the system CATCH will print the string it finds on the stack.
The Source sequence would look like this:
<Condition> IF ABORT" Message text" THEN
The ABORT" ... " portion of the sequence would be translated into FCode as a macro like this:
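Again the expansion is not reproduced here; based on the semantics described above (leave the text on the stack, then throw), a plausible reconstruction is:

```forth
\ ABORT" Message text"   (within IF ... THEN) acts approximately like:
" Message text"  -2 THROW
```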
Because the ABORT" command is not specified in the Standard, the User can disable it by invoking the noABORT-Quote Special-Feature Flag.
The User who chooses to enable this feature can, further, select to disable "Sun" style in favor of "Apple" style by invoking the noSun-ABORT-Quote Special-Feature Flag.
And to complicate matters even further, some Legacy applications prefer to use the ABORT command (note there's no quote) in place of the -2 THROW sequence. Although the ABORT command is not recommended, it is a legitimate FCode function, and this Tokenizer supports a Special-Feature Flag, called Abort-Quote-Throw, which controls whether an ABORT" (Abort-Quote) phrase will be tokenized with the -2 THROW sequence or with the ABORT function. The User who chooses to have ABORT" (Abort-Quote) phrases tokenized with the ABORT function can do so by invoking noAbort-Quote-Throw .
Occasionally, a User needs to create a numeric constant whose value corresponds to a short sequence of characters. For instance, PCIR will get coded as h# 50434952. This tokenizer supports a convenient directive called a# , syntactically similar to h# , d# and o# , which makes the conversion directly. The above example can be written: a# PCIR , sparing the programmer -- and the maintainer -- from needing to translate ASCII on the fly.
The a# operator expects its target argument on the same line.
If the target-sequence contains more than
four characters, the last four will become the number; if the
target-sequence contains fewer than four characters, they will
fill the low-order part of the number. (I.e., the operation
of a#
is right-justified.) Thus:
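For instance (the first line is the example given above; the other target-sequences are illustrative, with results following from the stated rules):

```forth
a# PCIR     \ equivalent to  h# 50434952
\ More than four characters: only the last four are used
a# XPCIR    \ also equivalent to  h# 50434952
\ Fewer than four: the characters fill the low-order bytes
a# IBM      \ equivalent to  h# 0049424d
```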
Also, the conversion is case-sensitive: a# cpu is equivalent to h# 00637075.
The F['] directive is syntactically similar to the Standard ['] ("Bracket-Tick-Bracket"). Valid targets for F['] are the same as for ['] or ' ; attempts to apply F['] to an invalid target will be handled similarly.
This directive acquires the given word's FCode-Token number, which is then used according to whether the directive is invoked during "Normal Tokenization" or "Tokenizer-Escape" mode:
The given word's FCode-Token number is tokenized as a literal, which can be used, for instance, as the argument to a get-token or set-token command.
This function is the one exception to the general rule about the scope of words recognized in "Tokenizer-Escape" mode; it will recognize function-names that were defined during normal tokenization mode and that were current at the time "Tokenizer-Escape" mode was entered.
The given word's FCode-Token number is pushed onto the data-stack, from whence it can be used, for instance, as the numeric argument to a constant definition.
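A sketch of both uses (the names my-method and my-method-token are hypothetical):

```forth
\ In normal tokenization mode: the token number is tokenized as a
\ literal, e.g. as the argument to get-token:
F['] my-method  get-token

\ In "Tokenizer-Escape" mode: the token number is pushed onto the
\ Tokenizer's data-stack, e.g. as the argument to a constant definition:
tokenizer[  F['] my-method  constant my-method-token  ]tokenizer
```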
The filename that follows the fload command or the encode-file directive may be an absolute path, a path relative to the Current Working Directory, or a path relative to one of the directories in the Include-List. It may also contain Shell Environment Variables and related expressions recognized by the Host Operating System environment in which the Tokenizer is running. These will all be expanded before loading the file. An illustration may be seen in Example #4.
An ADVISORY message showing the expanded value will be printed if the verbose option has been selected, or in the event of a failure to read the file.
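For instance (the variable name and path here are purely illustrative):

```forth
\ ${DRIVERS} is expanded by the Tokenizer, in the manner of the
\ Host Shell, before the file is loaded:
fload ${DRIVERS}/common/utility-defs.fth
```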
This Tokenizer supports a directive that allows the User to define additional macros. Its syntax is:
[macro] <macroname> cmnd1 cmnd2 cmnd3 ... cmndN
The entire body of the macro definition must be contained on a single line. The linefeed at the end of the line will be included as part of the macro definition.
In this Tokenizer, macros are implemented as simple string substitutions, interpreted at the time they are invoked. If a component of the macro should change its meaning -- i.e., be redefined -- then, on subsequent invocations of the macro, the new meaning will take effect. (Note that this is different from an alias.) It is also (eminently) possible to define a macro that uses a name that has not been defined at the time the macro is defined, to define that name later, and to invoke the macro after the name has been defined. This is legitimate and will work.
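A sketch of this deferred-name usage (all names hypothetical):

```forth
\ banner$ is not yet defined when the macro is defined...
[macro] show-banner   banner$ type cr
\ ...it is defined later...
: banner$  ( -- adr len )  " Widget Driver v1.0"  ;
\ ...and the macro is invoked after that. This works, because the
\ macro body is interpreted only at the time it is invoked.
show-banner
```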
For the same reason, macros may be nested; i.e., one macro may be defined in such a way as to invoke another. However, if a macro -- or a series of nested macros -- were to invoke a macro that is already running, that will be detected and reported as an Error condition. For a simple example, a User who needed to identify all occurrences of the word 2drop might attempt to write:
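The naive attempt might look like this (a sketch of the mistake being described):

```forth
\ WRONG: when this macro is invoked, the 2drop in its body will be
\ re-interpreted as the macro itself, producing the recursion that
\ the Tokenizer detects and reports as an Error:
[macro] 2drop   ." 2drop used here " cr 2drop
```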
However, because the macro definition is not "compiled" in the same way as a colon-definition, but is, instead, interpreted at run-time, the 2drop that would be executed would, in fact, be the macro itself, leading to an infinite loop of messages (if the condition were not detected...). In order to protect against this condition, the User should, instead, do something like this:
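For instance (a sketch; the name old-2drop is hypothetical):

```forth
\ First preserve the original behavior under another name.
\ Unlike a macro, an alias captures its meaning at definition time:
alias old-2drop 2drop
\ Now the macro can safely refer to the preserved name:
[macro] 2drop   ." 2drop used here " cr old-2drop
```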
This has the added advantage that, when the passage in which the notification is needed comes to an end, the User can restore 2drop to its Standard behavior with:
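A sketch, assuming the original behavior was preserved under the hypothetical name old-2drop before 2drop was redefined:

```forth
\ Redefine the macro so that it substitutes only the preserved
\ name, restoring the Standard behavior of 2drop:
[macro] 2drop   old-2drop
```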
A User-Defined Macro takes on the Scope that is current at the time it is created. An illustration may be seen in Example #4.
In particular, an attempt within one device-node to directly access a method defined in another device-node must be flagged as an error. Consider what would happen at run-time if it were allowed: the called method would be expecting the instance-pointer to be pointing to the instance data of the device-node in which that method was defined, but it would, instead, be pointing to the instance data of the device-node that made the call. This is an invitation to havoc that would be -- to put it politely -- somewhat difficult to trace.
The correct way to invoke a method across device-node boundaries is via $call-parent or $call-method or the like.
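For instance, a method in a child node might invoke a method of its parent by name, like this (a sketch; the method names are hypothetical):

```forth
: get-status  ( -- status )
   \ Execute the parent node's "read-status" method by name, so
   \ that the parent's instance data is set up correctly at run-time:
   " read-status" $call-parent
;
```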
In order to detect such errors early on, this Tokenizer keeps track of separate but linked "vocabularies" associated with device-nodes. When the command new-device is encountered in interpretation mode, a new device-node vocabulary is opened and new definitions are entered into it. Definition-names created in the preceding device-node -- presumably the parent of the newly started device -- are suspended from accessibility.
Correspondingly, when the finish-device command is encountered in interpretation mode, the "vocabulary" of the device being ended is emptied ("forgotten" in classic Forth parlance) and the "vocabulary" of the parent-device is resumed.
The device-node vocabulary to which definitions are being entered at any given time, and from which definitions are accessible, may be referred to as the current device-node for purposes of discussion.
Note that the Tokenizer does not switch vocabularies when the new-device or finish-device commands are encountered in compilation mode (i.e., when they are being compiled-in to a method); they are treated as ordinary tokens, since the shift to a new device-node will not occur until run-time.
The commands new-device and finish-device must remain in balance. If a finish-device is encountered without a prior corresponding new-device, or if the end of FCode (or a Reset-Symbols directive issued in "normal" mode) is reached and not all occurrences of new-device are balanced by a call to finish-device, it will be reported as an Error condition.
Definitions made in "Tokenizer-Escape" mode, however, are independent of device-node vocabularies and remain accessible until they are explicitly reset by a Reset-Symbols directive issued in "Tokenizer-Escape" mode.
When the fcode-end (or equivalent) that ends one body of code is processed, and before the fcode-version<n> that begins the next, the definitions that had been created are forgotten, but assignment of FCode-token numbers will continue in sequence. Likewise, definitions made in "Tokenizer-Escape" mode will persist.
The User who desires to reset one or the other of these, or both, can do so by issuing the directives:
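A sketch, based on the Reset-Symbols directive described earlier (the pairing of each invocation with what it resets is this author's reading of the mode-scoping rules above):

```forth
\ Issued in "normal" mode: resets the definitions made in normal
\ tokenization mode:
reset-symbols

\ Issued in "Tokenizer-Escape" mode: resets the definitions made
\ in that mode:
tokenizer[  reset-symbols  ]tokenizer
```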
The second example illustrates the use of Special-Feature Flags to select or de-select specific non-standard features.
The User is developing code that will run across all platforms, and therefore must be neutral with regard to "Sun"- or "Apple"-style usage of ABORT" ; this can best be achieved by disallowing the use of ABORT" altogether.
There is no concern, however, about compatibility of the Source with other Tokenizers, so the User need not forgo the conveniences of Local Values and String-remark-escapes.
Furthermore, the Source contains many passages taken from IBM Legacy sources, and the User does not wish to see a WARNING message when the Legacy Locals Separator is used.
The command-line for these conditions would include the following:
toke -f NOabort-quote -f local-values -f NOlv-legacy-warning

Alternatively, these flags may be set from within the Source code thus:
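A sketch of the in-source form, assuming the [FLAG] directive for setting Special-Feature Flags from within the Source:

```forth
\ Equivalent in-source flag settings (assumes the [FLAG] directive):
[FLAG] NOabort-quote
[FLAG] local-values
[FLAG] NOlv-legacy-warning
```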
Note that the invocation of -f local-values is necessary, as its default state is to be disabled. Also, because -f NOabort-quote is invoked, the setting of the Sun-ABORT-Quote flag is irrelevant.
The third example illustrates two situations where two or three FCode blocks are incorporated into a single PCI Image.
In the first situation, the "Outer" block will byte-load the code of the "Inner" block before the "Outer" block has been completely interpreted. It is therefore important to avoid collisions in FCode-token numeric assignments.
In the second situation, the "Outer" block will have been completely interpreted before it begins to byte-load the code of the two "Inner" blocks. It can safely discard the token-table of the interpreter, and allow its assigned FCode-token numbers to be re-cycled. Furthermore, the large number of definitions presents a real risk that the full range of usable FCode-token numbers will be exhausted. For these reasons, the User finds it necessary to reset the FCode-token numeric assignments to their initial state before tokenizing the two "Inner" blocks.
This example serves to illustrate the use of the directives that control the Scope of Definitions, and also to show a means whereby the IBM-Style Local Values Support File can be incorporated at a Global level. Normally, that would be problematical because the Local Values Support functions are written to use instance data, in order to conserve use of System memory. By temporarily over-riding the definition of instance in the manner shown, the User has traded off economy of System-memory for convenience of programming.
This example also offers an illustration of the use of Shell-Environment Variables in File-Names: Let us suppose that the main file for the controller of the assembly whose driver is being compiled here resides in a directory-tree, several layers under the root. Immediately below the root is a sub-directory called shared that contains shared functions and the Local Values support, and elsewhere under the tree are sharable bodies of driver code for device-modules that can be incorporated into various assemblies. Let us also suppose that the Makefile that governs this Tokenization process sets Environment Variables; the root of the tree is called DevRoot, and the directories in which the sharable bodies reside are called SCZ and SLZ respectively. And let us further suppose that the inclusion of these two subsidiary devices is optional, controlled by command-line symbol definitions.
This example serves to illustrate the use of the [function-name] directive to create a series of "you are here" debugging messages.
We will create a pair of macros whose names we can cut'n'paste at the beginning and end of every function we want to give a "you are here" message, switchable by a local debug-flag. The macros will be globally defined, but will make reference to locally-defined names for the debug-flag and the device-node name string.
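A sketch of such a pair of macros (all names are hypothetical, and this assumes [function-name] tokenizes the current function's name as a string; debug-flag and node-name$ must be defined locally, within each device-node, before any function that uses the macros):

```forth
\ Globally-defined macros, pasted at the beginning and end of each
\ function of interest; they refer to the locally-defined debug-flag
\ and node-name string:
[macro] dbg-enter  debug-flag @ if node-name$ type ." : entering " [function-name] type cr then
[macro] dbg-exit   debug-flag @ if node-name$ type ." : leaving "  [function-name] type cr then
```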