New Features in OpenBIOS Tokenizer toke

(A User's Guide)

Updated Wed, 18 Oct 2006 at 12:39 PDT by David L. Paktor

New Features in OpenBIOS Tokenizer toke (A User's Guide)

Overview
Scope of this Document
1. What this document does not cover:
Error Detection and other messages:
Case Sensitivity
Categories of Features
Features, by Category:
Examples:

Overview

The goal of this project is to produce an FCode Tokenizer that can both be used in-house and be presented to third-party vendors, VARs and the like, as a professional-quality tool for their use, without adversely affecting IBM's Intellectual-Property rights.

We are using, as a starting basis, the Open-Source tokenizer that is available from the OpenBIOS project .

We expect to be able to deliver this tool to our vendors by returning our modifications to the OpenBIOS project, from whence it can be obtained openly by anyone.

Scope of this Document

In order to achieve the first part of the goal -- making it suitable for in-house use -- a number of features were added that are not covered in the IEEE-1275 Standard for Boot Firmware (also referred to herein simply as "the Standard").

This document describes those features and how they are used.

There will be a brief overview of Error Detection and other messages.

There will be some examples at the end.

What this document does not cover:

Standard behavior, particularly conversion between ANSI / IEEE-1275 Standard Forth and IEEE-1275 Standard FCode, and standard Tokenizer Macros. These are expected and will be regarded as "presumed" by this document.
Changes to older versions of the OpenBIOS Tokenizer.
A complete list of Tokenization Source Errors detected and reported.
A complete list of conditions that generate WARNING or ADVISORY messages.
A complete list of FATAL conditions.
Support for IBM-Style Local Values. That is the subject of a separate writeup.

Error Detection and other messages:

The Tokenizer is capable of producing the following kinds of messages:

FATAL
ERROR
WARNING
ADVISORY
MESSAGE generated by the User.
TRACE-NOTE

A FATAL condition is sufficient cause to immediately stop activity. It is usually (but not always) a symptom of a system failure, rather than a result of User input.

An ERROR occurs as a result of User input. It is a condition sufficient to make the run a failure, but not to stop activity. Unless the -i ("Ignore Errors") Command-Line option has been specified, the production of a Binary Output file will be suppressed and theTokenizer will exit with a non-zero status, if an Error Message has been issued.

A WARNING is issued for a condition that, while not necessarily an error, might be something to avoid. E.g., a deprecated feature, or a feature that might be incompatible with other Standard tokenizers.

An ADVISORY message is issued for a condition that is a response to User input, and where processing continues unchanged, but it is nonetheless (in this author's opinion) worthwhile to give the User a "heads-up" to make sure that what you got is what you wanted. ADVISORY messages are only displayed when the verbose Command-Line option is selected.

A User-generated MESSAGE -- unsurprisingly -- is a message generated by the User via any of the directives supported for that purpose.

A TRACE-NOTE is issued if the Trace-Symbols feature is activated, whenever a symbol on the Trace-List is either created or invoked.

Each message of the above types is accompanied by the name of the source file and the line number in which the condition that triggered it was detected, as well as the current position to which data is being generated into the Binary Output. If a PCI Header is in effect, the position relative to the end of that PCI Header will also be shown; this is to maintain consistency with the "offsets" displayed by the DeTokenizer

The Tokenizer typically runs through to completion of the source file, displaying an ERROR message for each error it encounters (I.e., it does not "bail" after the first error). If the -i ("Ignore Errors") Command-Line option has been specified, the Tokenizer will attempt to produce binary output for each error, as far as is feasible, and will produce a Binary Output file. While this practice is not recommended, the author acknowledges that it might be useful in some limited circumstances.

At the end of its run, the Tokenizer will print a tally of the number of each type of message that was generated.

Case Sensitivity

Although Command-Line option switches are case-sensitive, this Tokenizer is insensitive to the distinctions between upper- or lower- -case characters in all other matters: in Command-Line symbol-definitions and Special-Feature Flag settings and in the Input Source.

Character-case is preserved in string sequences and in the assignment of names of headered definitions, but is ignored for purposes of name-matching. Case sensitivity of filenames, of course, is dependent on the Host Operating System.

This Tokenizer supports a pair of Special-Feature Flags that will enable the User to over-ride the preservation of character-case in the assignment of names of headered definitions.

Categories of Features

The features described in this document fall into four categories:

Directives
Command-Line options
Non-standard input, syntaxes, and behavior
Options available within “tokenizer-escape” mode (I.e., within the scope of a tokenizer[ ... ]tokenizer block).

Directives

Directives are commands that occur within the body of the Tokenization Source File, but are not part of the FCode program itself. They are recognized by the Tokenizer and cause it to perform special actions that might serve such diverse functions as:

Printing a message to the Standard Output during the Tokenization process
Controlling whether selected passages of the Source are processed or ignored (also called Conditional Compilation or Conditional Tokenization)
Outputting arbitrary sequences of bytes to the FCode binary file

Command-Line options

Command-Line options are, as the name suggests, switches that can be set on the command-line when the Tokenizer is invoked. They affect behaviors such as:

The verbosity of output during the Tokenization process
Definition of symbols that feed into some of the Conditional Tokenization Directives
Allowing output to be produced even when errors have been detected.
Selectively enabling variants of Non-standard input, syntaxes, and behavior

Non-standard input, syntaxes and behavior

Certain sequences of source code that are not covered by the IEEE-1275 Standard are, nonetheless, recognized by this Tokenizer and cause it to produce valid FCode. Examples of these include:

"C"-style escape characters in strings
Sun-style versus Apple-style implementation of the command ABORT"
IBM-style Local-Values

In order to accommodate the conflicting goals of standard-compliance and convenience, non-Standard variants will be selectively enabled by a set of Command-Line options known as "Special-Feature Flags"

“Tokenizer-Escape” Mode

Section C.3.1 of the IEEE-1275 Standard describes “tokenizer-escape” mode and its basic command, emit-byte The OpenBIOS Tokenizer toke, while it does not have a full FORTH capability, does support a few additional Forth-style commands that can be used to:

Define and retrieve named constants
Create aliases
Set the value of the next FCode-token

Features, by Category:

The following sections will supply a complete list of the new features, organized by category.

Command-Line options:

Command-Line option Switches are case-sensitive; their arguments, where applicable, are not.

Switches

-h or -?

Print a brief help message and then exit.

-v

Verbose -- print additional messages (including Advisories) during tokenization.

-i

Ignore Errors. Generate a Binary Output even if errors were reported.

-o <OutputFileName>

Direct the Binary Output (FCode result of Tokenization) to the named file instead of to the default-named file. This option is not valid when multiple input files are named.

-l

FLoad List -- Collect the names of floaded files into an FLoad-List File. The names collected are in the same form as they were presented in the fload statements.

The name of the FLoad-List File is derived from the name of the Binary Output File, by replacing its extension with .fl , or, if the Binary Output File name had no extension, merely appending the extension .fl

The Binary Output File name used for this purpose is either the one specified on the Command Line, or the one created by default.

-P (Note: Switch is upper-case)

Dependency List -- Collect the fully-resolved pathnames of floaded and ENCODEd files into a Dependency-List File. The names collected are in the form that is presented to the Host Operating System: Shell Environment Variables and related expressions will be fully expanded, and the directory within the Include-List in which the file was found will be attached.

The name of the Dependency-List File will be the same as that of the FLoad-List File, except that its extension will be .P instead of .fl

If an FLoad List or a Dependency List is being collected, the names of any files that cannot be read, for any reason, will be collected into a Missing-Files-List file.

The name of the Missing-Files-List file will be the same as that of the FLoad-List File except that its extension will be .fl.missing instead of .fl

The Missing-Files-List file will not be created if all of the files are read successfully.

If the name of the Binary Output File is changed by a directive embedded within the Tokenization Source File, that will not alter the names of the FLoad-List, Dependency List or Missing-Files-List files.

Include-List Directories

-I directory (Note: Switch is upper-case)

This Tokenizer supports the notion of an Include-List. The User creates the Include-List by specifying a number of -I directory pairs on the Command-Line. All file-reads, whether for an fload command or an encode-file directive, will involve a search for the named file through the directories of the Include-List, in the order they were supplied on the Command-Line.

If no Include-List is created, file-reads are relative to the Current Working Directory. If an Include-List is created, file-reads are restricted to the directories within it. For the Current Working Directory to be included in the file-search, it must be specified explicitly. -I. will accomplish that quite effectively.

Trace the Creation and Invocation of FCode or "Tokenizer Escape"-mode Definitions

-T <symbol> (Note: Switch is upper-case)

This Tokenizer supports the notion of a "Trace-List". The User creates the Trace-List by specifying a number of -T <symbol> pairs on the Command-Line.

When a name is defined, whether as an FCode, an alias, a Macro or anything else, either in normal tokenization mode or "Tokenizer Escape"-mode, if it matches a symbol that has been added to the Trace List, a Trace Note Message will be issued indicating that a definition of that name has been created. Subsequent Trace Note Messages will be issued when the definition of that name is invoked.

This "Trace-Symbols" feature can be helpful during maintenance of Legacy code, for instance, when multiple symbols carry the same name.

Command-Line Symbol Definitions

-d Symbol[=Value]

Define a Command-Line Symbol. Optionally, assign a value to it. If you wish the "value" to contain spaces or quotes, you can accomplish that using the shell escape conventions. This sequence may be repeated. Once a Symbol is defined on the command-line, it stays in effect for the duration of the entire batch of tokenizations (i.e., if there are multiple input files named on the command line). Command-Line Symbols can be tested for purposes of Conditional Tokenization, or their assigned values can be Evaluated.

Special-Feature Flags

-f [no]<FlagName>

The Tokenizer recognizes a specific set of Special-Feature Flag-names; each is associated with a specific non-Standard variant behavior. Pass the Flag-name as an argument to the -f switch to enable the behavior; to disable it, precede the Flag-name with the optional string No

The settings of the Special-Feature Flags can also be changed or displayed from within the Source Input File

The Special-Feature Flags are all initially set to be enabled, except where noted.

The Flag-names and their associated Special-Features are as follows:

Local-Values

Support IBM-style Local Values ("LV"s). Initially disabled.

LV-Legacy-Separator

Allow Semicolon for Local Values Separator ("Legacy").

LV-Legacy-Message

Display a Warning Message when Semicolon is used as the Local Values Separator.

ABORT-Quote

Allow ABORT" macro.

Sun-ABORT-Quote

ABORT" with implicit IF ... THEN

Abort-Quote-Throw

Use -2 THROW , rather than ABORT, in an Abort" phrase

String-remark-escape

Allow "\ (Quote-Backslash) to interrupt string parsing.

Hex-remark-escape

Allow \ (Backslash) to interrupt hex-sequence parsing within a string.

C-Style-string-escape

Allow the C-style String-Escape pairs \n \t and \xx\ to be treated as special characters in string parsing.

Always-Headers

Over-ride occurrences of the Standard directive headerless in the Source with -- effectively -- headers to make all definitions have a header. Occurrences of the directive external will continue to behave in the Standard manner. Initially disabled.

Always-External

All definitions will be made as though under the external directive; occurrences of either Standard directive headerless or headers in the Source will be over-ridden. This Special-Feature Flag will also over-ride the Always-Headers Special-Feature Flag in the event that both have been specified. Initially disabled.

Warn-if-Duplicate

Display a WARNING message whenever a definition is made whose name duplicates that of an existing definition. Disabling this flag will suspend the duplicate-names test globally, until it is re-enabled. A Directive is supported that will suspend the test for the duration of only a single definition, without affecting global behavior.

Obsolete-FCode-Warning

Display a WARNING message whenever an FCode function is invoked that the Standard identifies as "obsolete".

Trace-Conditionals

Issue Advisory Messages about the state of Conditional Tokenization. (Remember that Advisory Messages are displayed only if the "verbose" option -v is set.) Initially disabled.

Upper-Case-Token-Names
Lower-Case-Token-Names

When outputting the names of headered functions ("Token-Names") to the Binary Output File, over-ride the character-case in which the names appeared in the Source, and convert them to Upper- or Lower- -Case, respectively. (These flags do not affect text string sequences, whose character-case is always preserved.) Initially disabled.

Big-End-PCI-Rev-Level

Save the "Revision Level of the Vendor's ROM" field of the PCI Header in Big-Endian byte-order, rather than "Little-Endian" as per the general PCI Standard convention. (This flag does not affect any other field of the PCI Header). Initially disabled.

Ret-Stk-Interp

Allow Return-Stack Operations during Interpretation. While the Standard specifies that usage of the operators >r r@ and r> while interpreting is allowed, actual practice in the industry is inconsistent. Developers who wish to take a more cautious approach to this question can disable this flag so that any attempt to use the operators >r r@ and r> in the "interpreting" state will generate an ERROR Message.

Also, the pseudo-Flag-name help will cause a list of the Flag-names and their associated Special-Features to be printed.

The use of some of these flags is illustrated in Example #2

“Tokenizer-Escape” Mode

The directive tokenizer[ behaves as specified in Section C.3.1 of the IEEE-1275 Standard: it saves the current tokenizer numeric conversion radix, sets the radix to sixteen (hexadecimal) and enters “tokenizer-escape” mode. Likewise, the directive ]tokenizer restores the radix and resumes the Tokenizer’s normal behavior.

For convenience and compatibility with IBM's source-base, the directives f[ and f] are synonyms for tokenizer[ and ]tokenizer respectively. In addition, the variant ]f is a synonym for f] .

The numeric conversion radix can be changed during “Tokenizer-Escape” mode by the use of the standard directives hex decimal and octal . These will always change the numeric conversion radix in “tokenizer-escape” mode; even if “tokenizer-escape” mode was entered in the middle of a colon-definition, they will not issue an FCode sequence. And, as per the Standard, the numeric conversion radix will be restored when the Tokenizer returns to "Normal" mode.

Function-names that were defined during normal tokenization mode will not be recognized in “Tokenizer-escape” mode and vice-versa. The one exception is that F['] , even when invoked in “Tokenizer-escape” mode, will apply to function-names that were defined during normal tokenization mode.

The emit-byte command

This Tokenizer supports the emit-byte command as specified in the Section cited above. In order to be able to do that, the “tokenizer-escape” mode needs to be able to support a tokenization-time data stack, and, indeed, it does.

Other commands

In “tokenizer-escape” mode, a string representing a number (optionally preceded by one of the directives h# d# or o# ) causes that number to be pushed to the stack, where it is available for use by any of several other commands as follows:

Standard Commands

The next-fcode command

Set the value of the FCode-token-number assignment counter to the number that was on the stack. Subsequent fcode-functions defined in the source will be assigned FCode token numbers starting with this number. If the number supplied on the stack is outside the legal range specified by the IEEE-1275 Standard, the attempt will be disallowed and reported as an Error.

Note that this Tokenizer supports additional directives for manipulating the FCode-token-number assignment counter.

Non-Standard operations

constant <Name>

Define <Name> as a named constant with the value that was on the stack. <Name> will be known within “tokenizer-escape” mode but will not be recognized during normal tokenization. When <Name> is invoked, its value will be pushed onto the stack. Two named constants are pre-defined: true and false

emit-fcode

Emit the value that was on the stack as an FCode token, either one or two bytes long, depending on the value. Report an Error if the number supplied on the stack is outside the legal range specified by the IEEE-1275 Standard. Since this is sort of a "cheat", issue an Advisory if the operation is successful.

Additional FORTH-compatible operations

In addition to the above, “Tokenizer-escape” mode recognizes a limited number of FORTH-compatible named constants and operations, as follows:

TRUE Push -1 onto the stack
FALSE Push 0 onto the stack
0= Invert the Boolean-flag on top of the stack: replace 0 with TRUE or any non-zero number with FALSE
SWAP Exchange top two stack items
2SWAP Exchange top two pairs of stack items
noop Take no action.

These are different from the corresponding words in "Normal" mode, which would compile an FCode token. In “Tokenizer-escape” mode, they initiate an immediate action within the Tokenization process.

Directives

This Tokenizer supports a number of directives that are not specified by the Standard, but which serve functions as follows:

Conditional Tokenization
Suspending the Duplicate-Name Test for one Definition ("Overloading")
Suspending the Multi-Line Warning for one occasion
Controlling the Scope of Definitions
Evaluation of symbols defined on the command-line
Outputting arbitrary sequences of bytes to the FCode binary file
Encoding blocks of binary data taken from a file
Generating Special Text-Strings and Literals:

The current date or time.
The name of the function currently being defined
The Input-File Name and Line Number

Pre-pending PCI headers to FCode images
Modifying the PCI Header
Changing the name of the Binary Output File
Issuing a message at tokenization time
Over-riding the setting of a Special-Feature Flag
Displaying the settings of all the Special-Feature Flags
Saving, Restoring and Resetting the FCode-Token-Number Assignment Counter
Resetting the symbols defined either in Normal Tokenization Mode or in "Tokenizer Escape" mode.

Conditional Tokenization

The OpenBIOS Tokenizer toke supports a fully-nestable Conditional Tokenization capability comparable to the one in the "C" pre-processor. A Conditional Tokenization block consists of either one or two segments -- a "True" segment and an optional "False" segment -- delimited by Conditional Operators of three types:

the Condition-Tester
the optional "False" segment switcher
the Conditional-block terminator

The "True" segment immediately follows the Condition-Tester, and is ended either by the "False" segment switcher or by the Conditional-block terminator. If the "False" segment switcher is present, it introduces the "False" segment, which is ended by the Conditional-block terminator. The three delimiters (in reverse order) are:

The Conditional-block Terminator

Exit the current level of Conditional Tokenization.

If the Conditional-block in question was nested within another one, resume conditional processing at the level of the enclosing segment. When the outermost Conditional-block is exited, resume normal processing.

While it is a requirement that the Conditional-block terminator be contained within the same Input File as the Condition-Tester and its optional "False" segment switcher, the body of a Conditional Segment may contain separate fload directives, which will be processed or ignored in accordance with the prevailing Condition. An illustration can be seen in Example #4.

There are several synonyms for the Conditional-block terminator:

[then]

#then

[#then]

[endif]

#endif

[#endif]

The "False" segment-switcher

Reverses the sense of the condition and introduces the "False" segment.

If the Condition-Test resulted in "TRUE", then the first segment -- the "True" segment -- was processed and the "False" segment will be ignored. Conversely, if the Condition-Test resulted in "FALSE", then the first segment -- the "True" segment -- was ignored and the "False" segment will be processed.

There are three synonyms for this function:

[else]
#else
[#else]

Condition-Testers

Tokenization can be controlled according to the following conditions:

A True/False flag on the top of the stack
Existence or non-existence of an FCode or "Tokenizer Escape"-mode definition
Definition or non-definition of a Command-Line symbol.

True/False flag on the top of the stack

A number that was placed on the stack in “Tokenizer-Escape” mode will be consumed and tested. If the number was non-zero, the Condition-Test will result in "TRUE", and the first segment -- the "True" segment -- will be processed and the "False" segment, if present, will be ignored. Conversely, if the number was zero, the Condition-Test will result in "FALSE", and the first segment -- the "True" segment -- will be ignored and the "False" segment, if present, will be processed.

There is only one word for this function:

[if]

The [if] Operator can be invoked from either "Tokenizer Escape" mode or normal tokenization mode, but the number to be tested must have been put on top of the Stack during "Tokenizer Escape" mode.

Existence or non-existence of an FCode or "Tokenizer Escape"-mode definition

A name-string must follow the directive on the same line. A search for that name is conducted through the word-list of the mode -- i.e., "Tokenizer Escape" mode versus "Normal" mode -- and the Scope -- i.e., "Global" versus "Current Device-Node" -- in which the Tokenizer is currently operating.

If the directive is for the existence of the definition, and the name is found, or if the directive is for the non-existence of the definition, and the name is not found, the Condition-Test will result in "TRUE". Otherwise, the Condition-Test results in "FALSE".

The synonyms for the test for existence of a definition are:

[ifexist]
#ifexist
[#ifexist]
[ifexists]
#ifexists
[#ifexists]

(Note the variants with and without a final 's')

The synonyms for the test for non-existence of a definition are:

[ifnexist]
#ifnexist
[#ifnexist]

(Note that variants with a final 's' didn't make sense here.)

Definition or non-definition of a Command-Line Symbol

As noted above, the Tokenizer recognizes the syntax

-d Symbol[=Value]

as the means by which to define a Command-Line Symbol and optionally assign a value to it.

A pair of Condition-Tester directives are supported that will test, respectively, for definition or non-definition of a named Symbol. Their operation is similar to that of the Tests for existence of non-existence of a Definition, in that a name-string must follow the directive on the same line, but they are different in the matter of where the search for the name string is conducted: these directives search the list of Command-Line Symbols. The relation between the type of directive and the result of the search is also similar to that of the Tests for existence of non-existence of a Definition.

The synonyms for the test for definition of a Command-Line Symbol are:

[ifdef]
#ifdef
[#ifdef]

The synonyms for the test for non-definition of a Command-Line Symbol are:

[ifndef]
#ifndef
[#ifndef]

The test is simply to verify the existence of a Symbol; it is not sensitive to whether or not the optional value was assigned.

Tracing the state of Conditional Tokenization

The nesting and un-nesting of Conditional Tokenization can become complicated; an floaded file may contain a segment-switcher or Conditional-block terminator that interacts with the prevailing Condition-Tester that originated in another Source file, which might lead to unexpected results. To help trace the status of Conditional Tokenization, the User can invoke the Trace-Conditionals Special-Feature Flag together with the"verbose" option ( -v ). This will cause an Advisory Message to be displayed whenever a Conditional Operator is encountered, even if it is in a segment that is already being ignored. The message will indicate the location of the associated Condition-Tester (or the state of the condition, if the Conditional Operator is a Condition-Tester) and whether the segment is being Processed or being Ignored.

Suspending the Duplicate-Name Test for one Definition ("Overloading")

When a definition is created whose name duplicates that of an existing definition, default behavior is to issue a WARNING notification.

Intentionally creating such a duplicately-named definition is called "overloading" the name. This may be required, for instance, to supplant future invocations of the named function with a version that incorporates the earlier instance of the function of the same name and supplies additional behavior.

When this is intentional, no warning should be issued. This Tokenizer supports a directive called overload which the User may invoke to bypass the duplicate-name test for the duration of only one definition that follows it (as contrasted with suspending the test globally). An illustration of its use may be seen in Example 4.

Suspending the Multi-Line Warning for one occasion

The Standard allows Comments and Strings that extend across multiple lines, terminated by their specific delimiter. In addition, this Tokenizer allows User-generated Messages and Local-Values Declarations that, likewise, extend across multiple lines. Because of the potential for a cascade of errors that can be caused by a missing delimiter, this Tokenizer issues a WARNING notification whenever such an entity is encountered. Example #5 shows an occasion where this might be helpful.

When this is intentional, no warning should be issued. This Tokenizer supports a directive called multi-line which the User may invoke to suppress the notification for the next multi-line item that follows it.

Controlling the Scope of Definitions

The User may wish to create a collection of utility functions that will be available to all the device-nodes in a multiple-node driver. Directly accessing such functions across device-nodes can be risky if any component uses instance data. However, if the use of instance data is precluded, such access can be performed safely.

This Tokenizer supports a pair of directives for setting the Scope of Definitions. The one called global-definitions will cause all subsequent definitions -- Aliases and Macros as well as FCode -- to be entered into the same vocabulary -- referred to as the "core" -- from which all the Standard functions are drawn. The other, called device-definitions , will cause the Scope of Definitions to revert to the current device-node. The use of these directives is illustrated in Example #4.

While Global Scope is in effect, the use of the word instance will not be allowed, nor will definitions made in any device-node be recognized. Conversely, an attempt to enter Global Scope while instance is in effect will be reported as an Error condition.

Evaluating Command-line Symbols

A Directive is supported that can be used to extract the optional value that was assigned to a Command-Line Symbol.

The Symbol name must appear on the same line as the directive.

If the Symbol name is not found or no value has been assigned to it, a WARNING will be issued.

Otherwise, the associated value will be interpreted as though it were source-code.

The synonyms for the directive to evaluate a Command-Line Symbol's assigned value are:

[defined]
#defined
[#defined]

Outputting Arbitrary Byte-Sequences to the FCode Binary

In addition to the Standard emit-byte command, this Tokenizer supports a directive called FLITERAL , which will emit the number popped from the top of the Data Stack as a literal to the FCode Binary, as though the number were invoked during normal operation.

The FLITERAL directive can be invoked from either "Tokenizer Escape" mode or normal tokenization mode, but the number to be emitted must have been put on top of the Stack during "Tokenizer Escape" mode.

Encoding blocks of binary data taken from a file

The directive ENCODE-FILE takes a file-name as an argument on the same line. It translates the contents of the file into a series of strings interspersed with encode-bytes and encode+ commands, resulting in a block of byte-encoded binary data.

Generating Special Text-Strings and Literals

This Tokenizer supports a set of directives that will present a string or a literal for processing as though it were part of the Tokenization Source File. Except as otherwise noted, these directives may be invoked in either "Normal" or "Tokenizer-Escape" mode: in "Normal" mode, the string produced will be compiled-in as an in-line text string; in "Tokenizer-Escape" mode, it will be displayed as a User-generated MESSAGE.

Current date or time

The [fcode-date] directive produces a string that consists of the current date, formatted as mm/dd/ccyy .

The [fcode-time] directive produces a string that consists of the current time, formatted as hh:mm:ss ZZZ (where ZZZ represents the local Time-Zone).

An illustration is included in Example 4.

Name of the Function currently being defined

The [function-name] directive, when invoked in "Normal" mode, produces an in-line string that consists of the name of the function (colon-definition) most recently -- or currently being -- defined. It will persist until a new colon-definition is started. This can be useful for embedding interpetation-time or run-time debugging-statements into or after functions' definitions. In "Tokenizer-Escape" mode, it will display the function name in more detail as a User-generated MESSAGE. Example 5 illustrates its use in "Normal" mode, and has a small illustration of its use in "Tokenizer-Escape" mode.

Input-File Name and Line Number

The [input-file-name]directive produces an in-line string that consists of the name of the current Input File.

The [line-number]directive produces an in-line numeric literal giving the Line-Number within the current Input File.

These directives can only be invoked in "Normal" mode; they are not supported -- or needed -- in "Tokenizer-Escape" mode, because any User-generated MESSAGE will print that information. An illustration is included in Example 5.

Pre-pending PCI Headers to FCode images

The directive pci-header , which can be invoked from either "Tokenizer Escape" mode or normal tokenization mode, will cause a PCI Header to be generated to the Binary Output file, in front of the FCode Binary image. It takes three parameters from the Data Stack: Vendor ID as the third item on the Stack, Device ID as the second, and Class Code as the top item on the Data Stack. These must be put on the Stack during "Tokenizer Escape" mode.

An attempt to issue the pci-header directive after FCode output has begun will be reported as an Error condition.

The PCI Header cannot be completed until after the FCode Binary image has been completely tokenized. The directive to complete the PCI Header should be issued at the end of the process. Synonyms for this directive are:

pci-header-end
pci-end

Modifying the PCI Header

Directives are supported that will alter specific fields of the PCI Header. They can be issued at any time before the pci-end directive to specify the contents of certain fields of the PCI Data Structure as follows:

"Revision Level of the Vendor's ROM" (2 bytes at offset 0x12)
"Last Image Indicator" bit (High-order bit of 1 byte at offset 0x15)

The syntax for these functions is:

"Revision Level of the Vendor's ROM"

To set the "Revision Level of the Vendor's ROM" field of the PCI Header to the number popped from the top of the Data Stack, use any of these synonyms:

set-rev-level
pci-revision
pci-code-revision

These directives can be invoked from either "Tokenizer Escape" mode or normal tokenization mode, but the number to be used must have been put on top of the Stack during "Tokenizer Escape" mode.

The default setting for this field is 1.

Byte-Order

Ordinarily, the "Revision Level of the Vendor's ROM" field of the PCI Header follows the general PCI Standard convention of "Little-Endian" byte-order. However, some organizations' legacy code saves this field in Big-Endian order. To accommodate the need for compatibility, this Tokenizer supports a Special-Feature Flag called Big-End-PCI-Rev-Level. Its default is no, i.e., to follow the PCI Standard.

"Last Image Indicator" bit

The "Last Image Indicator" bit can be specified in either of three ways:

Force it to be set to TRUE. Synonyms for this are:

last-image
last-img

Force it to be reset to FALSE. Synonyms for this are:

not-last-image
not-last-img

Set or reset it according to the number popped from the top of the Data Stack. Synonyms for this are:

set-last-image
set-last-img

If the number was zero, the "Last Image Indicator" bit will be reset to FALSE; if it was non-zero, the bit will be set to TRUE.

These directives can be invoked from either "Tokenizer Escape" mode or normal tokenization mode, but the number to be examined must have been put on top of the Stack during "Tokenizer Escape" mode.

The default setting for this field is TRUE.

Changing the name of the Binary Output File

By default, the name of the Binary Output File is derived from the name of the Source Input File, by replacing its extension with .fc , or, if the Input File name had no extension, merely appending the extension .fc

The name of the Binary Output File may be specified on the Command Line, as noted above.

The name of the Binary Output File may also be specified by a directive embedded within the Tokenization Source File. Synonyms for the directive are save-image and save-img ; the name follows after the directive on the same line.

The directive can be invoked from either "Tokenizer Escape" mode or normal tokenization mode. It does not cause an imediate action (i.e., saving the Binary Image), but merely alters the name of the file to which the image will be saved when tokenization is completed.

Note that this directive does not change the name of the FLoad-List File, if one has been specified.

Issuing messages at Tokenization time.

Directives are supported that will gather text and output it as a User-generated MESSAGE during Tokenization.

These directives can be invoked from either "Tokenizer Escape" mode or normal tokenization mode:

#MESSAGE" will parse text in string-collection mode, i.e., with sensitivity to the special escaped-characters, until a concluding Quote-followed-by-whitespace,

and the synonyms:

[MESSAGE]
#MESSAGE
[#MESSAGE]

will gather the text remaining on the line.

In addition, the commands ." (Dot-Quote) and .( (Dot-Paren), which have Standard meanings in normal tokenization mode, will, when invoked in "Tokenizer Escape" mode, collect text in their Standard manner and output it as a User-generated MESSAGE:

Dot-Quote will parse text in string-collection mode,
Dot-Paren will collect text until a balancing Close-Paren -- ) -- character.

Also, as noted earlier, the [fcode-date] and [fcode-time] directives, when invoked in "Tokenizer Escape" mode, will display the current date and time, respectively, as a User-generated MESSAGE:

Changing Special-Feature Flags from Source

A directive is supported that can be used from within the Tokenization Source File to over-ride the setting of any one of the Special-Feature Flags that might have been made from the Command Line. The name of the Flag -- with, of course, an optional leading No -- must follow after the directive on the same line. It can be invoked from either "Tokenizer Escape" mode or normal tokenization mode. Its synonyms are:

[FLAG]
#FLAG
[#FLAG]

The User is hereby admonished to exercise caution when using this directive. Not all combinations are meaningful, and automated error-checking is not feasible. An Advisory Message will be issued to remind the User of the change.

Displaying Special-Feature Flags from Source

A directive is supported that can be used from within the Tokenization Source File to display a MESSAGE with a list of the settings of all of the Special-Feature Flags. Its synonyms are:

[FLAGS]
#FLAGS
[#FLAGS]
SHOW-FLAGS

Manipulating the FCode-Token-Number Assignment Counter

In addition to the Standard next-fcode command, this Tokenizer supports additional directives that can be used to alter the normal sequence of FCode-token-number assignment.

In order to protect against unintended collisions in FCode-token numbers, which can cause severe errors at run-time, this Tokenizer will report an Error the first time an FCode-number assignment overlaps FCodes that were assigned in a different range.

Because the programmer may choose to "recycle" FCode-token-numbers intentionally, a directive is supported that clears the records of FCode-number assignments and resets the FCode-token-number assignment counter to its inital value.

Saving and Restoring

The directive FCode-Push can be used to save the value of the FCode-token-number assignment counter on the tokenization-time data stack . Later, the saved value can be restored by invoking the FCode-Pop directive. Note that FCode-Pop is functionally equivalent to next-fcode except that FCode-Push and FCode-Pop can be invoked from either "Tokenizer Escape" mode or "Normal" mode.

Resetting

The directive FCode-Reset will re-initalize the value of the FCode-token-number assignment counter and clear the records of previous FCode-number assignments. Note that it is not equivalent to:

F[ h# 800 next-fcode F]

which merely re-initalizes the value of the FCode-token-number assignment counter, but does not prevent the first assignment of this FCode-number from being regarded as overlapping an earlier assignment and therefore being reported as an Error.

The FCode-token-number assignment counter and the records of previous FCode-number assignments will also be automatically re-initalized when a new PCI-Image block is started.

Resetting Symbols Defined in Either Mode

Symbol names defined in "Normal" mode persist until a device-node is finished, or until the end of an FCode-block or of a PCI Image.

Symbol names defined in "Tokenizer Escape" mode persist throughout the entire run of the Tokenizer.

The directive Reset-Symbols enables the User to delete symbol-definitions at any other time, should that be required. Reset-Symbols is sensitive to the the mode in which the Tokenizer is operating: invoked during "Tokenizer Escape" mode, it will cause only the definitions made under "Tokenizer Escape" mode to be deleted; during "Normal" mode, only those made under "Normal" mode. Example #1 contains an invocation in "Tokenizer Escape" mode.

Non-standard input, syntaxes, and behavior

In this section, we will discuss:

This Tokenizer's handling of conditions the Standard calls "ambiguous"
Non-Standard synonyms for Standard functions
Non-Standard functions or behaviors that are supported by this Tokenizer.
Featured behaviors, supported by this Tokenizer, that are not mentioned in the Standard.

Ambiguous Conditions

The Standard only identifies a few questionable combinations as "ambiguous conditions" or "behavior ... unspecified" and leaves their handling open to interpretation of the implementor. Here's how this Tokenizer handles them:

Tokens expected on the same line

Certain commands and directives expect a token to appear on the same line, and will report an Error condition if no token appears on the same line:

The h# d# and o# directives, and the a# and al# directives.
The [ifexist] [ifnexist] [ifdef] [ifndef] and [defined] directives and all their synonyms
The commands ['] and '
The F['] directive
The ALIAS directive (expects two tokens on the same line)
The [macro] directive (expects at least two tokens on the same line)
The directives control ascii char and [char]
The FLOAD command
The SAVE-IMAGE directive
The [FLAG] directive and its synonyms
The ENCODE-FILE directive

The h# d# and o# directives

The directives h# d# and o# convert the token immediately following to a number, in hex, decimal or octal, respectively.

If the token is present, but cannot be converted to a number using the appropriate radix, i.e., if the conversion fails, the Tokenizer will issue a WARNING, ignore the directive and attempt to process the token as ordinary input. If the name is not known, the "ordinary processing" will catch the error.

The commands leave ?leave or unloop outside of a loop-control framework

The Standard only speaks briefly about the execution consequences of this condition, and says nothing about whether a Tokenizer should permit such a sequence. This Tokenizer will report it as an Error condition.

The commands ['] or ' followed by a name that is not a valid target

Valid targets for the commands ['] and ' are function-names that were defined during normal tokenization mode. This Tokenizer reports an Error if the name following the ['] or ' is not a valid target (e.g., is the name of a macro). It will then attempt to process the name in the ordinary manner.

The Forth word to followed by a name that is not a valid target

The Standard specifies the Forth word to as the means for setting the value of a word defined with either the value or defer defining-words, but says nothing about how a Tokenizer should respond to attempts to apply it to other targets. This Tokenizer reports an Error if the name following the to is not a valid target (e.g., is the name of a macro or a colon-definition).

An exception is made for a word defined as a variable : Although such words are not a valid target for to, many platforms' firmware will execute the sequence correctly; this Tokenizer will issue a WARNING.

The word instance

Followed by an inapplicable defining-word

The Standard implies (but does not state) that, once instance has been executed, it remains in effect until an applicable defining-word ( value , variable , defer or buffer: ) is encountered. For example, in the sequence:

instance : nonsense dup swap drop ;
variable boombah

the colon-definition of nonsense is unaffected by the occurrence of instance , which will, instead, be applied to the defining-word variable (which defines boombah ) on a later line.

Since instance would typically be followed immediately by the defining-word to which it is intended to apply, a sequence like the above can reasonably be presumed to be a likely error. This Tokenizer will issue a WARNING when an inapplicable defining-word is encountered while instance is in effect, and another WARNING when the dangling instance is finally applied to a valid definer.

Left unresolved

Multiple occurrences of instance are not cumulative; if a second instance is encountered before the first is applied, only one occurrence of an applicable defining-word will be modified. This Tokenizer will issue a WARNING when a second occurrence of instance is encountered before the first has been applied.

The Standard says nothing about what should occur if instance has not been applied by the time a device-node is "finish"ed or a new device-node is started. In most Platforms' implementations of the FCode Interpreter, the instance state will remain in effect. This Tokenizer will issue a WARNING the first time a device-node is changed while an instance remains unresolved.

Not Allowed in...

Although the Standard says nothing about the subject, an inclusion of instance inside a Colon-definition would serve no purpose and is potentially harmful; instance should not be allowed in a Colon-definition. This Tokenizer will detect an attempt to compile instance into a Colon-definition and report it as an Error condition.

Also, instance can not be allowed when Global Scope is in effect, as explained elsewhere.

The command fload inside a colon-definition

If the command fload is encountered while a colon-definition is in progress, this Tokenizer will process it immediately, and read the contents of the floaded file as a continuation of Source input.

Non-Standard Synonyms for Standard Functions

This Tokenizer accepts the following non-Standard words as synonyms for the Standard words, as listed:

ENDIF as a synonym for THEN
NAME as a synonym for DEVICE-NAME

Non-Standard Functions or behaviors

This section will discuss features supported by this Tokenizer that extend capabilities regarded by the Standard as unsupportable, or that are left only vaguely specified as "implementation-dependent". It will also present a few minor items that are introduced as conveniences for the User.

Alias

In normal tokenization mode the alias directive behaves as specified in the Standard, i.e., it creates a new command with the exact behavior of an existing command. Any occurrence of the new command will cause the assigned FCode of the old command to be generated, with all the implications that follow from that.

The Standard, further, states that: "In FCode source, alias cannot be called from within a colon definition." However, this Tokenizer can handle that, and issues a WARNING message when it does so.

The Standard does not specify the behavior of alias in "Tokenizer Escape" mode. This Tokenizer will allow alias commands issued in "Tokenizer Escape" mode to take effect in that mode, and, furthermore, aliases to words that are recognized in either mode may be created in either mode and will be recognized in either mode.

Generally speaking, an alias definition will take on the Scope that is current at the time it is made: If Device Scope is in effect, the new name -- even if it is an alias to a word that has Global scope -- will only be accessible from the current device-node. An ADVISORY message will be issued for this condition.

String-Escape characters and other String-Gathering Features

The Standard specifies, in effect, that a string is terminated by a double-quote (the " character) when it is immediately followed by whitespace; that if the " is followed by an open-parenthesis, hex-sequence parsing begins until a close-parenthesis is found; and that if the " is followed by any other character, "the result is implementation-dependent".

Quoted String-Escape Characters

Common practice -- widely enough used as to merit consideration as an undocumented de-facto standard -- has been to recognize a set of letters and to translate them into special characters when they immediately follow the " .

This Tokenizer translates String-Escape Quoted-pairs as follows:

"n New-Line

"l New-Line

"r Carriage-Return

"t Horizontal Tab

"f Form-Feed

"b Backspace

"! Bell

"^L Quote-Caret followed by a letter is translated as "Control"-the-letter.

"Other Any unrecognized character following the " is taken verbatim.

"" The way to embed a double-quote into a string is to escape it with itself. (This is a special instance of the preceding rule)

"( As was mentioned above, Quote-Open-Parenthesis begins parsing a hex-sequence as per Section A.2 of the Standard under the description of the " operator. Details will be discussed below.

"\ Quote-Backslash permits insertion of remarks into the middle of a string definition; it will interrupt string parsing, causing the remainder of the line, together with any whitespace that might begin the new line, to be ignored.
Because this feature is not in usual practice, the User can disable it by invoking the noString-remark-escape Special-Feature Flag.

If this feature is disabled, the Backslash following the " will be taken verbatim; the Backslash, the text following it on the remainder of the line, the new-line and the whitespace on the next line -- all of which would otherwise have been ignored -- will be included in the string parsing and incorporated into the result.

Embedded New-Lines

The Standard makes no mention about what to do when a string reaches a new-line before its termination. This Tokenizer continues parsing and includes the new-line, together with any whitespace that might begin the new line, in the result.

Hex-Sequence Processing

The Standard mentions parsing hexadecimal characters in pairs in the substring between "( and ) but makes no mention of what to do when a single hexadecimal character appears between nonhexadecimal characters. The usual practice is to treat it as representing the low-order digit of a single byte, and this Tokenizer follows that convention.

This Tokenizer also supports an option -- not in usual practice, hence listed separately here -- that permits insertion of remarks in the middle of a hex-sequence in a string. The occurrence of a single \ (backslash) character will interrupt hex-sequence parsing, causing the remainder of the line, together with any whitespace that might begin the next line, to be ignored. The User can disable this feature by invoking the noHex-remark-escape Special-Feature Flag.

If the Hex-Remark-Escape feature is disabled, the Backslash will be treated as an ordinary nonhexadecimal character, and Hex-Sequence parsing will proceed. Any hexadecimal characters on the remainder of the line -- which would otherwise have been ignored -- will be recognized and incorporated into the result.

C-Style String-Escape Characters

In addition to the Quoted String-Escape sequences, this Tokenizer recognizes a small subset of the Backslash-based String-Escape pairs that are used in C and translates them similarly.

Specifically, \n and \t are translated into New-Line and Horizontal Tab respectively.

If the Backslash is followed by other characters, the Tokenizer will attempt to read them as a digit-string, using the current base, and create a numeric byte. The numeric sequence ends with the first non-numeric character (which is treated as a delimiter and consumed, unless it's a double-quote, in which case it's allowed to terminate the string or apply whatever action is triggered by the character following it). If the value represented by the numeric sequence exceeds the size of a byte, its low-order byte will be used and a WARNING will be issued.

If the first character after the backslash was non-numeric, the character will be used literally (and a WARNING will be issued). As in C, the backslash can be used to escape itself.

The User can disable this feature by invoking the noC-Style-string-escape Special-Feature Flag.

The ABORT" (Abort-Quote) command

The Standard is not very clear about the use of this FORTH command, and does not assign it an FCode. Nonetheless, it has historically been treated as a macro, and this Tokenizer will support that treatment.

To complicate matters, there are two distinct styles in which this macro is used in FCode drivers, "Apple" style and "Sun" style:

In Sun Style, the sequence, in the Source, would look like this:

<Condition> ABORT" Message text"

Semantically, it would mean that if the <Condition> is true, the Message text would be printed and a -2 THROW would be performed; conversely, if the <Condition> is false, the Message text would be bypassed and execution would continue with the next token after.

The sequence could be translated into FCode as a macro like this:

<Condition> IF " Message text" TYPE -2 THROW THEN

In Apple Style, the Source supplies the surrounding IF ... THEN . The action of the ABORT" command is to leave the Message text on the stack and perform the -2 THROW unconditionally, with the expectation that the system CATCH will print the string it finds on the stack.

The Source sequence would look like this:

<Condition> IF ABORT" Message text" THEN

The ABORT" ... " portion of the sequence would be translated into FCode as a macro like this:

" Message text" -2 THROW

Because the ABORT" command is not specified in the Standard, the User can disable it by invoking the noABORT-Quote Special-Feature Flag.

The User who chooses to enable this feature, can, further, select to disable "Sun" style in favor of "Apple" style by invoking the noSun-ABORT-Quote Special-Feature Flag.

And to complicate matters even further, some Legacy applications prefer to use the ABORT command (note there's no quote) in place of the -2 THROW sequence. Although the ABORT command is not recommended, it is a legitimate FCode function, and this Tokenizer supports a Special-Feature Flag, called Abort-Quote-Throw, which controls whether an ABORT" (Abort-Quote) phrase will be tokenized with the -2 THROW sequence or with the ABORT function. The User who chooses to have ABORT" (Abort-Quote) phrases tokenized with the ABORT function can do so by invoking noAbort-Quote-Throw

Conveniently Convert Short Character-Sequence to Number

Occasionally, a User needs to create a numeric constant whose value corresponds to a short sequence of characters. For instance, PCIR will get coded as h# 50434952. This tokenizer supports a convenient directive called a# , syntactically similar to h# d# and o# , which makes the conversion directly. The above example can be written: a# PCIR , sparing the programmer -- and the maintainer -- from needing to translate ASCII on the fly.

The a# operator expects its target argument on the same line.

If the target-sequence contains more than four characters, the last four will become the number; if the target-sequence contains fewer than four characters, they will fill the low-order part of the number. (I.e., the operation of a# is right-justified.) Thus:

a# CPU is equivalent to h# 00435055

and

a# LotsOfStuff is equivalent to a# tuff or h# 74756666

Also, the conversion is case-sensitive: a# cpu is equivalent to h# 00637075

Left-Justified

To accommodate situations that call for the characters to occupy the high-order part of the number, this Tokenizer supports an "Ascii-Left" conversion directive called al# , which is equivalent to a# in all respects except when the target-sequence contains fewer than four characters: al# CPU is equivalent to h# 43505500

F['] ("Eff-Bracket-Tick-Bracket")

Syntactically similar to the Standard ['] ("Bracket-Tick-Bracket"). Valid targets for F['] are the same as for ['] or ' . Attempts to apply F['] to an invalid target will be handled similarly.

This directive acquires the given word's FCode-Token number, which is then used according to whether the directive is invoked during "Normal Tokenization" or "Tokenizer-Escape" mode:

F['] in "Normal Tokenization" Mode:

The given word's FCode-Token number is tokenized as a literal, which can be used, for instance, as the argument to a get-token or set-token command.

For example, the sequence:

F['] rl@ get-token

is equivalent to:

h# 234 get-token

F['] in "Tokenizer-Escape" Mode:

This function is the one exception to the general rule about the scope of words recognized in "Tokenizer-Escape" mode; it will recognize function-names that were defined during normal tokenization mode and that were current at the time "Tokenizer-Escape" mode was entered.

The given word's FCode-Token number is pushed onto the data-stack, from whence it can be used, for instance, as the numeric argument to a constant definition.

Shell-Environment Variables in File-Names

The filename that follows the fload command or the encode-file directive may be an absolute path, a path relative to the Current Working Directory, or a path relative to one of the directories in the Include-List. It may also contain Shell Environment Variables and related expressions recognized by the Host Operating System environment in which the Tokenizer is running. These will all be expanded before loading the file. An illustration may be seen in Example #4.

An ADVISORY message showing the expanded value will be printed if the verbose option has been selected or in the event of a failure to read the file.

Featured behaviors, supported by this Tokenizer, that are not mentioned in the Standard

User-Defined Macros

The Standard mentions built-in tokenizer macros (such as 3drop for example) but makes no provision for the User to define any additional macros.

This Tokenizer supports a directive that allows the User to define additional macros. Its syntax is:

[macro] <macroname> cmnd1 cmnd2 cmnd3 ... cmndN

The entire body of the macro definition must be contained on a single line. The linefeed at the end of the line will be included as part of the macro definition.

In this Tokenizer, macros are implemented as simple string substitutions, interpreted at the time they are invoked. If a component of the macro should change its meaning -- i.e., be redefined -- then, on subsequent invocations of the macro, the new meaning will take effect. (Note that this is different from an alias). It is also (eminently) possible to define a macro that uses a name that has not been defined at the time the macro is defined, to define that name later, and to invoke the macro after the name has been defined. This is legitimate and will work.

For the same reason, macros may be nested; i.e., one macro may be defined in such a way as to invoke another. However, if a macro -- or a series of nested macros -- were to invoke a macro that is already running, that will be detected and reported as an Error condition. For a simple example, a User who needed to identify all occurrences of the word 2drop might attempt to write:

 overload [macro]  2drop  message" 2DROP called here" 2drop

in the expectation that the second 2drop would be interpreted as the generic one, which outputs the corresponding token.

However, because the macro definition is not "compiled" in the same way as a colon-definition, but is, instead, interpreted at run-time, the 2drop that would be executed would, in fact, be the macro itself, leading to an infinite loop of messages (if the condition were not detected...). In order to protect against this condition, the User should, instead, do something like this:

\   
Keep a "generic" 2DROP handy.

alias
generic-2drop  2drop

overload [macro] 2drop message" 2DROP called here" generic-2drop

This has the added advantage that, when the passage in which the notification is needed comes to an end, the User can restore 2drop to its Standard behavior with:

overload  alias  2drop  generic-2drop

A User-Defined Macro takes on the Scope that is current at the time it is created. An illustration may be seen in Example #4.

Multiple device-nodes

The typical use of a tokenizer is to write a driver for a single-node device. For such an application, the commands new-device and finish-device are not required. However, many newer devices consist of complexes of subordinate and peer devices within a single assembly; some additional caution is required when developing drivers for such configurations.

In particular, an attempt within one device-node to access directly a method defined in another device-node must be flagged as an error. Consider what would happen at run-time if it were allowed: the called method would be expecting the instance-pointer to be pointing to the instance data of the device-node in which that method was defined, but it would, instead, be pointing to the instance data of the device-node that made the call. This is an invitation to havoc that would be -- to put it politely -- somewhat difficult to trace.

The correct way to invoke a method across device-node boundaries is via $call-parent or $call-method or the like.

In order to detect such errors early on, this Tokenizer keeps track of separate but linked "vocabularies" associated with device-nodes. When the command new-device is encountered in interpretation mode, a new device-node vocabulary is opened and new definitions are entered into it. Definition-names created in the preceding device-node -- presumably the parent of the newly started device -- are suspended from accessiblity.

Correspondingly, when the finish-device command is encountered in interpretation mode, the "vocabulary" of the device being ended is emptied ("forgotten" in classic Forth parlance) and the "vocabulary" of the parent-device is resumed.

The device-node vocabulary to which definitions are being entered at any given time, and from which definitions are accessible, may be referred to as the current device-node for purposes of discussion.

Note that the Tokenizer does not switch vocabularies when the new-device or finish-device commands are encountered in compilation mode (i.e., when they are being compiled-in to a method); they are treated as ordinary tokens, since the shift to a new device-node will not occur until run-time.

The commands new-device and finish-device must remain in balance. If a finish-device is encountered without a prior corresponding new-device, or if the end of FCode (or a Reset-Symbols directive issued in "normal" mode) is reached and not all occurrences of new-device are balanced by a call to finish-device, it will be reported as an Error condition.

Definitions made in "Tokenizer-Escape" mode, however, are independent of device-node vocabularies and remain accessible until they are explicitly reset by a Reset-Symbols directive issued in "Tokenizer-Escape" mode.

Global Definitions

Occasionally, the User might need to create definitions of methods that are intended to be directly accessible to all the device-nodes in a driver, even though they reside in one device node. Such definitions are, effectively, added to the "core" vocabulary, and are Global in scope. Directives to control the creation of Global Definitions are supported by this Tokenizer.

Multiple FCode-block Images

In addition to supporting complexes of subordinate and peer devices within a single assembly, this Tokenizer supports configurations in which subordinate devices' drivers are kept in separate but related FCode blocks. This is accomplished simply by invoking multiple bodies of code bracketed by fcode-version<n> and fcode-end (or one of its synonyms).

When the fcode-end (or equivalent) that ends one body of code is processed, and before the fcode-version<n> that begins the next, the definitions that had been created are forgotten, but assignment of FCode-token numbers will continue in sequence. Likewise, definitions made in "Tokenizer-Escape" mode will persist.

The User who desires to reset one or the other of these, or both, can do so by issuing the directives:

FCode-Reset

Reset-Symbols

respectively, in "Tokenizer-Escape" mode. This is illustrated in Example #3.

Multiple PCI-Image

The PCI Standard allows for a chain of PCI Images to be bundled together in a single binary (as indicated by the "Last Image" bit). This Tokenizer supports such configurations by correctly processing Source that contains multiple occurrences of blocks of code bracketed by pci-header ... pci-header-end directives. An illustration can be seen in Example #1, below.

Examples:

Example #1:

In the first example, the "Inner Body" file has some sections that compile different code, depending on the value of a named-constant switch. The "Outer Shell " file controls the floading of the "Inner Body": It reminds the User to specify the setting of the switch, then produces an image with two PCI-Images, based on opposite settings of the switch. Note that the use of Reset-Symbols in "Tokenizer-Escape" mode before the redefinition of function-switch, to avoid a "Duplicate definition" condition, may be omitted if the User is willing to tolerate the warning.

File: InnerBody.fth

fcode-version2
... Common code <obligatory sneeze> ...
F[   function-switch   F]   [if]
    True-conditional code
      [message]   The true path was wisely chosen
[else]
    False-conditional code
      [message]   You have foolishly chosen the false path.
[then]
... More common code <now cough>.
end0

File: OuterShell.fth

[ifndef] first-path
F[
." Add a command-line switch: -d ""first-path=<true|false>"" "n"\
   "tthen run this again."
    F]
[else]
F[
   [defined] first-path constant function-switch

vendor-id device1-id class-code1 pci-header
F]
not-last-image
fload InnerBody.fth
pci-header-end

F[ reset-symbols
[defined] first-path 0= constant function-switch

vendor-id device2-id class-code2 pci-header
F]
last-image
fload InnerBody.fth
pci-header-end

[then]

Command-Line invocation:

toke -v -d "first-path=true" OuterShell.fth

Example #2:

The second example illustrates the use of Special-Feature Flags to select or de-select specific non-standard features.

The User is developing code that will run across all platforms, and therefore must must be neutral with regard to "Sun"- or "Apple"- -style usage of ABORT" ; this can be best achieved by disallowing the use of ABORT" altogether.

There is no concern, however, about compatibility of the Source with other Tokenizers, so the User need not forgo the conveniences of Local Values and String-remark-escapes.

Furthermore, the Source contains many passages taken from IBM Legacy sources, and the User does not wish to see a WARNING message when the Legacy Locals Separator is used.

The command-line for these conditions would include the following:

toke -f NOabort-quote -f local-values -f NOlv-legacy-warning

Alternatively, these flags may be set from within the Source code thus:

[flag]
NOabort-quote

[flag] local-values

[flag] NOlv-legacy-warning

Note that the invocation of -f local-values is necessary, as its default state is to be disabled. Also, because -f NOabort-quote is invoked, the setting of the Sun-ABORT-Quote flag is irrelevant.

Example #3:

The third example illustrates two situations where two or three FCode blocks are incorporated into a single PCI Image.

In the first situation, the "Outer" block will byte-load the code of the "Inner" block before the "Outer" block has been completely interpreted. It is therefore important to avoid collisions in FCode-token numeric assignments.

F[
h# 5afe  set-rev-level

h# beef       
\ Vendor

h# c0de       
\ Device ID

h# 90210       \
Class Code   (A "classy" ZIP Code... ;-)

F]pci-header

fcode-version2

fload
outer-block.fth

fcode-end

\ 
At this point, the last definition's assigned FCode number
is approximately 0Xac0

fcode-version2

\  The next
definitions' assigned FCode numbers continue from
0Xac1

fload
inner-block.fth

fcode-end

pci-header-end

In the second situation, the "Outer" block will have been completely interpreted before it begins to byte-load the code of the two "Inner" blocks. It can safely discard the token-table of the interpreter, and allow its assigned FCode-token numbers to be re-cycled. Furthermore, the large number of definitions presents a real risk that the full range of usable FCode-token numbers will be exhausted. For these reasons, the User finds it necessary to reset the FCode-token numeric assignments to their initial state before tokenizing the two "Inner" blocks.

F[

h# face  
   set-rev-level

h# cafe  
            \
Vendor

h# d00d  
            \
Device ID

h# 95014  
           \ Another Class(y
ZIP) Code
F]pci-header

fcode-version2

fload outer-block.fth

fcode-end

\  At this
point, the last definition's assigned FCode number
is approximately 0Xae0

fcode-version2

\ 
Reset the assigned FCode numbers for the first "inner" block

FCode-Reset

fload inner-block_01.fth

fcode-end

\  Load the second "inner" block.

fcode-version2

\  Reset
the assigned FCode numbers for the second "inner" block

FCode-Reset

fload inner-block_02.fth

fcode-end

pci-header-end

Example #4

This example serves to illustrate the use of the directives that control the Scope of Definitions, and also to show a means whereby the IBM-Style Local Values Support File can be incorporated at a Global level. Normally, that would be problematical because the Local Values Support functions are written to use instance data, in order to conserve use of System memory. By temporarily over-riding the definition of instance in the manner shown, the User has traded-off economy of System-memory for convenience of programming.

This example also offers an illustration of the use of Shell-Environment Variables in File-Names: Let us suppose that the directory in which the main file for the controller of the assembly whose driver being compiled here resides in a directory-tree, several layers under the root. Immediately below the root is a sub-directory called shared that contains shared functions and the Local Values support, and elsewhere under the tree are sharable bodies of driver code for device-modules that can be incorporated into various assemblies. Let us also suppose that the Makefile that governs this Tokenization process sets Environment Variables; the root of the tree is called DevRoot, and the directory in which the sharable bodies reside are called SCZ and SLZ respectively. And let us further suppose that the inclusion of these two subsidiary devices is optional, controlled by command-line symbol definitions.

\ Define a few macros and functions that will become available

\ to all subordinate device-nodes in this bundle.

\ Some of our functions are written using Local Values.

\ We can make the Local Values Support accessible Globally

\     without incurring any Warning messages, and
without

\     altering the LocalValuesSupport file, if we
temporarily

\     disable the definition of INSTANCE

fcode-version2

headers

global-definitions

\  Bypass warning about 
Instance

\      without altering
LocalValuesSupport file

\  Leave ourselves a way to recover
alias generic-instance 
instance

\  Here's where we disable it.  During development, give
confirmation

\      that the macro has been activated by
printing a message.  Also,

\      insert a token in the place where
the  Instance  would have

\      gone so that our detokenizations line
up
when we inspect.  We

\      do this by defining a comand-line
symbol called  proto-code

\  For the final production run, we can make the macro completely

\      silent by leaving  proto-code 
undefined.
[ifdef] proto-code
overload [macro]
instance  message"
Bypassing Instance"noop
[else]

       \  Make it completely silent
overload [macro]
instance f[  noop  ]f
[endif]

fload ${DevRoot}/shared/LocalValuesSupport.fth

\  Restore normal meaning of  Instance , also in Global scope.
overload alias instance
generic-instance

\  Here's a global definition that uses Local Values

: $CAT   ( _max _str1 _len1 _str2 _len2 -- _max _str1 _len1' )
{ _max _str1
_len1 _str2 _len2 }

   _len1 _max <
if                   
\ there is room

      _str2 _str1 _len1 + _len2 _max _len1 -
min move

   then

   _max _str1 _len1 _len2 + _max min   \ always leave
total length

;

\  Here is a global macro
[macro] 4DUP   2over 2over

\  And another
[macro] 4DROP  2drop
2drop

\  And yet another
[macro] (.h)  base @
swap hex (.) rot base !

\  Other shared functions are in a file in the Shared Code
directory:
fload
${DevRoot}/shared/sharedfuncts.fth

\  Definitions in all subsequent device-nodes will be able to
access

\      the $CAT function and the macros 
4DUP  and 
4DROP, as well as

\      to make use of IBM-Style Local Values
without re-floading

\      the Support File

\  Now let's get back into the primary device-node.
device-definitions

\  Use instance data to create a large temporary buffer

d# 40 dup instance buffer:  temp-buf     
( max )

\  Convolutions to create a name on the fly...

s" controller_" tuck  
                
( max len adr len )

temp-buf
swap                      
    ( max len adr buf len )

move                               
    ( max len )

temp-buf
swap                      
    ( max buf len )

my-self
(.h)                           
( max buf len adr2 len2 )

$cat                                   
( max buf len )

device-name drop

\  Make a macro of that.  Define it in parts.

\  Requires the core of the name on the stack.

\  Macro will create its own  temp-buf  in the current
device.

global-definitions

\  First part of the operation:   ( adr len --
max adr len )

\  Note that "instance" is allowed here because it is not

\      invoked -- or even parsed -- 
until the macro is invoked.
[macro] 
create-buffr  d# 40 dup instance buffer: temp-buf  -rot

\  Second part of the operation:  ( max adr len -- max buf
len )
[macro] 
name-to-buffr   tuck temp-buf swap move temp-buf swap

\  Last part of the operation:  ( max buf len -- )

\  We're using  my-self  as part of the device-name

\      for no particularly good reason other than

\      that it makes for an interesting example... 
;-}
[macro] 
buffr-to-dev-name  my-self (.h) $cat device-name drop

\  Combine the parts:
[macro]  make-my-dev 
create-buffr  name-to-buffr  buffr-to-dev-name

\  Back to the primary device-node.
device-definitions

\  Load its methods from the Current Directory
fload controller_methods.fth

\  Create timestamp property based on
actual tokenization date and time

\  Combine into a single string (with a space between).

\      No trailing null-byte until end.

[fcode-date] encode-bytes   "  " encode-bytes encode+

[fcode-time] encode-string encode+ " release-time" property

\  Log timestamp to the Audit Trail

tokenizer[   [fcode-date] [fcode-time]  ]tokenizer

\  First optional subsidiary
device.  Inclusion controlled by command-line symbol.
[ifdef] scuzzy
new-device

" scuzzy_"  make-my-dev

\  Load its methods

fload ${SCZ}/scuzzy_methods.fth

finish-device

[endif]

\  Second optional subsidiary device, controlled by command-line
symbol.
[ifdef] sleazy
new-device

" sleazy_"  make-my-dev

\  Load its methods

fload ${SLZ}sleazy_methods.fth

finish-device

[endif]

\  That's enough for now...
fcode-end

Example #5

This example serves to illustrate the use of the [function-name] directive to create a series of "you are here" debugging messages.

We will create a pair of macros whose names we can cut'n'paste at the beginning and end of every function we want to give a "you are here" message, switchable by a local debug-flag. The macros will be globally defined, but will make reference to locally-defined names for the debug-flag and the device-node name string.

fcode-version2

global-definitions

\  Each dev-node will create its own debug-flag and alias it
to  debug-me?

\  Each dev-node creates a macro called my-dev-name giving its
device-name

[macro] .fname&dev  [function-name] type ."  in " my-dev-name type 

[macro] name-my-dev  
my-dev-name device-name

[macro] .dbg-enter 
debug-me? @ if ."
Entering " .fname&dev 
cr then

[macro] .dbg-leave 
debug-me? @ if ."
Leaving " 
.fname&dev  cr then

device-definitions

headers

\  Top-most device, named billy

[macro] my-dev-name  "
billy"

name-my-dev

\  debug-billy?  is a flag the user can turn "on" to get
debug-messages

\  from the methods of the device we call "billy"

variable debug-billy? 
debug-billy? off

\  Set it up for the macro

alias debug-me? debug-billy?

: bill

.dbg-enter           
\  This will display  Entering bill in billy

    [char] G dup

    control G 3drop

.dbg-leave           
\  This will display  Leaving bill in billy

;

: factl recursive  ( n -- n! )

type     ." Entering First vers. of " .fname&dev cr

    ?dup 0= if
1

    else  dup 1- 
factl *

    then

    ." Leaving
First vers. of "
.fname&dev type cr

;

: factl ( n -- n! )

type     ." Entering Second vers. of " [function-name]
cr

    ?dup 0= if
1 factl

    else  dup 1-
recurse *

    then

type     ." Leaving Second vers. of " [function-name]
cr

;

variable naught

defer  do-nothing

20 value twenty

30 value thirty

40 buffer: forty

50 constant fifty

create three 0 , 00 , h#
000 ,

struct

4 field >four

constant /four

: peril

.dbg-enter           
\  This will display  Entering peril in billy

    ['] noop to do-nothing

    100 to
thirty

    5 to
naught             
\  Generates a WARNING

    thirty dup - abort"
Never Happen"

.dbg-leave           
\  You get the idea...

;

: thirty ( new-val -- )

    .dbg-enter

    dup to thirty

    alias .dec .d             
\  Generates a WARNING and an ADVISORY

    ." Dirty"  .dec

    .dbg-leave

;

\  First subsidiary device, "child" of billy

new-device

    instance variable cheryl

    [macro] 
my-dev-name  " cheryl"

    name-my-dev

    \  Third-level device, "grandchild" of billy

    new-device

        [macro]  my-dev-name  " meryl"

        name-my-dev

        variable debug-meryl? 
debug-meryl? off

        alias debug-me? debug-meryl?

        : merle

.dbg-enter          
\  This will display  Entering merle in meryl

cheryl                
\  ERROR.  Is in a different dev-node.

        .dbg-leave

        ;

    variable
beryl

    finish-device

    \  Now we're back to "cheryl"

    variable
debug-cheryl?  debug-cheryl? off

    alias debug-me? debug-cheryl?

     : queryl

.dbg-enter          
\  This will display  Entering queryl in cheryl

    over
rot dup nip drop swap   \  Not the most useful
code...  ;-}

    .dbg-leave

     ;

finish-device

\  And we're back to billy.

: droop ( -- )

.dbg-enter           
\  This will display  Entering droop in billy

    twenty

    0 ?do i .h loop

.dbg-leave           
\  You get the idea....

;

\  The following will generate some unexpected errors:

: quack ( maybe not -- oops!  I-forgot-the-close-paren

.dbg-enter           
\  This should display  Entering quack in billy

\  But it doesn't display anything and I don't know why.

    h# 30  .d

.dbg-leave           
\  and I'm still baffled...

;      \  Oh, well, soldier on...

: cluck  ( don't count -- before hatched )  \  Now, there's a close-paren....

Note: At this point, the Tokenizer has issued a Multi-Line Warning about a Comment that began six lines earlier, which gives the User a good indication as to just what went wrong...

   
.dbg-enter           
\  This does display, but not what I expected.

    ." Coming from" [input-file-name]
type ."  line
" [line-number] .d cr

tokenizer[  [function-name]  
]tokenizer    \   Let's not wait for run-time

   
.dbg-leave           
\  Now I get it...

;

fcode-end

"n	New-Line
"l	New-Line
"r	Carriage-Return
"t	Horizontal Tab
"f	Form-Feed
"b	Backspace
"!	Bell
"^L	Quote-Caret followed by a letter is translated as "Control"-the-letter.
"Other	Any unrecognized character following the " is taken verbatim.
""	The way to embed a double-quote into a string is to escape it with itself. (This is a special instance of the preceding rule)
"(	As was mentioned above, Quote-Open-Parenthesis begins parsing a hex-sequence as per Section A.2 of the Standard under the description of the " operator. Details will be discussed below.
"\	Quote-Backslash permits insertion of remarks into the middle of a string definition; it will interrupt string parsing, causing the remainder of the line, together with any whitespace that might begin the new line, to be ignored. Because this feature is not in usual practice, the User can disable it by invoking the noString-remark-escape Special-Feature Flag. If this feature is disabled, the Backslash following the " will be taken verbatim; the Backslash, the text following it on the remainder of the line, the new-line and the whitespace on the next line -- all of which would otherwise have been ignored -- will be included in the string parsing and incorporated into the result.

New Features in OpenBIOS Tokenizer toke

(A User's Guide)

Table Of Contents

Overview

Scope of this Document

What this document does not cover:

Error Detection and other messages:

Case Sensitivity

Categories of Features

Directives

Command-Line options

Non-standard input, syntaxes and behavior

“Tokenizer-Escape” Mode

Features, by Category:

Command-Line options:

Switches

Include-List Directories

Trace the Creation and Invocation of FCode or "Tokenizer Escape"-mode Definitions

Command-Line Symbol Definitions

Special-Feature Flags

“Tokenizer-Escape” Mode

The emit-byte command

Other commands

Standard Commands

Non-Standard operations

Additional FORTH-compatible operations

Directives

Conditional Tokenization

The Conditional-block Terminator

The "False" segment-switcher

Condition-Testers

True/False flag on the top of the stack

Existence or non-existence of an FCode or "Tokenizer Escape"-mode definition

Definition or non-definition of a Command-Line Symbol

Tracing the state of Conditional Tokenization

Suspending the Duplicate-Name Test for one Definition ("Overloading")

Suspending the Multi-Line Warning for one occasion

Controlling the Scope of Definitions

Evaluating Command-line Symbols

Outputting Arbitrary Byte-Sequences to the FCode Binary

Encoding blocks of binary data taken from a file

Generating Special Text-Strings and Literals

Current date or time

Name of the Function currently being defined

Input-File Name and Line Number

Pre-pending PCI Headers to FCode images

Modifying the PCI Header

"Revision Level of the Vendor's ROM"

Byte-Order

"Last Image Indicator" bit

Changing the name of the Binary Output File

Issuing messages at Tokenization time.

Changing Special-Feature Flags from Source

Displaying Special-Feature Flags from Source

Manipulating the FCode-Token-Number Assignment Counter

Saving and Restoring

Resetting

Resetting Symbols Defined in Either Mode

Non-standard input, syntaxes, and behavior

Ambiguous Conditions

Tokens expected on the same line

The h# d# and o# directives

The commands leave ?leave or unloop outside of a loop-control framework

The commands ['] or ' followed by a name that is not a valid target

The Forth word to followed by a name that is not a valid target

The word instance

Followed by an inapplicable defining-word

Left unresolved

Not Allowed in...

The command fload inside a colon-definition

Non-Standard Synonyms for Standard Functions

Non-Standard Functions or behaviors

Alias

String-Escape characters and other String-Gathering Features

Quoted String-Escape Characters

Embedded New-Lines

Hex-Sequence Processing

C-Style String-Escape Characters

The ABORT" (Abort-Quote) command

Conveniently Convert Short Character-Sequence to Number