Copyright © 2005 International Business Machines®, All Rights Reserved.
Licensed under the Common Public License (CPL) version 1.0
"Local Values" are an IBM-specific extension of FORTH syntax, currently used both by the FCode Tokenizer and Platform Firmware. They might be considered a variant that meets the spirit, if not the letter, of the suggestions for a "Locals word set" discussed -- but not specified -- in the ANSI FORTH Standard, Section 13 and Appendix A.13.
(Please note that the ANSI document does not really specify this feature, because the Committee could not reach an agreement. Appendix A.13 records the somewhat lively discussions that accompanied this topic.)
We will refer to this feature with the nomenclature "Local Values" in preference to "Local Variables" or "Locals" in order to (a) more accurately characterize the behavior of these objects, and (b) further emphasize the differences between the IBM-specific extension and those discussed in the ANSI document.
The section labeled Implementation is a description of the underlying parsing and support mechanisms that meet the Design Objectives.
Also, the implementation shall provide a means of remaining compatible with IBM's existing code-base.
Local Values may only be declared in connection with a colon-definition (a "word" in FORTH parlance).
Declaration of Local Values is triggered by an open-curly-brace ( { ) and ends with a close-curly-brace ( } ).
A further distinction is made between Initialized Local Values and Uninitialized Local Values: Initialized Local Values are declared first, and are separated by a special character from Uninitialized Local Values.
Declaration of Local Values may only occur once within the body of the colon-definition.
Declaration of Local Values after code has been compiled into the body of the word is not recommended, but is permitted. A Local Values Declaration that occurs inside a Flow-Control Structure will be reported as an Error.
A Local Values Declaration may include comments and may continue across multiple lines. See the example in the Implementation section.
Two symbols are accepted as the separator between Initialized and Uninitialized Local Values: the Semicolon ( ; ) and the Vertical-Bar ( | ).
Since, in FORTH, Semicolon is heavily fraught with a very important meaning, it is preferable to use a different symbol -- one that isn't used for anything else -- as the separator between Initialized and Uninitialized Local Values. Better still would be a symbol that's given at least passing mention in the discussion about the (failed) attempt to establish an ANSI standard for Locals (see the ANSI Forth Spec., section 13.6.2.1795).
The Vertical-Bar symbol ( | ) fills that bill nicely.
Local Values Declarations will accept Semicolon as an alternative ("Legacy") separator between Initialized and Uninitialized Local Values, and issue a Warning message to the effect that the use of Semicolons in that context is deprecated in favor of the Vertical-Bar.
The User may suppress this message by means of a Command-line switch, known as the Special-Feature Flag named NoLV-Legacy-Message, which is described in the Tokenizer User's Guide.
Conversely, the User who wishes to disallow the use of Semicolon as an alternative separator may do so by means of the Special-Feature Flag named NoLV-Legacy-Separator. When the Legacy Local Values Separator is thus disallowed, occurrences will be treated as an Error.
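As a minimal sketch of the two separator forms, the following shows the same definition written both ways. The word names and Local Value names are made up for illustration, and this is Tokenizer source using the IBM Local Values syntax, not stand-alone ANSI FORTH. The second definition would draw the deprecation Warning unless NoLV-Legacy-Message is in effect, and would be an Error under NoLV-Legacy-Separator.

```forth
\ Preferred form: Vertical-Bar separates Initialized from Uninitialized.
: twice-new  ( n -- 2n )
    { _n | _result }
    _n 2* -> _result
    _result
;

\ Legacy form: same meaning, but the Semicolon separator is deprecated.
: twice-old  ( n -- 2n )
    { _n ; _result }
    _n 2* -> _result
    _result
;
```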
No comments are permitted between the -> and the Local-Value name to which it applies.
The -> and the Local-Value name to which it applies must be on the same line.
The -> operator relates to the Local-Value name to which it is applied in much the same way that the TO operator relates to a name defined by VALUE: it causes the numeric value on top of the Parameter Stack to be popped and stored into -- associated with -- the named Local Value.
Initialized Local Values are initialized from the stack at the start of execution of the defined word, in the same order as the convention for a stack-diagram, i.e., the first-named Local Value is initialized from the stack-item whose depth corresponds to the total number of initialized Local Values, the last-named Local Value is initialized from the top-of-stack item, and so on in between.
The following will serve to illustrate:
    : <word-name>  ( P_x ... P_y  P_0 P_1 ... P_n-2 P_n-1  --  ??? )
          { IL_0 IL_1 ... IL_n-2 IL_n-1 | UL_0 UL_1 }
          \  At the start of the word, IL_0 through IL_n-1 are initialized
          \  with P_0 through P_n-1, respectively, and the stack contains
          \  ( P_x ... P_y )
(1) The ANSI FORTH Committee discussions make no provision for Uninitialized Locals, and
(2) The order of initialization is reversed. In the ANSI document, Locals are initialized in the order they are declared, so that the first-declared will take the topmost value on the stack, and the last-declared will take the deepest value.
The general consensus within IBM is that this scheme is confusing at best, and does not serve the intent of the Design Objectives.
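To make the initialization order concrete, here is a small sketch. The word and its Local Value names are hypothetical, and the source is written in the IBM Local Values syntax, so it is meant for the Tokenizer rather than for a stand-alone ANSI FORTH system. The trailing comment shows, for contrast only, how the ANSI LOCALS| word would need its names listed.

```forth
\ Hypothetical word; Tokenizer source using the IBM Local Values syntax.
: difference  ( a b -- a-b )
    { _a _b | _diff }      \ _a takes the deeper item, _b the top-of-stack
    _a _b - -> _diff       \ -> pops the result into _diff
    _diff
;
\ With 10 3 on the stack, _a is 10 and _b is 3, so the word leaves 7.
\
\ For contrast, the ANSI LOCALS| word gives its first name the topmost item,
\ so the names have to be listed in reverse to get the same binding:
\     : difference  ( a b -- a-b )  LOCALS| b a |  a b - ;
```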
Following the -> ("dash-arrow") symbol with anything other than the name of a Local Value is an Error.
Each new Local Value name has an integer assigned to it. The Parser assigns successive integers, starting with 0, to the Local Value names, in the order that they are declared, and enters the name of each new Local Value, together with its assigned integer, into the separate reserved temporary area.
After all the Local Value names have been declared, i.e., after the close-curly-brace has been read, the Parser compiles-in the number of Initialized Local Values, followed by the number of Uninitialized Local Values, where they will act as arguments to the appropriate function, which the Parser compiles-in immediately after. The function will be the special one that allocates space for, and initializes, the Local Values at the time they are about to be used.
While the definition under construction is being compiled, the area where the temporary compile-time definitions of the new Local Value names have been created must be available to the scanning process, so that the new names will be recognized when invoked. Also, it should be scanned first, ahead of any other word-lists, so that the Local Value names will supersede any similarly-named words, in case of a naming-overlap.
When a Local Value's name is invoked, the Parser compiles-in its assigned integer as an argument to the appropriate function, which is compiled-in immediately after. The function will be a common one that will push onto the stack the address at which the numbered Local Value can be accessed. The Parser will then compile-in either the "fetch" function ( @ ) or the "store" function ( ! ), depending on whether the Local Value name was invoked by itself or in conjunction with the -> operator. This way the User/Programmer's view of Local Values' VALUE-style behavior is preserved.
The FORTH functions exit and ; (semicolon) have to be overloaded. (Section 13.3.3 of the ANSI document also mentions ;CODE and DOES> but these are not recognized by the Tokenizer, so we will not discuss them here.) The overloaded definitions must take special action at compile-time (note that ; -- semicolon -- does that normally, anyway, but exit does not): before completing their normal behavior, they must compile-in the total number of Local Values as an argument to the appropriate function, which is compiled-in immediately afterwards. The function in this case will be the special one that releases the space that had been allocated for the Local Values, and restores the state of Local Values storage to the way the calling routine left it. Semicolon must also clear the area where the temporary compile-time definitions of the new local-names were created, rendering them inaccessible.
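The compile-time rules above can be illustrated with a short sketch. The word safe-div and its Local Value names are hypothetical. The first part is Tokenizer source; the second part spells out the sequence that the rules above imply would be compiled for it, and assumes the three support functions named in the next paragraph are already defined. Note how the overloaded exit releases the Local Values before leaving early.

```forth
\ As written by the programmer (IBM Local Values syntax):
\
\     : safe-div  ( num den -- quot )
\         { _num _den | _quot }
\         _den 0= if  0 exit  then
\         _num _den / -> _quot
\         _quot
\     ;
\
\ What the rules above imply is compiled.  _num is 0, _den is 1, _quot is 2.
: safe-div  ( num den -- quot )
    2 1 {push-locals}                     \ 2 Initialized, 1 Uninitialized
    1 _{local} @ 0= if
        0  3 {pop-locals} exit            \ overloaded exit: release, then leave
    then
    0 _{local} @  1 _{local} @  /  2 _{local} !
    2 _{local} @
    3 {pop-locals}                        \ overloaded ; releases all three
;
```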
The three functions' names are:
    {push-locals}   ( #ilocals #ulocals -- )
    {pop-locals}    ( total#locals -- )
    _{local}        ( local-var# -- addr )
    : faber  ( m4 m3 n2 n1 n0 -- m4 m3 )   \ Does nothing useful.  Just an example.
        {                  \ BEGIN the declaration of Local Values.
              \ These are initialized values:
           _otter          \ _otter is initialized with the value of n2
           _weasel         \ _weasel is initialized with the value of n1
           _skunk          \ _skunk is initialized with the value of n0
                           \     and will be used to determine an amount of memory to allocate.
           |               \ Vertical bar ends the group of Initialized Local Values.
                           \ NOTE:  m4 and m3 stay on the stack.
              \ These are uninitialized:
           _muskrat        \ _muskrat will take the final size of the allocation.
           _mole           \ _mole will hold the address of the allocated memory
        }                  \ END the declaration of Local Values.
        _skunk 40 * -> _muskrat
        _muskrat alloc-mem -> _mole
        base @ hex
        _weasel (.) _mole place
        decimal
        _otter (.) _mole $cat
        base !
        _mole count type
        _mole _muskrat free-mem
    ;
The compilation of faber starts with 3 2 {push-locals} . The first invocation of _skunk (by itself) compiles as 2 _{local} @ and the sequence -> _muskrat compiles as 3 _{local} ! Finally, faber ends with 5 {pop-locals} before the unnest. After that, the local-names are no longer accessible.
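Applying the same rules to every line of faber gives the sequence sketched below. This is only a hand expansion derived from the numbering described above (_otter is 0, _weasel is 1, _skunk is 2, _muskrat is 3, _mole is 4); the name faber-expanded is made up, and the sketch assumes the three support functions and the words used by faber itself are available.

```forth
\ Hand expansion of faber; the name faber-expanded is hypothetical.
: faber-expanded  ( m4 m3 n2 n1 n0 -- m4 m3 )
    3 2 {push-locals}                        \ 3 Initialized, 2 Uninitialized
    2 _{local} @  40 *  3 _{local} !         \ _skunk 40 * -> _muskrat
    3 _{local} @  alloc-mem  4 _{local} !    \ _muskrat alloc-mem -> _mole
    base @ hex
    1 _{local} @ (.)  4 _{local} @ place     \ _weasel (.) _mole place
    decimal
    0 _{local} @ (.)  4 _{local} @ $cat      \ _otter (.) _mole $cat
    base !
    4 _{local} @ count type                  \ _mole count type
    4 _{local} @  3 _{local} @ free-mem      \ _mole _muskrat free-mem
    5 {pop-locals}                           \ release all five before the unnest
;
```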
The obvious way to deliver this package of support functions would be to incorporate, into the FCode source being Tokenized, a Prologue or "Library" file that contains the definitions of the three above-named compiled-in functions, along with all their required support.
A file defining the Local Values Support Functions has been written and will be delivered as part of the implementation of this Project. The user/programmer will be responsible for floading it into the FCode source program to be Tokenized.
The user/programmer has the option of specifying the placement of the Local Values Support Functions file within the body of the FCode source program, and even of making alterations to it, if needed.
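A minimal sketch of such an FCode source program might look like the following. The file name is a placeholder, not the name of the delivered file, and the word double-it is made up for illustration. The only point being shown is that the support file is floaded ahead of the first definition that declares Local Values.

```forth
fcode-version2

\ Placeholder file name; substitute the actual Local Values Support
\ Functions file.  It must be floaded before any use of Local Values.
fload local-values-support.fth

: double-it  ( n -- 2n )
    { _n | _result }
    _n 2* -> _result
    _result
;

fcode-end
```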
Error handling:
If the Local Values Support Functions file is not floaded, then the Parser, when it completes the processing of a Local Values declaration (i.e., when it encounters the close-curly-brace) or, similarly, when it encounters an invocation of a Local Value's name, will proceed as normal to compile-in the call to the appropriate function. That function's name will not be recognized, and the Tokenizer will exhibit the normal error-behavior for an invocation of an unrecognized name.
We define a locals-base pointer that will point to the base -- within the reserved Local Values Storage Area -- of the set of Local Values currently in use; it will be initialized to point just past the end of the locals-storage area.
The address to which the <n> _{local} routine will point is calculated as the given number of cells above the locals-base pointer.
The ( #I-Ls #U-Ls -- ) {push-locals} routine works in two stages: for the Uninitialized Local Values, it simply decrements the locals-base pointer by the number of cells given in the top argument. The Initialized Local Values are then handled one at a time: the locals-base pointer is decremented by a single cell, and the data-item on top of the parameter stack is popped and stored into the cell at which the locals-base pointer now points. The result is that the topmost stack-item is placed in the last-declared Initialized Local Value, and so on down the line until the lowest stack-item is placed in the first-declared Initialized Local Value. Neat, sweet, and petite.
The ( #-Ls -- ) {pop-locals} routine simply increments the locals-base pointer by the given number of cells, which is the total number of Local Values used by the function in which it occurs.
Because functions that use Local Values can call each other (i.e., the use of Local Values can be nested), the depth of the nesting might be unpredictable. Therefore, the {push-locals} routine must perform error-checking: Before decrementing the locals-base pointer, it must test whether doing so would put the pointer below the start of the area reserved for Local Values Storage. Such an error is inevitably fatal, and can only be handled by an ABORT occurring in conjunction with a warning message advising the programmer to increase the size of the Local Values Storage (and, by implication, re-Tokenize).
It will be the developer's responsibility to catch all such errors during early testing. To prevent generating hidden errors of this sort, the programmer is advised to use Local Values judiciously, and particularly to avoid using them in functions that may be called re-entrantly or recursively to an uncontrolled depth. Fortunately, such routines are rare and easily identified.
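The three routines and the overflow test described above can be sketched in ANSI FORTH as follows. This is an illustration written from the description in this section, not the delivered support file: the capacity of 64 cells and the names #locals-cells and locals-storage are assumptions, while locals-base and the three routine names come from the text.

```forth
\ Sketch only.  #locals-cells and locals-storage are assumed names.
64 constant #locals-cells                       \ assumed capacity, in cells
create locals-storage  #locals-cells cells allot
variable locals-base
locals-storage #locals-cells cells +  locals-base !   \ just past the end

: _{local}  ( local-var# -- addr )
    cells  locals-base @ + ;                    \ n cells above locals-base

: {push-locals}  ( #ilocals #ulocals -- )
    \ Refuse to move locals-base below the start of the storage area.
    2dup + cells  locals-base @ locals-storage -  u>
    abort" Local Values Storage exhausted: enlarge it and re-tokenize"
    cells negate locals-base +!                 \ reserve the Uninitialized cells
    0 ?do                                       \ once per Initialized value
        1 cells negate locals-base +!           \ make room for one more cell
        locals-base @ !                         \ pop top-of-stack into it
    loop ;

: {pop-locals}  ( total#locals -- )
    cells locals-base +! ;

\ Usage: three Initialized values and two Uninitialized ones.
10 20 30  3 2 {push-locals}
0 _{local} @ .      \ prints 10  (first-declared gets the deepest item)
2 _{local} @ .      \ prints 30  (last-declared gets the top-of-stack)
5 {pop-locals}
```

Testing the total requirement before touching the pointer keeps the storage area consistent when the ABORT is taken, since nothing has been allocated at that point.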
Additional help can be provided in the form of a second floadable Local Values Support Function source file -- to be used during development only -- that would overload the {push-locals} and {pop-locals} routines with the additional action of keeping track of -- and, of course, displaying at will -- the maximum depth used in the course of a test run. Such overloading of functions is very simple and straightforward in FORTH.
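One way such a development-only overlay might be written is sketched below. It is meant to be loaded after the file that defines {push-locals} and {pop-locals}, and relies on the usual FORTH behavior that a name used inside its own redefinition still refers to the earlier definition. The names locals-depth, max-locals-depth and .max-locals-depth are hypothetical.

```forth
\ Development-only sketch; load after the Local Values Support Functions.
\ locals-depth, max-locals-depth and .max-locals-depth are assumed names.
variable locals-depth       0 locals-depth !
variable max-locals-depth   0 max-locals-depth !

: {push-locals}  ( #ilocals #ulocals -- )
    2dup + locals-depth +!                      \ count the cells being claimed
    locals-depth @  max-locals-depth @  max
    max-locals-depth !                          \ remember the high-water mark
    {push-locals} ;                             \ then do the real work

: {pop-locals}  ( total#locals -- )
    dup negate locals-depth +!                  \ count the cells being released
    {pop-locals} ;

: .max-locals-depth  ( -- )                     \ display the maximum, in cells
    max-locals-depth @ . ;

\ Caveat: a throw that bypasses {pop-locals} leaves locals-depth stale,
\ so the figure is only a guide for test runs.
```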
The Tokenizer is sophisticated enough to keep a separate vocabulary for each device-node, and will flag an Error if Local Values are used in a device-node for which the Local Values Support Functions file has not been floaded.
However, should the user so choose, a means is available whereby a single floading of the Local Values Support Functions can become accessible to all Device Nodes in a driver, trading off economy of System-memory for convenience of programming.
An FCode program that utilizes Local Values, that calls throw , and that has a corresponding catch to guard it, will need to keep its Local Values properly synchronized.
A throw done by an FCode program that does not have a corresponding catch to guard it will be caught outside the scope of that FCode program, and the question of synchronizing Local Values will be rendered irrelevant.
An overloaded catch in the Local Values Support Functions file does the job.
Constructing it was quite simple: It needs to (a) save the locals-base pointer onto the return stack, (b) do a system (generic) CATCH, and (c) restore the locals-base pointer. Counterintuitive though this might be, it does not even need to examine the result of the system (generic) CATCH; it can restore the locals-base pointer in either case. If the result was zero (i.e., no throw occurred), the Local Values Pointer will be the same as it was when saved and restoring it will be harmless...
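The three steps can be sketched as a short definition. This is written from the description above rather than copied from the support file, and it assumes that locals-base is an ordinary variable defined earlier in that file.

```forth
\ Sketch of the overloaded catch; assumes locals-base is already defined.
: catch  ( i*x xt -- j*x 0 | i*x n )
    locals-base @ >r      \ (a) save the pointer on the return stack
    catch                 \ (b) the system CATCH; inside this definition the
                          \     name still refers to the earlier definition
    r> locals-base ! ;    \ (c) restore the pointer, whatever CATCH returned
```

The result left by the system CATCH stays on the parameter stack untouched, which is why the definition has no need to examine it.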