In a production, epsilon represents an empty right-hand-side of a production.


non-terminals = {
g_abs,
g_array_tail2,
g_atn,
g_comparison,
g_conjucntion_tail,
g_conjunction,
g_cos,
g_datastmt,
g_datastmt_tail,
g_datum_value,
g_def_arglist,
g_defstmt,
g_dimstmt,
g_dimtail,
g_disjunction,
g_disjunction_tail,
g_exitstmt,
g_exp,
g_factor,
g_factor_tail,
g_forstmt,
g_for_tail,
g_function_tail,
g_go_stmt,
g_gosubstmt,
g_gotail,
g_gotostmt,
g_ifstmt,
g_inputstmt,
g_inputstmt_tail,
g_int,
g_integer,
g_letstmt,
g_letstmttail,
g_lineno,
g_lines,
g_linestail,
g_log,
g_matrix,
g_navar,
g_nextstmt,
g_not,
g_numeric_expression,
g_numeric_expression_tail,
g_nvar,
g_on_gotostmt,
g_on_gotostmt_tail,
g_optionbasestmt,
g_primary,
g_print_item,
g_print_item_tail,
g_printlist,
g_printstmt,
g_program,
g_randomizestmt,
g_readstmt,
g_readstmt_tail,
g_real,
g_relational_expression,
g_relational_term,
g_relationa_primary,
g_relop,
g_relop2,
g_remstmt,
g_restorestmt,
g_returnstmt,
g_len,
g_rnd,
g_sgn,
g_sin,
g_sqr,
g_statement,
g_stopstmt,
g_string_expression,
g_tan,
g_target_arraytail2,
g_target_variable,
g_term,
g_term_tail,
g_udf
}

See ECMA-55.TXT for detailed defintions of string and numeric
literals such as T_REAL, T_QSTRING, T_UQSTRING.
Quoted and unquoted strings are described in section 4.2,
and floating point numbers are described in seciton 6.1
of the ECMA-55.TXT file.  Briefly, a quoted string begins
with a ", then contains anything but a newline, EOF, or
another ", then a closing ".  This compiler, as an extension,
permits lower-case letters within a quoted string.  An
unquoted string is similar but does not have the leading or
closing " bytes.  Floating point numbers can be in four
formats:

implicit point representation              sd...d
explicit point unscaled representation     sd..drd..d
explicit point scaled representation       sd..drd..dEsd..d
implicit point scaled representation       sd..dEsd..d

where d is a decimal digit, r is a full-stop s is an
optional sign, and E is the explicit character E.
A sign is '+' or '-'.  Integers used for Minimal BASIC
line numbers are in a format d..d with between 1 and 4
digits and are not permitted to have a sign indicator.


terminals = {
T_ABS=0,        // ABS() function 'ABS'
T_ADD,          // binary add operator '+'
T_AND,          // logical AND
T_ATN,          // ATN() function 'ATN'
T_BASE,         // part of OPTION BASE statement 'BASE'
T_COMMA,        // comma used in INPUT, PRINT, DATA ','
T_COS,          // COS() function 'COS'
T_DATA,         // DATA statement 'DATA'
T_DEF,          // DEF statement 'DEF'
T_DIM,          // DIM statement 'DIM'
T_DIV,          // binary divide operator '/'
T_END,          // END statement 'END'
T_EOF,          // synthetic end-of-file token
T_EOL,          // end-of-line token '\n'
T_EQ,           // equals relational operator '='
T_EXIT,         // used for EXIT FOR
T_EXP,          // EXP() function 'EXP'
T_FNID,         // numeric user-defined function 'FN?' where ? is a ASCII letter 'A'..'Z'
T_FOR,          // part of FOR statement 'FOR'
T_GE,           // greater than or equals relational operator '>='
T_GO,           // part of GO TO or GO SUB 'GO'
T_GOSUB,        // GOSUB statement 'GOSUB'
T_GOTO,         // GOTO statement 'GOTO'
T_GT,           // greater than relational operator '>'
T_IF,           // part of IF statement 'IF'
T_INPUT,        // INPUT statement 'INPUT'
T_INT,          // INT() function 'INT'
T_INTEGER,      // integer literal 1 or more digits, where digit is '0'..'9', optionally preceeded by a single '-' or '+'
T_LE,           // less than or equal to relational operator '<='
T_LEN,          // LEN() function 'LEN'
T_LET,          // LET statement 'LET'
T_LOG,          // LOG() function 'LOG'
T_LPAREN,       // left parenthesis '('
T_LT,           // less than relational operator '<'
T_MUL,          // binary multiply operator '*'
T_NAVAR,        // numeric array variable 'A'..'Z'
T_NE,           // not equals relational operator '<>'
T_NEXT,         // part of FOR statement 'NEXT'
T_NOT,          // logical NOT
T_NVAR,         // numeric scalar variable 'A'..'Z','A0'..'Z0',...'A9'..'Z9'
T_ON,           // part of ON .. GOTO statement 'ON'
T_OPTION,       // part of OPTION BASE statement 'OPTION'
T_OR,           // logical OR
T_POW,          // power operator '^'
T_PRINT,        // PRINT statement 'PRINT'
T_QSTRING,      // quoted string used in PRINT, DATA
T_RANDOMIZE,    // RANDOMIZE statement 'RANDOMIZE'
T_READ,         // READ statement 'READ'
T_REAL,         // floating point literal value
T_REM,          // REM statement 'REM' following by anything up to a newline
T_RESTORE,      // RESTORE statement 'RESTORE'
T_RETURN,       // RETURN statement 'RETURN'
T_RND,          // RND function 'RND'
T_RPAREN,       // right parenthesis ')'
T_SEMI,         // semicolon ';'
T_SGN,          // SGN() function 'SGN'
T_SIN,          // SIN() function 'SIN'
T_SQR,          // SQR() function 'SQR'
T_STEP,         // part of FOR statement 'STEP'
T_STOP,         // STOP statement 'STOP'
T_SUB,          // part of GOSUB statement 'SUB'
T_SUBTRACT,     // binary subtraction operator '-'
T_SVAR,         // string scalar variable 'A$'..'Z$'
T_TAB,          // TAB() function 'TAB'
T_TAN,          // TAN() function 'TAN'
T_THEN,         // part of IF statement 'THEN'
T_TO,           // part of GOTO and ON .. GOTO 'TO'
T_UQSTRING      // unquoted string used in DATA and INPUT
}

start symbol = g_program

productions = {
g_program -> g_lines T_UINTEGER T_END T_EOL T_EOF
g_linestail -> epsilon
g_linestail -> g_lineno g_statement T_EOL g_linestail
g_lines -> epsilon
g_lines -> g_lineno g_statement T_EOL g_linestail
g_statement -> g_gosubstmt
g_statement -> g_gotostmt
g_statement -> g_go_stmt
g_statement -> g_returnstmt
g_statement -> g_stopstmt
g_statement -> g_remstmt
g_statement -> g_optionbasestmt
g_statement -> g_restorestmt
g_statement -> g_randomizestmt
g_statement -> g_letstmt
g_statement -> g_printstmt
g_statement -> g_forstmt
g_statement -> g_nextstmt
g_statement -> g_dimstmt
g_statement -> g_datastmt
g_statement -> g_readstmt
g_statement -> g_inputstmt
g_statement -> g_on_gotostmt
g_statement -> g_ifstmt
g_statement -> g_defstmt
g_statement -> g_exitstmt
g_exitstmt -> T_EXIT T_FOR
g_stopstmt -> T_STOP
g_remstmt -> T_REM
g_returnstmt -> T_RETURN
g_gosubstmt -> T_GOSUB g_lineno
g_gotostmt -> T_GOTO g_lineno
g_gotail -> T_SUB glineno
g_gotail -> T_TO glineno
g_go_stmt -> T_GO g_gotail
g_optionbasestmt -> T_OPTION T_BASE T_INTEGER
g_restorestmt -> T_RESTORE
g_randomizestmt -> T_RANDOMIZE
g_letstmt -> T_LET g_letstmt_tail
g_letstmt_tail -> g_target_variable T_EQ g_string_expression
g_printstmt -> T_PRINT g_printlist
g_printlist -> epsilon
g_printlist -> g_print_item g_print_item_tail
g_print_item_tail -> epsilon
g_print_item_tail -> T_SEMI g_print_item g_print_item_tail
g_print_item_tail -> T_COMMA g_print_item g_print_item_tail
g_print_item -> epsilon
g_print_item -> T_TAB T_LPAREN g_numeric_expression T_RPAREN
g_print_item -> g_numeric_expression
g_print_item -> g_string_expression
g_for_tail -> epsilon
g_for_tail -> T_STEP g_numeric_expression
g_forstmt -> T_FOR T_NVAR T_EQ g_numeric_expression T_TO g_numeric_expresion g_for_tail
g_nextstmt -> T_NEXT T_NVAR
g_dimtail -> epsilon
g_dimtail -> T_COMMA T_NAVAR T_LPAREN T_INTEGER g_matrix T_RPAREN g_dimtail
g_matrix -> espilon
g_matrix -> T_COMMA T_INTEGER
g_dimstmt -> T_DIM T_NAVAR T_LPAREN T_INTEGER g_matrix T_RPAREN g_dimtail
g_datastmt_tail -> epsilon
g_datastmt_tail -> T_COMMA g_datum g_datastmt_tail
g_datastmt -> T_DATA g_datum g_datastmt_tail
g_datum_value -> T_REAL
g_datum_value -> T_INTEGER
g_datum_value -> T_QSTRING
g_datum_value -> T_UQSTRING
g_readstmt_tail -> epsilon
g_readstmt_tail -> g_target_variable g_readstmt_tail
g_readstmt -> T_READ g_target_variable g_readstmt_tail
g_inputstmt_tail -> epsilon
g_inputstmt_tail -> g_target_variable g_inputstmt_tail
g_inputstmt -> T_INPUT g_target_variable g_inputstmt_tail
g_target_variable -> T_SVAR
g_target_variable -> T_NVAR
g_target_arraytail2 -> epsilon
g_target_arraytail2 -> T_COMMA g_numeric_expression
g_target_variable -> T_NAVAR T_LPAREN g_numeric_expression g_target_arraytail2 T_RPAREN
g_on_gotostmt_tail -> epsilon
g_on_gotostmt_tail -> T_COMMA g_lineno g_on_gotostmt_tail
g_on_gotostmt_tail1 -> T_GOTO g_lineno g_on_gotostmt_tail
g_on_gotostmt_tail1 -> T_GO T_TO g_lineno g_on_gotostmt_tail
g_on_gotostmt -> T_ON g_numeric_expression g_on_gotostmt_tail1
g_relational_expression -> g_disjunction
g_ifstmt -> T_IF g_relational_expression T_THEN g_lineno
g_lineno -> T_INTEGER
g_def_arglist -> epsilon
g_def_arglist -> TLPAREN T_NVAR TRPAREN
g_defstmt -> T_DEF T_FNID g_def_arglist T_EQ g_numeric_expression
g_disjunction -> g_conjunction g_disjunction_tail
g_disjunction_tail -> T_OR g_conjunction g_disjunction_tail
g_disjunction_tail -> epsilon
g_conjunction -> g_relational_term g_conjunction_tail
g_conjunction_tail -> T_AND g_relational_term g_conjunction_tail
g_conjunction_tail -> epsilon
g_relational_term -> g_not g_relational_primary
g_not -> T_NOT
g_not -> epsilon
g_relational_primary -> ( g_relational_expression )
g_relational_primary -> g_comparison
g_comparison -> g_string_expression g_relop2 g_string_expression
g_comparison -> g_numeric_expression g_relop g_numeric_expression
g_relop -> T_LE
g_relop -> T_LT
g_relop -> T_GE
g_relop -> T_GT
g_relop -> T_NE
g_relop -> T_EQ
g_relop2 -> T_NE
g_relop2 -> T_EQ
g_numeric_expression_tail -> epsilon
g_numeric_expression_tail -> T_ADD g_term g_numeric_expression_tail
g_numeric_expression_tail -> T_SUBTRACT g_term g_numeric_expression_tail
g_numeric_expression -> g_sign g_term g_numeric_expression_tail
g_string_expression -> T_QSTRING
g_string_expression -> T_SVAR
g_term_tail -> epsilon
g_term_tail -> T_DIV g_factor g_term_tail
g_term_tail -> T_MUL g_factor g_term_tail
g_term -> g_factor g_term_tail
g_factor_tail -> epsilon
g_factor_tail -> T_POW g_primary
g_factor -> g_primary g_factor_tail
g_primary -> T_LPAREN g_numeric_expression T_RPAREN
g_primary -> g_navar
g_primary -> g_nvar
g_primary -> g_udf
g_primary -> g_real
g_primary -> g_integer
g_primary -> g_rnd
g_primary -> g_tan
g_primary -> g_sqr
g_primary -> g_sin
g_primary -> g_sgn
g_primary -> g_log
g_primary -> g_int
g_primary -> g_exp
g_primary -> g_cos
g_primary -> g_atn
g_primary -> g_abs
g_primary -> g_len
g_array_tail2 -> epsilon
g_array_tail2 -> T_COMMA g_numeric_expression
g_navar -> T_NAVAR T_LPAREN g_numeric_expression g_array_tail2 T_RPAREN
g_nvar -> T_NVAR
g_function_tail -> epsilon
g_function_tail -> T_LPAREN g_numeric_expression T_RPAREN
g_udf -> T_FNID g_function_tail
g_real -> T_REAL
g_integer -> T_INTEGER
g_tan -> T_TAN T_LPAREN g_numeric_expression T_RPAREN
g_sqr -> T_SQR T_LPAREN g_numeric_expression T_RPAREN
g_sin -> T_SIN T_LPAREN g_numeric_expression T_RPAREN
g_sgn -> T_SGN T_LPAREN g_numeric_expression T_RPAREN
g_log -> T_LOG T_LPAREN g_numeric_expression T_RPAREN
g_int -> T_INT T_LPAREN g_numeric_expression T_RPAREN
g_exp -> T_EXP T_LPAREN g_numeric_expression T_RPAREN
g_cos -> T_COS T_LPAREN g_numeric_expression T_RPAREN
g_atn -> T_ATN T_LPAREN g_numeric_expression T_RPAREN
g_abs -> T_ABS T_LPAREN g_numeric_expression T_RPAREN
g_rnd -> T_RND
g_len -> T_LEN T_LPAREN g_string_expression T_RPAREN
}


NOTES

Minimal BASIC's grammar about unary minus requires that an expression like -2^2
yield -4, not 4 as I expected.  So let's look at possible parse trees for that:

         -                        ^
       /                        /   \
      ^                        -     2
    /   \                     /
   2     2                   2

   ECMA-55                 I had expected it
Minimal BASIC                 to yield 4
Must yield -4

Python, tcl, and vala yeild -4 in this situation.  The grammar Mr. Ham wrote
and used for the compiler for years failed to handle this case, but somehow it
never came up in practice.  When it initially came up in the NBS test suite
programs P026 and P038, I just assumed I was correct and marked those as test
case errors and forgot about it.  However, while debugging a new DAG-based
arithmetic expression code generator used in later releases, this problem was
seen with some ad-hoc code used in my testing.  This bug affected both the full
parser code, and the similar (in spirit) code used for the constant folding,
since they were based on the same incorrect grammar.  At least I assumed at
that point it was the grammar.

While it might be argued the precedence of unary - (and unary +) is an
unfortunate choice in the language specification, it is necessary to faithfully
implement the specification in order to claim true compliance.  Many
other scripting languages also have a power operator with higher precdence
than unary minus, so this precedence must be accepted practice today.
This verifies that the ECMA-55 specification was correct and my expectation
was wrong.  A new test was added for this specific case.

While testing the compiler to see just how bad the bug was, I decided to try
something silly where I tried to print -----5 and that actually caused the
compiler to abort.  This is when I realized that all of the unary - operator
handling in my grammar was flat-out wrong.  Another test was added for this
case.  Then the grammar investigation began in earnest and it turns out there
were multiple problems.  But even the scanner had problems, since it was
returning things like '-2' as a single T_INTEGER token with value -2 instead of
two tokens, T_SUBTRACT and T_INTEGER with a value of 2.  We want the token
stream to be

T_SUBTRACT T_INTEGER T_POW T_INTEGER

where the value of both integers is 2.  And from that we want to build
the tree:

         -
       /
      ^
    /   \
   2     2

This means unary minus must have a lower precedence than the power operator.
It was time to take another look in the ECMA-55 standard.  My re-reading of
section 6.2,8.2 and 9.2 eventually resulted in this grammar:

numeric-expression        -> leading-sign term numeric-expression-tail
numeric-expression-tail   -> sign term numeric-expression-tail
numeric-expression-tail   ->
leading-sign              -> sign
leading-sign              ->
sign                      -> '+'
sign                      -> '-'
term                      -> factor term-tail
term-tail                 -> multiplier factor term-tail
term-tail                 ->
factor                    -> primary factor-tail
factor-tail               -> '^' primary factor-tail
factor-tail               ->
multiplier                -> '*'
multiplier                -> '/'
primary                   -> numeric-variable
primary                   -> numeric-rep
primary                   -> numeric-function-ref
primary                   -> '(' numeric-expression ')'
numeric-function-ref      -> numeric-function-name argument-list
numeric-function-name     -> user-defined-function
numeric-function-name     -> numeric-supplied-function
argument-list             -> '(' argument ')'
argument-list             ->
numeric-supplied-function -> ABS
numeric-supplied-function -> ATN
numeric-supplied-function -> COS
numeric-supplied-function -> EXP
numeric-supplied-function -> INT
numeric-supplied-function -> LOG
numeric-supplied-function -> RND
numeric-supplied-function -> SGN
numeric-supplied-function -> SIN
numeric-supplied-function -> SQR
numeric-supplied-function -> TAN
numeric-defined-function  -> FNletter
where letter is a member of the set {'A'..'Z'}

And section 6.2 shows numeric-rep is an unsigned integer or floating point
value.  Section 8.4 actually explictely says in an example that -A^B is
interpreted as -(A^B).  My strict reading of the specification showed me that
in fact -----5 cannot be used but instead if it was really desired then
-(-(-(-(-5)))) would have to be used instead.  The leading 'sign' in right-hand
side of numeric-expression is allowed zero or one times.  The second 'sign' in
that right-hand side must occur exactly once.  Following the grammar, term to
factor to primary to numeric-rep does not allow another 'sign'.  In fact, the
grammar would prevent 3+-2 since you would get:

numeric-expression    -> leading-sign term numeric-expression-tail
leading-sign ->
term -> factor term-tail
factor-> primary factor-tail
primary -> numeric-rep                 [accept '3']
factor-tail->
term-tail->
numeric-expression-tail -> sign term numeric-expression-tail
sign -> '+'                            [accept '+']
term -> factor term-tail
factor -> primary factor-tail
primary -> ?????                       [crash]

Instead, you must type 3+(-2) to get a negative two after the add.
Ok, well once I knew what was wrong, I fixed it ;-D
