For ECMA-116 BASIC-1 Full BASIC, most people would use the DECIMAL math
mode since that is the default mode.  The mode already implemented for ecma55
is sufficient to be correct for NATIVE math mode.  Since BCD math does not work
in 64bit mode for x86-64, that means implementing a full decimal math package,
or more likely using one of the FOSS decimal math libraries that are available
on the internet.  In that case, each operation would require a function call (or
perhaps a macro) very similar to the way it has been done for ecma55.

ECMA-116 BASIC-2 requires yet another math mode, FIXED, which would also
require using a library for all math operations.  While implementing what
ECMA-116 calls NATIVE mode was hard, implementing a compliant FIXED mode would
be orders of magnitude more difficult and that is one of the many reasons that
the standard never was widely implemented.  In fact, the author of ecma55 has
been unable to find any fully compliant implementation of ANSI Full BASIC, even
commercially.  The closest implementation discovered so far that is a compiler
is BASICAcc at http://hp.vector.co.jp/authors/VA008683/english/BASICAcc.htm,
but it emits Object Pascal for the Free Pascal compiler and not assembly code.
Another reason that Full BASIC has never been adopted is the fact that no
set of complete examples with expected output when using DECIMAL or FIXED modes
is present in the ECMA-116 standard, making assessing compliance essentially
impossible.  Since the corresponding ANSI standard is not free, the author of
ecma55 has no idea if it includes examples or not.  What is even more shocking
is that TrueBASIC at http://www.truebasic.com/, founded by the creators of
BASIC, does not even attempt to implement ANSI Full BASIC even as an optional
mode, and uses what the standards call NATIVE mode with 64bit doubles, just as
ecma55 does.

The ecma55 compiler once generated scalar AVX code when the -A option was used,
but it did not take advantage of the 3-operand syntax.  It also did not do
any vectorization.  Sadly, this meant that the AVX code was actually a bit
slower than the SSE2 code, so the -A option was removed.

Future work for the ecma55 compiler should include generating even better
arithmetic expression evaluation code that uses the 3-operand AVX syntax,
but only after vectorization is added.

                             [  An Aside  ]

    The author of ecma55 would like to thank Dr. Suwanna Rasmequan,
    Dean of Faculty of Informatics, Burapha University, for
    providing him with an awesome Haswell machine to continue his
    compiler development research.

                                ---+---

Implementing OPTION ARITHMETIC DECIMAL would be beneficial.  However, the
benefits of OPTION ARITHMETIC FIXED are small and it is likely that nobody will
ever implement that mode in any new BASIC implementation.  That mode appears to
mimic COBOL style math, and a decimal or arbitrary precision library would be
more useful.  Actually efficiently implementing DECIMAL math now that Intel has
dropped BCD support for 64bit processors would be non-trivial, and using a
library from an outside source would make sense.  Things to consider would
include:

decNumber        http://speleotrove.com/decimal/#decNumber
                 The decPacked format would be general purpose,
                 but decQuad would probably be easier to incorporate
                 and would still give 34 digits of precision.

MPFR             http://www.mpfr.org/
MAPM             http://www.tc.umn.edu/~ringx004/mapm-main.html
BigDigits        http://www.di-mgt.com.au/bigdigits.html
apfloat          http://www.apfloat.org/apfloat/
CLN              http://www.ginac.de/CLN/

                                ---+---

Saving/Restoring FPU state for AMD64 CPUs

Let's say you have incoming args in xmm1, xmm3, xmm7 and
the result goes in xmm1.  How can we do whatever we want
with all xmm registers, and yet still on return only xmm1
is changed, and it has the result?

        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-16, %rsp      # force align the stack on 16-byte
        pushq   %r8             # save r8
        subq    $8,%rsp         # padding to keep 16-byte alignment
        # save total FPU status
        subq    $512,%rsp       # 512 FPU save area on stack
        fxsave64 (%rsp)

.... do whatever ....

        movq    %xmm1,%r8       # r8=xmm1
        # restore total FPU status
        fxrstor64 (%rsp)
        movq    %r8,%xmm1       # xmm1=r8
        addq    $8,%rsp         # remove padding
        popq    %r8             # restore r8
        movq    %rbp, %rsp
        popq    %rbp

Can it be done better?  Well, the consecutive subq's can be combined into
a single "subq $520,%rsp".  What about GP registers?  This showed how
to preserve r8.  If you need to preserve more, just add the required push
statements at the top, and the pop statements at the end.  If you have
an even number of 8 byte registers, then no padding is needed.  If you have
an odd number of 8 byte registers, you need 8 bytes of padding.  The alignment
force is required since fxsave64/fxrstor64 will crash if the save area is
not 16-byte aligned.

Logarithms

x <= 0 domain error
x == 1 returns 0
x == +INF, returns +INF

log2(x) = ln(x)/ln(2)
ln(2) = 0.6931471805599453
1/ln(2) = 1.4426950408889634
IEEE float : 0x3FB8AA3B
IEEE double: 0x3FF71547652B82FE
log2(x) = 1.4426950408889634 * ln(x)

log10(x) = ln(x)/ln(10)
ln(10) = 2.302585092994046
1/ln(10) = 0.4342944819032518
IEEE float : 0x3EDE5BD9
IEEE double: 0x3FDBCB7B1526E50E
log10(x) = 0.4342944819032518 * ln(x)

pow() X^Y according to man pages:

       X < 0 and finite, Y is finite non-integer, return NaN
         raise NEGATIVE QUANTITY RAISED TO A NON-INTEGRAL POWER
       X = 1, return 1 for any Y, even NaN
       Y = 0, return 1 for any X, even NaN
P403:  X = 0, Y > 0 and odd integer, return 0
P404:  X = 0, Y > 0 and not odd integer, return 0
P391:  X = -1, Y = +INF, return 1
P392:  X = -1, Y = -INF, return 1
       ABS(X) < 1, Y=-INF, return +INF
       ABS(X) > 1, Y=-INF, return 0
       ABS(X) < 1, Y=+INF, reutrn 0
       ABS(X) > 1, Y=+INF, return +INF
P401:  X=-INF, Y < 0 and odd integer, return 0
P402:  X=-INF, Y < 0 and non-odd integer, return 0
P393:  X=-INF, Y > 0 and odd integer, return -INF
P394:  X=-INF, Y > 0 and non-odd integer, return +INF
P397:  X=+INF, Y < 0, return 0
P398:  X=+INF, Y > 0, return +INF
P395:  X = 0, Y < 0 and odd integer, return +INF
P396:  X = 0, Y < 0 and non-odd integer, return +INF
P399:  X > 1, Y > 0 and finite, X^Y is +INF, OVERFLOW
P400:  X > 1, Y < 0 and finite, X^Y is -INF, OVERFLOW

The hyperbolic sine of x is defined mathematically as:

sinh(x) = (exp(x) - exp(-x)) / 2

The hyperbolic cosine of x is defined mathematically as:

cosh(x) = (exp(x) + exp(-x)) / 2

TANH(X) is the hyperbolic tangent of x which is defined
mathematically as:

tanh(x) = sinh(x) / cosh(x)

ASIN(X) is the principal value of the arc sine of x.
The arc sin of x is the value whose sine is x.
It returns the principal value of the arc sine of x in radians.
The return value is in the range [-pi/2, pi/2].

ACOS(X) is the principal value of the arc cosine of x.
The arc cosin of x is the value whose cosine is x.
It returns the principal value of the arc cosine of x in radians.
The return value is in the range [-pi/2, pi/2].

sec(x) = 1 / cos(x)

csc(x) = 1 / sin(x)

cot(x) = 1 / tan(x)
