[swift-evolution] Pre-proposal: Safer Decimal Calculations

Fri Mar 18 17:42:20 CDT 2016

First draft towards a tentative pre-proposal:
https://gist.github.com/rbrockerhoff/6874a5698bb479886e83
------

Pre-proposal: Safer Decimal Calculations
Proposal: TBD
Author(s): Rainer Brockerhoff
Status: TBD
Review manager: TBD

Quoting the “The Swift Programming Language” book: “Swift adopts safe
programming patterns…”; “Swift is friendly to new programmers”. The
words “safe” and “safety” are found many times in the book and in online
documentation. The usual rationale for safe features is, to quote a
typical sentence, “…enables you to catch and fix errors as early as
possible in the development process”.

One frequent stumbling point for both new and experienced programmers
stems from the vagaries of binary floating-point arithmetic. This
tentative pre-proposal suggests one possible way to make the dangers
somewhat more clear.

My intention here is to start a discussion on this to inform the ongoing
(and future) reasoning on extending and regularising arithmetic in Swift.

Motivation

Floating-point hardware on most platforms that run Swift (that is,
Intel and ARM CPUs) uses the binary representation forms of the IEEE
754-2008 standard. Although a few mainframes and software libraries
implement the decimal representations, Swift does not currently take
advantage of them. Apple's NSDecimal and NSDecimalNumber implementation
is awkward to use in Swift, especially as standard arithmetic operators
cannot be used directly.

Although it is possible to express floating-point constants in
hexadecimal with a binary exponent (0x123.ABp0 or 0x123A.Bp-4; Swift
requires the exponent in this form), decimal-form floating-point
constants (123.45 or 1.2345e2) are extremely common in practice.
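
As a small illustration of the hexadecimal form: Swift requires the
p exponent on hexadecimal floating-point literals, so the first constant
is written with p0 here. Both spellings denote the same value, and that
value is stored without any rounding. The constant names are only for
this example.

```swift
// Hexadecimal floating-point literals name binary fractions directly,
// so they are stored exactly.
let a: Double = 0x123.ABp0    // 0x123 + 0xAB/256
let b: Double = 0x123A.Bp-4   // same digits, point shifted by four bits

print(a == b)                 // true
print(a == 291.66796875)      // true: 291 + 171/256 is exactly representable
```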

Unfortunately it is tempting to use floating-point arithmetic for
financial calculations or other purposes such as labelling graphical or
statistical data. Constants such as 0.1, 0.01, 0.001 and variations or
multiples thereof will certainly be used in such applications, and
almost none of these constants can be precisely represented in binary
floating-point format.

Rounding errors will therefore be introduced at the outset, causing
unexpected or outright buggy behaviour down the line which will be
surprising to the user and/or the programmer. This will often happen at
some point when the results of a calculation are compared to a constant
or to another result.
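
A minimal sketch of the kind of surprise meant here, using nothing but
Double literals and ==. The variable names are just for illustration.

```swift
// Each literal is rounded to the nearest Double before any arithmetic
// happens, so the errors are present from the start.
let sum = 0.1 + 0.2
print(sum == 0.3)        // false

// Accumulating a rounded constant makes the mismatch show up in
// comparisons against an "obvious" expected result.
var total = 0.0
for _ in 0..<10 {
    total += 0.1
}
print(total == 1.0)      // false
```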

Current Solution

As things stand, Swift's default print() function, Xcode playgrounds
etc. do some discreet rounding or truncation to make the problem less
apparent: a Double initialized with the literal 0.1 prints out as 0.1
instead of the exact value of the internal representation, which is
0.1000000000000000055511151231257827021181583404541015625.

This, unfortunately, masks the underlying problem in settings such as
“toy” programs or educational playgrounds, leading programmers to be
surprised later when things won't work. A cursory search on
Stack Overflow reveals tens of thousands of questions with headings like
“Is floating point math broken?”.
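
One way to see past the friendly printout, sketched with standard
library properties only: the stored bit pattern of 0.1 is that of a
nearby binary fraction, which can be spelled exactly as a hexadecimal
literal.

```swift
let tenth = 0.1

// The default description hides the representation error.
print("\(tenth)" == "0.1")                       // true

// The stored value is the binary fraction 0x1.999999999999ap-4,
// which is slightly larger than one tenth.
print(tenth == 0x1.999999999999ap-4)             // true
print(tenth.bitPattern == 0x3FB9_9999_9999_999A) // true
```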

Warning on imprecise literals

To make decimal-format floating-point literals safe, I suggest that the
compiler should emit a warning whenever a literal is used that cannot be
represented exactly in the expected type. (Note that 0.1 cannot be
represented exactly in any binary floating-point type.)

The experienced programmer will, however, be willing to accept some
imprecision under circumstances that cannot be reliably determined by
the compiler. I suggest, therefore, that this acceptance be indicated by
an annotation to the literal; a form such as ~0.1 might be easiest to
read and implement, as the prefix ~ operator currently has no meaning
for a floating-point value. A “fixit” would be easily implemented to
insert the missing notation.

Conversely, to keep inexperienced or hurried programmers from strewing
~s everywhere, it would be useful to warn, and offer to fix, if the ~ is
present but the literal does have an exact representation.
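
To make the check concrete, here is a rough sketch of the test such a
warning could rest on. This is not compiler code: isExactInDouble is a
hypothetical helper that only handles simple literals like "123.45" (no
exponent), and it is deliberately conservative for very large values.
The idea is that a decimal fraction is exact in binary only if, once
reduced, its denominator is a power of two and its numerator fits in
the 53-bit significand of a Double.

```swift
/// Hypothetical helper, not an existing API: reports whether a simple
/// decimal literal such as "0.75" has an exact Double representation.
func isExactInDouble(_ literal: String) -> Bool {
    func gcd(_ a: Int, _ b: Int) -> Int {
        return b == 0 ? a : gcd(b, a % b)
    }

    let parts = literal.split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count <= 2, let whole = parts.first else { return false }
    let fraction = parts.count == 2 ? String(parts[1]) : ""

    // Treat the literal as numerator / 10^(number of fraction digits).
    guard fraction.count <= 15,
          let numerator = Int(String(whole) + fraction) else { return false }
    var denominator = 1
    for _ in 0..<fraction.count { denominator *= 10 }

    // Reduce the fraction to lowest terms.
    let divisor = gcd(abs(numerator), denominator)
    let n = abs(numerator) / divisor
    let d = denominator / divisor

    // Exact only if the denominator is a power of two and the numerator
    // fits in the significand (conservative bound).
    return d & (d - 1) == 0 && n <= 1 << 53
}

print(isExactInDouble("0.5"))     // true
print(isExactInDouble("0.1"))     // false
print(isExactInDouble("0.75"))    // true
print(isExactInDouble("123.45"))  // false
```

Under the proposal, the first and third literals would draw a warning
if written with ~, and the second and fourth would draw one without it.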

Tolerances

A parallel idea is that of tolerances, introducing an ‘epsilon’ value to
be used in comparisons. Unfortunately an effective value of the epsilon
depends on the magnitude of the operands and there are many edge cases.
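
A short sketch of why no single epsilon works. Both comparison
functions below are hypothetical helpers written for this example; the
tolerance values are arbitrary.

```swift
/// Hypothetical helper: comparison with a fixed, absolute epsilon.
func withinAbsolute(_ a: Double, _ b: Double, epsilon: Double) -> Bool {
    return abs(a - b) <= epsilon
}

/// Hypothetical helper: comparison scaled by the larger operand.
func withinRelative(_ a: Double, _ b: Double, tolerance: Double) -> Bool {
    return abs(a - b) <= tolerance * max(abs(a), abs(b))
}

let big = 1e16
let next = big.nextUp   // the very next Double; it is 2.0 away

// A fixed epsilon is fine near 1 but useless at large magnitudes.
print(withinAbsolute(0.1 + 0.2, 0.3, epsilon: 1e-9))   // true
print(withinAbsolute(big, next, epsilon: 1e-9))        // false

// A relative tolerance handles both of those cases...
print(withinRelative(0.1 + 0.2, 0.3, tolerance: 1e-15)) // true
print(withinRelative(big, next, tolerance: 1e-15))      // true

// ...but breaks down near zero, one of the edge cases.
print(withinRelative(1e-20, 0.0, tolerance: 1e-15))     // false
```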

Introducing a special type along the lines of “floating point with
tolerances” (using some accepted engineering notation for literals like
100.5±0.1) might be useful for specialised applications but will not
solve this specific problem. Expanding existing constructs to accept an
optional tolerance value, as has been proposed elsewhere, may be useful
in those specific instances but would not help raise programmer
awareness of unsafe literals.

Full Decimal type proposal

There are cogent arguments that prior art/habits and the already complex
interactions between Double, Float, Float80 and CGFloat are best left alone.

However, there remains a need for a precise implementation of a workable
Decimal value type for financial calculations. IMHO repurposing the
existing NSDecimalNumber from Objective-C is not the best solution.

As most experienced developers know, the standard solution for financial
calculations is to internally store fixed-point values — usually but not
always in cents — and then print the “virtual” point (or decimal comma,
for the rest of us) on output.
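
The usual fixed-point approach can be sketched as follows, assuming
amounts are kept in cents in a plain Int. The function name
formatted(cents:) is made up for this example, and a real program would
use a locale-aware formatter for the separator.

```swift
/// Hypothetical helper: renders an amount stored in cents, inserting
/// the "virtual" point only on output.
func formatted(cents: Int) -> String {
    let sign = cents < 0 ? "-" : ""
    let magnitude = cents.magnitude
    let whole = magnitude / 100
    let fraction = magnitude % 100
    return "\(sign)\(whole)." + (fraction < 10 ? "0" : "") + "\(fraction)"
}

// Integer arithmetic on cents is exact: 0.10 + 0.20 really is 0.30.
let price = 10 + 20
print(price == 30)               // true
print(formatted(cents: price))   // 0.30
print(formatted(cents: 1999))    // 19.99
print(formatted(cents: -5))      // -0.05
```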

I propose, therefore, an internal data layout like this:

UInt16 - position of the “virtual” point, starting at 0
UInt16 - data array size - 1
[Int32] - contiguous data array, little-endian order, grown as needed.

Note that both UInt16 fields being zero implies that the number is
reduced to a 32-bit integer. Number literals in Swift can be up to 2048
bits in size, so the maximum data array size would be 64, although it
could conceivably grow beyond that. The usual cases of the virtual point
position being 0 or 2 could be aggressively optimized for normal
arithmetic operators.
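
A very rough sketch of how the layout might look as a Swift value type.
Everything here is an assumption made for illustration: the name
DecimalSketch, the choice to derive the size field from the array, and
the restriction to a single 32-bit word (a real implementation would
grow the array on overflow instead of trapping).

```swift
struct DecimalSketch: Equatable {
    /// Position of the "virtual" point: digits to the right of it.
    var pointPosition: UInt16
    /// Data array in little-endian order; only one word is used here.
    var words: [Int32]

    /// The "data array size - 1" field, derived rather than stored.
    var sizeMinusOne: UInt16 { return UInt16(words.count - 1) }

    init(_ value: Int32, pointPosition: UInt16 = 0) {
        self.pointPosition = pointPosition
        self.words = [value]
    }

    /// The single-word value rescaled to `position` digits after the point.
    private func scaled(to position: UInt16) -> Int64 {
        var result = Int64(words[0])
        for _ in pointPosition..<position { result *= 10 }
        return result
    }

    static func + (lhs: DecimalSketch, rhs: DecimalSketch) -> DecimalSketch {
        // Align both operands on the larger point position, then add.
        let position = max(lhs.pointPosition, rhs.pointPosition)
        let sum = lhs.scaled(to: position) + rhs.scaled(to: position)
        // Traps if the sum no longer fits in one word (sketch only).
        return DecimalSketch(Int32(sum), pointPosition: position)
    }
}

let tenthExact = DecimalSketch(1, pointPosition: 1)   // 0.1
let fifthExact = DecimalSketch(2, pointPosition: 1)   // 0.2
let centExact = DecimalSketch(1, pointPosition: 2)    // 0.01

print(tenthExact + fifthExact == DecimalSketch(3, pointPosition: 1))  // true
print(tenthExact + centExact == DecimalSketch(11, pointPosition: 2))  // true
```

Note that the synthesized == compares fields directly, so 0.10 and 0.1
would not compare equal in this sketch; a real type would normalise or
rescale before comparing.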

Needless to say such a Decimal number would accept and represent
literals such as 0.01 with no problems. It would also serve as a BigNum
implementation for most purposes.

No doubt implementing this type in the standard library would allow for
highly optimized implementations for all major CPU platforms. In
particular, the data array should probably be [Int64] for 64-bit platforms.

Acknowledgement

Thanks to Erica Sadun for their help with an early version of this
pre-proposal.

Some references

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
https://docs.python.org/2/tutorial/floatingpoint.html
https://en.wikipedia.org/wiki/IEEE_floating_point
https://randomascii.wordpress.com/category/floating-point/
http://code.jsoftware.com/wiki/Essays/Tolerant_Comparison

-- 
Rainer Brockerhoff  <rainer at brockerhoff.net>
Belo Horizonte, Brazil
"In the affairs of others even fools are wise
In their own business even sages err."
http://brockerhoff.net/blog/