[swift-evolution] Pre-proposal: Safer Decimal Calculations

Thu Mar 24 07:54:05 CDT 2016

On Mar 23, 2016, at 5:26 AM, Rainer Brockerhoff via swift-evolution <swift-evolution at swift.org> wrote:

> On 3/22/16 23:20, Michael Gottesman via swift-evolution wrote:
>> 
>>> On Mar 18, 2016, at 3:42 PM, Rainer Brockerhoff via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>> First draft towards a tentative pre-proposal:
>>> https://gist.github.com/rbrockerhoff/6874a5698bb479886e83
>>> ------
>>> 
>>> Pre-proposal: Safer Decimal Calculations
>>> Proposal: TBD
>>> Author(s): Rainer Brockerhoff
>>> Status: TBD
>>> Review manager: TBD
>>> ...
>>> Full Decimal type proposal
>>> 
>>> There are cogent arguments that prior art/habits and the already complex
>>> interactions between Double, Float, Float80 and CGFloat are best left alone.
>>> 
>>> However, there remains a need for a precise implementation of a workable
>>> Decimal value type for financial calculations. IMHO repurposing the
>>> existing NSDecimalNumber from Objective-C is not the best solution.
>>> 
>>> As most experienced developers know, the standard solution for financial
>>> calculations is to internally store fixed-point values — usually but not
>>> always in cents — and then print the “virtual” point (or decimal comma,
>>> for the rest of us) on output.
>>> 
>>> I propose, therefore, an internal data layout like this:
>>> 
>>> UInt16 - position of the “virtual” point, starting at 0
>>> UInt16 - data array size - 1
>>> [Int32] - contiguous data array, little-endian order, grown as needed.
>>> Note that both UInt16 fields being zero implies that the number is
>>> reduced to a 32-bit Integer. Number literals in Swift can be up to 2048
>>> bits in size, so the maximum data array size would be 64, although it
>>> could conceivably grow beyond that. The usual cases of the virtual point
>>> position being 0 or 2 could be aggressively optimized for normal
>>> arithmetic operators.
>>> 
>>> Needless to say such a Decimal number would accept and represent
>>> literals such as 0.01 with no problems. It would also serve as a BigNum
>>> implementation for most purposes.
>>> 
>>> No doubt implementing this type in the standard library would allow for
>>> highly optimized implementations for all major CPU platforms. In
>>> particular, the data array should probably be [Int64] for 64-bit platforms.
>> 
>> Rainer: I quickly skimmed this. Just to make sure I am understanding 100%: you are proposing a fixed point decimal calculation or a floating point decimal calculation. The former, no?
> 
> Right, fixed-point. (NSDecimalNumber is decimal floating-point, of course).

What you’re describing is actually not a fixed-point format at all, but rather a variant of what IEEE 754 calls an “extendable precision [floating-point] format”.

A fixed-point format has a *fixed* radix point (or scale) determined by the type. A couple of examples of common fixed-point formats are:

- 8-bit pixel formats in imaging, which use a fixed scale of 1/255 so that 0xff encodes 1.0 and 0x00 encodes 0.0.

- “Q15” or “Q1.15”, a fairly ubiquitous format in signal processing, which uses 16-bit signed integers with a fixed scale of 2**-15, so that 0x8000 encodes -1.0 and 0x7fff encodes 0.999969482421875.
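
To make the "scale is determined by the type" point concrete, here is a minimal Swift sketch of decoding both formats. The function names are made up for illustration; the only thing taken from the examples above is the two scale factors.

```swift
// Q15: a 16-bit signed integer with a fixed scale of 2**-15.
// The scale lives in the decoding function (i.e. the type), not in the value.
func q15ToDouble(_ raw: Int16) -> Double {
    return Double(raw) / 32768.0
}

// 8-bit pixel format: an unsigned byte with a fixed scale of 1/255.
func pixelToDouble(_ raw: UInt8) -> Double {
    return Double(raw) / 255.0
}

let q15Min = q15ToDouble(Int16(bitPattern: 0x8000)) // -1.0
let q15Max = q15ToDouble(0x7fff)                    // 0.999969482421875
let white  = pixelToDouble(0xff)                    // 1.0
let black  = pixelToDouble(0x00)                    // 0.0
```

Neither value carries any information about where its radix point is; that is what makes these formats fixed-point.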

Your proposed format, by contrast, encodes the radix point / scale as part of the number; instead of being constant for all values of the type, it “floats”, making it a floating-point format. Now that terminology is squared away, let’s look at what IEEE 754 says about such formats. We don’t necessarily need to follow the guidelines, but we need a good reason if we’re going to do something different. I’ve included some explanatory commentary inline with the standard text:

> These formats are characterized by the parameters b, p, and emax, which may match those of an interchange format and shall:

b here is the radix or base; in your format b is 10 (decimal).
p is the precision, or the number of (base-10) digits that are stored.
emax is the largest allowed (finite) encoded exponent (the exponent bias and hence minimum exponent then fall out via formulas; the exponent range is approximately symmetric about zero).

> - provide all the representations of floating-point data defined in terms of those parameters in 3.2 and 3.3

This just says that you should be able to represent +/-0, +/-infinity, quiet and signaling NaNs as well as finite values.

> - provide all the operations of this standard, as defined in Clause 5, for that format.

This says that you should provide all the “basic operations”: round to integral in the various modes, nextup, nextdown, remainder, min, max, quantize, scale and log, addition, subtraction, multiplication, division, square root, fused multiply-add, conversions to and from other formats and strings, comparisons, abs, copysign, and a few other things. The exact list isn’t terribly important. It’s worth noting that I’ll be trying to drive the FloatingPoint protocol to match these requirements, so we can really just say “your type should conform to FloatingPoint”.

> This standard does not require an implementation to provide any extended or extendable precision format. Any encodings for these formats are implementation-defined, but should be fixed width and may match those of an interchange format.

This just says that such types are optional; languages don’t need to have them to claim IEEE 754 support.

> Language standards should define mechanisms supporting extendable precision for each supported radix. Language standards supporting extendable precision shall permit users to specify p and emax. Language standards shall also allow the specification of an extendable precision by specifying p alone; in this case emax shall be defined by the language standard to be at least 1000×p when p is ≥ 237 bits in a binary format or p is ≥ 51 digits in a decimal format. 

This says that users should be able to define a number in this format just by the precision, leaving the implementation to choose the exponent range. In practice, this means that you’ll want to have a 32- or 64-bit exponent field; 16 bits isn’t sufficient, because with emax required to be at least 1000×p, any precision beyond 65 digits already needs an emax larger than a UInt16 can hold. I would suggest that the swifty thing is to use Int for both the exponent field and the size.
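
A quick Swift sketch of the arithmetic behind that claim. `minimumEmax` is a hypothetical helper written only for this illustration; it just applies the 1000×p rule from the quoted text.

```swift
// Hypothetical helper: the smallest emax the quoted rule allows when the user
// specifies only the precision p (rule stated for decimal p >= 51 digits).
func minimumEmax(decimalDigits p: Int) -> Int {
    precondition(p >= 51, "the 1000×p rule is stated for p >= 51 decimal digits")
    return 1000 * p
}

let emax = minimumEmax(decimalDigits: 100)    // 100_000
let fitsInUInt16 = emax <= Int(UInt16.max)    // false: UInt16.max is 65_535
let fitsInInt32  = emax <= Int(Int32.max)     // true
```

So even a modest 100-digit precision needs more exponent range than a 16-bit field provides.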

The usual thing would be to use a sign-magnitude representation (where the significand is unsigned and the sign bit is tracked separately), rather than a two’s-complement significand. It just works out more nicely if you can treat all the words in the significand the same way.
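
As a rough sketch of what that could look like in Swift (the type and function names here are purely illustrative assumptions, not part of the proposal or the standard library):

```swift
// Hypothetical layout: the sign is tracked separately and the significand is
// stored as unsigned words, least significant word first.
enum Sign { case plus, minus }

struct BigDecimalSketch {
    var sign: Sign
    var exponent: Int          // power of ten applied to the significand
    var magnitude: [UInt32]    // unsigned significand words
}

// Adds two magnitudes word by word. Every word is handled identically because
// none of them carries a sign.
func addMagnitudes(_ a: [UInt32], _ b: [UInt32]) -> [UInt32] {
    var result: [UInt32] = []
    var carry: UInt32 = 0
    for i in 0..<max(a.count, b.count) {
        let x: UInt32 = i < a.count ? a[i] : 0
        let y: UInt32 = i < b.count ? b[i] : 0
        let (partial, overflow1) = x.addingReportingOverflow(y)
        let (sum, overflow2) = partial.addingReportingOverflow(carry)
        result.append(sum)
        carry = (overflow1 || overflow2) ? 1 : 0
    }
    if carry != 0 { result.append(carry) }
    return result
}

// 0.01 would be stored as +1 × 10**-2.
let cent = BigDecimalSketch(sign: .plus, exponent: -2, magnitude: [1])
let carried = addMagnitudes([0xffff_ffff], [1])   // [0, 1]
```

With a two's-complement significand, the top word would need different handling from the rest; here the loop body is the same for every word.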

On top of the IEEE 754 recommendations, it sounds like you would want to add a policy of either growing the precision to keep results exact when possible, or indicating an error when the result is not exact. How do you propose to handle division / square root, where the results are essentially never finitely representable?
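
To illustrate the division problem: a quotient of two integers has a finite decimal expansion only if the denominator, after reducing the fraction, has no prime factors other than 2 and 5. The same test applies to two decimal significands, since the exponents only contribute powers of ten. A minimal Swift sketch follows; the function names are assumptions made for this example, and plain `Int` stands in for an arbitrary-size significand.

```swift
// Greatest common divisor by Euclid's algorithm.
func gcd(_ a: Int, _ b: Int) -> Int {
    var (x, y) = (abs(a), abs(b))
    while y != 0 { (x, y) = (y, x % y) }
    return x
}

// True when numerator / denominator can be written exactly with finitely
// many decimal digits.
func hasFiniteDecimalExpansion(numerator: Int, denominator: Int) -> Bool {
    precondition(denominator != 0, "division by zero")
    // Reduce the fraction, then strip every factor of 2 and 5.
    var d = abs(denominator) / gcd(numerator, denominator)
    for factor in [2, 5] {
        while d % factor == 0 { d /= factor }
    }
    return d == 1
}

let eighth = hasFiniteDecimalExpansion(numerator: 1, denominator: 8)  // true  (0.125)
let third  = hasFiniteDecimalExpansion(numerator: 1, denominator: 3)  // false (0.333...)
let half   = hasFiniteDecimalExpansion(numerator: 3, denominator: 6)  // true  (0.5)
```

A "grow the precision until exact" policy can never terminate for a case like 1/3, so some rounding or error policy is unavoidable for division.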

– Steve