[swift-dev] State of String: ABI & Performance

Chris Lattner clattner at nondot.org
Wed Jan 10 23:31:10 CST 2018


> On Jan 10, 2018, at 9:29 PM, Chris Lattner <clattner at nondot.org> wrote:
> 
> On Jan 10, 2018, at 11:55 AM, Michael Ilseman via swift-dev <swift-dev at swift.org> wrote:
>> (A gist-formatted version of this email can be found at https://gist.github.com/milseman/bb39ef7f170641ae52c13600a512782f)
> 
> I’m very very excited for this, thank you for the detailed writeup and consideration of the effects and tradeoffs involved.
> 
>> Given that ordering is not fit for human consumption, but rather machine processing, it might as well be fast. The current ordering differs on Darwin and Linux platforms, with Linux in particular suffering from poor performance due to choice of ordering (UCA with DUCET) and older versions of ICU. Instead, [String Comparison Prototype](https://github.com/apple/swift/pull/12115) provides a simpler ordering that allows for many common-case fast-paths and optimizations. For all the Unicode enthusiasts out there, this is the lexicographical ordering of NFC-normalized UTF-16 code units.
> 
> Thank you for fixing this.  Your tradeoffs make perfect sense to me.
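
To make the proposed ordering concrete, here is a minimal sketch of "lexicographical ordering of NFC-normalized UTF-16 code units". It is not the implementation in the linked PR: the function name `nfcUTF16Precedes` is made up for illustration, and it assumes Foundation's `precomposedStringWithCanonicalMapping` as the NFC step, with none of the fast paths the proposal is really about.

```swift
import Foundation

/// Hypothetical helper (not from the proposal): true if `a` sorts before `b`
/// when both are NFC-normalized and compared code unit by code unit in UTF-16.
func nfcUTF16Precedes(_ a: String, _ b: String) -> Bool {
    // Foundation's canonical precomposition is used here as the NFC step.
    let lhs = a.precomposedStringWithCanonicalMapping.utf16
    let rhs = b.precomposedStringWithCanonicalMapping.utf16
    return lhs.lexicographicallyPrecedes(rhs)
}

// Plain ASCII behaves as expected.
let asciiOrdered = nfcUTF16Precedes("apple", "banana")

// "e" + combining acute and precomposed "é" normalize to the same code units,
// so neither one precedes the other.
let decomposed = "e\u{301}"
let precomposed = "\u{E9}"
let canonicallyTied = !nfcUTF16Precedes(decomposed, precomposed)
    && !nfcUTF16Precedes(precomposed, decomposed)

// Code unit order is not code point order: U+1F600 is stored as surrogates
// starting with 0xD83D, which is below the single code unit 0xFF21.
let surrogateSortsFirst = nfcUTF16Precedes("\u{1F600}", "\u{FF21}")
```

The last case shows why this ordering is meant for machines rather than people: a supplementary-plane scalar can sort before a BMP scalar with a smaller code point value.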
> 
>> ### Small String Optimization
> ..
>> For example, assuming a 16-byte String struct and 8 bits used for flags and discriminators (including discriminators for which small form a String is in), 120 bits are available for a small string payload. 120 bits can hold 7 UTF-16 code units, which is sufficient for most graphemes and many common words and separators. 120 bits can also fit 15 ASCII/UTF-8 code units without any packing, which suffices for many programmer/system strings (which have a strong skew towards ASCII).
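
The arithmetic above (16 bytes total, 8 bits of flags, 15 ASCII code units of payload) can be sketched as follows. The type `SmallASCII` and its layout are assumptions made for illustration only; in particular, the top byte stores just a count here, whereas the proposal reserves those bits for flags and discriminators.

```swift
/// Hypothetical 16-byte small-string form: 15 payload bytes plus one byte
/// that, in this sketch, holds only the count.
struct SmallASCII {
    var low: UInt64 = 0    // payload bytes 0...7
    var high: UInt64 = 0   // payload bytes 8...14, then the count in the top byte

    static let capacity = 15

    /// Fails if the string is longer than 15 code units or is not pure ASCII.
    init?(_ s: String) {
        let bytes = Array(s.utf8)
        guard bytes.count <= SmallASCII.capacity,
              bytes.allSatisfy({ $0 < 0x80 }) else { return nil }
        for (i, b) in bytes.enumerated() {
            if i < 8 {
                low |= UInt64(b) << (8 * i)
            } else {
                high |= UInt64(b) << (8 * (i - 8))
            }
        }
        high |= UInt64(bytes.count) << 56
    }

    var count: Int { Int(high >> 56) }

    /// Unpacks the payload bytes back into a String.
    var string: String {
        var bytes: [UInt8] = []
        for i in 0..<count {
            let word = i < 8 ? low : high
            bytes.append(UInt8(truncatingIfNeeded: word >> (8 * (i % 8))))
        }
        return String(decoding: bytes, as: UTF8.self)
    }
}

let fits = SmallASCII("hello, world!!!")      // exactly 15 ASCII code units
let tooLong = SmallASCII("sixteen chars!!!")  // 16 code units
let nonASCII = SmallASCII("é")                // not ASCII
```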
>> 
>> We may also want a compact 5-bit encoding for formatted numbers, such as 64-bit memory addresses in hex, `Int.max` in base-10, and `Double` in base-10, which would require 18, 19, and 24 characters respectively. 120 bits with a 5-bit encoding would fit all of these. This would speed up the creation and interpolation of many strings containing numbers.
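
For reference, a 5-bit packing along these lines might look like the sketch below. The alphabet, the `PackedNumber` type, and the choice to reserve code 0 as padding are all assumptions for illustration; the proposal does not specify them. Twelve 5-bit codes go in each 64-bit word, which gives 24 characters in 120 bits and leaves 8 bits spare.

```swift
// Hypothetical alphabet for formatted numbers. Code 0 is reserved as padding,
// so each symbol is stored as its index + 1.
let numberAlphabet = Array("0123456789abcdefx.-+")

struct PackedNumber {
    var low: UInt64 = 0    // characters 0...11, 5 bits each (60 bits used)
    var high: UInt64 = 0   // characters 12...23, 5 bits each (60 bits used)

    static let capacity = 24

    /// Fails if the string is longer than 24 characters or uses a symbol
    /// outside the hypothetical alphabet.
    init?(_ s: String) {
        let chars = Array(s)
        guard chars.count <= PackedNumber.capacity else { return nil }
        for (i, ch) in chars.enumerated() {
            guard let index = numberAlphabet.firstIndex(of: ch) else { return nil }
            let code = UInt64(index + 1)
            if i < 12 {
                low |= code << (5 * i)
            } else {
                high |= code << (5 * (i - 12))
            }
        }
    }

    /// Unpacks codes until the first padding code (0) or the capacity is reached.
    var string: String {
        var result = ""
        for i in 0..<PackedNumber.capacity {
            let word = i < 12 ? low : high
            let code = Int((word >> (5 * (i % 12))) & 0x1F)
            if code == 0 { break }
            result.append(numberAlphabet[code - 1])
        }
        return result
    }
}

let address = "0x00007fff5fbff8c0"        // 18 characters
let intMax = "9223372036854775807"        // 19 characters
let double = "-1.7976931348623157e+308"   // 24 characters
```

Note that the decoder needs an extra table lookup per character and yet another representation to discriminate on, which is the cost side of the tradeoff discussed below.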
> 
> I think it is important to consider that having more special cases and different representations slows down nearly *every* operation on string, because each operation has to check for and untangle all of the possible representations.  Given the ability to hold 15 characters of ASCII, I don’t see why it would be worthwhile to worry about a 5-bit representation for digits.  String should be an Any!

^ String should NOT be an Any!

-Chris
