[swift-evolution] Strings in Swift 4
Jordan Rose
jordan_rose at apple.com
Fri Jan 20 18:48:58 CST 2017
>>
>> Are the backing representations for String also the same types that can be exposed statically (as in the mentioned `NFCNormalizedUTF16String`)?
>
> Roughly. I think we want at least the following backing representations for String:
>
> 1. The two compressed representations used by Cocoa "tagged pointer" strings
> 2. A third "tagged pointer" representation that stores 63 bits of UTF-16 (so arbitrary UnicodeScalars and most Characters can be stored efficiently)
> 3. A known Latin-1 backing store that we can fast-path
> 4. A known UTF-16 backing store
> 5. A type-erased arbitrary (or nearly-arbitrary, if we have to accept a UTF16 subset restriction) instance of Unicode
>
> It's possible that some of the representations in the range 3...5 can be collapsed into one.
Cocoa's "tagged pointer" string actually has three representations, which external developer Mike Ash covered in detail on his blog <https://mikeash.com/pyblog/friday-qa-2015-07-31-tagged-pointer-strings.html>:
> Thus we can see that the structure of the tagged pointer strings is:
>
> • If the length is between 0 and 7, store the string as raw eight-bit characters.
> • If the length is 8 or 9, store the string in a six-bit encoding, using the alphabet "eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX".
> • If the length is 10 or 11, store the string in a five-bit encoding, using the alphabet "eilotrm.apdnsIc ufkMShjTRxgC4013"
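As a rough sketch of the selection rule quoted above (not Foundation's actual code): the two alphabets are copied from the quote, while `TaggedEncoding`, `taggedEncoding(for:)`, and the ASCII-only restriction are my own assumptions for illustration.

```swift
// Alphabets copied from the quoted write-up; the five-bit one is the first 32 entries.
let sixBitAlphabet = Array("eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX".utf8)
let fiveBitAlphabet = Array(sixBitAlphabet.prefix(32))

// Hypothetical names; nothing here is Foundation API.
enum TaggedEncoding {
    case eightBit, sixBit, fiveBit
}

/// Returns the encoding the length rule would pick, or nil if the string
/// does not fit any of the three tagged representations.
func taggedEncoding(for s: String) -> TaggedEncoding? {
    let bytes = Array(s.utf8)
    // Assumption: only ASCII content is eligible for the raw eight-bit form.
    guard bytes.allSatisfy({ $0 < 0x80 }) else { return nil }
    switch bytes.count {
    case 0...7:
        return TaggedEncoding.eightBit
    case 8...9:
        // Every character must appear in the 64-entry alphabet.
        if bytes.allSatisfy({ sixBitAlphabet.contains($0) }) { return TaggedEncoding.sixBit }
        return nil
    case 10...11:
        // Every character must appear in the 32-entry alphabet.
        if bytes.allSatisfy({ fiveBitAlphabet.contains($0) }) { return TaggedEncoding.fiveBit }
        return nil
    default:
        return nil
    }
}
```

The longer a string is, the smaller the alphabet it has to draw from, which is why a ten-character string containing "w" falls out of the tagged representations entirely in this sketch.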
None of this is currently part of Foundation's ABI, of course, and technically it wouldn't have to be part of Swift's either. The particular thing I wanted to note is that they went with UTF-8 instead of UTF-16 for the non-alphabetic representation*; whether it is worth burning an additional representation that can store only 3 UTF-16 code units is an open question.
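To make the trade-off concrete, here is a minimal sketch of what a 63-bit UTF-16 payload could hold. The layout (three 16-bit code units in the low 48 bits, a count in bits 48 and 49) and the names `packUTF16` and `unpackUTF16` are purely hypothetical; nothing in this thread specifies an actual bit layout.

```swift
/// Packs up to three UTF-16 code units into a value that fits in 63 bits.
/// Hypothetical layout: unit i occupies bits 16*i ..< 16*(i+1); bits 48-49 hold the count.
func packUTF16(_ s: String) -> UInt64? {
    let units = Array(s.utf16)
    guard units.count <= 3 else { return nil }  // 4 units would need 64 bits
    var payload = UInt64(units.count) << 48
    for (i, unit) in units.enumerated() {
        payload |= UInt64(unit) << UInt64(16 * i)
    }
    return payload
}

/// Inverse of packUTF16 under the same assumed layout.
func unpackUTF16(_ payload: UInt64) -> String {
    let count = Int((payload >> 48) & 0x3)
    let units = (0..<count).map { UInt16(truncatingIfNeeded: payload >> UInt64(16 * $0)) }
    return String(decoding: units, as: UTF16.self)
}
```

Any single Unicode scalar needs at most two code units, so it always fits; a Character built from several scalars fits only when its UTF-16 length is three or less.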
Jordan
* At least in 2015, when Mike Ash disassembled that particular Foundation. I'm not sure whether we're allowed to share what Foundation is currently doing, or whether it is any different.