<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div><blockquote type="cite" class=""><div dir="auto" class=""><div class=""><div class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""> Are the backing representations for String also the same types that can be exposed statically (as in the mentioned `NFCNormalizedUTF16String`)?</div></div></div></blockquote><div class=""><br class=""></div>Roughly. I think we want at least the following backing representations for String:</div><div class=""><br class=""></div><div class="">1. The two compressed representations used by Cocoa "tagged pointer" strings</div><div class="">2. A third "tagged pointer" representation that stores 63 bits of UTF-16 (so arbitrary UnicodeScalars and most Characters can be stored efficiently)</div><div class="">3. A known Latin-1 backing store that we can fast-path</div><div class="">4. A known UTF-16 backing store</div><div class="">5. 
A type-erased arbitrary (or nearly-arbitrary, if we have to accept a UTF16 subset restriction) instance of Unicode</div><div class=""><br class=""></div><div class="">It's possible that some of the representations in the range 3...5 can be collapsed into one.</div></div></div></blockquote><br class=""></div><div>Cocoa's "tagged pointer" string actually has <i class="">three</i> representations, which external developer Mike Ash <a href="https://mikeash.com/pyblog/friday-qa-2015-07-31-tagged-pointer-strings.html" class="">covered in detail on his blog</a>:</div><div><br class=""></div><div><blockquote type="cite" class="">Thus we can see that the structure of the tagged pointer strings is:<br class=""><br class=""><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>• If the length is between 0 and 7, store the string as raw eight-bit characters.<br class=""></div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>• If the length is 8 or 9, store the string in a six-bit encoding, using the alphabet "eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX".<br class=""></div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>• If the length is 10 or 11, store the string in a five-bit encoding, using the alphabet "eilotrm.apdnsIc ufkMShjTRxgC4013"</div></blockquote><br class=""></div><div>None of this is currently part of Foundation's ABI, of course, and technically it wouldn't have to be part of Swift's either. The particular thing I wanted to note is that they went with UTF-8 instead of UTF-16 for the non-alphabetic representation*; burning an additional representation that can store 3 UTF-16 code units may or may not be worth it.</div><div><br class=""></div><div>Jordan</div><div><br class=""></div><div>* at least in 2015 when Mike Ash disassembled that particular Foundation. 
I'm not sure whether we're allowed to share what Foundation does today, or whether it differs from what he found.</div></body></html>