<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">The ultimate model of strings is going to be complicated whether or not it’s on String itself, although I argue that regardless of that complexity, Swift inherently starts from a much better place than, e.g., Java, just by having Array rather than 30 different Array-like things. That dovetails into the point I was trying to make up-thread, which is that complicating the overall type space to serve specific use cases results, in practice, in less-experienced users not knowing about the specialized type or not finding it, even when they need it. Furthermore, “use UTF8String when you need to be super-fast (and don’t we all want to be super fast???)” is the kind of cargo-culting that sticks, not “when caveats A, B, C, and D apply and you want to be fast and you’ve considered all the Unicode implications and when the optimizer breaks down and you have observed a performance problem you should consider etc etc etc”.</div><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 25, 2017, at 4:21 PM, Ben Cohen &lt;<a href="mailto:ben_cohen@apple.com" class="">ben_cohen@apple.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Jan 24, 2017, at 8:16 PM, Zach Waldowski via swift-evolution &lt;<a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><span style="font-family: Arial; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; 
text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I strongly want Swift to have world-class string processing, but I believe even more strongly in the language's spirit of progressive disclosure. Newcomers to Swift's current String API find it difficult (something I personally disagree with, but that's neither here nor there); I don't think that difficulty is solved by aggressively use-specific type modeling. I instead think it gives rise to the same severe cargo-culting that gets us the scarily prevalent String.Index.init(offset:) extensions in the current model.</span></div></blockquote></div><br class=""><div class="">This cuts both ways though. In the spirit of progressive disclosure, should we complicate String’s model for users in order for it to accommodate both UTF8 and UTF16 backing stores?</div><div class=""><br class=""></div><div class="">If String can be UTF8-backed, that would mean that we could not tag the UTF16 collection view as conforming to RandomAccessCollection, so you couldn’t use algorithms that rely on random access on it. It would exhibit random-access characteristics only sometimes: UTF16View.index(_:offsetBy:) would run in constant time when the string was backed by UTF16, but in linear time when it was backed by UTF8. Given that, as we’ve discussed here, you sometimes need to do these kinds of index calculations to interoperate with APIs that traffic in code unit offsets, what do we need to tell users about performance when they do? That "it’s probably OK unless caveat caveat caveat"?</div><div class=""><br class=""></div><div class="">On the other hand, if we separate UTF8-backed strings into another type, we can keep String simple. 
Then for those power users who really absolutely must operate on a UTF8-backed string because of their performance needs, they have another type, which they can progressively discover when they find they need it.</div><div class=""><br class=""></div><div class="">I’m not saying this is enough to rule out UTF8-backed strings, but I don’t think “it’ll be a simpler model for most users” is the argument in favor of it.</div><div class=""><br class=""></div><div class=""><br class=""></div></div></div></blockquote></div><br class=""></body></html>