<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 24, 2017, at 8:16 PM, Zach Waldowski via swift-evolution &lt;<a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><span style="font-family: Arial; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I strongly want Swift to have world-class string processing, but I believe even more strongly in the language's spirit of progressive disclosure. Newcomers to Swift's current String API find it difficult (something I personally disagree with, but that's neither here nor there); I don't think that difficulty is solved by aggressively use-specific type modeling. I instead think it gives rise to the same severe cargo-culting that gets us the scarily prevalent String.Index.init(offset:) extensions in the current model.</span></div></blockquote></div><br class=""><div class="">This cuts both ways, though. In the spirit of progressive disclosure, should we complicate String’s model for users just so that it can accommodate both UTF8 and UTF16 backing stores?</div><div class=""><br class=""></div><div class="">If String can be UTF8-backed, we could not tag the UTF16 collection view as conforming to RandomAccessCollection, which means you couldn’t use algorithms that rely on random access on it. 
It would exhibit random access characteristics only sometimes: UTF16View.index(_:offsetBy:) would run in constant time when the string was backed by UTF16, but in linear time when it was backed by UTF8. Given that, as we’ve discussed here, you sometimes need to do these kinds of index calculations to interoperate with APIs that traffic in code unit offsets, what do we tell users about performance when they need to do so? That "it’s probably OK unless caveat caveat caveat"?</div><div class=""><br class=""></div><div class="">On the other hand, if we separate UTF8-backed strings into another type, we can keep String simple. Then power users who absolutely must operate on a UTF8-backed string for performance reasons have another type, which they can progressively discover when they find they need it.</div><div class=""><br class=""></div><div class="">I’m not saying this is enough to rule out UTF8-backed strings, but I don’t think “it’ll be a simpler model for most users” is the argument in favor of them.</div><div class=""><br class=""></div><div class=""><br class=""></div></body></html>