<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 20, 2017, at 8:28 AM, Dave Abrahams <<a href="mailto:dabrahams@apple.com" class="">dabrahams@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="content-type" content="text/html; charset=utf-8" class=""><div dir="auto" class=""><div class=""><span class=""></span></div><div class=""><span class=""></span><br class=""><span class=""></span><br class=""><span class="">Sent from my iPad</span><br class=""><span class=""></span><br class=""><div class=""><br class=""><br class="">Sent from my iPad</div><blockquote type="cite" class=""><span class="">On Jan 20, 2017, at 5:48 AM, Jonathan Hull <<a href="mailto:jhull@gbis.com" class="">jhull@gbis.com</a>> wrote:</span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class="">Thanks for all the hard work!</span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class="">Still digesting, but I definitely support the goal of string processing even better than Perl. Some random thoughts:</span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class="">• I also like the suggestion of implicit conversion from substring slices to strings based on a subtype relationship, since I keep running into that issue when trying to use array slices. </span><br class=""></blockquote><span class=""></span><br class=""><span class="">Interesting. Could you offer some examples?</span><br class=""></div></div></div></blockquote><div><br class=""></div><div>Nothing catastrophic. Mainly just having to wrap all of my slices in Array() to actually use them, which obfuscates the purpose of my code. It also took me an embarrassingly long time to figure out that was what I had to do to make it work. For the longest time, I couldn’t understand why anyone would use slices because I couldn’t actually use them with any API… and then someone mentioned wrapping it in Array() here on Evolution and I finally got it. Though it still feels like internal details that I shouldn’t have to worry about leaking out...</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="auto" class=""><div class=""><span class=""></span><blockquote type="cite" class=""><span class="">It would be nice to be able to specify that conversion behavior with other types that have a similar subtype relationship.</span><br class=""></blockquote><span class=""></span><br class=""><span class="">Indeed.</span><br class=""><span class=""></span><br class=""><blockquote type="cite" class=""><span class="">• One thing that stood out was the interpolation format syntax, which seemed a bit convoluted and difficult to parse:</span><br class=""></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">"Something with leading zeroes: \(x.format(fill: zero, width:8))"</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class="">Have you considered treating the interpolation parenthesis more like the function call syntax? It should be a familiar pattern and easily parseable to someone versed in other areas of swift:</span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class=""> “Something with leading zeroes: \(x, fill: .zero, width: 8)"</span><br class=""></blockquote><span class=""></span><br class=""><span class="">Yes, we've considered it</span></div><div class=""><br class=""></div><div class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""> 1. "\(f(expr1, label2: expr2, label3: expr3))" <br class=""><br class=""> String(describing: f(expr1, label2: expr2, label3: expr3))<br class=""><br class=""> 2. "\(expr0 + expr1(label2: expr2, label3: expr3))"<br class=""><br class=""> String(describing: expr0 + expr1(label2: expr2, label3: expr3)<br class=""><br class=""> 3. "\((expr1, label2: expr2, label3: expr3))"<br class=""><br class=""> String(describing: (expr1, label2: expr2, label3: expr3))<br class=""><br class=""> 4. "\(expr1, label2: expr2, label3: expr3)"<br class=""><br class=""> String(describing: expr1, label2: expr2, label3: expr3)<br class=""><br class="">I think I'm primarily concerned with the differences among cases 1, 3,<br class="">and 4, which are extremely minor. 3 and 4 differ by just a set of<br class="">parentheses, though that might be mitigated by the ${...} suggestion someone else posted. The point of using string interpolation is to improve<br class="">readability, and I fear these cases make too many things look alike that<br class="">have very different meanings. Using a common term like "format" calls<br class="">out what is being done.<br class=""><br class="">It's possible to produce terser versions of the syntax that don't suffer<br class="">from this problem by using a dedicated operator:<br class=""><br class=""> "Column 1: \(n⛄(radix:16, width:8)) *** \(message)"<br class=""> "Something with leading zeroes: \(x⛄(fill: zero, width:8))"<br class=""><br class="">or even<br class=""><br class=""> "Column 1: \(n⛄radix:16⛄width:8) *** \(message)"<br class=""> "Something with leading zeroes: \(x⛄fill:zero⛄width:8)”</span></div></div></div></blockquote><div><br class=""></div><div>There is still too much going on here to be readable, though I suppose you could put part of it on a previous line. I really like Joe’s suggestion of how to handle this using an ExpressableByStringInterpolation protocol + sugar. One of my favorite things about the current \() is how readable it makes my strings compared to every other language I have used! I definitely want to make sure we keep that advantage...</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="auto" class=""><div class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class="">I think that should work for the common cases (e.g. padding, truncating, and alignment), with string-returning methods on the type (or even formatting objects ala NSNumberFormatter) being used for more exotic formatting needs (e.g. outputting a number as Hex instead of Decimal)</span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class="">• Have you considered having an explicit .machine locale which means that the function should treat the string as machine readable? (as opposed to the lack of a locale)</span><br class=""></blockquote><div class=""><br class=""></div>No, we hadn't. What would be the goal of such a design?</div></div></div></blockquote><div><br class=""></div><div>Just to be able to explicitly spell the desired behavior (as opposed to only being able to rely on the lack of something). The end usage would most likely be the same in most places as .machine could probably be the default value of the ‘locale’ parameter. However, it would allow me to express my intention to other programmers (and future Jon) that this function is explicitly handling the string as machine readable. It is a difference in feeling and expression as opposed to a difference in ability.</div><div><br class=""></div><div>Also, it may help future proof a bit. If we do end up adding the concept of “human readable” strings down the line, one could see the current locale being a really good default for them. There may still be times where you want to override that and treat them as machine readable for one reason or another. Having .machine allows you to make that override explicitly, while still providing the reasonable default.</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="auto" class=""><div class=""><blockquote type="cite" class=""><span class=""></span></blockquote><blockquote type="cite" class=""><span class="">• I almost feel like the machine readableness vs human readableness of a string is information that should travel with the string itself. It would be nice to have an extremely terse way to specify that a string is localizable (strawman syntax below), and that might also classify the string as human readable.</span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class=""> let myLocalizedStr = $”This is localizable” //This gets used as the comment in the localization file</span></blockquote><div class=""><br class=""></div>Yes, there are also arguments for encoding "human readable" in the type system. But as noted in <a href="https://github.com/apple/swift/blob/master/docs/StringManifesto.md#future-directions" class="">https://github.com/apple/swift/blob/master/docs/StringManifesto.md#future-directions</a> those ideas are scoped out of Swift 4.</div></div></div></blockquote><div><br class=""></div><div>Yes, I could see that too. I guess the main question is: Do we want to be able to add methods that apply ONLY to human readable strings? If so, then encoding it in the type system makes more sense.</div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="auto" class=""><div class=""><blockquote type="cite" class=""><span class=""></span></blockquote><blockquote type="cite" class=""><span class="">• Looking forward to RegEx literals!</span></blockquote><blockquote type="cite" class=""><span class="">Thanks,</span><br class=""></blockquote><blockquote type="cite" class=""><span class="">Jon</span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">On Jan 19, 2017, at 6:56 PM, Ben Cohen via swift-evolution <<a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a>> wrote:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Hi all,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Below is our take on a design manifesto for Strings in Swift 4 and beyond.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Probably best read in rendered markdown on GitHub:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""><a href="https://github.com/apple/swift/blob/master/docs/StringManifesto.md" class="">https://github.com/apple/swift/blob/master/docs/StringManifesto.md</a></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">We’re eager to hear everyone’s thoughts.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Regards,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Ben and Dave</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""># String Processing For Swift 4</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Authors: [Dave Abrahams](<a href="https://github.com/dabrahams" class="">https://github.com/dabrahams</a>), [Ben Cohen](<a href="https://github.com/airspeedswift" class="">https://github.com/airspeedswift</a>)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The goal of re-evaluating Strings for Swift 4 has been fairly ill-defined thus</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">far, with just this short blurb in the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[list of goals](<a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025676.html" class="">https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025676.html</a>):</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">**String re-evaluation**: String is one of the most important fundamental</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">types in the language. The standard library leads have numerous ideas of how</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">to improve the programming model for it, without jeopardizing the goals of</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">providing a unicode-correct-by-default model. Our goal is to be better at</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">string processing than Perl!</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">For Swift 4 and beyond we want to improve three dimensions of text processing:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">1. Ergonomics</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">2. Correctness</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">3. Performance</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This document is meant to both provide a sense of the long-term vision </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">(including undecided issues and possible approaches), and to define the scope of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">work that could be done in the Swift 4 timeframe.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">## General Principles</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Ergonomics</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">It's worth noting that ergonomics and correctness are mutually-reinforcing. An</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">API that is easy to use—but incorrectly—cannot be considered an ergonomic</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">success. Conversely, an API that's simply hard to use is also hard to use</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">correctly. Acheiving optimal performance without compromising ergonomics or</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">correctness is a greater challenge.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Consistency with the Swift language and idioms is also important for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">ergonomics. There are several places both in the standard library and in the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">foundation additions to `String` where patterns and practices found elsewhere</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">could be applied to improve usability and familiarity.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### API Surface Area</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Primary data types such as `String` should have APIs that are easily understood</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">given a signature and a one-line summary. Today, `String` fails that test. As</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">you can see, the Standard Library and Foundation both contribute significantly to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">its overall complexity.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">**Method Arity** | **Standard Library** | **Foundation**</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">---|:---:|:---:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">0: `ƒ()` | 5 | 7</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">1: `ƒ(:)` | 19 | 48</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">2: `ƒ(::)` | 13 | 19</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">3: `ƒ(:::)` | 5 | 11</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">4: `ƒ(::::)` | 1 | 7</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">5: `ƒ(:::::)` | - | 2</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">6: `ƒ(::::::)` | - | 1</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">**API Kind** | **Standard Library** | **Foundation**</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">---|:---:|:---:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`init` | 41 | 18</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`func` | 42 | 55</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`subscript` | 9 | 0</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`var` | 26 | 14</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">**Total: 205 APIs**</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">By contrast, `Int` has 80 APIs, none with more than two parameters.[0] String processing is complex enough; users shouldn't have</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">to press through physical API sprawl just to get started.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Many of the choices detailed below contribute to solving this problem,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">including:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Restoring `Collection` conformance and dropping the `.characters` view.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Providing a more general, composable slicing syntax.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Altering `Comparable` so that parameterized</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> (e.g. case-insensitive) comparison fits smoothly into the basic syntax.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Clearly separating language-dependent operations on text produced </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> by and for humans from language-independent</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> operations on text produced by and for machine processing.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Relocating APIs that fall outside the domain of basic string processing and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> discouraging the proliferation of ad-hoc extensions.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Batteries Included</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">While `String` is available to all programs out-of-the-box, crucial APIs for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">basic string processing tasks are still inaccessible until `Foundation` is</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">imported. While it makes sense that `Foundation` is needed for domain-specific</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">jobs such as</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[linguistic tagging](<a href="https://developer.apple.com/reference/foundation/nslinguistictagger" class="">https://developer.apple.com/reference/foundation/nslinguistictagger</a>),</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">one should not need to import anything to, for example, do case-insensitive</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">comparison.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Unicode Compliance and Platform Support</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The Unicode standard provides a crucial objective reference point for what</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">constitutes correct behavior in an extremely complex domain, so</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Unicode-correctness is, and will remain, a fundamental design principle behind</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Swift's `String`. That said, the Unicode standard is an evolving document, so</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">this objective reference-point is not fixed.[1] While</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">many of the most important operations—e.g. string hashing, equality, and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">non-localized comparison—will be stable, the semantics</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">of others, such as grapheme breaking and localized comparison and case</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">conversion, are expected to change as platforms are updated, so programs should</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">be written so their correctness does not depend on precise stability of these</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">semantics across OS versions or platforms. Although it may be possible to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">imagine static and/or dynamic analysis tools that will help users find such</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">errors, the only sure way to deal with this fact of life is to educate users.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">## Design Points</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Internationalization</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">There is strong evidence that developers cannot determine how to use</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">internationalization APIs correctly. Although documentation could and should be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">improved, the sheer size, complexity, and diversity of these APIs is a major</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">contributor to the problem, causing novices to tune out, and more experienced</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">programmers to make avoidable mistakes.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The first step in improving this situation is to regularize all localized</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">operations as invocations of normal string operations with extra</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">parameters. Among other things, this means:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">1. Doing away with `localizedXXX` methods </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">2. Providing a terse way to name the current locale as a parameter</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">3. Automatically adjusting defaults for options such</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> as case sensitivity based on whether the operation is localized.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">4. Removing correctness traps like `localizedCaseInsensitiveCompare` (see</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> guidance in the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> [Internationalization and Localization Guide](<a href="https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/BPInternational/InternationalizingYourCode/InternationalizingYourCode.html" class="">https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/BPInternational/InternationalizingYourCode/InternationalizingYourCode.html</a>).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Along with appropriate documentation updates, these changes will make localized</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">operations more teachable, comprehensible, and approachable, thereby lowering a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">barrier that currently leads some developers to ignore localization issues</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">altogether.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### The Default Behavior of `String`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Although this isn't well-known, the most accessible form of many operations on</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Swift `String` (and `NSString`) are really only appropriate for text that is</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">intended to be processed for, and consumed by, machines. The semantics of the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">operations with the simplest spellings are always non-localized and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">language-agnostic.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Two major factors play into this design choice:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">1. Machine processing of text is important, so we should have first-class,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> accessible functions appropriate to that use case.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">2. The most general localized operations require a locale parameter not required</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> by their un-localized counterparts. This naturally skews complexity towards</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> localized operations.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Reaffirming that `String`'s simplest APIs have</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">language-independent/machine-processed semantics has the benefit of clarifying</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the proper default behavior of operations such as comparison, and allows us to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">make [significant optimizations](#collation-semantics) that were previously</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">thought to conflict with Unicode.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Future Directions</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">One of the most common internationalization errors is the unintentional</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">presentation to users of text that has not been localized, but regularizing APIs</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">and improving documentation can go only so far in preventing this error.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Combined with the fact that `String` operations are non-localized by default,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the environment for processing human-readable text may still be somewhat</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">error-prone in Swift 4.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">For an audience of mostly non-experts, it is especially important that naïve</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">code is very likely to be correct if it compiles, and that more sophisticated</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">issues can be revealed progressively. For this reason, we intend to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">specifically and separately target localization and internationalization</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">problems in the Swift 5 timeframe.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Operations With Options</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">There are three categories of common string operation that commonly need to be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">tuned in various dimensions:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">**Operation**|**Applicable Options**</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">---|---</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">sort ordering | locale, case/diacritic/width-insensitivity</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">case conversion | locale</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">pattern matching | locale, case/diacritic/width-insensitivity</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The defaults for case-, diacritic-, and width-insensitivity are different for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">localized operations than for non-localized operations, so for example a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">localized sort should be case-insensitive by default, and a non-localized sort</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">should be case-sensitive by default. We propose a standard “language” of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">defaulted parameters to be used for these purposes, with usage roughly like this:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">x.compared(to: y, case: .sensitive, in: swissGerman)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">x.lowercased(in: .currentLocale)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">x.allMatches(</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> somePattern, case: .insensitive, diacritic: .insensitive)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This usage might be supported by code like this:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">enum StringSensitivity {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">case sensitive</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">case insensitive</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension Locale {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">static var currentLocale: Locale { ... }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension Unicode {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// An example of the option language in declaration context,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// with nil defaults indicating unspecified, so defaults can be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// driven by the presence/absence of a specific Locale</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">func frobnicated(</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> case caseSensitivity: StringSensitivity? = nil,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> diacritic diacriticSensitivity: StringSensitivity? = nil,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> width widthSensitivity: StringSensitivity? = nil,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> in locale: Locale? = nil</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">) -> Self { ... }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Comparing and Hashing Strings</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Collation Semantics</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">What Unicode says about collation—which is used in `<`, `==`, and hashing— turns</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">out to be quite interesting, once you pick it apart. The full Unicode Collation</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Algorithm (UCA) works like this:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">1. Fully normalize both strings</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">2. Convert each string to a sequence of numeric triples to form a collation key</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">3. “Flatten” the key by concatenating the sequence of first elements to the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> sequence of second elements to the sequence of third elements</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">4. Lexicographically compare the flattened keys </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">While step 1 can usually</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">be [done quickly](<a href="http://unicode.org/reports/tr15/#Description_Norm" class="">http://unicode.org/reports/tr15/#Description_Norm</a>) and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">incrementally, step 2 uses a collation table that maps matching *sequences* of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">unicode scalars in the normalized string to *sequences* of triples, which get</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">accumulated into a collation key. Predictably, this is where the real costs</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">lie.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">*However*, there are some bright spots to this story. First, as it turns out,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">string sorting (localized or not) should be done down to what's called</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[“identical” level](<a href="http://unicode.org/reports/tr10/#Multi_Level_Comparison" class="">http://unicode.org/reports/tr10/#Multi_Level_Comparison</a>),</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">which adds a step 3a: append the string's normalized form to the flattened</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">collation key. At first blush this just adds work, but consider what it does</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">for equality: two strings that normalize the same, naturally, will collate the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">same. But also, *strings that normalize differently will always collate</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">differently*. In other words, for equality, it is sufficient to compare the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">strings' normalized forms and see if they are the same. We can therefore</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">entirely skip the expensive part of collation for equality comparison.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Next, naturally, anything that applies to equality also applies to hashing: it</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">is sufficient to hash the string's normalized form, bypassing collation keys.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This should provide significant speedups over the current implementation.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Perhaps more importantly, since comparison down to the “identical” level applies</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">even to localized strings, it means that hashing and equality can be implemented</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">exactly the same way for localized and non-localized text, and hash tables with</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">localized keys will remain valid across current-locale changes.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Finally, once it is agreed that the *default* role for `String` is to handle</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">machine-generated and machine-readable text, the default ordering of `String`s</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">need no longer use the UCA at all. It is sufficient to order them in any way</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">that's consistent with equality, so `String` ordering can simply be a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">lexicographical comparison of normalized forms,[4]</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">(which is equivalent to lexicographically comparing the sequences of grapheme</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">clusters), again bypassing step 2 and offering another speedup.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This leaves us executing the full UCA *only* for localized sorting, and ICU's</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">implementation has apparently been very well optimized.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Following this scheme everywhere would also allow us to make sorting behavior</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">consistent across platforms. Currently, we sort `String` according to the UCA,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">except that—*only on Apple platforms*—pairs of ASCII characters are ordered by</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">unicode scalar value.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Syntax</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Because the current `Comparable` protocol expresses all comparisons with binary</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">operators, string comparisons—which may require</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">additional [options](#operations-with-options)—do not fit smoothly into the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">existing syntax. At the same time, we'd like to solve other problems with</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">comparison, as outlined</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[this proposal](<a href="https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e" class="">https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e</a>)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">(implemented by changes at the head</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[this branch](<a href="https://github.com/CodaFi/swift/commits/space-the-final-frontier" class="">https://github.com/CodaFi/swift/commits/space-the-final-frontier</a>)).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">We should adopt a modification of that proposal that uses a method rather than</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">an operator `<=>`:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">enum SortOrder { case before, same, after }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">protocol Comparable : Equatable {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">func compared(to: Self) -> SortOrder</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">...</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This change will give us a syntactic platform on which to implement methods with</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">additional, defaulted arguments, thereby unifying and regularizing comparison</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">across the library.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension String {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">func compared(to: Self) -> SortOrder</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">**Note:** `SortOrder` should bridge to `NSComparisonResult`. It's also possible</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">that the standard library simply adopts Foundation's `ComparisonResult` as is,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">but we believe the community should at least consider alternate naming before</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">that happens. There will be an opportunity to discuss the choices in detail</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">when the modified</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[Comparison Proposal](<a href="https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e" class="">https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e</a>) comes</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">up for review.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### `String` should be a `Collection` of `Character`s Again</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">In Swift 2.0, `String`'s `Collection` conformance was dropped, because we</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">convinced ourselves that its semantics differed from those of `Collection` too</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">significantly.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">It was always well understood that if strings were treated as sequences of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`UnicodeScalar`s, algorithms such as `lexicographicalCompare`, `elementsEqual`,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">and `reversed` would produce nonsense results. Thus, in Swift 1.0, `String` was</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">a collection of `Character` (extended grapheme clusters). During 2.0</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">development, though, we realized that correct string concatenation could</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">occasionally merge distinct grapheme clusters at the start and end of combined</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">strings.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This quirk aside, every aspect of strings-as-collections-of-graphemes appears to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">comport perfectly with Unicode. We think the concatenation problem is tolerable,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">because the cases where it occurs all represent partially-formed constructs. The</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">largest class—isolated combining characters such as ◌́ (U+0301 COMBINING ACUTE</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">ACCENT)—are explicitly called out in the Unicode standard as</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">“[degenerate](<a href="http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries" class="">http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries</a>)” or</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">“[defective](<a href="http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf" class="">http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf</a>)”. The other</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">cases—such as a string ending in a zero-width joiner or half of a regional</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">indicator—appear to be equally transient and unlikely outside of a text editor.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Admitting these cases encourages exploration of grapheme composition and is</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">consistent with what appears to be an overall Unicode philosophy that “no</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">special provisions are made to get marginally better behavior for… cases that</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">never occur in practice.”[2] Furthermore, it seems</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">unlikely to disturb the semantics of any plausible algorithms. We can handle</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">these cases by documenting them, explicitly stating that the elements of a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`String` are an emergent property based on Unicode rules.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The benefits of restoring `Collection` conformance are substantial: </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Collection-like operations encourage experimentation with strings to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> investigate and understand their behavior. This is useful for teaching new</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> programmers, but also good for experienced programmers who want to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> understand more about strings/unicode.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Extended grapheme clusters form a natural element boundary for Unicode</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> strings. For example, searching and matching operations will always produce</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> results that line up on grapheme cluster boundaries.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Character-by-character processing is a legitimate thing to do in many real</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> use-cases, including parsing, pattern matching, and language-specific</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> transformations such as transliteration.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* `Collection` conformance makes a wide variety of powerful operations</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> available that are appropriate to `String`'s default role as the vehicle for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> machine processed text.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> The methods `String` would inherit from `Collection`, where similar to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> higher-level string algorithms, have the right semantics. For example,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> grapheme-wise `lexicographicalCompare`, `elementsEqual`, and application of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> `flatMap` with case-conversion, produce the same results one would expect</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> from whole-string ordering comparison, equality comparison, and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> case-conversion, respectively. `reverse` operates correctly on graphemes,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> keeping diacritics moored to their base characters and leaving emoji intact.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> Other methods such as `indexOf` and `contains` make obvious sense. A few</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> `Collection` methods, like `min` and `max`, may not be particularly useful</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> on `String`, but we don't consider that to be a problem worth solving, in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> the same way that we wouldn't try to suppress `min` and `max` on a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> `Set([UInt8])` that was used to store IP addresses.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Many of the higher-level operations that we want to provide for `String`s,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> such as parsing and pattern matching, should apply to any `Collection`, and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> many of the benefits we want for `Collections`, such</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> as unified slicing, should accrue</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> equally to `String`. Making `String` part of the same protocol hierarchy</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> allows us to write these operations once and not worry about keeping the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> benefits in sync.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Slicing strings into substrings is a crucial part of the vocabulary of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> string processing, and all other sliceable things are `Collection`s.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> Because of its collection-like behavior, users naturally think of `String`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> in collection terms, but run into frustrating limitations where it fails to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> conform and are left to wonder where all the differences lie. Many simply</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> “correct” this limitation by declaring a trivial conformance:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension String : BidirectionalCollection {}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> Even if we removed indexing-by-element from `String`, users could still do</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> this:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> extension String : BidirectionalCollection {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> subscript(i: Index) -> Character { return characters[i] }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> It would be much better to legitimize the conformance to `Collection` and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> simply document the oddity of any concatenation corner-cases, than to deny</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> users the benefits on the grounds that a few cases are confusing.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Note that the fact that `String` is a collection of graphemes does *not* mean</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">that string operations will necessarily have to do grapheme boundary</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">recognition. See the Unicode protocol section for details.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### `Character` and `CharacterSet`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`Character`, which represents a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Unicode</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[extended grapheme cluster](<a href="http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries" class="">http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries</a>),</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">is a bit of a black box, requiring conversion to `String` in order to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">do any introspection, including interoperation with ASCII. To fix this, we should:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Add a `unicodeScalars` view much like `String`'s, so that the sub-structure</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> of grapheme clusters is discoverable.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Add a failable `init` from sequences of scalars (returning nil for sequences</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> that contain 0 or 2+ graphemes).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- (Lower priority) expose some operations, such as `func uppercase() -></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> String`, `var isASCII: Bool`, and, to the extent they can be sensibly</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> generalized, queries of unicode properties that should also be exposed on</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> `UnicodeScalar` such as `isAlphabetic` and `isGraphemeBase` .</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Despite its name, `CharacterSet` currently operates on the Swift `UnicodeScalar`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">type. This means it is usable on `String`, but only by going through the unicode</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">scalar view. To deal with this clash in the short term, `CharacterSet` should be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">renamed to `UnicodeScalarSet`. In the longer term, it may be appropriate to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">introduce a `CharacterSet` that provides similar functionality for extended</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">grapheme clusters.[5]</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Unification of Slicing Operations</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Creating substrings is a basic part of String processing, but the slicing</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">operations that we have in Swift are inconsistent in both their spelling and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">their naming: </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Slices with two explicit endpoints are done with subscript, and support</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> in-place mutation:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> s[i..<j].mutate()</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Slicing from an index to the end, or from the start to an index, is done</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> with a method and does not support in-place mutation:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> s.prefix(upTo: i).readOnly()</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Prefix and suffix operations should be migrated to be subscripting operations</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">with one-sided ranges i.e. `s.prefix(upTo: i)` should become `s[..<i]`, as</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[this proposal](<a href="https://github.com/apple/swift-evolution/blob/9cf2685293108ea3efcbebb7ee6a8618b83d4a90/proposals/0132-sequence-end-ops.md" class="">https://github.com/apple/swift-evolution/blob/9cf2685293108ea3efcbebb7ee6a8618b83d4a90/proposals/0132-sequence-end-ops.md</a>).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">With generic subscripting in the language, that will allow us to collapse a wide</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">variety of methods and subscript overloads into a single implementation, and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">give users an easy-to-use and composable way to describe subranges.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Further extending this EDSL to integrate use-cases like `s.prefix(maxLength: 5)`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">is an ongoing research project that can be considered part of the potential</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">long-term vision of text (and collection) processing.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Substrings</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">When implementing substring slicing, languages are faced with three options:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">1. Make the substrings the same type as string, and share storage.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">2. Make the substrings the same type as string, and copy storage when making the substring.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">3. Make substrings a different type, with a storage copy on conversion to string.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">We think number 3 is the best choice. A walk-through of the tradeoffs follows.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Same type, shared storage</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">In Swift 3.0, slicing a `String` produces a new `String` that is a view into a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">subrange of the original `String`'s storage. This is why `String` is 3 words in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">size (the start, length and buffer owner), unlike the similar `Array` type</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">which is only one.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This is a simple model with big efficiency gains when chopping up strings into</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">multiple smaller strings. But it does mean that a stored substring keeps the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">entire original string buffer alive even after it would normally have been</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">released.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This arrangement has proven to be problematic in other programming languages,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">because applications sometimes extract small strings from large ones and keep</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">those small strings long-term. That is considered a memory leak and was enough</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">of a problem in Java that they changed from substrings sharing storage to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">making a copy in 1.7.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Same type, copied storage</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Copying of substrings is also the choice made in C#, and in the default</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`NSString` implementation. This approach avoids the memory leak issue, but has</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">obvious performance overhead in performing the copies.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This in turn encourages trafficking in string/range pairs instead of in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">substrings, for performance reasons, leading to API challenges. For example:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">foo.compare(bar, range: start..<end)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Here, it is not clear whether `range` applies to `foo` or `bar`. This</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">relationship is better expressed in Swift as a slicing operation:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">foo[start..<end].compare(bar)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Not only does this clarify to which string the range applies, it also brings</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">this sub-range capability to any API that operates on `String` "for free". So</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">these other combinations also work equally well:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// apply range on argument rather than target</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">foo.compare(bar[start..<end])</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// apply range on both</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">foo[start..<end].compare(bar[start1..<end1])</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// compare two strings ignoring first character</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">foo.dropFirst().compare(bar.dropFirst())</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">In all three cases, an explicit range argument need not appear on the `compare`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">method itself. The implementation of `compare` does not need to know anything</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">about ranges. Methods need only take range arguments when that was an</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">integral part of their purpose (for example, setting the start and end of a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">user's current selection in a text box).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Different type, shared storage</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The desire to share underlying storage while preventing accidental memory leaks</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">occurs with slices of `Array`. For this reason we have an `ArraySlice` type.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The inconvenience of a separate type is mitigated by most operations used on</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`Array` from the standard library being generic over `Sequence` or `Collection`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">We should apply the same approach for `String` by introducing a distinct</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`SubSequence` type, `Substring`. Similar advice given for `ArraySlice` would apply to `Substring`:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Important: Long-term storage of `Substring` instances is discouraged. A</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">substring holds a reference to the entire storage of a larger string, not</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">just to the portion it presents, even after the original string's lifetime</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">ends. Long-term storage of a `Substring` may therefore prolong the lifetime</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">of large strings that are no longer otherwise accessible, which can appear</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">to be memory leakage.</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">When assigning a `Substring` to a longer-lived variable (usually a stored</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">property) explicitly of type `String`, a type conversion will be performed, and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">at this point the substring buffer is copied and the original string's storage</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">can be released.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">A `String` that was not its own `Substring` could be one word—a single tagged</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">pointer—without requiring additional allocations. `Substring`s would be a view</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">onto a `String`, so are 3 words - pointer to owner, pointer to start, and a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">length. The small string optimization for `Substring` would take advantage of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the larger size, probably with a less compressed encoding for speed.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The downside of having two types is the inconvenience of sometimes having a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`Substring` when you need a `String`, and vice-versa. It is likely this would</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">be a significantly bigger problem than with `Array` and `ArraySlice`, as</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">slicing of `String` is such a common operation. It is especially relevant to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">existing code that assumes `String` is the currency type. To ease the pain of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">type mismatches, `Substring` should be a subtype of `String` in the same way</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">that `Int` is a subtype of `Optional<Int>`. This would give users an implicit</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">conversion from `Substring` to `String`, as well as the usual implicit</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">conversions such as `[Substring]` to `[String]` that other subtype</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">relationships receive.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">In most cases, type inference combined with the subtype relationship should</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">make the type difference a non-issue and users will not care which type they</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">are using. For flexibility and optimizability, most operations from the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">standard library will traffic in generic models of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[`Unicode`](#the--code-unicode--code--protocol).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">##### Guidance for API Designers</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">In this model, **if a user is unsure about which type to use, `String` is always</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">a reasonable default**. A `Substring` passed where `String` is expected will be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">implicitly copied. When compared to the “same type, copied storage” model, we</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">have effectively deferred the cost of copying from the point where a substring</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">is created until it must be converted to `String` for use with an API.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">A user who needs to optimize away copies altogether should use this guideline:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">if for performance reasons you are tempted to add a `Range` argument to your</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">method as well as a `String` to avoid unnecessary copies, you should instead</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">use `Substring`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">##### The “Empty Subscript”</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">To make it easy to call such an optimized API when you only have a `String` (or</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">to call any API that takes a `Collection`'s `SubSequence` when all you have is</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the `Collection`), we propose the following “empty subscript” operation,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension Collection {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">subscript() -> SubSequence { </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> return self[startIndex..<endIndex] </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">which allows the following usage:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">funcThatIsJustLooking(at: person.name[]) // pass person.name as Substring</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The `[]` syntax can be offered as a fixit when needed, similar to `&` for an</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`inout` argument. While it doesn't help a user to convert `[String]` to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`[Substring]`, the need for such conversions is extremely rare, can be done with</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">a simple `map` (which could also be offered by a fixit):</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">takesAnArrayOfSubstring(arrayOfString.map { $0[] })</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Other Options Considered</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">As we have seen, all three options above have downsides, but it's possible</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">these downsides could be eliminated/mitigated by the compiler. We are proposing</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">one such mitigation—implicit conversion—as part of the the "different type,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">shared storage" option, to help avoid the cognitive load on developers of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">having to deal with a separate `Substring` type.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">To avoid the memory leak issues of a "same type, shared storage" substring</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">option, we considered whether the compiler could perform an implicit copy of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the underlying storage when it detects the string is being "stored" for long</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">term usage, say when it is assigned to a stored property. The trouble with this</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">approach is it is very difficult for the compiler to distinguish between</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">long-term storage versus short-term in the case of abstractions that rely on</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">stored properties. For example, should the storing of a substring inside an</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`Optional` be considered long-term? Or the storing of multiple substrings</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">inside an array? The latter would not work well in the case of a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`components(separatedBy:)` implementation that intended to return an array of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">substrings. It would also be difficult to distinguish intentional medium-term</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">storage of substrings, say by a lexer. There does not appear to be an effective</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">consistent rule that could be applied in the general case for detecting when a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">substring is truly being stored long-term.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">To avoid the cost of copying substrings under "same type, copied storage", the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">optimizer could be enhanced to to reduce the impact of some of those copies.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">For example, this code could be optimized to pull the invariant substring out</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">of the loop:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">for _ in 0..<lots { </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">someFunc(takingString: bigString[bigRange]) </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">It's worth noting that a similar optimization is needed to avoid an equivalent</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">problem with implicit conversion in the "different type, shared storage" case:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let substring = bigString[bigRange]</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">for _ in 0..<lots { someFunc(takingString: substring) }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">However, in the case of "same type, copied storage" there are many use cases</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">that cannot be optimized as easily. Consider the following simple definition of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">a recursive `contains` algorithm, which when substring slicing is linear makes</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the overall algorithm quadratic:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension String {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> func containsChar(_ x: Character) -> Bool {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> return !isEmpty && (first == x || dropFirst().containsChar(x))</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">For the optimizer to eliminate this problem is unrealistic, forcing the user to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">remember to optimize the code to not use string slicing if they want it to be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">efficient (assuming they remember):</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension String {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> // add optional argument tracking progress through the string</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> func containsCharacter(_ x: Character, atOrAfter idx: Index? = nil) -> Bool {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> let idx = idx ?? startIndex</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> return idx != endIndex</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> && (self[idx] == x || containsCharacter(x, atOrAfter: index(after: idx)))</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Substrings, Ranges and Objective-C Interop</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The pattern of passing a string/range pair is common in several Objective-C</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">APIs, and is made especially awkward in Swift by the non-interchangeability of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`Range<String.Index>` and `NSRange`. </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">s2.find(s2, sourceRange: NSRange(j..<s2.endIndex, in: s2))</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">In general, however, the Swift idiom for operating on a sub-range of a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`Collection` is to *slice* the collection and operate on that:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">s2.find(s2[j..<s2.endIndex])</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Therefore, APIs that operate on an `NSString`/`NSRange` pair should be imported</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">without the `NSRange` argument. The Objective-C importer should be changed to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">give these APIs special treatment so that when a `Substring` is passed, instead</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">of being converted to a `String`, the full `NSString` and range are passed to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the Objective-C method, thereby avoiding a copy.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">As a result, you would never need to pass an `NSRange` to these APIs, which</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">solves the impedance problem by eliminating the argument, resulting in more</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">idiomatic Swift code while retaining the performance benefit. To help users</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">manually handle any cases that remain, Foundation should be augmented to allow</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the following syntax for converting to and from `NSRange`:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let nsr = NSRange(i..<j, in: s) // An NSRange corresponding to s[i..<j]</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let iToJ = Range(nsr, in: s) // Equivalent to i..<j</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### The `Unicode` protocol</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">With `Substring` and `String` being distinct types and sharing almost all</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">interface and semantics, and with the highest-performance string processing</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">requiring knowledge of encoding and layout that the currency types can't</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">provide, it becomes important to capture the common “string API” in a protocol.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Since Unicode conformance is a key feature of string processing in swift, we</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">call that protocol `Unicode`:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">**Note:** The following assumes several features that are planned but not yet implemented in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Swift, and should be considered a sketch rather than a final design.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">protocol Unicode </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">: Comparable, BidirectionalCollection where Element == Character {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">associatedtype Encoding : UnicodeEncoding</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">var encoding: Encoding { get }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">associatedtype CodeUnits </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> : RandomAccessCollection where Element == Encoding.CodeUnit</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">var codeUnits: CodeUnits { get }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">associatedtype UnicodeScalars </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> : BidirectionalCollection where Element == UnicodeScalar</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">var unicodeScalars: UnicodeScalars { get }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">associatedtype ExtendedASCII </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> : BidirectionalCollection where Element == UInt32</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">var extendedASCII: ExtendedASCII { get }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">var unicodeScalars: UnicodeScalars { get }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension Unicode {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// ... define high-level non-mutating string operations, e.g. search ...</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">func compared<Other: Unicode>(</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> to rhs: Other,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> case caseSensitivity: StringSensitivity? = nil,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> diacritic diacriticSensitivity: StringSensitivity? = nil,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> width widthSensitivity: StringSensitivity? = nil,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> in locale: Locale? = nil</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">) -> SortOrder { ... }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension Unicode : RangeReplaceableCollection where CodeUnits :</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">RangeReplaceableCollection {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> // Satisfy protocol requirement</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> mutating func replaceSubrange<C : Collection>(_: Range<Index>, with: C) </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> where C.Element == Element</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// ... define high-level mutating string operations, e.g. replace ...</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The goal is that `Unicode` exposes the underlying encoding and code units in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">such a way that for types with a known representation (e.g. a high-performance</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`UTF8String`) that information can be known at compile-time and can be used to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">generate a single path, while still allowing types like `String` that admit</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">multiple representations to use runtime queries and branches to fast path</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">specializations.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">**Note:** `Unicode` would make a fantastic namespace for much of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">what's in this proposal if we could get the ability to nest types and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">protocols in protocols.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Scanning, Matching, and Tokenization</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Low-Level Textual Analysis</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">We should provide convenient APIs processing strings by character. For example,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">it should be easy to cleanly express, “if this string starts with `"f"`, process</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the rest of the string as follows…” Swift is well-suited to expressing this</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">common pattern beautifully, but we need to add the APIs. Here are two examples</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">of the sort of code that might be possible given such APIs:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">if let firstLetter = input.droppingPrefix(alphabeticCharacter) {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">somethingWith(input) // process the rest of input</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">if let (number, restOfInput) = input.parsingPrefix(Int.self) {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> ...</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The specific spelling and functionality of APIs like this are TBD. The larger</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">point is to make sure matching-and-consuming jobs are well-supported.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Unified Pattern Matcher Protocol</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Many of the current methods that do matching are overloaded to do the same</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">logical operations in different ways, with the following axes:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Logical Operation: `find`, `split`, `replace`, match at start</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Kind of pattern: `CharacterSet`, `String`, a regex, a closure</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Options, e.g. case/diacritic sensitivity, locale. Sometimes a part of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the method name, and sometimes an argument</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Whole string or subrange.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">We should represent these aspects as orthogonal, composable components,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">abstracting pattern matchers into a protocol like</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[this one](<a href="https://github.com/apple/swift/blob/master/test/Prototypes/PatternMatching.swift#L33" class="">https://github.com/apple/swift/blob/master/test/Prototypes/PatternMatching.swift#L33</a>),</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">that can allow us to define logical operations once, without introducing</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">overloads, and massively reducing API surface area.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">For example, using the strawman prefix `%` syntax to turn string literals into</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">patterns, the following pairs would all invoke the same generic methods:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">if let found = s.firstMatch(%"searchString") { ... }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">if let found = s.firstMatch(someRegex) { ... }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">for m in s.allMatches((%"searchString"), case: .insensitive) { ... }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">for m in s.allMatches(someRegex) { ... }</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let items = s.split(separatedBy: ", ")</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let tokens = s.split(separatedBy: CharacterSet.whitespace)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Note that, because Swift requires the indices of a slice to match the indices of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the range from which it was sliced, operations like `firstMatch` can return a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`Substring?` in lieu of a `Range<String.Index>?`: the indices of the match in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the string being searched, if needed, can easily be recovered as the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`startIndex` and `endIndex` of the `Substring`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Note also that matching operations are useful for collections in general, and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">would fall out of this proposal:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// replace subsequences of contiguous NaNs with zero</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">forces.replace(oneOrMore([Float.nan]), [0.0])</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Regular Expressions</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Addressing regular expressions is out of scope for this proposal.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">That said, it is important that to note the pattern matching protocol mentioned</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">above provides a suitable foundation for regular expressions, and types such as</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`NSRegularExpression` can easily be retrofitted to conform to it. In the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">future, support for regular expression literals in the compiler could allow for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">compile-time syntax checking and optimization.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### String Indices</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`String` currently has four views—`characters`, `unicodeScalars`, `utf8`, and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`utf16`—each with its own opaque index type. The APIs used to translate indices</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">between views add needless complexity, and the opacity of indices makes them</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">difficult to serialize.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The index translation problem has two aspects:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">1. `String` views cannot consume one anothers' indices without a cumbersome</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> conversion step. An index into a `String`'s `characters` must be translated</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> before it can be used as a position in its `unicodeScalars`. Although these</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> translations are rarely needed, they add conceptual and API complexity.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">2. Many APIs in the core libraries and other frameworks still expose `String`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> positions as `Int`s and regions as `NSRange`s, which can only reference a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> `utf16` view and interoperate poorly with `String` itself.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Index Interchange Among Views</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">String's need for flexible backing storage and reasonably-efficient indexing</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">(i.e. without dynamically allocating and reference-counting the indices</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">themselves) means indices need an efficient underlying storage type. Although</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">we do not wish to expose `String`'s indices *as* integers, `Int` offsets into</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">underlying code unit storage makes a good underlying storage type, provided</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`String`'s underlying storage supports random-access. We think random-access</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">*code-unit storage* is a reasonable requirement to impose on all `String`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">instances.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Making these `Int` code unit offsets conveniently accessible and constructible</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">solves the serialization problem:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">clipboard.write(s.endIndex.codeUnitOffset)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let offset = clipboard.read(Int.self)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let i = String.Index(codeUnitOffset: offset)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Index interchange between `String` and its `unicodeScalars`, `codeUnits`,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">and [`extendedASCII`](#parsing-ascii-structure) views can be made entirely</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">seamless by having them share an index type (semantics of indexing a `String`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">between grapheme cluster boundaries are TBD—it can either trap or be forgiving).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Having a common index allows easy traversal into the interior of graphemes,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">something that is often needed, without making it likely that someone will do it</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">by accident.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- `String.index(after:)` should advance to the next grapheme, even when the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> index points partway through a grapheme.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- `String.index(before:)` should move to the start of the grapheme before</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> the current position.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Seamless index interchange between `String` and its UTF-8 or UTF-16 views is not</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">crucial, as the specifics of encoding should not be a concern for most use</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">cases, and would impose needless costs on the indices of other views. That</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">said, we can make translation much more straightforward by exposing simple</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">bidirectional converting `init`s on both index types:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let u8Position = String.UTF8.Index(someStringIndex)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let originalPosition = String.Index(u8Position)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Index Interchange with Cocoa</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">We intend to address `NSRange`s that denote substrings in Cocoa APIs as</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">described [later in this document](#substrings--ranges-and-objective-c-interop).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">That leaves the interchange of bare indices with Cocoa APIs trafficking in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`Int`. Hopefully such APIs will be rare, but when needed, the following</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension, which would be useful for all `Collections`, can help:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension Collection {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">func index(offset: IndexDistance) -> Index {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> return index(startIndex, offsetBy: offset)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">func offset(of i: Index) -> IndexDistance {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> return distance(from: startIndex, to: i)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Then integers can easily be translated into offsets into a `String`'s `utf16`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">view for consumption by Cocoa:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let cocoaIndex = s.utf16.offset(of: String.UTF16Index(i))</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">let swiftIndex = s.utf16.index(offset: cocoaIndex)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Formatting</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">A full treatment of formatting is out of scope of this proposal, but</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">we believe it's crucial for completing the text processing picture. This</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">section details some of the existing issues and thinking that may guide future</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">development.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Printf-Style Formatting</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`String.format` is designed on the `printf` model: it takes a format string with</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">textual placeholders for substitution, and an arbitrary list of other arguments.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The syntax and meaning of these placeholders has a long history in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">C, but for anyone who doesn't use them regularly they are cryptic and complex,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">as the `printf (3)` man page attests.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Aside from complexity, this style of API has two major problems: First, the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">spelling of these placeholders must match up to the types of the arguments, in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the right order, or the behavior is undefined. Some limited support for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">compile-time checking of this correspondence could be implemented, but only for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the cases where the format string is a literal. Second, there's no reasonable</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">way to extend the formatting vocabulary to cover the needs of new types: you are</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">stuck with what's in the box.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### Foundation Formatters</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">The formatters supplied by Foundation are highly capable and versatile, offering</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">both formatting and parsing services. When used for formatting, though, the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">design pattern demands more from users than it should:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Matching the type of data being formatted to a formatter type</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Creating an instance of that type</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Setting stateful options (`currency`, `dateStyle`) on the type. Note: the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> need for this step prevents the instance from being used and discarded in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> the same expression where it is created.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Overall, introduction of needless verbosity into source</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">These may seem like small issues, but the experience of Apple localization</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">experts is that the total drag of these factors on programmers is such that they</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">tend to reach for `String.format` instead.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">#### String Interpolation</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Swift string interpolation provides a user-friendly alternative to printf's</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">domain-specific language (just write ordinary swift code!) and its type safety</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">problems (put the data right where it belongs!) but the following issues prevent</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">it from being useful for localized formatting (among other jobs):</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* [SR-2303](<a href="https://bugs.swift.org/browse/SR-2303" class="">https://bugs.swift.org/browse/SR-2303</a>) We are unable to restrict</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> types used in string interpolation.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* [SR-1260](<a href="https://bugs.swift.org/browse/SR-1260" class="">https://bugs.swift.org/browse/SR-1260</a>) String interpolation can't</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> distinguish (fragments of) the base string from the string substitutions.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">In the long run, we should improve Swift string interpolation to the point where</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">it can participate in most any formatting job. Mostly this centers around</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">fixing the interpolation protocols per the previous item, and supporting</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">localization.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">To be able to use formatting effectively inside interpolations, it needs to be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">both lightweight (because it all happens in-situ) and discoverable. One </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">approach would be to standardize on `format` methods, e.g.:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">"Column 1: \(n.format(radix:16, width:8)) *** \(message)"</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">"Something with leading zeroes: \(x.format(fill: zero, width:8))"</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### C String Interop</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Our support for interoperation with nul-terminated C strings is scattered and</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">incoherent, with 6 ways to transform a C string into a `String` and four ways to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">do the inverse. These APIs should be replaced with the following</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">extension String {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// Constructs a `String` having the same contents as `nulTerminatedUTF8`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">///</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// - Parameter nulTerminatedUTF8: a sequence of contiguous UTF-8 encoded </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// bytes ending just before the first zero byte (NUL character).</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">init(cString nulTerminatedUTF8: UnsafePointer<CChar>)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// Constructs a `String` having the same contents as `nulTerminatedCodeUnits`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">///</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// - Parameter nulTerminatedCodeUnits: a sequence of contiguous code units in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// the given `encoding`, ending just before the first zero code unit.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// - Parameter encoding: describes the encoding in which the code units</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// should be interpreted.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">init<Encoding: UnicodeEncoding>(</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> cString nulTerminatedCodeUnits: UnsafePointer<Encoding.CodeUnit>,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> encoding: Encoding)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// Invokes the given closure on the contents of the string, represented as a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">/// pointer to a null-terminated sequence of UTF-8 code units.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">func withCString<Result>(</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> _ body: (UnsafePointer<CChar>) throws -> Result) rethrows -> Result</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">In both of the construction APIs, any invalid encoding sequence detected will</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">have its longest valid prefix replaced by U+FFFD, the Unicode replacement</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">character, per Unicode specification. This covers the common case. The</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">replacement is done *physically* in the underlying storage and the validity of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the result is recorded in the `String`'s `encoding` such that future accesses</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">need not be slowed down by possible error repair separately.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Construction that is aborted when encoding errors are detected can be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">accomplished using APIs on the `encoding`. String types that retain their</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">physical encoding even in the presence of errors and are repaired on-the-fly can</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">be built as different instances of the `Unicode` protocol.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Unicode 9 Conformance</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Unicode 9 (and MacOS 10.11) brought us support for family emoji, which changes</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the process of properly identifying `Character` boundaries. We need to update</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`String` to account for this change.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### High-Performance String Processing</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Many strings are short enough to store in 64 bits, many can be stored using only</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">8 bits per unicode scalar, others are best encoded in UTF-16, and some come to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">us already in some other encoding, such as UTF-8, that would be costly to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">translate. Supporting these formats while maintaining usability for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">general-purpose APIs demands that a single `String` type can be backed by many</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">different representations.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">That said, the highest performance code always requires static knowledge of the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">data structures on which it operates, and for this code, dynamic selection of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">representation comes at too high a cost. Heavy-duty text processing demands a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">way to opt out of dynamism and directly use known encodings. Having this</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">ability can also make it easy to cleanly specialize code that handles dynamic</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">cases for maximal efficiency on the most common representations.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">To address this need, we can build models of the `Unicode` protocol that encode</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">representation information into the type, such as `NFCNormalizedUTF16String`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Parsing ASCII Structure</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Although many machine-readable formats support the inclusion of arbitrary</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Unicode text, it is also common that their fundamental structure lies entirely</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">within the ASCII subset (JSON, YAML, many XML formats). These formats are often</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">processed most efficiently by recognizing ASCII structural elements as ASCII,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">and capturing the arbitrary sections between them in more-general strings. The</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">current String API offers no way to efficiently recognize ASCII and skip past</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">everything else without the overhead of full decoding into unicode scalars.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">For these purposes, strings should supply an `extendedASCII` view that is a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">collection of `UInt32`, where values less than `0x80` represent the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">corresponding ASCII character, and other values represent data that is specific</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">to the underlying encoding of the string.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">## Language Support</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This proposal depends on two new features in the Swift language:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">1. **Generic subscripts**, to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> enable unified slicing syntax.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">2. **A subtype relationship** between</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> `Substring` and `String`, enabling framework APIs to traffic solely in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> `String` while still making it possible to avoid copies by handling</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> `Substring`s where necessary.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Additionally, **the ability to nest types and protocols inside</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">protocols** could significantly shrink the footprint of this proposal</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">on the top-level Swift namespace.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">## Open Questions</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Must `String` be limited to storing UTF-16 subset encodings?</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- The ability to handle `UTF-8`-encoded strings (models of `Unicode`) is not in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">question here; this is about what encodings must be storable, without</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">transcoding, in the common currency type called “`String`”.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- ASCII, Latin-1, UCS-2, and UTF-16 are UTF-16 subsets. UTF-8 is not.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- If we have a way to get at a `String`'s code units, we need a concrete type in</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">which to express them in the API of `String`, which is a concrete type</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- If String needs to be able to represent UTF-32, presumably the code units need</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">to be `UInt32`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Not supporting UTF-32-encoded text seems like one reasonable design choice.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Maybe we can allow UTF-8 storage in `String` and expose its code units as</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`UInt16`, just as we would for Latin-1.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">- Supporting only UTF-16-subset encodings would imply that `String` indices can</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">be serialized without recording the `String`'s underlying encoding.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Do we need a type-erasable base protocol for UnicodeEncoding?</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">UnicodeEncoding has an associated type, but it may be important to be able to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">traffic in completely dynamic encoding values, e.g. for “tell me the most</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">efficient encoding for this string.”</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### Should there be a string “facade?”</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">One possible design alternative makes `Unicode` a vehicle for expressing</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the storage and encoding of code units, but does not attempt to give it an API</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">appropriate for `String`. Instead, string APIs would be provided by a generic</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">wrapper around an instance of `Unicode`:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">struct StringFacade<U: Unicode> : BidirectionalCollection {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// ...APIs for high-level string processing here...</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">var unicode: U // access to lower-level unicode details</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">typealias String = StringFacade<StringStorage></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">typealias Substring = StringFacade<StringStorage.SubSequence></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This design would allow us to de-emphasize lower-level `String` APIs such as</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">access to the specific encoding, by putting them behind a `.unicode` property.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">A similar effect in a facade-less design would require a new top-level</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`StringProtocol` playing the role of the facade with an an `associatedtype</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Storage : Unicode`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">An interesting variation on this design is possible if defaulted generic</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">parameters are introduced to the language:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```swift</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">struct String<U: Unicode = StringStorage> </span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">: BidirectionalCollection {</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">// ...APIs for high-level string processing here...</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">var unicode: U // access to lower-level unicode details</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">}</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">typealias Substring = String<StringStorage.SubSequence></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">```</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">One advantage of such a design is that naïve users will always extend “the right</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">type” (`String`) without thinking, and the new APIs will show up on `Substring`,</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`MyUTF8String`, etc. That said, it also has downsides that should not be</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">overlooked, not least of which is the confusability of the meaning of the word</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">“string.” Is it referring to the generic or the concrete type?</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### `TextOutputStream` and `TextOutputStreamable`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`TextOutputStreamable` is intended to provide a vehicle for</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">efficiently transporting formatted representations to an output stream</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">without forcing the allocation of storage. Its use of `String`, a</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">type with multiple representations, at the lowest-level unit of</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">communication, conflicts with this goal. It might be sufficient to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">change `TextOutputStream` and `TextOutputStreamable` to traffic in an</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">associated type conforming to `Unicode`, but that is not yet clear.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">This area will require some design work.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### `description` and `debugDescription`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Should these be creating localized or non-localized representations?</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Is returning a `String` efficient enough?</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">* Is `debugDescription` pulling the weight of the API surface area it adds?</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">### `StaticString`</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">`StaticString` was added as a byproduct of standard library developed and kept</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">around because it seemed useful, but it was never truly *designed* for client</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">programmers. We need to decide what happens with it. Presumably *something*</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">should fill its role, and that should conform to `Unicode`.</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">## Footnotes</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""><b id="f0">0</b> The integers rewrite currently underway is expected to</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> substantially reduce the scope of `Int`'s API by using more</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""> generics. [↩](#a0)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""><b id="f1">1</b> In practice, these semantics will usually be tied to the</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">version of the installed [ICU](<a href="http://icu-project.org/" class="">http://icu-project.org</a>) library, which</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">programmatically encodes the most complex rules of the Unicode Standard and its</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">de-facto extension, CLDR.[↩](#a1)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""><b id="f2">2</b></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">See</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[<a href="http://unicode.org/reports/tr29/#Notation" class="">http://unicode.org/reports/tr29/#Notation</a>](<a href="http://unicode.org/reports/tr29/#Notation" class="">http://unicode.org/reports/tr29/#Notation</a>). Note</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">that inserting Unicode scalar values to prevent merging of grapheme clusters would</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">also constitute a kind of misbehavior (one of the clusters at the boundary would</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">not be found in the result), so would be relatively costly to implement, with</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">little benefit. [↩](#a2)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""><b id="f4">4</b> The use of non-UCA-compliant ordering is fully sanctioned by</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the Unicode standard for this purpose. In fact there's</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">a [whole chapter](<a href="http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf" class="">http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf</a>)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">dedicated to it. In particular, §5.17 says:</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">When comparing text that is visible to end users, a correct linguistic sort</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">should be used, as described in _Section 5.16, Sorting and</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">Searching_. However, in many circumstances the only requirement is for a</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">fast, well-defined ordering. In such cases, a binary ordering can be used.</span><br class=""></blockquote></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">[↩](#a4)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""><b id="f5">5</b> The queries supported by `NSCharacterSet` map directly onto</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">properties in a table that's indexed by unicode scalar value. This table is</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">part of the Unicode standard. Some of these queries (e.g., “is this an</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">uppercase character?”) may have fairly obvious generalizations to grapheme</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">clusters, but exactly how to do it is a research topic and *ideally* we'd either</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">establish the existing practice that the Unicode committee would standardize, or</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">the Unicode committee would do the research and we'd implement their</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">result.[↩](#a5)</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">_______________________________________________</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class="">swift-evolution mailing list</span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""><a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><blockquote type="cite" class=""><span class=""><a href="https://lists.swift.org/mailman/listinfo/swift-evolution" class="">https://lists.swift.org/mailman/listinfo/swift-evolution</a></span><br class=""></blockquote></blockquote><blockquote type="cite" class=""><span class=""></span><br class=""></blockquote></div></div></div></blockquote></div><br class=""></body></html>