<div dir="ltr"><div>Great document! It is a pleasure to read and to see the excellent design work that goes into Swift. </div><div><br></div>One request: make string interpolation great again?<div><br></div><div>Taking the example supplied at <a href="https://github.com/apple/swift/blob/master/docs/StringManifesto.md#string-interpolation">https://github.com/apple/swift/blob/master/docs/StringManifesto.md#string-interpolation</a></div><div><br></div><div><pre style="box-sizing:border-box;font-family:consolas,&#39;liberation mono&#39;,menlo,courier,monospace;font-size:13.600000381469727px;margin-top:0px;margin-bottom:0px;line-height:1.45;word-wrap:normal;padding:16px;overflow:auto;background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;word-break:normal;color:rgb(51,51,51)"><span class="inbox-pl-s" style="box-sizing:border-box;color:rgb(24,54,145)"><span class="inbox-pl-pds" style="box-sizing:border-box">&quot;</span>Column 1: <span class="inbox-pl-pse" style="box-sizing:border-box">\(</span><span class="inbox-pl-s1" style="box-sizing:border-box;color:rgb(51,51,51)">n.<span class="inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">format</span>(<span class="inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">radix</span>:<span class="inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">16</span>, <span class="inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">width</span>:<span class="inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">8</span>))</span> *** <span class="inbox-pl-pse" style="box-sizing:border-box">\(</span><span class="inbox-pl-s1" style="box-sizing:border-box;color:rgb(51,51,51)">message)</span><span class="inbox-pl-pds" style="box-sizing:border-box">&quot;</span></span></pre></div><div><br></div><div>Why not use:</div><div><br></div><div><pre style="box-sizing:border-box;font-family:consolas,&#39;liberation mono&#39;,menlo,courier,monospace;font-size:13.600000381469727px;margin-top:0px;margin-bottom:0px;line-height:1.45;word-wrap:normal;padding:16px;overflow:auto;background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;word-break:normal;color:rgb(51,51,51)"><span class="inbox-inbox-pl-s" style="box-sizing:border-box;color:rgb(24,54,145)"><span class="inbox-inbox-pl-pds" style="box-sizing:border-box">&quot;</span>Column 1: <span class="inbox-inbox-pl-pse" style="box-sizing:border-box">${</span><span class="inbox-inbox-pl-s1" style="box-sizing:border-box;color:rgb(51,51,51)">n.<span class="inbox-inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">format</span>(<span class="inbox-inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">radix</span>:<span class="inbox-inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">16</span>, <span class="inbox-inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">width</span>:<span class="inbox-inbox-pl-c1" style="box-sizing:border-box;color:rgb(0,134,179)">8</span>)}</span> *** <span class="inbox-inbox-pl-pse" style="box-sizing:border-box">$</span><span class="inbox-inbox-pl-s1" style="box-sizing:border-box;color:rgb(51,51,51)">message</span><span class="inbox-inbox-pl-pds" style="box-sizing:border-box">&quot;</span></span></pre></div><div><br></div><div>To my eye this makes the syntax more readable and avoids the &quot;double ))&quot; where the end of the interpolation meets the end of the function call. It also gives the language the scripting feel common to bash, sh, zsh and similar shells, and the same syntax has been adopted for ES6 template literals[1]. 
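<div><br></div><div>For comparison, here is a minimal runnable sketch of how the current \( ) form reads today. The helper hexColumn(_:width:) is a hypothetical stand-in for the manifesto&#39;s proposed n.format(radix:width:), which is not an existing standard-library API; the padding behavior is my own assumption.</div>
<pre>
```swift
// Hypothetical helper standing in for the proposed n.format(radix:width:).
// It right-aligns the hexadecimal digits of n in a field of the given width.
func hexColumn(_ n: Int, width: Int) -> String {
    let digits = String(n, radix: 16)
    let padding = String(repeating: " ", count: max(0, width - digits.count))
    return padding + digits
}

let n = 255
let message = "done"

// Current syntax: the ")" that ends the call and the ")" that ends the
// interpolation sit side by side.
let line = "Column 1: \(hexColumn(n, width: 8)) *** \(message)"
print(line)
```
</pre>
<div>The two adjacent closing parentheses after width: 8 are the &quot;double ))&quot; I mean.</div>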
</div><div><br></div><div>[1] <a href="https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Template_literals">https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Template_literals</a></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, Jan 20, 2017 at 9:19 AM Rien via swift-evolution &lt;<a href="mailto:swift-evolution@swift.org">swift-evolution@swift.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Wow, I fully support the intention (becoming better than Perl) but I cannot comment on the contents without studying it for a couple of days…<br class="gmail_msg">

<br class="gmail_msg">

Regards,<br class="gmail_msg">

Rien<br class="gmail_msg">

<br class="gmail_msg">


&gt; On 20 Jan 2017, at 03:56, Ben Cohen via swift-evolution &lt;<a href="mailto:swift-evolution@swift.org" class="gmail_msg" target="_blank">swift-evolution@swift.org</a>&gt; wrote:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Hi all,<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Below is our take on a design manifesto for Strings in Swift 4 and beyond.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Probably best read in rendered markdown on GitHub:<br class="gmail_msg">

&gt; <a href="https://github.com/apple/swift/blob/master/docs/StringManifesto.md" rel="noreferrer" class="gmail_msg" target="_blank">https://github.com/apple/swift/blob/master/docs/StringManifesto.md</a><br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; We’re eager to hear everyone’s thoughts.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Regards,<br class="gmail_msg">

&gt; Ben and Dave<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; # String Processing For Swift 4<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; * Authors: [Dave Abrahams](<a href="https://github.com/dabrahams" rel="noreferrer" class="gmail_msg" target="_blank">https://github.com/dabrahams</a>), [Ben Cohen](<a href="https://github.com/airspeedswift" rel="noreferrer" class="gmail_msg" target="_blank">https://github.com/airspeedswift</a>)<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The goal of re-evaluating Strings for Swift 4 has been fairly ill-defined thus<br class="gmail_msg">

&gt; far, with just this short blurb in the<br class="gmail_msg">

&gt; [list of goals](<a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025676.html" rel="noreferrer" class="gmail_msg" target="_blank">https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025676.html</a>):<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;&gt; **String re-evaluation**: String is one of the most important fundamental<br class="gmail_msg">

&gt;&gt; types in the language.  The standard library leads have numerous ideas of how<br class="gmail_msg">

&gt;&gt; to improve the programming model for it, without jeopardizing the goals of<br class="gmail_msg">

&gt;&gt; providing a unicode-correct-by-default model.  Our goal is to be better at<br class="gmail_msg">

&gt;&gt; string processing than Perl!<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; For Swift 4 and beyond we want to improve three dimensions of text processing:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  1. Ergonomics<br class="gmail_msg">

&gt;  2. Correctness<br class="gmail_msg">

&gt;  3. Performance<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This document is meant to both provide a sense of the long-term vision<br class="gmail_msg">

&gt; (including undecided issues and possible approaches), and to define the scope of<br class="gmail_msg">

&gt; work that could be done in the Swift 4 timeframe.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ## General Principles<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Ergonomics<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; It&#39;s worth noting that ergonomics and correctness are mutually-reinforcing.  An<br class="gmail_msg">

&gt; API that is easy to use—but incorrectly—cannot be considered an ergonomic<br class="gmail_msg">

&gt; success.  Conversely, an API that&#39;s simply hard to use is also hard to use<br class="gmail_msg">

&gt; correctly.  Achieving optimal performance without compromising ergonomics or<br class="gmail_msg">

&gt; correctness is a greater challenge.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Consistency with the Swift language and idioms is also important for<br class="gmail_msg">

&gt; ergonomics. There are several places both in the standard library and in the<br class="gmail_msg">

&gt; foundation additions to `String` where patterns and practices found elsewhere<br class="gmail_msg">

&gt; could be applied to improve usability and familiarity.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### API Surface Area<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Primary data types such as `String` should have APIs that are easily understood<br class="gmail_msg">

&gt; given a signature and a one-line summary.  Today, `String` fails that test.  As<br class="gmail_msg">

&gt; you can see, the Standard Library and Foundation both contribute significantly to<br class="gmail_msg">

&gt; its overall complexity.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; **Method Arity** | **Standard Library** | **Foundation**<br class="gmail_msg">

&gt; ---|:---:|:---:<br class="gmail_msg">

&gt; 0: `ƒ()` | 5 | 7<br class="gmail_msg">

&gt; 1: `ƒ(:)` | 19 | 48<br class="gmail_msg">

&gt; 2: `ƒ(::)` | 13 | 19<br class="gmail_msg">

&gt; 3: `ƒ(:::)` | 5 | 11<br class="gmail_msg">

&gt; 4: `ƒ(::::)` | 1 | 7<br class="gmail_msg">

&gt; 5: `ƒ(:::::)` | - | 2<br class="gmail_msg">

&gt; 6: `ƒ(::::::)` | - | 1<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; **API Kind** | **Standard Library** | **Foundation**<br class="gmail_msg">

&gt; ---|:---:|:---:<br class="gmail_msg">

&gt; `init` | 41 | 18<br class="gmail_msg">

&gt; `func` | 42 | 55<br class="gmail_msg">

&gt; `subscript` | 9 | 0<br class="gmail_msg">

&gt; `var` | 26 | 14<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; **Total: 205 APIs**<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; By contrast, `Int` has 80 APIs, none with more than two parameters.[0] String processing is complex enough; users shouldn&#39;t have<br class="gmail_msg">

&gt; to press through physical API sprawl just to get started.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Many of the choices detailed below contribute to solving this problem,<br class="gmail_msg">

&gt; including:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Restoring `Collection` conformance and dropping the `.characters` view.<br class="gmail_msg">

&gt;  * Providing a more general, composable slicing syntax.<br class="gmail_msg">

&gt;  * Altering `Comparable` so that parameterized<br class="gmail_msg">

&gt;    (e.g. case-insensitive) comparison fits smoothly into the basic syntax.<br class="gmail_msg">

&gt;  * Clearly separating language-dependent operations on text produced<br class="gmail_msg">

&gt;    by and for humans from language-independent<br class="gmail_msg">

&gt;    operations on text produced by and for machine processing.<br class="gmail_msg">

&gt;  * Relocating APIs that fall outside the domain of basic string processing and<br class="gmail_msg">

&gt;    discouraging the proliferation of ad-hoc extensions.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Batteries Included<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; While `String` is available to all programs out-of-the-box, crucial APIs for<br class="gmail_msg">

&gt; basic string processing tasks are still inaccessible until `Foundation` is<br class="gmail_msg">

&gt; imported.  While it makes sense that `Foundation` is needed for domain-specific<br class="gmail_msg">

&gt; jobs such as<br class="gmail_msg">

&gt; [linguistic tagging](<a href="https://developer.apple.com/reference/foundation/nslinguistictagger" rel="noreferrer" class="gmail_msg" target="_blank">https://developer.apple.com/reference/foundation/nslinguistictagger</a>),<br class="gmail_msg">

&gt; one should not need to import anything to, for example, do case-insensitive<br class="gmail_msg">

&gt; comparison.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Unicode Compliance and Platform Support<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The Unicode standard provides a crucial objective reference point for what<br class="gmail_msg">

&gt; constitutes correct behavior in an extremely complex domain, so<br class="gmail_msg">

&gt; Unicode-correctness is, and will remain, a fundamental design principle behind<br class="gmail_msg">

&gt; Swift&#39;s `String`.  That said, the Unicode standard is an evolving document, so<br class="gmail_msg">

&gt; this objective reference-point is not fixed.[1] While<br class="gmail_msg">

&gt; many of the most important operations—e.g. string hashing, equality, and<br class="gmail_msg">

&gt; non-localized comparison—will be stable, the semantics<br class="gmail_msg">

&gt; of others, such as grapheme breaking and localized comparison and case<br class="gmail_msg">

&gt; conversion, are expected to change as platforms are updated, so programs should<br class="gmail_msg">

&gt; be written so their correctness does not depend on precise stability of these<br class="gmail_msg">

&gt; semantics across OS versions or platforms.  Although it may be possible to<br class="gmail_msg">

&gt; imagine static and/or dynamic analysis tools that will help users find such<br class="gmail_msg">

&gt; errors, the only sure way to deal with this fact of life is to educate users.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ## Design Points<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Internationalization<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; There is strong evidence that developers cannot determine how to use<br class="gmail_msg">

&gt; internationalization APIs correctly.  Although documentation could and should be<br class="gmail_msg">

&gt; improved, the sheer size, complexity, and diversity of these APIs is a major<br class="gmail_msg">

&gt; contributor to the problem, causing novices to tune out, and more experienced<br class="gmail_msg">

&gt; programmers to make avoidable mistakes.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The first step in improving this situation is to regularize all localized<br class="gmail_msg">

&gt; operations as invocations of normal string operations with extra<br class="gmail_msg">

&gt; parameters. Among other things, this means:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; 1. Doing away with `localizedXXX` methods<br class="gmail_msg">

&gt; 2. Providing a terse way to name the current locale as a parameter<br class="gmail_msg">

&gt; 3. Automatically adjusting defaults for options such<br class="gmail_msg">

&gt;   as case sensitivity based on whether the operation is localized.<br class="gmail_msg">

&gt; 4. Removing correctness traps like `localizedCaseInsensitiveCompare` (see<br class="gmail_msg">

&gt;    guidance in the<br class="gmail_msg">

&gt;    [Internationalization and Localization Guide](<a href="https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/BPInternational/InternationalizingYourCode/InternationalizingYourCode.html" rel="noreferrer" class="gmail_msg" target="_blank">https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/BPInternational/InternationalizingYourCode/InternationalizingYourCode.html</a>).<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Along with appropriate documentation updates, these changes will make localized<br class="gmail_msg">

&gt; operations more teachable, comprehensible, and approachable, thereby lowering a<br class="gmail_msg">

&gt; barrier that currently leads some developers to ignore localization issues<br class="gmail_msg">

&gt; altogether.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ####  The Default Behavior of `String`<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Although this isn&#39;t well-known, the most accessible forms of many operations on<br class="gmail_msg">

&gt; Swift `String` (and `NSString`) are really only appropriate for text that is<br class="gmail_msg">

&gt; intended to be processed for, and consumed by, machines.  The semantics of the<br class="gmail_msg">

&gt; operations with the simplest spellings are always non-localized and<br class="gmail_msg">

&gt; language-agnostic.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Two major factors play into this design choice:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; 1. Machine processing of text is important, so we should have first-class,<br class="gmail_msg">

&gt;   accessible functions appropriate to that use case.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; 2. The most general localized operations require a locale parameter not required<br class="gmail_msg">

&gt;   by their un-localized counterparts.  This naturally skews complexity towards<br class="gmail_msg">

&gt;   localized operations.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Reaffirming that `String`&#39;s simplest APIs have<br class="gmail_msg">

&gt; language-independent/machine-processed semantics has the benefit of clarifying<br class="gmail_msg">

&gt; the proper default behavior of operations such as comparison, and allows us to<br class="gmail_msg">

&gt; make [significant optimizations](#collation-semantics) that were previously<br class="gmail_msg">

&gt; thought to conflict with Unicode.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Future Directions<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; One of the most common internationalization errors is the unintentional<br class="gmail_msg">

&gt; presentation to users of text that has not been localized, but regularizing APIs<br class="gmail_msg">

&gt; and improving documentation can go only so far in preventing this error.<br class="gmail_msg">

&gt; Combined with the fact that `String` operations are non-localized by default,<br class="gmail_msg">

&gt; the environment for processing human-readable text may still be somewhat<br class="gmail_msg">

&gt; error-prone in Swift 4.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; For an audience of mostly non-experts, it is especially important that naïve<br class="gmail_msg">

&gt; code is very likely to be correct if it compiles, and that more sophisticated<br class="gmail_msg">

&gt; issues can be revealed progressively.  For this reason, we intend to<br class="gmail_msg">

&gt; specifically and separately target localization and internationalization<br class="gmail_msg">

&gt; problems in the Swift 5 timeframe.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Operations With Options<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; There are three categories of string operation that commonly need to be<br class="gmail_msg">

&gt; tuned in various dimensions:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; **Operation**|**Applicable Options**<br class="gmail_msg">

&gt; ---|---<br class="gmail_msg">

&gt; sort ordering | locale, case/diacritic/width-insensitivity<br class="gmail_msg">

&gt; case conversion | locale<br class="gmail_msg">

&gt; pattern matching | locale, case/diacritic/width-insensitivity<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The defaults for case-, diacritic-, and width-insensitivity are different for<br class="gmail_msg">

&gt; localized operations than for non-localized operations, so for example a<br class="gmail_msg">

&gt; localized sort should be case-insensitive by default, and a non-localized sort<br class="gmail_msg">

&gt; should be case-sensitive by default.  We propose a standard “language” of<br class="gmail_msg">

&gt; defaulted parameters to be used for these purposes, with usage roughly like this:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt;  x.compared(to: y, case: .sensitive, in: swissGerman)<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  x.lowercased(in: .currentLocale)<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  x.allMatches(<br class="gmail_msg">

&gt;    somePattern, case: .insensitive, diacritic: .insensitive)<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This usage might be supported by code like this:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; enum StringSensitivity {<br class="gmail_msg">

&gt;  case sensitive<br class="gmail_msg">

&gt;  case insensitive<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; extension Locale {<br class="gmail_msg">

&gt;  static var currentLocale: Locale { ... }<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; extension Unicode {<br class="gmail_msg">

&gt;  // An example of the option language in declaration context,<br class="gmail_msg">

&gt;  // with nil defaults indicating unspecified, so defaults can be<br class="gmail_msg">

&gt;  // driven by the presence/absence of a specific Locale<br class="gmail_msg">

&gt;  func frobnicated(<br class="gmail_msg">

&gt;    case caseSensitivity: StringSensitivity? = nil,<br class="gmail_msg">

&gt;    diacritic diacriticSensitivity: StringSensitivity? = nil,<br class="gmail_msg">

&gt;    width widthSensitivity: StringSensitivity? = nil,<br class="gmail_msg">

&gt;    in locale: Locale? = nil<br class="gmail_msg">

&gt;  ) -&gt; Self { ... }<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">
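&gt;<br class="gmail_msg">
&gt; As a rough illustration only (not part of the proposal), the rule that `nil` means<br class="gmail_msg">
&gt; "unspecified" and that defaults follow the presence or absence of a locale might be<br class="gmail_msg">
&gt; resolved as sketched below. `SketchLocale` and `resolved(_:in:)` are hypothetical names<br class="gmail_msg">
&gt; introduced for this example; `SketchLocale` merely stands in for `Locale` so that the<br class="gmail_msg">
&gt; sketch needs no imports.<br class="gmail_msg">
&gt;<br class="gmail_msg">
<pre class="gmail_msg">
```swift
enum StringSensitivity { case sensitive, insensitive }

// Hypothetical stand-in for Foundation's Locale.
struct SketchLocale { let identifier: String }

// Resolve an unspecified (nil) option: localized operations default to
// insensitive, non-localized operations default to sensitive.
// An explicitly supplied option always wins.
func resolved(
    _ option: StringSensitivity?,
    in locale: SketchLocale?
) -> StringSensitivity {
    if let option = option { return option }
    return locale == nil ? .sensitive : .insensitive
}

let swissGerman = SketchLocale(identifier: "de_CH")
print(resolved(nil, in: nil))                 // sensitive
print(resolved(nil, in: swissGerman))         // insensitive
print(resolved(.sensitive, in: swissGerman))  // sensitive
```
</pre>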

&gt;<br class="gmail_msg">

&gt; ### Comparing and Hashing Strings<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Collation Semantics<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; What Unicode says about collation—which is used in `&lt;`, `==`, and hashing—turns<br class="gmail_msg">

&gt; out to be quite interesting, once you pick it apart.  The full Unicode Collation<br class="gmail_msg">

&gt; Algorithm (UCA) works like this:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; 1. Fully normalize both strings<br class="gmail_msg">

&gt; 2. Convert each string to a sequence of numeric triples to form a collation key<br class="gmail_msg">

&gt; 3. “Flatten” the key by concatenating the sequence of first elements to the<br class="gmail_msg">

&gt;   sequence of second elements to the sequence of third elements<br class="gmail_msg">

&gt; 4. Lexicographically compare the flattened keys<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; While step 1 can usually<br class="gmail_msg">

&gt; be [done quickly](<a href="http://unicode.org/reports/tr15/#Description_Norm" rel="noreferrer" class="gmail_msg" target="_blank">http://unicode.org/reports/tr15/#Description_Norm</a>) and<br class="gmail_msg">

&gt; incrementally, step 2 uses a collation table that maps matching *sequences* of<br class="gmail_msg">

&gt; unicode scalars in the normalized string to *sequences* of triples, which get<br class="gmail_msg">

&gt; accumulated into a collation key.  Predictably, this is where the real costs<br class="gmail_msg">

&gt; lie.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; *However*, there are some bright spots to this story.  First, as it turns out,<br class="gmail_msg">

&gt; string sorting (localized or not) should be done down to what&#39;s called<br class="gmail_msg">

&gt; the<br class="gmail_msg">

&gt; [“identical” level](<a href="http://unicode.org/reports/tr10/#Multi_Level_Comparison" rel="noreferrer" class="gmail_msg" target="_blank">http://unicode.org/reports/tr10/#Multi_Level_Comparison</a>),<br class="gmail_msg">

&gt; which adds a step 3a: append the string&#39;s normalized form to the flattened<br class="gmail_msg">

&gt; collation key.  At first blush this just adds work, but consider what it does<br class="gmail_msg">

&gt; for equality: two strings that normalize the same, naturally, will collate the<br class="gmail_msg">

&gt; same.  But also, *strings that normalize differently will always collate<br class="gmail_msg">

&gt; differently*.  In other words, for equality, it is sufficient to compare the<br class="gmail_msg">

&gt; strings&#39; normalized forms and see if they are the same.  We can therefore<br class="gmail_msg">

&gt; entirely skip the expensive part of collation for equality comparison.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Next, naturally, anything that applies to equality also applies to hashing: it<br class="gmail_msg">

&gt; is sufficient to hash the string&#39;s normalized form, bypassing collation keys.<br class="gmail_msg">

&gt; This should provide significant speedups over the current implementation.<br class="gmail_msg">

&gt; Perhaps more importantly, since comparison down to the “identical” level applies<br class="gmail_msg">

&gt; even to localized strings, it means that hashing and equality can be implemented<br class="gmail_msg">

&gt; exactly the same way for localized and non-localized text, and hash tables with<br class="gmail_msg">

&gt; localized keys will remain valid across current-locale changes.<br class="gmail_msg">
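&gt;<br class="gmail_msg">
&gt; To make the equality and hashing argument concrete, the short sketch below shows two<br class="gmail_msg">
&gt; canonically equivalent spellings of "é" comparing equal and hashing identically. It<br class="gmail_msg">
&gt; demonstrates only the observable behavior of `String`, not the implementation strategy<br class="gmail_msg">
&gt; described above; the variable names are illustrative.<br class="gmail_msg">
&gt;<br class="gmail_msg">
<pre class="gmail_msg">
```swift
let precomposed = "\u{E9}"     // "é" as one scalar
let decomposed = "e\u{301}"    // "e" followed by COMBINING ACUTE ACCENT

// The underlying scalar sequences differ...
print(precomposed.unicodeScalars.count)  // 1
print(decomposed.unicodeScalars.count)   // 2

// ...but the strings are canonically equivalent, so they compare equal
// and must therefore hash the same.
print(precomposed == decomposed)                      // true
print(precomposed.hashValue == decomposed.hashValue)  // true

// A dictionary keyed by one spelling can be queried with the other.
let table = [precomposed: 1]
print(table[decomposed] ?? 0)  // 1
```
</pre>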

&gt;<br class="gmail_msg">

&gt; Finally, once it is agreed that the *default* role for `String` is to handle<br class="gmail_msg">

&gt; machine-generated and machine-readable text, the default ordering of `String`s<br class="gmail_msg">

&gt; need no longer use the UCA at all.  It is sufficient to order them in any way<br class="gmail_msg">

&gt; that&#39;s consistent with equality, so `String` ordering can simply be a<br class="gmail_msg">

&gt; lexicographical comparison of normalized forms,[4]<br class="gmail_msg">

&gt; (which is equivalent to lexicographically comparing the sequences of grapheme<br class="gmail_msg">

&gt; clusters), again bypassing step 2 and offering another speedup.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This leaves us executing the full UCA *only* for localized sorting, and ICU&#39;s<br class="gmail_msg">

&gt; implementation has apparently been very well optimized.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Following this scheme everywhere would also allow us to make sorting behavior<br class="gmail_msg">

&gt; consistent across platforms.  Currently, we sort `String` according to the UCA,<br class="gmail_msg">

&gt; except that—*only on Apple platforms*—pairs of ASCII characters are ordered by<br class="gmail_msg">

&gt; unicode scalar value.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Syntax<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Because the current `Comparable` protocol expresses all comparisons with binary<br class="gmail_msg">

&gt; operators, string comparisons—which may require<br class="gmail_msg">

&gt; additional [options](#operations-with-options)—do not fit smoothly into the<br class="gmail_msg">

&gt; existing syntax.  At the same time, we&#39;d like to solve other problems with<br class="gmail_msg">

&gt; comparison, as outlined<br class="gmail_msg">

&gt; in<br class="gmail_msg">

&gt; [this proposal](<a href="https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e" rel="noreferrer" class="gmail_msg" target="_blank">https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e</a>)<br class="gmail_msg">

&gt; (implemented by changes at the head<br class="gmail_msg">

&gt; of<br class="gmail_msg">

&gt; [this branch](<a href="https://github.com/CodaFi/swift/commits/space-the-final-frontier" rel="noreferrer" class="gmail_msg" target="_blank">https://github.com/CodaFi/swift/commits/space-the-final-frontier</a>)).<br class="gmail_msg">

&gt; We should adopt a modification of that proposal that uses a method rather than<br class="gmail_msg">

&gt; an operator `&lt;=&gt;`:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; enum SortOrder { case before, same, after }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; protocol Comparable : Equatable {<br class="gmail_msg">

&gt;  func compared(to: Self) -&gt; SortOrder<br class="gmail_msg">

&gt; ...<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This change will give us a syntactic platform on which to implement methods with<br class="gmail_msg">

&gt; additional, defaulted arguments, thereby unifying and regularizing comparison<br class="gmail_msg">

&gt; across the library.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; extension String {<br class="gmail_msg">

&gt; func compared(to: Self) -&gt; SortOrder<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">
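&gt;<br class="gmail_msg">
&gt; For readers who want to experiment, here is a minimal sketch of a method-based three-way<br class="gmail_msg">
&gt; comparison. It assumes the opposite layering from the proposal: `compared(to:)` is<br class="gmail_msg">
&gt; derived here from the existing `Comparable` operators in a hypothetical extension, whereas<br class="gmail_msg">
&gt; the proposal would make it the protocol requirement.<br class="gmail_msg">
&gt;<br class="gmail_msg">
<pre class="gmail_msg">
```swift
enum SortOrder { case before, same, after }

// Hypothetical extension: builds a three-way result from the operators
// that Comparable and Equatable already provide.
extension Comparable {
    func compared(to other: Self) -> SortOrder {
        if self == other { return .same }
        return other > self ? .before : .after
    }
}

let word = "apple"
let three = 3
let big = 2.5

print(word.compared(to: "banana"))  // before
print(three.compared(to: 3))        // same
print(big.compared(to: 1.0))        // after
```
</pre>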

&gt;<br class="gmail_msg">

&gt; **Note:** `SortOrder` should bridge to `NSComparisonResult`.  It&#39;s also possible<br class="gmail_msg">

&gt; that the standard library simply adopts Foundation&#39;s `ComparisonResult` as is,<br class="gmail_msg">

&gt; but we believe the community should at least consider alternate naming before<br class="gmail_msg">

&gt; that happens.  There will be an opportunity to discuss the choices in detail<br class="gmail_msg">

&gt; when the modified<br class="gmail_msg">

&gt; [Comparison Proposal](<a href="https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e" rel="noreferrer" class="gmail_msg" target="_blank">https://gist.github.com/CodaFi/f0347bd37f1c407bf7ea0c429ead380e</a>) comes<br class="gmail_msg">

&gt; up for review.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### `String` should be a `Collection` of `Character`s Again<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; In Swift 2.0, `String`&#39;s `Collection` conformance was dropped, because we<br class="gmail_msg">

&gt; convinced ourselves that its semantics differed from those of `Collection` too<br class="gmail_msg">

&gt; significantly.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; It was always well understood that if strings were treated as sequences of<br class="gmail_msg">

&gt; `UnicodeScalar`s, algorithms such as `lexicographicalCompare`, `elementsEqual`,<br class="gmail_msg">

&gt; and `reversed` would produce nonsense results. Thus, in Swift 1.0, `String` was<br class="gmail_msg">

&gt; a collection of `Character` (extended grapheme clusters). During 2.0<br class="gmail_msg">

&gt; development, though, we realized that correct string concatenation could<br class="gmail_msg">

&gt; occasionally merge distinct grapheme clusters at the start and end of combined<br class="gmail_msg">

&gt; strings.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This quirk aside, every aspect of strings-as-collections-of-graphemes appears to<br class="gmail_msg">

&gt; comport perfectly with Unicode. We think the concatenation problem is tolerable,<br class="gmail_msg">

&gt; because the cases where it occurs all represent partially-formed constructs. The<br class="gmail_msg">

&gt; largest class—isolated combining characters such as ◌́ (U+0301 COMBINING ACUTE<br class="gmail_msg">

&gt; ACCENT)—are explicitly called out in the Unicode standard as<br class="gmail_msg">

&gt; “[degenerate](<a href="http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries" rel="noreferrer" class="gmail_msg" target="_blank">http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries</a>)” or<br class="gmail_msg">

&gt; “[defective](<a href="http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf" rel="noreferrer" class="gmail_msg" target="_blank">http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf</a>)”. The other<br class="gmail_msg">

&gt; cases—such as a string ending in a zero-width joiner or half of a regional<br class="gmail_msg">

&gt; indicator—appear to be equally transient and unlikely outside of a text editor.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Admitting these cases encourages exploration of grapheme composition and is<br class="gmail_msg">

&gt; consistent with what appears to be an overall Unicode philosophy that “no<br class="gmail_msg">

&gt; special provisions are made to get marginally better behavior for… cases that<br class="gmail_msg">

&gt; never occur in practice.”[2] Furthermore, it seems<br class="gmail_msg">

&gt; unlikely to disturb the semantics of any plausible algorithms. We can handle<br class="gmail_msg">

&gt; these cases by documenting them, explicitly stating that the elements of a<br class="gmail_msg">

&gt; `String` are an emergent property based on Unicode rules.<br class="gmail_msg">
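&gt;<br class="gmail_msg">
&gt; The concatenation quirk discussed above can be observed directly. The following is a minimal sketch, assuming a toolchain in which `String` is already a `Collection` of `Character` (Swift 4 or later); the constant names are illustrative only:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; let base = &quot;e&quot;<br class="gmail_msg">
&gt; let accent = &quot;\u{301}&quot;        // an isolated COMBINING ACUTE ACCENT<br class="gmail_msg">
&gt; let combined = base + accent  // the accent attaches to the preceding &quot;e&quot;<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; // Each operand is one Character on its own, and so is the concatenation.<br class="gmail_msg">
&gt; print(base.count, accent.count, combined.count)  // prints &quot;1 1 1&quot;<br class="gmail_msg">
&gt; ```<br class="gmail_msg">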

&gt;<br class="gmail_msg">

&gt; The benefits of restoring `Collection` conformance are substantial:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Collection-like operations encourage experimentation with strings to<br class="gmail_msg">

&gt;    investigate and understand their behavior. This is useful for teaching new<br class="gmail_msg">

&gt;    programmers, but also good for experienced programmers who want to<br class="gmail_msg">

&gt;    understand more about strings and Unicode.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Extended grapheme clusters form a natural element boundary for Unicode<br class="gmail_msg">

&gt;    strings.  For example, searching and matching operations will always produce<br class="gmail_msg">

&gt;    results that line up on grapheme cluster boundaries.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Character-by-character processing is a legitimate thing to do in many real<br class="gmail_msg">

&gt;    use-cases, including parsing, pattern matching, and language-specific<br class="gmail_msg">

&gt;    transformations such as transliteration.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * `Collection` conformance makes a wide variety of powerful operations<br class="gmail_msg">

&gt;    available that are appropriate to `String`&#39;s default role as the vehicle for<br class="gmail_msg">

&gt;    machine processed text.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;    The methods `String` would inherit from `Collection`, where similar to<br class="gmail_msg">

&gt;    higher-level string algorithms, have the right semantics.  For example,<br class="gmail_msg">

&gt;    grapheme-wise `lexicographicalCompare`, `elementsEqual`, and application of<br class="gmail_msg">

&gt;    `flatMap` with case-conversion, produce the same results one would expect<br class="gmail_msg">

&gt;    from whole-string ordering comparison, equality comparison, and<br class="gmail_msg">

&gt;    case-conversion, respectively.  `reversed` operates correctly on graphemes,<br class="gmail_msg">

&gt;    keeping diacritics moored to their base characters and leaving emoji intact.<br class="gmail_msg">

&gt;    Other methods such as `indexOf` and `contains` make obvious sense. A few<br class="gmail_msg">

&gt;    `Collection` methods, like `min` and `max`, may not be particularly useful<br class="gmail_msg">

&gt;    on `String`, but we don&#39;t consider that to be a problem worth solving, in<br class="gmail_msg">

&gt;    the same way that we wouldn&#39;t try to suppress `min` and `max` on a<br class="gmail_msg">

&gt;    `Set([UInt8])` that was used to store IP addresses.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Many of the higher-level operations that we want to provide for `String`s,<br class="gmail_msg">

&gt;    such as parsing and pattern matching, should apply to any `Collection`, and<br class="gmail_msg">

&gt;    many of the benefits we want for `Collection`s, such<br class="gmail_msg">

&gt;    as unified slicing, should accrue<br class="gmail_msg">

&gt;    equally to `String`.  Making `String` part of the same protocol hierarchy<br class="gmail_msg">

&gt;    allows us to write these operations once and not worry about keeping the<br class="gmail_msg">

&gt;    benefits in sync.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Slicing strings into substrings is a crucial part of the vocabulary of<br class="gmail_msg">

&gt;    string processing, and all other sliceable things are `Collection`s.<br class="gmail_msg">

&gt;    Because of its collection-like behavior, users naturally think of `String`<br class="gmail_msg">

&gt;    in collection terms, but run into frustrating limitations where it fails to<br class="gmail_msg">

&gt;    conform and are left to wonder where all the differences lie.  Many simply<br class="gmail_msg">

&gt;    “correct” this limitation by declaring a trivial conformance:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;    ```swift<br class="gmail_msg">

&gt;      extension String : BidirectionalCollection {}<br class="gmail_msg">

&gt;    ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;    Even if we removed indexing-by-element from `String`, users could still do<br class="gmail_msg">

&gt;    this:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;    ```swift<br class="gmail_msg">

&gt;      extension String : BidirectionalCollection {<br class="gmail_msg">

&gt;        subscript(i: Index) -&gt; Character { return characters[i] }<br class="gmail_msg">

&gt;      }<br class="gmail_msg">

&gt;    ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;    It would be much better to legitimize the conformance to `Collection` and<br class="gmail_msg">

&gt;    simply document the oddity of any concatenation corner-cases, than to deny<br class="gmail_msg">

&gt;    users the benefits on the grounds that a few cases are confusing.<br class="gmail_msg">
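&gt;<br class="gmail_msg">
&gt; As a sketch of the collection-style operations listed above, assuming `String` conforms to `BidirectionalCollection` with `Character` elements (as it does in Swift 4 and later), the grapheme-wise behavior looks like this. The sample strings are illustrative only:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; let word = &quot;cafe\u{301}&quot;               // &quot;café&quot; spelled with a combining accent<br class="gmail_msg">
&gt; let precomposed = &quot;caf\u{E9}&quot;          // &quot;café&quot; spelled with a precomposed é<br class="gmail_msg">
&gt; let flipped = String(word.reversed())  // reversal works on whole graphemes<br class="gmail_msg">
&gt; let f: Character = &quot;f&quot;<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; print(word.count)                      // 4: the accent does not count separately<br class="gmail_msg">
&gt; print(flipped == &quot;e\u{301}fac&quot;)        // true: the accent stays on its base<br class="gmail_msg">
&gt; print(word.elementsEqual(precomposed)) // true: elements compare as graphemes<br class="gmail_msg">
&gt; print(word.contains(f))                // true<br class="gmail_msg">
&gt; ```<br class="gmail_msg">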

&gt;<br class="gmail_msg">

&gt; Note that the fact that `String` is a collection of graphemes does *not* mean<br class="gmail_msg">

&gt; that string operations will necessarily have to do grapheme boundary<br class="gmail_msg">

&gt; recognition.  See the Unicode protocol section for details.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### `Character` and `CharacterSet`<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; `Character`, which represents a<br class="gmail_msg">

&gt; Unicode<br class="gmail_msg">

&gt; [extended grapheme cluster](<a href="http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries" rel="noreferrer" class="gmail_msg" target="_blank">http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries</a>),<br class="gmail_msg">

&gt; is a bit of a black box, requiring conversion to `String` in order to<br class="gmail_msg">

&gt; do any introspection, including interoperation with ASCII.  To fix this, we should:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; - Add a `unicodeScalars` view much like `String`&#39;s, so that the sub-structure<br class="gmail_msg">

&gt;   of grapheme clusters is discoverable.<br class="gmail_msg">

&gt; - Add a failable `init` from sequences of scalars (returning nil for sequences<br class="gmail_msg">

&gt;   that contain 0 or 2+ graphemes).<br class="gmail_msg">

&gt; - (Lower priority) expose some operations, such as `func uppercase() -&gt;<br class="gmail_msg">

&gt;   String`, `var isASCII: Bool`, and, to the extent they can be sensibly<br class="gmail_msg">

&gt;   generalized, queries of Unicode properties that should also be exposed on<br class="gmail_msg">

&gt;   `UnicodeScalar` such as `isAlphabetic` and `isGraphemeBase` .<br class="gmail_msg">
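&gt;<br class="gmail_msg">
&gt; For illustration, current Swift toolchains (Swift 5 or later is assumed here) expose a `unicodeScalars` view and an `isASCII` property on `Character`, which makes the kind of introspection described above possible. This sketch does not assume the failable initializer from scalars:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; let accented: Character = &quot;e\u{301}&quot;  // one grapheme built from two scalars<br class="gmail_msg">
&gt; let plain: Character = &quot;a&quot;<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; // Look inside the grapheme cluster without converting to String.<br class="gmail_msg">
&gt; let values = accented.unicodeScalars.map { $0.value }<br class="gmail_msg">
&gt; print(values == [0x65, 0x301])        // true<br class="gmail_msg">
&gt; print(accented.isASCII)               // false<br class="gmail_msg">
&gt; print(plain.isASCII)                  // true<br class="gmail_msg">
&gt; ```<br class="gmail_msg">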

&gt;<br class="gmail_msg">

&gt; Despite its name, `CharacterSet` currently operates on the Swift `UnicodeScalar`<br class="gmail_msg">

&gt; type. This means it is usable on `String`, but only by going through the Unicode<br class="gmail_msg">

&gt; scalar view. To deal with this clash in the short term, `CharacterSet` should be<br class="gmail_msg">

&gt; renamed to `UnicodeScalarSet`.  In the longer term, it may be appropriate to<br class="gmail_msg">

&gt; introduce a `CharacterSet` that provides similar functionality for extended<br class="gmail_msg">

&gt; grapheme clusters.[5]<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Unification of Slicing Operations<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Creating substrings is a basic part of String processing, but the slicing<br class="gmail_msg">

&gt; operations that we have in Swift are inconsistent in both their spelling and<br class="gmail_msg">

&gt; their naming:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Slices with two explicit endpoints are done with subscript, and support<br class="gmail_msg">

&gt;    in-place mutation:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;    ```swift<br class="gmail_msg">

&gt;        s[i..&lt;j].mutate()<br class="gmail_msg">

&gt;    ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Slicing from an index to the end, or from the start to an index, is done<br class="gmail_msg">

&gt;    with a method and does not support in-place mutation:<br class="gmail_msg">

&gt;    ```swift<br class="gmail_msg">

&gt;        s.prefix(upTo: i).readOnly()<br class="gmail_msg">

&gt;    ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Prefix and suffix operations should be migrated to be subscripting operations<br class="gmail_msg">

&gt; with one-sided ranges, i.e. `s.prefix(upTo: i)` should become `s[..&lt;i]`, as<br class="gmail_msg">

&gt; in<br class="gmail_msg">

&gt; [this proposal](<a href="https://github.com/apple/swift-evolution/blob/9cf2685293108ea3efcbebb7ee6a8618b83d4a90/proposals/0132-sequence-end-ops.md" rel="noreferrer" class="gmail_msg" target="_blank">https://github.com/apple/swift-evolution/blob/9cf2685293108ea3efcbebb7ee6a8618b83d4a90/proposals/0132-sequence-end-ops.md</a>).<br class="gmail_msg">

&gt; With generic subscripting in the language, that will allow us to collapse a wide<br class="gmail_msg">

&gt; variety of methods and subscript overloads into a single implementation, and<br class="gmail_msg">

&gt; give users an easy-to-use and composable way to describe subranges.<br class="gmail_msg">
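&gt;<br class="gmail_msg">
&gt; A short sketch of the subscript spelling, assuming a toolchain that already supports one-sided ranges and `firstIndex(of:)` (Swift 4.2 or later); the sample string is illustrative only:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; let s = &quot;key=value&quot;<br class="gmail_msg">
&gt; if let eq = s.firstIndex(of: &quot;=&quot;) {<br class="gmail_msg">
&gt;   let key = s[..&lt;eq]                    // replaces s.prefix(upTo: eq)<br class="gmail_msg">
&gt;   let value = s[s.index(after: eq)...]  // replaces s.suffix(from: ...)<br class="gmail_msg">
&gt;   print(key, value)                     // prints &quot;key value&quot;<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt; ```<br class="gmail_msg">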

&gt;<br class="gmail_msg">

&gt; Further extending this EDSL to integrate use-cases like `s.prefix(maxLength: 5)`<br class="gmail_msg">

&gt; is an ongoing research project that can be considered part of the potential<br class="gmail_msg">

&gt; long-term vision of text (and collection) processing.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Substrings<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; When implementing substring slicing, languages are faced with three options:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; 1. Make the substrings the same type as string, and share storage.<br class="gmail_msg">

&gt; 2. Make the substrings the same type as string, and copy storage when making the substring.<br class="gmail_msg">

&gt; 3. Make substrings a different type, with a storage copy on conversion to string.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; We think number 3 is the best choice. A walk-through of the tradeoffs follows.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Same type, shared storage<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; In Swift 3.0, slicing a `String` produces a new `String` that is a view into a<br class="gmail_msg">

&gt; subrange of the original `String`&#39;s storage. This is why `String` is 3 words in<br class="gmail_msg">

&gt; size (the start, length and buffer owner), unlike the similar `Array` type<br class="gmail_msg">

&gt; which is only one.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This is a simple model with big efficiency gains when chopping up strings into<br class="gmail_msg">

&gt; multiple smaller strings. But it does mean that a stored substring keeps the<br class="gmail_msg">

&gt; entire original string buffer alive even after it would normally have been<br class="gmail_msg">

&gt; released.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This arrangement has proven to be problematic in other programming languages,<br class="gmail_msg">

&gt; because applications sometimes extract small strings from large ones and keep<br class="gmail_msg">

&gt; those small strings long-term. That is considered a memory leak and was enough<br class="gmail_msg">

&gt; of a problem in Java that they changed from substrings sharing storage to<br class="gmail_msg">

&gt; making a copy in 1.7.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Same type, copied storage<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Copying of substrings is also the choice made in C#, and in the default<br class="gmail_msg">

&gt; `NSString` implementation. This approach avoids the memory leak issue, but has<br class="gmail_msg">

&gt; obvious performance overhead in performing the copies.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This in turn encourages trafficking in string/range pairs instead of in<br class="gmail_msg">

&gt; substrings, for performance reasons, leading to API challenges. For example:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; foo.compare(bar, range: start..&lt;end)<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Here, it is not clear whether `range` applies to `foo` or `bar`. This<br class="gmail_msg">

&gt; relationship is better expressed in Swift as a slicing operation:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; foo[start..&lt;end].compare(bar)<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Not only does this clarify to which string the range applies, it also brings<br class="gmail_msg">

&gt; this sub-range capability to any API that operates on `String` &quot;for free&quot;. So<br class="gmail_msg">

&gt; these other combinations also work equally well:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; // apply range on argument rather than target<br class="gmail_msg">

&gt; foo.compare(bar[start..&lt;end])<br class="gmail_msg">

&gt; // apply range on both<br class="gmail_msg">

&gt; foo[start..&lt;end].compare(bar[start1..&lt;end1])<br class="gmail_msg">

&gt; // compare two strings ignoring first character<br class="gmail_msg">

&gt; foo.dropFirst().compare(bar.dropFirst())<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; In all three cases, an explicit range argument need not appear on the `compare`<br class="gmail_msg">

&gt; method itself. The implementation of `compare` does not need to know anything<br class="gmail_msg">

&gt; about ranges. Methods need only take range arguments when that is an<br class="gmail_msg">

&gt; integral part of their purpose (for example, setting the start and end of a<br class="gmail_msg">

&gt; user&#39;s current selection in a text box).<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Different type, shared storage<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The desire to share underlying storage while preventing accidental memory leaks<br class="gmail_msg">

&gt; occurs with slices of `Array`. For this reason we have an `ArraySlice` type.<br class="gmail_msg">

&gt; The inconvenience of a separate type is mitigated by most operations used on<br class="gmail_msg">

&gt; `Array` from the standard library being generic over `Sequence` or `Collection`.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; We should apply the same approach for `String` by introducing a distinct<br class="gmail_msg">

&gt; `SubSequence` type, `Substring`. Similar advice given for `ArraySlice` would apply to `Substring`:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;&gt; Important: Long-term storage of `Substring` instances is discouraged. A<br class="gmail_msg">

&gt;&gt; substring holds a reference to the entire storage of a larger string, not<br class="gmail_msg">

&gt;&gt; just to the portion it presents, even after the original string&#39;s lifetime<br class="gmail_msg">

&gt;&gt; ends. Long-term storage of a `Substring` may therefore prolong the lifetime<br class="gmail_msg">

&gt;&gt; of large strings that are no longer otherwise accessible, which can appear<br class="gmail_msg">

&gt;&gt; to be memory leakage.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; When assigning a `Substring` to a longer-lived variable (usually a stored<br class="gmail_msg">

&gt; property) explicitly of type `String`, a type conversion will be performed, and<br class="gmail_msg">

&gt; at this point the substring buffer is copied and the original string&#39;s storage<br class="gmail_msg">

&gt; can be released.<br class="gmail_msg">
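&gt;<br class="gmail_msg">
&gt; A minimal sketch of that copy-on-conversion step follows. It assumes current Swift, where the conversion is spelled explicitly as `String(substring)` rather than performed implicitly as proposed here; the `Header` type is hypothetical:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; struct Header {<br class="gmail_msg">
&gt;   var name: String  // long-lived storage is declared as String<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; let line = &quot;Content-Type: text/plain&quot;<br class="gmail_msg">
&gt; if let colon = line.firstIndex(of: &quot;:&quot;) {<br class="gmail_msg">
&gt;   let slice = line[..&lt;colon]                // a Substring sharing storage with `line`<br class="gmail_msg">
&gt;   let header = Header(name: String(slice))  // the conversion copies just these characters<br class="gmail_msg">
&gt;   print(header.name)                        // prints &quot;Content-Type&quot;<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt; ```<br class="gmail_msg">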

&gt;<br class="gmail_msg">

&gt; A `String` that was not its own `Substring` could be one word—a single tagged<br class="gmail_msg">

&gt; pointer—without requiring additional allocations. `Substring`s would be a view<br class="gmail_msg">

&gt; onto a `String`, so are 3 words - pointer to owner, pointer to start, and a<br class="gmail_msg">

&gt; length. The small string optimization for `Substring` would take advantage of<br class="gmail_msg">

&gt; the larger size, probably with a less compressed encoding for speed.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The downside of having two types is the inconvenience of sometimes having a<br class="gmail_msg">

&gt; `Substring` when you need a `String`, and vice-versa. It is likely this would<br class="gmail_msg">

&gt; be a significantly bigger problem than with `Array` and `ArraySlice`, as<br class="gmail_msg">

&gt; slicing of `String` is such a common operation. It is especially relevant to<br class="gmail_msg">

&gt; existing code that assumes `String` is the currency type. To ease the pain of<br class="gmail_msg">

&gt; type mismatches, `Substring` should be a subtype of `String` in the same way<br class="gmail_msg">

&gt; that `Int` is a subtype of `Optional&lt;Int&gt;`. This would give users an implicit<br class="gmail_msg">

&gt; conversion from `Substring` to `String`, as well as the usual implicit<br class="gmail_msg">

&gt; conversions such as `[Substring]` to `[String]` that other subtype<br class="gmail_msg">

&gt; relationships receive.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; In most cases, type inference combined with the subtype relationship should<br class="gmail_msg">

&gt; make the type difference a non-issue and users will not care which type they<br class="gmail_msg">

&gt; are using. For flexibility and optimizability, most operations from the<br class="gmail_msg">

&gt; standard library will traffic in generic models of<br class="gmail_msg">

&gt; [`Unicode`](#the--code-unicode--code--protocol).<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ##### Guidance for API Designers<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; In this model, **if a user is unsure about which type to use, `String` is always<br class="gmail_msg">

&gt; a reasonable default**. A `Substring` passed where `String` is expected will be<br class="gmail_msg">

&gt; implicitly copied. When compared to the “same type, copied storage” model, we<br class="gmail_msg">

&gt; have effectively deferred the cost of copying from the point where a substring<br class="gmail_msg">

&gt; is created until it must be converted to `String` for use with an API.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; A user who needs to optimize away copies altogether should use this guideline:<br class="gmail_msg">

&gt; if for performance reasons you are tempted to add a `Range` argument to your<br class="gmail_msg">

&gt; method as well as a `String` to avoid unnecessary copies, you should instead<br class="gmail_msg">

&gt; use `Substring`.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ##### The “Empty Subscript”<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; To make it easy to call such an optimized API when you only have a `String` (or<br class="gmail_msg">

&gt; to call any API that takes a `Collection`&#39;s `SubSequence` when all you have is<br class="gmail_msg">

&gt; the `Collection`), we propose the following “empty subscript” operation,<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; extension Collection {<br class="gmail_msg">

&gt;  subscript() -&gt; SubSequence {<br class="gmail_msg">

&gt;    return self[startIndex..&lt;endIndex]<br class="gmail_msg">

&gt;  }<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; which allows the following usage:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; funcThatIsJustLooking(at: person.name[]) // pass person.name as Substring<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The `[]` syntax can be offered as a fixit when needed, similar to `&amp;` for an<br class="gmail_msg">

&gt; `inout` argument. While it doesn&#39;t help a user to convert `[String]` to<br class="gmail_msg">

&gt; `[Substring]`, the need for such conversions is extremely rare, and they can be done with<br class="gmail_msg">

&gt; a simple `map` (which could also be offered by a fixit):<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; takesAnArrayOfSubstring(arrayOfString.map { $0[] })<br class="gmail_msg">

&gt; ```<br class="gmail_msg">
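&gt;<br class="gmail_msg">
&gt; To make the guidance concrete, here is a sketch of an API that takes a `Substring` instead of a `String` plus a `Range`. The function `vowelCount(in:)` is hypothetical, and the unbounded-range subscript `s[...]` available in current Swift stands in for the proposed `[]`:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; // Hypothetical helper: callers can pass any slice without copying it first.<br class="gmail_msg">
&gt; func vowelCount(in text: Substring) -&gt; Int {<br class="gmail_msg">
&gt;   let vowels: Set&lt;Character&gt; = [&quot;a&quot;, &quot;e&quot;, &quot;i&quot;, &quot;o&quot;, &quot;u&quot;]<br class="gmail_msg">
&gt;   var count = 0<br class="gmail_msg">
&gt;   for ch in text where vowels.contains(ch) { count += 1 }<br class="gmail_msg">
&gt;   return count<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; let s = &quot;substring processing&quot;<br class="gmail_msg">
&gt; if let space = s.firstIndex(of: &quot; &quot;) {<br class="gmail_msg">
&gt;   print(vowelCount(in: s[..&lt;space]))  // 2: only the first word is examined<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt; print(vowelCount(in: s[...]))         // 5: the whole string, passed as a Substring<br class="gmail_msg">
&gt; ```<br class="gmail_msg">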

&gt;<br class="gmail_msg">

&gt; #### Other Options Considered<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; As we have seen, all three options above have downsides, but it&#39;s possible<br class="gmail_msg">

&gt; these downsides could be eliminated/mitigated by the compiler. We are proposing<br class="gmail_msg">

&gt; one such mitigation—implicit conversion—as part of the &quot;different type,<br class="gmail_msg">

&gt; shared storage&quot; option, to help avoid the cognitive load on developers of<br class="gmail_msg">

&gt; having to deal with a separate `Substring` type.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; To avoid the memory leak issues of a &quot;same type, shared storage&quot; substring<br class="gmail_msg">

&gt; option, we considered whether the compiler could perform an implicit copy of<br class="gmail_msg">

&gt; the underlying storage when it detects the string is being &quot;stored&quot; for long<br class="gmail_msg">

&gt; term usage, say when it is assigned to a stored property. The trouble with this<br class="gmail_msg">

&gt; approach is it is very difficult for the compiler to distinguish between<br class="gmail_msg">

&gt; long-term storage versus short-term in the case of abstractions that rely on<br class="gmail_msg">

&gt; stored properties. For example, should the storing of a substring inside an<br class="gmail_msg">

&gt; `Optional` be considered long-term? Or the storing of multiple substrings<br class="gmail_msg">

&gt; inside an array? The latter would not work well in the case of a<br class="gmail_msg">

&gt; `components(separatedBy:)` implementation that intended to return an array of<br class="gmail_msg">

&gt; substrings. It would also be difficult to distinguish intentional medium-term<br class="gmail_msg">

&gt; storage of substrings, say by a lexer. There does not appear to be an effective<br class="gmail_msg">

&gt; consistent rule that could be applied in the general case for detecting when a<br class="gmail_msg">

&gt; substring is truly being stored long-term.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; To avoid the cost of copying substrings under &quot;same type, copied storage&quot;, the<br class="gmail_msg">

&gt; optimizer could be enhanced to reduce the impact of some of those copies.<br class="gmail_msg">

&gt; For example, this code could be optimized to pull the invariant substring out<br class="gmail_msg">

&gt; of the loop:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; for _ in 0..&lt;lots {<br class="gmail_msg">

&gt;  someFunc(takingString: bigString[bigRange])<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; It&#39;s worth noting that a similar optimization is needed to avoid an equivalent<br class="gmail_msg">

&gt; problem with implicit conversion in the &quot;different type, shared storage&quot; case:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; let substring = bigString[bigRange]<br class="gmail_msg">

&gt; for _ in 0..&lt;lots { someFunc(takingString: substring) }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; However, in the case of &quot;same type, copied storage&quot; there are many use cases<br class="gmail_msg">

&gt; that cannot be optimized as easily. Consider the following simple definition of<br class="gmail_msg">

&gt; a recursive `contains` algorithm, which, when substring slicing is linear, makes<br class="gmail_msg">

&gt; the overall algorithm quadratic:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; extension String {<br class="gmail_msg">

&gt;    func containsChar(_ x: Character) -&gt; Bool {<br class="gmail_msg">

&gt;        return !isEmpty &amp;&amp; (first == x || dropFirst().containsChar(x))<br class="gmail_msg">

&gt;    }<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; It is unrealistic to expect the optimizer to eliminate this problem, so users<br class="gmail_msg">

&gt; who want efficient code must remember to rewrite it without string slicing,<br class="gmail_msg">

&gt; for example:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; extension String {<br class="gmail_msg">

&gt;    // add optional argument tracking progress through the string<br class="gmail_msg">

&gt;    func containsCharacter(_ x: Character, atOrAfter idx: Index? = nil) -&gt; Bool {<br class="gmail_msg">

&gt;        let idx = idx ?? startIndex<br class="gmail_msg">

&gt;        return idx != endIndex<br class="gmail_msg">

&gt;            &amp;&amp; (self[idx] == x || containsCharacter(x, atOrAfter: index(after: idx)))<br class="gmail_msg">

&gt;    }<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">
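&gt;<br class="gmail_msg">
&gt; For contrast, under a &quot;different type, shared storage&quot; model the original recursive shape can be kept by defining it on the slice type. This is a sketch only, assuming `Substring` behaves as in current Swift, where `dropFirst()` on a `Substring` returns another `Substring` over the same storage:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; extension Substring {<br class="gmail_msg">
&gt;   // Same recursive shape as before; each `dropFirst()` only advances an<br class="gmail_msg">
&gt;   // index into shared storage, so no characters are copied.<br class="gmail_msg">
&gt;   func containsChar(_ x: Character) -&gt; Bool {<br class="gmail_msg">
&gt;     return !isEmpty &amp;&amp; (first == x || dropFirst().containsChar(x))<br class="gmail_msg">
&gt;   }<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; let text = &quot;needle in a haystack&quot;<br class="gmail_msg">
&gt; print(text[...].containsChar(&quot;k&quot;))  // true<br class="gmail_msg">
&gt; print(text[...].containsChar(&quot;z&quot;))  // false<br class="gmail_msg">
&gt; ```<br class="gmail_msg">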

&gt;<br class="gmail_msg">

&gt; #### Substrings, Ranges and Objective-C Interop<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The pattern of passing a string/range pair is common in several Objective-C<br class="gmail_msg">

&gt; APIs, and is made especially awkward in Swift by the non-interchangeability of<br class="gmail_msg">

&gt; `Range&lt;String.Index&gt;` and `NSRange`.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; s2.find(s2, sourceRange: NSRange(j..&lt;s2.endIndex, in: s2))<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; In general, however, the Swift idiom for operating on a sub-range of a<br class="gmail_msg">

&gt; `Collection` is to *slice* the collection and operate on that:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; s2.find(s2[j..&lt;s2.endIndex])<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Therefore, APIs that operate on an `NSString`/`NSRange` pair should be imported<br class="gmail_msg">

&gt; without the `NSRange` argument.  The Objective-C importer should be changed to<br class="gmail_msg">

&gt; give these APIs special treatment so that when a `Substring` is passed, instead<br class="gmail_msg">

&gt; of being converted to a `String`, the full `NSString` and range are passed to<br class="gmail_msg">

&gt; the Objective-C method, thereby avoiding a copy.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; As a result, you would never need to pass an `NSRange` to these APIs, which<br class="gmail_msg">

&gt; solves the impedance problem by eliminating the argument, resulting in more<br class="gmail_msg">

&gt; idiomatic Swift code while retaining the performance benefit.  To help users<br class="gmail_msg">

&gt; manually handle any cases that remain, Foundation should be augmented to allow<br class="gmail_msg">

&gt; the following syntax for converting to and from `NSRange`:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; let nsr = NSRange(i..&lt;j, in: s) // An NSRange corresponding to s[i..&lt;j]<br class="gmail_msg">

&gt; let iToJ = Range(nsr, in: s)    // Equivalent to i..&lt;j<br class="gmail_msg">

&gt; ```<br class="gmail_msg">
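&gt;<br class="gmail_msg">
&gt; As a runnable illustration of that round trip: current Foundation provides initializers with this spelling, and the sketch below assumes them. The sample string is illustrative, and `range(of:)` is Foundation&#39;s search method:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; import Foundation<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; let s = &quot;na\u{EF}ve caf\u{E9}&quot;       // &quot;naïve café&quot;<br class="gmail_msg">
&gt; if let r = s.range(of: &quot;caf\u{E9}&quot;) {<br class="gmail_msg">
&gt;   let nsr = NSRange(r, in: s)        // UTF-16 offsets, as NSString expects<br class="gmail_msg">
&gt;   print(nsr.location, nsr.length)    // prints &quot;6 4&quot;<br class="gmail_msg">
&gt;   print(Range(nsr, in: s) == r)      // true: the conversion round-trips<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt; ```<br class="gmail_msg">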

&gt;<br class="gmail_msg">

&gt; ### The `Unicode` protocol<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; With `Substring` and `String` being distinct types and sharing almost all<br class="gmail_msg">

&gt; interface and semantics, and with the highest-performance string processing<br class="gmail_msg">

&gt; requiring knowledge of encoding and layout that the currency types can&#39;t<br class="gmail_msg">

&gt; provide, it becomes important to capture the common “string API” in a protocol.<br class="gmail_msg">

&gt; Since Unicode conformance is a key feature of string processing in Swift, we<br class="gmail_msg">

&gt; call that protocol `Unicode`:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; **Note:** The following assumes several features that are planned but not yet implemented in<br class="gmail_msg">

&gt;  Swift, and should be considered a sketch rather than a final design.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; protocol Unicode<br class="gmail_msg">

&gt;  : Comparable, BidirectionalCollection where Element == Character {<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  associatedtype Encoding : UnicodeEncoding<br class="gmail_msg">

&gt;  var encoding: Encoding { get }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  associatedtype CodeUnits<br class="gmail_msg">

&gt;    : RandomAccessCollection where Element == Encoding.CodeUnit<br class="gmail_msg">

&gt;  var codeUnits: CodeUnits { get }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  associatedtype UnicodeScalars<br class="gmail_msg">

&gt;    : BidirectionalCollection  where Element == UnicodeScalar<br class="gmail_msg">

&gt;  var unicodeScalars: UnicodeScalars { get }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  associatedtype ExtendedASCII<br class="gmail_msg">

&gt;    : BidirectionalCollection where Element == UInt32<br class="gmail_msg">

&gt;  var extendedASCII: ExtendedASCII { get }<br class="gmail_msg">


&gt; }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; extension Unicode {<br class="gmail_msg">

&gt;  // ... define high-level non-mutating string operations, e.g. search ...<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  func compared&lt;Other: Unicode&gt;(<br class="gmail_msg">

&gt;    to rhs: Other,<br class="gmail_msg">

&gt;    case caseSensitivity: StringSensitivity? = nil,<br class="gmail_msg">

&gt;    diacritic diacriticSensitivity: StringSensitivity? = nil,<br class="gmail_msg">

&gt;    width widthSensitivity: StringSensitivity? = nil,<br class="gmail_msg">

&gt;    in locale: Locale? = nil<br class="gmail_msg">

&gt;  ) -&gt; SortOrder { ... }<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; extension Unicode : RangeReplaceableCollection where CodeUnits :<br class="gmail_msg">

&gt;  RangeReplaceableCollection {<br class="gmail_msg">

&gt;    // Satisfy protocol requirement<br class="gmail_msg">

&gt;    mutating func replaceSubrange&lt;C : Collection&gt;(_: Range&lt;Index&gt;, with: C)<br class="gmail_msg">

&gt;      where C.Element == Element<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  // ... define high-level mutating string operations, e.g. replace ...<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The goal is that `Unicode` exposes the underlying encoding and code units in<br class="gmail_msg">

&gt; such a way that for types with a known representation (e.g. a high-performance<br class="gmail_msg">

&gt; `UTF8String`) that information can be known at compile-time and can be used to<br class="gmail_msg">

&gt; generate a single path, while still allowing types like `String` that admit<br class="gmail_msg">

&gt; multiple representations to use runtime queries and branches to fast path<br class="gmail_msg">

&gt; specializations.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; **Note:** `Unicode` would make a fantastic namespace for much of<br class="gmail_msg">

&gt; what&#39;s in this proposal if we could get the ability to nest types and<br class="gmail_msg">

&gt; protocols in protocols.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Scanning, Matching, and Tokenization<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Low-Level Textual Analysis<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; We should provide convenient APIs for processing strings by character.  For example,<br class="gmail_msg">

&gt; it should be easy to cleanly express, “if this string starts with `&quot;f&quot;`, process<br class="gmail_msg">

&gt; the rest of the string as follows…”  Swift is well-suited to expressing this<br class="gmail_msg">

&gt; common pattern beautifully, but we need to add the APIs.  Here are two examples<br class="gmail_msg">

&gt; of the sort of code that might be possible given such APIs:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; if let firstLetter = input.droppingPrefix(alphabeticCharacter) {<br class="gmail_msg">

&gt;  somethingWith(input) // process the rest of input<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; if let (number, restOfInput) = input.parsingPrefix(Int.self) {<br class="gmail_msg">

&gt;   ...<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The specific spelling and functionality of APIs like this are TBD.  The larger<br class="gmail_msg">

&gt; point is to make sure matching-and-consuming jobs are well-supported.<br class="gmail_msg">
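&gt;<br class="gmail_msg">
&gt; As a rough illustration only, the first pattern can be approximated today with<br class="gmail_msg">
&gt; existing `String` API.  The signature below (a `String` prefix in, an optional<br class="gmail_msg">
&gt; `Substring` out) is an assumption made for this sketch, not a proposed spelling:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; extension String {<br class="gmail_msg">
&gt;   /// Hypothetical helper: the remainder of the string if it starts with<br class="gmail_msg">
&gt;   /// `prefix`, or `nil` if it does not.<br class="gmail_msg">
&gt;   func droppingPrefix(_ prefix: String) -&gt; Substring? {<br class="gmail_msg">
&gt;     guard hasPrefix(prefix) else { return nil }<br class="gmail_msg">
&gt;     return dropFirst(prefix.count)<br class="gmail_msg">
&gt;   }<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; let input = &quot;f(x)&quot;<br class="gmail_msg">
&gt; if let rest = input.droppingPrefix(&quot;f&quot;) {<br class="gmail_msg">
&gt;   assert(rest == &quot;(x)&quot;)  // process the rest of input<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt; ```<br class="gmail_msg">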

&gt;<br class="gmail_msg">

&gt; #### Unified Pattern Matcher Protocol<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Many of the current methods that do matching are overloaded to do the same<br class="gmail_msg">

&gt; logical operations in different ways, with the following axes:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; - Logical Operation: `find`, `split`, `replace`, match at start<br class="gmail_msg">

&gt; - Kind of pattern: `CharacterSet`, `String`, a regex, a closure<br class="gmail_msg">

&gt; - Options, e.g. case/diacritic sensitivity, locale.  Sometimes a part of<br class="gmail_msg">

&gt;  the method name, and sometimes an argument<br class="gmail_msg">

&gt; - Whole string or subrange.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; We should represent these aspects as orthogonal, composable components,<br class="gmail_msg">

&gt; abstracting pattern matchers into a protocol like<br class="gmail_msg">

&gt; [this one](<a href="https://github.com/apple/swift/blob/master/test/Prototypes/PatternMatching.swift#L33" rel="noreferrer" class="gmail_msg" target="_blank">https://github.com/apple/swift/blob/master/test/Prototypes/PatternMatching.swift#L33</a>),<br class="gmail_msg">

&gt; that can allow us to define logical operations once, without introducing<br class="gmail_msg">

&gt; overloads, and massively reducing API surface area.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; For example, using the strawman prefix `%` syntax to turn string literals into<br class="gmail_msg">

&gt; patterns, the following pairs would all invoke the same generic methods:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; if let found = s.firstMatch(%&quot;searchString&quot;) { ... }<br class="gmail_msg">

&gt; if let found = s.firstMatch(someRegex) { ... }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; for m in s.allMatches((%&quot;searchString&quot;), case: .insensitive) { ... }<br class="gmail_msg">

&gt; for m in s.allMatches(someRegex) { ... }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; let items = s.split(separatedBy: &quot;, &quot;)<br class="gmail_msg">

&gt; let tokens = s.split(separatedBy: CharacterSet.whitespace)<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Note that, because Swift requires the indices of a slice to match the indices of<br class="gmail_msg">

&gt; the range from which it was sliced, operations like `firstMatch` can return a<br class="gmail_msg">

&gt; `Substring?` in lieu of a `Range&lt;String.Index&gt;?`: the indices of the match in<br class="gmail_msg">

&gt; the string being searched, if needed, can easily be recovered as the<br class="gmail_msg">

&gt; `startIndex` and `endIndex` of the `Substring`.<br class="gmail_msg">
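&gt;<br class="gmail_msg">
&gt; The index-sharing property this relies on can be observed with ordinary slicing.<br class="gmail_msg">
&gt; Since `firstMatch` is still hypothetical, this small sketch uses `firstIndex(of:)`<br class="gmail_msg">
&gt; and a range subscript to obtain the `Substring` instead:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; let s = &quot;key=value&quot;<br class="gmail_msg">
&gt; let eq = s.firstIndex(of: &quot;=&quot;)!<br class="gmail_msg">
&gt; let value = s[s.index(after: eq)...]  // a Substring of s<br class="gmail_msg">
&gt; assert(value == &quot;value&quot;)<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; // The slice&#39;s own indices are valid positions in the original string,<br class="gmail_msg">
&gt; // so the matched range can be recovered without a separate Range value.<br class="gmail_msg">
&gt; assert(s[value.startIndex..&lt;value.endIndex] == &quot;value&quot;)<br class="gmail_msg">
&gt; assert(s.distance(from: s.startIndex, to: value.startIndex) == 4)<br class="gmail_msg">
&gt; ```<br class="gmail_msg">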

&gt;<br class="gmail_msg">

&gt; Note also that matching operations are useful for collections in general, and<br class="gmail_msg">

&gt; would fall out of this proposal:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; // replace subsequences of contiguous NaNs with zero<br class="gmail_msg">

&gt; forces.replace(oneOrMore([Float.nan]), [0.0])<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Regular Expressions<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Addressing regular expressions is out of scope for this proposal.<br class="gmail_msg">

&gt; That said, it is important to note that the pattern matching protocol mentioned<br class="gmail_msg">

&gt; above provides a suitable foundation for regular expressions, and types such as<br class="gmail_msg">

&gt; `NSRegularExpression` can easily be retrofitted to conform to it.  In the<br class="gmail_msg">

&gt; future, support for regular expression literals in the compiler could allow for<br class="gmail_msg">

&gt; compile-time syntax checking and optimization.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### String Indices<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; `String` currently has four views—`characters`, `unicodeScalars`, `utf8`, and<br class="gmail_msg">

&gt; `utf16`—each with its own opaque index type.  The APIs used to translate indices<br class="gmail_msg">

&gt; between views add needless complexity, and the opacity of indices makes them<br class="gmail_msg">

&gt; difficult to serialize.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The index translation problem has two aspects:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  1. `String` views cannot consume one another&#39;s indices without a cumbersome<br class="gmail_msg">

&gt;    conversion step.  An index into a `String`&#39;s `characters` must be translated<br class="gmail_msg">

&gt;    before it can be used as a position in its `unicodeScalars`.  Although these<br class="gmail_msg">

&gt;    translations are rarely needed, they add conceptual and API complexity.<br class="gmail_msg">

&gt;  2. Many APIs in the core libraries and other frameworks still expose `String`<br class="gmail_msg">

&gt;    positions as `Int`s and regions as `NSRange`s, which can only reference a<br class="gmail_msg">

&gt;    `utf16` view and interoperate poorly with `String` itself.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Index Interchange Among Views<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; String&#39;s need for flexible backing storage and reasonably-efficient indexing<br class="gmail_msg">

&gt; (i.e. without dynamically allocating and reference-counting the indices<br class="gmail_msg">

&gt; themselves) means indices need an efficient underlying storage type.  Although<br class="gmail_msg">

&gt; we do not wish to expose `String`&#39;s indices *as* integers, `Int` offsets into<br class="gmail_msg">

&gt; underlying code unit storage make a good underlying storage type, provided<br class="gmail_msg">

&gt; `String`&#39;s underlying storage supports random-access.  We think random-access<br class="gmail_msg">

&gt; *code-unit storage* is a reasonable requirement to impose on all `String`<br class="gmail_msg">

&gt; instances.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Making these `Int` code unit offsets conveniently accessible and constructible<br class="gmail_msg">

&gt; solves the serialization problem:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; clipboard.write(s.endIndex.codeUnitOffset)<br class="gmail_msg">

&gt; let offset = clipboard.read(Int.self)<br class="gmail_msg">

&gt; let i = String.Index(codeUnitOffset: offset)<br class="gmail_msg">

&gt; ```<br class="gmail_msg">
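&gt;<br class="gmail_msg">
&gt; `codeUnitOffset` above is a proposed spelling.  For comparison, Swift 5 and later<br class="gmail_msg">
&gt; provide a UTF-16-specific analogue, `utf16Offset(in:)` together with<br class="gmail_msg">
&gt; `String.Index(utf16Offset:in:)`.  The following sketch shows the same round trip<br class="gmail_msg">
&gt; with those calls; the sample string is only an illustration:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; let s = &quot;a😀b&quot;<br class="gmail_msg">
&gt; let i = s.firstIndex(of: &quot;b&quot;)!<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; // A plain Int that can be written to a file, clipboard, etc.<br class="gmail_msg">
&gt; let saved = i.utf16Offset(in: s)<br class="gmail_msg">
&gt; assert(saved == 3)  // the emoji occupies two UTF-16 code units<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; // Later: rebuild an index from the stored offset.<br class="gmail_msg">
&gt; let restored = String.Index(utf16Offset: saved, in: s)<br class="gmail_msg">
&gt; assert(s[restored] == &quot;b&quot;)<br class="gmail_msg">
&gt; ```<br class="gmail_msg">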

&gt;<br class="gmail_msg">

&gt; Index interchange between `String` and its `unicodeScalars`, `codeUnits`,<br class="gmail_msg">

&gt; and [`extendedASCII`](#parsing-ascii-structure) views can be made entirely<br class="gmail_msg">

&gt; seamless by having them share an index type (semantics of indexing a `String`<br class="gmail_msg">

&gt; between grapheme cluster boundaries are TBD—it can either trap or be forgiving).<br class="gmail_msg">

&gt; Having a common index allows easy traversal into the interior of graphemes,<br class="gmail_msg">

&gt; something that is often needed, without making it likely that someone will do it<br class="gmail_msg">

&gt; by accident.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; - `String.index(after:)` should advance to the next grapheme, even when the<br class="gmail_msg">

&gt;   index points partway through a grapheme.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; - `String.index(before:)` should move to the start of the grapheme before<br class="gmail_msg">

&gt;   the current position.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Seamless index interchange between `String` and its UTF-8 or UTF-16 views is not<br class="gmail_msg">

&gt; crucial, as the specifics of encoding should not be a concern for most use<br class="gmail_msg">

&gt; cases, and would impose needless costs on the indices of other views.  That<br class="gmail_msg">

&gt; said, we can make translation much more straightforward by exposing simple<br class="gmail_msg">

&gt; bidirectional converting `init`s on both index types:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; let u8Position = String.UTF8.Index(someStringIndex)<br class="gmail_msg">

&gt; let originalPosition = String.Index(u8Position)<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Index Interchange with Cocoa<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; We intend to address `NSRange`s that denote substrings in Cocoa APIs as<br class="gmail_msg">

&gt; described [later in this document](#substrings--ranges-and-objective-c-interop).<br class="gmail_msg">

&gt; That leaves the interchange of bare indices with Cocoa APIs trafficking in<br class="gmail_msg">

&gt; `Int`.  Hopefully such APIs will be rare, but when needed, the following<br class="gmail_msg">

&gt; extension, which would be useful for all `Collection`s, can help:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; extension Collection {<br class="gmail_msg">

&gt;  func index(offset: IndexDistance) -&gt; Index {<br class="gmail_msg">

&gt;    return index(startIndex, offsetBy: offset)<br class="gmail_msg">

&gt;  }<br class="gmail_msg">

&gt;  func offset(of i: Index) -&gt; IndexDistance {<br class="gmail_msg">

&gt;    return distance(from: startIndex, to: i)<br class="gmail_msg">

&gt;  }<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Then integers can easily be translated into offsets into a `String`&#39;s `utf16`<br class="gmail_msg">

&gt; view for consumption by Cocoa:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; let cocoaIndex = s.utf16.offset(of: String.UTF16Index(i))<br class="gmail_msg">

&gt; let swiftIndex = s.utf16.index(offset: cocoaIndex)<br class="gmail_msg">

&gt; ```<br class="gmail_msg">
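&gt;<br class="gmail_msg">
&gt; A self-contained variant of the extension, written against current Swift where<br class="gmail_msg">
&gt; collection distances are plain `Int` (an adaptation made for this sketch), shows<br class="gmail_msg">
&gt; how the UTF-16 offset handed to Cocoa differs from the `Character` offset:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; extension Collection {<br class="gmail_msg">
&gt;   /// The index `offset` steps after `startIndex`.<br class="gmail_msg">
&gt;   func index(offset: Int) -&gt; Index {<br class="gmail_msg">
&gt;     return index(startIndex, offsetBy: offset)<br class="gmail_msg">
&gt;   }<br class="gmail_msg">
&gt;   /// The number of steps from `startIndex` to `i`.<br class="gmail_msg">
&gt;   func offset(of i: Index) -&gt; Int {<br class="gmail_msg">
&gt;     return distance(from: startIndex, to: i)<br class="gmail_msg">
&gt;   }<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; let s = &quot;a😀b&quot;<br class="gmail_msg">
&gt; let i = s.firstIndex(of: &quot;b&quot;)!<br class="gmail_msg">
&gt; let cocoaIndex = s.utf16.offset(of: i)  // counts UTF-16 code units<br class="gmail_msg">
&gt; assert(cocoaIndex == 3)<br class="gmail_msg">
&gt; assert(s.offset(of: i) == 2)            // counts Characters<br class="gmail_msg">
&gt; assert(s.utf16.index(offset: cocoaIndex) == i)<br class="gmail_msg">
&gt; ```<br class="gmail_msg">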

&gt;<br class="gmail_msg">

&gt; ### Formatting<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; A full treatment of formatting is out of scope of this proposal, but<br class="gmail_msg">

&gt; we believe it&#39;s crucial for completing the text processing picture.  This<br class="gmail_msg">

&gt; section details some of the existing issues and thinking that may guide future<br class="gmail_msg">

&gt; development.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Printf-Style Formatting<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; `String.format` is designed on the `printf` model: it takes a format string with<br class="gmail_msg">

&gt; textual placeholders for substitution, and an arbitrary list of other arguments.<br class="gmail_msg">

&gt; The syntax and meaning of these placeholders has a long history in<br class="gmail_msg">

&gt; C, but for anyone who doesn&#39;t use them regularly they are cryptic and complex,<br class="gmail_msg">

&gt; as the `printf (3)` man page attests.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Aside from complexity, this style of API has two major problems: First, the<br class="gmail_msg">

&gt; spelling of these placeholders must match up to the types of the arguments, in<br class="gmail_msg">

&gt; the right order, or the behavior is undefined.  Some limited support for<br class="gmail_msg">

&gt; compile-time checking of this correspondence could be implemented, but only for<br class="gmail_msg">

&gt; the cases where the format string is a literal. Second, there&#39;s no reasonable<br class="gmail_msg">

&gt; way to extend the formatting vocabulary to cover the needs of new types: you are<br class="gmail_msg">

&gt; stuck with what&#39;s in the box.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### Foundation Formatters<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; The formatters supplied by Foundation are highly capable and versatile, offering<br class="gmail_msg">

&gt; both formatting and parsing services.  When used for formatting, though, the<br class="gmail_msg">

&gt; design pattern demands more from users than it should:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * Matching the type of data being formatted to a formatter type<br class="gmail_msg">

&gt;  * Creating an instance of that type<br class="gmail_msg">

&gt;  * Setting stateful options (`currency`, `dateStyle`) on the type.  Note: the<br class="gmail_msg">

&gt;    need for this step prevents the instance from being used and discarded in<br class="gmail_msg">

&gt;    the same expression where it is created.<br class="gmail_msg">

&gt;  * Overall, introduction of needless verbosity into source<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; These may seem like small issues, but the experience of Apple localization<br class="gmail_msg">

&gt; experts is that the total drag of these factors on programmers is such that they<br class="gmail_msg">

&gt; tend to reach for `String.format` instead.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; #### String Interpolation<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Swift string interpolation provides a user-friendly alternative to printf&#39;s<br class="gmail_msg">

&gt; domain-specific language (just write ordinary Swift code!) and its type safety<br class="gmail_msg">

&gt; problems (put the data right where it belongs!) but the following issues prevent<br class="gmail_msg">

&gt; it from being useful for localized formatting (among other jobs):<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  * [SR-2303](<a href="https://bugs.swift.org/browse/SR-2303" rel="noreferrer" class="gmail_msg" target="_blank">https://bugs.swift.org/browse/SR-2303</a>) We are unable to restrict<br class="gmail_msg">

&gt;    types used in string interpolation.<br class="gmail_msg">

&gt;  * [SR-1260](<a href="https://bugs.swift.org/browse/SR-1260" rel="noreferrer" class="gmail_msg" target="_blank">https://bugs.swift.org/browse/SR-1260</a>) String interpolation can&#39;t<br class="gmail_msg">

&gt;    distinguish (fragments of) the base string from the string substitutions.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; In the long run, we should improve Swift string interpolation to the point where<br class="gmail_msg">

&gt; it can participate in most any formatting job.  Mostly this centers around<br class="gmail_msg">

&gt; fixing the interpolation protocols per the previous item, and supporting<br class="gmail_msg">

&gt; localization.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; To be able to use formatting effectively inside interpolations, it needs to be<br class="gmail_msg">

&gt; both lightweight (because it all happens in-situ) and discoverable.  One<br class="gmail_msg">

&gt; approach would be to standardize on `format` methods, e.g.:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; &quot;Column 1: \(n.format(radix:16, width:8)) *** \(message)&quot;<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; &quot;Something with leading zeroes: \(x.format(fill: zero, width:8))&quot;<br class="gmail_msg">

&gt; ```<br class="gmail_msg">
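&gt;<br class="gmail_msg">
&gt; To make the idea concrete, here is one way such a `format` method could be<br class="gmail_msg">
&gt; sketched for integers.  The parameter names, their order, and the defaults are<br class="gmail_msg">
&gt; assumptions for illustration; the sketch ignores localization and treats the sign<br class="gmail_msg">
&gt; of negative values as just another character:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; extension BinaryInteger {<br class="gmail_msg">
&gt;   /// Hypothetical formatter: renders the value in `radix` (2...36) and<br class="gmail_msg">
&gt;   /// left-pads the result with `fill` up to `width` characters.<br class="gmail_msg">
&gt;   func format(radix: Int = 10, width: Int = 0, fill: Character = &quot; &quot;) -&gt; String {<br class="gmail_msg">
&gt;     let digits = String(self, radix: radix)<br class="gmail_msg">
&gt;     let padding = max(0, width - digits.count)<br class="gmail_msg">
&gt;     return String(repeating: fill, count: padding) + digits<br class="gmail_msg">
&gt;   }<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; let n = 255<br class="gmail_msg">
&gt; let message = &quot;ok&quot;<br class="gmail_msg">
&gt; let row = &quot;Column 1: \(n.format(radix: 16, width: 8, fill: &quot;0&quot;)) *** \(message)&quot;<br class="gmail_msg">
&gt; assert(row == &quot;Column 1: 000000ff *** ok&quot;)<br class="gmail_msg">
&gt; ```<br class="gmail_msg">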

&gt;<br class="gmail_msg">

&gt; ### C String Interop<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Our support for interoperation with nul-terminated C strings is scattered and<br class="gmail_msg">

&gt; incoherent, with six ways to transform a C string into a `String` and four ways to<br class="gmail_msg">

&gt; do the inverse.  These APIs should be replaced with the following:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; extension String {<br class="gmail_msg">

&gt;  /// Constructs a `String` having the same contents as `nulTerminatedUTF8`.<br class="gmail_msg">

&gt;  ///<br class="gmail_msg">

&gt;  /// - Parameter nulTerminatedUTF8: a sequence of contiguous UTF-8 encoded<br class="gmail_msg">

&gt;  ///   bytes ending just before the first zero byte (NUL character).<br class="gmail_msg">

&gt;  init(cString nulTerminatedUTF8: UnsafePointer&lt;CChar&gt;)<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  /// Constructs a `String` having the same contents as `nulTerminatedCodeUnits`.<br class="gmail_msg">

&gt;  ///<br class="gmail_msg">

&gt;  /// - Parameter nulTerminatedCodeUnits: a sequence of contiguous code units in<br class="gmail_msg">

&gt;  ///   the given `encoding`, ending just before the first zero code unit.<br class="gmail_msg">

&gt;  /// - Parameter encoding: describes the encoding in which the code units<br class="gmail_msg">

&gt;  ///   should be interpreted.<br class="gmail_msg">

&gt;  init&lt;Encoding: UnicodeEncoding&gt;(<br class="gmail_msg">

&gt;    cString nulTerminatedCodeUnits: UnsafePointer&lt;Encoding.CodeUnit&gt;,<br class="gmail_msg">

&gt;    encoding: Encoding)<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  /// Invokes the given closure on the contents of the string, represented as a<br class="gmail_msg">

&gt;  /// pointer to a nul-terminated sequence of UTF-8 code units.<br class="gmail_msg">

&gt;  func withCString&lt;Result&gt;(<br class="gmail_msg">

&gt;    _ body: (UnsafePointer&lt;CChar&gt;) throws -&gt; Result) rethrows -&gt; Result<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; In both of the construction APIs, any invalid encoding sequence detected will<br class="gmail_msg">

&gt; have its longest valid prefix replaced by U+FFFD, the Unicode replacement<br class="gmail_msg">

&gt; character, per Unicode specification.  This covers the common case.  The<br class="gmail_msg">

&gt; replacement is done *physically* in the underlying storage and the validity of<br class="gmail_msg">

&gt; the result is recorded in the `String`&#39;s `encoding` such that future accesses<br class="gmail_msg">

&gt; need not be slowed down by possible error repair separately.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Construction that is aborted when encoding errors are detected can be<br class="gmail_msg">

&gt; accomplished using APIs on the `encoding`.  String types that retain their<br class="gmail_msg">

&gt; physical encoding even in the presence of errors and are repaired on-the-fly can<br class="gmail_msg">

&gt; be built as different instances of the `Unicode` protocol.<br class="gmail_msg">
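&gt;<br class="gmail_msg">
&gt; For orientation, the behavior described above can be tried with calls that<br class="gmail_msg">
&gt; already exist in the standard library: `String(cString:)`, `withCString`, and<br class="gmail_msg">
&gt; `String(decoding:as:)`.  This is only a sketch with made-up sample data; it does<br class="gmail_msg">
&gt; not cover the encoding-generic initializer proposed here:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; // &quot;Hi&quot; followed by a NUL terminator, as a C API might hand it over.<br class="gmail_msg">
&gt; let raw: [CChar] = [72, 105, 0]<br class="gmail_msg">
&gt; let greeting = raw.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }<br class="gmail_msg">
&gt; assert(greeting == &quot;Hi&quot;)<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; // Borrow a nul-terminated UTF-8 pointer and count bytes up to the terminator.<br class="gmail_msg">
&gt; let byteCount: Int = &quot;h\u{E9}llo&quot;.withCString { p -&gt; Int in<br class="gmail_msg">
&gt;   var n = 0<br class="gmail_msg">
&gt;   while p[n] != 0 { n += 1 }<br class="gmail_msg">
&gt;   return n<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt; assert(byteCount == 6)  // U+00E9 occupies two UTF-8 code units<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; // Ill-formed input (the lone 0xFF byte) is repaired with U+FFFD.<br class="gmail_msg">
&gt; let repaired = String(decoding: [0x48, 0xFF, 0x69] as [UInt8], as: UTF8.self)<br class="gmail_msg">
&gt; assert(repaired == &quot;H\u{FFFD}i&quot;)<br class="gmail_msg">
&gt; ```<br class="gmail_msg">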

&gt;<br class="gmail_msg">

&gt; ### Unicode 9 Conformance<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Unicode 9 (and macOS 10.11) brought us support for family emoji, which changes<br class="gmail_msg">

&gt; the process of properly identifying `Character` boundaries.  We need to update<br class="gmail_msg">

&gt; `String` to account for this change.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### High-Performance String Processing<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Many strings are short enough to store in 64 bits, many can be stored using only<br class="gmail_msg">

&gt; 8 bits per Unicode scalar, others are best encoded in UTF-16, and some come to<br class="gmail_msg">

&gt; us already in some other encoding, such as UTF-8, that would be costly to<br class="gmail_msg">

&gt; translate.  Supporting these formats while maintaining usability for<br class="gmail_msg">

&gt; general-purpose APIs demands that a single `String` type can be backed by many<br class="gmail_msg">

&gt; different representations.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; That said, the highest performance code always requires static knowledge of the<br class="gmail_msg">

&gt; data structures on which it operates, and for this code, dynamic selection of<br class="gmail_msg">

&gt; representation comes at too high a cost.  Heavy-duty text processing demands a<br class="gmail_msg">

&gt; way to opt out of dynamism and directly use known encodings.  Having this<br class="gmail_msg">

&gt; ability can also make it easy to cleanly specialize code that handles dynamic<br class="gmail_msg">

&gt; cases for maximal efficiency on the most common representations.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; To address this need, we can build models of the `Unicode` protocol that encode<br class="gmail_msg">

&gt; representation information into the type, such as `NFCNormalizedUTF16String`.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Parsing ASCII Structure<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Although many machine-readable formats support the inclusion of arbitrary<br class="gmail_msg">

&gt; Unicode text, it is also common that their fundamental structure lies entirely<br class="gmail_msg">

&gt; within the ASCII subset (JSON, YAML, many XML formats).  These formats are often<br class="gmail_msg">

&gt; processed most efficiently by recognizing ASCII structural elements as ASCII,<br class="gmail_msg">

&gt; and capturing the arbitrary sections between them in more-general strings.  The<br class="gmail_msg">

&gt; current String API offers no way to efficiently recognize ASCII and skip past<br class="gmail_msg">

&gt; everything else without the overhead of full decoding into Unicode scalars.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; For these purposes, strings should supply an `extendedASCII` view that is a<br class="gmail_msg">

&gt; collection of `UInt32`, where values less than `0x80` represent the<br class="gmail_msg">

&gt; corresponding ASCII character, and other values represent data that is specific<br class="gmail_msg">

&gt; to the underlying encoding of the string.<br class="gmail_msg">
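&gt;<br class="gmail_msg">
&gt; The `extendedASCII` view does not exist yet, but for UTF-8-backed text the idea<br class="gmail_msg">
&gt; can be approximated with the `utf8` view, because bytes below `0x80` never occur<br class="gmail_msg">
&gt; inside a multi-byte UTF-8 sequence.  A minimal sketch, assuming a comma-separated<br class="gmail_msg">
&gt; sample line:<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; ```swift<br class="gmail_msg">
&gt; let line = &quot;Zo\u{EB},42,caf\u{E9}&quot;<br class="gmail_msg">
&gt; let comma = UInt8(ascii: &quot;,&quot;)<br class="gmail_msg">
&gt;<br class="gmail_msg">
&gt; // Record the byte offset at which each field starts, comparing raw code<br class="gmail_msg">
&gt; // units against ASCII without decoding any Unicode scalars.<br class="gmail_msg">
&gt; var fieldStarts = [0]<br class="gmail_msg">
&gt; for (offset, byte) in line.utf8.enumerated() where byte == comma {<br class="gmail_msg">
&gt;   fieldStarts.append(offset + 1)<br class="gmail_msg">
&gt; }<br class="gmail_msg">
&gt; assert(fieldStarts == [0, 5, 8])  // U+00EB takes two bytes, shifting later offsets<br class="gmail_msg">
&gt; ```<br class="gmail_msg">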

&gt;<br class="gmail_msg">

&gt; ## Language Support<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This proposal depends on two new features in the Swift language:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; 1. **Generic subscripts**, to<br class="gmail_msg">

&gt;   enable unified slicing syntax.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; 2. **A subtype relationship** between<br class="gmail_msg">

&gt;   `Substring` and `String`, enabling framework APIs to traffic solely in<br class="gmail_msg">

&gt;   `String` while still making it possible to avoid copies by handling<br class="gmail_msg">

&gt;   `Substring`s where necessary.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; Additionally, **the ability to nest types and protocols inside<br class="gmail_msg">

&gt; protocols** could significantly shrink the footprint of this proposal<br class="gmail_msg">

&gt; on the top-level Swift namespace.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ## Open Questions<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Must `String` be limited to storing UTF-16 subset encodings?<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; - The ability to handle `UTF-8`-encoded strings (models of `Unicode`) is not in<br class="gmail_msg">

&gt;  question here; this is about what encodings must be storable, without<br class="gmail_msg">

&gt;  transcoding, in the common currency type called “`String`”.<br class="gmail_msg">

&gt; - ASCII, Latin-1, UCS-2, and UTF-16 are UTF-16 subsets.  UTF-8 is not.<br class="gmail_msg">

&gt; - If we have a way to get at a `String`&#39;s code units, we need a concrete type in<br class="gmail_msg">

&gt;  which to express them in the API of `String`, which is a concrete type.<br class="gmail_msg">

&gt; - If String needs to be able to represent UTF-32, presumably the code units need<br class="gmail_msg">

&gt;  to be `UInt32`.<br class="gmail_msg">

&gt; - Not supporting UTF-32-encoded text seems like one reasonable design choice.<br class="gmail_msg">

&gt; - Maybe we can allow UTF-8 storage in `String` and expose its code units as<br class="gmail_msg">

&gt;  `UInt16`, just as we would for Latin-1.<br class="gmail_msg">

&gt; - Supporting only UTF-16-subset encodings would imply that `String` indices can<br class="gmail_msg">

&gt;  be serialized without recording the `String`&#39;s underlying encoding.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Do we need a type-erasable base protocol for UnicodeEncoding?<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; UnicodeEncoding has an associated type, but it may be important to be able to<br class="gmail_msg">

&gt; traffic in completely dynamic encoding values, e.g. for “tell me the most<br class="gmail_msg">

&gt; efficient encoding for this string.”<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### Should there be a string “facade?”<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; One possible design alternative makes `Unicode` a vehicle for expressing<br class="gmail_msg">

&gt; the storage and encoding of code units, but does not attempt to give it an API<br class="gmail_msg">

&gt; appropriate for `String`.  Instead, string APIs would be provided by a generic<br class="gmail_msg">

&gt; wrapper around an instance of `Unicode`:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; struct StringFacade&lt;U: Unicode&gt; : BidirectionalCollection {<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  // ...APIs for high-level string processing here...<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  var unicode: U // access to lower-level unicode details<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; typealias String = StringFacade&lt;StringStorage&gt;<br class="gmail_msg">

&gt; typealias Substring = StringFacade&lt;StringStorage.SubSequence&gt;<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; This design would allow us to de-emphasize lower-level `String` APIs such as<br class="gmail_msg">

&gt; access to the specific encoding, by putting them behind a `.unicode` property.<br class="gmail_msg">

&gt; A similar effect in a facade-less design would require a new top-level<br class="gmail_msg">

&gt; `StringProtocol` playing the role of the facade with an `associatedtype<br class="gmail_msg">

&gt; Storage : Unicode`.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; An interesting variation on this design is possible if defaulted generic<br class="gmail_msg">

&gt; parameters are introduced to the language:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ```swift<br class="gmail_msg">

&gt; struct String&lt;U: Unicode = StringStorage&gt;<br class="gmail_msg">

&gt;  : BidirectionalCollection {<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  // ...APIs for high-level string processing here...<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  var unicode: U // access to lower-level unicode details<br class="gmail_msg">

&gt; }<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; typealias Substring = String&lt;StringStorage.SubSequence&gt;<br class="gmail_msg">

&gt; ```<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; One advantage of such a design is that naïve users will always extend “the right<br class="gmail_msg">

&gt; type” (`String`) without thinking, and the new APIs will show up on `Substring`,<br class="gmail_msg">

&gt; `MyUTF8String`, etc.  That said, it also has downsides that should not be<br class="gmail_msg">

&gt; overlooked, not least of which is the confusability of the meaning of the word<br class="gmail_msg">

&gt; “string.”  Is it referring to the generic or the concrete type?<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### `TextOutputStream` and `TextOutputStreamable`<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; `TextOutputStreamable` is intended to provide a vehicle for<br class="gmail_msg">

&gt; efficiently transporting formatted representations to an output stream<br class="gmail_msg">

&gt; without forcing the allocation of storage.  Its use of `String`, a<br class="gmail_msg">

&gt; type with multiple representations, at the lowest-level unit of<br class="gmail_msg">

&gt; communication, conflicts with this goal.  It might be sufficient to<br class="gmail_msg">

&gt; change `TextOutputStream` and `TextOutputStreamable` to traffic in an<br class="gmail_msg">

&gt; associated type conforming to `Unicode`, but that is not yet clear.<br class="gmail_msg">

&gt; This area will require some design work.<br class="gmail_msg">
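<br class="gmail_msg">
For reference, a minimal sketch of how the two protocols fit together in the standard library as it exists today. `Point` and `ByteCounter` are hypothetical names used only for illustration. Note that `TextOutputStream.write(_:)` takes a `String`, which is exactly the unit of communication the quoted paragraph questions.<br class="gmail_msg">
<br class="gmail_msg">
```swift<br class="gmail_msg">
// Hypothetical value type that streams its textual form piece by piece.<br class="gmail_msg">
struct Point: TextOutputStreamable {<br class="gmail_msg">
    var x: Int<br class="gmail_msg">
    var y: Int<br class="gmail_msg">
<br class="gmail_msg">
    func write&lt;Target: TextOutputStream&gt;(to target: inout Target) {<br class="gmail_msg">
        // Each piece goes to the stream as it is produced; this type never<br class="gmail_msg">
        // assembles the full "(x, y)" text itself.<br class="gmail_msg">
        target.write("(")<br class="gmail_msg">
        print(x, terminator: "", to: &amp;target)<br class="gmail_msg">
        target.write(", ")<br class="gmail_msg">
        print(y, terminator: "", to: &amp;target)<br class="gmail_msg">
        target.write(")")<br class="gmail_msg">
    }<br class="gmail_msg">
}<br class="gmail_msg">
<br class="gmail_msg">
// Hypothetical stream that only counts UTF-8 bytes and stores no text.<br class="gmail_msg">
struct ByteCounter: TextOutputStream {<br class="gmail_msg">
    var count = 0<br class="gmail_msg">
    mutating func write(_ string: String) {<br class="gmail_msg">
        count += string.utf8.count<br class="gmail_msg">
    }<br class="gmail_msg">
}<br class="gmail_msg">
<br class="gmail_msg">
var counter = ByteCounter()<br class="gmail_msg">
Point(x: 3, y: 4).write(to: &amp;counter)<br class="gmail_msg">
<br class="gmail_msg">
// String is itself a TextOutputStream, so the same value can be collected.<br class="gmail_msg">
var text = ""<br class="gmail_msg">
Point(x: 3, y: 4).write(to: &amp;text)<br class="gmail_msg">
```<br class="gmail_msg">
<br class="gmail_msg">
Here `counter.count` ends up as 6 and `text` as `(3, 4)`. The stream never has to keep the text around, but every hop still goes through a `String` value.<br class="gmail_msg">
<br class="gmail_msg">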

&gt;<br class="gmail_msg">

&gt; ### `description` and `debugDescription`<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; * Should these be creating localized or non-localized representations?<br class="gmail_msg">

&gt; * Is returning a `String` efficient enough?<br class="gmail_msg">

&gt; * Is `debugDescription` pulling the weight of the API surface area it adds?<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ### `StaticString`<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; `StaticString` was added as a byproduct of standard library development and kept<br class="gmail_msg">

&gt; around because it seemed useful, but it was never truly *designed* for client<br class="gmail_msg">

&gt; programmers.  We need to decide what happens with it.  Presumably *something*<br class="gmail_msg">

&gt; should fill its role, and that should conform to `Unicode`.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; ## Footnotes<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; &lt;b id=&quot;f0&quot;&gt;0&lt;/b&gt; The integers rewrite currently underway is expected to<br class="gmail_msg">

&gt;    substantially reduce the scope of `Int`&#39;s API by using more<br class="gmail_msg">

&gt;    generics. [↩](#a0)<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; &lt;b id=&quot;f1&quot;&gt;1&lt;/b&gt; In practice, these semantics will usually be tied to the<br class="gmail_msg">

&gt; version of the installed [ICU](<a href="http://icu-project.org" rel="noreferrer" class="gmail_msg" target="_blank">http://icu-project.org</a>) library, which<br class="gmail_msg">

&gt; programmatically encodes the most complex rules of the Unicode Standard and its<br class="gmail_msg">

&gt; de-facto extension, CLDR.[↩](#a1)<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; &lt;b id=&quot;f2&quot;&gt;2&lt;/b&gt;<br class="gmail_msg">

&gt; See<br class="gmail_msg">

&gt; [http://unicode.org/reports/tr29/#Notation](<a href="http://unicode.org/reports/tr29/#Notation" rel="noreferrer" class="gmail_msg" target="_blank">http://unicode.org/reports/tr29/#Notation</a>). Note<br class="gmail_msg">

&gt; that inserting Unicode scalar values to prevent merging of grapheme clusters would<br class="gmail_msg">

&gt; also constitute a kind of misbehavior (one of the clusters at the boundary would<br class="gmail_msg">

&gt; not be found in the result), and would be relatively costly to implement, with<br class="gmail_msg">

&gt; little benefit. [↩](#a2)<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; &lt;b id=&quot;f4&quot;&gt;4&lt;/b&gt; The use of non-UCA-compliant ordering is fully sanctioned by<br class="gmail_msg">

&gt;  the Unicode standard for this purpose.  In fact there&#39;s<br class="gmail_msg">

&gt;  a [whole chapter](<a href="http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf" rel="noreferrer" class="gmail_msg" target="_blank">http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf</a>)<br class="gmail_msg">

&gt;  dedicated to it.  In particular, §5.17 says:<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;&gt; When comparing text that is visible to end users, a correct linguistic sort<br class="gmail_msg">

&gt;&gt; should be used, as described in _Section 5.16, Sorting and<br class="gmail_msg">

&gt;&gt; Searching_. However, in many circumstances the only requirement is for a<br class="gmail_msg">

&gt;&gt; fast, well-defined ordering. In such cases, a binary ordering can be used.<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt;  [↩](#a4)<br class="gmail_msg">
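<br class="gmail_msg">
As a rough illustration of the “binary ordering” the quoted passage refers to, strings can be compared scalar by scalar using only the standard library. The word list is made up, and this is just one way to get a fast, well-defined order; it is not meant to describe what `String`'s own `&lt;` does.<br class="gmail_msg">
<br class="gmail_msg">
```swift<br class="gmail_msg">
// Made-up sample data; "\u{C4}" is the precomposed letter Ä.<br class="gmail_msg">
let words = ["zebra", "\u{C4}pfel", "apple", "Zoo"]<br class="gmail_msg">
<br class="gmail_msg">
// Order by Unicode scalar values, with no locale or collation tables involved.<br class="gmail_msg">
let binaryOrder = words.sorted { lhs, rhs in<br class="gmail_msg">
    lhs.unicodeScalars.lexicographicallyPrecedes(rhs.unicodeScalars)<br class="gmail_msg">
}<br class="gmail_msg">
// binaryOrder is ["Zoo", "apple", "zebra", "Äpfel"]<br class="gmail_msg">
```<br class="gmail_msg">
<br class="gmail_msg">
A linguistic sort would typically place Äpfel next to apple rather than last, which is why this order suits internal use and not text shown to end users.<br class="gmail_msg">
<br class="gmail_msg">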

&gt;<br class="gmail_msg">

&gt;<br class="gmail_msg">

&gt; &lt;b id=&quot;f5&quot;&gt;5&lt;/b&gt; The queries supported by `NSCharacterSet` map directly onto<br class="gmail_msg">

&gt; properties in a table that&#39;s indexed by Unicode scalar value.  This table is<br class="gmail_msg">

&gt; part of the Unicode standard.  Some of these queries (e.g., “is this an<br class="gmail_msg">

&gt; uppercase character?”) may have fairly obvious generalizations to grapheme<br class="gmail_msg">

&gt; clusters, but exactly how to do it is a research topic and *ideally* we&#39;d either<br class="gmail_msg">

&gt; establish the existing practice that the Unicode committee would standardize, or<br class="gmail_msg">

&gt; the Unicode committee would do the research and we&#39;d implement their<br class="gmail_msg">

&gt; result.[↩](#a5)<br class="gmail_msg">
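<br class="gmail_msg">
For what it is worth, current Swift toolchains expose this kind of scalar-indexed lookup through `Unicode.Scalar.Properties`. A small sketch, with sample scalars chosen only for illustration:<br class="gmail_msg">
<br class="gmail_msg">
```swift<br class="gmail_msg">
let upperA: Unicode.Scalar = "A"<br class="gmail_msg">
let lowerA: Unicode.Scalar = "a"<br class="gmail_msg">
let digit: Unicode.Scalar = "1"<br class="gmail_msg">
<br class="gmail_msg">
// Each query is a property lookup for a single Unicode scalar value.<br class="gmail_msg">
let results = [upperA, lowerA, digit].map { $0.properties.isUppercase }<br class="gmail_msg">
// results is [true, false, false]<br class="gmail_msg">
```<br class="gmail_msg">
<br class="gmail_msg">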

&gt;<br class="gmail_msg">

&gt; _______________________________________________<br class="gmail_msg">

&gt; swift-evolution mailing list<br class="gmail_msg">

&gt; <a href="mailto:swift-evolution@swift.org" class="gmail_msg" target="_blank">swift-evolution@swift.org</a><br class="gmail_msg">

&gt; <a href="https://lists.swift.org/mailman/listinfo/swift-evolution" rel="noreferrer" class="gmail_msg" target="_blank">https://lists.swift.org/mailman/listinfo/swift-evolution</a><br class="gmail_msg">

<br class="gmail_msg">


</blockquote></div>