<div>I’m coming to this conversation rather late, so forgive the naive question:</div><div><br></div><div>Your proposal claims that current code with failable APIs is needlessly awkward and that most code only interchanges indices that are known to succeed. So, why is it not simply a precondition of string slicing that the index be correctly aligned? It seems like this would simplify the behavior greatly.</div><div><br></div><div><br><div class="gmail_quote"><div>On Tue, Jun 13, 2017 at 19:04 Dave Abrahams via swift-evolution &lt;<a href="mailto:swift-evolution@swift.org">swift-evolution@swift.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

on Tue Jun 06 2017, Dave Abrahams &lt;<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>&gt; wrote:<br>

<br>

&gt;&gt; Overall it looks pretty good. But unfortunately the answer to &quot;Will<br>

&gt;&gt; applications still compile but produce different behavior than they<br>

&gt;&gt; used to?&quot; is actually &quot;Yes&quot;, when using APIs provided by<br>

&gt;&gt; Foundation. This is because Foundation is currently able to return<br>

&gt;&gt; String.Index values that don&#39;t point to Character boundaries.<br>

&gt;&gt;<br>

&gt;&gt; Specifically, in Swift 3, the following code:<br>

&gt;&gt;<br>

&gt;&gt; import Foundation<br>

&gt;&gt;<br>

&gt;&gt; let str = &quot;e\u{301}galite\u{301}&quot;<br>

&gt;&gt; let r = str.rangeOfCharacter(from: [&quot;\u{301}&quot;])!<br>

&gt;&gt; print(str[r] == &quot;\u{301}&quot;)<br>

&gt;&gt;<br>

&gt;&gt; will print “true”, because the returned range identifies the combining<br>

&gt;&gt; acute accent only. But with the proposed String.Index revisions, the<br>

&gt;&gt; `str[r]` subscript will return the whole &quot;e\u{301}&quot; combined<br>

&gt;&gt; character.<br>

&gt;<br>

&gt; Hmm, true.<br>

&gt;<br>

&gt; This doesn&#39;t totally invalidate the concern, but...<br>

&gt;<br>

&gt; The existing behavior is a bug in the way Foundation interfaces with the<br>

&gt; 3.0 standard library.  str.rangeOfCharacter (which should be<br>

&gt; str.rangeOfUnicodeScalar) should be returning<br>

&gt; Range&lt;String.UnicodeScalarView.Index&gt; but is returning a misaligned<br>

&gt; Range&lt;String.Index&gt;.  Everything in the 3.0 standard library design is<br>

&gt; engineered to ensure that misaligned String indices don&#39;t happen at all<br>

&gt; (although they still can—just use an index from string1 in string2),<br>

&gt; thus the rigorous failable index conversion APIs.<br>

&gt;<br>

&gt; It&#39;s easy to produce results with this API that don&#39;t make sense in<br>

&gt; Swift 3:<br>

&gt;<br>

&gt;   let str = &quot;e\u{301}\u{302}galite\u{301}&quot;<br>

&gt;   let r = str.rangeOfCharacter(from: [&quot;\u{301}&quot;])!<br>

&gt;   print(str[r.lowerBound] == &quot;\u{301}&quot;) // false<br>

&gt;<br>

&gt;&gt; This is, of course, an edge case, but we need to consider the<br>

&gt;&gt; implications of this and determine if it actually affects anything<br>

&gt;&gt; that’s likely to be a problem in practice.<br>

&gt;<br>

&gt; I agree.  It would also be reasonable to pick a different behavior for<br>

&gt; misaligned indices, for example:<br>

&gt;<br>

&gt;   Indices *that don&#39;t fall on a code unit boundary* are “rounded down”<br>

&gt;   before use.<br>

&gt;<br>

&gt; The existing behaviors for these cases are a cluster of coincidences,<br>

&gt; and were never designed.  I doubt that preserving them in their current<br>

&gt; form makes sense and will lead to a usable string semantics for the long<br>

&gt; term, but if they do in fact happen to make sense, we&#39;d still need to<br>

&gt; codify the rules so we can keep future behaviors consistent.<br>

<br>
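A minimal sketch of how a misaligned index arises and how the failable<br>
conversions mentioned in the quoted message react to it. It is written against<br>
current Swift rather than the Swift 3 APIs under discussion, it avoids<br>
Foundation by building the index by hand from the unicodeScalars view, and the<br>
names str, strScalars, accent and gIndex are illustrative only:<br>
<br>

```swift
// Sketch only: reproduces the misalignment without Foundation.
let str = "e\u{301}galite\u{301}"
let strScalars = str.unicodeScalars

// Position of the first combining acute accent. This is a valid position
// in the scalar view, but it sits inside the first Character of `str`.
let accent = strScalars.index(after: strScalars.startIndex)

// Position of "g", which is both a scalar boundary and a Character boundary.
let gIndex = strScalars.index(after: accent)

// The failable conversion refuses the index that is not on a Character boundary.
let accentInString = accent.samePosition(in: str)
print(accentInString == nil)                         // true

// The same index converts fine within the view it came from.
print(accent.samePosition(in: strScalars) != nil)    // true

// An index on a Character boundary converts successfully.
let gInString = gIndex.samePosition(in: str)
print(gInString == str.index(after: str.startIndex)) // true
```

<br>
The nil result is what forces callers to unwrap before slicing, and that is the<br>
awkwardness at issue in this thread.<br>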

Having considered this further, I&#39;d like to propose these revised semantics for<br>

misaligned indices, to preserve the behavior of rangeOfCharacter and its<br>

ilk:<br>

<br>

* Definition: an index i is aligned with respect to a string view v iff<br>

<br>

     v.indices.contains(i) || v.endIndex == i<br>

<br>

  If i is not aligned with respect to v it is *misaligned* with respect<br>

  to v.<br>

<br>

* When i is misaligned with respect to a String/Substring view s.xxx<br>

  (imagining s itself could also be spelled as s.xxx), combining s.xxx<br>

  and i is done in terms of underlying code units and i.encodedOffset.<br>

<br>

  It&#39;s very hard to write these semantics down precisely in terms of<br>

  existing constructs, but this should give you a sense of what I have<br>

  in mind:<br>

<br>

  1. the suffix beginning at i is formed by slicing the underlying<br>

    codeUnits at i.encodedOffset, forming a new Substring around that<br>

    slice, and getting its corresponding xxx view<br>

<br>

     s.xxx[i...]<br>

<br>

  is roughly equivalent to:<br>

<br>

    Substring(s.utf16[String.Index(encodedOffset: i.encodedOffset)...]).xxx<br>

<br>

  (given that we currently have UTF-16 code units)<br>

<br>

  2. similarly<br>

<br>

     s.xxx[..&lt;i]<br>

<br>

  is equivalent to something like:<br>

<br>

    Substring(s.utf16[..&lt;String.Index(encodedOffset: i.encodedOffset)]).xxx<br>

<br>

  3. s.xxx[i] is equivalent to s.xxx[i...].first!<br>

<br>

  4. s.xxx.index(after: i) is equivalent to s.xxx[i...].indices.dropFirst().first!<br>

<br>

  5. s.xxx.index(before: i) is equivalent to s.xxx[..&lt;i].indices.last!<br>

<br>
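The definition of alignment and items 1 to 3 above can be approximated in<br>
current Swift. This is only a sketch under assumptions: it slices the<br>
unicodeScalars view rather than the underlying code units, it does not use<br>
encodedOffset, and the names i, head and tail are made up for illustration:<br>
<br>

```swift
// Sketch only: approximates the proposed semantics one level above code units.
let s = "e\u{301}galite\u{301}"
let scalars = s.unicodeScalars

// An index that is aligned for the scalar view but not for `s` itself.
let i = scalars.index(after: scalars.startIndex)

// The definition from the post, applied to two different views.
let alignedWithScalarView = scalars.indices.contains(i) || scalars.endIndex == i
let alignedWithString = s.indices.contains(i) || s.endIndex == i
print(alignedWithScalarView)  // true
print(alignedWithString)      // false

// Item 1, approximated: slice the lower-level view at `i`, then wrap the
// slice in a new Substring so that any view of it can be requested.
let tail = Substring(scalars[i...])

// Item 2, approximated: the matching prefix, i.e. the slice up to `i`.
let head = Substring(scalars.prefix(upTo: i))

// Item 3, shown for the scalar view: the element at `i` is the first
// element of the suffix that begins at `i`.
print(tail.unicodeScalars.first! == scalars[i])  // true
print(tail.unicodeScalars.count)                 // 8
print(String(head) == "e")                       // true
```

<br>
Here head and tail split the first Character of s between them, which is the<br>
situation the misaligned-index rules are meant to cover.<br>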

I&#39;m concerned that we have no precise way to specify the semantics of #1<br>

and #2, to the point where it might be better to implement them that way<br>

but leave the semantics unspecified.  Another alternative would be to<br>

add the APIs needed to make it possible to express a precise equivalence<br>

instead of a rough equivalence.  If anyone has better ideas, I&#39;m all ears.<br>
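<br>
For indices that are already aligned, the equivalences in items 3 to 5 can at<br>
least be checked against current behavior. The loop below is a hedged sanity<br>
check, not a specification; word and the three flags are hypothetical names,<br>
and prefix(upTo:) stands in for the half-open slice spelling used above:<br>
<br>

```swift
// Sketch only: checks items 3 to 5 for every aligned index of a String.
let word = "e\u{301}galite\u{301}"

var item3Holds = true
var item4Holds = true
var item5Holds = true

for j in word.indices {
    // Item 3: the element at j is the first element of the suffix from j.
    if word[j] != word[j...].first! { item3Holds = false }

    // Item 4: index(after: j) is the second index of the suffix from j.
    // For the last Character there is no second index (the successor is
    // endIndex), so the force-unwrap in the post would not apply there.
    if let next = word[j...].indices.dropFirst().first {
        if word.index(after: j) != next { item4Holds = false }
    }

    // Item 5: index(before: j) is the last index of the prefix ending at j.
    // The prefix is empty when j is startIndex, so that case is skipped.
    if let previous = word.prefix(upTo: j).indices.last {
        if word.index(before: j) != previous { item5Holds = false }
    }
}

print(item3Holds, item4Holds, item5Holds)  // true true true
```

<br>
The last Character is the edge case worth noting: item 4 as written relies on<br>
the suffix having a second index, which it lacks when the successor is endIndex.<br>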

<br>

--<br>

-Dave<br>

<br>


</blockquote></div></div>