[swift-evolution] [Review] SE-0180: String Index Overhaul

Dave Abrahams dabrahams at apple.com
Fri Jun 9 22:24:48 CDT 2017


on Fri Jun 09 2017, Kevin Ballard <swift-evolution at swift.org> wrote:

> On Tue, Jun 6, 2017, at 10:57 AM, Dave Abrahams via swift-evolution wrote:
>> 
>> on Mon Jun 05 2017, Kevin Ballard <swift-evolution at swift.org> wrote:
>> 
>> > There’s also the curious case where I can have two String.Index values
>> > that compare unequal but actually return the same value when used in a
>> > subscript. For example, with the above string, consider
>> > String.Index(encodedOffset: 0) and String.Index(encodedOffset: 1).
>> > This may not be a problem in practice, but it’s something to be
>> > aware of.
>> 
>> I don't think this one even rises to that level.
>> 
>> let s = "aaa"
>> var si = s.indices.makeIterator()
>> let i0 = si.next()!
>> let i1 = si.next()!
>> print(i0 == i1)       // false
>> print(s[i0] == s[i1]) // true.  Surprised?
>
> Good point.
>
>> > I’m also confused by the paragraph about index comparison. It talks
>> > about if two indices are valid in a single String view, comparison
>> > semantics are according to Collection, and otherwise indices are
>> > compared using encodedOffsets, which means indices aren’t totally
>> > ordered. But I’m not sure what the first part is supposed to mean. How
>> > is comparing indices that are valid within a single view any different
>> > than comparing the encodedOffsets?
>> 
>> In today's String, encodedOffset is an offset in UTF-16.  Two indices
>> into a UTF-8 view may be unequal yet have the same encodedOffset.
>
> Ah, right. So a String.Index is actually something similar to
>
> public struct Index {
>     public var encodedOffset: Int
>     private var byteOffset: Int // UTF-8 offset within the UTF-16 code unit at encodedOffset
> }

Similar.  I'd write it this way:

public struct Index {
   public var encodedOffset: Int

   // Offset into a UnicodeScalar represented in an encoding other
   // than the String's underlying encoding
   private var transcodedOffset: Int 
}
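A rough model of what the UTF-8 view's positions look like under this layout may help. This is a standalone sketch, not the standard library's code: `ModelIndex` and `utf8ModelIndices` are hypothetical names, and it assumes UTF-16 storage as in today's String. Because "é" (U+00E9) is one UTF-16 code unit but two UTF-8 code units, its two UTF-8 positions share an encodedOffset and differ only in transcodedOffset (which plays the role of your byteOffset).

```swift
// Hypothetical model of an index; not the standard library's String.Index.
// Assumption: the underlying encoding is UTF-16, so encodedOffset
// counts UTF-16 code units.
struct ModelIndex: Equatable {
    var encodedOffset: Int      // UTF-16 offset where the scalar starts
    var transcodedOffset: Int   // which UTF-8 code unit of that scalar
}

/// Returns one ModelIndex per UTF-8 code unit of `s`.
func utf8ModelIndices(_ s: String) -> [ModelIndex] {
    var result: [ModelIndex] = []
    var utf16Offset = 0
    for scalar in s.unicodeScalars {
        let encoded = String(Character(scalar))
        // One UTF-8 position per byte of this scalar, all sharing
        // the scalar's UTF-16 starting offset.
        for k in 0..<encoded.utf8.count {
            result.append(ModelIndex(encodedOffset: utf16Offset,
                                     transcodedOffset: k))
        }
        utf16Offset += encoded.utf16.count
    }
    return result
}

let indices = utf8ModelIndices("a\u{E9}")
for i in indices { print(i.encodedOffset, i.transcodedOffset) }
// 0 0
// 1 0
// 1 1
print(indices[1] == indices[2])                             // false
print(indices[1].encodedOffset == indices[2].encodedOffset) // true
```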

> In this case, can't we still define String.Index comparison as merely
> being the lexicographical comparison of (encodedOffset, byteOffset)?

Yes, and that's how it's implemented in the PR.  But byteOffset is not
part of the user model, so we can't specify it that way.
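For illustration only, the lexicographic ordering can be sketched on a hypothetical public model type; the real index keeps its second component private, which is exactly why the proposal can't spell the ordering out this way. Swift's built-in `<` on tuples of Comparable elements does the lexicographic comparison.

```swift
// Hypothetical model; in the real String.Index the second field is private.
struct ModelIndex: Comparable {
    var encodedOffset: Int
    var transcodedOffset: Int

    // Lexicographic order: encodedOffset first, transcodedOffset
    // only breaks ties. == is synthesized from the stored properties.
    static func < (lhs: ModelIndex, rhs: ModelIndex) -> Bool {
        return (lhs.encodedOffset, lhs.transcodedOffset)
             < (rhs.encodedOffset, rhs.transcodedOffset)
    }
}

let a = ModelIndex(encodedOffset: 1, transcodedOffset: 0)
let b = ModelIndex(encodedOffset: 1, transcodedOffset: 1)
let c = ModelIndex(encodedOffset: 2, transcodedOffset: 0)
print(a < b, b < c, a == b)  // true true false
```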

> Also, as a side note, the proposal implies that encodedOffset is
> mutable. Is this actually the case? If so, I assume that mutating it
> would also reset the byteOffset?

Yes, 

     i.encodedOffset = n

is equivalent to

     i = String.Index(encodedOffset: n)
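A minimal sketch of that equivalence, using a hypothetical `ModelIndex` rather than the real type: the setter simply rebuilds the whole index, so any transcodedOffset is discarded. The demo-only initializer and the readable `transcodedOffset` are assumptions added so the effect can be observed.

```swift
// Hypothetical stand-in for String.Index; not the standard library's code.
struct ModelIndex: Equatable {
    private var storedOffset: Int
    // Private in the proposal; readable here only so the demo can check it.
    private(set) var transcodedOffset: Int

    // Mirrors String.Index(encodedOffset:): the index starts on a code
    // unit boundary of the underlying encoding, so transcodedOffset is 0.
    init(encodedOffset: Int) {
        storedOffset = encodedOffset
        transcodedOffset = 0
    }

    // Demo-only: an index part-way through a transcoded scalar.
    init(encodedOffset: Int, transcodedOffset: Int) {
        storedOffset = encodedOffset
        self.transcodedOffset = transcodedOffset
    }

    var encodedOffset: Int {
        get { return storedOffset }
        // Equivalent to `self = ModelIndex(encodedOffset: newValue)`,
        // which resets transcodedOffset.
        set { self = ModelIndex(encodedOffset: newValue) }
    }
}

var i = ModelIndex(encodedOffset: 3, transcodedOffset: 1)
i.encodedOffset = 5
print(i == ModelIndex(encodedOffset: 5))  // true
print(i.transcodedOffset)                 // 0
```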
     
-- 
-Dave


