[swift-evolution] [Review] SE-0180: String Index Overhaul
Kevin Ballard
kevin at sb.org
Fri Jun 9 19:10:44 CDT 2017
On Tue, Jun 6, 2017, at 10:57 AM, Dave Abrahams via swift-evolution wrote:
>
> on Mon Jun 05 2017, Kevin Ballard <swift-evolution at swift.org> wrote:
>
> > There’s also the curious case where I can have two String.Index values
> > that compare unequal but actually return the same value when used in a
> > subscript.
> > For example, with the above string, String.Index(encodedOffset: 0)
> > and String.Index(encodedOffset: 1) behave this way. This may not be
> > a problem in practice, but it’s something to be aware of.
>
> I don't think this one even rises to that level.
>
> let s = "aaa"
> var si = s.indices.makeIterator()
> let i0 = si.next()!
> let i1 = si.next()!
> print(i0 == i1) // false
> print(s[i0] == s[i1]) // true. Surprised?
Good point.
> > I’m also confused by the paragraph about index comparison. It talks
> > about if two indices are valid in a single String view, comparison
> > semantics are according to Collection, and otherwise indexes are
> > compared using encodedOffsets, and this means indexes aren’t totally
> > ordered. But I’m not sure what the first part is supposed to mean. How
> > is comparing indices that are valid within a single view any different
> > than comparing the encodedOffsets?
>
> In today's String, encodedOffset is an offset in UTF-16. Two indices
> into a UTF-8 view may be unequal yet have the same encodedOffset.
Ah, right. So a String.Index is actually something similar to:

    public struct Index {
        public var encodedOffset: Int
        private var byteOffset: Int // offset into the UTF-8 encoding of the element at encodedOffset
    }

In this case, can't we still define String.Index comparison as merely being the lexicographical comparison of (encodedOffset, byteOffset)?
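
To make the question concrete, here is a minimal sketch of that comparison. It assumes the two-field layout guessed at above; `SketchIndex` and `byteOffset` are hypothetical names for illustration and are not the standard library's actual String.Index.

```swift
// Hypothetical stand-in for String.Index; not the standard library's layout.
struct SketchIndex: Comparable {
    var encodedOffset: Int   // primary offset, in the string's code units
    var byteOffset: Int      // assumed secondary offset used by the UTF-8 view

    static func == (lhs: SketchIndex, rhs: SketchIndex) -> Bool {
        return lhs.encodedOffset == rhs.encodedOffset
            && lhs.byteOffset == rhs.byteOffset
    }

    // Lexicographical order: encodedOffset first, byteOffset breaks ties.
    static func < (lhs: SketchIndex, rhs: SketchIndex) -> Bool {
        return (lhs.encodedOffset, lhs.byteOffset)
             < (rhs.encodedOffset, rhs.byteOffset)
    }
}

let a = SketchIndex(encodedOffset: 1, byteOffset: 0)
let b = SketchIndex(encodedOffset: 1, byteOffset: 2)
let c = SketchIndex(encodedOffset: 2, byteOffset: 0)
print(a < b, b < c, a == b) // prints "true true false"
```

Under this model any two indices are ordered, even when they share an encodedOffset, which is why a total order looks achievable if the layout really is a pair like this.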
Also, as a side note, the proposal implies that encodedOffset is mutable. Is this actually the case? If so, I assume that mutating it would also reset the byteOffset?
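
One way the reset could look, purely as an assumption on my part. The proposal does not specify this behavior, and `ResettingIndex` and `byteOffset` are hypothetical names:

```swift
// Hypothetical model only; the proposal does not specify this behavior.
struct ResettingIndex {
    // Assumed secondary offset; code outside this file cannot set it.
    private(set) var byteOffset: Int

    // Mutating the primary offset discards the secondary one.
    var encodedOffset: Int {
        didSet { byteOffset = 0 }
    }

    init(encodedOffset: Int, byteOffset: Int = 0) {
        // Property observers do not fire during initialization,
        // so the byteOffset passed in here is preserved.
        self.encodedOffset = encodedOffset
        self.byteOffset = byteOffset
    }
}

var i = ResettingIndex(encodedOffset: 3, byteOffset: 2)
i.encodedOffset = 4
print(i.encodedOffset, i.byteOffset) // prints "4 0"
```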
-Kevin Ballard