[swift-evolution] [Review] SE-0180: String Index Overhaul

Dave Abrahams dabrahams at apple.com
Wed Jun 14 13:42:02 CDT 2017


on Wed Jun 14 2017, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:

> On Wed, Jun 14, 2017 at 12:01 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>
>>
>> on Wed Jun 14 2017, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:
>>
>> > On Wed, Jun 14, 2017 at 09:26 Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>> >
>> >> If we leave aside for a moment the nomenclature issue where everything in
>> >> Foundation referring to a character is really referring to a Unicode
>> >> scalar, Kevin’s example illustrates the whole problem in a nutshell,
>> >> doesn’t it? In that example, we have a straightforward attempt to slice
>> >> with a misaligned index. The totality of options here are:
>> >>
>> >> * return nil, an option the rejection of which is the premise of your
>> >> proposal
>> >> * return a partial character (i.e., \u{301}), an option we haven’t yet
>> >> talked about in this thread: this seems like it could have simpler
>> >> semantics; it potentially yields garbage if the index is garbage, but in
>> >> the case of Kevin’s example actually behaves as the user might expect
>>
>> I think that's exactly what I was proposing in
>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170612/037466.html
>>
>> >> * return a whole character after “rounding down”: difficult semantics
>> >> to define and explain; always results in a whole character, but in the
>> >> case of Kevin’s example gives an unexpected answer
>> >> * return a whole character after “rounding up”: difficult semantics to
>> >> define and explain; always results in a whole character, but when the
>> >> index is misaligned would result in a character or range of characters
>> >> in which the index is not found
>> >> * trap: simple semantics, never returns garbage; the obvious
>> >> disadvantage is that execution will not proceed
>> >>
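The options listed above can be made concrete with a small toy model. This is a hypothetical sketch, not the behavior of Swift's `String`: it measures positions as offsets into `unicodeScalars`, and the names `MisalignmentPolicy`, `characterStarts(_:)`, and `readCharacter(in:atScalarOffset:policy:)` are invented for illustration. In `"e\u{301}x"` the first Character is made of two scalars, so scalar offset 1 lands on the combining accent, the kind of misaligned position the options are about.

```swift
/// Hypothetical policies for a subscript that receives a misaligned index.
enum MisalignmentPolicy {
    case returnNil  // failable API
    case partial    // the scalars from the index up to the next Character boundary
    case roundDown  // the whole Character containing the index
    case roundUp    // the first whole Character starting after the index
    case trap       // stop execution
}

/// Scalar offsets at which each Character of `s` begins.
func characterStarts(_ s: String) -> [Int] {
    var starts: [Int] = []
    var offset = 0
    for ch in s {
        starts.append(offset)
        offset += ch.unicodeScalars.count
    }
    return starts
}

/// Toy model: reads the Character-level element at scalar offset `k` of `s`,
/// resolving misalignment according to `policy`.
func readCharacter(in s: String, atScalarOffset k: Int,
                   policy: MisalignmentPolicy) -> String? {
    let scalars = Array(s.unicodeScalars)
    precondition(k >= 0 && k < scalars.count, "offset out of bounds")
    let starts = characterStarts(s)

    // Offset one past the end of the Character that contains `offset`.
    func boundary(after offset: Int) -> Int {
        return starts.first(where: { $0 > offset }) ?? scalars.count
    }
    // Builds a String from the scalars in `range`.
    func text(_ range: Range<Int>) -> String {
        var out = ""
        out.unicodeScalars.append(contentsOf: scalars[range])
        return out
    }

    if starts.contains(k) {  // aligned: every policy agrees
        return text(k..<boundary(after: k))
    }
    switch policy {
    case .returnNil:
        return nil
    case .partial:
        return text(k..<boundary(after: k))
    case .roundDown:
        let start = starts.last(where: { $0 < k })!  // k > 0 here, so offset 0 qualifies
        return text(start..<boundary(after: start))
    case .roundUp:
        // nil if no Character starts after the index
        guard let next = starts.first(where: { $0 > k }) else { return nil }
        return text(next..<boundary(after: next))
    case .trap:
        preconditionFailure("misaligned index")
    }
}
```

For `"e\u{301}x"` at scalar offset 1, `.returnNil` gives `nil`, `.partial` gives the lone `"\u{301}"`, `.roundDown` gives `"e\u{301}"`, and `.roundUp` gives `"x"`; an aligned offset such as 0 or 2 reads the same Character under every policy.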
>> >> No clearly perfect answer here. However, _if_ we hew strictly to the
>> >> stated premise of your proposal that failable APIs are awkward enough to
>> >> justify a change, and moreover that the awkwardness is truly “needless”
>> >> because of the rarity of misaligned index usage, then at face value
>> >> trapping should be a perfectly acceptable solution.
>> >>
>> >> That Kevin’s example raises the specter of trapping being a realistic
>> >> occurrence in currently working code actually suggests a challenge to your
>> >> stated premise. If we accept that this challenge is a substantial one, then
>> >> it’s not clear to me that abandoning failable APIs should be ruled out from
>> >> the outset.
>> >>
>> >> However, if this desire to remove failable APIs remains strong then I
>> >> wonder if the undiscussed second option above is worth at least some
>> >> consideration.
>> >>
>> >
>> > Having digested your revised proposed behavior a little better, I see you’re
>> > kind of getting at this exact issue, but I’m uncomfortable with how it’s so
>> > tied to the underlying encoding, which is not guaranteed to be UTF-16 but
>> > is assumed to be for the purposes of slicing.
>>
>> I think there's some confusion here; probably I have failed to explain
>> myself.  Today a String happens to always be UTF-16, but there's no
>> intention to assume that it is UTF-16 for the purposes of slicing in the
>> future.  Any place you see something like s.utf16 in an example I've
>> used to illustrate semantics should be interpreted as s.codeUnits,
>> where codeUnits is a collection of code units for whatever the
>> underlying encoding is.
>>
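To see why the semantics are phrased in terms of whatever the native code units are, rather than UTF-16 specifically, here is a small sketch that lists Character boundaries as code-unit offsets under two candidate encodings. `CodeUnitEncoding` and `characterBoundaries(_:in:)` are hypothetical names; `codeUnits` itself is not a public API, so the sketch simply measures each Character with the standard `utf8` and `utf16` views.

```swift
/// Hypothetical stand-ins for the native encoding a String might use.
enum CodeUnitEncoding { case utf8, utf16 }

/// Code-unit offsets of every Character boundary in `s` (including the end),
/// as they would be measured if `encoding` were the native encoding.
func characterBoundaries(_ s: String, in encoding: CodeUnitEncoding) -> [Int] {
    var bounds = [0]
    for ch in s {
        let piece = String(ch)
        let width = encoding == .utf8 ? piece.utf8.count : piece.utf16.count
        bounds.append(bounds[bounds.count - 1] + width)
    }
    return bounds
}
```

For `"e\u{301}x"` the boundaries are `[0, 3, 4]` in UTF-8 and `[0, 2, 3]` in UTF-16, so code-unit offset 2 is Character-aligned only when the code units are UTF-16. Which offsets count as misaligned depends on the encoding, which is why the examples should be read against a generic `s.codeUnits`.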
>> Tying this to the underlying encoding actually reflects the true nature of
>> String, which is exposed by the semantics of concatenation and range
>> replacement, where multiple elements may merge into one element.  As
>> stated in
>> https://github.com/apple/swift/blob/master/docs/StringManifesto.md#string-should-be-a-collection-of-characters-again
>> the elements of a String (or of any of its views other than native code
>> units) are an emergent property.  To anyone operating at Unicode scalar
>> granularity (which can result in misalignment with respect to
>> characters) or at the higher granularity of code units (native or
>> transcoded, which can result in misalignment with all other views), I
>> think this is actually unsurprising.
>>
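The point that elements can merge under concatenation can be checked with the standard String views. This sketch uses a hypothetical helper, `viewCounts(_:_:)`, to report the counts of `a`, `b`, and `a + b` in three views; concatenating `"e"` with a lone U+0301 COMBINING ACUTE ACCENT shows scalars and UTF-16 code units adding up while Characters do not.

```swift
/// Element counts of `a`, `b`, and `a + b` in three views of String.
func viewCounts(_ a: String, _ b: String)
    -> (characters: [Int], scalars: [Int], utf16: [Int]) {
    let joined = a + b
    return (
        characters: [a.count, b.count, joined.count],
        scalars: [a.unicodeScalars.count, b.unicodeScalars.count,
                  joined.unicodeScalars.count],
        utf16: [a.utf16.count, b.utf16.count, joined.utf16.count]
    )
}
```

With `viewCounts("e", "\u{301}")`, each input is one Character, yet the result is still one Character, because the accent attaches to the preceding `"e"`. The scalar and UTF-16 counts go from 1 and 1 to 2, so only the code-unit and scalar views are additive under concatenation.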
>
> That's fair. If this is critical to the semantics, though, and you expect
> that some people will operate at that granularity, it seems incongruous
> that s.codeUnits isn't actually exposed to the user, even if only as a
> type-erased AnyCollection.

I agree.  Exposing .codeUnits is part of the longer-term plan, but I'm
trying to keep mostly-orthogonal issues out of this proposal.
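For illustration only, a type-erased `codeUnits` along the lines suggested above might look like the following. Everything here is hypothetical: `NativeEncoding` and `codeUnits(assuming:)` are not Swift APIs, the caller has to name the encoding because the real native encoding is an implementation detail, and widening each unit to `UInt32` is just one way to give both encodings a single element type.

```swift
/// Hypothetical: encodings a `codeUnits` view might expose.
enum NativeEncoding { case utf8, utf16 }

extension String {
    /// Hypothetical sketch of a type-erased `codeUnits` view. Units are
    /// widened to UInt32 so that UTF-8 and UTF-16 share one element type.
    /// `map` copies the units into an array, which is fine for a sketch.
    func codeUnits(assuming encoding: NativeEncoding) -> AnyCollection<UInt32> {
        switch encoding {
        case .utf8:
            return AnyCollection(utf8.map { UInt32($0) })
        case .utf16:
            return AnyCollection(utf16.map { UInt32($0) })
        }
    }
}
```

For `"e\u{301}x"` this yields `[0x65, 0xCC, 0x81, 0x78]` assuming UTF-8 and `[0x65, 0x301, 0x78]` assuming UTF-16. A real design would more likely avoid the copy and the widening, but type erasure is what lets one property cover more than one possible encoding.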

>> > I’d like to propose an alternative that attempts to deliver on what
>> > I’ve called the second option above, somewhat similar to yours:
>> >
>> > A string index will notionally or actually keep track of the view
>> > in which it was originally aligned, be it utf8, utf16,
>> > unicodeScalars, or characters. A slicing operation str.xxx[idx]
>> > will behave as expected if idx is not misaligned with respect to
>> > str.xxx. If it is misaligned, the operation would instead be
>> > notionally String(str.yyy[idx...]).xxx.first!, where yyy is the
>> > original view in which idx was known to be aligned, provided that idx
>> > is not also misaligned with respect to str.yyy (as might be the case
>> > if idx was returned from an operation on a different string). If it
>> > is still misaligned, trap.
>>
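The alternative rule above can be sketched as a toy model for the characters view. This is a hypothetical illustration, not Swift's `String.Index`: `View`, `ToyIndex`, `elementStarts(of:in:)`, and `character(in:at:)` are invented names, positions are UTF-16 offsets, and only the utf16, unicodeScalars, and characters views are modeled.

```swift
/// Views in which a toy index can be known to be aligned.
enum View { case utf16, unicodeScalars, characters }

/// Hypothetical toy index: a UTF-16 offset plus the view that produced it.
/// This models the proposal above; it is not Swift's String.Index.
struct ToyIndex {
    var offset: Int      // position measured in UTF-16 code units
    var alignedIn: View  // view in which `offset` was originally aligned
}

/// UTF-16 offsets at which the elements of `view` begin in `s`.
func elementStarts(of s: String, in view: View) -> Set<Int> {
    var result = Set<Int>()
    var offset = 0
    switch view {
    case .utf16:
        result = Set(0..<s.utf16.count)
    case .unicodeScalars:
        for scalar in s.unicodeScalars {
            result.insert(offset)
            offset += scalar.value > 0xFFFF ? 2 : 1  // surrogate pair or single unit
        }
    case .characters:
        for ch in s {
            result.insert(offset)
            offset += String(ch).utf16.count
        }
    }
    return result
}

/// Toy `s.characters[i]` under the alternative rule: an aligned index reads
/// its Character; a misaligned one reads the first Character of
/// String(s.yyy[i...]), where yyy is `i.alignedIn`; otherwise it traps.
func character(in s: String, at i: ToyIndex) -> Character {
    var offset = 0
    for ch in s {
        if offset == i.offset { return ch }  // aligned with the characters view
        offset += String(ch).utf16.count
    }
    guard elementStarts(of: s, in: i.alignedIn).contains(i.offset) else {
        preconditionFailure("index is misaligned in its original view too")
    }
    let suffix = Array(s.utf16)[i.offset...]
    return String(decoding: suffix, as: UTF16.self).first!
}
```

For `"e\u{301}x"`, a scalar-aligned index at UTF-16 offset 1 reads the lone `"\u{301}"`, which is the partial-character result; the same offset tagged `.characters` traps, standing in for an index that came from a different string.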
>> That seems much more complicated than what I'm proposing, but maybe
>> that's because I haven't yet explained myself clearly enough.
>>
>
> I think I catch your drift, and I'm converging on your way of thinking
> here.

:-)

-- 
-Dave

