[swift-evolution] [Review] SE-0180: String Index Overhaul
Xiaodi Wu
xiaodi.wu at gmail.com
Wed Jun 14 13:07:10 CDT 2017
On Wed, Jun 14, 2017 at 12:01 PM, Dave Abrahams <dabrahams at apple.com> wrote:
>
> on Wed Jun 14 2017, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:
>
> > On Wed, Jun 14, 2017 at 09:26 Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
> >
> >> If we leave aside for a moment the nomenclature issue where everything in
> >> Foundation referring to a character is really referring to a Unicode
> >> scalar, Kevin’s example illustrates the whole problem in a nutshell,
> >> doesn’t it? In that example, we have a straightforward attempt to slice
> >> with a misaligned index. The totality of options here are:
> >>
> >> * return nil, an option the rejection of which is the premise of your
> >> proposal
> >> * return a partial character (i.e., \u{301}), an option which we haven’t
> >> yet talked about in this thread–seems like this could have simpler
> >> semantics, potentially yields garbage if the index is garbage but in the
> >> case of Kevin’s example actually behaves as the user might expect
>
> I think that's exactly what I was proposing in
> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170612/037466.html
>
> >> * return a whole character after “rounding down”–difficult semantics
> >> to define and explain, always results in a whole character but in the
> >> case of Kevin’s example gives an unexpected answer
> >> * return a whole character after “rounding up”–difficult semantics to
> >> define and explain, always results in a whole character but when the
> >> index is misaligned would result in a character or range of characters
> >> in which the index is not found
> >> * trap–simple semantics, never returns garbage, obvious disadvantage
> >> that execution will not proceed
> >>
> >> No clearly perfect answer here. However, _if_ we hew strictly to the
> >> stated premise of your proposal that failable APIs are awkward enough to
> >> justify a change, and moreover that the awkwardness is truly “needless”
> >> because of the rarity of misaligned index usage, then at face value
> >> trapping should be a perfectly acceptable solution.
> >>
> >> That Kevin’s example raises the specter of trapping being a realistic
> >> occurrence in currently working code actually suggests a challenge to your
> >> stated premise. If we accept that this challenge is a substantial one, then
> >> it’s not clear to me that abandoning failable APIs should be ruled out from
> >> the outset.
> >>
> >> However, if this desire to remove failable APIs remains strong then I
> >> wonder if the undiscussed second option above is worth at least some
> >> consideration.
> >>
> >
> > Having digested your revised proposed behavior a little better I see you’re
> > kind of getting at this exact issue, but I’m uncomfortable with how it’s so
> > tied to the underlying encoding, which is not guaranteed to be UTF-16 but
> > is assumed to be for the purposes of slicing.
>
> I think there's some confusion here; probably I have failed to explain
> myself. Today a String happens to always be UTF-16, but there's no
> intention to assume that it is UTF-16 for the purposes of slicing in the
> future. Any place you see something like s.utf16 in an example I've
> used to illustrate semantics should be interpreted as s.codeUnits,
> where codeUnits is a collection of code units for whatever the
> underlying encoding is.
>
> Tying this to underlying encoding actually reflects the true nature of
> String, which is exposed by the semantics of concatenation and range
> replacement (where multiple elements may merge into one element). As
> stated in
> https://github.com/apple/swift/blob/master/docs/StringManifesto.md#string-should-be-a-collection-of-characters-again
> the elements of a String (or any of its views other than native code
> units) are an emergent property. To anyone operating at Unicode scalar
> granularity (which can result in misalignment with respect to
> characters) or at the higher granularity of code units (native or
> transcoded, which can result in misalignment with all other views), I
> think this is actually unsurprising.
>
That's fair. If this is critical to the semantics, though, and you expect
that some people will operate at that granularity, it seems incongruous
that s.codeUnits isn't actually exposed to the user even if it'd be as a
type-erased AnyCollection.
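
To make the suggestion concrete, here is a minimal sketch of what a type-erased code-unit view could look like. The name codeUnitsSketch is hypothetical and is not part of the standard library or the proposal, and UTF-16 is used only as a stand-in for "whatever the underlying encoding is", on the assumption stated above that a String happens to be UTF-16 today. The sketch also shows the merging-on-concatenation behavior that makes Characters an emergent property.

```swift
extension String {
    // Hypothetical property for illustration only: a type-erased view of
    // the string's code units. UTF-16 is an assumption standing in for the
    // native encoding, which is not guaranteed.
    var codeUnitsSketch: AnyCollection<UInt16> {
        return AnyCollection(self.utf16)
    }
}

let base = "e"
let accent = "\u{301}"          // COMBINING ACUTE ACCENT on its own
let joined = base + accent      // decomposed "é"

// Each operand is one Character, yet the concatenation is also one
// Character: the two elements merge.
let countsBefore = base.count + accent.count
let countAfter = joined.count

// At code-unit granularity nothing merges; both units are still there.
let units = Array(joined.codeUnitsSketch)
```

One cost of erasing the type this way is that the erased collection is indexed by AnyIndex rather than String.Index, so such a view would not by itself let an index be carried between views.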
> > I’d like to propose an alternative that attempts to deliver on what
> > I’ve called the second option above–somewhat similar:
> >
> > A string index will notionally or actually keep track of the view in which
> > it was originally aligned, be it utf8, utf16, unicodeScalars, or
> > characters. A slicing operation str.xxx[idx] will behave as expected if idx
> > is not misaligned with respect to str.xxx. If it is misaligned, the
> > operation would instead be notionally String(str.yyy[idx...]).xxx.first!,
> > where yyy is the original view in which idx was known aligned–if idx is not
> > also misaligned with respect to str.yyy (as might be the case if idx was
> > returned from an operation on a different string). If it is still
> > misaligned, trap.
>
> That seems much more complicated than what I'm proposing, but maybe
> that's because I haven't yet explained myself clearly enough.
>
I think I catch your drift, and I'm converging on your way of thinking here.
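
For anyone following along, the options enumerated earlier in this thread can be sketched for a decomposed "é" (e followed by \u{301}) and an index that is aligned in unicodeScalars but not in characters. All helper names below are hypothetical and are not part of the proposal or the standard library. The partial-character helper spells out the notional String(str.yyy[idx...]).xxx.first expression with yyy = unicodeScalars and xxx = characters. Trapping is left out because it would simply stop execution.

```swift
// Hypothetical helpers for illustration only.

// "Return nil": succeed only if `i` is a Character boundary of `s`.
func characterOrNil(in s: String, at i: String.Index) -> Character? {
    return s.indices.contains(i) ? s[i] : nil
}

// "Partial character": the first Character of a string rebuilt from the
// scalars at and after `i` (assumes `i` is aligned in unicodeScalars).
func partialCharacter(in s: String, at i: String.Index) -> Character? {
    var tail = String.UnicodeScalarView()
    tail.append(contentsOf: s.unicodeScalars[i...])
    return String(tail).first
}

// "Rounding down": the Character starting at the last boundary <= i.
func roundedDown(in s: String, at i: String.Index) -> Character? {
    guard let start = s.indices.last(where: { $0 <= i }) else { return nil }
    return s[start]
}

// "Rounding up": the Character starting at the first boundary >= i.
func roundedUp(in s: String, at i: String.Index) -> Character? {
    guard let start = s.indices.first(where: { $0 >= i }) else { return nil }
    return s[start]
}

let s = "e\u{301}x"
let scalars = s.unicodeScalars
// Index of U+0301: a scalar boundary, but inside the first Character.
let mid = scalars.index(after: scalars.startIndex)

let failable = characterOrNil(in: s, at: mid)    // nil
let partial = partialCharacter(in: s, at: mid)   // the lone U+0301
let down = roundedDown(in: s, at: mid)           // the whole "é"
let up = roundedUp(in: s, at: mid)               // "x", which does not contain `mid`
```

The rounding-up result shows the oddity noted in the list: the returned character is one in which the index is not found.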