[swift-evolution] [swift-evolution-announce] [Revised and review extended] SE-0180 - String Index Overhaul

Mon Jun 26 17:43:25 CDT 2017

> On 23. Jun 2017, at 02:59, Kevin Ballard via swift-evolution <swift-evolution at swift.org> wrote:
> 
> https://github.com/apple/swift-evolution/blob/master/proposals/0180-string-index-overhaul.md
> 
> Given the discussion in the original thread about potentially having Strings backed by something other than utf16 code units, I'm somewhat concerned about having this kind of vague `encodedOffset` that happens to be UTF16 code units. If this is supposed to represent an offset into whatever code units the String is backed by, then it's going to be a problem because the user isn't supposed to know or care what the underlying storage for the String is.

Is that true? The String manifesto shows a design where the underlying Encoding and code units are exposed.

From the talk about Strings being backed by something that isn’t UTF-16, I took that to mean that String might one day become generic. Defaults for generic parameters have been mentioned on the list before, so “String” could still refer to “String<UTF16Encoding>” on macOS and maybe “String<UTF8Encoding>” on Linux.

I would support a definition of encodedOffset that removed mention of UTF-16 and was phrased in terms of String.Encoding and code units. For example, I would like to be able to construct a new String index from a known index plus a number of code units known to represent a sequence of characters:

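var stringOne = "Hello,"
let stringTwo = " world"

// Capture the end of the original string before appending.
var idx = stringOne.endIndex
stringOne.append(contentsOf: stringTwo)
// `codeUnits` is the code-unit view from the String manifesto design;
// advance the old end index by the number of code units appended.
idx = String.Index(encodedOffset: idx.encodedOffset + stringTwo.codeUnits.count)
assert(idx == stringOne.endIndex)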
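
For comparison, here is a minimal sketch of the same round trip written against the proposal as it currently stands. It assumes that `encodedOffset` counts UTF-16 code units and that `String.Index(encodedOffset:)` is available as proposed. The names `greeting`, `suffix`, `oldEndOffset` and `newEnd` are only illustrative.

```swift
// Sketch only: assumes `encodedOffset` counts UTF-16 code units, as the
// proposal currently states. ASCII literals keep the example unambiguous.
var greeting = "Hello,"
let suffix = " world"

// Remember where the original string ended, as a code-unit offset.
let oldEndOffset = greeting.endIndex.encodedOffset
greeting.append(contentsOf: suffix)

// Advance by the number of UTF-16 code units that were appended.
let newEnd = String.Index(encodedOffset: oldEndOffset + suffix.utf16.count)
assert(newEnd == greeting.endIndex)
```

Phrasing the offset in terms of the string's own code units would let the same pattern work without hard-coding the `utf16` view.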

- Karl