[swift-evolution] [swift-evolution-announce] [Revised and review extended] SE-0180 - String Index Overhaul

Tue Jun 27 19:06:16 CDT 2017

On Tue Jun 27 2017, Drew Crawford <swift-evolution at swift.org> wrote:

> On June 26, 2017 at 5:43:42 PM, Karl Wagner via swift-evolution
> (swift-evolution at swift.org) wrote:
>
>> I would support a definition of encodedOffset that removed mention of
>> UTF-16 and phrased things in terms of String.Encoding and
>> code units. For example, I would like to be able to construct new
>> String indices from a known index plus a quantity of code units known
>> to represent a sequence of characters:
>>
>> var stringOne = "Hello,"
>> let stringTwo = " world"
>>
>> var idx = stringOne.endIndex
>> stringOne.append(contentsOf: stringTwo)
>> idx = String.Index(encodedOffset: idx.encodedOffset + stringTwo.codeUnits.count)
>> assert(idx == stringOne.endIndex)
>
> I second this concern.  We currently use a non-Foundation library that prefers UTF-8 encoding, and I
> think UTF-8-backed strings are important.
>
> The choice of UTF-16 as string storage in Swift makes historical sense
> (e.g. runtime interop with ObjC-backed strings), but as Swift moves
> forward it makes less sense.  We need a string system that behaves
> more like a lightweight accessor for the underlying storage (e.g. if
> you like your input's encoding you can keep it) unless you do
> something (like peruse a view) that requires promotion to a new
> format.  That's a different proposal, but that's the direction I'd
> like to see us head.
>
> This proposal is in many ways the opposite of that: it specifies that
> we standardize on UTF-16

Where did anyone get that idea?  That is not at all the intention.  The
proposal mentions UTF-16 only to *acknowledge* that today all strings
have a UTF-16-compatible encoding, so that people can understand what
encodedOffset will mean in practice with today's string implementation.
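For illustration, a minimal sketch of what that "in practice" reading implies for Karl's round-trip, assuming Swift 4 or later: if offsets are counted in UTF-16 code units, advancing the old end by the suffix's UTF-16 length lands exactly on the new end. The helpers utf16Offset(of:in:) and stringIndex(atUTF16Offset:in:) are hypothetical names built only on the standard String.UTF16View; they are not the proposal's encodedOffset API, and the emoji suffix is just an illustrative non-ASCII choice.

```swift
// Hypothetical helpers (not part of SE-0180): they measure offsets in
// UTF-16 code units through String.UTF16View rather than encodedOffset.

/// Distance from the start of `s` to `i`, counted in UTF-16 code units.
func utf16Offset(of i: String.Index, in s: String) -> Int {
    return s.utf16.distance(from: s.utf16.startIndex, to: i)
}

/// The index located `n` UTF-16 code units past the start of `s`.
func stringIndex(atUTF16Offset n: Int, in s: String) -> String.Index {
    return s.utf16.index(s.utf16.startIndex, offsetBy: n)
}

// Karl's example, with a non-ASCII suffix.
var stringOne = "Hello,"
let stringTwo = " 👍"  // 2 Characters, 3 UTF-16 code units, 5 UTF-8 code units

let oldEnd = utf16Offset(of: stringOne.endIndex, in: stringOne)  // 6
stringOne.append(contentsOf: stringTwo)

// Advancing by the suffix's UTF-16 length lands exactly on the new end.
let idx = stringIndex(atUTF16Offset: oldEnd + stringTwo.utf16.count, in: stringOne)
assert(idx == stringOne.endIndex)

// The suffix's length differs depending on which code unit is counted.
assert(stringTwo.utf16.count == 3 && stringTwo.utf8.count == 5)
```

Counting the suffix in UTF-8 code units instead (6 + 5 = 11) would run past the end of the 9-code-unit UTF-16 view, which is why the unit an offset is measured in has to be pinned down.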

-- 
-Dave