[swift-evolution] Strings in Swift 4
tony.allevato at gmail.com
Fri Jan 20 10:35:34 CST 2017
I'm excited to see this taking shape. Thanks for all the hard work putting
A few random thoughts I had while reading it:
* You talk about an integer `codeUnitOffset` property for indexes. Since
the current String implementation can switch between backing storage of
ASCII or UTF-16 depending on the content of the string and how it's
obtained, presumably this means that integer is not necessarily the same as
the byte offset into the buffer, correct? (In other words, for a UTF-16-stored
string, you would have to multiply it by 2 to get the byte offset.)
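A rough sketch of the distinction being asked about, assuming a Swift 4 or later toolchain. The proposed `codeUnitOffset` property is not used here; the offset is computed with `distance(from:to:)` on the UTF-16 view instead, and the byte arithmetic assumes a hypothetical contiguous buffer of code units rather than anything the standard library actually exposes.

```swift
let text = "hello"
let utf16 = text.utf16

// Index of the third UTF-16 code unit (the first "l").
let index = utf16.index(utf16.startIndex, offsetBy: 2)

// Offset measured in code units, independent of how the string is stored.
let codeUnitOffset = utf16.distance(from: utf16.startIndex, to: index)

// Assumption: a UTF-16 buffer spends MemoryLayout<UInt16>.size bytes per code
// unit, while an ASCII buffer spends one byte per code unit.
let byteOffsetIfUTF16 = codeUnitOffset * MemoryLayout<UInt16>.size
let byteOffsetIfASCII = codeUnitOffset * MemoryLayout<UInt8>.size
```

The same code unit offset maps to different byte offsets depending on the storage, which is the ambiguity the question is about.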
* You discuss the possibility of exposing some String methods, like
`uppercase()`, on Character. Since Swift abstracts away the encoding, it
seems like Characters are essentially Strings that are enforced at runtime
(and sometimes at compile time, in the case of initialization from
literals) to contain exactly 1 grapheme cluster. Given that, I think it
would be worthwhile for Character to support *any* method on String that
makes sense for a single character: case transformations
(though perhaps not titlecase?), accessing its UTF-8 or UTF-16 views, and
so forth. I would ask whether it makes sense to have a shared protocol
between Character and String that defines those methods, but I'll defer on
that because it feels like it would be a "bag of methods" rather than
On that same point, if I have a lightweight (<= 63 bits) Character, many of
those operations can currently be performed only by constructing a String
from it, which incurs a time and heap allocation penalty. (And indeed,
there are TODOs in the code base to avoid doing such things internally, in
the case of Character comparisons.) Which leads me to my next thought,
since I've been doing a lot with Swift String performance lately...
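To illustrate the round trip described above, here is a minimal sketch. The names `uppercasedViaString()` and `utf8Bytes` are hypothetical helpers invented for this example, not standard library APIs, and the sketch uses the `uppercased()` spelling that String has in Swift 3 and later.

```swift
extension Character {
    // Hypothetical helper: uppercase by going through a temporary String.
    // The result is a String because case mapping can change the length.
    func uppercasedViaString() -> String {
        return String(self).uppercased()
    }

    // Hypothetical helper: the character's UTF-8 code units, again obtained
    // by constructing a String first.
    var utf8Bytes: [UInt8] {
        return Array(String(self).utf8)
    }
}

let letter: Character = "a"
let upper = letter.uppercasedViaString()
let bytes = letter.utf8Bytes
```

Both helpers build an intermediate String, which is the cost in question.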
* Currently, Character and String have divergent internal implementations.
A Character can be "small" (<= 63 bits in UTF-8 packed into an integer) or
"large" (> 63 bits with a heap-allocated buffer). Strings are just backed
by a heap-allocated buffer. In this write-up, you say "Many strings are
short enough to store in 64 bits"—not just characters. If that's the case,
can those optimizations be lowered into _StringCore (or its new-world
counterpart), which would allow both Characters *and* small Strings to reap
the benefits of the more efficient implementation? This would let
Characters get implementations of common methods like `uppercase()` for
free, and there would be a zero-cost conversion from Characters to Strings.
The only real difference between the types would be the APIs they vend, the
semantic concept that they represent to users, and validation.
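As a sketch of what a shared small/large representation could look like: the `SketchStorage` type and its layout (up to 7 UTF-8 bytes plus a count, packed into one integer) are assumptions made for illustration and are not the actual `_StringCore` or Character layout.

```swift
// Hypothetical storage type; the layout is illustrative only.
enum SketchStorage {
    case small(UInt64)   // up to 7 UTF-8 bytes, with the count in the top byte
    case large([UInt8])  // stands in for a heap-allocated buffer

    init(_ string: String) {
        let bytes = Array(string.utf8)
        guard bytes.count <= 7 else {
            self = .large(bytes)
            return
        }
        // Bits 56 and up hold the byte count; bits 0..<56 hold the bytes.
        var packed = UInt64(bytes.count) << 56
        for (i, byte) in bytes.enumerated() {
            packed |= UInt64(byte) << UInt64(8 * i)
        }
        self = .small(packed)
    }

    var isSmall: Bool {
        if case .small = self { return true }
        return false
    }

    // Recover the UTF-8 bytes from either representation.
    var utf8: [UInt8] {
        switch self {
        case .small(let packed):
            let count = Int(packed >> 56)
            return (0..<count).map {
                UInt8(truncatingIfNeeded: packed >> UInt64(8 * $0))
            }
        case .large(let bytes):
            return bytes
        }
    }
}

let short = SketchStorage("hello")
let long = SketchStorage("a considerably longer string")
```

With one storage type like this underneath, a Character and a short String could share the packed case, and converting between them would not need an allocation.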
* The talk about implicit conversions between Substring and String bums me
out, even though I see the importance of it in this context and know that
it outweighs the alternatives. Given that the Swift team seems to prefer
explicit to implicit conversions in general, I would hope that if they feel
it's important enough to make a special case for the standard library, it
could be a language feature that you'd consider making available to anyone.
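For comparison, this is roughly what the explicit route looks like, assuming a Swift 4 or later toolchain in which slicing a String yields a Substring. The function `byteCount(of:)` is a hypothetical stand-in for any API that accepts only String.

```swift
// Hypothetical API that accepts only String.
func byteCount(of string: String) -> Int {
    return string.utf8.count
}

let greeting = "Hello, world"
let slice = greeting.prefix(5)  // Substring in Swift 4 and later

// Explicit conversion: copies the slice's contents into a new String.
let count = byteCount(of: String(slice))
```

An implicit conversion would let `byteCount(of: slice)` compile without the `String(...)` wrapper.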
On Fri, Jan 20, 2017 at 7:35 AM Ben Cohen via swift-evolution <
swift-evolution at swift.org> wrote:
> On Jan 19, 2017, at 10:42 PM, Jose Cheyo Jimenez <cheyo at masters3d.com>
> I just have one concern about the slice of a string being called
> Substring. Why not StringSlice? The word substring can mean so many things,
> especially in Cocoa.
> This idea has a lot of merit, as does the option of not giving them a
> top-level name at all e.g. they could be String.Slice or
> String.SubSequence. It would underscore that they really aren’t meant to be
> used except as the result of a slicing operation or to efficiently pass a
> slice. OTOH, Substring is a term of art so can help with clarity.