[swift-evolution] [Pitch] String revision proposal #1

Thu Mar 30 15:44:12 CDT 2017

> In order to be able to write extensions across both String and Substring, a new Unicode protocol to which the two types will conform will be introduced. For the purposes of this proposal, Unicode will be defined as a protocol to be used whenever you would previously extend String. It should be possible to substitute extension Unicode { ... } in Swift 4 wherever extension String { ... } was written in Swift 3, with one exception: any passing of self into an API that takes a concrete String will need to be rewritten as String(self). If Self is a String then this should effectively optimize to a no-op, whereas if Self is a Substring then this will force a copy, helping to avoid the “memory leak” problems described above.

Did you consider an AnyUnicode<Encoding> wrapper? Then we could have a typealias called “AnyString”.

Also, regarding naming: “Unicode” would be great if this were a namespace, and this proposal is a great example of why protocol nesting is badly needed in Swift code that defines (not even very complex) protocols. However, absent protocol nesting, I think “UnicodeEncoded” is better. It doesn’t roll off the tongue as nicely, perhaps, but it also doesn’t look as weird when written in code.
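
To make the quoted substitution rule concrete, here is a minimal sketch. It assumes the standard library's StringProtocol as a stand-in for the pitched Unicode protocol (whatever name it ends up with), and takesConcreteString is a hypothetical API invented for illustration.

```swift
// Stand-in sketch: `StringProtocol` plays the role of the pitched `Unicode` protocol.

// Hypothetical API that only accepts a concrete String.
func takesConcreteString(_ s: String) -> Int {
    return s.count
}

extension StringProtocol {
    // Available on both String and Substring.
    func countViaConcreteAPI() -> Int {
        // `self` cannot be passed directly; it has to be wrapped as String(self).
        return takesConcreteString(String(self))
    }
}

let whole = "hello world"
let part = whole.prefix(5)          // a Substring sharing storage with `whole`

print(whole.countViaConcreteAPI())  // 11
print(part.countViaConcreteAPI())   // 5
```

The only change from an `extension String` version is the explicit `String(self)` at the call into the concrete API.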

> The exact nature of the protocol, such as which methods should be protocol requirements vs. which can be implemented as protocol extensions, is considered an implementation detail and so is not covered in this proposal.
> 
I’d hope they do get a proposal at some stage, though. There are cases where I’d like to be able to write my own “Unicode” type and take advantage of generic (and existential when we can) text processing.

For example, maybe the thing I want to present as a single block of text is actually pieced together from multiple discontiguous regions of a buffer (i.e. the “buffer-gap” approach for faster random insertions/deletions, if I expect my code to be doing lots of that).
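
A rough sketch of what I mean, assuming a simplified two-array variant of the idea rather than a real single-allocation gap buffer. GapText and all of its members are hypothetical names, and it conforms only to BidirectionalCollection because the pitched protocol's requirements are not specified yet.

```swift
// Hypothetical text storage split into two regions around a cursor.
struct GapText: BidirectionalCollection {
    // Characters before the cursor, in order.
    private var head: [Character] = []
    // Characters after the cursor, stored in reverse so the cursor side is cheap to edit.
    private var tailReversed: [Character] = []

    init(_ text: String) {
        head = Array(text)
    }

    var startIndex: Int { return 0 }
    var endIndex: Int { return head.count + tailReversed.count }
    func index(after i: Int) -> Int { return i + 1 }
    func index(before i: Int) -> Int { return i - 1 }

    // Present the two regions as one logical block of text.
    subscript(position: Int) -> Character {
        if position < head.count {
            return head[position]
        }
        return tailReversed[tailReversed.count - 1 - (position - head.count)]
    }

    // Move the cursor (the "gap") to the given offset.
    mutating func moveCursor(to offset: Int) {
        while head.count > offset { tailReversed.append(head.removeLast()) }
        while head.count < offset { head.append(tailReversed.removeLast()) }
    }

    // Insert at the cursor without shifting the text after it.
    mutating func insert(_ c: Character) {
        head.append(c)
    }
}

var text = GapText("helo")
text.moveCursor(to: 3)
text.insert("l")
print(String(text))  // hello
```

Generic text-processing code written against a protocol could walk this type directly, without first flattening it into a String.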

You could imagine that if something like CoreText (can’t speak for them, of course) were being rewritten in Swift, it would be able to compute layouts and render glyphs from any provider of Unicode data and not just String or Substring. I mean, that’s my dream, anyway. It would mean you could go directly from a buffer-gap String to a rendered bitmap suitable for UI.

> Unicode will conform to BidirectionalCollection. RangeReplaceableCollection conformance will be added directly onto the String and Substring types, as it is possible future Unicode-conforming types might not be range-replaceable (e.g. an immutable type that wraps a const char *).
> 
+1. Keep the protocol focussed.
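
As an illustration of the kind of type the quoted text describes, here is a sketch of a read-only view over a NUL-terminated C string. CStringView is a hypothetical name; it is a BidirectionalCollection of UTF-8 code units with no mutating API, so RangeReplaceableCollection would make no sense for it. The view is only valid while the underlying pointer is.

```swift
// Hypothetical immutable view over a `const char *`.
struct CStringView: BidirectionalCollection {
    private let base: UnsafePointer<CChar>
    private let length: Int

    init(_ cString: UnsafePointer<CChar>) {
        base = cString
        // Scan for the NUL terminator once, up front.
        var n = 0
        while cString[n] != 0 { n += 1 }
        length = n
    }

    var startIndex: Int { return 0 }
    var endIndex: Int { return length }
    func index(after i: Int) -> Int { return i + 1 }
    func index(before i: Int) -> Int { return i - 1 }

    // Expose the bytes as unsigned code units.
    subscript(position: Int) -> UInt8 {
        return UInt8(bitPattern: base[position])
    }
}

let decoded = "abc".withCString { (ptr: UnsafePointer<CChar>) -> String in
    let view = CStringView(ptr)
    // Decode the code units without copying them into an intermediate array.
    return String(decoding: view, as: UTF8.self)
}
print(decoded)  // abc
```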

> The standard library currently lacks a Latin1 codec, so an enum Latin1: UnicodeEncoding type will be added.
> 
I feel this is a call for better naming somewhere.
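
For context on what such a codec has to do: Latin-1 bytes map one-to-one onto the first 256 Unicode scalars, so decoding is a direct widening. The sketch below is a free function with a hypothetical name, not the pitched Latin1 type.

```swift
// Hypothetical helper: decode Latin-1 (ISO 8859-1) bytes into a String.
func decodeLatin1(_ bytes: [UInt8]) -> String {
    var scalars = String.UnicodeScalarView()
    for byte in bytes {
        // Every Latin-1 byte value is also a valid Unicode scalar value.
        scalars.append(Unicode.Scalar(byte))
    }
    return String(scalars)
}

let cafe = decodeLatin1([0x63, 0x61, 0x66, 0xE9])
print(cafe)  // café
```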

>   init<Encoding: UnicodeEncoding>(
>     cString nulTerminatedCodeUnits: UnsafePointer<Encoding.CodeUnit>,
>     encoding: Encoding)

So will this replace the initializers that Foundation adds to String, which also decode a C string into a Swift String?

Foundation includes more encodings (and also nests an “Encoding” enum in String itself, which makes things even more confusing), but totally ignores standard library decoders in favour of CF ones.
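
For comparison, a sketch of decoding NUL-terminated code units using only the standard library. It uses `String(decodingCString:as:)` as a stand-in for the pitched initializer, so the argument labels differ from the quoted signature, and the byte arrays are made-up sample data.

```swift
// "hi" followed by a NUL terminator, as UTF-8 code units.
let utf8Units: [UInt8] = [0x68, 0x69, 0]
let fromUTF8 = utf8Units.withUnsafeBufferPointer { buffer in
    // The code unit type of the pointer must match the encoding's CodeUnit.
    String(decodingCString: buffer.baseAddress!, as: UTF8.self)
}

// The same text as UTF-16 code units; only the encoding argument changes.
let utf16Units: [UInt16] = [0x68, 0x69, 0]
let fromUTF16 = utf16Units.withUnsafeBufferPointer { buffer in
    String(decodingCString: buffer.baseAddress!, as: UTF16.self)
}

print(fromUTF8, fromUTF16)  // hi hi
```

Because the encoding is a generic parameter rather than a runtime enum value, the pointer's element type is checked against it at compile time.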

- Karl