[swift-evolution] Strings in Swift 4

Fri Jan 20 13:31:12 CST 2017

Very nice improvements overall!

> To ease the pain of type mismatches, Substring should be a subtype of String in the same way that Int is a subtype of Optional<Int>. This would give users an implicit conversion from Substring to String, as well as the usual implicit conversions such as [Substring] to [String] that other subtype relationships receive.

As others have said, it would be nice for this to be more general. Perhaps we can have a special type or protocol, something like RecursiveSlice?

> A Substring passed where String is expected will be implicitly copied. When compared to the “same type, copied storage” model, we have effectively deferred the cost of copying from the point where a substring is created until it must be converted to Stringfor use with an API.

Could noescape parameters/new memory model with borrowing make this more general? Again it seems very useful for all kinds of Collections.
> The “Empty Subscript”
> 
Empty subscript seems weird. IMO, it’s because of the asymmetry between subscripts and computed properties. I would favour a model which unifies computed properties and subscripts (e.g. computed properties could return “addressors” for in-place mutation).
Maybe this could be an “entireCollection”/“entireSlice" computed property?

> The goal is that Unicode exposes the underlying encoding and code units in such a way that for types with a known representation (e.g. a high-performance UTF8String) that information can be known at compile-time and can be used to generate a single path, while still allowing types like String that admit multiple representations to use runtime queries and branches to fast path specializations.

Typo: “unicodeScalars" is in the protocol twice.

If I understand it, CodeUnits is the thing which should always be defined by conformers to Unicode, and UnicodeScalars and ExtendedASCII could have default implementations (for example, UTF8String/UTF16String/3rd party conformers will use those), and String might decide to return its native buffer (e.g. if Encoding.CodeUnit == UnicodeScalar).

I’m just wondering how difficult it would be for a 3rd-party type to conform to Unicode. If you’re developing a text editor, for example, it’s possible that you may need to implement your own String-like type with some optimised storage model and it would be nice to be able to use generic algorithms with them. I’m thinking that you will have some kind of backing buffer, and you will want to expose regions of that to clients as Strings so that they can render them for UI or search through them, etc, without introducing a copy just for the semantic understanding that this data region contains some text content.

I’ll need to examine the generic String idea more, but it’s certainly very interesting...

> Indexes

One thing which I think it critical is the ability to advance an index by a given number of codeUnits. I was writing some code which interfaced with the Cocoa NSTextStorage class, tagging parts of a string that a user was editing. If this was an Array, when the user inserts some elements before your stored indexes, those indexes become invalid but you can easily advance by the difference to efficiently have your indexes pointing to the same characters.

Currently, that’s impossible with String. If the user inserts a string at a given index, your old indexes may not even point to the start of a grapheme cluster any more, and advancing the index is needlessly costly. For example:

var characters = "This is a test".characters
assert(characters.count == 14)

// Store an index to something.
let endBeforePrepending = characters.endIndex

// Insert some characters somewhere.
let insertedCharacters = "[PREPENDED]".characters
assert(insertedCharacters.count == 11)
characters.replaceSubrange(characters.startIndex..<characters.startIndex, with: insertedCharacters)

// This isn’t really correct.
let endAfterPrepending = characters.index(endBeforePrepending, offsetBy: insertedCharacters.count)
assert(endAfterPrepending == characters.endIndex) // Fails Anyway. 24 != 25

The manifesto is correct to emphasise machine processing of Strings, but it should also ensure that machine processing of mutable Strings is efficient. That way we can tag backing-Strings inside user-interface components and maintain those indices in a unicode-safe way.

The way to solve this would be that, when replacing or removing a portion of a String, you learn how many CodeUnits in the receiver’s encoding were inserted/removed so you can shift your indexes accordingly.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170120/d3201cef/attachment.html>