[swift-evolution] [Pitch] String revision proposal #1

Brent Royal-Gordon brent at architechies.com
Thu Mar 30 20:52:07 CDT 2017

> On Mar 30, 2017, at 2:36 PM, Ben Cohen <ben_cohen at apple.com> wrote:
> The big win for Unicode is it is short. We want to encourage people to write their extensions on this protocol. We want people who previously extended String to feel very comfortable extending Unicode. It also helps emphasize how important the Unicode-ness of Swift.String is. I like the idea of Unicode.Collection, but it is a little intimidating and making it even a tiny bit intimidating is worrying to me from an adoption perspective. 

Yeah, I understand why "Collection" might be intimidating. But I think "Unicode" would be too—it's opaque enough that people wouldn't be entirely sure whether they were extending the right thing.

I did a quick run-through of different languages and the protocols/interfaces/whatever their string types conform to, but most don't seem to have anything that abstracts string types. The only similar things I could find were `CharSequence` in Java, `StringLike` in Scala...and `Stringy` in Perl 6. And I'm sure you thought you were joking!

Honestly, I'd recommend just going with `StringProtocol` unless you can come up with an adjective form you like (`Stringlike`? `Textual`?). It's a bit clumsy, but it's crystal clear. Stupid name, but you'll never forget it.
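For reference, Swift 4 did end up shipping a protocol with exactly this name, with `String` and `Substring` both conforming. A minimal sketch of the kind of extension the name invites; `isBlank` is a hypothetical helper, not a standard-library member:

```swift
// `isBlank` is a hypothetical helper, not part of the standard library.
extension StringProtocol {
    // True when every Character is whitespace (an empty string counts as blank).
    var isBlank: Bool {
        return allSatisfy { $0.isWhitespace }
    }
}
```

Because the extension is on the protocol, `line.prefix(2).isBlank` works directly on a `Substring` without first copying it into a `String`.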

>> I'm a little worried about this because it seems to imply that the protocol cannot include any mutation operations that aren't in `RangeReplaceableCollection`. For instance, it won't be possible to include an in-place `applyTransform` method in the protocol. Do you anticipate that being an issue? Might it be a good idea to define a parallel `Mutable` or `RangeReplaceable` protocol?
> You can always assign to self. Then provide more efficient implementations where Self: RangeReplaceableCollection. We do this elsewhere in the std lib with collections, e.g. <https://github.com/apple/swift/blob/master/stdlib/public/core/Collection.swift#L1277>.
> Proliferating protocol combinations is problematic (looking at you, BidirectionalMutableRandomAccessSlice).

Nobody likes proliferation, but in this case it'd be because there genuinely were additional semantics that were only available on mutable strings.
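A minimal sketch of the "assign to self, then specialize where `Self: RangeReplaceableCollection`" pattern. `TextProtocol`, its two requirements, `terminate(with:)`, and `FrozenText` are all hypothetical stand-ins, not proposed API:

```swift
// Hypothetical stand-in for the proposed string protocol; these two
// requirements exist only so the generic fallback can rebuild a value.
protocol TextProtocol {
    var text: String { get }
    init(text: String)
}

extension TextProtocol {
    // Generic fallback: build a new value, then assign it to self.
    mutating func terminate(with suffix: Character) {
        self = Self(text: text + String(suffix))
    }
}

extension TextProtocol where Self: RangeReplaceableCollection, Self.Element == Character {
    // More specific overload: append in place instead of rebuilding.
    mutating func terminate(with suffix: Character) {
        append(suffix)
    }
}

// Not range-replaceable, so only the fallback applies.
struct FrozenText: TextProtocol {
    let text: String
    init(text: String) { self.text = text }
}

// String is range-replaceable, so it picks up the in-place overload.
extension String: TextProtocol {
    var text: String { return self }
    init(text: String) { self = text }
}
```

Neither `terminate(with:)` is a protocol requirement, so the choice is made statically: inside a function generic over a plain `T: TextProtocol`, the rebuild-and-assign fallback runs even when `T` is `String`. Declaring the method as a requirement would route calls through the conformance instead, which is the trade-off against adding a separate mutable protocol.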

(Once upon a time, I think I requested the ability to write `func index(of elem: Iterator.Element) -> Index? where Iterator.Element: Equatable`. Could such a feature be used for this? `func apply(_ transform: StringTransform, reverse: Bool) where Self: RangeReplaceableCollection`?)
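The `index(of:)` shape described here did become expressible in Swift 5.3 (SE-0267, `where` clauses on contextually generic declarations). A sketch, under a hypothetical name chosen to avoid colliding with the standard library's `firstIndex(of:)`:

```swift
// Needs Swift 5.3 or later (SE-0267): a `where` clause attached to one
// member of an otherwise unconstrained protocol extension.
extension Collection {
    // Hypothetical helper; only callable when Element is Equatable.
    func position(of element: Element) -> Index? where Element: Equatable {
        var i = startIndex
        while i != endIndex {
            if self[i] == element { return i }
            formIndex(after: &i)
        }
        return nil
    }
}
```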

>>> The C string interop methods will be updated to those described here: a single withCString operation and two init(cString:) constructors, one for UTF8 and one for arbitrary encodings.
>> Sorry if I'm repeating something that was already discussed, but is there a reason you don't include a `withCString` variant for arbitrary encodings? It seems like an odd asymmetry.
> Hmm. Is this a common use-case people have? Symmetry for the sake of it doesn’t seem enough. If uncommon, you can do it via an Array that you nul-terminate manually.

Is `init(cString:encoding:)` a common use case? If it is, I'm not sure why the opposite wouldn't be.
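The array-based workaround mentioned above might look like this for UTF-16. `withNulTerminatedUTF16` is a hypothetical helper; later standard libraries also provide `String.withCString(encodedAs:_:)` for this case.

```swift
// Hypothetical helper: hands `body` a NUL-terminated UTF-16 buffer.
// A NUL inside `string` would end the C string early.
func withNulTerminatedUTF16<R>(_ string: String,
                               _ body: (UnsafePointer<UInt16>) throws -> R) rethrows -> R {
    var units = Array(string.utf16)
    units.append(0)  // the manual terminator
    return try units.withUnsafeBufferPointer { buffer in
        // Non-nil: the array always holds at least the terminator.
        try body(buffer.baseAddress!)
    }
}
```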

> Yeah, it’s tempting to make ParseResult general, and the only reason we held off is because we don’t want making sure it’s generally useful to be a distraction.


I wonder if some part of the parsing algorithm could somehow be generalized so it was suitable for many purposes and then put on `Collection`, with the `UnicodeEncoding` then being passed as a parameter to it. If so, that would justify making `ParseResult` a top-level type.
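A minimal sketch of what a top-level `ParseResult` and a `Collection`-level driver could look like, with the element parser passed in as a parameter. All names here are hypothetical; the case names follow Swift 4's `Unicode.ParseResult`, and the digit parser exists only to show the type being useful outside Unicode:

```swift
// Hypothetical top-level result type.
enum ParseResult<T> {
    case valid(T)
    case emptyInput
    case error(length: Int)
}

// Hypothetical non-Unicode client: parses one ASCII decimal digit per call.
func parseDigit<I: IteratorProtocol>(from input: inout I) -> ParseResult<Int>
    where I.Element == UInt8 {
    guard let byte = input.next() else { return .emptyInput }
    let zero = UInt8(ascii: "0"), nine = UInt8(ascii: "9")
    guard byte >= zero && byte <= nine else { return .error(length: 1) }
    return .valid(Int(byte - zero))
}

extension Collection {
    // Generic driver on Collection; the element-level parser is a parameter.
    func parseAll<T>(using parseOne: (inout Iterator) -> ParseResult<T>)
        -> (values: [T], errorCount: Int) {
        var iterator = makeIterator()
        var values: [T] = []
        var errorCount = 0
        parsing: while true {
            switch parseOne(&iterator) {
            case .valid(let value): values.append(value)
            case .error: errorCount += 1
            case .emptyInput: break parsing
            }
        }
        return (values, errorCount)
    }
}
```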

> Ah, yes. Here it is:
> public protocol EncodedScalarProtocol : RandomAccessCollection {
>  init?(_ scalarValue: UnicodeScalar)
>  var utf8: UTF8.EncodedScalar { get }
>  var utf16: UTF16.EncodedScalar { get }
>  var utf32: UTF32.EncodedScalar { get }
> }

What is the `Element` type expected to be here?
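In the standard library that eventually shipped (Swift 4 and later), an encoding's `EncodedScalar` is a collection of that encoding's `CodeUnit`s. A small check using `encode(_:)` from `Unicode.Encoding`, which returns `nil` for scalars an encoding cannot represent:

```swift
// EncodedScalar elements are the encoding's code units.
let utf8Units: [UInt8] = Array(Unicode.UTF8.encode("\u{E9}")!)        // [0xC3, 0xA9]
let utf16Units: [UInt16] = Array(Unicode.UTF16.encode("\u{1F600}")!)  // [0xD83D, 0xDE00]
```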

I think what's missing is a holistic overview of the encoding system. So, please help me write this function:

	func unicodeScalars<Encoding: UnicodeEncoding>(in data: Data, using encoding: Encoding.Type) -> [UnicodeScalar] {
		var scalars: [UnicodeScalar] = []
		data.withUnsafeBytes { (bytes: UnsafePointer<$ParseInputElement>) in
			let buffer = UnsafeBufferPointer(start: bytes, count: data.count / MemoryLayout<$ParseInputElement>.size)
			encoding.parseForward(buffer) { encodedScalar in
				let unicodeScalar: UnicodeScalar = $doSomething(encodedScalar)
				scalars.append(unicodeScalar)
			}
		}
		return scalars
	}

What type would I put for $ParseInputElement? What function or initializer do I call for $doSomething?
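One possible answer, written against the Swift 4+ standard library rather than the API pitched here: the input element is `Encoding.CodeUnit`, and `Encoding.decode(_:)` turns each parsed `EncodedScalar` into a `Unicode.Scalar`. This sketch takes an array of code units instead of `Data` to stay free of Foundation and of raw-memory rebinding, and it repairs ill-formed input with U+FFFD:

```swift
// Sketch against the Swift 4+ standard library, not the pitched API:
// `Encoding.CodeUnit` plays the role of $ParseInputElement and
// `Encoding.decode(_:)` the role of $doSomething.
func unicodeScalars<Encoding: Unicode.Encoding>(
    in codeUnits: [Encoding.CodeUnit],
    using encoding: Encoding.Type
) -> [Unicode.Scalar] {
    var parser = Encoding.ForwardParser()
    var iterator = codeUnits.makeIterator()
    var scalars: [Unicode.Scalar] = []
    parsing: while true {
        switch parser.parseScalar(from: &iterator) {
        case .valid(let encoded):
            scalars.append(Encoding.decode(encoded))
        case .error:
            // Repair: substitute U+FFFD for the ill-formed sequence.
            scalars.append("\u{FFFD}")
        case .emptyInput:
            break parsing
        }
    }
    return scalars
}
```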

>>> @discardableResult
>>> public static func parseForward<C: Collection>(
>>>   _ input: C,
>>>   repairingIllFormedSequences makeRepairs: Bool = true,
>>>   into output: (EncodedScalar) throws->Void
>>> ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>> Are there constraints missing on `parseForward`?
> Yep – see the note that appears a little later. They’re really implementation details – so not something to capture in the proposal – which may or may not be needed depending on whether this lands before or after the generics features that make them redundant.

No, I mean because this says nothing about `C`'s element type. Presumably you can't parse a bunch of `UIView`s into Unicode scalars, so there must be some kind of constraint on the collection's elements. What is it?

...oh, I notice that `parseScalarForward(_:knownCount:)` has the clause `where C.Iterator.Element == EncodedScalar.Iterator.Element` attached. Should that also be attached to `parseForward(_:repairingIllFormedSequences:into:)`?

>> What do these do if `makeRepairs` is false? Would it be clearer if we made an enum that described the behaviors and changed the label to something like `ifIllFormed:`?
> The Unicode standard specifies values to substitute when making repairs.

I'm asking what happens if you *don't* want to make repairs. Does it, say, stop immediately, returning an `errorCount` of `1` and a `remainder` that starts at the site of the error? If so, would we be better off having that parameter be something like `ifIllFormed: .stop` or `ifIllFormed: .repair`, rather than `repairingIllFormedSequences: false` or `repairingIllFormedSequences: true`?
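A sketch of the enum-based spelling, built on the standard library's `Unicode.UTF8.ForwardParser`. `IllFormedPolicy`, `decodeUTF8(_:ifIllFormed:)`, and the stop semantics (return at the first error, with `remainder` starting at the bad sequence) are assumptions for illustration, not the proposal's behavior:

```swift
// Hypothetical policy enum standing in for `repairingIllFormedSequences: Bool`.
enum IllFormedPolicy {
    case stop    // return at the first error; remainder starts there
    case repair  // substitute U+FFFD and keep going
}

// Hypothetical UTF-8 decoder built on the standard library's forward parser.
func decodeUTF8(_ bytes: [UInt8], ifIllFormed policy: IllFormedPolicy)
    -> (scalars: [Unicode.Scalar], remainder: ArraySlice<UInt8>, errorCount: Int) {
    var parser = Unicode.UTF8.ForwardParser()
    var iterator = bytes.makeIterator()
    var scalars: [Unicode.Scalar] = []
    var consumed = 0      // code units accounted for so far
    var errorCount = 0
    parsing: while true {
        switch parser.parseScalar(from: &iterator) {
        case .valid(let encoded):
            scalars.append(Unicode.UTF8.decode(encoded))
            consumed += encoded.count
        case .error(let length):
            errorCount += 1
            if policy == .stop { break parsing }
            scalars.append("\u{FFFD}")
            consumed += length
        case .emptyInput:
            break parsing
        }
    }
    return (scalars, bytes[consumed...], errorCount)
}
```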

>>> Due to the change in internal implementation, this means that these operations will be O(n) rather than O(1). This is not expected to be a major concern, based on experiences from a similar change made to Java, but projects will be able to work around performance issues without upgrading to Swift 4 by explicitly typing slices as Substring, which will call the Swift 4 variant, and which will be available but not invoked by default in Swift 3 mode.
>> Will there be a way to make this also work with a real Swift 3 compiler? For instance, can you define `typealias Substring = String` in such a way that real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore it?
> Are you talking about this as a way for people to change their code, while still being able to compile their code with the old compiler? Yes, that might be a good strategy, will think about that.

Yes, that's what I'm talking about.

I guess the actual question is, does `#if swift(>=4)` come out as `true` for Swift 4 in Swift 3 mode? If not, is there some way to detect that you're using Swift 4 in Swift 3 mode? (I suppose one answer is "yes, Swift 4 in Swift 3 mode is called Swift 3.2"; I just haven't heard anyone mention anything like that yet.) In either case, if there's some way to distinguish, you could say:

	#if thisIsRealSwift3NotSwift4PretendingToBeSwift3()
	typealias Substring = String
	#endif

And then you could write the rest of your code using `Substring` and it would compile using both Swift 3 and Swift 4 toolchains, never forcing an implicit copy.
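As it turned out, Swift 4's compiler in Swift 3 mode does report itself as Swift 3.2, so the placeholder condition can be written as a negated version check. A sketch; `greeting` and `hello` are illustrative only:

```swift
// A genuine Swift 3.0/3.1 compiler reports a version below 3.2 and gets the
// alias; Swift 4 in Swift 3 mode reports 3.2 and keeps the real Substring.
#if !swift(>=3.2)
typealias Substring = String
#endif

let greeting = "Hello, world"
// With a Swift 4 or later toolchain this is a real Substring sharing greeting's storage.
let hello: Substring = greeting.prefix(5)
```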

Brent Royal-Gordon
