[swift-evolution] [Pitch] String revision proposal #1

Ben Cohen ben_cohen at apple.com
Wed Apr 5 14:16:25 CDT 2017


Hi Brent,

Sorry, I realized I failed to reply to these at the time. See below.

> On Mar 30, 2017, at 6:52 PM, Brent Royal-Gordon <brent at architechies.com> wrote:
> 
>> On Mar 30, 2017, at 2:36 PM, Ben Cohen <ben_cohen at apple.com> wrote:
>> 
>> The big win for Unicode is it is short. We want to encourage people to write their extensions on this protocol. We want people who previously extended String to feel very comfortable extending Unicode. It also helps emphasize how important the Unicode-ness of Swift.String is. I like the idea of Unicode.Collection, but it is a little intimidating and making it even a tiny bit intimidating is worrying to me from an adoption perspective.
> 
> Yeah, I understand why "Collection" might be intimidating. But I think "Unicode" would be too—it's opaque enough that people wouldn't be entirely sure whether they were extending the right thing.
> 
> I did a quick run-through of different languages and the protocols/interfaces/whatever their string types conform to, but most don't seem to have anything that abstracts string types. The only similar things I could find were `CharSequence` in Java, `StringLike` in Scala...and `Stringy` in Perl 6. And I'm sure you thought you were joking!
> 

Ha!

> Honestly, I'd recommend just going with `StringProtocol` unless you can come up with an adjective form you like (`Stringlike`? `Textual`?). It's a bit clumsy, but it's crystal clear. Stupid name, but you'll never forget it.
> 

I think it’s kind of evenly balanced between Unicode and StringProtocol. Neither is perfect.

>>> I'm a little worried about this because it seems to imply that the protocol cannot include any mutation operations that aren't in `RangeReplaceableCollection`. For instance, it won't be possible to include an in-place `applyTransform` method in the protocol. Do you anticipate that being an issue? Might it be a good idea to define a parallel `Mutable` or `RangeReplaceable` protocol?
>>> 
>> 
>> You can always assign to self, then provide more efficient implementations where Self is a RangeReplaceableCollection. We do this elsewhere in the std lib with collections, e.g. https://github.com/apple/swift/blob/master/stdlib/public/core/Collection.swift#L1277
>> 
>> Proliferating protocol combinations is problematic (looking at you, BidirectionalMutableRandomAccessSlice).
> 
> Nobody likes proliferation, but in this case it'd be because there genuinely were additional semantics that were only available on mutable strings.
> 
> (Once upon a time, I think I requested the ability to write `func index(of elem: Iterator.Element) -> Index? where Iterator.Element: Equatable`. Could such a feature be used for this? `func apply(_ transform: StringTransform, reverse: Bool) where Self: RangeReplaceableCollection`?)
> 
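
For illustration, a minimal sketch of the assign-to-self pattern with a more efficient constrained overload. It assumes a hypothetical `StringLike` protocol and a made-up `removeVowels()` operation (neither is part of the proposal), and it is written in current Swift syntax rather than what was available at the time:

```swift
// Hypothetical stand-in for the string protocol under discussion
// (an assumption for illustration, not the proposal's API).
protocol StringLike: Collection where Element == Character {
    init<S: Sequence>(_ characters: S) where S.Element == Character
}

extension StringLike {
    // Default implementation: build a new value and assign it to self.
    mutating func removeVowels() {
        self = Self(filter { !"aeiouAEIOU".contains($0) })
    }
}

extension StringLike where Self: RangeReplaceableCollection {
    // More efficient variant: mutate in place when the type supports it.
    mutating func removeVowels() {
        removeAll(where: { "aeiouAEIOU".contains($0) })
    }
}

extension String: StringLike {}

var name = "Unicode"
name.removeVowels()   // resolves to the RangeReplaceableCollection variant
```

Because `String` is a `RangeReplaceableCollection`, the call picks the in-place variant; a conforming type without that capability would fall back to the assign-to-self default.
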
>>>> The C string interop methods will be updated to those described here: a single withCString operation and two init(cString:) constructors, one for UTF8 and one for arbitrary encodings.
>>> 
>>> Sorry if I'm repeating something that was already discussed, but is there a reason you don't include a `withCString` variant for arbitrary encodings? It seems like an odd asymmetry.
>> 
>> Hmm. Is this a common use-case people have? Symmetry for the sake of it doesn’t seem enough. If uncommon, you can do it via an Array that you nul-terminate manually.
> 
> Is `init(cString:encoding:)` a common use case? If it is, I'm not sure why the opposite wouldn't be.
> 

This + another use case has convinced me that yes, we should have a matching withCString version.
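
For reference, the manual workaround mentioned above (a nul-terminated Array) can be sketched as follows. The helper name `withUTF16CString` is hypothetical and only stands in for whatever the matching `withCString` variant ends up looking like:

```swift
// Hypothetical helper: expose a string as a nul-terminated UTF-16 buffer
// for the duration of the closure. Not part of the proposal.
func withUTF16CString<R>(
    _ string: String,
    _ body: (UnsafePointer<UInt16>) throws -> R
) rethrows -> R {
    var units = Array(string.utf16)
    units.append(0)   // manual nul terminator
    return try units.withUnsafeBufferPointer { buffer in
        // The array holds at least the terminator, so baseAddress is non-nil.
        try body(buffer.baseAddress!)
    }
}

// Usage: count code units up to the terminator, as a C API would.
let length = withUTF16CString("h\u{E9}llo") { pointer -> Int in
    var count = 0
    while pointer[count] != 0 { count += 1 }
    return count
}
```

The pointer is only valid inside the closure, which is the same contract the existing `withCString` has.
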

>> Yeah, it’s tempting to make ParseResult general, and the only reason we held off is that we don’t want the work of making it generally useful to become a distraction.
> 
> Understandable.
> 
> I wonder if some part of the parsing algorithm could somehow be generalized so it was suitable for many purposes and then put on `Collection`, with the `UnicodeEncoding` then being passed as a parameter to it. If so, that would justify making `ParseResult` a top-level type.
> 
>> Ah, yes. Here it is:
>> 
>> public protocol EncodedScalarProtocol : RandomAccessCollection {
>>  init?(_ scalarValue: UnicodeScalar)
>>  var utf8: UTF8.EncodedScalar { get }
>>  var utf16: UTF16.EncodedScalar { get }
>>  var utf32: UTF32.EncodedScalar { get }
>> }
> 
> What is the `Element` type expected to be here?
> 
> I think what's missing is a holistic overview of the encoding system. So, please help me write this function:
> 
> 	func unicodeScalars<Encoding: UnicodeEncoding>(in data: Data, using encoding: Encoding.Type) -> [UnicodeScalar] {
> 		var scalars: [UnicodeScalar] = []
> 		
> 		data.withUnsafeBytes { (bytes: UnsafePointer<$ParseInputElement>) in
> 			let buffer = UnsafeBufferPointer(start: bytes, count: data.count / MemoryLayout<$ParseInputElement>.size)
> 			encoding.parseForward(buffer) { encodedScalar in
> 				let unicodeScalar: UnicodeScalar = $doSomething(encodedScalar)
> 				scalars.append(unicodeScalar)
> 			}
> 		}
> 		
> 		return scalars
> 	}
> 
> What type would I put for $ParseInputElement? What function or initializer do I call for $doSomething?
> 

Will come back on this.

>>>> @discardableResult
>>>> public static func parseForward<C: Collection>(
>>>>   _ input: C,
>>>>   repairingIllFormedSequences makeRepairs: Bool = true,
>>>>   into output: (EncodedScalar) throws->Void
>>>> ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>> 
>>> Are there constraints missing on `parseForward`?
>>> 
>> 
>> Yep – see the note that appears a little later. They’re really implementation details – so not something to capture in the proposal – which may or may not be needed depending on whether this lands before or after the generics features that make them redundant.
> 
> No, I mean because this says nothing about `C`'s element type. Presumably you can't parse a bunch of `UIView`s into Unicode scalars, so there must be some kind of constraint on the collection's elements. What is it?
> 
> ...oh, I notice that `parseScalarForward(_:knownCount:)` has the clause `where C.Iterator.Element == EncodedScalar.Iterator.Element` attached. Should that also be attached to `parseForward(_:repairingIllFormedSequences:into:)`?
> 
>>> What do these do if `makeRepairs` is false? Would it be clearer if we made an enum that described the behaviors and changed the label to something like `ifIllFormed:`?
>> 
>> The Unicode standard specifies values to substitute when making repairs.
> 
> I'm asking what happens if you *don't* want to make repairs. Does it, say, stop immediately, returning an `errorCount` of `1` and a `remainder` that starts at the site of the error? If so, would we be better off having that parameter be something like `ifIllFormed: .stop` or `ifIllFormed: .repair`, rather than `repairingIllFormedSequences: false` or `repairingIllFormedSequences: true`?
> 

The idea is, if you don’t want to make repairs, you use the transcoding primitives instead. The belief is that the old non-repairing versions (return nil if repairs needed) weren’t useful.
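
A rough sketch of the distinction, using APIs from the standard library as it exists today (`String(decoding:as:)` and the free function `transcode`) rather than the `parseForward` signature quoted above, so treat it as an analogy and not the proposed design:

```swift
// "a", an invalid UTF-8 byte, then "b".
let bytes: [UInt8] = [0x61, 0xFF, 0x62]

// Repairing: the ill-formed byte is replaced with U+FFFD.
let repaired = String(decoding: bytes, as: UTF8.self)

// Non-repairing: the transcoding primitive stops at the first error
// and reports that one occurred.
var utf16: [UInt16] = []
let hadError = transcode(
    bytes.makeIterator(),
    from: UTF8.self,
    to: UTF16.self,
    stoppingOnError: true
) { utf16.append($0) }
```

Here `repaired` contains a replacement character in place of the bad byte, while the transcoding call returns `true` and leaves only the code units seen before the error in `utf16`.
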

>>>> Due to the change in internal implementation, this means that these operations will be O(n) rather than O(1). This is not expected to be a major concern, based on experiences from a similar change made to Java, but projects will be able to work around performance issues without upgrading to Swift 4 by explicitly typing slices as Substring, which will call the Swift 4 variant, and which will be available but not invoked by default in Swift 3 mode.
>>> 
>>> Will there be a way to make this also work with a real Swift 3 compiler? For instance, can you define `typealias Substring = String` in such a way that real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore it?
>> 
>> Are you talking about this as a way for people to change their code, while still being able to compile their code with the old compiler? Yes, that might be a good strategy, will think about that.
> 
> Yes, that's what I'm talking about.
> 
> I guess the actual question is, does `#if swift(>=4)` come out as `true` for Swift 4 in Swift 3 mode? If not, is there some way to detect that you're using Swift 4 in Swift 3 mode? (I suppose one answer is "yes, Swift 4 in Swift 3 mode is called Swift 3.2"; I just haven't heard anyone mention anything like that yet.) In either case, if there's some way to distinguish, you could say:
> 
> 	#if thisIsRealSwift3NotSwift4PretendingToBeSwift3()
> 	typealias Substring = String
> 	#endif
> 
> And then you could write the rest of your code using `Substring` and it would compile using both Swift 3 and Swift 4 toolchains, never forcing an implicit copy.
> 

Ah right. Unfortunately as things are currently envisioned, this won’t work – you won’t be able to distinguish “true” Swift 3 from Swift 3 compatibility mode.

> -- 
> Brent Royal-Gordon
> Architechies
> 
