[swift-evolution] Strings in Swift 4

Jordan Rose jordan_rose at apple.com
Fri Jan 20 19:55:29 CST 2017


I want to start by saying great work, Ben and Dave. I know you've put a lot of time into this (and humored me in several Apple-internal discussions) and what's here looks like a great overhaul of String, balancing several tricky constraints. I do want to record some comments on specific parts of the proposal that I still have concerns about, but as usual you can of course take these with a grain of salt.

> To ease the pain of type mismatches, Substring should be a subtype of String in the same way that Int is a subtype of Optional<Int>. This would give users an implicit conversion from Substring to String, as well as the usual implicit conversions such as [Substring] to [String] that other subtype relationships receive.


I'm concerned about this for two reasons: first, because even with the comparison with Optional boxing, this is basically adding arbitrary conversions to the language, which we took out in part because of their disastrous effect on the performance of the type checker; and second, because this one in particular makes an O(N) copy operation very subtle (though admittedly one you incur all the time today, with no opt-out). A possible mitigation for the first issue would be to restrict the implicit conversion to arguments, like we do for inout-to-pointer conversions. It's still putting implicit conversions back into the language, though.


> Therefore, APIs that operate on an NSString/NSRange pair should be imported without the NSRange argument. The Objective-C importer should be changed to give these APIs special treatment so that when a Substring is passed, instead of being converted to a String, the full NSString and range are passed to the Objective-C method, thereby avoiding a copy.


I'm very skeptical about assuming that a method that takes an NSString and an NSRange automatically means to apply that NSRange to the NSString, but fortunately it may not be much of an issue in practice. A quick grep of Foundation and AppKit turned up only 45 methods that took both an NSRange and an NSString *, clustered on a small number of classes; in less than half of these cases would the transformation to Substring actually be valid (the other ranges refer to some data in the receiver rather than the argument). I've attached these results below, if you're interested.

(Note that I left out methods on NSString itself, since those are manually bridged to String, but there aren't actually too many of those either. "Foundation and AppKit" also isn't exhaustive, of course.)




>   associatedtype ExtendedASCII 
>     : BidirectionalCollection where Element == UInt32
>   var extendedASCII: ExtendedASCII { get }

This isn't a criticism, just a question: why constrain the collection to UInt32 elements? It seems unfortunate that the most common buffer types (UTF-16, UTF-8, or "unparsed bytes") can't just be passed as-is.


>   var unicodeScalars: UnicodeScalars { get }


Typo: this appears twice in the Unicode protocol.


> We should represent these aspects as orthogonal, composable components, abstracting pattern matchers into a protocol like this one, that can allow us to define logical operations once, without introducing overloads, and massively reducing API surface area.


I'm still uneasy about the performance of generalized matching operations built on top of Collection. I'm not sure we can reasonably expect the compiler to lower that all down to bulk memory accesses. That's at least only one part of the manifesto, though.


> clipboard.write(s.endIndex.codeUnitOffset)
> let offset = clipboard.read(Int.self)
> let i = String.Index(codeUnitOffset: offset)

Sorry, what is 'clipboard'? I think I'm missing something in this section—it's talking about how it's important to have a stable representation for string positions across the different index types, but the code sample doesn't directly connect for me.


> Our support for interoperation with nul-terminated C strings is scattered and incoherent, with 6 ways to transform a C string into a String and four ways to do the inverse. These APIs should be replaced with the following


This proposal doesn't plan to remove the implicit, scoped, argument-only String-to-UnsafePointer<CChar> conversion, does it? (Is that one of the 6?)


> To address this need, we can build models of the Unicode protocol that encode representation information into the type, such as NFCNormalizedUTF16String.

Not an urgent thought, but I wonder if these alternate representations really belong in the stdlib, as opposed to some auxiliary library like "SwiftStrings" or "CoreStrings", and if so whether that's still part of the standard Swift distribution or just a plain old SwiftPM package that happens to be maintained by Apple (and maybe comes preincluded with Xcode for now).


That's all I have. Again, great work, and godspeed.

Jordan

(unfortunately I am not in charge of implementing any of the features you need for this, at least as far as I know)



> On Jan 19, 2017, at 18:56, Ben Cohen via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Hi all,
> 
> Below is our take on a design manifesto for Strings in Swift 4 and beyond.
> 
> Probably best read in rendered markdown on GitHub:
> https://github.com/apple/swift/blob/master/docs/StringManifesto.md
> 
> We’re eager to hear everyone’s thoughts.
> 
> Regards,
> Ben and Dave

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170120/98203c81/attachment.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: NSRange.txt
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170120/98203c81/attachment.txt>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170120/98203c81/attachment-0001.html>


More information about the swift-evolution mailing list