[swift-evolution] Strings in Swift 4

Dave Abrahams dabrahams at apple.com
Wed Jan 25 15:21:03 CST 2017


on Tue Jan 24 2017, Zach Waldowski <swift-evolution at swift.org> wrote:

> I'll use Karl's point here as a minor jumping-off point for a
> semi-related train of thought… I'm excited by the content of the original
> manifesto, including a powerful Unicode namespace and types. But as
> I've continued down the thread, I've had growing concern about modeling
> strings breadthwise in the type system, i.e., with UTF8String and so on.
>
> I strongly want Swift to have world-class string processing, but I
> believe even more strongly in the language's spirit of progressive
> disclosure. Newcomers to Swift's current String API find it difficult
> (something I personally disagree with, but that's neither here nor
> there); I don't think that difficulty is solved by aggressively
> use-specific type modeling. I instead think it gives rise to the same severe
> cargo-culting that gets us the scarily prevalent
> String.Index.init(offset:) extensions in the current model.

I think you're overplaying the impact these other types will have on the
user experience.  String will still be the common-currency vocabulary
type most users will handle.  Other models of Unicode *will* exist for
cases where the highest performance matters, and will interoperate
smoothly with String, but most users will never know about them.
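
To make the division of labor concrete, here is a minimal sketch. It
assumes the String(decoding:as:) initializer and the utf8 view found in
later standard library releases, not the UTF8String or ValidUTF8 types
discussed in this thread; greeting(for:), rawName, and damaged are
hypothetical names used only for illustration.

```swift
// Everyday code deals only in String and never mentions an encoding.
func greeting(for name: String) -> String {
    return "Hello, \(name)!"
}

// Performance-sensitive code may hold raw UTF-8 code units instead.
let rawName: [UInt8] = Array("Zo\u{EB}".utf8)

// Crossing the boundary: decode the code units into a String.
let name = String(decoding: rawName, as: UTF8.self)
print(greeting(for: name))

// Going back to code units is equally direct.
let roundTripped = Array(name.utf8)

// This initializer validates its input: an invalid byte such as 0xFF
// is replaced with U+FFFD rather than trusted as-is.
let damaged = String(decoding: [0x68, 0x69, 0xFF] as [UInt8], as: UTF8.self)
```

Only the code at the boundary needs to know that an encoding is involved;
greeting(for:) is written purely in terms of String.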

>
>
> Best
>
>   Zach Waldowski
>
>   zach at waldowski.me
>
> On Tue, Jan 24, 2017, at 10:15 PM, Karl Wagner via swift-evolution wrote:
>>>> I hope I am correct about the no-copy thing, and I would also like to
>>>> permit promoting C strings to Swift strings without validation.  This
>>>> is obviously unsafe in general, but I know my strings... and I care
>>>> about performance. ;)
>>>
>>> We intend to support that use-case.  That's part of the reason for the
>>> ValidUTF8 and ValidUTF16 encodings you see here:
>>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L598
>>> and here:
>>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L862
>>
>> It seems a little strange to me that a pre-validated UTF8 string from
>> C would have different types to a UTF8String (i.e. using ValidUTF8 vs
>> UTF8). It defeats the point of having the encoding represented in the
>> type-system.
>>
>> For example, if I write a generic function:
>>
>>> func sendMessage<Source: Unicode where Source.Encoding == UTF8>(from: Source)
>>
>> I would only be able to accept UTF-8 text which hasn’t already been
>> validated.
>>
>> What about if we allowed each encoding to provide multiple kinds of
>> decoder? That would also allow us to substitute our own decoders in,
>> if there are application-specific shortcuts we can take.
>>
>>> protocol UnicodeEncoding {
>>>   associatedtype CodeUnit
>>>
>>>   associatedtype ValidatingDecoder: UnicodeDecoder
>>>   associatedtype NonValidatingDecoder: UnicodeDecoder
>>> }
>>>
>>> protocol UnicodeDecoder {
>>>     associatedtype Encoding: UnicodeEncoding
>>>     associatedtype DecodedScalar: RandomAccessCollection
>>>         where Iterator.Element == Encoding.CodeUnit
>>>
>>>     static func parse1Forward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>>>     static func parse1Backward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>>> }
>>>
>>> // Not shown: UnicodeEncoder protocol, with transcodeScalar<T> function.
>>>
>>> struct UTF8: UnicodeEncoding {
>>>     typealias CodeUnit             = UInt8
>>>     typealias ValidatingDecoder    = ValidatingUTF8Decoder
>>>     typealias NonValidatingDecoder = NonValidatingUTF8Decoder
>>> }
>>>
>>> struct NonValidatingUTF8Decoder: UnicodeDecoder {
>>>     typealias Encoding = UTF8
>>>     struct DecodedScalar: RandomAccessCollection { … }
>>>     // Parsing functions
>>> }
>>>
>>> struct ValidatingUTF8Decoder: UnicodeDecoder {
>>>     typealias Encoding = UTF8
>>>     typealias DecodedScalar = NonValidatingUTF8Decoder.DecodedScalar
>>>     // newtype would be cool here
>>>     // Parsing functions
>>> }
>>>
>>> struct String {
>>>     init<C, Encoding, Decoder>(from: C, encodedAs: Encoding,
>>>                                using: Decoder = Encoding.ValidatingDecoder)
>>>         where C: Collection, C.Iterator.Element == Encoding.CodeUnit,
>>>         Decoder.Encoding == Encoding {
>>>
>>>          // transcode to native String encoding using ‘Decoder’ we were given
>>>     }
>>> }
>>
>> - Karl
>> _________________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution
>

-- 
-Dave


