[swift-evolution] Strings in Swift 4
Dave Abrahams
dabrahams at apple.com
Wed Jan 25 15:21:03 CST 2017
on Tue Jan 24 2017, Zach Waldowski <swift-evolution at swift.org> wrote:
> I'll use Karl's point here as a minor jumping-off point for a semi-
> related train of thought… I'm excited by the content of the original
> manifesto, including a powerful Unicode namespace and types. But as
> I've continued down the thread, I've had growing concern about modeling
> strings breadthwise in the type system, i.e., with UTF8String and so on.
>
> I strongly want Swift to have world-class string processing, but I
> believe even more strongly in the language's spirit of progressive
> disclosure. Newcomers to Swift's current String API find it difficult
> (something I personally disagree with, but that's neither here nor
> there); I don't think that difficulty is solved by aggressively use-
> specific type modeling. I instead think it gives rise to the same severe
> cargo-culting that gets us the scarily prevalent
> String.Index.init(offset:) extensions in the current model.
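For context, the `String.Index.init(offset:)` extensions Zach mentions usually wrap `index(_:offsetBy:)` so that a String can be addressed with an Int. The sketch below is illustrative only: the exact signature people write varies, and `init(offset:in:)` here is a hypothetical name. Because String is bidirectional rather than random-access, each call walks from `startIndex`, so code that looks linear can quietly become quadratic.

```swift
// Hypothetical helper in the style of the String.Index.init(offset:)
// extensions mentioned above; it is not part of the standard library.
extension String.Index {
    init(offset: Int, in string: String) {
        // String is a BidirectionalCollection, not a RandomAccessCollection,
        // so this walks `offset` Characters forward from startIndex: O(offset).
        self = string.index(string.startIndex, offsetBy: offset)
    }
}

let word = "caf\u{E9}!"  // "café!"
let atOffset3 = word[String.Index(offset: 3, in: word)]
print(atOffset3)  // prints "é"

// Each call re-walks the string from the start, so this innocent-looking
// loop does O(n²) work to reverse an n-Character string.
var reversed = ""
for k in stride(from: word.count - 1, through: 0, by: -1) {
    reversed.append(word[String.Index(offset: k, in: word)])
}
print(reversed)  // prints "!éfac"
```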
I think you're overplaying the impact these other types will have on the
user experience. String will still be the common-currency vocabulary
type most users will handle. Other models of Unicode *will* exist for
cases where the highest performance matters, and will interoperate
smoothly with String, but most users will never know about them.
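An analogy using only APIs that already ship in the standard library (none of the types proposed in this thread appear here): most code handles String and Character, encoding-specific views such as `utf8`, `utf16`, and `unicodeScalars` are available on demand, and `String(decoding:as:)` brings bytes back into String. It is a sketch of the "common currency plus opt-in models" shape, not of the proposed UTF8String-style types themselves.

```swift
// Analogy using today's standard library only: String is the currency type,
// encoding-specific views are opt-in, and converting back is one call.
let greeting = "na\u{EF}ve \u{1F426}"  // "naïve 🐦"

// Most code only ever deals in Characters.
print(greeting.count)                 // prints "7"

// Code that cares about a particular encoding asks for a view...
let utf8Bytes = Array(greeting.utf8)
print(utf8Bytes.count)                // prints "11"
print(greeting.utf16.count)           // prints "8"
print(greeting.unicodeScalars.count)  // prints "7"

// ...and hands the result back as an ordinary String.
let roundTripped = String(decoding: utf8Bytes, as: UTF8.self)
print(roundTripped == greeting)       // prints "true"
```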
>
> Best
> Zach Waldowski
> zach at waldowski.me
>
> On Tue, Jan 24, 2017, at 10:15 PM, Karl Wagner via swift-evolution wrote:
>>
>>>
>>>> I hope I am correct about the no-copy thing, and I would also like to
>>>> permit promoting C strings to Swift strings without validation. This
>>>> is obviously unsafe in general, but I know my strings... and I care
>>>> about performance. ;)
>>>
>>> We intend to support that use-case. That's part of the reason for the
>>> ValidUTF8 and ValidUTF16 encodings you see here:
>>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L598
>>> and here:
>>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L862
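A minimal sketch of the "validated vs. unchecked" distinction in the use case above, assuming a simple wrapper type instead of a separate encoding. `ValidatedUTF8`, `init(validating:)`, and `init(unchecked:)` are hypothetical names and are not the ValidUTF8 encoding on the linked branch; only the `UTF8` codec's `decode(_:)` method and `String(decoding:as:)` are real standard library APIs.

```swift
// Hypothetical wrapper type; not the ValidUTF8 encoding from the linked branch.
struct ValidatedUTF8 {
    let bytes: [UInt8]

    // Checked path: run the bytes through the standard library's UTF-8 codec
    // and fail on the first ill-formed sequence.
    init?(validating bytes: [UInt8]) {
        var codec = UTF8()
        var iterator = bytes.makeIterator()
        scan: while true {
            switch codec.decode(&iterator) {
            case .scalarValue: continue
            case .emptyInput: break scan
            case .error: return nil
            }
        }
        self.bytes = bytes
    }

    // Unchecked path: the caller promises well-formed UTF-8 ("I know my
    // strings"), so no validation pass is made here.
    init(unchecked bytes: [UInt8]) {
        self.bytes = bytes
    }

    // String(decoding:as:) repairs ill-formed input with U+FFFD, so in this
    // sketch a broken promise yields replacement characters, not a crash.
    var string: String {
        return String(decoding: bytes, as: UTF8.self)
    }
}

let ok = ValidatedUTF8(validating: [0x68, 0x69])  // "hi"
let bad = ValidatedUTF8(validating: [0xFF])       // 0xFF never appears in UTF-8
let promised = ValidatedUTF8(unchecked: [0xFF])
print(ok?.string ?? "rejected")  // prints "hi"
print(bad == nil)                // prints "true"
```

Note that `String(decoding:as:)` still scans the bytes when the String is built, so this sketch only shows where the promise lives in the type system, not the performance win the thread is after.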
>>
>> It seems a little strange to me that a pre-validated UTF-8 string from
>> C would have a different type from a UTF8String (i.e. using ValidUTF8 vs
>> UTF8). It defeats the point of having the encoding represented in the
>> type system.
>>
>> For example, if I write a generic function:
>>
>>     func sendMessage<Source: Unicode where Source.Encoding == UTF8>(from: Source)
>>
>> I would only be able to accept UTF-8 text which hasn’t already been
>> validated.
>>
>> What if we allowed each encoding to provide multiple kinds of
>> decoder? That would also allow us to substitute our own decoders in,
>> if there are application-specific shortcuts we can take.
>>
>>     protocol UnicodeEncoding {
>>         associatedtype CodeUnit
>>
>>         associatedtype ValidatingDecoder: UnicodeDecoder
>>         associatedtype NonValidatingDecoder: UnicodeDecoder
>>     }
>>
>>     protocol UnicodeDecoder {
>>         associatedtype Encoding: UnicodeEncoding
>>         associatedtype DecodedScalar: RandomAccessCollection
>>             where Iterator.Element == Encoding.CodeUnit
>>
>>         static func parse1Forward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>>         static func parse1Backward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>>     }
>>
>>     // Not shown: UnicodeEncoder protocol, with transcodeScalar<T> function.
>>
>>     struct UTF8: UnicodeEncoding {
>>         typealias CodeUnit = UInt8
>>         typealias ValidatingDecoder = ValidatingUTF8Decoder
>>         typealias NonValidatingDecoder = NonValidatingUTF8Decoder
>>     }
>>
>>     struct NonValidatingUTF8Decoder: UnicodeDecoder {
>>         typealias Encoding = UTF8
>>         struct DecodedScalar: RandomAccessCollection { … }
>>         // Parsing functions
>>     }
>>
>>     struct ValidatingUTF8Decoder: UnicodeDecoder {
>>         typealias Encoding = UTF8
>>         typealias DecodedScalar = NonValidatingUTF8Decoder.DecodedScalar // newtype would be cool here
>>         // Parsing functions
>>     }
>>
>>     struct String {
>>         init<C, Encoding, Decoder>(from: C, encodedAs: Encoding,
>>                                    using: Decoder = Encoding.ValidatingDecoder)
>>             where C: Collection, C.Iterator.Element == Encoding.CodeUnit,
>>                   Decoder.Encoding == Encoding {
>>             // transcode to native String encoding using ‘Decoder’ we were given
>>         }
>>     }
>>
>> - Karl
>>
>> _________________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution
>
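To check that Karl's idea hangs together, here is a heavily simplified, runnable sketch. Assumptions: all names (`SketchEncoding`, `SketchDecoder`, `SketchUTF8`, `init?(sketchDecoding:using:)`, and the decoder types borrowed from Karl's message) are hypothetical; each decoder decodes a whole buffer instead of exposing `parse1Forward`/`parse1Backward`; and the caller passes a decoder type rather than relying on a default argument. What it shows is that a function constrained on `D.Encoding == SketchUTF8` accepts either decoder, which is what the `sendMessage` example above wanted.

```swift
// All names below are hypothetical; this is not the standard library API
// and not the API on the unicode-rethink branch.

protocol SketchEncoding {
    associatedtype CodeUnit
    associatedtype ValidatingDecoder: SketchDecoder
    associatedtype NonValidatingDecoder: SketchDecoder
}

protocol SketchDecoder {
    associatedtype Encoding: SketchEncoding
    // Decodes the whole input; returns nil if this decoder rejects it.
    static func decode<C: Collection>(_ input: C) -> [Unicode.Scalar]?
        where C.Element == Encoding.CodeUnit
}

enum SketchUTF8: SketchEncoding {
    typealias CodeUnit = UInt8
    typealias ValidatingDecoder = ValidatingUTF8Decoder
    typealias NonValidatingDecoder = NonValidatingUTF8Decoder
}

// Checked decoder: defers to the standard library's UTF-8 codec and
// rejects the input on the first ill-formed sequence.
enum ValidatingUTF8Decoder: SketchDecoder {
    typealias Encoding = SketchUTF8

    static func decode<C: Collection>(_ input: C) -> [Unicode.Scalar]?
        where C.Element == UInt8 {
        var codec = UTF8()
        var iterator = input.makeIterator()
        var scalars: [Unicode.Scalar] = []
        decoding: while true {
            switch codec.decode(&iterator) {
            case .scalarValue(let scalar): scalars.append(scalar)
            case .emptyInput: break decoding
            case .error: return nil
            }
        }
        return scalars
    }
}

// "I know my strings" decoder: assumes well-formed UTF-8 and checks nothing.
// Ill-formed input can produce wrong scalars or trap; it is never rejected.
enum NonValidatingUTF8Decoder: SketchDecoder {
    typealias Encoding = SketchUTF8

    static func decode<C: Collection>(_ input: C) -> [Unicode.Scalar]?
        where C.Element == UInt8 {
        var scalars: [Unicode.Scalar] = []
        var i = input.startIndex
        while i != input.endIndex {
            let lead = UInt32(input[i])
            // Sequence length is read straight off the lead byte.
            let length = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4
            var value = length == 1 ? lead : lead & (UInt32(0xFF) >> (length + 1))
            i = input.index(after: i)
            for _ in 1..<length {
                value = (value << 6) | (UInt32(input[i]) & 0x3F)
                i = input.index(after: i)
            }
            scalars.append(Unicode.Scalar(value)!)
        }
        return scalars
    }
}

extension String {
    // Simplified stand-in for Karl's String.init(from:encodedAs:using:).
    init?<C: Collection, D: SketchDecoder>(sketchDecoding input: C, using decoder: D.Type)
        where C.Element == D.Encoding.CodeUnit {
        guard let scalars = D.decode(input) else { return nil }
        var result = ""
        result.unicodeScalars.append(contentsOf: scalars)
        self = result
    }
}

// Constrained on the encoding, not the decoder, so both decoders are accepted.
// Stand-in for Karl's sendMessage: returns the text instead of sending it.
func sendMessage<D: SketchDecoder>(_ bytes: [UInt8], decodedWith decoder: D.Type) -> String?
    where D.Encoding == SketchUTF8 {
    return String(sketchDecoding: bytes, using: decoder)
}

let bytes: [UInt8] = [0x53, 0x77, 0x69, 0x66, 0x74, 0x20, 0xF0, 0x9F, 0x90, 0xA6]  // "Swift 🐦"
let checked = sendMessage(bytes, decodedWith: SketchUTF8.ValidatingDecoder.self)
let trusted = sendMessage(bytes, decodedWith: SketchUTF8.NonValidatingDecoder.self)
print(checked == trusted)  // prints "true"
```

The trade-off shows up in the non-validating decoder: it does less work, but ill-formed input can trap or decode to the wrong scalars instead of being rejected.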
--
-Dave