[swift-evolution] Strings in Swift 4
Dave Abrahams
dabrahams at apple.com
Wed Jan 25 15:16:47 CST 2017
on Tue Jan 24 2017, Karl Wagner <swift-evolution at swift.org> wrote:
>>
>>> I hope I am correct about the no-copy thing, and I would also like to
>>> permit promoting C strings to Swift strings without validation. This
>>> is obviously unsafe in general, but I know my strings... and I care
>>> about performance. ;)
>>
>> We intend to support that use-case. That's part of the reason for the
>> ValidUTF8 and ValidUTF16 encodings you see here:
>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L598
>> and here:
>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L862
>
> It seems a little strange to me that a pre-validated UTF8 string from C would have different types
> to a UTF8String (i.e. using ValidUTF8 vs UTF8). It defeats the point of having the encoding
> represented in the type-system.
Why do you say that?
The main point is to allow the compiler to make static choices about how
to do decoding efficiently.
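A rough sketch of what "static choices" could look like, using made-up names rather than anything from the unicode-rethink branch: `DecodingStrategy`, `CheckedUTF8`, `TrustedUTF8`, and `makeString` are all hypothetical. The encoding is a type parameter, so the decoding path is fixed at compile time and can be specialized per encoding.

```swift
// Hypothetical protocol: each encoding type supplies its own decoding strategy.
protocol DecodingStrategy {
    static func decode(_ bytes: [UInt8]) -> String?
}

// Checks every byte sequence and rejects malformed input.
enum CheckedUTF8: DecodingStrategy {
    static func decode(_ bytes: [UInt8]) -> String? {
        var codec = UTF8()
        var input = bytes.makeIterator()
        var result = ""
        while true {
            switch codec.decode(&input) {
            case .scalarValue(let scalar):
                result.unicodeScalars.append(scalar)
            case .emptyInput:
                return result
            case .error:
                return nil
            }
        }
    }
}

// Assumes the caller already validated the bytes, so it never reports failure.
// Note: String(decoding:as:) still repairs malformed input with U+FFFD; a real
// non-validating decoder would skip even that work.
enum TrustedUTF8: DecodingStrategy {
    static func decode(_ bytes: [UInt8]) -> String? {
        return String(decoding: bytes, as: UTF8.self)
    }
}

// `E` is known statically, so no runtime check is needed to pick a decoder.
func makeString<E: DecodingStrategy>(_ bytes: [UInt8], as encoding: E.Type) -> String? {
    return E.decode(bytes)
}
```

Calling `makeString(bytes, as: TrustedUTF8.self)` and `makeString(bytes, as: CheckedUTF8.self)` on the same bytes takes different code paths, chosen purely by type.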
> For example, if I write a generic function:
>
> func sendMessage<Source: Unicode where Source.Encoding == UTF8>(from: Source)
>
> I would only be able to accept UTF-8 text which hasn’t already been
> validated.
protocol UTF8Encoding : UnicodeEncoding where CodeUnit == UInt8 {}
extension UTF8 : UTF8Encoding {}
extension ValidUTF8 : UTF8Encoding {}
func sendMessage<Source: Unicode where Source.Encoding : UTF8Encoding>(from: Source)
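For anyone who wants to try the idea, here is a self-contained sketch that compiles on its own. It makes several assumptions: `UnicodeText` stands in for the `Unicode` protocol, `RawUTF8` and `PrevalidatedUTF8` stand in for `UTF8` and `ValidUTF8` (renamed to avoid clashing with existing standard library names), and `Text` is a hypothetical storage type. It also uses the trailing `where` clause form.

```swift
// Hypothetical stand-ins; not the real standard library declarations.
protocol UnicodeEncoding {
    associatedtype CodeUnit
}

// Refinement that pins the code unit to UInt8.
protocol UTF8Encoding: UnicodeEncoding where CodeUnit == UInt8 {}

enum RawUTF8: UTF8Encoding { typealias CodeUnit = UInt8 }
enum PrevalidatedUTF8: UTF8Encoding { typealias CodeUnit = UInt8 }

// Stand-in for the `Unicode` protocol in the snippet above.
protocol UnicodeText {
    associatedtype Encoding: UnicodeEncoding
    var codeUnits: [Encoding.CodeUnit] { get }
}

// Hypothetical storage type parameterized by its encoding.
struct Text<E: UnicodeEncoding>: UnicodeText {
    typealias Encoding = E
    var codeUnits: [E.CodeUnit]
}

// Accepts both validated and not-yet-validated UTF-8 sources.
// Returns the number of bytes that would be sent.
func sendMessage<Source: UnicodeText>(from source: Source) -> Int
    where Source.Encoding: UTF8Encoding
{
    // CodeUnit is statically known to be UInt8 here.
    let bytes: [UInt8] = source.codeUnits
    return bytes.count
}
```

Both `Text<RawUTF8>` and `Text<PrevalidatedUTF8>` satisfy the constraint, so one generic function covers either kind of source.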
> What about if we allowed each encoding to provide multiple kinds of decoder? That would also allow
> us to substitute our own decoders in, if there are application-specific shortcuts we can take.
>
> protocol UnicodeEncoding {
>   associatedtype CodeUnit
>
>   associatedtype ValidatingDecoder: UnicodeDecoder
>   associatedtype NonValidatingDecoder: UnicodeDecoder
> }
>
> protocol UnicodeDecoder {
>   associatedtype Encoding: UnicodeEncoding
>   associatedtype DecodedScalar: RandomAccessCollection where Iterator.Element == Encoding.CodeUnit
>
>   static func parse1Forward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>   static func parse1Backward<C>(…) -> ParseResult<DecodedScalar, C.Index>
> }
> // Not shown: UnicodeEncoder protocol, with transcodeScalar<T> function.
>
> struct UTF8: UnicodeEncoding {
>   typealias CodeUnit = UInt8
>   typealias ValidatingDecoder = ValidatingUTF8Decoder
>   typealias NonValidatingDecoder = NonValidatingUTF8Decoder
> }
>
> struct NonValidatingUTF8Decoder: UnicodeDecoder {
>   typealias Encoding = UTF8
>   struct DecodedScalar: RandomAccessCollection { … }
>   // Parsing functions
> }
>
> struct ValidatingUTF8Decoder: UnicodeDecoder {
>   typealias Encoding = UTF8
>   typealias DecodedScalar = NonValidatingUTF8Decoder.DecodedScalar // newtype would be cool here
>   // Parsing functions
> }
>
> struct String {
>   init<C, Encoding, Decoder>(from: C, encodedAs: Encoding, using: Decoder =
>     Encoding.ValidatingDecoder)
>   where C: Collection, C.Iterator.Element == Encoding.CodeUnit, Decoder.Encoding == Encoding {
>
>     // transcode to native String encoding using ‘Decoder’ we were given
>   }
> }
That's another way to slice the same pie. I'll think about this, thanks.

Note: part of the thinking had been that we might want to represent other
information, like "it's NFC normalized" in the encoding type. At that
point, I think a design like your suggestion above may start to get messy.
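To make the decoder-per-call design concrete, here is a compilable sketch under heavy simplifying assumptions. ASCII stands in for UTF-8 so the decoders stay short, decoders are matched to encodings by code unit type rather than by a `Decoder.Encoding == Encoding` constraint, and an overload stands in for the default argument. Every name in it (`ScalarDecoder`, `EncodingWithDecoders`, `ASCIISketch`, `makeText`) is hypothetical.

```swift
// Hypothetical decoder protocol: turns code units into scalars, or fails.
protocol ScalarDecoder {
    associatedtype CodeUnit
    static func decode<C: Collection>(_ units: C) -> [Unicode.Scalar]?
        where C.Element == CodeUnit
}

// Hypothetical encoding protocol offering two decoders, as in the quoted design.
protocol EncodingWithDecoders {
    associatedtype CodeUnit
    associatedtype ValidatingDecoder: ScalarDecoder
        where ValidatingDecoder.CodeUnit == CodeUnit
    associatedtype NonValidatingDecoder: ScalarDecoder
        where NonValidatingDecoder.CodeUnit == CodeUnit
}

// Rejects any byte outside the ASCII range.
struct ValidatingASCIIDecoder: ScalarDecoder {
    typealias CodeUnit = UInt8
    static func decode<C: Collection>(_ units: C) -> [Unicode.Scalar]?
        where C.Element == UInt8
    {
        var out: [Unicode.Scalar] = []
        for unit in units {
            guard unit < 0x80 else { return nil }
            out.append(Unicode.Scalar(unit))
        }
        return out
    }
}

// Trusts the caller: maps each byte to a scalar without checking its range.
struct NonValidatingASCIIDecoder: ScalarDecoder {
    typealias CodeUnit = UInt8
    static func decode<C: Collection>(_ units: C) -> [Unicode.Scalar]?
        where C.Element == UInt8
    {
        return units.map { Unicode.Scalar($0) }
    }
}

enum ASCIISketch: EncodingWithDecoders {
    typealias CodeUnit = UInt8
    typealias ValidatingDecoder = ValidatingASCIIDecoder
    typealias NonValidatingDecoder = NonValidatingASCIIDecoder
}

// The caller picks the decoder explicitly.
func makeText<C: Collection, E: EncodingWithDecoders, D: ScalarDecoder>(
    from units: C, encodedAs encoding: E.Type, using decoder: D.Type
) -> String? where C.Element == E.CodeUnit, D.CodeUnit == E.CodeUnit {
    guard let scalars = D.decode(units) else { return nil }
    var result = ""
    result.unicodeScalars.append(contentsOf: scalars)
    return result
}

// Overload standing in for the default argument: validate unless told otherwise.
func makeText<C: Collection, E: EncodingWithDecoders>(
    from units: C, encodedAs encoding: E.Type
) -> String? where C.Element == E.CodeUnit {
    return makeText(from: units, encodedAs: encoding, using: E.ValidatingDecoder.self)
}
```

In this shape, each extra property tracked per string (normalization, say) would presumably need either more decoder associated types or more parameters at every call site.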
--
-Dave