[swift-evolution] Strings in Swift 4

Karl Wagner razielim at gmail.com
Tue Jan 24 21:15:45 CST 2017

>> I hope I am correct about the no-copy thing, and I would also like to
>> permit promoting C strings to Swift strings without validation.  This
>> is obviously unsafe in general, but I know my strings... and I care
>> about performance. ;)
> We intend to support that use-case.  That's part of the reason for the
> ValidUTF8 and ValidUTF16 encodings you see here:
> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L598
> and here:
> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L862

It seems a little strange to me that a pre-validated UTF-8 string from C would have a different type from a UTF8String (i.e. its encoding would be ValidUTF8 rather than UTF8). That defeats the point of having the encoding represented in the type system.

For example, if I write a generic function:

func sendMessage<Source: Unicode>(from: Source) where Source.Encoding == UTF8

I would only be able to accept UTF-8 text which hasn’t already been validated, because pre-validated text would have the encoding ValidUTF8 rather than UTF8.
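
To make the problem concrete, here is a minimal sketch in today's Swift. All of the names are hypothetical stand-ins: EncodedText plays the role of the Unicode protocol above, and PlainUTF8 / PrevalidatedUTF8 play the roles of UTF8 / ValidUTF8. It only illustrates the type-system consequence, not the real prototype.

```swift
// Hypothetical stand-in for the `Unicode` protocol in the signature above.
protocol EncodedText {
    associatedtype Encoding
    var codeUnits: [UInt8] { get }
}

// Marker types standing in for the UTF8 and ValidUTF8 encodings (assumed names).
enum PlainUTF8 {}
enum PrevalidatedUTF8 {}

// Storage that carries its encoding as a type parameter.
struct Text<E>: EncodedText {
    typealias Encoding = E
    var codeUnits: [UInt8]
}

// Same shape as sendMessage above: only Encoding == PlainUTF8 is accepted.
func sendMessage<Source: EncodedText>(from source: Source) -> Int
    where Source.Encoding == PlainUTF8 {
    return source.codeUnits.count  // pretend we sent this many bytes
}

let unchecked = Text<PlainUTF8>(codeUnits: Array("hi".utf8))
let prevalidated = Text<PrevalidatedUTF8>(codeUnits: Array("hi".utf8))

let sent = sendMessage(from: unchecked)   // compiles
// sendMessage(from: prevalidated)        // rejected: Encoding is not PlainUTF8
```

Both values hold the same bytes, yet the second one needs its own overload or a looser constraint, which is the duplication I would like to avoid.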

What if we allowed each encoding to provide multiple kinds of decoder? That would also allow us to substitute our own decoders, if there are application-specific shortcuts we can take.

protocol UnicodeEncoding {
  associatedtype CodeUnit

  associatedtype ValidatingDecoder: UnicodeDecoder
  associatedtype NonValidatingDecoder: UnicodeDecoder
}

protocol UnicodeDecoder {
    associatedtype Encoding: UnicodeEncoding
    associatedtype DecodedScalar: RandomAccessCollection where Iterator.Element == Encoding.CodeUnit

    static func parse1Forward<C>(…) -> ParseResult<DecodedScalar, C.Index>
    static func parse1Backward<C>(…) -> ParseResult<DecodedScalar, C.Index>
}

// Not shown: UnicodeEncoder protocol, with transcodeScalar<T> function.

struct UTF8: UnicodeEncoding {
    typealias CodeUnit             = UInt8
    typealias ValidatingDecoder    = ValidatingUTF8Decoder
    typealias NonValidatingDecoder = NonValidatingUTF8Decoder
}

struct NonValidatingUTF8Decoder: UnicodeDecoder {
    typealias Encoding = UTF8
    struct DecodedScalar: RandomAccessCollection { … }
    // Parsing functions
}

struct ValidatingUTF8Decoder: UnicodeDecoder {
    typealias Encoding = UTF8
    typealias DecodedScalar = NonValidatingUTF8Decoder.DecodedScalar // newtype would be cool here
    // Parsing functions
}

struct String {
    init<C, Encoding, Decoder>(from: C, encodedAs: Encoding, using: Decoder = Encoding.ValidatingDecoder)
        where C: Collection, C.Iterator.Element == Encoding.CodeUnit, Decoder.Encoding == Encoding {

        // transcode to native String encoding using the ‘Decoder’ we were given
    }
}
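
For anyone who wants to play with the idea, the following is a rough, self-contained sketch that compiles with current Swift. It is a simplification under several assumptions, not the prototype's API: the Sketch-prefixed names, the whole-buffer decode function (instead of parse1Forward/parse1Backward), and the two makeString overloads (instead of a defaulted String initializer) are all invented for illustration. The validating decoder leans on the standard library's Unicode.UTF8 codec; the non-validating one simply trusts the lead byte.

```swift
// Hypothetical, simplified versions of the two protocols sketched above.
protocol SketchDecoder {
    associatedtype Encoding: SketchEncoding
    // Decodes a whole buffer; returns nil if the decoder gives up.
    static func decode<C: Collection>(_ units: C) -> [Unicode.Scalar]?
        where C.Element == Encoding.CodeUnit
}

protocol SketchEncoding {
    associatedtype CodeUnit
    associatedtype ValidatingDecoder: SketchDecoder
    associatedtype NonValidatingDecoder: SketchDecoder
}

struct SketchUTF8: SketchEncoding {
    typealias CodeUnit = UInt8
    typealias ValidatingDecoder = ValidatingUTF8Decoder
    typealias NonValidatingDecoder = NonValidatingUTF8Decoder
}

struct ValidatingUTF8Decoder: SketchDecoder {
    typealias Encoding = SketchUTF8
    static func decode<C: Collection>(_ units: C) -> [Unicode.Scalar]?
        where C.Element == UInt8 {
        var scalars: [Unicode.Scalar] = []
        var codec = Unicode.UTF8()
        var input = units.makeIterator()
        while true {
            switch codec.decode(&input) {
            case .scalarValue(let scalar): scalars.append(scalar)
            case .emptyInput: return scalars
            case .error: return nil  // ill-formed input is rejected
            }
        }
    }
}

struct NonValidatingUTF8Decoder: SketchDecoder {
    typealias Encoding = SketchUTF8
    static func decode<C: Collection>(_ units: C) -> [Unicode.Scalar]?
        where C.Element == UInt8 {
        var scalars: [Unicode.Scalar] = []
        var input = units.makeIterator()
        while let lead = input.next() {
            // Trust the lead byte; no checks for overlong forms or stray bytes.
            let extra: Int
            var value: UInt32
            switch lead {
            case 0x00...0x7F: extra = 0; value = UInt32(lead)
            case 0xC0...0xDF: extra = 1; value = UInt32(lead & 0x1F)
            case 0xE0...0xEF: extra = 2; value = UInt32(lead & 0x0F)
            default:          extra = 3; value = UInt32(lead & 0x07)
            }
            for _ in 0..<extra {
                guard let next = input.next() else { return nil }
                value = (value << 6) | UInt32(next & 0x3F)
            }
            guard let scalar = Unicode.Scalar(value) else { return nil }
            scalars.append(scalar)
        }
        return scalars
    }
}

// Caller picks the decoder explicitly.
func makeString<C: Collection, D: SketchDecoder>(from units: C, using decoder: D.Type) -> String?
    where C.Element == D.Encoding.CodeUnit {
    guard let scalars = D.decode(units) else { return nil }
    var view = String.UnicodeScalarView()
    view.append(contentsOf: scalars)
    return String(view)
}

// Caller names only the encoding; the validating decoder is the default.
func makeString<C: Collection, E: SketchEncoding>(from units: C, encodedAs encoding: E.Type) -> String?
    where C.Element == E.CodeUnit, E.ValidatingDecoder.Encoding == E {
    return makeString(from: units, using: E.ValidatingDecoder.self)
}

let bytes = Array("héllo".utf8)
let checked = makeString(from: bytes, encodedAs: SketchUTF8.self)
let trusted = makeString(from: bytes, using: NonValidatingUTF8Decoder.self)
let rejected = makeString(from: [0xFF] as [UInt8], encodedAs: SketchUTF8.self)
```

The point of the sketch is that both calls name the same encoding, so a generic function constrained on the encoding accepts either kind of input, and the choice of decoder stays a separate, swappable parameter.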

- Karl