[swift-evolution] Strings in Swift 4

Zach Waldowski zach at waldowski.me
Tue Jan 24 22:16:29 CST 2017


I'll use Karl's point here as a minor jumping-off point for a
semi-related train of thought… I'm excited by the content of the
original manifesto, including a powerful Unicode namespace and types.
But as I've continued down the thread, I've had growing concern about
modeling strings breadthwise in the type system, i.e., with UTF8String
and so on.


I strongly want Swift to have world-class string processing, but I
believe even more strongly in the language's spirit of progressive
disclosure. Newcomers to Swift's current String API find it difficult
(something I personally disagree with, but that's neither here nor
there); I don't think that difficulty is solved by aggressively
use-specific type modeling. I instead think it gives rise to the same
severe cargo-culting that gets us the scarily prevalent
String.Index.init(offset:) extensions in the current model.
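
For readers who have not run into the pattern, here is a minimal sketch
of the kind of integer-offset convenience being referred to. The method
name character(atOffset:) is hypothetical and chosen only for
illustration; it is not taken from the standard library or from any
particular project. The point of the sketch is that every call walks the
string from its start, which is exactly the cost such extensions tend to
hide.

```swift
// Hypothetical helper, for illustration only: not part of the standard library.
extension String {
    /// Returns the character at an integer offset, or nil when the offset
    /// is out of range. Each call walks from startIndex, so it is O(n).
    func character(atOffset offset: Int) -> Character? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i != endIndex
        else { return nil }
        return self[i]
    }
}

let word = "naïve"
if let c = word.character(atOffset: 2) {
    print(c)
}
```

Calling a helper like this inside a loop over 0..<count turns a linear
traversal into a quadratic one, which is why copying it around without
understanding the index model is risky.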


Best,
  Zach Waldowski
  zach at waldowski.me



On Tue, Jan 24, 2017, at 10:15 PM, Karl Wagner via swift-evolution wrote:
>
>>> I hope I am correct about the no-copy thing, and I would also like to
>>> permit promoting C strings to Swift strings without validation. This
>>> is obviously unsafe in general, but I know my strings... and I care
>>> about performance. ;)
>>
>> We intend to support that use-case. That's part of the reason for the
>> ValidUTF8 and ValidUTF16 encodings you see here:
>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L598
>> and here:
>> https://github.com/apple/swift/blob/unicode-rethink/stdlib/public/core/Unicode2.swift#L862
>
> It seems a little strange to me that a pre-validated UTF8 string from
> C would have different types to a UTF8String (i.e. using ValidUTF8 vs
> UTF8). It defeats the point of having the encoding represented in the
> type-system.
>
> For example, if I write a generic function:
>
>> func sendMessage<Source: Unicode>(from: Source)
>>     where Source.Encoding == UTF8
>
> I would only be able to accept UTF-8 text which hasn’t already been
> validated.
>
> What about if we allowed each encoding to provide multiple kinds of
> decoder? That would also allow us to substitute our own decoders in,
> if there are application-specific shortcuts we can take.
>
>> protocol UnicodeEncoding {
>>     associatedtype CodeUnit
>>
>>     associatedtype ValidatingDecoder: UnicodeDecoder
>>     associatedtype NonValidatingDecoder: UnicodeDecoder
>> }
>>
>> protocol UnicodeDecoder {
>>     associatedtype Encoding: UnicodeEncoding
>>     associatedtype DecodedScalar: RandomAccessCollection
>>         where Iterator.Element == Encoding.CodeUnit
>>
>>     static func parse1Forward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>>     static func parse1Backward<C>(…) -> ParseResult<DecodedScalar, C.Index>
>> }
>> // Not shown: UnicodeEncoder protocol, with transcodeScalar<T> function.
>>
>> struct UTF8: UnicodeEncoding {
>>     typealias CodeUnit             = UInt8
>>     typealias ValidatingDecoder    = ValidatingUTF8Decoder
>>     typealias NonValidatingDecoder = NonValidatingUTF8Decoder
>> }
>>
>> struct NonValidatingUTF8Decoder: UnicodeDecoder {
>>     typealias Encoding = UTF8
>>     struct DecodedScalar: RandomAccessCollection { … }
>>     // Parsing functions
>> }
>>
>> struct ValidatingUTF8Decoder: UnicodeDecoder {
>>     typealias Encoding = UTF8
>>     typealias DecodedScalar = NonValidatingUTF8Decoder.DecodedScalar
>>     // newtype would be cool here
>>     // Parsing functions
>> }
>>
>> struct String {
>>     init<C, Encoding, Decoder>(from: C, encodedAs: Encoding,
>>                                using: Decoder = Encoding.ValidatingDecoder)
>>         where C: Collection, C.Iterator.Element == Encoding.CodeUnit,
>>               Decoder.Encoding == Encoding {
>>
>>         // transcode to native String encoding using ‘Decoder’ we were given
>>     }
>> }

> 

> - Karl
