[swift-evolution] [swift-evolution-announce] [Review] SE-0027 Expose code unit initializers on String

Tue Feb 16 13:08:49 CST 2016

Hi Patrick,

I think the “bag of bytes” characterization might result from a shortcoming in the wording of the proposal, but it’s not actually a concern for the proposed new methods themselves. The intended use for these methods is to convert ASCII, UTF-8, or UTF-16 code unit sequences into Strings. That’s about as fundamental to the functioning of a Unicode-compliant String type as you can get.

It’s true that you could use these to convert some arbitrary sequence of bytes into a String. If there are invalid characters that result from that, the failable initializer will fail and the standard initializer will silently “repair” the characters. The decode() method will tell you whether it repaired anything. Those are exactly the set of options I would want.
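To make the three behaviours concrete, here is a minimal sketch. It does not use the proposed initializers; as stand-ins it uses String(decoding:as:) and String.decodeCString(_:as:repairingInvalidCodeUnits:) from the standard library, which I believe behave the same way in current Swift. The byte values are just an illustrative input.

```swift
// "hi" followed by 0xFF, a byte that never appears in well-formed UTF-8.
let bytes: [UInt8] = [0x68, 0x69, 0xFF]

// Repairing conversion: each invalid sequence is replaced with U+FFFD.
let repaired = String(decoding: bytes, as: UTF8.self)

// decodeCString expects NUL-terminated input, so append a terminator.
let terminated = bytes + [0]

// Validating conversion: returns nil instead of repairing.
let strict = String.decodeCString(
    terminated, as: UTF8.self, repairingInvalidCodeUnits: false)

// Repairing conversion that also reports whether anything was repaired.
let lenient = String.decodeCString(
    terminated, as: UTF8.self, repairingInvalidCodeUnits: true)

print(repaired == "hi\u{FFFD}")             // true
print(strict == nil)                        // true
print(lenient?.repairsMade ?? false)        // true
```

In none of the three cases does a malformed String come back: the caller gets either nil or a well-formed String with replacement characters.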

—CK

> On Feb 16, 2016, at 5:15 AM, Patrick Gili via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Hi Zach,
> 
> On your advice, I went back and read the sections of the Swift book relating to Strings and Characters. While it is easy to see that a String is a "linked list of text-ish things" or a "collection of Characters", I do not see anything that encourages a developer to treat a String as a "bag of bytes". I would not misinterpret the section on "Unicode Representations of Strings" as a "bag of bytes". Perhaps you can be more specific and quote some text that gives you this impression.
> 
> I would argue that these methods decrease the safety of a String and that they do indeed change the contract of the API design. If an application opens a truly binary file (e.g., something that was encrypted or an executable) and you initialize a String using these contents, I would argue that the String does not hold valid characters, and hence the value of the String is not a string-value.
> 
> String offers a robust toolbox for dealing with a "bag of bytes", but to use it as such represents an abuse. I think NSString may have encouraged years of abuse. Even more than a UInt8View for String, which would only perpetuate the abuse, I would like to determine the shortcomings of [UInt8], as this is the purest representation of a "bag of bytes".
> 
> Cheers,
> -Patrick 
> 
>> On Feb 14, 2016, at 1:40 AM, Zach Waldowski via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>> I think you're drawing an overly arbitrary distinction about the
>> semantics. I'd recommend a close re-reading of the Swift book's chapters
>> on String after their reworking in 2.0; it bridges together the "linked
>> list of text-ish things", "collection of Characters", and "bag of bytes"
>> ideas rather well. They're not mutually exclusive.
>> 
>> The new methods do not decrease the safety of String, nor do they change
>> the contract of the API design. It should not be possible to get
>> malformed strings back from the new API; the non-validating version
>> automatically performs repairs, and the validating version fails (by
>> returning nil) on any errors. In fact, exposing these APIs in a way that
>> is aware and respectful of String's underpinnings is safer than the
>> alternative. The stdlib won't screw up things like surrogate pairs or
>> range checking of valid code points, whereas I've seen plenty of code
>> try and do what these methods do themselves by upcasting UInt8 to
>> UnicodeScalar and accumulating.
>> 
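
A small illustration of the hand-rolled approach described above, as a sketch only: the input bytes are a made-up example, and String(decoding:as:) stands in for the proposed initializers. Widening each UTF-8 byte to its own scalar compiles and runs without complaint, but it silently produces the wrong text for anything outside ASCII.

```swift
// UTF-8 encoding of "café": the final "é" (U+00E9) takes two bytes, 0xC3 0xA9.
let utf8Bytes: [UInt8] = [0x63, 0x61, 0x66, 0xC3, 0xA9]

// Naive approach: treat every byte as a complete Unicode scalar and accumulate.
var naive = ""
for byte in utf8Bytes {
    naive.unicodeScalars.append(Unicode.Scalar(byte))
}

// Letting the standard library decode the same bytes as UTF-8.
let decoded = String(decoding: utf8Bytes, as: UTF8.self)

print(naive)    // cafÃ©
print(decoded)  // café
```
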
>> Addressing other points about the proposal: I overall agree with you
>> that the Views would do a better job of this on the long scale of time,
>> but C and ObjC interop simply require entry points like the ones in this
>> proposal, and are in-line with how Swift works today. This proposal is
>> not intended to overhaul String, even though that may be one day
>> desirable by what Dave and others said on the Evolution thread.
>> 
>> Thanks for your feedback! :)
>> 
>> Zach Waldowski
>> zach at waldowski.me
>> 
>> On Sat, Feb 13, 2016, at 05:33 PM, Patrick Gili via swift-evolution
>> wrote:
>>> Okay. However, does this change the implied semantics?
>>> 
>>>> On Feb 13, 2016, at 5:26 PM, Brent Royal-Gordon <brent at architechies.com> wrote:
>>>> 
>>>>> The introduction starts out by making the claim, "Going back and forth from Strings to their byte representations is an important part of solving many problems, including object serialization, binary and text file formats, wire/network interfaces, and cryptography." Essentially, these problems deal with an array of raw bytes, and I have to wonder why an application would push them into a String?
>>>> 
>>>> I read this section as trying to say "object serialization, binary and text file formats, wire/network interfaces, and cryptography all require you to construct strings from decoded bytes, which is what this proposal is trying to improve". I don't think it's trying to say that we should have better support for treating strings as bags of arbitrary bytes, and in fact I don't think this proposal does that.
>>>> 
>>>> -- 
>>>> Brent Royal-Gordon
>>>> Architechies
>>>> 
>>> 
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
> 