[swift-evolution] Strings in Swift 4

Félix Cloutier felixcca at yahoo.ca
Tue Jan 24 21:02:06 CST 2017


> On Jan 24, 2017, at 11:33, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
>>>> I've never seen anyone start a string with a combining character on purpose, 
>>> 
>>> It will occur as a byproduct of the process of attaching a diacritic
>>> to a base character.
>> 
>> Unless you're in the business of writing a text editor, I don't know
>> if that's a common use case.
> 
> I don't either, to be honest.  But the experts I consult with keep
> reassuring me that it's an important one.

Is it possible that the Unicode experts' use cases differ from non-experts' use cases? It makes sense to put people who know a lot about Unicode in charge of handling complex Unicode operations, which makes that use case very important to them, but thanks to their hard work no one else needs to care about it.

>>>> though I'm familiar with just one natural language that needs
>>>> combining characters. I can imagine that it could be a convenient
>>>> feature in other natural languages.
>>>> 
>>>> However, if Swift Strings are now designed for machine processing
>>>> and less for human language convenience, for me, it's easy enough to
>>>> justify a safe default in the context of machine processing: `a+b`
>>>> will not combine the end of `a` with the start of `b`. You could do
>>>> this by inserting a ◌ that `b` could combine with if necessary.
>>> 
>>> You can do it, but it trades one semantic problem for a usability
>>> problem, without solving all the semantic problems: you end up with
>>> a.count + b.count == (a+b).count, sure, but you still don't satisfy
>>> the usual law of collections that (a+b).contains(b.first!) if b is
>>> non-empty, and now you've made it difficult to attach diacritics to
>>> base characters.
>> 
>> "Difficult".
>> 
>> What kind of processing would you suggest on a variable "b" in the
>> expression "\(a),\(b)" to ensure that the result can be split with a
>> comma?
> 
> I'm sorry, I don't understand what you're driving at, here.

Okay, so I'm serializing two strings, "a" and "b", and later on I want to deserialize them. I control "a", and the user controls "b". I know that I'll never have a comma in "a", so one obvious way to serialize the two strings is with "\(a),\(b)", and the most obvious way to deserialize them is with string.split(maxSplits: 1) { $0 == "," }.

For example, say "a" is "hello" and the user enters "\u{0301}screw you" for "b". The result is "hello,́screw you". Now split misses the comma, because the comma and the combining accent merge into a single Character that doesn't compare equal to ",".
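
A minimal sketch of the scenario, assuming a Swift version where String is itself a collection of Character (as proposed for Swift 4). The names `serialized`, `byCharacter` and `byScalar` are just for illustration. The second split, over the unicodeScalars view, is only there for contrast and isn't something proposed in this thread.

```swift
let a = "hello"
let b = "\u{0301}screw you"   // starts with U+0301 COMBINING ACUTE ACCENT
let serialized = "\(a),\(b)"

// Character-level split: "," followed by U+0301 is one grapheme cluster,
// so no Character in the string compares equal to ",".
let byCharacter = serialized.split(maxSplits: 1,
                                   omittingEmptySubsequences: false,
                                   whereSeparator: { $0 == "," })
print(byCharacter.count)   // 1: the comma was never seen as a separator

// Scalar-level split, for contrast: the comma is still its own
// Unicode scalar, so the two fields come apart again.
let byScalar = serialized.unicodeScalars
    .split(maxSplits: 1,
           omittingEmptySubsequences: false,
           whereSeparator: { $0 == "," })
    .map { String($0) }
print(byScalar.count)      // 2
```

Here byScalar[0] is "hello" and byScalar[1] still begins with the combining accent the user supplied.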

How do I fix it?

Félix


