[swift-evolution] Strings in Swift 4
Dave Abrahams
dabrahams at apple.com
Tue Jan 24 13:33:40 CST 2017
on Mon Jan 23 2017, Félix Cloutier <swift-evolution at swift.org> wrote:
>> On Jan 23, 2017, at 20:45, Dave Abrahams <dabrahams at apple.com> wrote:
>>
>> On Jan 22, 2017, at 9:54 PM, Félix Cloutier <felixcca at yahoo.ca> wrote:
>>
>>>
>>>>> doesn't necessarily mean that ignoring that case is the right
>>>>> thing to do. In fact, it means that Unicode won't do anything to
>>>>> protect programs against these, and if Swift doesn't, chances are that
>>>>> no one will. Isolated combining characters break a number of
>>>>> expectations that developers could reasonably have:
>>>>>
>>>>> (a + b).count == a.count + b.count
>>>>> (a + b).startsWith(a)
>>>>> (a + b).endsWith(b)
>>>>> (a + b).find(a) // or .find(b)
>>>>>
>>>>> Of course, this can be documented, but people want easy, and documentation is hard.
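A minimal Swift sketch makes the quoted expectations concrete. It is an editorial illustration, not code from the thread: it assumes `a = "e"` and `b = "\u{301}"` (U+0301 COMBINING ACUTE ACCENT on its own), and because the quoted `startsWith`/`find` are not Swift String APIs, `hasPrefix(_:)` and `contains(_:)` on the `Character` view stand in for them. All variable names are hypothetical.

```swift
// An isolated combining mark appended to a base letter.
let a = "e"
let b = "\u{301}"        // COMBINING ACUTE ACCENT with no base of its own
let ab = a + b           // renders as "é": a single extended grapheme cluster

// Each check mirrors one of the "reasonable expectations" quoted above.
let countAdds = ab.count == a.count + b.count   // 1 == 1 + 1 -> false
let hasPrefixA = ab.hasPrefix(a)                // Character "é" is not "e" -> false
let containsB = ab.contains(b.first!)           // no lone U+0301 Character remains -> false

print(a.count, b.count, ab.count)               // 1 1 1
print(countAdds, hasPrefixA, containsB)         // false false false

// Both scalars are still present; only the Character boundaries moved.
print(ab.unicodeScalars.count)                  // 2
```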
>>>>
>>>> Yes. Unfortunately they also want the ability to append a string
>>>> consisting of a combining character to another string and have it
>>>> append. And they don't want to be prevented from forming
>>>> valid-but-defective Unicode strings.
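The use case described here can be sketched in a few lines of Swift. This is an editorial illustration with arbitrary example strings, not code from the thread: a string consisting only of U+0301 is itself a legal (valid-but-defective) `String`, and appending it attaches the accent to the preceding base character.

```swift
// Attaching a diacritic by appending a string that holds only the combining mark.
var word = "cafe"
let acute = "\u{301}"            // valid but defective: it begins with a combining mark
word += acute                    // the mark combines with the final "e"

print(acute.count)               // 1: the lone mark is still one Character
print(word.count)                // 4, not 5
print(word == "caf\u{E9}")       // true: String equality uses canonical equivalence
print(word.unicodeScalars.count) // 5
```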
>>>>
>>>> […]
>>>>
>>>> Can you suggest an alternative that doesn't violate the Unicode
>>>> standard and supports the expected use-cases, somehow?
>>>
>>>
>>> I'm not sure I understand. Did we go from "this is a
>>> degenerate/defective case that we shouldn't bother with"
>>> <https://github.com/apple/swift/blob/master/docs/StringManifesto.md#string-should-be-a-collection-of-characters-again>
>>> to "this is a supported use case that needs to work as-is"?
>>
>> No. The Unicode standard says it's a valid string, so we shouldn't
>> prohibit it. The standard also says it's a corner case for which it
>> isn't worth making heroic efforts to create sensible semantics. It's
>> totally in keeping with the Unicode standard that we treat it as
>> proposed.
>>
>> In a domain as complex as String processing, we need a guiding star,
>> and that star is the Unicode standard. I'm very reluctant to do
>> anything that clashes with the spirit of the standard.
>>
>>> I've never seen anyone start a string with a combining character on purpose,
>>
>> It will occur as a byproduct of the process of attaching a diacritic
>> to a base character.
>
> Unless you're in the business of writing a text editor, I don't know
> if that's a common use case.
I don't either, to be honest. But the experts I consult with keep
reassuring me that it's an important one.
>>> though I'm familiar with just one natural language that needs
>>> combining characters. I can imagine that it could be a convenient
>>> feature in other natural languages.
>>>
>>> However, if Swift Strings are now designed for machine processing
>>> and less for human language convenience, for me, it's easy enough to
>>> justify a safe default in the context of machine processing: `a+b`
>>> will not combine the end of `a` with the start of `b`. You could do
>>> this by inserting a ◌ that `b` could combine with if necessary.
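The dotted-circle idea can be sketched as follows. This is a hypothetical helper, not a Swift API and not a design anyone in the thread specified: it assumes that "b starts with something that would combine" can be approximated by checking whether `b`'s first scalar is a combining mark (general category Mn, Mc or Me), and it ignores other grapheme-extending cases such as ZWJ sequences, emoji modifiers and Hangul jamo.

```swift
// Hypothetical approximation: a leading scalar combines if it is a combining mark.
// ZWJ, emoji modifiers, Hangul jamo sequences, etc. are deliberately not handled.
func isCombiningMark(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.properties.generalCategory {
    case .nonspacingMark, .spacingMark, .enclosingMark:
        return true
    default:
        return false
    }
}

// Hypothetical helper: give a leading combining mark in `b` its own base,
// U+25CC DOTTED CIRCLE, so it cannot combine with the end of `a`.
func concatenatingWithoutMerging(_ a: String, _ b: String) -> String {
    if let first = b.unicodeScalars.first, isCombiningMark(first) {
        return a + "\u{25CC}" + b
    }
    return a + b
}

let a = "e"
let b = "\u{301}"
let joined = concatenatingWithoutMerging(a, b)   // "e" followed by "◌́"

print(joined.count == a.count + b.count)         // true: 2 == 1 + 1
print(joined.contains(b.first!))                 // false: "◌\u{301}" is not the lone mark
print(concatenatingWithoutMerging("ab", "cd") == "abcd")  // true
```

The second `print` corresponds to the objection in the reply: the count law now holds, but `(a+b).contains(b.first!)` is still false, and a caller who wanted the accent on the `e` gets a visible dotted circle instead.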
>>
>> You can do it, but it trades one semantic problem for a usability
>> problem, without solving all the semantic problems: you end up with
>> a.count + b.count == (a+b).count, sure, but you still don't satisfy
>> the usual law of collections that (a+b).contains(b.first!) if b is
>> non-empty, and now you've made it difficult to attach diacritics to
>> base characters.
>
> "Difficult".
>
> What kind of processing would you suggest on a variable "b" in the
> expression "\(a),\(b)" to ensure that the result can be split with a
> comma?
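The concern behind this question can be reproduced with a short Swift sketch. It is an editorial illustration with arbitrary values for `a` and `b`: when `b` begins with a combining mark, the mark combines with the comma in `"\(a),\(b)"`, so splitting on the `Character` `","` finds no separator, while splitting the Unicode scalar view still does.

```swift
// A field value that happens to begin with a combining mark.
let a = "x"
let b = "\u{301}y"               // COMBINING ACUTE ACCENT, then "y"
let line = "\(a),\(b)"           // the accent combines with the comma

// Character-level split: the middle Character is ",\u{301}", which is not equal to ",".
let byCharacter = line.split(separator: Character(","))
print(line.count)                // 3
print(byCharacter.count)         // 1

// Scalar-level split: the U+002C scalar is still present.
let comma: Unicode.Scalar = ","
let byScalar = line.unicodeScalars.split(separator: comma)
print(byScalar.count)            // 2
```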
I'm sorry, I don't understand what you're driving at, here.
--
-Dave