[swift-evolution] Strings in Swift 4

Dave Abrahams dabrahams at apple.com
Mon Jan 23 22:57:39 CST 2017




> On Jan 23, 2017, at 4:08 AM, Karl Wagner <razielim at gmail.com> wrote:
> 
> 
>> On 23 Jan 2017, at 06:54, Félix Cloutier via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>> 
>>>> doesn't necessarily mean that ignoring that case is the right thing to do. In fact, it means that Unicode won't do anything to protect programs against these, and if Swift doesn't, chances are that no one will. Isolated combining characters break a number of expectations that developers could reasonably have:
>>>> 
>>>> (a + b).count == a.count + b.count
>>>> (a + b).startsWith(a)
>>>> (a + b).endsWith(b)
>>>> (a + b).find(a) // or .find(b)
>>>> 
>>>> Of course, this can be documented, but people want easy, and documentation is hard.
>>> 
>>> Yes.  Unfortunately they also want the ability to append a string consisting of a combining character to another string and have it append.  And they don't want to be prevented from forming valid-but-defective Unicode strings.
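
For illustration, the expectations listed above can be checked against an isolated combining character. This is a minimal sketch in present-day Swift; it uses `count` and the Character-wise `starts(with:)` as stand-ins for the `startsWith`/`endsWith` pseudocode above, and the variable names are made up for the example.

```swift
// U+0301 is COMBINING ACUTE ACCENT; on its own it forms a valid but "defective" string.
let a = "e"
let b = "\u{301}"
let joined = a + b

// Each operand is one Character, but so is the concatenation ("é").
let countsAdd = joined.count == a.count + b.count               // false

// Character-wise prefix test: the first Character of `joined` is "é", not "e".
let keepsPrefix = joined.starts(with: a)                        // false

// Character-wise suffix test: the last Character of `joined` is "é", not the bare accent.
let keepsSuffix = joined.reversed().starts(with: b.reversed())  // false

// At the Unicode-scalar level nothing was lost: both scalars are still there.
let scalarsAdd = joined.unicodeScalars.count
    == a.unicodeScalars.count + b.unicodeScalars.count          // true

print(countsAdd, keepsPrefix, keepsSuffix, scalarsAdd)
```

The append itself succeeds, which is the behavior asked for in the reply above; it is only the Character-level arithmetic that stops being additive.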
>>> 
>>> […]
>>> 
>>> Can you suggest an alternative that doesn't violate the Unicode standard and supports the expected use-cases, somehow? 
>> 
>> 
>> I'm not sure I understand. Did we go from "this is a degenerate/defective case that we shouldn't bother with" to "this is a supported use case that needs to work as-is"? I've never seen anyone start a string with a combining character on purpose, though I'm familiar with just one natural language that needs combining characters. I can imagine that it could be a convenient feature in other natural languages.
>> 
>> However, if Swift Strings are now designed for machine processing and less for human language convenience, for me, it's easy enough to justify a safe default in the context of machine processing: `a+b` will not combine the end of `a` with the start of `b`. You could do this by inserting a ◌ that `b` could combine with if necessary. That solution would make half of the cases that I've mentioned work as expected and make the operation always safe, as far as I can tell.
>> 
>> In that world, it would be a good idea to have a `&+` fallback, or something similar, that lets characters combine. I would think that this is a much less common use case than serializing strings, though.
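
A rough sketch of the "insert a ◌" idea, to make it concrete. The function name `concatenateWithoutCombining` is hypothetical and nothing like it has been proposed; the check on the general category is also a simplifying assumption, since real grapheme breaking involves more than combining marks.

```swift
/// Hypothetical helper, not part of any proposal or of the standard library.
/// If `b` starts with a combining mark, a DOTTED CIRCLE (U+25CC) is inserted
/// so that the mark attaches to it instead of to the last Character of `a`.
func concatenateWithoutCombining(_ a: String, _ b: String) -> String {
    // Nothing can merge if either side is empty.
    guard !a.isEmpty, let first = b.unicodeScalars.first else { return a + b }

    switch first.properties.generalCategory {
    case .nonspacingMark, .spacingMark, .enclosingMark:
        // Simplification: only general-category marks are treated as "combining".
        return a + "\u{25CC}" + b
    default:
        return a + b
    }
}

let safe = concatenateWithoutCombining("e", "\u{301}")
print(safe.count)             // 2, which equals "e".count + "\u{301}".count
print(safe.starts(with: "e")) // true
```

With this variant the count and prefix expectations hold, while the suffix expectation still does not, because the last Character is now the dotted circle plus the accent rather than the bare accent.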
>> 
>>>> My second concern is with how easy it is to convert an Int to a String index. I've been vocal about this before: I'm concerned that Swift developers will equate Ints with random-access String iterators, which they are emphatically not. String.Index(100) is proposed as a constant-time operation
>>> 
>>> No, that has not been proposed.  It would be 
>>> 
>>> String.Index(codeUnitOffset: 100)
>>> 
>>> It's hard to strike a balance between keeping programmers from making mistakes and making the important use-cases easy.  Do you have any suggestions for improving on what we've proposed?
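
To show why a code-unit offset is not a character offset, here is a small sketch. It assumes the `String.Index(utf16Offset:in:)` initializer and `utf16Offset(in:)` method that later shipped in Swift 5, not the `codeUnitOffset:` spelling discussed above.

```swift
let s = "e\u{301}x"   // two Characters: "é" (two UTF-16 code units) and "x"

// An index built from a code-unit offset can land inside a grapheme cluster.
let i = String.Index(utf16Offset: 1, in: s)
let unit = s.utf16[i]                          // 0x0301, the combining accent

// The second Character starts at code-unit offset 2, not 1.
let second = s.index(s.startIndex, offsetBy: 1)
let secondOffset = second.utf16Offset(in: s)   // 2

print(unit == 0x0301, s[second], secondOffset)
```

The labeled initializer at least makes the unit of the offset explicit at the call site, which an unlabeled `String.Index(100)` would not.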
>> 
>> That's still one extension away from String.Index(100), and one function away from an even more convenient form. I don't have a great solution, but I don't have a great understanding of the problem that this is solving either. I'm leaving it here because, AFAIK, Swift 3 imposes constraints that are hard to ignore and mostly beneficial to people outside of the English bubble, and it seems that the proposed index regresses on this.
>> 
>> I'm perfectly happy with interchanging indices between the different views of a String. It's getting the offset in or out of the index that I think lets people make incorrect assumptions about strings.
> 
> We could have a pair of helper functions to search for the grapheme cluster boundary relative to a given CodeUnits.Index:
> 
> /// Returns the index at the start of the grapheme-cluster containing the given code-unit.
> func indexOfCharacterBoundary(at i: CodeUnits.Index) -> CodeUnits.Index
> 
> /// Returns the index at the start of the grapheme-cluster following the given code-unit.
> func indexOfCharacterBoundary(after i: CodeUnits.Index) -> CodeUnits.Index

What problem does this proposed API solve?

> Actually, if we do forgiving conversion when sharing indexes between String views, it might be nice to expose these explicit index-adjusting functions anyway.
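
For reference, a minimal sketch of what such index-adjusting helpers might look like in present-day Swift. The names and the use of UTF-16 offsets are assumptions made for the example, not the signatures quoted above, and the linear scan is chosen for clarity rather than speed.

```swift
/// Hypothetical helper: the start of the Character containing the given
/// UTF-16 code-unit offset. Offsets at or past the end map to `endIndex`.
func characterBoundary(atOrBefore offset: Int, in s: String) -> String.Index {
    if offset >= s.utf16.count { return s.endIndex }
    var result = s.startIndex
    // `s.indices` visits only Character boundaries, in increasing order.
    for i in s.indices {
        if i.utf16Offset(in: s) > offset { break }
        result = i
    }
    return result
}

/// Hypothetical helper: the start of the Character following the one
/// that contains the given UTF-16 code-unit offset.
func characterBoundary(after offset: Int, in s: String) -> String.Index {
    let start = characterBoundary(atOrBefore: offset, in: s)
    return start == s.endIndex ? start : s.index(after: start)
}

let text = "e\u{301}x"   // "é" occupies code units 0 and 1, "x" is at 2
let rounded = characterBoundary(atOrBefore: 1, in: text)
let next = characterBoundary(after: 1, in: text)
print(rounded == text.startIndex, text[next])
```

Rounding down is just one possible policy; a "forgiving" conversion between views could equally round up or trap, and the sketch takes no position on which is right.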