<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div><br><br>Sent from my iPad</div><div><br>On Jan 23, 2017, at 4:08 AM, Karl Wagner &lt;<a href="mailto:razielim@gmail.com">razielim@gmail.com</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><meta http-equiv="Content-Type" content="text/html charset=utf-8"><br class=""><div><blockquote type="cite" class=""><div class="">On 23 Jan 2017, at 06:54, Félix Cloutier via swift-evolution &lt;<a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="auto" class=""><div class=""><blockquote type="cite" class=""><div class=""><div class="">doesn't necessarily mean that ignoring that case is the right thing to do. In fact, it means that Unicode won't do anything to protect programs against these, and if Swift doesn't, chances are that no one will. Isolated combining characters break a number of expectations that developers could reasonably have:</div><div class=""><br class=""></div><div class=""><ul class="MailOutline"><li class="">(a + b).count == a.count + b.count</li><li class="">(a + b).startsWith(a)</li><li class="">(a + b).endsWith(b)</li><li class="">(a + b).find(a) // or .find(b)</li></ul><div class=""><br class=""></div></div><div class="">Of course, this can be documented, but people want easy, and documentation is hard.</div></div></blockquote><div class=""><br class=""></div>Yes. Unfortunately they also want the ability to append a string consisting of a combining character to another string and have it append. 
And they don't want to be prevented from forming valid-but-defective Unicode strings.</div></div></div></blockquote><div class=""><blockquote type="cite" class=""><div dir="auto" class=""><div class=""><br class=""></div></div></blockquote><blockquote type="cite" class=""><div dir="auto" class=""><div class="">[…]</div></div></blockquote><blockquote type="cite" class=""><div dir="auto" class=""><div class=""><br class=""></div></div></blockquote><blockquote type="cite" class=""><div dir="auto" class=""><div class="">Can you suggest an alternative that doesn't violate the Unicode standard and supports the expected use-cases, somehow? </div></div></blockquote></div><div class=""><br class=""></div><div class="">I'm not sure I understand. Did we go from "this is a <a href="https://github.com/apple/swift/blob/master/docs/StringManifesto.md#string-should-be-a-collection-of-characters-again" class="">degenerate/defective</a> case that we shouldn't bother with" to "this is a supported use case that needs to work as-is"? I've never seen anyone start a string with a combining character on purpose, though I'm familiar with just one natural language that needs combining characters. I can imagine that it could be a convenient feature in other natural languages.</div><div class=""><br class=""></div><div class="">However, if Swift Strings are now designed for machine processing and less for human language convenience, for me, it's easy enough to justify a safe default in the context of machine processing: `a+b` will not combine the end of `a` with the start of `b`. You could do this by inserting a ◌ that `b` could combine with if necessary. That solution would make half of the cases that I've mentioned work as expected and make the operation always safe, as far as I can tell.</div><div class=""><br class=""></div><div class="">In that world, it would be a good idea to have a `&+` fallback or something like that that will let characters combine. 
I would think that this is a much less common use case than serializing strings, though.</div><div class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="auto" class=""><div class=""><blockquote type="cite" class=""><div class=""><div class="">My second concern is with how easy it is to convert an Int to a String index. I've been vocal about this before: I'm concerned that Swift developers will equate Ints with random-access String iterators, which they are emphatically not. String.Index(100) is proposed as a constant-time operation</div></div></blockquote><div class=""><br class=""></div><div class="">No, that has not been proposed. It would be </div><div class=""><br class=""></div><div class="">String.Index(codeUnitOffset: 100)</div></div></div></div></blockquote><blockquote type="cite" class=""><div dir="auto" class=""><div class=""><div class=""><br class=""></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="auto" class=""><div class=""><div class="">It's hard to strike a balance between keeping programmers from making mistakes and making the important use-cases easy. Do you have any suggestions for improving on what we've proposed?</div></div></div></div></blockquote><div class=""><div class=""><br class=""></div><div class="">That's still one extension away from String.Index(100), and one function away from an even more convenient form. I don't have a great solution, but I don't have a great understanding of the problem that this is solving either. I'm leaving it here because, AFAIK, Swift 3 imposes constraints that are hard to ignore and mostly beneficial to people outside of the English bubble, and it seems that the proposed index regresses on this.</div><div class=""><br class=""></div><div class="">I'm perfectly happy with interchanging indices between the different views of a String. 
It's getting the offset in or out of the index that I think lets people make incorrect assumptions about strings.</div></div></div></div></div></blockquote><br class=""></div><div>We could have a pair of helper functions to search for the grapheme cluster boundary relative to a given CodeUnits.Index:</div><div><br class=""></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;" class=""><div><font face="Courier" class="">/// Returns the index at the start of the grapheme-cluster containing the given code-unit.</font></div><div><font face="Courier" class="">func indexOfCharacterBoundary(at i: CodeUnits.Index) -> CodeUnits.Index</font></div><div><font face="Courier" class=""><br class=""></font></div><div><font face="Courier" class="">/// Returns the index at the start of the grapheme-cluster following the given code-unit.</font></div><div><font face="Courier" class="">func indexOfCharacterBoundary(after i: CodeUnits.Index) -> CodeUnits.Index</font></div></blockquote></div></blockquote><div><br></div>What problem does this proposed API solve?<div><br><blockquote type="cite"><div><div class="">Actually, if we do forgiving conversion when sharing indexes between String views, it might be nice to expose these explicit index-adjusting functions anyway.</div></div></blockquote></div></body></html>