<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 25 Jan 2017, at 13:08, Ben Cohen &lt;<a href="mailto:ben_cohen@apple.com" class="">ben_cohen@apple.com</a>&gt; wrote:</div><div class=""><blockquote type="cite" class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;"><div class=""><div class="">Okay, so I'm serializing two strings "a" and "b", and later on I want to deserialize them. I control "a", and the user controls "b". I know that I'll never have a comma in "a", so one obvious way to serialize the two strings is with "\(a),\(b)", and the most obvious way to deserialize them is with string.split(maxSplits: 2) { $0 == "," }.<br class=""><br class="">For the example, string "a" is "hello", and the user put in "\u{0301}screw you" for "b". This makes the result "hello,́screw you". 
Now split misses the comma.<br class=""><br class="">How do I fix it?<br class=""><br class=""></div></div></blockquote><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">One option (once Character acquires a unicodeScalars view similar to String’s) would be:</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span class="" style="color: rgb(52, 149, 175); font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">s</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">.</span><span class="" style="color: rgb(52, 149, 175); font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">split</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: 
no-common-ligatures;"><span class="Apple-converted-space"> </span>{ $0.</span><span class="" style="color: rgb(52, 149, 175); font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">unicodeScalars</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">.</span><span class="" style="color: rgb(52, 149, 175); font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">first</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;"><span class="Apple-converted-space"> </span>==<span class="Apple-converted-space"> </span></span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures; color: rgb(180, 38, 26);">","</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;"><span class="Apple-converted-space"> </span>}</span></div></div></blockquote><div><br class=""></div><div>My two main objections to this are that (1) this drops the acute accent (although that's probably an acceptable sacrifice in the face of purposefully bad input); and (2) it's annoying to me that you have to drop below the Character level to safely perform a task this simple.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">There’s probably also a case to be made for a String-specific overload <span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">split(separator:<span class="Apple-converted-space"> </span></span><span class="" style="font-family: Menlo; font-size: 11px; 
font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">UnicodeScalar</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">)</span>, in which case you’d pass in the scalar of “,”. This would replicate the behavior of languages that use code points as their “character”.</div></div></blockquote><div><br class=""></div><div>Given the way they're being built, I'm leaning towards the opinion that Strings aren't the right tool to serialize anything. Unfortunately, in a world of XML, JSON, YAML, Markdown and such, they're also a very obvious choice.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">Alternatively, the right solution is to sanitize your input before the interpolation. Sanitization is a big topic, of which this is just one example. Essentially, you are asking for this kind of sanitization to be automatically applied for all range-replaceable operations on strings for this specific use case. I’m not sure that’s a good precedent to set. There are other ways in which Unicode can be abused that wouldn’t be covered; should we be sanitizing for those too on all low-level operations?</div></div></blockquote><div><br class=""></div><div>I agree that the general Unicode abuse problem cannot be solved. The novel thing here is that Swift is one of the first languages to bring grapheme-cluster-aware strings to a wide audience, and in doing so, it introduces a class of bugs that has essentially no precedent. I feel like this should worry people a little bit. 
People have been able to abuse RTL overrides for several years now, and we found that it's a problem for users, but machines are pretty good at dealing with it. However, if you'll allow me to dramatize, these are characters that basically eat their neighbor.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">This would also have pretty far-reaching implications across lots of different types and operations. For example, it’s not just on append:</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(180, 38, 26);"><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(4, 51, 255);">var</span><span class="" style="font-variant-ligatures: no-common-ligatures;"><span class="Apple-converted-space"> </span>s =<span class="Apple-converted-space"> </span></span><span class="" style="font-variant-ligatures: no-common-ligatures;">"pokemon"</span></div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: 
Menlo;"><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(4, 51, 255);">let</span><span class="" style="font-variant-ligatures: no-common-ligatures;"><span class="Apple-converted-space"> </span>i =<span class="Apple-converted-space"> </span></span><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">s</span><span class="" style="font-variant-ligatures: no-common-ligatures;">.</span><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">index</span><span class="" style="font-variant-ligatures: no-common-ligatures;">(of:<span class="Apple-converted-space"> </span></span><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(180, 38, 26);">"m"</span><span class="" style="font-variant-ligatures: no-common-ligatures;">)!</span></div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"><div class="" style="margin: 0px; line-height: normal; color: rgb(0, 143, 0);"><span class="" style="font-variant-ligatures: no-common-ligatures;">// insert not just \u{0301} but also a separator?</span></div></div><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(180, 38, 26);"><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">s</span><span class="" style="font-variant-ligatures: no-common-ligatures;">.</span><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">insert</span><span class="" style="font-variant-ligatures: no-common-ligatures;">(</span><span class="" style="font-variant-ligatures: no-common-ligatures;">"\u{0301}"</span><span class="" style="font-variant-ligatures: no-common-ligatures;">, at:<span class="Apple-converted-space"> </span></span><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">i</span><span class="" 
style="font-variant-ligatures: no-common-ligatures;">)</span></div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div class=""><br class=""></div></div></blockquote><blockquote type="cite" class=""><div class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class=""><span class="" style="font-variant-ligatures: no-common-ligatures;">It also would apply to in-place mutation on slices, given you can do this:</span></div><div class=""><span class="" style="font-variant-ligatures: no-common-ligatures;"><br class=""></span></div><div class=""><span class="" style="font-variant-ligatures: no-common-ligatures;"><div class="" style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"><div class="" style="margin: 0px; line-height: normal;"><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(4, 51, 255);">var</span><span class="" style="font-variant-ligatures: no-common-ligatures;"><span class="Apple-converted-space"> </span>a = [1,2,3,4]</span></div><div class="" style="margin: 0px; line-height: normal;"><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">a</span><span class="" style="font-variant-ligatures: no-common-ligatures;">[0...2].</span><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">append</span><span class="" style="font-variant-ligatures: no-common-ligatures;">(99)</span></div><div class="" style="margin: 0px; line-height: normal; color: rgb(0, 143, 0);"><span class="" style="font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">a</span><span class="" style="font-variant-ligatures: no-common-ligatures;"><span 
class="Apple-converted-space"> </span></span><span class="" style="font-variant-ligatures: no-common-ligatures;">// [1,2,3,99,4]</span></div><div class=""><span class="" style="font-variant-ligatures: no-common-ligatures;"><br class=""></span></div></div></span></div><div class="">In this case, suppose you appended <span class="" style="color: rgb(180, 38, 26); font-family: Menlo; font-size: 11px;">"</span><font color="#b4261a" face="Menlo" class=""><span class="" style="font-size: 11px;">e"</span></font> to a slice that ended between <span class="" style="color: rgb(180, 38, 26); font-family: Menlo; font-size: 11px;">"</span><font color="#b4261a" face="Menlo" class=""><span class="" style="font-size: 11px;">m" </span></font>and <span class="" style="color: rgb(180, 38, 26); font-family: Menlo; font-size: 11px;">"\u{0301}</span><font color="#b4261a" face="Menlo" class=""><span class="" style="font-size: 11px;">"</span></font>. The append operation on the substring would need to look into the outer string, see that the next scalar is a combining character, and then insert a spacer element in between them.</div><div class=""><br class=""></div><div class="">We would still need the ability to append modifiers to characters legitimately. If users could not do this by inserting/appending these modifiers into String, we would have to put this logic onto Character, which would need to have the ability to range-replace within its scalars, which adds a lot to the complexity of that type. It would also be fiddly to use, given that String is not going to conform to MutableCollection (because mutation on an element cannot be done in constant time). So you couldn’t do it in place, i.e. 
<span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">s</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">[</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures; color: rgb(52, 149, 175);">i</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">].unicodeScalars.append(</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures; color: rgb(180, 38, 26);">"\u{0301}"</span><span class="" style="font-family: Menlo; font-size: 11px; font-variant-ligatures: no-common-ligatures;">)</span> wouldn’t work.</div></div></div></blockquote><div><br class=""></div><div><div style="margin: 0px; line-height: normal;" class=""><span style="font-kerning: none" class="">I'd argue that no one should feel particularly great about writing code points to a collection that exposes Characters in return. Have any alternatives around modifying a Unicode scalar view been explored? I don't have any problem with making it impossible to add a Character-that-is-not-a-Character to a String's Character view if you can opt in to Unicode scalars when you mean it.</span></div><div style="margin: 0px; line-height: normal;" class=""><br class=""></div><div style="margin: 0px; line-height: normal;" class="">Félix</div><div style="margin: 0px; line-height: normal;" class=""><br class=""></div></div></div></body></html>