This is why I’m advocating for the sections of the previous draft that deal with this issue to be maintained going forward. In that document and the links it provides, there are very extensive previous discussions on lookalike characters and invisibles.<br><br>No need to rehash this very complex topic again. I will just say that there are languages for which invisible modifiers are essential, but there are well-defined Unicode guidelines about restricting their use so as to maximize security without impeding legitimate use cases. Lookalikes are dealt with by Unicode in several flavors, and again the previous draft discusses why a certain flavor of normalization is most appropriate for Swift.<br><div class="gmail_quote"><div dir="ltr">On Mon, Oct 2, 2017 at 03:13 Félix Cloutier via swift-evolution <<a href="mailto:swift-evolution@swift.org">swift-evolution@swift.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word;line-break:after-white-space">If you tried hard enough, you could probably create a variable that looks like it's shadowing one from an outer scope while it actually isn't, and use the two to confuse readers. This could trick people into thinking that some dangerous/backdoor code is actually good and safe, especially in the open-source world where you can't always trust your contributors.<div><br></div><div>On one hand, other than the complexity of telling if two characters are lookalikes, I don't know why Αrray (GREEK CAPITAL LETTER ALPHA) and Array (LATIN CAPITAL LETTER A) should be considered different identifiers. On the other hand, I struggle to imagine the specifics of an exploit that uses that. 
You'd have to work pretty hard to assemble all the pieces of a backdoor in visually-similar variable names without arousing suspicion.</div><div><br></div><div></div></div><div style="word-wrap:break-word;line-break:after-white-space"><div>Félix</div></div><div style="word-wrap:break-word;line-break:after-white-space"><div><br><div><br><blockquote type="cite"><div>Le 1 oct. 2017 à 22:30, Kenny Leung via swift-evolution <<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>> a écrit :</div><br class="m_5594026663645648669Apple-interchange-newline"><div><div style="word-wrap:break-word"><div>I guess theoretically you could have two variables that look alike, but are actually different values, allowing you to insert some obfuscated malicious code somehow.</div><div><br></div><div>-Kenny</div><div><br></div><br><div><blockquote type="cite"><div>On Oct 1, 2017, at 10:01 PM, Chris Lattner <<a href="mailto:clattner@nondot.org" target="_blank">clattner@nondot.org</a>> wrote:</div><br class="m_5594026663645648669Apple-interchange-newline"><div><blockquote type="cite" style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div><br class="m_5594026663645648669Apple-interchange-newline">On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution <<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>> wrote:</div><br class="m_5594026663645648669Apple-interchange-newline"><div><div style="word-wrap:break-word"><div>Hi All.</div><div><br></div><div>I’d like to help as well. 
I have fun with operators.</div><div><br></div><div>There is also the issue of code security with invisible unicode characters and characters that look exactly alike.</div></div></div></blockquote><div style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br></div><div style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">Unless there is a compelling reason to add them, I think we should ban invisible characters. What is the harm of characters that look alike?</div><div style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br></div><div style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">-Chris</div><div style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br></div><br style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><blockquote type="cite" style="font-family:Helvetica;font-size:13px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div><div 
style="word-wrap:break-word"><div>(They should make a Coding font that ensures all characters look different.) Was that ever resolved? Googling, I found this:</div><div><br></div><div><a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160620/021446.html" target="_blank">https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160620/021446.html</a></div><div><br></div><div>Which seems to have been left at this:</div><div><br></div><div><a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025555.html" target="_blank">https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025555.html</a></div><div><br></div><div><a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160919/thread.html#27229" target="_blank">https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160919/thread.html#27229</a></div><div><br></div><div>Should we throw all of this into the same pot, and make any characters that aren’t on the approved list illegal?</div><div><br></div><div>-Kenny</div><div><br></div><br><div><blockquote type="cite"><div>On Sep 30, 2017, at 4:13 PM, Xiaodi Wu via swift-evolution <<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>> wrote:</div><br class="m_5594026663645648669Apple-interchange-newline"><div>I’m happy to participate in the reshaping of the proposal. It would be nice to gather a group of people again to help drive it forward.<br><br>That said, it’s unclear to me that superscript T is clearly an operator, any more than would be superscript H (Hermitian), superscript 2, superscript 3, etc. 
But at any rate, this would be discussion for the future workgroup.<br><br>I would strongly advocate that the things-that-are-identifiers group be strongly tied to the existing, complete Unicode standard for such, and that the critical parts of the previous document about normalization be retained.<br><br><div class="gmail_quote"><div dir="ltr">On Sat, Sep 30, 2017 at 17:59 Chris Lattner via swift-evolution <<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>The core team recently met to discuss PR609 - Refining identifier and operator symbology:</div><div><a href="https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md" target="_blank">https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md</a></div><div><br></div><div>The proposal correctly observes that the partitioning of unicode codepoints into identifiers and operators is a mess in some cases. It really is an outright bug for 🙂 to be an identifier, but ☹️ to be an operator. That said, the proposal itself is complicated and is defined in terms of a bunch of unicode classes that may evolve in the “wrong way for Swift” in the future.</div><div><br></div><div>The core team would really like to get this sorted out for Swift 5, and sooner is better than later :-). Because it seems that this is a really hard problem and that <a href="https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good" target="_blank">perfection is becoming the enemy of good</a>, the core team requests the creation of a new proposal with a different approach. 
The general observation is that there are three kinds of characters: things that are obviously identifiers, things that are obviously math operators, and things that are non-obvious. Things that are non-obvious can be made into invalid code points, and legislated later in follow-up proposals if/when someone cares to argue for them.</div><div><br></div><div><br></div><div>To make progress on this, we suggest a few separable steps:</div><div><br></div><div><div>First, please split out the changes to the ASCII characters (e.g. . and \ operator parsing rules) into their own (small) proposal, since they are unrelated to the Unicode changes and that proposal can make progress independently.</div></div><div><br></div><div><br></div><div>Second, someone should take a look at the concrete set of Unicode identifiers that are accepted by Swift 4 and write a new proposal that splits them into the three groups: those that are clearly identifiers (which become identifiers), those that are clearly operators (which become operators), and those that are unclear or don’t matter (these become invalid code points).</div><div><br></div><div>I suggest that the criteria be based on<span class="m_5594026663645648669Apple-converted-space"> </span><b>utility for Swift code</b>, not on the underlying Unicode classification. For example, the discussion thread for PR609 mentions that the T character in “ xᵀ ” is defined in Unicode as a Latin “letter”. Despite that, its use in Swift would clearly be as a postfix operator, so we should classify it as an operator.</div><div><br></div><div>Other suggestions:</div><div> - Math symbols are operators excepting those primarily used as identifiers like “alpha”. If there are any characters that are used for both, this proposal should make them invalid.</div><div> - While there may be useful ranges for some identifiers (e.g. 
to handle European accented characters), the Emoji range should probably have each codepoint independently judged, and currently unassigned codepoints should not get a meaning defined for them.</div><div> - Unicode “faces”, “people”, “animals” etc. are all identifiers.</div><div> - In order to reduce the scope of the proposal, it is a safe default to exclude characters that are unlikely to be used by Swift code today, including Braille, weird currency symbols, or any set of characters that are so broken and useless in Swift 4 that it isn’t worth worrying about.</div><div> - The proposal is likely to turn a large number of code points into rejected characters. In the discussions, some people will be tempted to argue endlessly about individual rejections. To control that, we can require that people point out an example where the character is already in use, or where it has a clear application to a domain that is known today: the discussion needs to be grounded and practical, not theoretical.</div><div><br></div><div><br></div><div>Third, if there is interest sometime in the future, we can have subsequent proposals that expand the range of accepted code points, motivated by the specific application domain that cares about them. 
These proposals will not be source breaking, so they can happen at any time.</div><div><br></div><div><br></div><div>Is anyone interested in helping to push this effort forward?</div><div><br></div><div>-Chris</div><div><br></div></div>_______________________________________________<br>swift-evolution mailing list<br><a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a><br><a href="https://lists.swift.org/mailman/listinfo/swift-evolution" rel="noreferrer" target="_blank">https://lists.swift.org/mailman/listinfo/swift-evolution</a><br></blockquote></div></div></blockquote></div><br></div></div></blockquote></div></blockquote></div><br></div></div></blockquote></div><br></div></div>
</blockquote></div>