<div style="white-space:pre-wrap">I think this issue is bigger than that. As UAX #31 suggests, the most appropriate approach is canonicalizing identifiers by NFC, with specific treatment of ZWJ and ZWNJ by allowing them in three contexts, which will require thought as to how to implement.<br><br>Given that there is a specifically recommended algorithm on how to handle this issue, I'm also not sure anymore that this requires a proposal; "process Unicode correctly" is really more of a bug fix because, given the strict limits of what's canonicalized, there shouldn't be a user-facing effect if we are merely proposing to prohibit glyphs from appearing in certain contexts where they are never in fact encountered in real language.<br></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Jun 23, 2016 at 11:19 AM Sean Heber <<a href="mailto:sean@fifthace.com">sean@fifthace.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I’m no unicode expert, but this sounds like the way to go to me.<br>
<br>
l8r<br>
Sean<br>
<br>
<br>
> On Jun 23, 2016, at 11:17 AM, João Pinheiro via swift-evolution <<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>> wrote:<br>
><br>
><br>
>> On 21 Jun 2016, at 20:15, Xiaodi Wu via swift-evolution <<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>> wrote:<br>
>><br>
>> On Tue, Jun 21, 2016 at 1:16 PM, Joe Groff <<a href="mailto:jgroff@apple.com" target="_blank">jgroff@apple.com</a>> wrote:<br>
>> Any discussion about this ought to start from UAX #31, the Unicode consortium's recommendations on identifiers in programming languages:<br>
>><br>
>> <a href="http://unicode.org/reports/tr31/" rel="noreferrer" target="_blank">http://unicode.org/reports/tr31/</a><br>
>><br>
>> Section 2.3 specifically calls out the situations in which ZWJ and ZWNJ need to be allowed. The document also describes a stability policy for handling new Unicode versions, other confusability issues, and many of the other problems with adopting Unicode in a programming language's syntax.<br>
>><br>
>> That's a fantastic document--a very edifying read. Given Swift's robust support for Unicode in its core libraries, it's kind of surprising to me that identifiers aren't canonicalized at compile time. From a quick first read, faithful adoption of UAX #31 recommendations would address most if not all of the confusability and zero-width security issues raised in this conversation.<br>
><br>
> From what I've read of UAX #31 it does seem to address all of the invisible character issues raised in the discussion. Given their Unicode status of Default_Ignorable_Code_Points, I believe the best course of action would be to canonicalise identifiers by allowing invisible characters only where appropriate and ignoring them everywhere else.<br>
><br>
> The alternative to ignoring them would be to not canonicalise identifiers and treat invisible characters as an error instead.<br>
><br>
> This doesn't address the issue of Unicode confusable characters, but solving that has additional problems of its own and would probably be better addressed in a different proposal entirely.<br>
><br>
> I'd like to start writing the proposal if there is agreement that this would be the best course of action.<br>
><br>
> Sincerely,<br>
> João Pinheiro<br>
> _______________________________________________<br>
> swift-evolution mailing list<br>
> <a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a><br>
> <a href="https://lists.swift.org/mailman/listinfo/swift-evolution" rel="noreferrer" target="_blank">https://lists.swift.org/mailman/listinfo/swift-evolution</a><br>
<br>
</blockquote></div>
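<br><div style="white-space:pre-wrap">To make the two points in this thread concrete, here is a minimal sketch in Python rather than in the Swift compiler itself. It shows that NFC makes a precomposed and a decomposed spelling of the same identifier compare equal, and that NFC alone leaves ZWJ and ZWNJ in place, so they still need the separate, context-dependent handling that UAX #31 describes. The helper names canonical_identifier and joiner_positions are hypothetical, and the three permitted contexts are deliberately not implemented here.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def canonical_identifier(name):
    """Return the NFC form of an identifier (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)


def joiner_positions(name):
    """Return the indices of ZWJ/ZWNJ in an identifier (hypothetical helper).

    A real implementation would check each position against the contexts
    UAX #31 permits; this sketch only locates the characters.
    """
    return [i for i, ch in enumerate(name) if ch in (ZWJ, ZWNJ)]


# The same visible identifier, spelled two ways.
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Without canonicalization these are different strings.
assert precomposed != decomposed
# After NFC they are the same identifier.
assert canonical_identifier(precomposed) == canonical_identifier(decomposed)

# NFC does not strip zero-width joiners, so an invisible character still
# produces a distinct identifier unless it is handled separately.
hidden = "foo" + ZWJ + "bar"
assert canonical_identifier(hidden) != "foobar"
assert joiner_positions(hidden) == [3]
```

Whether a located joiner is then ignored or rejected as an error is exactly the choice discussed above; the sketch takes no position on it.</div>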