[swift-evolution] Prohibit invisible characters in identifier names

João Pinheiro joao at joaopinheiro.org
Tue Jul 26 14:27:13 CDT 2016


I've submitted a draft of the proposal on the thread "Normalize Unicode Identifiers <http://thread.gmane.org/gmane.comp.lang.swift.evolution/25126>". Please make any comments and recommendations there.

Sincerely,
João Pinheiro


> On 23 Jun 2016, at 18:30, Chris Lattner <clattner at apple.com> wrote:
> 
> 
>> On Jun 23, 2016, at 9:17 AM, João Pinheiro via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>> 
>>> On 21 Jun 2016, at 20:15, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>> On Tue, Jun 21, 2016 at 1:16 PM, Joe Groff <jgroff at apple.com> wrote:
>>> Any discussion about this ought to start from UAX #31, the Unicode consortium's recommendations on identifiers in programming languages:
>>> 
>>> http://unicode.org/reports/tr31/
>>> 
>>> Section 2.3 specifically calls out the situations in which ZWJ and ZWNJ need to be allowed. The document also describes a stability policy for handling new Unicode versions, other confusability issues, and many of the other problems with adopting Unicode in a programming language's syntax.
>>> 
>>> That's a fantastic document--a very edifying read. Given Swift's robust support for Unicode in its core libraries, it's kind of surprising to me that identifiers aren't canonicalized at compile time. From a quick first read, faithful adoption of UAX #31 recommendations would address most if not all of the confusability and zero-width security issues raised in this conversation.
>> 
>> From what I've read of UAX #31 <http://unicode.org/reports/tr31/>, it does seem to address all of the invisible character issues raised in the discussion. Given their Unicode status of Default_Ignorable_Code_Points, I believe the best course of action would be to canonicalise identifiers by allowing invisible characters only where appropriate and ignoring them everywhere else.
>> 
>> The alternative to ignoring them would be to not canonicalise identifiers and treat invisible characters as an error instead.
>> 
>> This doesn't address the issue of Unicode confusable characters, but solving that has additional problems of its own and would probably be better addressed in a different proposal entirely.
>> 
>> I'd like to start writing the proposal if there is agreement that this would be the best course of action.
> 
> Sounds great, please do.  Thanks!
> 
> -Chris
> 
