[swift-evolution] [Proposal] Normalize Unicode Identifiers

Xiaodi Wu xiaodi.wu at gmail.com
Tue Jul 26 14:26:50 CDT 2016


+1. Even if it's too late for Swift 3, though, I'd argue that it's highly
unlikely to be code-breaking in practice. Any existing code that would get
tripped up by this normalization is arguably broken already.


On Tue, Jul 26, 2016 at 2:22 PM, João Pinheiro <swift-evolution at swift.org>
wrote:

> This proposal [gist
> <https://gist.github.com/JoaoPinheiro/5f226f46c67d235a7039c775a4300800>]
> is the result of the discussions from the thread "Prohibit invisible
> characters in identifier names
> <http://thread.gmane.org/gmane.comp.lang.swift.evolution/21022>". I hope
> it's still in time for inclusion in Swift 3.
>
> Sincerely,
> João Pinheiro
>
>
> Normalize Unicode Identifiers
>
>    - Proposal: SE-NNNN
>    <https://gist.github.com/JoaoPinheiro/NNNN-normalize-identifiers.md>
>    - Author: João Pinheiro <https://github.com/joaopinheiro>
>    - Status: Awaiting review
>    - Review manager: TBD
>
>
> Introduction
>
> This proposal aims to introduce identifier normalization in order to
> prevent the unsafe and potentially abusive use of invisible or equivalent
> representations of Unicode characters in identifiers.
>
> Swift-evolution thread: Discussion thread
> <http://thread.gmane.org/gmane.comp.lang.swift.evolution/21022>
>
> Motivation
>
> Even though Swift supports the use of Unicode for identifiers, these
> aren't yet normalized. This allows for different Unicode representations of
> the same characters to be considered distinct identifiers.
>
> For example:
>
> let Å = "Angstrom"
> let Å = "Latin Capital Letter A With Ring Above"
> let Å = "Latin Capital Letter A + Combining Ring Above"
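
To make the three spellings above easier to tell apart, here is a minimal sketch that writes them with explicit escapes and prints their Unicode scalars. The helper name `scalarValues` is made up for illustration and is not part of the proposal.

```swift
// Three spellings of what renders as the same letter, written with explicit
// escapes so the difference survives copy and paste.
let angstromSign = "\u{212B}"   // ANGSTROM SIGN
let precomposed  = "\u{00C5}"   // LATIN CAPITAL LETTER A WITH RING ABOVE
let decomposed   = "A\u{030A}"  // LATIN CAPITAL LETTER A + COMBINING RING ABOVE

// Hypothetical helper: hex code points of each Unicode scalar in a string.
func scalarValues(_ s: String) -> [String] {
    return s.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
}

print(scalarValues(angstromSign))  // ["212B"]
print(scalarValues(precomposed))   // ["C5"]
print(scalarValues(decomposed))    // ["41", "30A"]

// String's == compares by canonical equivalence, so at run time the three
// values are already treated as equal even though their scalars differ.
print(angstromSign == precomposed, precomposed == decomposed)  // true true
```

So the standard library already treats these strings as the same value; the gap the proposal describes is on the identifier side.
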
>
> In addition to that, *default-ignorable* characters like the *Zero Width
> Space* and *Zero Width Non-Joiner* (exemplified below) are also currently
> accepted as valid parts of identifiers without any restrictions.
>
> let ab = "ab"
> let a​b = "a + Zero Width Space + b"
>
> func xy() { print("xy") }
> func x‌y() { print("x + <Zero Width Non-Joiner> + y") }
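
Since the characters in question do not show up on screen, a small check can surface them. This is only a sketch: `invisibleScalars(in:)` is a hypothetical helper, and it relies on the standard library's `Unicode.Scalar.Properties` (Swift 5 or later), not on anything the compiler does today.

```swift
// Hypothetical helper: report default-ignorable scalars hiding in a name.
func invisibleScalars(in name: String) -> [String] {
    return name.unicodeScalars
        .filter { $0.properties.isDefaultIgnorableCodePoint }
        .map { "U+" + String($0.value, radix: 16, uppercase: true) }
}

print(invisibleScalars(in: "ab"))          // []
print(invisibleScalars(in: "a\u{200B}b"))  // ["U+200B"]  (Zero Width Space)
print(invisibleScalars(in: "x\u{200C}y"))  // ["U+200C"]  (Zero Width Non-Joiner)
```
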
>
> The use of default-ignorable characters in identifiers is problematical,
> first because the effects they represent are stylistic or otherwise out of
> scope for identifiers, and second because the characters themselves often
> have no visible display. It is also possible to misapply these characters
> such that users can create strings that look the same but actually contain
> different characters, which can create security problems.
>
> Proposed solution
>
> Normalize Swift identifiers according to the normalization form NFC
> recommended for case-sensitive languages in the Unicode Standard Annexes
> 15 <https://gist.github.com/JoaoPinheiro/UAX15> and 31
> <https://gist.github.com/JoaoPinheiro/UAX31> and follow the Normalization
> Charts <https://gist.github.com/JoaoPinheiro/NormalizationCharts>.
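
As a rough illustration of what NFC does to the spellings from the Motivation section, the sketch below uses Foundation's `precomposedStringWithCanonicalMapping`, which returns the NFC form of a string. The helper names `nfc` and `hex` are made up here, and a compiler would presumably normalize in the lexer rather than call into Foundation.

```swift
import Foundation

// Hypothetical helper: NFC form of an identifier, via Foundation.
func nfc(_ identifier: String) -> String {
    return identifier.precomposedStringWithCanonicalMapping
}

// Hypothetical helper: hex code points of each Unicode scalar in a string.
func hex(_ s: String) -> [String] {
    return s.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
}

// Angstrom sign, precomposed letter, and letter + combining ring above.
let spellings = ["\u{212B}", "\u{00C5}", "A\u{030A}"]
for spelling in spellings {
    print(hex(spelling), "->", hex(nfc(spelling)))
}
// ["212B"] -> ["C5"]
// ["C5"] -> ["C5"]
// ["41", "30A"] -> ["C5"]
```

All three collapse to the single scalar U+00C5, so after normalization they would name the same identifier.
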
>
> In addition to that, prohibit the use of *default-ignorable* characters
> in identifiers except in the special cases described in UAX31
> <https://gist.github.com/JoaoPinheiro/UAX31>, listed below:
>
>    - Allow Zero Width Non-Joiner (U+200C) when breaking a cursive
>    connection
>    - Allow Zero Width Non-Joiner (U+200C) in a conjunct context
>    - Allow Zero Width Joiner (U+200D) in a conjunct context
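
The rule above could be sketched as a three-way classification of each scalar. This is a simplification under stated assumptions: the type and function names are hypothetical, and the actual UAX31 context tests (cursive connection, conjunct) are not implemented; the two joiners are only marked as needing that check.

```swift
// Hypothetical classification of a single scalar inside an identifier.
enum ScalarVerdict {
    case allowed            // ordinary scalar, no objection from this rule
    case needsContextCheck  // ZWNJ/ZWJ: acceptable only in the UAX31 contexts
    case rejected           // any other default-ignorable scalar
}

func verdict(for scalar: Unicode.Scalar) -> ScalarVerdict {
    switch scalar.value {
    case 0x200C, 0x200D:
        // The context rules themselves are out of scope for this sketch.
        return .needsContextCheck
    default:
        return scalar.properties.isDefaultIgnorableCodePoint ? .rejected : .allowed
    }
}

print(verdict(for: "a"))         // allowed
print(verdict(for: "\u{200B}"))  // rejected
print(verdict(for: "\u{200C}"))  // needsContextCheck
print(verdict(for: "\u{200D}"))  // needsContextCheck
```
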
>
>
> Impact on existing code
>
> This has the potential to be a code-breaking change in cases where people
> may have used distinct but identical-looking identifiers with different
> Unicode representations. The likelihood of that happening in actual code is
> very small and the problem can be solved by renaming identifiers that don't
> conform to the new normalized form into new non-colliding identifiers.
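
To find the identifiers that would need renaming, one could group existing names by their NFC form and look for groups with more than one member. The sketch below assumes the list of declared names has already been extracted somehow, with each distinct spelling listed once; `collisions(among:)` is a hypothetical helper, not an existing tool.

```swift
import Foundation

// Hypothetical helper: groups of names that collapse to the same NFC form.
func collisions(among names: [String]) -> [[String]] {
    let groups = Dictionary(grouping: names) {
        $0.precomposedStringWithCanonicalMapping
    }
    return groups.values.filter { $0.count > 1 }
}

// Two spellings of the same letter plus one unrelated name.
let declared = ["\u{00C5}", "A\u{030A}", "ab"]
let clashes = collisions(among: declared)
print(clashes.count)  // 1
```
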
>
> Alternatives considered
>
> The option of ignoring *default-ignorable* characters in identifiers was
> also discussed, but it was considered to be more confusing and less secure
> than explicitly treating them as errors.
>
> Unaddressed Issues
>
> There was some discussion around the issue of Unicode confusable
> characters, but it was considered to be out of scope for this proposal.
> Unicode confusable characters are a complicated issue and any possible
> solutions also come with significant drawbacks that would require more time
> and consideration.
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
>
>