[swift-evolution] Prohibit invisible characters in identifier names

Xiaodi Wu xiaodi.wu at gmail.com
Thu Jun 23 11:40:07 CDT 2016


I think this issue is bigger than that. As UAX #31 suggests, the most
appropriate approach is canonicalizing identifiers by NFC, with specific
treatment of ZWJ and ZWNJ by allowing them in three contexts, which will
require thought as to how to implement.
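
To make the NFC part concrete, here is a minimal sketch, not the compiler's
actual behavior. It assumes Foundation's precomposedStringWithCanonicalMapping
as the NFC step, and canonicalIdentifier is a hypothetical helper name. The
scalars are compared directly because Swift's String == already honors
canonical equivalence and would hide the difference.

```swift
import Foundation

// Hypothetical helper (not a compiler API): returns the NFC form of a name.
func canonicalIdentifier(_ name: String) -> String {
    return name.precomposedStringWithCanonicalMapping
}

// Two spellings of the same visible identifier:
// precomposed U+00E9 versus "e" followed by combining acute U+0301.
let precomposed = "caf\u{00E9}"
let decomposed = "cafe\u{0301}"

// Scalar-by-scalar comparison before and after canonicalization.
let rawMatch = Array(precomposed.unicodeScalars) == Array(decomposed.unicodeScalars)
let nfcMatch = Array(canonicalIdentifier(precomposed).unicodeScalars)
    == Array(canonicalIdentifier(decomposed).unicodeScalars)

print(rawMatch, nfcMatch)
```

The two spellings differ as raw scalar sequences but agree once both are put
through NFC, which is the property an identifier table would rely on.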

Given that there is a specifically recommended algorithm for handling
this issue, I'm also no longer sure that this requires a proposal.
"Process Unicode correctly" is really more of a bug fix: given the strict
limits on what is canonicalized, there shouldn't be a user-facing effect
if we are merely proposing to prohibit characters from appearing in
contexts where they are never in fact encountered in real language.
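
As a rough sketch of the prohibition side, the snippet below only locates ZWNJ
(U+200C) and ZWJ (U+200D) in a candidate identifier. It does not implement the
three UAX #31 contexts, which need script and joining-type data, and
joinControlOffsets is a hypothetical helper name, not an existing API.

```swift
// Hypothetical helper: offsets, counted in Unicode scalars, of every
// ZWNJ (U+200C) or ZWJ (U+200D) found in the identifier.
func joinControlOffsets(in identifier: String) -> [Int] {
    var offsets: [Int] = []
    for (offset, scalar) in identifier.unicodeScalars.enumerated() {
        if scalar.value == 0x200C || scalar.value == 0x200D {
            offsets.append(offset)
        }
    }
    return offsets
}

let plain = "counter"
let sneaky = "coun\u{200D}ter"  // looks like "counter" but hides a ZWJ

let plainHits = joinControlOffsets(in: plain)
let sneakyHits = joinControlOffsets(in: sneaky)
print(plainHits, sneakyHits)
```

A real check would then decide, per offset, whether the surrounding
characters form one of the permitted contexts and reject the identifier
otherwise.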

On Thu, Jun 23, 2016 at 11:19 AM Sean Heber <sean at fifthace.com> wrote:

> I’m no unicode expert, but this sounds like the way to go to me.
>
> l8r
> Sean
>
>
> > On Jun 23, 2016, at 11:17 AM, João Pinheiro via swift-evolution <
> swift-evolution at swift.org> wrote:
> >
> >
> >> On 21 Jun 2016, at 20:15, Xiaodi Wu via swift-evolution <
> swift-evolution at swift.org> wrote:
> >>
> >> On Tue, Jun 21, 2016 at 1:16 PM, Joe Groff <jgroff at apple.com> wrote:
> >> Any discussion about this ought to start from UAX #31, the Unicode
> consortium's recommendations on identifiers in programming languages:
> >>
> >> http://unicode.org/reports/tr31/
> >>
> >> Section 2.3 specifically calls out the situations in which ZWJ and ZWNJ
> need to be allowed. The document also describes a stability policy for
> handling new Unicode versions, other confusability issues, and many of the
> other problems with adopting Unicode in a programming language's syntax.
> >>
> >> That's a fantastic document--a very edifying read. Given Swift's robust
> support for Unicode in its core libraries, it's kind of surprising to me
> that identifiers aren't canonicalized at compile time. From a quick first
> read, faithful adoption of UAX #31 recommendations would address most if
> not all of the confusability and zero-width security issues raised in this
> conversation.
> >
> > From what I've read of UAX #31 it does seem to address all of the
> invisible character issues raised in the discussion. Given their Unicode
> status as Default_Ignorable_Code_Points, I believe the best course of
> action would be to canonicalise identifiers by allowing invisible
> characters only where appropriate and ignoring them everywhere else.
> >
> > The alternative to ignoring them would be to not canonicalise
> identifiers and treat invisible characters as an error instead.
> >
> > This doesn't address the issue of unicode confusable characters, but
> solving that has additional problems of its own and would probably be
> better addressed in a different proposal entirely.
> >
> > I'd like to start writing the proposal if there is agreement that this
> would be the best course of action.
> >
> > Sincerely,
> > João Pinheiro