[swift-evolution] Prohibit invisible characters in identifier names

Xiaodi Wu xiaodi.wu at gmail.com
Thu Jun 23 14:43:06 CDT 2016


On Thu, Jun 23, 2016 at 2:29 PM, João Pinheiro <joao at joaopinheiro.org>
wrote:

> > I think we're using terminology differently here. What you call
> "character normalization" is what I'm calling canonicalization. NFC is
> described in UAX #15 as "canonical decomposition followed by canonical
> composition" and I'm just using the word "canonicalization" because it's
> shorter. If Swift represents each identifier in an NFC-transformed form
> (what I call canonicalized), then I understand the identifier to be
> canonicalized. What is the distinction you're drawing here?
>
> There is a small difference between normalisation and canonicalisation,
> but it's mostly splitting hairs. Both ensure that something has a
> consistent representation, but canonicalisation additionally implies
> choosing a single preferred representation among several valid ones. Web
> addresses are a good example: both http://www.apple.com and
> http://apple.com are valid normalised addresses, but only the former is the
> canonical address for the Apple website.
>
> > Just re-read UAX #31. I see two different issues here too--do these
> match up with what you're saying above?
> >
> > * Disallowing certain characters in identifiers. To do so, we can implement
> the recommendation to disallow all characters in UAX #31 Table 4, except ZWJ
> and ZWNJ in the specific scenarios outlined in section 2.3.
> >
> > * Internally, when comparing two identifiers A and B, compare NFC(A) and
> NFC(B) without modifying or otherwise restricting the actual user-facing
> code to contain only NFC-normalized strings. This would be the approach
> recommended in section 1.3.
>
> Yes, that's correct. The proposal would be to normalise the encoding via
> NFC and then canonicalise the identifiers by ignoring invisible characters
> except in the scenarios described in UAX #31 section 2.3.
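
A minimal sketch of the approach quoted above (NFC first, then ignore invisible characters outside the allowed joiner contexts when comparing identifiers), written in Python purely for illustration. Assumptions: `INVISIBLE` is a hypothetical, partial stand-in for UAX #31 Table 4, not its full contents; `joiner_allowed` approximates the section 2.3 contexts with one simplified rule (ZWJ/ZWNJ kept only right after a virama, canonical combining class 9) and omits the others; `comparison_key` and `same_identifier` are hypothetical names, not anything in the Swift compiler.

```python
import unicodedata

# Hypothetical, partial stand-in for UAX #31 Table 4 (not the full list).
INVISIBLE = {
    "\u00AD",  # SOFT HYPHEN
    "\u200B",  # ZERO WIDTH SPACE
    "\u200C",  # ZERO WIDTH NON-JOINER (ZWNJ)
    "\u200D",  # ZERO WIDTH JOINER (ZWJ)
    "\u2060",  # WORD JOINER
    "\uFEFF",  # ZERO WIDTH NO-BREAK SPACE
}
JOINERS = {"\u200C", "\u200D"}


def joiner_allowed(prev: str) -> bool:
    # Simplified stand-in for the section 2.3 contexts: a joiner is kept only
    # right after a virama (canonical combining class 9).
    return unicodedata.combining(prev) == 9


def comparison_key(identifier: str) -> str:
    # Step 1: normalize the encoding to NFC.
    # Step 2: drop invisible characters that are not in an allowed context.
    kept = []
    for ch in unicodedata.normalize("NFC", identifier):
        if ch in INVISIBLE and not (
            ch in JOINERS and kept and joiner_allowed(kept[-1])
        ):
            continue
        kept.append(ch)
    return "".join(kept)


def same_identifier(a: str, b: str) -> bool:
    # Compare the canonical keys, never the raw source spellings.
    return comparison_key(a) == comparison_key(b)


# "café" spelled with U+00E9 vs. "e" + U+0301 COMBINING ACUTE ACCENT:
print(same_identifier("caf\u00E9", "cafe\u0301"))  # True
print(same_identifier("foo\u200Bbar", "foobar"))   # True (ZWSP ignored)
```

Note that the user-facing source is left untouched; only the comparison key is normalized, which matches the section 1.3 idea of comparing NFC(A) with NFC(B).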


That's cool, although my preferred solution would be more closely aligned
with UAX #31: overtly disallow the characters in Table 4 (instead of ignoring
them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX
#31 section 2.3, and then internally represent the identifier as its
NFC-normalized string.
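
A rough Python sketch of this reject-then-normalize alternative, for illustration only. Assumptions: `INVISIBLE` is a hypothetical, partial stand-in for UAX #31 Table 4; the joiner rule is a simplification (ZWJ/ZWNJ accepted only directly after a virama, canonical combining class 9) rather than the full section 2.3 contexts; `validate_identifier` is a hypothetical name. Validation runs on the identifier as written, and only an accepted identifier is converted to NFC for internal use.

```python
import unicodedata

# Hypothetical, partial stand-in for UAX #31 Table 4 (not the full list).
INVISIBLE = {"\u00AD", "\u200B", "\u200C", "\u200D", "\u2060", "\uFEFF"}
JOINERS = {"\u200C", "\u200D"}  # ZWNJ, ZWJ


def validate_identifier(identifier: str) -> str:
    """Reject invisible characters outside a (simplified) joiner context,
    then return the NFC form to be used internally."""
    for i, ch in enumerate(identifier):
        if ch not in INVISIBLE:
            continue
        # Simplification: a joiner is accepted only right after a virama.
        after_virama = i > 0 and unicodedata.combining(identifier[i - 1]) == 9
        if not (ch in JOINERS and after_virama):
            raise ValueError(f"invisible character U+{ord(ch):04X} at index {i}")
    return unicodedata.normalize("NFC", identifier)


print(validate_identifier("cafe\u0301") == "caf\u00E9")  # True
```

With this approach `foo\u200Bbar` is a compile-time error rather than another spelling of `foobar`, so the source can never contain two visually identical identifiers that differ only in invisible characters.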