[swift-evolution] Prohibit invisible characters in identifier names

Thu Jun 23 14:29:36 CDT 2016

> I think we're using terminology differently here. What you call "character normalization" is what I'm calling canonicalization. NFC is described in UAX #15 as "canonical decomposition followed by canonical composition" and I'm just using the word "canonicalization" because it's shorter. If Swift represents each identifier in an NFC-transformed form (what I call canonicalized), then I understand the identifier to be canonicalized. What is the distinction you're drawing here?

There is a small difference between normalisation and canonicalisation, but it's mostly splitting hairs. Both make sure something is represented consistently, but canonicalisation additionally implies establishing a single base representation for it. Web addresses are a good example: both http://www.apple.com and http://apple.com are valid normalised addresses, but only the former is the canonical address for the Apple website.

> Just re-read UAX #31. I see two different issues here too--do these match up with what you're saying above?
> 
> * Disallowing certain glyphs in identifiers. To do so, we can implement the recommendation to disallow all glyphs in UAX #31 Table 4, except ZWJ and ZWNJ in the specific scenarios outlined in section 2.3.
> 
> * Internally, when comparing two identifiers A and B, compare NFC(A) and NFC(B) without modifying or otherwise restricting the actual user-facing code to contain only NFC-normalized strings. This would be the approach recommended in section 1.3.

Yes, that's correct. The proposal would be to normalise identifiers via NFC and then canonicalise them by ignoring invisible characters, except in the scenarios described in UAX #31.
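
To make the two steps concrete, here is a rough sketch of how such a comparison could look. It is only an illustration, not how the compiler does or would do it: the names `ignorableScalars`, `canonicalIdentifier` and `sameIdentifier` are made up for this example, the list of invisible characters is a small hand-picked sample rather than the full set from UAX #31, and the ZWJ/ZWNJ exceptions from section 2.3 are deliberately left out. It uses Foundation's `precomposedStringWithCanonicalMapping` for NFC and compares Unicode scalars directly, since Swift's own `String ==` already treats canonically equivalent strings as equal.

```swift
import Foundation

// Hypothetical, incomplete sample of invisible code points to ignore.
// A real implementation would take the list from UAX #31 instead.
let ignorableScalars: Set<Unicode.Scalar> = [
    "\u{200B}", // ZERO WIDTH SPACE
    "\u{200C}", // ZERO WIDTH NON-JOINER
    "\u{200D}", // ZERO WIDTH JOINER
    "\u{2060}", // WORD JOINER
    "\u{FEFF}", // ZERO WIDTH NO-BREAK SPACE
]

// Step 1: normalise to NFC. Step 2: drop the ignorable scalars.
// Simplification: ZWJ and ZWNJ are always dropped here, whereas the
// proposal would keep them in the contexts allowed by UAX #31.
func canonicalIdentifier(_ identifier: String) -> [Unicode.Scalar] {
    let nfc = identifier.precomposedStringWithCanonicalMapping
    var scalars: [Unicode.Scalar] = []
    for scalar in nfc.unicodeScalars where !ignorableScalars.contains(scalar) {
        scalars.append(scalar)
    }
    return scalars
}

// Two spellings name the same identifier if their canonical forms match.
// The source text itself is left untouched; only the comparison changes.
func sameIdentifier(_ a: String, _ b: String) -> Bool {
    return canonicalIdentifier(a) == canonicalIdentifier(b)
}
```

With this, `sameIdentifier("caf\u{E9}", "cafe\u{301}")` is true because both sides normalise to the same NFC sequence, and `sameIdentifier("foo", "f\u{200B}oo")` is true because the zero width space is ignored.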

Sincerely,
João Pinheiro