[swift-evolution] Unicode identifiers & operators

Chris Lattner clattner at apple.com
Sun Sep 18 23:29:53 CDT 2016


> On Sep 18, 2016, at 6:24 PM, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
> 
> On Sun, Sep 18, 2016 at 9:19 PM, Erica Sadun via swift-evolution <swift-evolution at swift.org> wrote:
> Let me tl;dr'er this even more: ☹️ is an operator, but 🙂 is an identifier.
> 
> -- E, succinct, who thinks there's room for improvement
> 
> Ha, yes. Let's see if I can be as succinct in my contribution to the discussion:
> 
> 1) Agree that current situation not ideal, for reasons above

+1, totally agreed.  We really need to improve this. Aiming for Swift 3.1 or Swift 4 seems like a really good idea, because the appetite for this sort of change will probably be very low after Swift 4.

> 2) The solution might best be not one but several proposals:
> 
>   2a) Unicode normalization: invisible characters, Greek tonos, etc. (cf. previous message about previously proposed solution, which reflects Unicode recommendations in UTR #31)--low hanging fruit: there's an established Unicode recommendation with clear wins for security and consistency
> 
>   2b) Legal and illegal characters for identifiers *or* operators: UTR #31 makes recommendations regarding rarely used scripts; probably best to follow the letter and spirit of these recommendations (which would probably mean ancient Greek musical symbols and Egyptian hieroglyphics shouldn't be identifier or operator characters)
> 
>   2c) Decisions as to which characters are identifier characters or operator characters: for instance, emoji should probably never be operator characters; if an emoji has a non-emoji counterpart that is an operator (❗️❓➕➖➗✖️, etc.) it might be best simply to make these illegal rather than operator characters
> 
>   2d) Confusables: I think the last time we had this discussion, it was apparent that it'd be difficult to decide which confusables to allow or disallow after some of the low-hanging fruit is taken care of by Unicode normalization (see item 2a); the Unicode Consortium-provided list seems too quick to call two things "confusable" for our purposes (with criteria that might be relevant for URLs or other use cases, but casting too wide a net perhaps for Swift identifiers)

These all seem like good points.  I agree that we should default to following an existing Unicode standard unless there is a really good reason to deviate.
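
To make the normalization point in 2a concrete, here is a minimal sketch of why it matters for identifiers. It is written in Python only because the standard `unicodedata` module exposes the Unicode normalization forms directly; it says nothing about how the Swift compiler does or should implement this. The helper name `normalize_identifier` and the choice of NFC are assumptions for illustration.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Return the NFC form of an identifier (hypothetical helper).

    NFC is assumed here for illustration; a real proposal would have to
    pick the normalization form deliberately.
    """
    return unicodedata.normalize("NFC", name)


# Three spellings of "alpha with an acute accent" that render alike:
alpha_tonos = "\u03ac"            # GREEK SMALL LETTER ALPHA WITH TONOS
alpha_oxia = "\u1f71"             # GREEK SMALL LETTER ALPHA WITH OXIA
alpha_combining = "\u03b1\u0301"  # alpha followed by COMBINING ACUTE ACCENT

# Compared code point by code point, they are three different names.
print(len({alpha_tonos, alpha_oxia, alpha_combining}))  # 3

# After normalization they collapse to a single spelling.
normalized = {normalize_identifier(s)
              for s in (alpha_tonos, alpha_oxia, alpha_combining)}
print(len(normalized))  # 1
```

Without a normalization step, two identifiers that look identical on screen can be treated as distinct names, which is the security and consistency concern raised above.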

I don't have an opinion about the specific direction of the proposal though.


-Chris
