[swift-evolution] Unicode identifiers & operators
alblue at apple.com
Fri Sep 23 14:09:00 CDT 2016
It would probably make sense to define the supported characters based on their Unicode general category, rather than on fixed ranges of code points. For example, the Letter and Number categories might be sufficient for defining identifiers.
In this case both of these characters are in the 'Symbol, Other' category:
Having the language define which categories are used for which kind of token (identifier or operator) means the characters don't have to be individually enumerated as part of the grammar.
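As a rough sketch of what a category-based rule could look like (written in Python only because its standard unicodedata module exposes the general category directly; the is_identifier_char name and the "Letter or Number" rule are illustrative assumptions, not a proposed grammar):

```python
import unicodedata


def is_identifier_char(ch: str) -> bool:
    """Hypothetical rule: accept any Letter (L*) or Number (N*) category."""
    return unicodedata.category(ch)[0] in ("L", "N")


# U+2639 and U+1F642 are the base code points of the two emoji from the
# quoted message (without any variation selector).
for ch in ("\u2639", "\U0001F642", "x", "\u03bb", "7", "+"):
    category = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} {category} identifier={is_identifier_char(ch)}")
```

Both emoji report the same category (So), so a rule written in terms of categories would treat them the same way, whichever way that ends up being.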
It is possible to read and process the Unicode data files to build up the character ranges programmatically; that's what ICU does so that it can efficiently answer questions like 'Is this a valid upper case letter?'. But defining the ranges as part of the grammar leads to evolutionary changes like this one, which can't be predicted in advance, because the ranges are defined on a set of fixed code points.
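To illustrate the point about deriving ranges rather than hard-coding them, here is a minimal sketch (again Python, using whatever Unicode data ships with the interpreter; ranges_for_categories is a made-up helper, and this is not how ICU itself is implemented):

```python
import sys
import unicodedata


def ranges_for_categories(prefixes, limit=sys.maxunicode + 1):
    """Collapse code points below `limit` whose general category starts
    with one of `prefixes` (a tuple of strings) into inclusive ranges."""
    ranges = []
    start = None
    for cp in range(limit):
        matches = unicodedata.category(chr(cp)).startswith(prefixes)
        if matches and start is None:
            start = cp                      # a new run begins
        elif not matches and start is not None:
            ranges.append((start, cp - 1))  # the current run ends
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges


# The result depends on the Unicode version of the underlying data,
# which is exactly why fixed ranges in a grammar go stale.
print("Unicode data version:", unicodedata.unidata_version)
for lo, hi in ranges_for_categories(("L", "N"), limit=0x80):
    print(f"U+{lo:04X}..U+{hi:04X}")
```

Regenerating the table against a newer Unicode release picks up newly assigned characters without anyone having to edit the grammar by hand.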
> On 18 Sep 2016, at 21:29, Chris Lattner via swift-evolution <swift-evolution at swift.org> wrote:
>> On Sep 18, 2016, at 6:24 PM, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
>> On Sun, Sep 18, 2016 at 9:19 PM, Erica Sadun via swift-evolution <swift-evolution at swift.org> wrote:
>> Let me tl;dr'er this even more: ☹️ is an operator, but 🙂 is an identifier.
>> -- E, succinct, who thinks there's room for improvement
>> Ha, yes. Let's see if I can be as succinct in my contribution to the discussion:
>> 1) Agree that current situation not ideal, for reasons above
> +1, totally agreed. We really need to improve this. Aiming for Swift 3.1 or Swift 4 seems like a really good idea, because the appetite for this sort of change will probably be very low after Swift 4.
>> 2) The solution might best be not one but several proposals:
>> 2a) Unicode normalization: invisible characters, Greek tonos, etc. (cf. previous message about previously proposed solution, which reflects Unicode recommendations in UTR #31)--low hanging fruit: there's an established Unicode recommendation with clear wins for security and consistency
>> 2b) Legal and illegal characters for identifiers *or* operators: UTR #31 makes recommendations regarding rarely used scripts; probably best to follow the letter and spirit of these recommendations (which would probably mean ancient Greek musical symbols and Egyptian hieroglyphics shouldn't be identifier or operator characters)
>> 2c) Decisions as to which characters are identifier characters or operator characters: for instance, emoji should probably never be operator characters; if an emoji has a non-emoji counterpart that is an operator (❗️❓➕➖➗✖️, etc.) it might be best simply to make these illegal rather than operator characters
>> 2d) Confusables: I think the last time we had this discussion, it was apparent that it'd be difficult to decide which confusables to allow or disallow after some of the low-hanging fruit is taken care of by Unicode normalization (see item 2a); the Unicode Consortium-provided list seems too quick to call two things "confusable" for our purposes (with criteria that might be relevant for URLs or other use cases, but casting too wide a net perhaps for Swift identifiers)
> These all seem like good points. I agree that we should default to following an existing Unicode standard unless there is a really good reason to deviate.
> I don’t have an opinion about the specific direction of the proposal though.