[swift-evolution] A path forward on rationalizing unicode identifiers and operators
Taylor Swift
kelvin13ma at gmail.com
Sat Sep 30 21:14:31 CDT 2017
What happens if two public operator declarations conflict?
On Sat, Sep 30, 2017 at 9:10 PM, Jonathan Hull via swift-evolution <
swift-evolution at swift.org> wrote:
> I have a technical question on this:
>
> Instead of parsing these into identifiers & operators, would it be
> possible to parse these into 3 categories: Identifiers, Operators, and
> Ambiguous?
>
> The ambiguous category would be disallowed for the moment, as you say.
> But since they are rarely used, maybe we can allow a declaration (similar
> to how we define operators) that effectively pulls it into one of the other
> categories (not in terms of tokenization, but in terms of how it can be
> used in Swift). Trying to pull it into both would be a compilation error.
>
> That way, Xiaodi can have a framework which lets her use superscript T as
> an identifier, and I can have one where I use superscript 2 to square
> things. The obvious/frequently used characters would not be ambiguous, so
> it would only slow down compilation when the rare/ambiguous characters are
> used.
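
A minimal sketch of the bookkeeping this would need, modeled as plain Swift rather than as compiler code. Everything here is hypothetical (the `Role`, `ConflictingClaim` and `AmbiguousRegistry` names, and the idea that claims are collected in one table); it only illustrates the rule that a character may be pulled into one category but not both.

```swift
// The two categories an ambiguous character could be pulled into.
enum Role { case identifier, operatorSymbol }

// Raised when a character already claimed for one role is claimed for the other.
struct ConflictingClaim: Error {
    let scalar: Unicode.Scalar
    let existing: Role
    let requested: Role
}

// Hypothetical table of claims on otherwise-forbidden characters.
struct AmbiguousRegistry {
    private var claims: [Unicode.Scalar: Role] = [:]

    mutating func claim(_ scalar: Unicode.Scalar, as role: Role) throws {
        if let existing = claims[scalar], existing != role {
            throw ConflictingClaim(scalar: scalar, existing: existing, requested: role)
        }
        claims[scalar] = role
    }

    func role(of scalar: Unicode.Scalar) -> Role? {
        return claims[scalar]
    }
}

var registry = AmbiguousRegistry()
try registry.claim("\u{1D40}", as: .identifier)      // superscript T
try registry.claim("\u{00B2}", as: .operatorSymbol)  // superscript 2

var conflictDetected = false
do {
    // A second claim that disagrees with the first is rejected.
    try registry.claim("\u{1D40}", as: .operatorSymbol)
} catch {
    conflictDetected = true
}
```

The sketch assumes all claims are visible in one place. It says nothing about how claims from separately compiled modules would be reconciled.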
>
> In my mind, this would be the ideal solution, and it could be done in
> stages (with the ambiguous characters just being forbidden for now), but I
> am not sure if it is technically possible.
>
> Thanks,
> Jon
>
> On Sep 30, 2017, at 3:59 PM, Chris Lattner via swift-evolution <
> swift-evolution at swift.org> wrote:
>
>
> The core team recently met to discuss PR609 - Refining identifier and
> operator symbology:
> https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md
>
> The proposal correctly observes that the partitioning of unicode
> codepoints into identifiers and operators is a mess in some cases. It
> really is an outright bug for 🙂 to be an identifier, but ☹️ to be an
> operator. That said, the proposal itself is complicated and is defined in
> terms of a bunch of unicode classes that may evolve in the “wrong way for
> Swift” in the future.
>
> The core team would really like to get this sorted out for Swift 5, and
> sooner is better than later :-). Because it seems that this is a really
> hard problem and that perfection is becoming the enemy of good
> <https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>, the core
> team requests the creation of a new proposal with a different approach.
> The general observation is that there are three kinds of characters: things
> that are obviously identifiers, things that are obviously math operators,
> and things that are non-obvious. Things that are non-obvious can be made
> into invalid code points, and legislated later in follow-up proposals
> if/when someone cares to argue for them.
>
>
> To make progress on this, we suggest a few separable steps:
>
> First, please split out the changes to the ASCII characters (e.g. . and \
> operator parsing rules) into their own (small) proposal, since they are
> unrelated to the unicode changes, and we can make progress on that proposal
> independently.
>
>
> Second, someone should take a look at the concrete set of unicode
> identifiers that are accepted by Swift 4 and write a new proposal that
> splits them into the three groups: those that are clearly identifiers
> (which become identifiers), those that are clearly operators (which become
> operators), and those that are unclear or don’t matter (these become
> invalid code points).
>
> I suggest that the criteria be based on *utility for Swift code*, not on
> the underlying unicode classification. For example, the discussion thread
> for PR609 mentions that the T character in “xᵀ” is defined in unicode
> as a latin “letter”. Despite that, its use in Swift would clearly be as a
> postfix operator, so we should classify it as an operator.
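
To make the distinction concrete, here is a small sketch. As far as I can tell from the Swift 4 lexical grammar, ᵀ (U+1D40) falls in an identifier range, so `xᵀ` lexes as a single identifier. The dagger (U+2020) is already an operator character, so it stands in below for the transpose operator one would presumably want to write; that choice of stand-in, and the nested-array matrix representation, are assumptions for illustration only.

```swift
// U+2020 is an operator character today, so this declaration is accepted.
postfix operator †

// Transpose of a rectangular matrix stored as an array of rows.
postfix func † (m: [[Int]]) -> [[Int]] {
    guard let first = m.first else { return [] }
    return first.indices.map { col in m.map { row in row[col] } }
}

let x = [[1, 2], [3, 4]]

// `xᵀ` is one identifier here, not `x` followed by a postfix operator.
let xᵀ = x†

print(xᵀ)  // prints [[1, 3], [2, 4]]
```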
>
> Other suggestions:
> - Math symbols are operators excepting those primarily used as
> identifiers like “alpha”. If there are any characters that are used for
> both, this proposal should make them invalid.
> - While there may be useful ranges for some identifiers (e.g. to handle
> european accented characters), the Emoji range should probably have each
> codepoint independently judged, and currently unassigned codepoints should
> not get a meaning defined for them.
> - Unicode “faces”, “people”, “animals” etc are all identifiers.
> - In order to reduce the scope of the proposal, it is a safe default to
> exclude characters that are unlikely to be used by Swift code today,
> including Braille, weird currency symbols, or any set of characters that
> are so broken and useless in Swift 4 that it isn’t worth worrying about.
> - The proposal is likely to turn a large number of code points into
> rejected characters. In the discussions, some people will be tempted to
> argue endlessly about individual rejections. To control that, we can
> require that people point out an example where the character is already in
> use, or where it has a clear application to a domain that is known today:
> the discussion needs to be grounded and practical, not theoretical.
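
The three-way split described above can be sketched as a lookup with "invalid" as the default. The `SymbolClass` and `classify` names are hypothetical, and the handful of entries below are placeholders taken from the examples in this thread (alpha, a face emoji, math symbols, Braille), not a proposed partition.

```swift
// The three groups: clearly identifiers, clearly operators, everything else.
enum SymbolClass { case identifier, operatorSymbol, invalid }

// Individually judged code points (placeholder entries): alpha and one face emoji.
let knownIdentifiers: Set<Unicode.Scalar> = ["\u{03B1}", "\u{1F642}"]

func classify(_ scalar: Unicode.Scalar) -> SymbolClass {
    if knownIdentifiers.contains(scalar) { return .identifier }
    switch scalar.value {
    case 0x2200...0x22FF:
        return .operatorSymbol   // Mathematical Operators block
    case 0x2800...0x28FF:
        return .invalid          // Braille Patterns block, excluded outright
    default:
        return .invalid          // non-obvious or unlisted: rejected for now
    }
}
```

Defaulting to `.invalid` matches the point that later proposals can widen the accepted set without breaking source, since that only turns rejected characters into accepted ones.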
>
>
> Third, if there is interest sometime in the future, we can have subsequent
> proposals that expand the range of accepted code points, motivated by the
> specific application domain that cares about them. These proposals will
> not be source breaking, so they can happen at any time.
>
>
> Is anyone interested in helping to push this effort forward?
>
> -Chris
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution