[swift-evolution] A path forward on rationalizing unicode identifiers and operators

Vladimir.S svabox at gmail.com
Mon Oct 2 06:59:41 CDT 2017

On 02.10.2017 8:30, Kenny Leung via swift-evolution wrote:
> I guess theoretically you could have two variables that look alike, but are actually 
> different values, allowing you to insert some obfuscated malicious code somehow.

Also, IIRC, a similar problem exists with the Right-To-Left "modifier": when it is 
inserted inside a variable name, the name you *see* (in a browser or editor) is not 
the same name that is used *by the compiler*. I can't find the link right now, but 
if it would be helpful, I will try to find it.
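
To make both concerns concrete, here is a minimal sketch. It is written in Python 
only for illustration, and it assumes the "modifier" above means the Unicode 
bidirectional control characters (such as U+202E RIGHT-TO-LEFT OVERRIDE). The 
function name suspicious_scalars and the sample identifiers are hypothetical, not 
taken from any proposal or from the Swift compiler.

```python
import unicodedata

# Bidirectional control characters: marks, embeddings, overrides, isolates.
BIDI_CONTROLS = {
    "\u061C", "\u200E", "\u200F",
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def suspicious_scalars(identifier):
    """Return (index, code point, name) for each invisible or bidi-control character."""
    found = []
    for index, ch in enumerate(identifier):
        # General category "Cf" (format) covers invisible characters such as
        # ZERO WIDTH SPACE; the explicit set documents the bidi controls.
        if ch in BIDI_CONTROLS or unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "<unnamed>")
            found.append((index, "U+%04X" % ord(ch), name))
    return found


# Look-alike letters: Latin "a" and Cyrillic "a" render almost identically,
# but they are different code points, and normalization does not unify them.
latin_a = "a"
cyrillic_a = "\u0430"
assert latin_a != cyrillic_a
assert unicodedata.normalize("NFKC", cyrillic_a) != latin_a

# An identifier with a right-to-left override hidden in the middle.
spoofed = "user\u202Eeman"
print(suspicious_scalars(spoofed))
```

A check along these lines only flags invisible and direction-changing characters. 
Look-alike letters need a separate policy, since normalization alone does not 
merge them.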


> -Kenny
>> On Oct 1, 2017, at 10:01 PM, Chris Lattner <clattner at nondot.org> wrote:
>>> On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution 
>>> <swift-evolution at swift.org> wrote:
>>> Hi All.
>>> I’d like to help as well. I have fun with operators.
>>> There is also the issue of code security with invisible unicode characters and 
>>> characters that look exactly alike.
>> Unless there is a compelling reason to add them, I think we should ban invisible 
>> characters.  What is the harm of characters that look alike?
>> -Chris
>>> (They should make a Coding font that ensures all characters look different.) Was 
>>> that ever resolved? Googling, I found this:
>>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160620/021446.html
>>> Which seems to have been left at this:
>>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025555.html
>>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160919/thread.html#27229
>>> Should we throw all of this into the same pot, and make any characters that aren’t 
>>> on the approved list illegal?
>>> -Kenny
>>>> On Sep 30, 2017, at 4:13 PM, Xiaodi Wu via swift-evolution 
>>>> <swift-evolution at swift.org> wrote:
>>>> I’m happy to participate in the reshaping of the proposal. It would be nice to 
>>>> gather a group of people again to help drive it forward.
>>>> That said, it’s unclear to me that superscript T is clearly an operator, any more 
>>>> than would be superscript H (Hermitian), superscript 2, superscript 3, etc. But 
>>>> at any rate, this would be discussion for the future workgroup.
>>>> I would strongly advocate that the things-that-are-identifiers group be closely 
>>>> tied to the existing, complete Unicode standard for such, and that the critical 
>>>> parts of the previous document about normalization be retained.
>>>> On Sat, Sep 30, 2017 at 17:59 Chris Lattner via swift-evolution 
>>>> <swift-evolution at swift.org> wrote:
>>>>     The core team recently met to discuss PR609 - Refining identifier and
>>>>     operator symbology:
>>>>     https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md
>>>>     The proposal correctly observes that the partitioning of unicode codepoints
>>>>     into identifiers and operators is a mess in some cases.  It really is an
>>>>     outright bug for 🙂 to be an identifier, but ☹️ to be an operator.  That
>>>>     said, the proposal itself is complicated and is defined in terms of a bunch
>>>>     of unicode classes that may evolve in the “wrong way for Swift” in the future.
>>>>     The core team would really like to get this sorted out for Swift 5, and
>>>>     sooner is better than later :-).  Because it seems that this is a really hard
>>>>     problem and that perfection is becoming the enemy of good
>>>>     <https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>, the core team
>>>>     requests the creation of a new proposal with a different approach.  The
>>>>     general observation is that there are three kinds of characters: things that
>>>>     are obviously identifiers, things that are obviously math operators, and
>>>>     things that are non-obvious.  Things that are non-obvious can be made into
>>>>     invalid code points, and legislated later in follow-up proposals if/when
>>>>     someone cares to argue for them.
>>>>     To make progress on this, we suggest a few separable steps:
>>>>     First, please split out the changes to the ASCII characters (e.g. the . and \
>>>>     operator parsing rules) into their own (small) proposal, since they are
>>>>     unrelated to the unicode changes and can make progress independently.
>>>>     Second, someone should take a look at the concrete set of unicode identifiers
>>>>     that are accepted by Swift 4 and write a new proposal that splits them into
>>>>     the three groups: those that are clearly identifiers (which become
>>>>     identifiers), those that are clearly operators (which become operators), and
>>>>     those that are unclear or don’t matter (these become invalid code points).
>>>>     I suggest that the criteria be based on *utility for Swift code*, not on the
>>>>     underlying unicode classification.  For example, the discussion thread for
>>>>     PR609 mentions that the T character in “xᵀ” is defined in unicode as a
>>>>     latin “letter”.  Despite that, its use in Swift would clearly be as a postfix
>>>>     operator, so we should classify it as an operator.
>>>>     Other suggestions:
>>>>      - Math symbols are operators excepting those primarily used as identifiers
>>>>     like “alpha”.  If there are any characters that are used for both, this
>>>>     proposal should make them invalid.
>>>>      - While there may be useful ranges for some identifiers (e.g. to handle
>>>>     European accented characters), the Emoji range should probably have each
>>>>     codepoint independently judged, and currently unassigned codepoints should
>>>>     not get a meaning defined for them.
>>>>      - Unicode “faces”, “people”, “animals” etc are all identifiers.
>>>>      - In order to reduce the scope of the proposal, it is a safe default to
>>>>     exclude characters that are unlikely to be used by Swift code today,
>>>>     including Braille, weird currency symbols, or any set of characters that are
>>>>     so broken and useless in Swift 4 that it isn’t worth worrying about.
>>>>      - The proposal is likely to turn a large number of code points into rejected
>>>>     characters.  In the discussions, some people will be tempted to argue
>>>>     endlessly about individual rejections.  To control that, we can require that
>>>>     people point out an example where the character is already in use, or where
>>>>     it has a clear application to a domain that is known today: the discussion
>>>>     needs to be grounded and practical, not theoretical.
>>>>     Third, if there is interest sometime in the future, we can have subsequent
>>>>     proposals that expand the range of accepted code points, motivated by the
>>>>     specific application domain that cares about them.  These proposals will not
>>>>     be source breaking, so they can happen at any time.
>>>>     Is anyone interested in helping to push this effort forward?
>>>>     -Chris
>>>>     _______________________________________________
>>>>     swift-evolution mailing list
>>>>     swift-evolution at swift.org
>>>>     https://lists.swift.org/mailman/listinfo/swift-evolution
