[swift-evolution] A path forward on rationalizing unicode identifiers and operators
Vladimir.S
svabox at gmail.com
Mon Oct 2 06:59:41 CDT 2017
On 02.10.2017 8:30, Kenny Leung via swift-evolution wrote:
> I guess theoretically you could have two variables that look alike, but are actually
> different values, allowing you to insert some obfuscated malicious code somehow.
>
Also, IIRC, a similar problem exists with the Right-To-Left "modifier": when it is
inserted inside a variable name, the name you *see* (in a browser or editor) is not
the same name that is used *by the compiler*. I can't find the link right now, but
if it would be helpful, I will try to find it.
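
To make both concerns concrete, here is a minimal sketch. It assumes the "modifier"
means the Unicode bidirectional control characters (for example U+202E
RIGHT-TO-LEFT OVERRIDE). The helper names isBidiControl, containsBidiControl and
scalarDump are hypothetical, made up for this illustration. The code only inspects
strings; it does not claim anything about what the compiler accepts in identifiers.

```swift
// Hypothetical helpers for illustration only; not part of Swift or any proposal.

/// True for the Unicode bidirectional control characters
/// (LRM/RLM, LRE/RLE/PDF/LRO/RLO, LRI/RLI/FSI/PDI).
func isBidiControl(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.value {
    case 0x200E...0x200F, 0x202A...0x202E, 0x2066...0x2069:
        return true
    default:
        return false
    }
}

/// True if the text contains at least one bidirectional control character.
func containsBidiControl(_ text: String) -> Bool {
    return text.unicodeScalars.contains(where: isBidiControl)
}

/// Lists the code points of a string in hex, so look-alikes can be told apart.
func scalarDump(_ text: String) -> [String] {
    return text.unicodeScalars.map {
        "U+" + String($0.value, radix: 16, uppercase: true)
    }
}

// Two names that usually render identically but are different strings:
let latinName = "a"            // U+0061 LATIN SMALL LETTER A
let cyrillicName = "\u{0430}"  // U+0430 CYRILLIC SMALL LETTER A
print(latinName == cyrillicName)                      // false
print(scalarDump(latinName), scalarDump(cyrillicName))

// A name with an embedded right-to-left override:
let suspicious = "ab\u{202E}cd"
print(containsBidiControl(suspicious))                // true
```

A check like this could run in a linter or code-review tool, independently of
whatever the language itself ends up allowing.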
Vladimir.
> -Kenny
>
>
>> On Oct 1, 2017, at 10:01 PM, Chris Lattner <clattner at nondot.org> wrote:
>>
>>>
>>> On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution
>>> <swift-evolution at swift.org> wrote:
>>>
>>> Hi All.
>>>
>>> I’d like to help as well. I have fun with operators.
>>>
>>> There is also the issue of code security with invisible unicode characters and
>>> characters that look exactly alike.
>>
>> Unless there is a compelling reason to add them, I think we should ban invisible
>> characters. What is the harm of characters that look alike?
>>
>> -Chris
>>
>>
>>> (They should make a Coding font that ensures all characters look different.) Was
>>> that ever resolved? Googling, I found this:
>>>
>>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160620/021446.html
>>>
>>> Which seems to have been left at this:
>>>
>>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160725/025555.html
>>>
>>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160919/thread.html#27229
>>>
>>> Should we throw all of this into the same pot, and make any characters that aren’t
>>> on the approved list illegal?
>>>
>>> -Kenny
>>>
>>>
>>>> On Sep 30, 2017, at 4:13 PM, Xiaodi Wu via swift-evolution
>>>> <swift-evolution at swift.org> wrote:
>>>>
>>>> I’m happy to participate in the reshaping of the proposal. It would be nice to
>>>> gather a group of people again to help drive it forward.
>>>>
>>>> That said, it’s unclear to me that superscript T is clearly an operator, any more
>>>> than would be superscript H (Hermitian), superscript 2, superscript 3, etc. But
>>>> at any rate, this would be discussion for the future workgroup.
>>>>
>>>> I would strongly advocate that the things-that-are-identifiers group be strongly
>>>> tied to the existing, complete Unicode standard for such, and that the critical
>>>> parts of the previous document about normalization be retained.
>>>>
>>>> On Sat, Sep 30, 2017 at 17:59 Chris Lattner via swift-evolution
>>>> <swift-evolution at swift.org> wrote:
>>>>
>>>>
>>>> The core team recently met to discuss PR609 - Refining identifier and
>>>> operator symbology:
>>>> https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md
>>>>
>>>> The proposal correctly observes that the partitioning of unicode codepoints
>>>> into identifiers and operators is a mess in some cases. It really is an
>>>> outright bug for 🙂 to be an identifier, but ☹️ to be an operator. That
>>>> said, the proposal itself is complicated and is defined in terms of a bunch
>>>> of unicode classes that may evolve in the “wrong way for Swift” in the future.
>>>>
>>>> The core team would really like to get this sorted out for Swift 5, and
>>>> sooner is better than later :-). Because it seems that this is a really hard
>>>> problem and that perfection is becoming the enemy of good
>>>> <https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>, the core team
>>>> requests the creation of a new proposal with a different approach. The
>>>> general observation is that there are three kinds of characters: things that
>>>> are obviously identifiers, things that are obviously math operators, and
>>>> things that are non-obvious. Things that are non-obvious can be made into
>>>> invalid code points, and legislated later in follow-up proposals if/when
>>>> someone cares to argue for them.
>>>>
>>>>
>>>> To make progress on this, we suggest a few separable steps:
>>>>
>>>> First, please split out the changes to the ASCII characters (e.g. . and \
>>>> operator parsing rules) to its own (small) proposal, since it is unrelated to
>>>> the unicode changes, and can make progress on that proposal independently.
>>>>
>>>>
>>>> Second, someone should take a look at the concrete set of unicode identifiers
>>>> that are accepted by Swift 4 and write a new proposal that splits them into
>>>> the three groups: those that are clearly identifiers (which become
>>>> identifiers), those that are clearly operators (which become operators), and
>>>> those that are unclear or don’t matter (these become invalid code points).
>>>>
>>>> I suggest that the criteria be based on *utility for Swift code*, not on the
>>>> underlying unicode classification. For example, the discussion thread for
>>>> PR609 mentions that the T character in “ xᵀ ” is defined in unicode as a
>>>> latin “letter”. Despite that, its use in Swift would clearly be as a postfix
>>>> operator, so we should classify it as an operator.
>>>>
>>>> Other suggestions:
>>>> - Math symbols are operators excepting those primarily used as identifiers
>>>> like “alpha”. If there are any characters that are used for both, this
>>>> proposal should make them invalid.
>>>> - While there may be useful ranges for some identifiers (e.g. to handle
>>>> european accented characters), the Emoji range should probably have each
>>>> codepoint independently judged, and currently unassigned codepoints should
>>>> not get a meaning defined for them.
>>>> - Unicode “faces”, “people”, “animals” etc are all identifiers.
>>>> - In order to reduce the scope of the proposal, it is a safe default to
>>>> exclude characters that are unlikely to be used by Swift code today,
>>>> including Braille, weird currency symbols, or any set of characters that are
>>>> so broken and useless in Swift 4 that it isn’t worth worrying about.
>>>> - The proposal is likely to turn a large number of code points into rejected
>>>> characters. In the discussions, some people will be tempted to argue
>>>> endlessly about individual rejections. To control that, we can require that
>>>> people point out an example where the character is already in use, or where
>>>> it has a clear application to a domain that is known today: the discussion
>>>> needs to be grounded and practical, not theoretical.
>>>>
>>>>
>>>> Third, if there is interest sometime in the future, we can have subsequent
>>>> proposals that expand the range of accepted code points, motivated by the
>>>> specific application domain that cares about them. These proposals will not
>>>> be source breaking, so they can happen at any time.
>>>>
>>>>
>>>> Is anyone interested in helping to push this effort forward?
>>>>
>>>> -Chris
>>>>
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> swift-evolution at swift.org
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution