[swift-evolution] Prohibit invisible characters in identifier names

Dave Abrahams dabrahams at apple.com
Mon Jun 20 16:44:01 CDT 2016


on Mon Jun 20 2016, Xiaodi Wu <swift-evolution at swift.org> wrote:

> On Mon, Jun 20, 2016 at 2:42 PM, João Pinheiro <joao at joaopinheiro.org>
> wrote:
>
>> I agree that treating zero-width spaces as non-existent would be a
>> possible solution, but I think it would make more sense to consider it as
>> white space and thus not admissible in identifier names.
>>
>
> If you treat it like whitespace, then you get interesting behaviors that I
> don't think you would want. For example, something that looks like `if
> letter...` could be parsed as conditional binding `if let ter...` if I put
> in a zero-width space in the right place.
>
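To make the whitespace question concrete, here is a minimal sketch of how one might flag such characters. It assumes a recent Swift toolchain with `Unicode.Scalar.Properties` (Swift 5 or later); `invisibleScalars(in:)` is a hypothetical helper written for illustration, not an existing compiler or standard-library API, and the choice of "default-ignorable or format category" as the test is my assumption rather than a settled rule.

```swift
// Hypothetical helper (illustration only): returns the scalars in `identifier`
// that are default-ignorable code points or in the "format" (Cf) category.
func invisibleScalars(in identifier: String) -> [Unicode.Scalar] {
    return identifier.unicodeScalars.filter { scalar in
        scalar.properties.isDefaultIgnorableCodePoint
            || scalar.properties.generalCategory == .format
    }
}

let zwsp: Unicode.Scalar = "\u{200B}"                 // ZERO WIDTH SPACE
print(zwsp.properties.isWhitespace)                   // false: not White_Space in Unicode
print(invisibleScalars(in: "let\u{200B}ter").count)   // 1
print(invisibleScalars(in: "letter").isEmpty)         // true
```

Note that Unicode itself does not give U+200B the White_Space property, so a lexer that split tokens on it would be applying its own rule, not Unicode's.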
>> I'm not sure of what the best way to handle left-to-right and
>> right-to-left markers would be. Does it make sense to allow mixed text
>> orientation in identifiers?
>>
>
> How do other languages that support Unicode handle these markers in
> identifiers? I'd be interested to know.
>
>> Removing ambiguity between unicode confusables is a much more complicated
>> issue which implies defining a canonical unicode representation for
>> identifiers and a way to resolve them. It would also make it impractical to
>> use certain valid mathematical symbols as identifiers.
>>
>
> Most interesting mathematical symbols are reserved for operators anyway. As
> a result, `x` and the multiplication symbol are not readily confusable in
> most contexts in Swift, and confusable resolution could be built in such a
> way that identifier characters are not regarded as confusable with operator
> characters.

I'm a little concerned about cases like these:

1D6CE ;	0076 ;	MA	# ( 𝛎 → v ) MATHEMATICAL BOLD SMALL NU → LATIN SMALL LETTER V	# →ν→
1D6D2 ;	0070 ;	MA	# ( 𝛒 → p ) MATHEMATICAL BOLD SMALL RHO → LATIN SMALL LETTER P	# →ρ→

etc.  Now, one could reasonably argue that using “𝛎” and “v” to mean
different things in the same scope would be bad, but I'm not sure
we really want to accept them as aliases of one another, either.
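
A rough sketch of what aliasing would mean in practice, assuming a recent Swift toolchain. The two-entry table is just the pair of confusables.txt lines quoted above, and `skeleton(_:)` is a hypothetical helper for illustration; this is not the full UTS #39 skeleton algorithm.

```swift
// Hypothetical excerpt of confusables.txt: only the two entries quoted above.
let confusableExcerpt: [Unicode.Scalar: Unicode.Scalar] = [
    "\u{1D6CE}": "\u{0076}", // MATHEMATICAL BOLD SMALL NU  -> LATIN SMALL LETTER V
    "\u{1D6D2}": "\u{0070}", // MATHEMATICAL BOLD SMALL RHO -> LATIN SMALL LETTER P
]

// Hypothetical helper: replace each scalar by its prototype from the excerpt,
// leaving scalars that are not in the table unchanged.
func skeleton(_ identifier: String) -> String {
    var result = String.UnicodeScalarView()
    for scalar in identifier.unicodeScalars {
        result.append(confusableExcerpt[scalar] ?? scalar)
    }
    return String(result)
}

let nu = "\u{1D6CE}"
let v = "v"
print(nu == v)                      // false: distinct identifiers today
print(skeleton(nu) == skeleton(v))  // true: aliases if confusables are folded
```

The second comparison is the behavior I'm unsure we want: under folding, the two names would silently refer to the same declaration.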

>> João Pinheiro
>>
>>
>> On 20 Jun 2016, at 20:23, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>
>> On Mon, Jun 20, 2016 at 2:17 PM, João Pinheiro <swift-evolution at swift.org>
>> wrote:
>>
>>> Nice feature in the IBM Swift Sandbox. Xcode doesn't display zero-width
>>> spaces either so the identifier names look exactly the same.
>>>
>>> The issue with left-to-right and right-to-left markers is interesting and
>>> has previously been exploited in email phishing attacks.
>>>
>>> It would be possible to highlight invisible characters in Xcode as a
>>> stopgap measure, but that doesn't solve the problem for developers using
>>> other editors or in other platforms. I think it would be a better idea to
>>> sanitise the set of allowed (or prohibited) characters for identifiers at
>>> the language level.
>>>
>>
>> This is a potential security problem, but there's no need to try to invent an ad-hoc
>> solution here, particularly one as drastic as prohibiting characters. The
>> same security considerations are applicable elsewhere and there's a lot of
>> work about Unicode security. See here:
>> http://www.unicode.org/reports/tr39/
>>
>> Unicode maintains a list of "confusable" characters. See here:
>> http://www.unicode.org/Public/security/latest/confusables.txt
>>
>> It should be sufficient to regard confusables as the same glyph for the
>> purpose of identifier names; zero-width and invisible marks would then be
>> regarded as non-existent, so that `test` and `t[invisible glyph]est` would
>> refer to the same variable.
>>
>>
>>> Sincerely,
>>> João Pinheiro
>>>
>>>
>>> > On 20 Jun 2016, at 19:26, Vladimir.S <svabox at gmail.com> wrote:
>>> >
>>> > Very interesting.
>>> >
>>> > Btw, IBM Swift Sandbox shows these spaces:
>>> > https://swiftlang.ng.bluemix.net/
>>> > But my mail client does not - i.e. I saw exactly the same "test"&"abc"
>>> >
>>> > Also, I read about some issues with left-to-right and right-to-left
>>> > markers that also change how the source text appears: you see one text,
>>> > but when it is compiled it does not behave as expected. That is, the
>>> > viewer/editor processes these special codes and shows you one text,
>>> > but the compiler treats the text another way.
>>> >
>>> > I believe it is a potential security problem that all Unicode characters
>>> > are allowed in variable and function names in Swift. IMO we definitely
>>> > should limit the allowed character set for identifiers in source files.
>>> >
>>> > On 20.06.2016 20:51, João Pinheiro via swift-evolution wrote:
>>> >> Recently there has been a screenshot going around Twitter about C++
>>> >> allowing zero-width spaces in variable names. Swift also suffers from
>>> >> this problem, which can be abused to create ambiguous or misleading
>>> >> code and potentially to obfuscate nefarious code.
>>> >>
>>> >> I would like to propose a change to prohibit the use of invisible
>>> >> characters in identifier names.
>>> >>
>>> >> I'm including an example of problematic code at the bottom of this
>>> >> email.
>>> >>
>>> >> Sincerely,
>>> >> João Pinheiro
>>> >>
>>> >>
>>> >> /* The output for this code is:
>>> >> A
>>> >> B
>>> >> C
>>> >> 1
>>> >> 2
>>> >> 3
>>> >> */
>>> >>
>>> >> func test() { print("A") }
>>> >> func t​est() { print("B") }
>>> >> func te​st() { print("C") }
>>> >>
>>> >> let abc = 1
>>> >> let a​bc = 2
>>> >> let ab​c = 3
>>> >>
>>> >> test()
>>> >> t​est()
>>> >> te​st()
>>> >>
>>> >> print(abc)
>>> >> print(a​bc)
>>> >> print(ab​c)
>>> >> _______________________________________________
>>> >> swift-evolution mailing list
>>> >> swift-evolution at swift.org
>>> >> https://lists.swift.org/mailman/listinfo/swift-evolution
>>> >>

-- 
Dave


