<div dir="ltr">On Mon, Jun 20, 2016 at 4:44 PM, Dave Abrahams via swift-evolution <span dir="ltr"><<a href="mailto:swift-evolution@swift.org" target="_blank">swift-evolution@swift.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
on Mon Jun 20 2016, Xiaodi Wu <<a href="mailto:swift-evolution@swift.org">swift-evolution@swift.org</a>> wrote:<br>
<br>
> On Mon, Jun 20, 2016 at 2:42 PM, João Pinheiro <<a href="mailto:joao@joaopinheiro.org">joao@joaopinheiro.org</a>><br>
> wrote:<br>
><br>
>> I agree that treating zero-width spaces as non-existent would be a<br>
>> possible solution, but I think it would make more sense to consider it as<br>
>> white space and thus not admissible in identifier names.<br>
>><br>
><br>
> If you treat it like whitespace, then you get interesting behaviors that I<br>
> don't think you would want. For example, something that looks like `if<br>
> letter...` could be parsed as conditional binding `if let ter...` if I put<br>
> in a zero-width space in the right place.<br>
><br>
>> I'm not sure of what the best way to handle left-to-right and<br>
>> right-to-left markers would be. Does it make sense to allow mixed text<br>
>> orientation in identifiers?<br>
>><br>
><br>
> How do other languages that support Unicode handle these markers in<br>
> identifiers? I'd be interested to know.<br>
><br>
>> Removing ambiguity between unicode confusables is a much more complicated<br>
>> issue which implies defining a canonical unicode representation for<br>
>> identifiers and a way to resolve them. It would also make it impractical to<br>
>> use certain valid mathematical symbols as identifiers.<br>
>><br>
><br>
> Most interesting mathematical symbols are reserved for operators anyway. As<br>
> a result, `x` and the multiplication symbol are not readily confusable in<br>
> most contexts in Swift, and confusable resolution could be built in such a<br>
> way that identifier characters are not regarded as confusable with operator<br>
> characters.<br>
<br>
</span>I'm a little concerned about cases like these:<br>
<br>
1D6CE ; 0076 ; MA # ( 𝛎 → v ) MATHEMATICAL BOLD SMALL NU → LATIN SMALL LETTER V # →ν→<br>
1D6D2 ; 0070 ; MA # ( 𝛒 → p ) MATHEMATICAL BOLD SMALL RHO → LATIN SMALL LETTER P # →ρ→<br>
<br>
etc. Now, one could reasonably argue that using “𝛎” and “v” to mean<br>
different things in the same scope would be bad, but I'm not sure<br>
we really want to accept them as aliases of one another, either.<br></blockquote><div><br></div><div>Yes, that does give me pause. FWIW, though, Greek letters have been known to turn into their lookalike Latin counterparts. For instance, do a Google search for Planck's equation written as "E = hv" (that "v" is supposed to be lowercase nu). Or consider the abbreviation "XP" for Christ, etymologically uppercase chi and rho (the first two letters of Christ in Greek). (Or, relatedly, the erroneous claim that "Xmas" is an attempt to remove Christ from Christmas.)</div><div><br></div><div>I guess what I'm saying is, if a co-worker named two distinct variables v and nu, I would have a word or two with them... Consider an alternative scenario. I have a Greek keyboard in my keyboard switcher, handy for scientific uses. If I accidentally type Greek uppercase alpha in my code instead of A, the mistake would be essentially impossible to find by eye. Why should the language not elide the invisible distinction?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5"><br>
>> João Pinheiro<br>
>><br>
>><br>
>> On 20 Jun 2016, at 20:23, Xiaodi Wu <<a href="mailto:xiaodi.wu@gmail.com">xiaodi.wu@gmail.com</a>> wrote:<br>
>><br>
>> On Mon, Jun 20, 2016 at 2:17 PM, João Pinheiro <<a href="mailto:swift-evolution@swift.org">swift-evolution@swift.org</a>><br>
>> wrote:<br>
>><br>
>>> Nice feature in the IBM Swift Sandbox. Xcode doesn't display zero-width<br>
>>> spaces either so the identifier names look exactly the same.<br>
>>><br>
>>> The issue with left-to-right and right-to-left markers is interesting and<br>
>>> has previously been exploited in email phishing attacks.<br>
>>><br>
>>> It would be possible to highlight invisible characters in Xcode as a<br>
>>> stopgap measure, but that doesn't solve the problem for developers using<br>
>>> other editors or in other platforms. I think it would be a better idea to<br>
>>> sanitise the set of allowed (or prohibited) characters for identifiers at<br>
>>> the language level.<br>
>>><br>
>><br>
>> This is a potential security problem, but there is no need to invent an<br>
>> ad hoc solution here, particularly one as drastic as prohibiting characters.<br>
>> The same security considerations apply elsewhere, and there is a lot of<br>
>> existing work on Unicode security. See here:<br>
>> <a href="http://www.unicode.org/reports/tr39/" rel="noreferrer" target="_blank">http://www.unicode.org/reports/tr39/</a><br>
>><br>
>> Unicode maintains a list of "confusable" characters. See here:<br>
>> <a href="http://www.unicode.org/Public/security/latest/confusables.txt" rel="noreferrer" target="_blank">http://www.unicode.org/Public/security/latest/confusables.txt</a><br>
>><br>
>> It should be sufficient to regard confusables as the same glyph for the<br>
>> purpose of identifier names; zero-width and invisible marks would then be<br>
>> regarded as non-existent, so that `test` and `t[invisible glyph]est` would<br>
>> refer to the same variable.<br>
>><br>
>><br>
>>> Sincerely,<br>
>>> João Pinheiro<br>
>>><br>
>>><br>
>>> > On 20 Jun 2016, at 19:26, Vladimir.S <<a href="mailto:svabox@gmail.com">svabox@gmail.com</a>> wrote:<br>
>>> ><br>
>>> > Very interesting.<br>
>>> ><br>
>>> > Btw, IBM Swift Sandbox shows these spaces:<br>
>>> > <a href="https://swiftlang.ng.bluemix.net/" rel="noreferrer" target="_blank">https://swiftlang.ng.bluemix.net/</a><br>
>>> > But my mail client does not, i.e. I saw exactly the same "test" and "abc"<br>
>>> ><br>
>>> > Also, I have read about issues with left-to-right and right-to-left<br>
>>> markers that change how the source text is displayed: the viewer or<br>
>>> editor interprets these special codes and shows you one text, but the<br>
>>> compiler processes the text differently, so the code does not behave<br>
>>> as it appears to.<br>
>>> ><br>
>>> > I believe it is a potential security problem that all Unicode characters<br>
>>> are allowed in variable and function names in Swift. IMO we should definitely<br>
>>> limit the character set allowed for identifiers in source code.<br>
>>> ><br>
>>> > On 20.06.2016 20:51, João Pinheiro via swift-evolution wrote:<br>
>>> >> Recently there has been a screenshot going around Twitter about C++<br>
>>> allowing zero-width spaces in variable names. Swift also suffers from this<br>
>>> problem, which can be abused to create ambiguous or misleading code and<br>
>>> potentially to obfuscate nefarious code.<br>
>>> >><br>
>>> >> I would like to propose a change to prohibit the use of invisible<br>
>>> characters in identifier names.<br>
>>> >><br>
>>> >> I'm including an example of problematic code at the bottom of this<br>
>>> email.<br>
>>> >><br>
>>> >> Sincerely,<br>
>>> >> João Pinheiro<br>
>>> >><br>
>>> >><br>
>>> >> /* The output for this code is:<br>
>>> >> A<br>
>>> >> B<br>
>>> >> C<br>
>>> >> 1<br>
>>> >> 2<br>
>>> >> 3<br>
>>> >> */<br>
>>> >><br>
>>> >> func test() { print("A") }<br>
>>> >> func test() { print("B") }<br>
>>> >> func test() { print("C") }<br>
>>> >><br>
>>> >> let abc = 1<br>
>>> >> let abc = 2<br>
>>> >> let abc = 3<br>
>>> >><br>
>>> >> test()<br>
>>> >> test()<br>
>>> >> test()<br>
>>> >><br>
>>> >> print(abc)<br>
>>> >> print(abc)<br>
>>> >> print(abc)<br>
>>> >> _______________________________________________<br>
>>> >> swift-evolution mailing list<br>
>>> >> <a href="mailto:swift-evolution@swift.org">swift-evolution@swift.org</a><br>
>>> >> <a href="https://lists.swift.org/mailman/listinfo/swift-evolution" rel="noreferrer" target="_blank">https://lists.swift.org/mailman/listinfo/swift-evolution</a><br>
>>> >><br>
>>><br>
>>><br>
>><br>
>><br>
>><br>
><br>
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
Dave<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
</div></div></blockquote></div><br></div></div>
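<div>To make the idea discussed in this thread a bit more concrete, here is a minimal sketch of folding identifiers to a comparison key, so that invisible marks are ignored and confusable characters compare as equal. Everything named here (`ignoredScalars`, `confusableMap`, `comparisonKey`) is hypothetical and is not how the Swift compiler works. The tables are tiny hand-picked samples: the two mathematical bold entries are the ones quoted above from confusables.txt, and Greek capital alpha is the keyboard example. A real implementation would presumably load the full Unicode data and follow the TR39 procedure rather than this simplified loop.</div>

```swift
// Hypothetical sketch only: tiny sample tables, not the full Unicode data.

// Invisible code points to ignore: zero-width space, zero-width non-joiner,
// zero-width joiner, left-to-right mark, right-to-left mark.
let ignoredScalars: [UInt32] = [0x200B, 0x200C, 0x200D, 0x200E, 0x200F]

// A few confusable code points mapped to the character they resemble.
let confusableMap: [UInt32: Unicode.Scalar] = [
    0x1D6CE: "v",  // MATHEMATICAL BOLD SMALL NU
    0x1D6D2: "p",  // MATHEMATICAL BOLD SMALL RHO
    0x0391: "A",   // GREEK CAPITAL LETTER ALPHA
]

// Builds a key for comparing identifiers: drops ignored scalars and
// replaces each mapped confusable with its lookalike.
func comparisonKey(_ identifier: String) -> String {
    var scalars = String.UnicodeScalarView()
    for scalar in identifier.unicodeScalars {
        if ignoredScalars.contains(scalar.value) { continue }
        scalars.append(confusableMap[scalar.value] ?? scalar)
    }
    return String(scalars)
}

// `test` with a zero-width space inside folds to the same key as `test`.
print(comparisonKey("t\u{200B}est") == comparisonKey("test"))  // true
// Greek capital alpha folds to the same key as Latin A.
print(comparisonKey("\u{0391}") == comparisonKey("A"))         // true
// Mathematical bold nu folds to the same key as Latin v.
print(comparisonKey("\u{1D6CE}") == comparisonKey("v"))        // true
```

<div>Under this scheme a compiler could either treat identifiers with equal keys as the same name or reject them as a redeclaration; the sketch does not take a side on that question.</div>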