[swift-evolution] Prohibit invisible characters in identifier names

Tue Jun 21 09:48:13 CDT 2016

On 21.06.2016 7:37, Charlie Monroe via swift-evolution wrote:
>
>> On Jun 21, 2016, at 2:23 AM, Brent Royal-Gordon via swift-evolution
>> <swift-evolution at swift.org> wrote:
>>
>>> Perhaps stupid but: why was Swift designed to accept most Unicode
>>> characters in identifier names? Wouldn’t it be simpler to go back to
>>> a model where only standard ascii characters are accepted in
>>> identifier names?
>>
>> I assume it has something to do with the fact that 94.6% of the
>> world's population speak a first language which is not English. That
>> outweighs the inconvenience for Anglo developers, IMHO.
>
> Yes, but the SDKs (frameworks, system libraries) are all in English,
> including Swift standard library. I remember a few languages attempting
> localized versions for kids to study better, failing terribly because
> you learned something that had a very very limited use.

I support Charlie's opinion. For me (as a non-native English speaker),
non-ASCII characters in identifiers made no sense, even when I started
learning programming as a child. Expressions composed of identifiers
written in my native language do not come close to correct sentences.

What's more, all the other parts of the language are still in English:
for, while, guard, let, var, func, etc.

>
> When it comes to maintaining code, using localized identifier names is a
> bad practice since anyone outside that country coming to the code can't
> really use it. I personally can't imagine coming to maintain Swift code
> with identifiers in Chinese, Japanese, Arabic, ...
>
> While the feature of non-ASCII characters being allowed as identifiers
> (which was held up high with Apple giving emoji examples) may seem cool,
> I can only see this helpful in the future, given a different keyboard
> layout (as someone has pointed out some time ago here), to introduce
> one-character operators that would be otherwise impossible. But if
> someone came to me with a code where a variable would be an emoji of a
> dog, he'd get fired on the spot.

Yes, but I don't believe Apple will agree to limit the character set for
identifiers to ASCII *after* those presentations with the dog emoji ;-)

>
> I'd personally vote to keep the zero-width-joiner characters forbidden
> within the code outside of string literals (where they may make sense).
> I agree that this can be easily solved by linters, but: I think this
> particular set of characters should be restricted by the language
> itself, since it's something easily omittable during code review and
> given the upcoming package manager, this can lead to a hard-to-find
> malware being distributed among developers who include these packages
> within their projects - since you usually do not run a linter on a 3rd
> party code.

I also think the main problem that such tricks with zero-width joiners or
right-to-left marks could cause is malware being injected into sources on
GitHub, in the package manager, *or* even just in a code snippet on a web
page (which you then copy-paste into your source). Right now I don't know
an exact method for implementing such malware, but I believe this
vulnerability could be exploited some day.
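
To make the checking-tool idea concrete, here is a minimal sketch in Swift
of a scan that could be run over 3rd-party sources. The names
`InvisibleHit`, `isInvisible` and `findInvisibleScalars` are hypothetical,
and the set of code points is my own assumption of what counts as
"invisible", not an official list. It also does not skip string literals,
which a real tool would have to do.

```swift
// Hypothetical helper types; not part of any existing Swift tooling.
struct InvisibleHit: Equatable {
    let line: Int          // 1-based line number
    let column: Int        // 1-based position, counted in Unicode scalars
    let codePoint: UInt32  // the offending scalar value
}

// Scalars treated as "invisible" in this sketch (an assumed list).
func isInvisible(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.value {
    case 0x200B...0x200F,  // zero-width space, ZWNJ, ZWJ, LRM, RLM
         0x202A...0x202E,  // bidirectional embeddings and overrides
         0x2060,           // word joiner
         0x2066...0x2069,  // bidirectional isolates
         0xFEFF:           // zero-width no-break space / byte order mark
        return true
    default:
        return false
    }
}

// Walks the source scalar by scalar and records every invisible one.
func findInvisibleScalars(in source: String) -> [InvisibleHit] {
    var hits: [InvisibleHit] = []
    var line = 1
    var column = 1
    for scalar in source.unicodeScalars {
        if scalar == "\n" {
            line += 1
            column = 1
            continue
        }
        if isInvisible(scalar) {
            hits.append(InvisibleHit(line: line, column: column,
                                     codePoint: scalar.value))
        }
        column += 1
    }
    return hits
}
```

For example, scanning "let a\u{200D}b = 1" reports one hit at line 1,
column 6 with code point 0x200D.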

Btw, regarding the package manager: will we have any protection from
typosquatting?
http://incolumitas.com/2016/06/08/typosquatting-package-managers/#typosquatting-package-managers

>
> As for the confusables - this depends a lot on the rendering and what
> font you have set. I've tried  𝛎 → v with current Xcode and it looks
> really different, mostly when you use a fixed-space font which usually
> doesn't have non-ASCII characters which are then rendered using a
> different font, making the distinction easy to spot.

In Russian we have these characters:
у к е г х а р о с ь
which look similar to these English letters:
y k e r x a p o c b

So you most likely can't tell `рос` from `poc`, `хае` from `xae`, etc.
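
As an illustration of what a checking tool could do here, below is a rough
Swift sketch that flags identifiers mixing ASCII Latin letters with
Cyrillic ones. `Script`, `script(of:)` and `mixesLatinAndCyrillic` are
hypothetical names, and classifying by code point range is my own
simplification; it only knows ASCII letters and the basic Cyrillic block.

```swift
// Hypothetical helpers for illustration; not part of the standard library.
enum Script: Hashable {
    case latin, cyrillic, other
}

// Rough classification by code point range (assumption: only ASCII
// letters and the basic Cyrillic block U+0400...U+04FF are recognized).
func script(of scalar: Unicode.Scalar) -> Script {
    switch scalar.value {
    case 0x41...0x5A, 0x61...0x7A:
        return .latin
    case 0x0400...0x04FF:
        return .cyrillic
    default:
        return .other
    }
}

// Returns true when one identifier contains both ASCII Latin and
// Cyrillic letters.
func mixesLatinAndCyrillic(_ identifier: String) -> Bool {
    let found = Set(identifier.unicodeScalars.map { script(of: $0) })
    return found.contains(.latin) && found.contains(.cyrillic)
}
```

With this, `mixesLatinAndCyrillic("p\u{043E}c")` (Latin p, Cyrillic о,
Latin c) is true, while the all-Cyrillic `рос` and the all-Latin `poc`
both give false. So a check like this catches only mixed names; two
identifiers that are each written entirely in one script still need a
different kind of check.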

I don't think the compiler should somehow decide whether a non-English
letter looks like an English letter. But I don't see any way to protect
myself other than running linters/checking tools on 3rd-party code as well.

>
>>
>> Honestly, this seems to me like a concern for linters and security
>> auditing tools, not for the compiler. Swift identifiers are
>> case-sensitive; I see no reason they shouldn't be script-sensitive or
>> zero-width-joiner-sensitive. (Though basic Unicode normalization seems
>> like a good idea, since differently-normalized strings are `==`
>> anyway.)
>>
>> -- Brent Royal-Gordon Architechies
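
On the normalization point: as far as I know, Swift's `String` comparison
already works by canonical equivalence. A small sketch follows; it shows
only how `String` values compare, not how the compiler compares
identifiers, and the variable names are just for illustration.

```swift
// The same visible text spelled with different Unicode scalars.
let precomposed = "caf\u{00E9}"          // é as a single scalar
let decomposed = "cafe\u{0301}"          // e followed by combining acute accent
let withJoiner = "ca\u{200D}fe\u{0301}"  // same, plus a zero-width joiner

// Canonically equivalent spellings compare equal...
let sameText = (precomposed == decomposed)
// ...even though their scalar counts differ (4 versus 5).
let scalarCounts = (precomposed.unicodeScalars.count,
                    decomposed.unicodeScalars.count)
// A zero-width joiner is not removed by normalization, so this differs.
let joinerMatters = (withJoiner != decomposed)
```

Here `sameText` and `joinerMatters` are both true, so normalization alone
would not make the joiner trick go away.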
>>
>> _______________________________________________ swift-evolution
>> mailing list swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution