[swift-evolution] Prohibit invisible characters in identifier names

Tue Jun 28 14:43:39 CDT 2016

Doesn't Unicode have a standard for this that specifies which characters are look-alikes?

Russ

> On Jun 21, 2016, at 7:48 AM, Vladimir.S via swift-evolution <swift-evolution at swift.org> wrote:
> 
> 
>> On 21.06.2016 7:37, Charlie Monroe via swift-evolution wrote:
>> 
>>> On Jun 21, 2016, at 2:23 AM, Brent Royal-Gordon via swift-evolution
>>> <swift-evolution at swift.org> wrote:
>>> 
>>>> Perhaps stupid but: why was Swift designed to accept most Unicode
>>>> characters in identifier names? Wouldn’t it be simpler to go back to
>>>> a model where only standard ascii characters are accepted in
>>>> identifier names?
>>> 
>>> I assume it has something to do with the fact that 94.6% of the
>>> world's population speak a first language which is not English. That
>>> outweighs the inconvenience for Anglo developers, IMHO.
>> 
>> Yes, but the SDKs (frameworks, system libraries) are all in English,
>> including the Swift standard library. I remember a few languages attempting
>> localized versions for kids to study better, failing terribly because
>> you learned something that had a very very limited use.
> 
> I support Charlie's opinion. For me (as a non-native English speaker), non-ASCII characters in identifiers made no sense, even when I started learning programming as a child. Expressions composed of identifiers written in my native language do not come close to forming correct sentences.
> 
> What's more, all the other parts of the language are still in English: for, while, guard, let, var, func, etc.
> 
>> 
>> When it comes to maintaining code, using localized identifier names is a
>> bad practice since anyone outside that country coming to the code can't
>> really use it. I personally can't imagine coming to maintain Swift code
>> with identifiers in Chinese, Japanese, Arabic, ...
>> 
>> While the feature of non-ASCII characters being allowed as identifiers
>> (which was held up high with Apple giving emoji examples) may seem cool,
>> I can only see this helpful in the future, given a different keyboard
>> layout (as someone has pointed out some time ago here), to introduce
>> one-character operators that would be otherwise impossible. But if
>> someone came to me with code where a variable was an emoji of a
>> dog, he'd get fired on the spot.
> 
> Yes, but I don't believe Apple will accept limiting the character set for identifiers to ASCII *after* these presentations with the emoji of a dog ;-)
> 
>> 
>> I'd personally vote to keep the zero-width-joiner characters forbidden
>> within the code outside of string literals (where they may make sense).
>> I agree that this can be easily solved by linters, but: I think this
>> particular set of characters should be restricted by the language
>> itself, since it's something easily overlooked during code review and
>> given the upcoming package manager, this can lead to hard-to-find
>> malware being distributed among developers who include these packages
>> within their projects - since you usually do not run a linter on 3rd
>> party code.
> 
> I also think the main problem that such tricks with zero-width joiners or right-to-left marks could cause is the injection of malicious code into sources on GitHub, in the package manager, *or* even just in a code snippet on a web page (which you then copy and paste into your source). Right now I don't know an exact method for implementing such malware, but I believe this vulnerability could be exploited some day.
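
To make the concern above concrete, here is a minimal sketch of the kind of check a linter could run over identifiers. Everything in it is an assumption for illustration: `invisibleScalars` and `invisibleScalarOffsets(in:)` are hypothetical names, not part of the Swift compiler or of any existing tool, and the list of code points is deliberately short rather than exhaustive.

```swift
// Hypothetical helper, not an existing compiler or linter API.
// The set below is an illustrative assumption, not a complete list
// of invisible or bidirectional-control code points.
let invisibleScalars: Set<UInt32> = [
    0x200B, // ZERO WIDTH SPACE
    0x200C, // ZERO WIDTH NON-JOINER
    0x200D, // ZERO WIDTH JOINER
    0x200E, // LEFT-TO-RIGHT MARK
    0x200F, // RIGHT-TO-LEFT MARK
    0x2060, // WORD JOINER
    0xFEFF, // ZERO WIDTH NO-BREAK SPACE
]

/// Returns the offsets, counted in Unicode scalars, at which `text`
/// contains one of the scalars listed in `invisibleScalars`.
func invisibleScalarOffsets(in text: String) -> [Int] {
    var offsets: [Int] = []
    for (offset, scalar) in text.unicodeScalars.enumerated() {
        if invisibleScalars.contains(scalar.value) {
            offsets.append(offset)
        }
    }
    return offsets
}

let plain = "isAdmin"
let tricky = "is\u{200D}Admin" // renders like `plain` in many editors

print(plain.unicodeScalars.count)          // 7
print(tricky.unicodeScalars.count)         // 8
print(invisibleScalarOffsets(in: plain))   // []
print(invisibleScalarOffsets(in: tricky))  // [2]
```

The two names differ by one scalar that most fonts do not draw, which is why a reviewer can miss it while a mechanical check does not.
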
> 
> Btw, regarding the package manager: will we have any protection from typosquatting? http://incolumitas.com/2016/06/08/typosquatting-package-managers/#typosquatting-package-managers
> 
>> 
>> As for the confusables - this depends a lot on the rendering and what
>> font you have set. I've tried 𝛎 → v with current Xcode and it looks
>> really different, especially when you use a fixed-space font which usually
>> doesn't have non-ASCII characters which are then rendered using a
>> different font, making the distinction easy to spot.
> 
> In Russian we have these characters:
> у к е г х а р о с ь
> which look similar to these English ones:
> y k e r x a p o c b
> 
> So you most likely can't tell `рос` from `poc`, `хае` from `xae`, etc.
> 
> I don't think the compiler should somehow decide whether a non-English letter looks like an English letter. But I don't see any way to protect myself other than also running linters/checking tools on 3rd-party code.
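
As a rough illustration of what such a checking tool might do with the Russian/English look-alikes above, here is a sketch that flags identifiers mixing Latin and Cyrillic letters. `CoarseScript`, `coarseScript(of:)` and `mixesLatinAndCyrillic(_:)` are hypothetical names, and the classification is an assumption that uses only ASCII letters and the basic Cyrillic block (U+0400 to U+04FF), not real Unicode script data.

```swift
// Hypothetical sketch: a very coarse script classifier based on
// code point ranges only. Real tools would use Unicode script data.
enum CoarseScript { case latin, cyrillic, other }

func coarseScript(of scalar: Unicode.Scalar) -> CoarseScript {
    switch scalar.value {
    case 0x41...0x5A, 0x61...0x7A: return .latin    // ASCII letters only
    case 0x0400...0x04FF: return .cyrillic          // basic Cyrillic block
    default: return .other
    }
}

/// True when `identifier` contains at least one ASCII Latin letter
/// and at least one scalar from the basic Cyrillic block.
func mixesLatinAndCyrillic(_ identifier: String) -> Bool {
    var sawLatin = false
    var sawCyrillic = false
    for scalar in identifier.unicodeScalars {
        switch coarseScript(of: scalar) {
        case .latin: sawLatin = true
        case .cyrillic: sawCyrillic = true
        case .other: break
        }
    }
    return sawLatin && sawCyrillic
}

let latin = "poc"                           // U+0070 U+006F U+0063
let cyrillic = "\u{0440}\u{043E}\u{0441}"   // Cyrillic look-alike of "poc"
let mixed = "p\u{043E}c"                    // Latin p, Cyrillic o, Latin c

print(latin == cyrillic)                 // false
print(mixesLatinAndCyrillic(latin))      // false
print(mixesLatinAndCyrillic(cyrillic))   // false
print(mixesLatinAndCyrillic(mixed))      // true
```

Note that the all-Cyrillic look-alike is not flagged, so a mixed-script check alone does not cover the `рос` versus `poc` case; that would need a look-alike table of some kind.
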
> 
>> 
>>> 
>>> Honestly, this seems to me like a concern for linters and security
>>> auditing tools, not for the compiler. Swift identifiers are
>>> case-sensitive; I see no reason they shouldn't be script-sensitive or
>>> zero-width-joiner-sensitive. (Though basic Unicode normalization seems
>>> like a good idea, since differently-normalized strings are `==`
>>> anyway.)
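
The remark about differently-normalized strings being `==` can be seen with a small sketch using only standard `String` API. It shows the behavior of `String` values at run time and makes no claim about how the compiler treats identifiers; the variable names are just for illustration.

```swift
// The same visible text, "é", in two different normalization forms.
let precomposed = "\u{00E9}"   // LATIN SMALL LETTER E WITH ACUTE
let decomposed = "e\u{0301}"   // "e" followed by COMBINING ACUTE ACCENT

// String equality is based on canonical equivalence.
print(precomposed == decomposed)                  // true

// The underlying scalar sequences are nevertheless different.
print(precomposed.unicodeScalars.count)           // 1
print(decomposed.unicodeScalars.count)            // 2
print(precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars)) // false
```
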
>>> 
>>> -- Brent Royal-Gordon Architechies
>>> 
>>> _______________________________________________ swift-evolution
>>> mailing list swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution