<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Let me tl;dr'er this even more: ☹️ is an operator, but 🙂 is an identifier.</div><div class=""><br class=""></div><div class="">-- E, succinct, who thinks there's room for improvement</div><div class=""><br class=""></div><br class=""><div><blockquote type="cite" class=""><div class="">On Sep 18, 2016, at 1:33 PM, Jacob Bandes-Storch via swift-evolution &lt;<a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class=""><div class="gmail_signature"><div dir="ltr" class=""><div class=""><div class=""><b class="">TL;DR:</b></div><div class=""><br class=""></div><div class="">Swift 4 Stage 1 seeks to prioritize "Source stability features". Most source-breaking changes were done with in Swift 3; however, the categorization of Unicode characters into identifiers &amp; operators was never thoroughly discussed on swift-evolution. This seems like it might be our last chance, and I think there are some big improvements to be had.</div><div class=""><br class=""></div><div class="">I've gathered some information+thoughts into an early-stage pitch / pre-proposal. It doesn't really have a conclusion, so I'm hoping we can discuss these issues and come up with good (pragmatic) solutions here. I imagine this can morph into a proposal later.</div><div class=""><br class=""></div><div class="">You can read the following in nicer HTML form at&nbsp;<a href="https://gist.github.com/jtbandes/c0b0c072181dcd22c3147802025d0b59" class="">https://gist.github.com/jtbandes/c0b0c072181dcd22c3147802025d0b59</a></div><div class=""><br class="gmail-Apple-interchange-newline">I look forward to the discussion!<br class=""></div><div class=""><br class=""></div><div class="">-Jacob</div><div class=""><font size="4" class=""><br class=""></font></div><div class=""><b class=""><font size="4" class=""># Background and motivation</font></b></div><div class=""><br class=""></div><div class="">To ease lexing/parsing and avoid user confusion, the names of custom identifiers (type names, variable names, etc.) and operators in Swift can be composed of (mostly) separate sets of characters.</div><div class=""><br class=""></div><div class="">Using terminology from TSPL:</div><div class=""><br class=""></div><div class="">`identifier-head`/`operator-head` are characters which can <i class="">begin&nbsp;</i>an identifier or operator.</div><div class=""><br class=""></div><div class="">`identifier-character`/`operator-character` are characters which can appear anywhere in an identifier or operator (these are supersets of the `-head` sets).</div><div class=""><br class=""></div><div class="">&lt;<a href="https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html" class="">https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html</a>&gt;</div><div class=""><br class=""></div><div class="">(Note also that some particular arrangements of characters are reserved; for instance, `$` followed by digits for an implicit closure parameter, and "If an operator doesn’t begin with a dot, it can’t contain a dot elsewhere." There are also special characters in the language which are neither identifiers nor operators, such as: `()[]{},:@#`)</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class="">## Prior discussion on swift-evolution</b></div><div class=""><br class=""></div><div class=""><b class="">"Request to add middle dot (U+00B7) as operator character?"</b></div><div class="">&lt;<a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/003176.html" class="">https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/003176.html</a>&gt;</div><div class=""><br class=""></div><div class=""><b class="">"Free the '$' Symbol!"</b></div><div class="">&lt;<a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005133.html" class="">https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005133.html</a>&gt;</div><div class=""><br class=""></div><div class=""><b class="">"Proposal: Allow Single Dollar Sign as Valid Identifier"</b></div><div class="">&lt;<a href="https://github.com/apple/swift-evolution/pull/354" class="">https://github.com/apple/swift-evolution/pull/354</a>&gt;</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Chris Lattner has said:</div><div class=""><br class=""></div><div class="">&gt; "...our current operator space (particularly the unicode segments covered) is not super well considered.&nbsp; It would be great for someone to take a more systematic pass over them to rationalize things."</div><div class=""><br class=""></div><div class="">&gt; "We need a token to be unambiguously an operator or identifier - we can have different rules for the leading and subsequent characters though."</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class=""><font size="4" class=""># Current state of affairs</font></b></div><div class=""><br class=""></div><div class="">Swift's `identifier-head` and `identifier-character` mostly conform to the recommendations in &lt;<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3146.html" class="">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3146.html</a>&gt; &nbsp;</div><div class="">&lt;<a href="https://github.com/apple/swift/blob/08e7963/lib/Parse/Lexer.cpp#L421-L489" class="">https://github.com/apple/swift/blob/08e7963/lib/Parse/Lexer.cpp#L421-L489</a>&gt;</div><div class=""><br class=""></div><div class="">The allowed operator characters include "Unicode math, symbol, arrow, dingbat, and line/box drawing chars", however I don't believe this aligns with any particular spec:</div><div class="">&lt;<a href="https://github.com/apple/swift/blob/08e7963/include/swift/AST/Identifier.h#L87-L121" class="">https://github.com/apple/swift/blob/08e7963/include/swift/AST/Identifier.h#L87-L121</a>&gt; &nbsp;</div><div class="">&lt;<a href="https://github.com/apple/swift/commit/a2341a4" class="">https://github.com/apple/swift/commit/a2341a4</a>&gt;</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class="">## Identifiers/operators elsewhere</b></div><div class=""><br class=""></div><div class="">There is an Unicode Standard Annex "identifier and pattern syntax" &lt;<a href="http://unicode.org/reports/tr31/" class="">http://unicode.org/reports/tr31/</a>&gt; which defines the categories `ID_Start`/`ID_Continue`.</div><div class=""><br class=""></div><div class="">&lt;<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AID_Continue%3A%5D" class="">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AID_Continue%3A%5D</a>&gt;</div><div class=""><br class=""></div><div class=""><b class="">### ECMAScript 2015 "ES6"</b></div><div class=""><br class=""></div><div class="">Uses `ID_Start` and `ID_Continue`, as well as `Other_ID_Start` / `Other_ID_Continue`.</div><div class="">&lt;<a href="http://www.ecma-international.org/ecma-262/6.0/#sec-names-and-keywords" class="">http://www.ecma-international.org/ecma-262/6.0/#sec-names-and-keywords</a>&gt;</div><div class=""><br class=""></div><div class=""><b class="">### Haskell</b></div><div class=""><br class=""></div><div class="">Distinguishes identifiers/operators by their general category (such as "any Unicode lowercase letter", "any Unicode symbol or punctuation", etc.). &nbsp;</div><div class="">&lt;<a href="http://www.fileformat.info/info/unicode/category/index.htm" class="">http://www.fileformat.info/info/unicode/category/index.htm</a>&gt;</div><div class=""><br class=""></div><div class="">In particular, identifiers can start with any lowercase letter or _, and may contain any letter/digit/'/_. This would seem to include letters like δ and Я, and digits like ٢.</div><div class=""><br class=""></div><div class="">&lt;<a href="https://www.haskell.org/onlinereport/syntax-iso.html" class="">https://www.haskell.org/onlinereport/syntax-iso.html</a>&gt; &nbsp;</div><div class="">&lt;<a href="https://github.com/ghc/ghc/blob/714bebff44076061d0a719c4eda2cfd213b7ac3d/compiler/parser/Lexer.x#L1949-L1973" class="">https://github.com/ghc/ghc/blob/714bebff44076061d0a719c4eda2cfd213b7ac3d/compiler/parser/Lexer.x#L1949-L1973</a>&gt;</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class=""><font size="4" class=""># Current problems</font></b></div><div class=""><br class=""></div><div class=""><b class="">## Weird identifier code points</b></div><div class=""><br class=""></div><div class="">The current `identifier-character` set contains many characters which wouldn't make good identifiers:</div><div class=""><br class=""></div><div class="">- 11 entire planes of characters (U+20000–U+2FFFD, etc.) which are currently unassigned.</div><div class="">- The middle dot · which looks like an operator.</div><div class="">- Many non-combining "modifiers" and accent marks, such as ´ and ¨ and ꓻ which don't really make sense on their own.</div><div class="">- "Tone marks" from various languages, including ˫ (similar to a box-drawing character ├ which is an operator).</div><div class="">- The "Greek question mark" ;</div><div class="">- Symbols which are simply not linguistic, such as ۞ and ༒.</div><div class=""><br class=""></div><div class="">short url: &lt;<a href="https://goo.gl/tyn0Cz" class="">https://goo.gl/tyn0Cz</a>&gt;</div><div class=""><br class=""></div><div class="">&lt;<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE20-%5CuFE2F%5D" class="">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE20-%5CuFE2F%5D</a>&gt;</div><div class=""><br class=""></div><div class=""><b class="">## Weird operator code points</b></div><div class=""><br class=""></div><div class="">The current `operator-character` set has a lot of characters that are clearly operator-esque (≈ ∈ ⊕ ⊅), but some things are not so obviously desirable:</div><div class=""><br class=""></div><div class="">- Box-drawing characters</div><div class="">- Combining accents and other characters</div><div class="">- Various symbols, e.g. ⚄ and ♄ (this category also overlaps with emoji)</div><div class="">- Braille patterns such as ⠟ — should they not be treated as letter-like (thus identifiers)?</div><div class="">- A plethora of arrows</div><div class=""><br class=""></div><div class="">short url: &lt;<a href="https://goo.gl/s136Nh" class="">https://goo.gl/s136Nh</a>&gt;</div><div class=""><br class=""></div><div class="">&lt;<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D" class="">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D</a>&gt;</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class="">## Code points which are both</b></div><div class=""><br class=""></div><div class="">A handful of characters are accepted <b class="">both</b> as `identifier-head` and `operator-head` (which seems pointless and might have been unintentional):</div><div class=""><br class=""></div><div class="">U+3021–U+3029, Suzhou numerals &nbsp;〡〢〣〤〥〦〧〨〩 &lt;<a href="https://en.wikipedia.org/wiki/Suzhou_numerals" class="">https://en.wikipedia.org/wiki/Suzhou_numerals</a>&gt; &nbsp;</div><div class=""><br class=""></div><div class="">U+302A–U+302F, ideographic &amp; hangul tone marks &nbsp; 〪 &nbsp;〫 &nbsp;〬 &nbsp;〭 &nbsp;〮 &nbsp;〯</div><div class=""><br class=""></div><div class="">&nbsp; &nbsp; let 〨 = 2</div><div class="">&nbsp; &nbsp; infix operator &lt;〨&gt;</div><div class=""><br class=""></div><div class="">(Note that `infix operator 〨` doesn't work because the lexer greedily treats this as an identifier. Also, interestingly, the corresponding ideographic zero 〇 is only an identifier char.)</div><div class=""><br class=""></div><div class="">short url: &lt;<a href="https://goo.gl/lZcMqO" class="">https://goo.gl/lZcMqO</a>&gt;</div><div class=""><br class=""></div><div class="">&lt;<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d]" class="">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d]</a>&gt;</div><div class=""><br class=""></div><div class="">In addition to the numerals and tone marks above, many (all?) <b class="">combining marks</b> are accepted as `identifier-character` and `operator-character`. These may be necessary for natural-looking words in some languages, but they don't seem necessary for operators.</div><div class=""><br class=""></div><div class="">Also present in both sets are the <b class="">variation selectors</b> 1 through 256 (U+FE00–U+FE0F, U+E0100–U+E01EF). It seems they are of limited use for the operator characters, unless you count the emoji: &lt;<a href="http://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt" class="">http://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt</a>&gt;</div><div class=""><br class=""></div><div class="">short url: &lt;<a href="https://goo.gl/VKrisf" class="">https://goo.gl/VKrisf</a>&gt;</div><div class=""><br class=""></div><div class="">&lt;<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%0d%0a%5b0-9%0d%0a%5cu0300-%5cu036F%0d%0a%5cu1DC0-%5cu1DFF%0d%0a%5cu20D0-%5cu20FF%0d%0a%5cuFE20-%5cuFE2F%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d%0d%0a%5b%5cu0300-%5cu036F%0d%0a%5cu1DC0-%5cu1DFF%0d%0a%5cu20D0-%5cu20FF%0d%0a%5cuFE00-%5cuFE0F%0d%0a%5cuFE20-%5cuFE2F%0d%0a%5cU000E0100-%5cU000E01EF%5d]" class="">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5ba-zA-Z%0d%0a_%0d%0a%5cu00A8%0d%0a%5cu00AA%0d%0a%5cu00AD%0d%0a%5cu00AF%0d%0a%5cu00B2-%5cu00B5%0d%0a%5cu00B7-%5cu00BA%0d%0a%5cu00BC-%5cu00BE%0d%0a%5cu00C0-%5cu00D6%0d%0a%5cu00D8-%5cu00F6%0d%0a%5cu00F8-%5cu00FF%0d%0a%5cu0100-%5cu02FF%0d%0a%5cu0370-%5cu167F%0d%0a%5cu1681-%5cu180D%0d%0a%5cu180F-%5cu1DBF%0d%0a%5cu1E00-%5cu1FFF%0d%0a%5cu200B-%5cu200D%0d%0a%5cu202A-%5cu202E%0d%0a%5cu203F-%5cu2040%0d%0a%5cu2054%0d%0a%5cu2060-%5cu206F%0d%0a%5cu2070-%5cu20CF%0d%0a%5cu2100-%5cu218F%0d%0a%5cu2460-%5cu24FF%0d%0a%5cu2776-%5cu2793%0d%0a%5cu2C00-%5cu2DFF%0d%0a%5cu2E80-%5cu2FFF%0d%0a%5cu3004-%5cu3007%0d%0a%5cu3021-%5cu302F%0d%0a%5cu3031-%5cu303F%0d%0a%5cu3040-%5cuD7FF%0d%0a%5cuF900-%5cuFD3D%0d%0a%5cuFD40-%5cuFDCF%0d%0a%5cuFDF0-%5cuFE1F%0d%0a%5cuFE30-%5cuFE44%0d%0a%5cuFE47-%5cuFFFD%0d%0a%5cU00010000-%5cU0001FFFD%0d%0a%5cU00020000-%5cU0002FFFD%0d%0a%5cU00030000-%5cU0003FFFD%0d%0a%5cU00040000-%5cU0004FFFD%0d%0a%5cU00050000-%5cU0005FFFD%0d%0a%5cU00060000-%5cU0006FFFD%0d%0a%5cU00070000-%5cU0007FFFD%0d%0a%5cU00080000-%5cU0008FFFD%0d%0a%5cU00090000-%5cU0009FFFD%0d%0a%5cU000A0000-%5cU000AFFFD%0d%0a%5cU000B0000-%5cU000BFFFD%0d%0a%5cU000C0000-%5cU000CFFFD%0d%0a%5cU000D0000-%5cU000DFFFD%0d%0a%5cU000E0000-%5cU000EFFFD%5d%0d%0a%5b0-9%0d%0a%5cu0300-%5cu036F%0d%0a%5cu1DC0-%5cu1DFF%0d%0a%5cu20D0-%5cu20FF%0d%0a%5cuFE20-%5cuFE2F%5d%26%5b%2f%3d%5c-%2b%21%2a%25%3C%3E%5c%26%7c%5c%5e~%3f%0d%0a%5cu00A1-%5cu00A7%0d%0a%5cu00A9%5cu00AB%0d%0a%5cu00AC%0d%0a%5cu00AE%0d%0a%5cu00B0-%5cu00B1%0d%0a%5cu00B6%0d%0a%5cu00BB%0d%0a%5cu00BF%0d%0a%5cu00D7%0d%0a%5cu00F7%0d%0a%5cu2016-%5cu2017%0d%0a%5cu2020-%5cu2027%0d%0a%5cu2030-%5cu203E%0d%0a%5cu2041-%5cu2053%0d%0a%5cu2055-%5cu205E%0d%0a%5cu2190-%5cu23FF%0d%0a%5cu2500-%5cu2775%0d%0a%5cu2794-%5cu2BFF%0d%0a%5cu2E00-%5cu2E7F%0d%0a%5cu3001-%5cu3003%0d%0a%5cu3008-%5cu3030%5d%0d%0a%5b%5cu0300-%5cu036F%0d%0a%5cu1DC0-%5cu1DFF%0d%0a%5cu20D0-%5cu20FF%0d%0a%5cuFE00-%5cuFE0F%0d%0a%5cuFE20-%5cuFE2F%0d%0a%5cU000E0100-%5cU000E01EF%5d]</a>&gt;</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class="">## Code points which should be illegal</b></div><div class=""><br class=""></div><div class="">There are several surprising non-printing characters, including:</div><div class=""><br class=""></div><div class="">- U+2064 INVISIBLE PLUS is currently an identifier</div><div class="">- U+200B ZERO WIDTH SPACE is currently an identifier</div><div class=""><br class=""></div><div class="">No good will come of these. Invisible characters should probably be disallowed (although some may be necessary for properly joining/splitting characters in some other languages).</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class="">## Categories which are split between identifiers and operators</b></div><div class=""><br class=""></div><div class="">- Emoji and symbols: most of the newer emoji are identifiers, but many emoji/pictographs are operators, especially those from "Miscellaneous Symbols". The results are hilariously illogical:</div><div class=""><br class=""></div><div class="">&nbsp; - ☹️ is an operator, but 🙂 is an identifier.</div><div class="">&nbsp; - ✌️ is an operator, but 🤘 is an identifier.</div><div class="">&nbsp; - 🔼 is an operator, but ▶️ is an identifier.</div><div class="">&nbsp; - ✳️ is an operator, but 🔯 is an identifier.</div><div class="">&nbsp; - ✈️ is an operator, but 🛩 is an identifier.</div><div class="">&nbsp; - ♠️ is an operator, but 🂡 is an identifier. (Presumably, 🂡 = A ♠️ 🂠!)</div><div class="">&nbsp;&nbsp;</div><div class="">&nbsp; (But the counterintuitive examples extend outside the emoji too: + is an operator, while ₊ and ⁺ are identifiers.)</div><div class=""><br class=""></div><div class="">- Currency symbols: ¢ £ ¤ ¥ are operators, but ₪ € ₱ ₹ ฿ and many others are identifiers, and $ is allowed in an identifier.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class="">## Missing characters</b></div><div class=""><br class=""></div><div class="">A handful of characters are neither operators nor identifiers. This list mostly makes sense (reserved characters and whitespace), but I wonder about a few which seem like they could easily be operators: ⑊ ⑀ ﹅ etc.</div><div class=""><br class=""></div><div class="">short url: &lt;<a href="https://goo.gl/U0GVNn" class="">https://goo.gl/U0GVNn</a>&gt;</div><div class=""><br class=""></div><div class="">&lt;<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%5Cu0001-%5CU0010FFFF%5D-%5B%5B%2F%3D%5C-%2B!*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE20-%5CuFE2F%5D%5D%5D" class="">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%5Cu0001-%5CU0010FFFF%5D-%5B%5B%2F%3D%5C-%2B!*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5Ba-zA-Z%0D%0A_%0D%0A%5Cu00A8%0D%0A%5Cu00AA%0D%0A%5Cu00AD%0D%0A%5Cu00AF%0D%0A%5Cu00B2-%5Cu00B5%0D%0A%5Cu00B7-%5Cu00BA%0D%0A%5Cu00BC-%5Cu00BE%0D%0A%5Cu00C0-%5Cu00D6%0D%0A%5Cu00D8-%5Cu00F6%0D%0A%5Cu00F8-%5Cu00FF%0D%0A%5Cu0100-%5Cu02FF%0D%0A%5Cu0370-%5Cu167F%0D%0A%5Cu1681-%5Cu180D%0D%0A%5Cu180F-%5Cu1DBF%0D%0A%5Cu1E00-%5Cu1FFF%0D%0A%5Cu200B-%5Cu200D%0D%0A%5Cu202A-%5Cu202E%0D%0A%5Cu203F-%5Cu2040%0D%0A%5Cu2054%0D%0A%5Cu2060-%5Cu206F%0D%0A%5Cu2070-%5Cu20CF%0D%0A%5Cu2100-%5Cu218F%0D%0A%5Cu2460-%5Cu24FF%0D%0A%5Cu2776-%5Cu2793%0D%0A%5Cu2C00-%5Cu2DFF%0D%0A%5Cu2E80-%5Cu2FFF%0D%0A%5Cu3004-%5Cu3007%0D%0A%5Cu3021-%5Cu302F%0D%0A%5Cu3031-%5Cu303F%0D%0A%5Cu3040-%5CuD7FF%0D%0A%5CuF900-%5CuFD3D%0D%0A%5CuFD40-%5CuFDCF%0D%0A%5CuFDF0-%5CuFE1F%0D%0A%5CuFE30-%5CuFE44%0D%0A%5CuFE47-%5CuFFFD%0D%0A%5CU00010000-%5CU0001FFFD%0D%0A%5CU00020000-%5CU0002FFFD%0D%0A%5CU00030000-%5CU0003FFFD%0D%0A%5CU00040000-%5CU0004FFFD%0D%0A%5CU00050000-%5CU0005FFFD%0D%0A%5CU00060000-%5CU0006FFFD%0D%0A%5CU00070000-%5CU0007FFFD%0D%0A%5CU00080000-%5CU0008FFFD%0D%0A%5CU00090000-%5CU0009FFFD%0D%0A%5CU000A0000-%5CU000AFFFD%0D%0A%5CU000B0000-%5CU000BFFFD%0D%0A%5CU000C0000-%5CU000CFFFD%0D%0A%5CU000D0000-%5CU000DFFFD%0D%0A%5CU000E0000-%5CU000EFFFD%5D%0D%0A%5B0-9%0D%0A%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE20-%5CuFE2F%5D%5D%5D</a>&gt;</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class=""><font size="4" class=""># Solutions</font></b></div><div class=""><br class=""></div><div class="">Still up for discussion — please reply to this thread!</div><div class=""><br class=""></div><div class="">Adopting (X)ID_Start/Continue for identifiers, or a simpler solution like Haskell's use of "letter" categories, might work well.</div><div class=""><br class=""></div><div class="">(I've given up hope of finding some kind of "perfect" solution — how can it be possible, when ᛏ is a letter, yet ↑ is not?)</div><div class=""><br class=""></div><div class="">Making the choice of operator characters more logical/standards-based would be nice (not just a set of ranges). However, Haskell's approach of using all punctuation &amp; symbols is probably not right for Swift:</div><div class=""><br class=""></div><div class="">short url: &lt;<a href="https://goo.gl/Ud4KqY" class="">https://goo.gl/Ud4KqY</a>&gt;</div><div class=""><br class=""></div><div class="">&lt;<a href="http://unicode.org/cldr/utility/unicodeset.jsp?a=%5B%5B-%2F%3D%2B!*%25%3C%3E%5C%26%7C%5C%5E~?%5Cu00A1-%5Cu00A7%5Cu00A9%5Cu00AB%5Cu00AC%5Cu00AE%5Cu00B0-%5Cu00B1%5Cu00B6%5Cu00BB%5Cu00BF%5Cu00D7%5Cu00F7%5Cu2016-%5Cu2017%5Cu2020-%5Cu2027%5Cu2030-%5Cu203E%5Cu2041-%5Cu2053%5Cu2055-%5Cu205E%5Cu2190-%5Cu23FF%5Cu2500-%5Cu2775%5Cu2794-%5Cu2BFF%5Cu2E00-%5Cu2E7F%5Cu3001-%5Cu3003%5Cu3008-%5Cu3030%5Cu0300-%5Cu036F%5Cu1DC0-%5Cu1DFF%5Cu20D0-%5Cu20FF%5CuFE00-%5CuFE0F%5CuFE20-%5CuFE2F%5CU000E0100-%5CU000E01EF%5D%5D&amp;b=%5B%5B:Currency_Symbol:%5D%5B:Modifier_Symbol:%5D%5B:Math_Symbol:%5D%5B:Other_Symbol:%5D%5B:Connector_Punctuation:%5D%5B:Dash_Punctuation:%5D%5B:Close_Punctuation:%5D%5B:Final_Punctuation:%5D%5B:Initial_Punctuation:%5D%5B:Other_Punctuation:%5D%5B:Open_Punctuation:%5D%5D" class="">http://unicode.org/cldr/utility/unicodeset.jsp?a=%5B%5B-%2F%3D%2B!*%25%3C%3E%5C%26%7C%5C%5E~?%5Cu00A1-%5Cu00A7%5Cu00A9%5Cu00AB%5Cu00AC%5Cu00AE%5Cu00B0-%5Cu00B1%5Cu00B6%5Cu00BB%5Cu00BF%5Cu00D7%5Cu00F7%5Cu2016-%5Cu2017%5Cu2020-%5Cu2027%5Cu2030-%5Cu203E%5Cu2041-%5Cu2053%5Cu2055-%5Cu205E%5Cu2190-%5Cu23FF%5Cu2500-%5Cu2775%5Cu2794-%5Cu2BFF%5Cu2E00-%5Cu2E7F%5Cu3001-%5Cu3003%5Cu3008-%5Cu3030%5Cu0300-%5Cu036F%5Cu1DC0-%5Cu1DFF%5Cu20D0-%5Cu20FF%5CuFE00-%5CuFE0F%5CuFE20-%5CuFE2F%5CU000E0100-%5CU000E01EF%5D%5D&amp;b=%5B%5B:Currency_Symbol:%5D%5B:Modifier_Symbol:%5D%5B:Math_Symbol:%5D%5B:Other_Symbol:%5D%5B:Connector_Punctuation:%5D%5B:Dash_Punctuation:%5D%5B:Close_Punctuation:%5D%5B:Final_Punctuation:%5D%5B:Initial_Punctuation:%5D%5B:Other_Punctuation:%5D%5B:Open_Punctuation:%5D%5D</a>&gt;</div><div class=""><br class=""></div><div class="">I'm not really sure what to do with emoji — they're a very cute novelty feature, but I don't know what the motivation is for including these as valid operators/identifiers.</div><div class=""><br class=""></div><div class="">At the least, we should try to gather them all into one of the two categories. My inclination would be to keep them as identifiers, which would mean moving the following out of the operator category:</div><div class=""><br class=""></div><div class="">short url: &lt;<a href="https://goo.gl/CBJEKX" class="">https://goo.gl/CBJEKX</a>&gt;</div><div class=""><br class=""></div><div class="">&lt;<a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AEmoji%3A%5D%26%5B%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5D%5D" class="">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5B%3AEmoji%3A%5D%26%5B%5B%2F%3D%5C-%2B%21*%25%3C%3E%5C%26%7C%5C%5E~%3F%0D%0A%5Cu00A1-%5Cu00A7%0D%0A%5Cu00A9%5Cu00AB%0D%0A%5Cu00AC%0D%0A%5Cu00AE%0D%0A%5Cu00B0-%5Cu00B1%0D%0A%5Cu00B6%0D%0A%5Cu00BB%0D%0A%5Cu00BF%0D%0A%5Cu00D7%0D%0A%5Cu00F7%0D%0A%5Cu2016-%5Cu2017%0D%0A%5Cu2020-%5Cu2027%0D%0A%5Cu2030-%5Cu203E%0D%0A%5Cu2041-%5Cu2053%0D%0A%5Cu2055-%5Cu205E%0D%0A%5Cu2190-%5Cu23FF%0D%0A%5Cu2500-%5Cu2775%0D%0A%5Cu2794-%5Cu2BFF%0D%0A%5Cu2E00-%5Cu2E7F%0D%0A%5Cu3001-%5Cu3003%0D%0A%5Cu3008-%5Cu3030%5D%0D%0A%5B%5Cu0300-%5Cu036F%0D%0A%5Cu1DC0-%5Cu1DFF%0D%0A%5Cu20D0-%5Cu20FF%0D%0A%5CuFE00-%5CuFE0F%0D%0A%5CuFE20-%5CuFE2F%0D%0A%5CU000E0100-%5CU000E01EF%5D%5D%5D</a>&gt;</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><b class=""><font size="4" class=""># Concurrently-discussable topics</font></b></div><div class=""><br class=""></div><div class="">There are a few relevant topics that came to mind, which I think are worth discussing around the same time.</div><div class=""><br class=""></div><div class=""><b class="">## Dollar signs ($)</b></div><div class=""><br class=""></div><div class="">$ is currently allowed in identifiers, but it can't begin an identifier except for the magic implicit closure params ($0, $1, ...) and LLDB/REPL-related uses.</div><div class=""><br class=""></div><div class="">It's arguable, but I feel that $ would be more effective as an operator character than an identifier character. There's precedent in Haskell for operators like `&lt;$&gt;` and being able to replicate these in Swift would be nice.</div><div class=""><br class=""></div><div class=""><b class=""><br class=""></b></div><div class=""><b class="">## Diagnostics improvements</b></div><div class=""><br class=""></div><div class="">Regardless of what ends up being the ultimate solution, it would be great to improve diagnostics for cases when the wrong types of characters are used.</div><div class=""><br class=""></div><div class="">`infix operator abc` produces `'abc' is considered to be an identifier, not an operator`. That's not too bad.</div><div class=""><br class=""></div><div class="">`let +++ = 3` produces `expected pattern`.</div><div class=""><br class=""></div><div class="">`let $foo = 3` produces `expected numeric value following '$'`.</div><div class=""><br class=""></div><div class=""><b class=""><br class=""></b></div><div class=""><b class="">## Security and сοnfuѕаbIе characters</b></div><div class=""><br class=""></div><div class="">Confusable characters (e vs. е, o vs. ο, ; vs. ;) are an issue not taken lightly in the world of web security (cf. domain names). I haven't found much information about whether this has been considered a major security issue in programming languages, but I would think so (one can imagine such characters being introduced to a codebase subtly over time, hiding malicious functionality).</div><div class=""><br class=""></div><div class="">It'd be pretty cool if Swift could detect whether two identifiers might be confusable, and produce a warning.</div><div class=""><br class=""></div><div class="">&lt;<a href="http://www.unicode.org/reports/tr36/#Recommendations_General" class="">http://www.unicode.org/reports/tr36/#Recommendations_General</a>&gt; &nbsp;</div><div class="">&lt;<a href="http://unicode.org/reports/tr39/#Confusable_Detection" class="">http://unicode.org/reports/tr39/#Confusable_Detection</a>&gt;</div></div><div class=""><br class=""></div></div></div></div>
</div>
_______________________________________________<br class="">swift-evolution mailing list<br class=""><a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a><br class="">https://lists.swift.org/mailman/listinfo/swift-evolution<br class=""></div></blockquote></div><br class=""></body></html>