<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""></div><div class="">The core team recently met to discuss PR609 - Refining identifier and operator symbology:</div><div class=""><a href="https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md" class="">https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md</a></div><div class=""><br class=""></div><div class="">The proposal correctly observes that the partitioning of Unicode code points into identifiers and operators is a mess in some cases. It really is an outright bug for 🙂 to be an identifier, but ☹️ to be an operator. That said, the proposal itself is complicated and is defined in terms of a bunch of Unicode classes that may evolve in the “wrong way for Swift” in the future.</div><div class=""><br class=""></div><div class="">The core team would really like to get this sorted out for Swift 5, and sooner is better than later :-). Because it seems that this is a really hard problem and that <a href="https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good" class="">perfection is becoming the enemy of good</a>, the core team requests the creation of a new proposal with a different approach. The general observation is that there are three kinds of characters: things that are obviously identifiers, things that are obviously math operators, and things that are non-obvious. 
Things that are non-obvious can be made into invalid code points, and legislated later in follow-up proposals if/when someone cares to argue for them.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">To make progress on this, we suggest a few separable steps:</div><div class=""><br class=""></div><div class=""><div class="">First, please split out the changes to the ASCII characters (e.g. the . and \ operator parsing rules) into their own (small) proposal, since they are unrelated to the Unicode changes and that proposal can make progress independently.</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Second, someone should take a look at the concrete set of Unicode identifiers that are accepted by Swift 4 and write a new proposal that splits them into the three groups: those that are clearly identifiers (which become identifiers), those that are clearly operators (which become operators), and those that are unclear or don’t matter (these become invalid code points).</div><div class=""><br class=""></div><div class="">I suggest that the criteria be based on <b class="">utility for Swift code</b>, not on the underlying Unicode classification. For example, the discussion thread for PR609 mentions that the T character in “xᵀ” is defined in Unicode as a Latin “letter”. Despite that, its use in Swift would clearly be as a postfix operator, so we should classify it as an operator.</div><div class=""><br class=""></div><div class="">Other suggestions:</div><div class=""> - Math symbols are operators, excepting those primarily used as identifiers like “alpha”. If there are any characters that are used for both, this proposal should make them invalid.</div><div class=""> - While there may be useful ranges for some identifiers (e.g. 
to handle European accented characters), the Emoji range should probably have each code point independently judged, and currently unassigned code points should not get a meaning defined for them.</div><div class=""> - Unicode “faces”, “people”, “animals”, etc. are all identifiers.</div><div class=""> - In order to reduce the scope of the proposal, it is a safe default to exclude characters that are unlikely to be used by Swift code today, including Braille, weird currency symbols, or any set of characters that are so broken and useless in Swift 4 that it isn’t worth worrying about.</div><div class=""> - The proposal is likely to turn a large number of code points into rejected characters. In the discussions, some people will be tempted to argue endlessly about individual rejections. To control that, we can require that people point out an example where the character is already in use, or where it has a clear application to a domain that is known today: the discussion needs to be grounded and practical, not theoretical.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Third, if there is interest sometime in the future, we can have subsequent proposals that expand the range of accepted code points, motivated by the specific application domain that cares about them. These proposals will not be source breaking, so they can happen at any time.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Is anyone interested in helping to push this effort forward?</div><div class=""><br class=""></div><div class="">-Chris</div><div class=""><br class=""></div></body></html>