[swift-evolution] Resolving identifier vs. operator debates

Wed Oct 4 18:44:12 CDT 2017

This forks from the discussion in “A path forward on rationalizing unicode identifiers and operators”, where it was suggested that this go in a new thread.

Background:

Swift partitions the character set into operator characters and identifier characters to enable efficient parsing.  This has the unfortunate side effect that the language spec must shoulder the burden of classifying thousands of Unicode characters, and it must do so universally, across all users and contexts.

There are many characters with ambiguous usage, such as the one used to denote the transpose of matrix A as Aᵀ.  The notation specifically uses a superscript T, but that character is fundamentally a Latin letter, and its Unicode code point (U+1D40, MODIFIER LETTER CAPITAL T) lives in the Phonetic Extensions block, not in a math symbols block.  In general, many symbols could refer either to an action (operator) or to the result of that action (identifier), or could have disparate domain-specific meanings.  Should the language spec really be in the business of deciding the ‘right’ use of each character like this?
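For reference, here is how the current partition treats the transpose example. This is a minimal sketch; the `Matrix2` type is purely illustrative. Because U+1D40 falls within Swift's documented identifier-character ranges, `Aᵀ` lexes as a single identifier, and `ᵀ` cannot be declared as a postfix operator at all.

```swift
// A minimal 2x2 matrix, just large enough to show the naming issue.
struct Matrix2 {
    var a, b, c, d: Double  // row-major: [[a, b], [c, d]]

    func transposed() -> Matrix2 {
        return Matrix2(a: a, b: c, c: b, d: d)
    }
}

let A = Matrix2(a: 1, b: 2, c: 3, d: 4)

// U+1D40 (ᵀ) is an identifier character today, so `Aᵀ` is one
// identifier, not `A` followed by a postfix operator.
let Aᵀ = A.transposed()

// By contrast, this declaration is rejected, because ᵀ is not an
// operator character:
//     postfix operator ᵀ

print(Aᵀ.b, Aᵀ.c)  // 3.0 2.0
```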

I also assert that much of the bad reputation of custom operators comes from languages with limited operator character sets, which force developers to overload standard operators with surprising effects instead of choosing a symbol that is both unique and better recognized for the task at hand.  Allowing developers to choose apt operator symbols is akin to encouraging descriptive identifiers.  Writing good code is all about making these choices appropriately, and that requires context, which only the end developer has.
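As an illustration of choosing an apt symbol instead of overloading `*`, the sketch below declares `⊗` (U+2297, already an operator character in Swift) for a Kronecker product. The `Matrix` type and its row-major layout are assumptions made for this example, not part of any existing library.

```swift
// A dedicated symbol, so `*` keeps its usual meaning for matrices.
infix operator ⊗: MultiplicationPrecedence

struct Matrix {
    let rows: Int, cols: Int
    var data: [Double]  // row-major storage

    subscript(i: Int, j: Int) -> Double {
        return data[i * cols + j]
    }

    /// Kronecker product: each entry of `lhs` scales a full copy of `rhs`.
    static func ⊗ (lhs: Matrix, rhs: Matrix) -> Matrix {
        let rows = lhs.rows * rhs.rows
        let cols = lhs.cols * rhs.cols
        var out = [Double](repeating: 0, count: rows * cols)
        for i in 0..<lhs.rows {
            for j in 0..<lhs.cols {
                for k in 0..<rhs.rows {
                    for l in 0..<rhs.cols {
                        let r = i * rhs.rows + k
                        let c = j * rhs.cols + l
                        out[r * cols + c] = lhs[i, j] * rhs[k, l]
                    }
                }
            }
        }
        return Matrix(rows: rows, cols: cols, data: out)
    }
}

let I2 = Matrix(rows: 2, cols: 2, data: [1, 0, 0, 1])
let B = Matrix(rows: 1, cols: 2, data: [5, 7])
let K = I2 ⊗ B  // 2x4 block-diagonal result
print(K.data)   // [5.0, 7.0, 0.0, 0.0, 0.0, 0.0, 5.0, 7.0]
```

A reader who sees `⊗` knows immediately that this is not ordinary matrix multiplication, which is the point of choosing a distinct symbol.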

To be clear, this will most likely be relegated to niche applications serving domain experts.  As described below, the default behavior is to opt out of exotic operator choices.  But for a user who wishes to opt in, it is better to give them the right tools for the purpose.

Goals:

1. Performance: file-local operator decisions (don’t require loading all the imports first)
2. Maintenance: improve operator auditing/discoverability
3. Functionality: let users write what they want without lobbying this list
4. Well defined: aid in resolving conflicts between modules

Pitch:

Enable users to ‘import’ specific operator symbols on a per-file basis, updating the operator set used for parsing that file.

In the simplest form this would look like:
	import operator ᵀ

This is only needed for “non-standard” operators.  But by providing this escape hatch, we can conservatively limit the “standard” operators to a smaller, well-known set and avoid a lot of debate without sacrificing expressiveness.
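A rough model of how the per-file decision could work, assuming a hypothetical lexer that consults only a built-in standard set plus the file's own `import operator` lines. `CharClass`, `FileOperatorContext`, and the contents of its `standard` set are made up for this sketch; the point is only that classification needs nothing beyond the current file.

```swift
enum CharClass { case identifier, op }

struct FileOperatorContext {
    // Illustrative stand-in for the "standard" operator characters.
    static let standard: Set<Unicode.Scalar> = [
        "+", "-", "*", "/", "=", "<", ">", "!", "&", "|", "^", "~", "%", "?",
    ]

    // Non-standard operator characters this file has opted into.
    var imported: Set<Unicode.Scalar> = []

    // Records a directive of the (hypothetical) form `import operator X`.
    mutating func process(directive: String) {
        let prefix = "import operator "
        guard directive.hasPrefix(prefix) else { return }
        for scalar in directive.dropFirst(prefix.count).unicodeScalars where scalar != " " {
            imported.insert(scalar)
        }
    }

    // Decides a character's class using only file-local information.
    func classify(_ scalar: Unicode.Scalar) -> CharClass {
        if FileOperatorContext.standard.contains(scalar) || imported.contains(scalar) {
            return .op
        }
        return .identifier
    }
}

var ctx = FileOperatorContext()
print(ctx.classify("ᵀ"))  // identifier
ctx.process(directive: "import operator ᵀ")
print(ctx.classify("ᵀ"))  // op
```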

When this import is encountered, any matching operator declarations become available simply because the character is now interpreted as an operator.  (That is, all modules’ operators are loaded as normal, but the compiler can only make the connection in files that opt in to interpreting that character as an operator.)  Conversely, conflicting identifiers from other modules become inaccessible following such an import; hopefully, good APIs would supply less exotic alternative interfaces for both cases.  In the worst case, the user could write an extension in a new file with the complementary character choice and remap the offending operators/identifiers as they see fit.
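The resolution rule can be sketched the same way. In this hypothetical model (`Decl`, `visibleDecls`, and the module `notation` are illustrative, not compiler APIs), every module's declarations are loaded, and the file's operator set decides which of them are reachable: an operator needs all of its characters to be operators in this file, while an identifier is hidden as soon as any of its characters is.

```swift
enum DeclKind { case op, identifier }

struct Decl {
    let module: String
    let name: String
    let kind: DeclKind
}

/// Declarations reachable from a file whose operator characters are `fileOperators`.
func visibleDecls(_ all: [Decl], fileOperators: Set<Unicode.Scalar>) -> [Decl] {
    return all.filter { decl in
        let scalars = decl.name.unicodeScalars
        switch decl.kind {
        case .op:
            // The file must lex every character of the name as an operator.
            return scalars.allSatisfy { fileOperators.contains($0) }
        case .identifier:
            // A single operator character splits the name, hiding the identifier.
            return !scalars.contains(where: { fileOperators.contains($0) })
        }
    }
}

let decls = [
    Decl(module: "matrixlib", name: "ᵀ", kind: .op),          // postfix transpose
    Decl(module: "notation", name: "xᵀ", kind: .identifier),  // hypothetical module
]
print(visibleDecls(decls, fileOperators: ["+", "*"]).map { $0.module })       // ["notation"]
print(visibleDecls(decls, fileOperators: ["+", "*", "ᵀ"]).map { $0.module })  // ["matrixlib"]
```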

Regarding operator declarations, one could suggest that the declaration itself should update the operator character set for that file.  However, I suggest always requiring the import operator statement (for non-standard operators), partly to make it apparent when a choice of operator will require explicit imports in other files.  This also reduces the potential for obfuscation by operators with visually similar representations, as the import list would draw attention to such chicanery.
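A possible audit pass, sketched under the assumption that the compiler knows the operator names declared in a file and its `import operator` list. `unimportedOperatorChars` is a hypothetical helper; it reports each non-standard character that is declared but not imported, which is also where a look-alike such as U+2212 MINUS SIGN next to ASCII `-` would surface.

```swift
/// Hypothetical audit: non-standard characters used in declared operator
/// names that this file has not listed in `import operator`.
func unimportedOperatorChars(declared: [String],
                             imported: Set<Unicode.Scalar>,
                             standard: Set<Unicode.Scalar>) -> [Unicode.Scalar] {
    var missing: [Unicode.Scalar] = []
    for name in declared {
        for scalar in name.unicodeScalars
        where !standard.contains(scalar) && !imported.contains(scalar) && !missing.contains(scalar) {
            missing.append(scalar)
        }
    }
    return missing
}

let standard: Set<Unicode.Scalar> = ["+", "-", "*", "/", "<", ">", "="]
// "\u{2212}" is MINUS SIGN, which looks almost identical to ASCII "-".
let report = unimportedOperatorChars(declared: ["<+>", "⊗", "\u{2212}"],
                                     imported: ["⊗"],
                                     standard: standard)
for scalar in report {
    let code = String(scalar.value, radix: 16, uppercase: true)
    print("operator character U+\(code) needs `import operator`")
}
// prints: operator character U+2212 needs `import operator`
```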

Advanced Pitch:

The above provides the “minimum viable product”, but we might like to take this a little further and make it module-specific:
	import matrixlib (operators: [ᵀ,·,⊗])

Again, only “non-standard” operators need to be listed; the “standard” operators would be imported the same way as today.  But now, as readers, we can see where special operators come from, and we could potentially filter competing declarations from different modules.  I also like that an operator family can be listed on a single line rather than across potentially a dozen lines covering various combinations.  A module vendor can concisely document its operator list and make it easy to maintain and discover.
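A sketch of how module-specific lists might filter competing declarations of a non-standard operator, modeling `import matrixlib (operators: [ᵀ,·,⊗])` as plain data. `OperatorDecl`, `ModuleImport`, `resolve`, and the second module `tensorlib` are hypothetical; only a module whose import list names the symbol contributes declarations for it.

```swift
struct OperatorDecl {
    let module: String
    let symbol: String
}

// Models `import <module> (operators: [...])`, the proposed syntax.
struct ModuleImport {
    let module: String
    let operators: Set<String>
}

/// Declarations of `symbol` this file can see: only modules whose
/// import lists the symbol contribute.
func resolve(_ symbol: String, among decls: [OperatorDecl], imports: [ModuleImport]) -> [OperatorDecl] {
    let sources = Set(imports.filter { $0.operators.contains(symbol) }.map { $0.module })
    return decls.filter { $0.symbol == symbol && sources.contains($0.module) }
}

let operatorDecls = [
    OperatorDecl(module: "matrixlib", symbol: "⊗"),
    OperatorDecl(module: "tensorlib", symbol: "⊗"),  // hypothetical competing module
]
let moduleImports = [
    ModuleImport(module: "matrixlib", operators: ["ᵀ", "·", "⊗"]),
    ModuleImport(module: "tensorlib", operators: []),
]
print(resolve("⊗", among: operatorDecls, imports: moduleImports).map { $0.module })  // ["matrixlib"]
```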

This syntax mimics a module “init” call, which could be a powerful concept for future extensions.  For example, we could introduce “standardOperators: false” to disable the automatic import of standard operator overloads, which some users might appreciate regardless of character set issues.  (e.g. users could select between conflicting standard operator overloads in different modules, or simply have peace of mind that there are no surprises.)
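The `standardOperators: false` idea could be modeled as a per-import flag. In this hypothetical sketch (`ImportOptions`, `providers`, `vectorlib`, and the contents of the `standard` set are all assumptions), a module imported with the flag off contributes no overloads of standard operators, so a file can pick a single provider for `+`.

```swift
// Models an import carrying the proposed `standardOperators:` option.
struct ImportOptions {
    let module: String
    var standardOperators: Bool = true
}

/// Modules whose overloads of `symbol` stay visible in this file.
func providers(of symbol: String,
               declaredBy overloads: [String: Set<String>],
               imports: [ImportOptions]) -> [String] {
    // Illustrative stand-in for the "standard" operator set.
    let standard: Set<String> = ["+", "-", "*", "/", "==", "<"]
    return imports.compactMap { (imp: ImportOptions) -> String? in
        guard let declared = overloads[imp.module], declared.contains(symbol) else { return nil }
        // With `standardOperators: false`, the module's standard overloads are skipped.
        if standard.contains(symbol) && !imp.standardOperators { return nil }
        return imp.module
    }
}

let declaredOverloads: [String: Set<String>] = [
    "matrixlib": ["+", "*", "⊗"],
    "vectorlib": ["+", "*"],  // hypothetical second module
]
let fileImports = [
    ImportOptions(module: "matrixlib"),
    ImportOptions(module: "vectorlib", standardOperators: false),
]
print(providers(of: "+", declaredBy: declaredOverloads, imports: fileImports))  // ["matrixlib"]
```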

I anticipate this form would take a bit more work to implement, as Swift would need to filter the visibility of operators per module based on the declarations in the current file.  However, the two versions can work together: the first form provides a global import across modules, and the module-specific form can be added later.

What do people think?

Thanks,
 -Ethan