[swift-evolution] A path forward on rationalizing unicode identifiers and operators

Tue Oct 3 19:47:51 CDT 2017

> On Oct 2, 2017, at 10:07 PM, Chris Lattner via swift-evolution <swift-evolution at swift.org> wrote:
> 
> On Oct 2, 2017, at 9:12 PM, David Sweeris via swift-evolution <swift-evolution at swift.org> wrote:
>>> Keep in mind that Swift already goes far above and beyond in terms of operators
>> Yep, that's a large part of why I'm such a Swift fan :-D
> 
> Fortunately, no one is seriously proposing a major curtailing of the capabilities here, we’re just trying to rationalize the operator set, which is a bit of a mess at present.
> 
>>> in that: (a) it allows overloading of almost all standard operators; (b) it permits the definition of effectively an infinite number of custom operators using characters found in standard operators; (c) it permits the definition of custom precedences for custom operators; and (d) it additionally permits the use of a wide number of Unicode characters for custom operators. Most systems programming languages don't even allow (a), let alone (b) or (c). Even dramatically curtailing (d) leaves Swift with an unusually expansive support for custom operators.
> 
>> Yes, but many of those custom operators won't have a clear meaning because operators are rarely limited to pre-existing symbols like "++++++++" (which doesn't mean anything at all AFAIK), so operators that are widely known within some field probably won't be widely known to the general public, which, IIUC, seems to be your standard for inclusion(?). Please let me know if that's not your position... I hate being misunderstood probably more than the next person, and I wouldn't want to be guilty of that myself.
> 
> The approach to operator handling in Swift is very intentional.  IMO, it is well known that:
> 
> 1) Operators can make code significantly easier to understand by reducing noise from complex expressions: writing x.matmul(y) is insane <https://www.python.org/dev/peps/pep-0465/> if you’re doing a lot of matrix multiplies.
> 2) Operators can be completely opaque to someone who doesn’t know them, and sometimes named functions are more clear.
> 3) Named functions can also sometimes be completely opaque if you don't know them, e.g. "let x = cholesky(y)"
> 4) Languages with fixed operator sets that also allow overloading (e.g. C++) end up with those operators being abused.
> 5) Some code can only be written and maintained by domain experts, and those experts often know the operators.

Well said!

I think comments about poorly chosen operator symbols (e.g. invisible or visually similar characters) are a bit of a red herring.  From a malicious angle, an attacker would rather overload a standard operator than introduce an exotic one, which would draw more attention and has no pre-existing usage.  From a maintenance angle, choosing a poor operator symbol is akin to choosing a poorly named identifier.  That’s really for users to figure out themselves; we shouldn’t try to legislate the equivalent of “no single letter variables”.

> Swift’s approach is basically to say to users: “ok we allow overloaded operators, but at least if you encounter some operation that you don’t know… you know that you don’t know it”.  If you encounter "if ¬x {“  or “a ∩ b” in some source code, at least you can command click, jump to the definition and read what it does: you aren’t misled into thinking that the expression is some familiar thing, but find out later it was overloaded to do something crazy (bitshifts for i/o?  really??? :).

Exactly!  If someone has already decided they want an operator for something, better to let them choose a new symbol than to force them to overload one of the standard ones because we’ve restricted the set.  I think most of the bad reputation of custom operators comes from the surprising results when developers are forced to shoehorn the “standard” operators into new roles, confusing readers who think they know what an operator is doing.  I.e. it’s not the custom operator that’s dangerous so much as the overloading.
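
A rough sketch of that distinction in current Swift.  The `&` overload on `Set` and the choice of `MultiplicationPrecedence` for `∩` are illustrative assumptions, not anything the standard library provides:

```swift
// Option 1: shoehorn a standard operator into a new role.
// A reader who knows `&` as bitwise AND has no hint that it was repurposed.
func & <T: Hashable>(lhs: Set<T>, rhs: Set<T>) -> Set<T> {
    return lhs.intersection(rhs)
}

// Option 2: declare a distinct symbol.
// A reader who doesn't know `∩` at least knows to look up its definition.
infix operator ∩: MultiplicationPrecedence

func ∩ <T: Hashable>(lhs: Set<T>, rhs: Set<T>) -> Set<T> {
    return lhs.intersection(rhs)
}

let a: Set = [1, 2, 3]
let b: Set = [2, 3, 4]

let viaOverload = a & b              // looks like a bitwise operation
let viaCustom = a ∩ b                // clearly something set-specific
let viaName = a.intersection(b)      // the named function is always available
```

All three spellings compute the same set; the difference is only in what a reader is likely to assume before looking at the definition.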

> Set algebra is an illustrative example, because it is both used by people who are experts and people who are not.  As far as policies go, I think it makes sense for Swift libraries to define operator-like things as named functions (e.g. “intersection") and also define operators (“∩”) which can optionally be used in source bases that want them for convenience.  The compiler and language cannot know whether a code base is written and maintained by experts who know the symbols and who value their clarity (over the difficulty typing and recognizing them), and this approach allows maintainers of the codebase to pick their own policies.
> 
> I do think that Ethan’s suggestion upthread is interesting, which suggests considering something like:
>    import matrixlib (operators: [ᵀ,·,⊗])
> 
> Three concerns I see:
>  - Requiring them today would be a source incompatibility with Swift 4

To clarify, I’m only suggesting the qualifier be required for “non-standard” operators, so the source incompatibility would be on par with that of whatever Unicode cleanup reclassifies characters already in use.

In that vein, this suggestion would dovetail well with such a reclassification effort: it would give an easy upgrade path for existing code that wants to continue using a particular character, and it would allow a fairly conservative set of “standard” operators to be whitelisted without sacrificing end-user expressibility, which narrows the scope of the classification effort.

“Standard” operators could include sections of the mathematical plane even though they aren’t necessarily used by the standard library, if there is desire to reserve such characters exclusively for operators and never identifiers.  

>  - Multiple modules can define operators, unclear whether this refers to the operator decl or implementations of operators.

Hmm, how are conflicting operator declarations handled today? (e.g. different precedence, associativity for the same fixity?)

My thinking is to import all declarations of that operator from a specified module (so if the declaration isn’t imported, the implementations are hidden too).  You would have to specifically import the operator from each module that provides it.  If the user imports conflicting declarations, the result is just the same as today.

And by “all declarations of that operator” I mean that if we have a matrix library that defines ᵀ for combinations of matrix, vector, sparse matrix, etc., then the single “import matrixlib (operator: ᵀ)” statement makes all of those available, since we should expect the module to give a consistent interpretation of that operator.  In technical terms, this imports all declarations regardless of fixity; I’m not sure it’s worth getting more granular, e.g. importing just the prefix form but not the infix one.

Conversely, if the operator isn’t imported, then it’s as if those declarations were all internal to the module, and avoids any conflicts.

So if module A declares an operator ¬ and another module B uses that character as an identifier, then the client resolves this at import: either import ¬ from A and lose access to the identifier in B, or ignore the operator from A and retain access to the identifier in B.  (Hopefully rational symbol choices would make this a rare situation on par with other global namespace collisions, and good modules should provide less exotic interface fallbacks as well.)
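
The resolution rule above can be modeled in a few lines of ordinary Swift.  This is only a toy model of the idea: `Module`, `Role`, and `resolve` are hypothetical names invented for illustration, identifiers are simplified to single characters, and nothing here reflects how the compiler actually represents imports:

```swift
// Hypothetical model of the import rule; none of these types exist in the compiler.
struct Module {
    let name: String
    let operators: Set<Character>    // characters the module declares as operators
    let identifiers: Set<Character>  // characters the module uses as identifiers
}

enum Role: Equatable {
    case operatorSymbol(from: String)
    case identifier(from: String)
    case unavailable
}

// `importedOperators` maps a module name to the operators the client
// explicitly requested from that module.
func resolve(_ symbol: Character,
             in modules: [Module],
             importedOperators: [String: Set<Character>]) -> Role {
    // An explicitly imported operator claims the character.
    if let m = modules.first(where: {
        $0.operators.contains(symbol) && (importedOperators[$0.name] ?? []).contains(symbol)
    }) {
        return .operatorSymbol(from: m.name)
    }
    // Otherwise the operator stays hidden and any identifier use remains visible.
    if let m = modules.first(where: { $0.identifiers.contains(symbol) }) {
        return .identifier(from: m.name)
    }
    return .unavailable
}

let moduleA = Module(name: "A", operators: ["¬"], identifiers: [])
let moduleB = Module(name: "B", operators: [], identifiers: ["¬"])

// Client imports ¬ from A: the character is an operator, B's identifier is lost.
let withImport = resolve("¬", in: [moduleA, moduleB], importedOperators: ["A": ["¬"]])

// Client ignores A's operator: B's identifier stays accessible.
let withoutImport = resolve("¬", in: [moduleA, moduleB], importedOperators: [:])
```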

>  - Imports are per-module, not per-source-file, so this couldn’t be used to “user-partition” the identifier and operator space.  It could be a way to make it clear that the user is opting into these explicitly.

Ahh nuts I actually thought imports were per-source-file! 🤦🏻‍♂️

So I guess an intra-module dependency for building the identifier/operator set is still too much of a performance hit?  Isn’t parsing already collecting all the imports from across the current module?

Well, regardless, I’d be willing to live with repeating a per-file import statement for operator specification.  It’s a little quirky that the operator attribute only has file-level scope, but clearly I don’t mind respecifying imports in each file anyway (I kind of feel this is good form, so you can move source files around and the dependencies come along).

Alternatively, we could make a new per-file import specific for operators, orthogonal to module imports, although using similar syntax:
	import operator ᵀ
	import operator ·
	import operator ⊗

I thought about just using “import ᵀ”, but I don’t want to risk confusion with a module name.  It might be nice to pass a collection, but since we don’t do that with module imports, there’s no reason to start now.

These would be applied similarly to the previous proposal, but would toggle operator visibility globally.  This basically just controls the operator character set and nothing more.  So the implementation should be really simple: all imported operator declarations are already loaded as normal, but the compiler can only make the connection if the character was listed as an operator in the current file.  Initially I wanted an operator declaration in the current file to also update the character set so you wouldn’t need both, but I see an argument for always requiring the import (for non-standard operators), just to surface guidance about when an import will be needed to access that operator from elsewhere.
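
As a minimal sketch of that per-file check, written as plain Swift rather than compiler code: the `SourceFile` type and `treatsAsOperator` method are hypothetical, and the contents of `standardOperators` are an arbitrary placeholder for whatever “standard” set a reclassification would settle on:

```swift
// Placeholder for the whitelisted "standard" operator characters (an assumption).
let standardOperators: Set<Character> = ["+", "-", "*", "/", "%", "<", ">", "=", "!", "&", "|", "^", "~"]

// Hypothetical per-file state: the characters named by `import operator` lines.
struct SourceFile {
    var importedOperators: Set<Character> = []

    // A character is lexed as an operator only if it is standard
    // or was explicitly imported in this file.
    func treatsAsOperator(_ c: Character) -> Bool {
        return standardOperators.contains(c) || importedOperators.contains(c)
    }
}

var file = SourceFile()
let before = file.treatsAsOperator("⊗")   // not yet imported in this file
file.importedOperators.insert("⊗")        // models `import operator ⊗`
let after = file.treatsAsOperator("⊗")    // now visible as an operator
```

The point of the sketch is that the check is a set lookup per file; the operator declarations themselves would still be loaded through the normal module imports.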

Does that help?  I liked having per-module control for conflict resolution and also auditing where operators come from, but (naively) this seems like a really simple implementation and if there is demand we could still add a syntax for module-specific filtering later.

-Ethan
