[swift-evolution] [Review] SE-0155: Normalize Enum Case Representation

Sun Apr 2 18:30:39 CDT 2017

Daniel Duan

> On Apr 1, 2017, at 11:49 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
> 
>> On Sun, Apr 2, 2017 at 1:03 AM, Daniel Duan <daniel at duan.org> wrote:
>> 
>>> On Apr 1, 2017, at 2:54 PM, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>>> On Sat, Apr 1, 2017 at 3:38 PM, Daniel Duan <daniel at duan.org> wrote:
>>>> Thanks again for a detailed review. I have a few comments inline.
>>>> 
>>>>>> On Apr 1, 2017, at 9:50 AM, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>> 
>>>>>> 	• Does this proposal fit well with the feel and direction of Swift?
>>>>> 
>>>>> The "Pattern consistency" section does not align well with the feel and direction of Swift. Specifically, it does not explore some of the difficulties that arise from the proposed rules, adopts some of the same shortcomings that required revision for SE-0111, and deviates from some of the anticipated fixes for those shortcomings outlined in the core team's "update and commentary" to SE-0111.
>>>>> 
>>>>> It is not the case that the design proposed is "a consequence of no longer relying on tuple patterns," in that it is not the inevitable result that falls out of that decision. 
>>>> 
>>>> The text in this revision may be poorly phrased. The connection, as I pointed out in a previous thread, is that we need to define syntax for enum pattern matching because the one we’ve been using in Swift 3 is the tuple pattern’s syntax, which is now distinct and separate.
>>> 
>>> What I'm saying here is that, although _some_ change becomes necessary, the particular changes proposed here are not themselves "a consequence of no longer relying on tuple patterns."
>>> 
>>> Put another way, given `enum E { case foo(bar: Int, baz: Int) }`, not being allowed to write `switch e { case foo(let a, let b): break }` is *not* an inevitable consequence of moving away from tuple patterns. Since the particular proposed changes break more existing source code than is strictly necessary for moving away from tuple-based pattern matching, those choices require stringent justification.
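For readers following along, the two spellings under discussion can be sketched as below. This is only an illustration, not text from the proposal: the function names `positionalSum` and `labeledSum` are made up, and the snippet shows what a present-day Swift compiler accepts, which is not necessarily what any revision of SE-0155 would mandate.

```swift
enum E {
    case foo(bar: Int, baz: Int)
}

// Positional binding with arbitrary local names (the Swift 3 style
// that the stricter rules would stop accepting).
func positionalSum(_ e: E) -> Int {
    switch e {
    case .foo(let a, let b):
        return a + b
    }
}

// Labels written out in the pattern, local names still freely chosen.
func labeledSum(_ e: E) -> Int {
    switch e {
    case .foo(bar: let a, baz: let b):
        return a + b
    }
}
```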
>>> 
>>>>> I will detail the alternative design that requires the fewest deviations or special rules, and breaks the least code extant today, later on. First, the shortcomings:
>>>>> 
>>>>> 1.
>>>>> The proposed rules for pattern matching are a source-breaking change, and are *not* the most minimal such change given the abandoning of tuples (see alternative below). However, the proposal does not engage with the core team's Swift 4 criteria for source-breaking changes with respect to the proposed "stricter rules" for pattern matching. There is no text at all about why specifically having the compiler encourage local _variable_ names to match argument labels resolves an active harm that outweighs the goal of preserving the greatest possible source compatibility.
>>>> 
>>>> With this proposal, users can still use local variable names. It is true that if there are many ways to achieve the same thing, the compiler would be encouraging users to do that thing. But that puts a cost on the compiler, new users and experienced readers in unfamiliar codebases. This is (albeit not to a satisfactory degree, it seems) pointed out in the motivation section.
>>>> 
>>>> As for source compatibility, Swift 3 code should continue to work with warnings. Swift 4 mode would issue errors along with fix-its, which the migrator can leverage. Depending on the implementation resources of the core team and community, there’s even a chance that this change would roll out one version later (warning in 4.X, error in 5.Y). In theory, the migration hurdle can be minimized.
>>> 
>>> Many syntactic changes can be migrated in this way, but for Swift 4, that would only be justified when the existing syntax meets a high bar for being harmful. Again, the overarching theme of my response is that I don't think the proposed "stricter rules" offer much more harm mitigation than significantly less source-breaking designs for pattern matching, and I don't see anything in the proposal text that discusses the issue or justifies the particular design over less source-breaking alternatives.
>>> 
>>>>> 
>>>>> OTOH, the proposal does outline a major use case for a local variable name that does not match the argument label: `param` vs `parameter`. Widely-respected style guides in various languages encourage unabbreviated and descriptive API names but much more concise local variable names. This is a legitimate and good practice being actively discouraged by the sugared rules.
>>>> 
>>>> This is not a counterpoint, but I personally think using shortened names is not something to be encouraged. An (admittedly quirky) practice some of us inherited from the Cocoa style guideline is to use real, complete words for variable names. I’d like to think that the Swift API Design Guidelines are aligned in spirit on this matter: “clarity is more important than brevity”. (Incidentally, the guidelines’ code samples don’t contain partial-word variables anywhere.)
>>> 
>>> We're talking _local_ variables: local variables aren't API. There are many, many examples of single-letter variables in the design guidelines. For example, `x = y.union(z)` has three of them.
>>> 
>>>> 
>>>>> 
>>>>> This would be merely annoying and not harmful if we could guarantee that it only means the API user will have to use longer local names, but the natural impulse on the part of thoughtful API authors would be to limit the expressiveness of their labels to help out their users.
>>>>> 
>>>>> This puts API authors in an impossible bind: they need to choose labels that are not too short lest they collide frequently with existing local variable names (`x` and `y` would be suboptimal, for example, but there are good reasons why an associated value might have arguments labeled `x` and `y`), 
>>>> 
>>>> API authors are already in this impossible bind: whenever they export a type name, or a method signature in an open class or a protocol, the risk of collision comes up.
>>> 
>>> Again, local variables aren't API. API authors have never been in this bind with respect to local variables. Nothing in the language has ever caused API to restrict the consumer's choice of local variable names. I think this is a highly, highly unusual rule.
>>>  
>> 
>> The local variable would be the same as the argument label, which is API, correct? I’m saying users’ local variables and types can collide with symbols from APIs. To illustrate, imagine implementing a protocol (yes, as simple as that):
>> 
>> protocol A {
>>     var answer: Int { get }
>>     func ask(_ question: String)
>> }
>> 
>> if blah() {
>>     let answer = 0
>>     let question = "huh?"
>> 
>>     class B: A {
>>         let answer = 42
>>         func ask(_ question: String) {
>>             // what's question and answer here?
>>             // what if you want to define a new type here with the name “A”?
>>         }
>>     }
>> }
>> 
>> `A` forced the user to shadow their local variable (a collision!), so it’s wise for the user to pick some other variable name here. Why does this seem so trivial and natural? Because it’s how API works: someone defines some symbols, you take them into your local scope and use them. The pattern matching rule proposed here is no different.
>> 
>>>> When a local variable does collide with a payload label, it would be bad if the user accidentally used the variable _instead of_ the actual payload value. Forcing users to proactively rebind the variable would make them more mindful of this type of mistake.
>>> 
>>> What mistake do you have in mind? Currently, labels have nothing to do with variable names. How does a user accidentally use a label name instead of a variable name?
>> 
>> Looking at the definition of an enum, the user sees something like
>> 
>> enum SomeEnum {
>>     case aCase(veryMundaneName: Type) // substitute “veryMundaneName” with a common label, like “value” or “account”
>> }
>> 
>> What’s the value of `veryMundaneName` in a pattern-matched block for `aCase`? The answer in Swift 3 is: no one knows! The user may use this variable expecting it to be bound to the associated value, because that’s natural given the context, and later find out that they’ve been using a variable from outside because the associated value is bound to something completely unrelated.
>> 
>> 
>> Example: 
>> 
>> switch enumValue {
>> case aCase….:
>>   // many lines of code later…
>>   doThings(with: veryMundaneName) // bug!
>> }
>> 
>> Turns out, the bug is due to
>> 
>> let veryMundaneName: AType = getAMundaneValue()
>> // many lines of code later
>> switch enumValue {
>> case aCase(let randomLabelFreedomYay):
>>   // many lines of code later
>>   doThings(with: veryMundaneName) // bug!
>> }
>> 
>> This mistake seems silly, and is still a problem in the case of rebinding. But we can make it happen less.
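A compilable reduction of the scenario above, offered as a sketch only. The label `value`, the function `describe`, and the `-1` placeholder are hypothetical. The payload and the outer constant are deliberately given the same type, so the type checker has no way to flag the mix-up.

```swift
enum SomeEnum {
    case aCase(value: Int)
}

func describe(_ e: SomeEnum) -> Int {
    // An unrelated outer constant that happens to share the label's name.
    let value = -1

    switch e {
    case .aCase(let payload):
        _ = payload      // the associated value is bound here...
        return value     // ...but this compiles and returns the outer constant
    }
}
```

Calling `describe(.aCase(value: 42))` yields the outer constant rather than the payload, and the compiler has no grounds to complain.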
> 
> I am not convinced this is an illustration of a bug related to enum cases in any real sense. You are invoking a function with one variable when you meant to invoke it with another. This can happen with any two variables in any scenario. I see no evidence that argument labels are any more prone to be confused for variable names than are case names, function names, or any other name. It is that Swift is strongly typed that makes confusion happen less, given that `veryMundaneName` is of type `AType` and `randomLabelFreedomYay` is of type `Type`.
>  
>>>> 
>>>>> but they also need to choose labels that are not too verbose. The safest bet in this case would be not to label at all, but then they lose the communicative aspect of argument labels (see point 2 below).
>>>> 
>>>> A more realistic version of the story: API authors choose labels that make the most sense for the declaration, and users accept the risk of collision as they use the API. Most of those who choose to skip labels would not have given this much thought to their effect at all.
>>>> 
>>>>> 
>>>>> 2.
>>>>> In the "update and commentary" revising SE-0111, it was acknowledged that "cosmetic" labels have a significant use case. Thus, the rules were changed to allow `(_ foo: Int, _ bar: Int) -> ()` to communicate to the reader of code that the first argument serves some purpose "foo" without forcing that name to be part of the API, pending further revisions.
>>>>> 
>>>>> Because enum cases are currently tuples, labels can be dropped freely, and therefore these labels are effectively "optional" parts of the API that can be seen by the user but, at their discretion, not used. That fulfills the use case of "cosmetic" labels. In this revised proposal, by requiring the argument label to be actually _written_ somewhere by the API user, it puts a dent into the legitimate use case of "cosmetic" labels.
>>>>> 
>>>>> That is to say, an API author who wishes to communicate something about a parameter by using a label must now also consider if that label is also appropriate as a variable name and must forgo its use if the label is not so appropriate. This is a very different decision-making process and it is being applied retroactively to previously designed APIs whose labels would have been (hopefully thoughtfully) chosen under very different circumstances.
>>>> 
>>>> This is something we never agreed on: SE-0111 is about functions. In some languages, patterns do resemble constructor functions, but that’s as much similarity as one can get anywhere. I still think applying every decision we made about functions to pattern matching is weird.
>>> 
>>> I have to admit, I still don't understand your reticence. The first part of your proposal aligns enum cases with functions. If we are to look for patterns in something that is spelled like a function, then it is natural for the pattern itself to be spelled like a function, no? Currently, in Swift 3, since we're trying to use pattern matching for a tuple, the pattern is spelled like a tuple. In my simplistic mind, if we're trying to use pattern matching for a $foo, the pattern should be spelled like a $foo. Far from being weird, to me that is the only possible intuitive syntax.
>>> 
>>>> But here’s my analysis anyways: the “cosmetic label” comment is about paving a way to restore expressivity of closures. It talks about the *interaction* between a function/closure’s declaration and use site — if parameter names are provided in a closure’s declaration, they should be required at invocation, similar to pre-SE-0111. IMO this proposal makes enum case and patterns closer to this goal.
>>> 
>>> I agree that your proposal does indeed get us closer to SE-0111. By requiring argument labels chosen by the API author to be written out by the user, we get closer to the goals of SE-0111. But SE-0111 also had a large drawback that required post-approval modification, which was that there ended up being no way to write "cosmetic labels," which both the community and core team agreed was an important use case.
>>> 
>>> With functions, that role can be filled with internal parameter names. This is what the "update and commentary" restored to SE-0111. With tuples, that role is filled by the labels themselves, because they can be ergonomically erased. With enum cases, you have not provided a parallel facility for cosmetic labels, because in your proposal labels can no longer be easily erased, but nor are there internal parameter names or some other substitute. I'm saying that we should learn from the problems discovered after SE-0111 was approved and fix that shortcoming for enum cases before this proposal is adopted. 
>> 
>> (Link to what we are talking about for the benefit of those reading along: https://lists.swift.org/pipermail/swift-evolution-announce/2016-July/000233.html)
>> 
>> The key distinction we need to decide here is whether case labels are “cosmetic”. We don’t allow declaration of separate parameter name and internal name for associated values. I interpret that as we are enforcing the syntax sugar in function declaration where user can use one symbol to represent both:
>> 
>> func f(x: Int) // is the same as func f(x x: Int)
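For reference, a small sketch of the split between argument label and internal parameter name that the comment above alludes to. The functions `increment` and `move` are invented for illustration and do not come from the proposal.

```swift
// One identifier serves as both the argument label and the internal name.
func increment(x: Int) -> Int {
    return x + 1
}

// A separate argument label (`from`, `to`) and internal name
// (`origin`, `destination`) for each parameter.
func move(from origin: Int, to destination: Int) -> Int {
    return destination - origin
}

let incremented = increment(x: 1)       // call sites use the label
let distance = move(from: 2, to: 5)     // internal names are invisible here
```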
>> 
>> It’s tempting to treat matching an enum value against a pattern as assigning a function value to a variable.
> 
> Sorry, I am not sure I understand this sentence.

That is, viewing the case pattern as simply a compound variable assignment, as envisioned in the SE-0111 commentary. This way the labels would be "cosmetic".

>  
>> If that’s what we are doing, it makes perfect sense to say we get “ultimate glory” here with patterns. Meaning, as you suggested, we consider the case labels “cosmetic”. It’s really just the parameter name in a function (the first of the two “x”s in the code comment above).
>> 
>> But that’s kind of a stretch, isn’t it? An enum value is very different from a function value. Yes, there happens to be a function that constructs this enum value, declared when the user declares a case, and that function bears as much resemblance as any other. But the enum value itself deserves more consideration. Telling the user “do these things that you do with a function value” just makes pattern matching harder to explain, because we are *not* assigning or invoking function values.
> 
> Ah, I see. You think of the associated value as something distinct from the declaration used to initialize it. However, there is no spelling for an associated value other than what is used to initialize it. Given `case foo(bar: Int, baz: Int, boo: Int)`, previously, the full name of the case was `foo` and the associated value was `(bar: Int, baz: Int, boo: Int)`. Your proposal causes the full name of the case instead to be `foo(bar:baz:boo:)` and the associated value to be `(Int, Int, Int)`. Is that not your understanding of it?

Yes

> Pattern matching is just a matter of (a) indicating what case you want to identify with the pattern; and (b) what parts of the associated value you wish to match or to bind to variables. Part (a) is done by writing the name, either the base name or in full (i.e. either `foo` or `foo(bar:baz:boo:)`). Part (b) is done by writing `let myVariableName` in the intended positions.

What I left out is that the internal/parameter names of a function are a non-optional part of its signature (one must use exact parameter names to implement a method in a protocol, for example). I prefer treating labels in case pattern matching the same way we treat parameter names in protocol method implementation (due to the symmetry between constructing/deconstructing body mentioned in my previous comments).

>> That’s not to say we need totally distinct syntax. Deconstructing a value should visually relate to constructing it. So here’s how I think these two relate: a constructor is a function. A function signature has these arguments that the function refers to in its body. Pattern matching is the starting point of deconstructing a value. The scope created following it is the equivalent of a “body”, in which the associated values are used as “arguments”. Therefore it makes sense to say that these labels are more like internal names (the 2nd “x” in the comment of the above sample).
>> 
>>>>> 3.
>>>>> The first part of the proposal aligns enum case syntax with functions. Functions often take prepositions as argument labels, and indeed previous SE proposals have extended the rules to allow most words. However, `case foo(index: Int, in: T)` would have a disastrous label, as `in` would be a very annoying variable name whose use would be actively encouraged by the proposed sugared pattern matching rules.
>>>>> 
>>>>> The proposed rules for the sugared pattern would also require (well, greatly encourage) unique labels for each argument. This again is inconsistent with the naming conventions encouraged by the first part of the proposal aligning enum case syntax with functions, which have no such restrictions. If a user names something `case foo(point: T, point: T)`, then the matching rules would actively encourage an invalid redefinition of a variable named `point`.
>>>>> 
>>>>> (On the other hand, the API author does not have the luxury of naming the same case `foo(from point: T, to point: T)`, and even if they did, prepositions can make lousy local variable names--see first paragraph.)
>>>> 
>>>> I don’t see this as a problem for enum case authors. It just means the poor pattern writer needs to provide the positional information to disambiguate.
>>> 
>>>  What do you mean by "positional information" here?
>>> 
>>>>> 4.
>>>>> The proposal does not explore what happens when the proposed prohibition on "mixing and matching" the proposed sugared and unsugared pattern matching runs up against associated values that have a mix of labeled and unlabeled parameters, and pattern matching use cases where the user does not wish to bind all of the arguments.
>>>>> 
>>>>> Given `case foo(a: Int, String, b: Int, String)`, the only sensible interpretation of the rules for sugared syntax would allow the user to choose any name for some but not all of the labels. If the user wishes to bind only `b`, however, he or she will need to navigate a puzzling set of rules that are not spelled out in the proposal:
>>>>> 
>>>>> ```
>>>>> case foo(a: _, _, b: let b, _)
>>>>> // this is definitely allowed
>>>>> 
>>>>> case foo(a: _, _, b: let myVar, _)
>>>>> // this is also definitely allowed
>>>>> 
>>>>> // but...
>>>>> case foo(_, _, b: let myVar, _)
>>>>> // is this allowed, or must the user explicitly state and not bind `a`?
>>>>> 
>>>>> // ...and with respect to the sugared version...
>>>>> case foo(_, _, let b, _)
>>>>> // is this allowed, or must the user explicitly state and not bind `a`?
>>>>> ```
>>>>> 
>>>> 
>>>> Good point. To make up for this: `_` can substitute for any sub-pattern, which is something that this proposal doesn’t change but is definitely worth spelling out.
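As a sketch of that point, take the case from the question above plus a hypothetical second case `other` so the switch has something else to match. The enum name `Mixed` and the function `secondNumber` are made up, and the snippet only shows a fully labeled spelling that a present-day Swift compiler accepts. It does not settle the partially labeled spellings that the question asks about.

```swift
enum Mixed {
    case foo(a: Int, String, b: Int, String)
    case other
}

// Binds only `b`; every other position is filled with the `_` wildcard.
func secondNumber(_ m: Mixed) -> Int? {
    switch m {
    case .foo(a: _, _, b: let b, _):
        return b
    case .other:
        return nil
    }
}
```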
>>>> 
>>>>> 5.
>>>>> In the "update and commentary" revising SE-0111, the core team outlined a preferred path to restoring the full use of argument labels for functions without giving them type system significance. They gave a non-sugared form and a sugared form, both of which have met with approval from the community.
>>>>> 
>>>>> Briefly, the non-sugared form allows compound names to be used in variable names: `func foo(opToUse op(lhs:rhs:) : (Int, Int) -> Int)`. The first part of this proposal is consistent in that it removes the type system significance of argument labels from the associated values of enum cases, and considers them as part of the enum case name. It also stands to reason that, if a user were to match a case _without_ trying to bind any variables, the same syntax would have to be used if the base name is ambiguous: `case elet(locals:body:): break`.
>>>>> 
>>>>> However, the proposal makes no provision for using that same compound name in pattern matching. There appears to be no particular reason for its isolated omission here, as `case elet(locals:body:)(let a, let b): return a * b` is readable and presents no syntactic difficulties. (Moreover, it is consistent with the syntax permitted in this proposal for initializing a variable: `let foo = Expr.elet(locals:body:)([], anExpr)`.)
>>>> 
>>>> Another good point. We can handle this in the purely additive proposal for compound variable names. I consider this not the 5th item in the list, but a separate suggestion, however :P
>>>> 
>>>>> 
>>>>> --- 
>>>>> 
>>>>> In light of these shortcomings, I would argue that the following alternative scheme is the most intuitive and consistent for pattern matching given the general agreement that enum case representation should be "normalized":
>>>>> 
>>>>> Given:
>>>>> 
>>>>> ```
>>>>> enum S {
>>>>>   case foo(bar: Int, baz: Int)
>>>>>   case foo(boo: String)
>>>>>   case bar(boo: String)
>>>>> }
>>>>> ```
>>>>> 
>>>>> a. As in functions after SE-0111, enum cases can be identified unambiguously, regardless of whether one is initializing a variable or matching a case, by their compound name, e.g. `bar(boo:)`. Where a case can be unambiguously identified with only the base name, that is an alternative spelling, e.g. `bar`. Where a case cannot be identified uniquely with the base name, then it is an error to try to use the base name alone: `case foo: break // error: ambiguous`.
>>>>> 
>>>>> b. As in functions after SE-0111, arguments can be passed in either a sugared form or an unsugared form, and they can be bound in a pattern matching statement in the same way. That is, `case foo(bar: let a, baz: let b): break` and `case foo(bar:baz:)(let a, let b): break` are equivalent.
>>>>> 
>>>>> c. As in functions, one cannot supply different or incorrect argument labels. That is, `case foo(baz: let a, bar: let b)` and `case foo(baz:bar:)(let a, let b)` are both forbidden. _This recovers the vast majority of the additional syntactic safety that is outlined in the revised proposal, but without the use of any special rules for pattern matching._
>>>>> 
>>>>> d. By composing rules (a) and (b), `case bar(let a)` is allowed as it is today, preserving source compatibility. However `case foo(let b, let c)` is not allowed, and _not_ because different local variable names are chosen, but because the enum has two cases named foo.
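The function behaviour that rules (a) through (c) take as their model can be sketched as follows. This shows functions only, since the proposed overloading of enum cases on a shared base name cannot be assumed to compile. The function bodies and the constant names are invented for illustration.

```swift
func foo(bar: Int, baz: Int) -> Int { return bar + baz }
func foo(boo: String) -> Int { return boo.count }

// Compound names identify each overload unambiguously.
let twoArgs = foo(bar:baz:)
let oneArg = foo(boo:)

// Sugared and unsugared forms of the same call.
let sugared = foo(bar: 1, baz: 2)
let unsugared = twoArgs(1, 2)

// Both of the following are rejected by the compiler:
// let ambiguous = foo      // the base name alone is ambiguous here
// foo(baz: 1, bar: 2)      // labels do not match any declaration
```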
>>>> 
>>>> From a user’s point of view, there’s enough positional information in this pattern for the compiler to figure out which case it should match. This would be very unintuitive IMO.
>>> 
>>> Wait, the key point of your proposal, with its "stricter rules," is that labels shouldn't be optional even with sufficient positional information! That's also the whole thing above about getting us closer to aligning with SE-0111, etc.
>> 
>> Fair enough. The argument I invoked leads us to a dark path :P
>> 
>> 