[swift-dev] [swift-evolution] Re-pitch: Deriving collections of enum cases
xiaodi.wu at gmail.com
Sun Nov 12 14:37:48 CST 2017
On Sun, Nov 12, 2017 at 1:37 PM, Tony Allevato <tony.allevato at gmail.com>
> On Sat, Nov 11, 2017 at 1:53 PM Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>> On Sat, Nov 11, 2017 at 3:15 PM, Tony Allevato <tony.allevato at gmail.com>
>>> On Sat, Nov 11, 2017 at 10:28 AM Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>>> On Sat, Nov 11, 2017 at 11:23 AM, Tony Allevato <
>>>> tony.allevato at gmail.com> wrote:
>>>>> On Fri, Nov 10, 2017 at 11:01 PM Xiaodi Wu via swift-evolution <
>>>>> swift-evolution at swift.org> wrote:
>>>>>> Nit: if you want to call it `ValueEnumerable`, then this should be
>>>>>> More generally though, I can see the appeal of allowing `Int` to
>>>>>> conform to `ValueEnumerable`, but I've got a feeling that this is headed
>>>>>> rapidly in the direction of overengineering a feature without a good
>>>>>> rationale for the additional complexity. The stated use case is to
>>>>>> enumerate the cases of an enum, and any additional complexity above that
>>>>>> should stand on its own merit. I disagree quite vehemently that protocols
>>>>>> should be "as general as possible"; rather, they exist to enable useful
>>>>>> generic algorithms and should be as _useful_ as possible. There is a happy
>>>>>> medium beyond which overengineering the design makes a protocol markedly
>>>>>> less usable/approachable for the sake of enabling rare functionality.
>>>>> Perhaps "no more specific than they need to be" would have been a
>>>>> better choice of words on my part than "as general as possible".
>>>>> I'm not sure why you think this decision makes the protocol "markedly
>>>>> less usable/approachable", and I think you're seeing a lot of complexity
>>>>> that isn't there. Regarding good rationale, I've given that in this thread
>>>>> above and in the previous discussion thread, but I'll summarize it again:
>>>>> 1) "The stated use case is to enumerate the cases of an enum" isn't
>>>>> sufficient for designing a protocol because Swift does not have protocols
>>>>> that are restricted only to enums. Just as anyone can conform their own
>>>>> types to RawRepresentable, anyone should be able to conform their own types
>>>>> to ValueEnumerable.
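>>>>> Concretely, the protocol under discussion might be sketched as follows
>>>>> (the name `ValueEnumerable` and its requirements are illustrative here,
>>>>> not a settled design):

```swift
// Hypothetical sketch of the protocol under discussion; the name
// `ValueEnumerable` and its requirements are illustrative, not final.
protocol ValueEnumerable {
    associatedtype AllValues: Sequence where AllValues.Element == Self
    static var allValues: AllValues { get }
}

// An enum conformance, written out by hand here in place of the
// compiler synthesis the pitch proposes:
enum Direction: ValueEnumerable {
    case north, south, east, west
    static var allValues: [Direction] { return [.north, .south, .east, .west] }
}

// A non-enum conformance, which the more general design would permit:
struct Bit: ValueEnumerable {
    let value: Int
    static var allValues: [Bit] { return [Bit(value: 0), Bit(value: 1)] }
}
```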
>>>> This is the part of the argument that I am questioning. If the desired
>>>> use case is to enumerate the cases of an enum, is it much of a gain to
>>>> allow anyone to be able to conform their own non-enum types to this
>>>> protocol? Yes, if you buy that, then much of the rest follows. But make no
>>>> mistake that it is not necessary to serve the desired use case and
>>>> dramatically increases the complexity of the design (see below for just
>>>> some considerations).
>>> Some increase in complexity is certainly going to be introduced by
>>> generalizing a design to make it useful to more than just a single use
>>> case, but I hope you understand that using unquantifiable and subjective
>>> qualifiers like "markedly less usable" or "dramatically increases the
>>> complexity" is a bit unhelpful when trying to debate specifics. You clearly
>>> put a great deal of thought into design problems like this and I always
>>> appreciate your points of view, so I want to focus on the objective parts.
>>> In prior threads on this topic, I believe I recall you saying that a
>>> static property that returns an array is fine. (I don't remember your exact
>>> words, so please correct me if I'm wrong. I don't want to misstate your
>>> position.) Given the concerns I've pointed out about why that is both a
>>> time and memory performance problem, and given your focus on the common use
>>> case above, I'd appreciate your opinion on these points:
>>> * Are you advocating for a magic protocol that *only* enums can conform
>>> to? (This is a different issue than whether the protocol is just a
>>> compiler-known protocol that it can synthesize.)
>>> * If yes, why intentionally restrict others from implementing the
>>> protocol? That's actually *more* complexity than allowing it, like we do
>>> with RawRepresentable. There's also no precedent for "a protocol that can
>>> only be applied to enums" in the Swift language. The ramifications of that
>>> decision introduce their own design complexity—you're just shifting it
>>> around, not avoiding it.
>>> * If no, then I don't think we differ on this point.
>> No magic. Consider: Does the compiler stop me from conforming `UIButton`
>> to `Error`? No. Can `UIButton` conform to `Error`? Semantically, no.
>> I am arguing that the use case justifies a protocol called
>> "CaseEnumerable" with documented semantics such that only types with cases
>> (i.e., enums with one or more cases) would fulfill those semantics. Any
>> more generalization and we are designing without a justification. I'm
>> unsure what design complexity you anticipate with such a definition of
> I don't anticipate any design complexity with that definition. I also
> don't see where the design complexity allegedly exists with the more
> general version—you've claimed that it exists but haven't shown what it is
> or proved why it's harmful. The enum's implementation of the protocol would
> do the same thing you're describing regardless of the associated type—only
> the protocol would be more general, which would allow other non-enum types
> to represent their collection or sequence of (possibly infinite) values.
The complexity stems from exactly questions such as this; it is the
complexity of defining what the semantics should be such that types other
than enums may conform to the proposed protocol, and then the complexity of
generalizing return types, etc., to accommodate use cases such as possibly
infinite values, etc. Why are we doing this?
> You keep claiming that this is harmful, but I haven't seen any concrete
> reasons why—just that there is "no justification". Why is "no
> justification" sufficient for arguing against over-generalizing something
> but it's not sufficient for arguing against over-specializing something?
Every addition to the language needs to be justified; every design needs to
serve a purpose, solving real problems. It's not "generalizing" versus
"specializing." The question is: what is the simplest, most elegant, most
teachable change that can make the stated use case possible?
> For what it's worth, if enough people in the community think that
> constraining the associated type of this hypothetical protocol to a
> Collection type instead of Sequence is the way it should be, fine—I'd
> disagree that it's a necessary restriction (because I feel I've shown
> sufficiently that it doesn't harm the design), but it's not something that
> I would want to stand in the way of the feature as a whole.
>>> * Do you think that the type of this property should still be an array,
>>> with all of the performance problems that it brings, or would you find a
>>> sufficiently array-like but more performant type to be fine?
>> I have no problem with any suitable type if it can be justified, say, on
>> the grounds of performance or memory efficiency. I do have a problem if the
>> justification for a design is that it might be necessary to accommodate
>> some unspecified future use case with a currently non-existent type.
> Then hopefully I've already shown that using a custom no-heap-allocation
> collection type in my previous messages on this topic is a performance win
> vs. allocating and populating an array.
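> Something along these lines — a minimal sketch, where the enum and the
> hard-coded count stand in for information the compiler already has:

```swift
// A sketch of the no-heap-allocation idea: a tiny collection whose
// elements are computed from the index on demand, so nothing is
// allocated up front. Names here are illustrative.
enum Season: Int { case spring, summer, autumn, winter }

struct SeasonCases: RandomAccessCollection {
    let startIndex = 0
    let endIndex = 4  // the case count, known statically
    subscript(position: Int) -> Season {
        // Raw-value lookup stands in for what synthesis could do directly.
        return Season(rawValue: position)!
    }
}

let cases = SeasonCases()
// count, iteration, and integer indexing all work without building an Array:
print(cases.count)   // 4
print(cases[2])      // autumn
```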
I've got no problem with that.
>>>>> And once that's allowed, they should be able to do so without imposing an
>>>>> overly restrictive API. What if a custom type has thousands of elements
>>>>> that are easily computed sequentially or from an index? Forcing an Array on
>>>>> the implementor is an unnecessary restriction and a burden. (More on this
>>>>> 2) There is *no decrease in usability for the common use case* by
>>>>> choosing to make the protocol associated type a Sequence instead of a
>>>>> Collection or Array because the concrete type synthesized by the compiler *would
>>>>> be a more refined concrete type anyway.* In other words, anyone who
>>>>> writes "MyEnum.allValues" would get back an integer-indexable random access
>>>>> collection. I can imagine useful algorithms (more of an
>>>>> algebraic/set-theoretical nature) that could be written for types with
>>>>> countably infinite value sets. Why require breaking source/ABI
>>>>> compatibility later if we can avoid it now, at no cost for the common user?
>>>>> 2.1) Anyone who writes a generic algorithm constrained over
>>>>> ValueEnumerable *would* have to add an additional constraint of
>>>>> "where T: RandomAccessCollection, T.Index == Int". This is a slight
>>>>> increase in complexity, but I would argue that if someone is writing
>>>>> generic algorithms over collections (and that's what this is), then this is
>>>>> something they should already be familiar with. I argue that that trade-off is
>>>>> acceptable—others might disagree.
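>>>>> To make 2.1 concrete, the extra `where` clause would look roughly like
>>>>> this (protocol and types are illustrative stand-ins, not the proposal's
>>>>> final spelling):

```swift
// A sketch of point 2.1: the constraints a generic author would add
// under the Sequence-based design to get integer indexing back.
protocol ValueEnumerable {
    associatedtype AllValues: Sequence where AllValues.Element == Self
    static var allValues: AllValues { get }
}

enum Coin: ValueEnumerable {
    case heads, tails
    static var allValues: [Coin] { return [.heads, .tails] }
}

// The additional constraints from 2.1 restore random access:
func lastValue<T: ValueEnumerable>(of type: T.Type) -> T?
    where T.AllValues: RandomAccessCollection, T.AllValues.Index == Int {
    let all = T.allValues
    // Integer indexing is now available to the algorithm's body.
    return all.isEmpty ? nil : all[all.endIndex - 1]
}
```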
>>>>> 3) There is *no good reason* for the compiler to perform slow memory
>>>>> allocation operations to shuffle around information that it *already
>>>>> knows* when it can just as easily provide a specialized compact value
>>>>> type that is far more time/space efficient. I should finish the prototype I
>>>>> was working on—the code synthesis is not markedly more difficult than it
>>>>> would be to write code that constructs an array.
>>>>> 4) Most users *think* they need an array when what they really want
>>>>> is to be able to get the count of elements, iterate over them, and possibly
>>>>> integer-index into them. The specialized collection type can do all of
>>>>> those things more efficiently.
>>>>> 4.1) If a user *truly* needs an array, they just write
>>>>>> Along the lines of user ergonomics, I would advocate for as many
>>>>>> enums as possible to conform without explicit opt-in. It's true that we are
>>>>>> moving away from such magical designs, and for good reason, but the gain
>>>>>> here of enums Just Working(TM) for such a long-demanded feature has, I
>>>>>> would argue, more benefits than drawbacks. To my mind, the feature is a lot
>>>>>> like `RawRepresentable` in several ways, and it would be defensible for an
>>>>>> equal amount of magic to be enabled for it.
>>>>> I tend to agree with this philosophically, but the same argument about
>>>>> a long-demanded feature Just Working™ could be/was made for
>>>>> Equatable/Hashable being implicit and we decided to make it opt-in instead.
>>>>> I'm worried that we're going down a road where "this set of protocols is
>>>>> opt-in and this set of protocols is always-on" and users have to remember
>>>>> which are which. (And in some cases, it's even more complex! Enums are
>>>>> always Eq/Hash if they have raw values or no associated values, for
>>>>> historical reasons, but they're opt-in with associated values.)
>>>> There are key differences here. Equatable and Hashable have extensive
>>>> semantic guarantees, and it is certainly not the case that a type with
>>>> members that are all Equatable fulfills those guarantees itself. It would
>>>> be undesirable to have implicit Equatable conformance because the compiler
>>>> cannot guarantee semantics and must not make such assumptions. (The
>>>> implicit conformance of certain enums to Equatable and Hashable is
>>>> interesting yet defensible because it relies on an aspect of those enums
>>>> not shared with other types, an aspect that can be profitably relied
>>>> upon--see below.) All enums that have one or more cases with no associated
>>>> values, by contrast, have an enumerable set of cases. This is something
>>>> knowable to the compiler.
>>> I agree with this. I think an argument could be successfully made that
>>> since enums without associated values already conform to Equatable/Hashable
>>> implicitly, and enums with raw values conform to RawRepresentable
>>> implicitly, the same could be done for ValueEnumerable. But, I brought it
>>> up because it's easy to imagine that there would be concerns raised about
>>> adding more of these special cases to the language (which does add design
>>> complexity, right?).
>>> This goes to the issue above of whether it is desirable in the first
>>>> place to conflate enums that have enumerable cases with a more general
>>>> "ValueEnumerable"; the requested feature is to enumerate enum cases, and
>>>> having a set of unique cases is a feature of enums not shared with other
>>>> types. Inevitably, when you generalize, then you must ask what it means for
>>>> a non-enum type to have a set of enumerable case-like values. For example,
>>>> note that there is no particular need for an ordered collection if we are
>>>> considering enums (cf. previous conversations about supporting resilient
>>>> reordering of enum cases); by contrast, such a design would be highly
>>>> unintuitive for a range, which has an implicitly ordered set of values.
>>>> Since each enum case is guaranteed to be unique, and since Equatable and
>>>> Hashable conformance is implicitly synthesized in such a way that all cases
>>>> compare not equal to other cases, we could even use a Set for an enum's
>>>> `allCases`. Of course, to eliminate the potential need for allocating
>>>> memory, this could be a custom type with set-like semantics, but
>>>> nonetheless it allows us to guarantee that `Set(T.allCases).count ==
>>>> T.allCases.count`, which would be a nontrivial semantic requirement in the
>>>> more general case of custom `ValueEnumerable` conformance that the author
>>>> of the conforming type would need to consider--if we want to insist on that
>>>> semantic requirement at all.
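>>>> That guarantee, made concrete with a hand-written conformance standing
>>>> in for synthesis (names here are illustrative):

```swift
// Every case of an enum is distinct, so collecting the enumerated
// values into a Set loses nothing. A non-enum conformer would have to
// uphold this property manually, if we insist on it at all.
enum Compass: Hashable {
    case north, south, east, west
    static var allCases: [Compass] { return [.north, .south, .east, .west] }
}

let all = Compass.allCases
// The semantic requirement under discussion:
assert(Set(all).count == all.count)
```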
>>> These are great points! Regarding the ordering question—I agree that
>>> purely theoretically, the cases of an enum are just an unordered set of
>>> values. But, just as you are interested in meeting the needs of end users,
>>> so am I—and one of the situations where I commonly see this feature
>>> requested is to build UIs (menus, table view sections, etc.) from the cases
>>> of an enum. Any design of this feature that would require users to specify
>>> the ordering of their elements secondarily to the order they appear in
>>> source code would make this feature useless for them, and those are the
>>> people we're trying to help.
>> I'm not sure I understand this use case. Are you saying that it is
>> desirable *not* to require users to specify an order for the options in
>> their UI? On the contrary, that would seem to be extremely brittle; as a
>> library author, I might rearrange the order of cases in source code--which
>> should be a completely transparent change--and this would break end users'
>> UI. If the final design for ABI stability accommodates resilient reordering
>> of cases, then the UI would change depending on which version of the
>> library is installed at any particular moment. I would imagine that we
>> should explicitly discourage any reliance on source code order of cases. If
>> a user wants to get an array, then that is simple enough, as you say.
> What I'm saying is that users are working with cases that have an inherent
> order, and they encode that order in their source code so they don't want
> to repeat it. It's only brittle if you believe that users are likely to
> take information that has an inherent order based on the concepts they
> represent and randomly shuffle them in their source code—like "case
> thursday, monday, friday, sunday, ...".
> You've cited "the use case being requested" as a motivating factor for the
> design, so that means you have to consider *why* people are requesting it
> and *what* they want to do with it—not just "enumerate the cases of an
> enum", but the specific problem they're trying to solve when they ask for
> that. I just did a spot check of web search results, and combined with
> conversations I've had with colleagues on the same topic, it appears the
> people asking for this feature are primarily trying to do one of two things:
> 1) Enumerate cases that have a well-defined order for display in a UI.
> 2) Enumerate cases that don't have an inherent order, but where stability
> and predictability of the enumeration order is important (like generating
> the cartesian product of (suits x face-values) for a deck of cards).
> In both cases, users want to avoid writing boilerplate that maintains a
> separate collection of those values (which itself is also brittle if cases
> are added in the future, because even if usage sites like switches change,
> there is no compile-time checking that a manually maintained array of cases
> is exhaustive).
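> Use case (2) in miniature — the hand-written `allValues` arrays stand in
> for whatever the feature would synthesize, and the names are illustrative:

```swift
// With a stable, predictable enumeration order, the full deck falls
// out of a cartesian product of suits and ranks.
enum Suit {
    case spades, hearts, diamonds, clubs
    static let allValues: [Suit] = [.spades, .hearts, .diamonds, .clubs]
}
enum Rank: Int {
    case two = 2, three, four, five, six, seven, eight, nine, ten,
         jack, queen, king, ace
    static let allValues: [Rank] = [.two, .three, .four, .five, .six, .seven,
                                    .eight, .nine, .ten, .jack, .queen, .king, .ace]
}

struct Card { let suit: Suit; let rank: Rank }

// Cartesian product: 4 suits x 13 ranks = 52 cards, in a stable order.
let deck = Suit.allValues.flatMap { suit in
    Rank.allValues.map { rank in Card(suit: suit, rank: rank) }
}
print(deck.count)  // 52
```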
If the idea is to have a custom collection for memory and performance
reasons, then I take it as given that cases will be enumerated in ABI
order. In the ABI stability discussions, the question has arisen whether
_reordering_ of cases should be an ABI-breaking change, or whether
_renaming_ of cases should be an ABI-breaking change. If cases are stored
in source order, then reordering cases breaks ABI but renaming does not; if
cases are stored in alphabetical order, then renaming cases breaks ABI but
reordering does not. If I recall correctly, a greater number of people
thought that changing declaration order should not be an ABI-breaking
phenomenon, and that the compiler should disregard the order in which cases
are written in the source code. Suffice it to say that, should such a
design be adopted, then it would be inconsistent (and suboptimal for
performance) for this protocol to vend an API that leaks the source order.
> I suppose I'm having difficulty understanding where you stand on this
> topic because in one place you say "the feature should be designed this way
> because the intended use case is X" and then when we drill deeper into that
> use case, you say "people should not do X because it's brittle". It can't
> be both.
The stated use case in the proposal is, in its entirety, that enum cases
should be enumerable; I agree that it is a legitimate use case and am
exploring options for making that possible. The users you've polled seem to
have additional demands unstated in the proposal which actually do not jibe
with those users who've weighed in on the ABI stability conversation. Put
another way, it seems like there is disagreement as to whether the source
order of enum cases has semantic significance. I am proceeding as though it
does not, but it appears that you think that it does. This would need to be
settled--probably as part of the ABI stability discussion--before we can
>>> So, I do accept an ordered collection as a compromise that makes the
>>> feature usable even at the cost of losing some amount of generality. But
>>> that's fine—all of these design problems are on a spectrum, from highly
>>> specific on one end to highly general on the other. The main thing I want
>>> to point out with this discussion is that we *can* make the design a bit
>>> more general without harming usability for the common use case *and* make
>>> it future-proof so we're not shutting out other potential use cases that
>>> are harder to fix due to source/ABI compatibility or adding odd special
>>> cases to the language.