[swift-evolution] [Pitch] Synthesized static enum property to iterate over cases

Sat Sep 9 14:07:59 CDT 2017

On Fri, Sep 8, 2017 at 5:14 PM Xiaodi Wu <xiaodi.wu at gmail.com> wrote:

> On Fri, Sep 8, 2017 at 4:08 PM, Matthew Johnson via swift-evolution <
> swift-evolution at swift.org> wrote:
>
>>
>> On Sep 8, 2017, at 12:05 PM, Tony Allevato <tony.allevato at gmail.com>
>> wrote:
>>
>>
>>
>> On Fri, Sep 8, 2017 at 9:44 AM Matthew Johnson <matthew at anandabits.com>
>> wrote:
>>
>>> On Sep 8, 2017, at 11:32 AM, Tony Allevato <tony.allevato at gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> On Fri, Sep 8, 2017 at 8:35 AM Matthew Johnson <matthew at anandabits.com>
>>> wrote:
>>>
>>>> On Sep 8, 2017, at 9:53 AM, Tony Allevato via swift-evolution <
>>>> swift-evolution at swift.org> wrote:
>>>>
>>>> Thanks for bringing this up, Logan! It's something I've been thinking
>>>> about a lot lately after a conversation with some colleagues outside of
>>>> this community. Some of my thoughts:
>>>>
>>>> AFAIK, there are two major use cases here: (1) you need the whole
>>>> collection of cases, like in your example, and (2) you just need the number
>>>> of cases. The latter seems to occur somewhat commonly when people want to
>>>> use an enum to define the sections of, say, a UITableView. They just return
>>>> the count from numberOfSections(in:) and then switch over the cases in
>>>> their cell-providing methods.
>>>>
>>>> Because of #2, it would be nice to avoid instantiating the collection
>>>> eagerly. (Also because of examples like Jonathan's, where the enum is
>>>> large.) If all the user is ever really doing is iterating over them,
>>>> there's no need to keep the entire collection in memory. This leads us to
>>>> look at Sequence; we could use something like AnySequence to keep the
>>>> current case as our state and a transition function to advance to the next
>>>> one. If a user needs to instantiate the full array from that sequence they
>>>> can do so, but they have to do it explicitly.
>>>>
>>>> The catch is that Sequence only provides `underestimatedCount`, rather
>>>> than `count`. Calling the former would be an awkward API (why is it
>>>> underestimated? we know how many cases there are). I suppose we could
>>>> create a concrete wrapper for Sequence (PrecountedSequence?) that provides
>>>> a `count` property to make that cleaner, and then have
>>>> `underestimatedCount` return the same thing if users passed this thing into
>>>> a generic operation constrained over Sequence. (The standard library
>>>> already has support wrappers like EnumeratedSequence, so maybe this is
>>>> appropriate.)
>>>>
>>>> Another question that would need to be answered is, how should the
>>>> cases be ordered? Declaration order seems obvious and straightforward, but
>>>> if you have a raw-value enum (say, integers), you could have the
>>>> declaration order and the numeric order differ. Maybe that's not a problem.
>>>> Tying the iteration order to declaration order also means that the behavior
>>>> of a program could change simply by reördering the cases. Maybe that's not
>>>> a big problem either, but it's something to call out.
>>>>
>>>> If I were designing this, I'd start with the following approach. First,
>>>> add a new protocol to the standard library:
>>>>
>>>> ```
>>>> public protocol ValueEnumerable {
>>>>   associatedtype AllValuesSequence: Sequence where
>>>> AllValuesSequence.Iterator.Element == Self
>>>>
>>>>   static var allValues: AllValuesSequence { get }
>>>> }
>>>> ```
>>>>
>>>> Then, for enums that declare conformance to that protocol, synthesize
>>>> the body of `allValues` to return an appropriate sequence. If we imagine a
>>>> model like AnySequence, then the "state" can be the current case, and the
>>>> transition function can be a switch/case that returns it and advances to
>>>> the next one (finally returning nil).
>>>>
>>>> There's an opportunity for optimization that may or may not be worth
>>>> it: if the enum is RawRepresentable with RawValue == Int, AND all the raw
>>>> values are in a contiguous range, AND declaration order is numerical order
>>>> (assuming we kept that constraint), then the synthesized state machine can
>>>> just be a simple integer incrementation and call to `init?(rawValue:)`.
>>>> When all the cases have been generated, that will return nil on its own.
>>>>
>>>> So that covers enums without associated values. What about those with
>>>> associated values? I would argue that the "number of cases" isn't something
>>>> that's very useful here—if we consider that enum cases are really factory
>>>> functions for concrete values of the type, then we shouldn't think about
>>>> "what are all the cases of this enum" but "what are all the values of this
>>>> type". (For enums without associated values, those are synonymous.)
>>>>
>>>> An enum with associated values can potentially have an infinite number
>>>> of values. Here's one:
>>>>
>>>> ```
>>>> enum BinaryTree {
>>>>   case subtree(left: BinaryTree, right: BinaryTree)
>>>>   case leaf
>>>>   case empty
>>>> }
>>>> ```
>>>>
>>>> Even without introducing an Element type in the leaf nodes, there are a
>>>> countably infinite number of binary trees. So first off, we wouldn't be
>>>> able to generate a meaningful `count` property for that. Since they're
>>>> countably infinite, we *could* theoretically lazily generate a sequence of
>>>> them! It would be a true statement to say "an enum with associated values
>>>> can have all of its values enumerated if all of its associated values are
>>>> also ValueEnumerable". But I don't think that's something we could have the
>>>> compiler synthesize generally: the logic to tie the sequences together
>>>> would be quite complex in the absence of a construct like coroutines/yield,
>>>> and what's worse, the compiler would have to do some deeper analysis to
>>>> avoid infinite recursion. For example, if it used the naïve approach of
>>>> generating the elements in declaration order, it would keep drilling down
>>>> into the `subtree` case above over and over; it really needs to hit the
>>>> base cases first, and requiring the user to order the cases in a certain
>>>> way for it to just work at all is a non-starter.
>>>>
>>>> So, enums with associated values are probably left unsynthesized. But
>>>> the interesting thing about having this be a standard protocol is that
>>>> there would be nothing stopping a user from conforming to it and
>>>> implementing it manually, not only for enums but for other types as well.
>>>> The potential may exist for some interesting algorithms by doing that, but
>>>> I haven't thought that far ahead.
>>>>
>>>> There are probably some things I'm missing here, but I'd love to hear
>>>> other people's thoughts on it.
>>>>
>>>>
>>>> There are some things I really like about this approach, but it doesn’t
>>>> quite align with a lot of the usage I have seen for manually declared
>>>> `allValues` pattern.
>>>>
>>>> One of the most common ways I have seen `allValues` used is as a
>>>> representation of static sections or rows backing table or collection
>>>> views.  Code written like this will take the section or item index provided
>>>> by a data source or delegate method and index into an `allValues` array to
>>>> access the corresponding value.  These methods usually access one or more
>>>> members of the value or pass it along to something else (often a cell)
>>>> which does so.
>>>>
>>>> If we introduce synthesis that doesn’t support this use case I think a
>>>> lot people will be frustrated so my opinion is that we need to support it.
>>>> This means users need a way to request synthesis of a `Collection` with an
>>>> `Int` index.  Obviously doing this solves the `count` problem.  The
>>>> collection would not need to be eager.  It could be implemented to produce
>>>> values on demand rather than storing them.
>>>>
>>>
>>> Great points! I was only considering the table view/section case where
>>> the enum had raw values 0..<count, but I do imagine it's possible that
>>> someone could just define `enum Section { case header, content, footer }`
>>> and then want to turn an IndexPath value into the appropriate Section.
>>>
>>> On the other hand, though, isn't that what raw value enums are for? If
>>> the user needs to do what you're saying—map specific integers to enum
>>> values—shouldn't they do so by giving those cases raw values and calling
>>> init?(rawValue:), not by indexing into a collection? Especially since they
>>> can already do that today, and the only thing they're missing is being able
>>> to retrieve the count, which a "PrecountedSequence" mentioned above, or
>>> something like it, could also provide.
>>>
>>>
>>> First, I’m making observations about what people are doing, not what
>>> they could do.
>>>
>>> Second, the raw value may not correspond to 0-based indices.  It might
>>> not even be an Int.  There is no reason to couple this common use case of
>>> `allValues` to `Int` raw values with 0-based indices.
>>>
>>
>> Do we know of any examples where a user is both (1) defining an enum with
>> integer raw values that are noncontiguous or non-zero-based and (2) need
>> declaration-ordinal-based indexing into those cases for other reasons, like
>> a table/collection view? I can't think of why someone would do that, but
>> I'm happy to consider something that I'm missing.
>>
>>
>> I don’t off-hand, but I don’t think the lack of example is a good
>> motivation for a solution that doesn’t directly address the most commonly
>> known use case for this feature.
>>
>>
>>
>>>
>>> Third, `init(rawValue:)` is a failable initializer and would require a
>>> force unwrap.  If the raw values *are* 0-based integers this is similar to
>>> the collection bounds check that would be necessary, but it moves it into
>>> user code.  People don’t like writing force unwraps.
>>>
>>
>> Yeah, this is a really good point that I wasn't fully considering. If
>> other invariants in the application hold—such as table view cell functions
>> never receiving a section index outside 0..<count—then unwrapping it just
>> forces users to address a situation that will never actually occur unless
>> UIKit is fundamentally broken.
>>
>>
>> Right, but the most crucial point is that it forces *user* to address
>> this.  They are not required to today.  It is handled by the bounds check
>> in Array.  This might sound like splitting hairs but I think there are a
>> lot of people who wouldn't view it that way.
>>
>>
>>
>>
>>>
>>>
>>> My main concern with providing a Collection with Int indices is that, at
>>> some fundamental/theoretical level, it feels like it only makes sense for
>>> enums with contiguous numeric raw values. For other kinds of enums,
>>> including those where the enum is just a "bag of things" without raw
>>> values, it feels artificial.
>>>
>>>
>>> Sure, that’s why I proposed a couple of options for addressing both use
>>> cases.  I think both have merit.  I also think we need to recognize that
>>> most people are asking for a replacement for manually writing a static
>>> array and won’t be satisfied unless we provide a solution where the
>>> synthesized property behaves similarly.
>>>
>>
>> Agreed—I just wanted to point out the distinction because an important
>> part of fleshing this out will be to partition the various "classes" of
>> enums into those that would receive an indexable Collection vs. those that
>> would receive just a Sequence.
>>
>>
>> I agree that it’s an important distinction.  To be honest, I’m not sure
>> there is a good way to solve both usages without introducing more
>> complexity than would be acceptable for something like this.  It might be a
>> problem better solved by macros or some other metaprogramming feature.  It
>> would be unfortunate to have to wait until we have those to solve this.
>> However, I don’t think it's an important enough problem to deserve a
>> solution with a lot of knobs and associated complexity.
>>
>
> Just wanted to chime in to say that I too very much like the opt-in nature
> of this `ValuesEnumerable` design proposed by Tony (modulo some
> bikeshedding about the name). Seems like Tony has thought about this very
> deeply and considered multiple angles. However, I do agree with Matthew and
> others that, at the end of the day, a custom sequence/collection solving
> multiple usages that we don't even know are in high demand seems to be
> overcomplicating things. Synthesized Equatable, Hashable, and Codable give
> people a simple answer to simple problems. Here, people just want an array
> of all cases. Give them an array of all cases. When it's not possible
> (i.e., in the case of cases with associated values), don't do it.
>

I want to push a little harder on the array part. To the people saying that
it should just be an array of values, is it *specifically* an array that
you want, or would any Int-indexable Collection do?

I think this is an important distinction because (1) if it's synthesized
via a standard library protocol, that protocol should be defined in general
terms, and (2) a user wanting to iterate over information that is already
known at compile-time should not have to suffer a slow runtime heap
allocation in order to do so. Even if the array is allocated once and
cached, and even if it's small in most cases, why duplicate information you
already have?

If the direction we want to go is to allow Int-indexing, then I believe a
custom Collection subtype would be just as usable for most people's needs,
it would afford the compiler optimization opportunities that a simple array
cannot, and if a user does need specifically an Array for some reason, they
can easily initialize one at the call site.

>
>>>
>>>> Of course there might be some cases where a manual implementation is
>>>> necessary but implementing `Collection` is not desirable for one reason or
>>>> another.  One way to solve both of these use cases would be to have a
>>>> protocol hierarchy but that seems like it might be excessively complex for
>>>> a feature like this.  Another way might be to take advantage of the fact
>>>> that in the use case mentioned above people are usually working with the
>>>> concrete type.  We could allow the compiler to synthesize an implementation
>>>> that *exceeds* the requirement of the protocol such that the synthesized
>>>> `AllValuesSequence` is actually a `Collection where Index == Int`.  I’m not
>>>> sure which option is better.
>>>>
>>>> I would also like to discuss enums with associated values.  It would
>>>> certainly be reasonable to disallow synthesis for these types in an initial
>>>> implementation.  I don’t know of any use cases off the top of my head
>>>> (although I expect some good ones do exist).  That said, I don’t think
>>>> synthesis would be prohibitive for enums with associated values so long as
>>>> the type of all associated values conforms to `ValueEnumerable`.  We should
>>>> probably support synthesis for these types eventually, possibly in the
>>>> initial implementation if there are no significant implementation barriers.
>>>>
>>>
>>> I mentioned some of those barriers above. One issue is that synthesizing
>>> the code to lazily (i.e., reëntrantly) generate a sequence whose elements
>>> are the Cartesian products of other sequences is non-trivial.
>>> (Coroutines/yield would make this a piece of cake.)
>>>
>>>
>>> The good news is that we might be in luck on this front in the Swift 5
>>> timeframe.  :)
>>>
>>
>> Fingers crossed! I'm not a concurrency expert by any means, so the most
>> exciting part of those new proposals to me is the side-effect that we might
>> get something like C# enumerators :)
>>
>>
>>
>>>
>>>
>>> The other is the issue with recursive enums, like the BinaryTree
>>> example, where the compiler has to know to synthesize them in a particular
>>> order or else it will recurse indefinitely before even producing its first
>>> value. However, this could be addressed by simply forbidding automatic
>>> synthesis of enums that have an indirect case, which is probably a
>>> reasonable limitation.
>>>
>>>
>>> Yeah, that seems like a reasonable limitation.
>>>
>>>
>>>
>>>
>>>>
>>>> That’s my two cents.
>>>>
>>>> - Matthew
>>>>
>>>>
>>>>
>>>> On Fri, Sep 8, 2017 at 3:40 AM Jonathan Hull via swift-evolution <
>>>> swift-evolution at swift.org> wrote:
>>>>
>>>>> +1000
>>>>>
>>>>> I once made a country code enum, and creating that array was simple,
>>>>> but took forever, and was prone to mistakes.
>>>>>
>>>>> Thanks,
>>>>> Jon
>>>>>
>>>>> > On Sep 8, 2017, at 2:56 AM, Logan Shire via swift-evolution <
>>>>> swift-evolution at swift.org> wrote:
>>>>> >
>>>>> > Googling ‘swift iterate over enum cases’ yields many results of
>>>>> various levels of hackery.
>>>>> > Obviously it’s trivial to write a computed property that returns an
>>>>> enum’s cases as an
>>>>> > array, but maintaining that is prone to error. If you add another
>>>>> case, you need to make sure
>>>>> > you update the array property. For enums without associated types,
>>>>> > I propose adding a synthesized static var, ‘cases', to the enum’s
>>>>> type. E.g.
>>>>> >
>>>>> > enum Suit: String {
>>>>> >    case spades = "♠"
>>>>> >    case hearts = "♥"
>>>>> >    case diamonds = "♦"
>>>>> >    case clubs = "♣"
>>>>> > }
>>>>> >
>>>>> > let values = (1…13).map { value in
>>>>> >    switch value {
>>>>> >    case 1: return “A”
>>>>> >    case 11: return “J”
>>>>> >    case 12: return “Q”
>>>>> >    case 13: return “K”
>>>>> >    default: return String(value)
>>>>> >    }
>>>>> > }
>>>>> >
>>>>> > let cards = values.flatMap { value in Suit.cases.map {
>>>>> “\($0)\(value)"  } }
>>>>> >
>>>>> > Yields [“♠A”, “ ♥ A”, …, “♣K”]
>>>>> > Thoughts?
>>>>> >
>>>>> >
>>>>> > Thanks!
>>>>> > - Logan Shire
>>>>> > _______________________________________________
>>>>> > swift-evolution mailing list
>>>>> > swift-evolution at swift.org
>>>>> > https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>>
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> swift-evolution at swift.org
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>>
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> swift-evolution at swift.org
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>
>>>>
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170909/26ecd3c5/attachment.html>