[swift-evolution] [Draft] Rename Sequence.elementsEqual

Tue Oct 17 16:41:53 CDT 2017

On Tue, Oct 17, 2017 at 14:41 Jonathan Hull <jhull at gbis.com> wrote:

> On Oct 17, 2017, at 11:46 AM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>
>
> On Tue, Oct 17, 2017 at 12:54 Jonathan Hull <jhull at gbis.com> wrote:
>
>> On Oct 17, 2017, at 9:34 AM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>
>>
>> On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jhull at gbis.com> wrote:
>>
>>> On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>>
>>>
>>> On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseitz42 at icloud.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi.wu at gmail.com>:
>>>>
>>>>
>>>> On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseitz42 at icloud.com>
>>>> wrote:
>>>>
>>>>> Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution <
>>>>> swift-evolution at swift.org>:
>>>>>
>>>>>
>>>>> On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull at gbis.com> wrote:
>>>>>
>>>>>>
>>>>>> On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>>>>>
>>>>>> On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull at gbis.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
>>>>>>>
>>>>>>>>  That ordering can be arbitrary, but it shouldn’t leak internal
>>>>>>>>> representation such that the method used to create identical things affects
>>>>>>>>> the outcome of generic methods because of differences in internal
>>>>>>>>> representation.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  It would be better to say that the iteration order is
>>>>>>>>>> well-defined. That will almost always mean documented, and usually
>>>>>>>>>> predictable though obviously e.g. RNGs and iterating in random order will
>>>>>>>>>> not be predictable by design.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> That's actually more semantically constrained than what Swift
>>>>>>>>>>> calls a `Collection` (which requires conforming types to be multi-pass
>>>>>>>>>>> and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly
>>>>>>>>>>> permits conforming single-pass, infinite, and/or unordered types.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think you’re talking about Sequence here, I’ve lost track of
>>>>>>>>>>> your nonsense by now. Yes, the current Swift protocol named Sequence allows
>>>>>>>>>>> unordered types. You seem to keep asserting that but not actually
>>>>>>>>>>> addressing my argument, which is *that allowing Sequences to be
>>>>>>>>>>> unordered with the current API is undesired and actively harmful, and
>>>>>>>>>>> should* *therefore** be changed*.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What is harmful about it?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> After thinking about it, I think the harmful bit is that
>>>>>>>>>> unordered sequences are leaking internal representation (In your example,
>>>>>>>>>> this is causing people to be surprised when two sets with identical
>>>>>>>>>> elements are generating different sequences/orderings based on how they
>>>>>>>>>> were created).  You are correct when you say that this problem is even true
>>>>>>>>>> for for-in.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would not say it is a problem. Rather, by definition, iteration
>>>>>>>>> involves retrieving one element after another; if you're allowed to do that
>>>>>>>>> with Set, then the elements of a Set are observably ordered in some way.
>>>>>>>>> Since it's not an OrderedSet--i.e., order doesn't matter--then the only
>>>>>>>>> sensible conclusion is that the order of elements obtained in a for...in
>>>>>>>>> loop must be arbitrary. If you think this is harmful, then you must believe
>>>>>>>>> that one should be prohibited from iterating over an instance of Set.
>>>>>>>>> Otherwise, Set is inescapably a Sequence by the Swift definition of
>>>>>>>>> Sequence. All extension methods on Sequence like drop(while:) are really
>>>>>>>>> just conveniences for common things that you can do with iterated access;
>>>>>>>>> to my mind, they're essentially just alternative ways of spelling various
>>>>>>>>> for...in loops.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think an argument could be made that you shouldn’t be able to
>>>>>>>>> iterate over a set without first defining an ordering on it (even if that
>>>>>>>>> ordering is somewhat arbitrary).  Maybe we have something like a
>>>>>>>>> “Sequenc(e)able” protocol which defines things which can be turned into a
>>>>>>>>> sequence when combined with some sort of ordering.  One possible ordering
>>>>>>>>> could be the internal representation (At least in that case we are calling
>>>>>>>>> it out specifically).  If I had to say
>>>>>>>>> “setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely
>>>>>>>>> be less surprised when it returns false even though setA == setB.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Well, that's a totally different direction, then; you're arguing
>>>>>>>> that `Set` and `Dictionary` should not conform to `Sequence` altogether.
>>>>>>>> That's fine (it's also a direction that some of us explored off-list a
>>>>>>>> while ago), but at this point in Swift's evolution, realistically, it's not
>>>>>>>> within the realm of possible changes.
>>>>>>>>
>>>>>>>>
>>>>>>>> I am actually suggesting something slightly different.  Basically,
>>>>>>>> Set and Dictionary’s conformance to Collection would have a different
>>>>>>>> implementation.  They would conform to another protocol declaring that they
>>>>>>>> are unordered. That protocol would fill in part of the conformance to
>>>>>>>> sequence/collection using a default ordering, which is mostly arbitrary,
>>>>>>>> but guaranteed to produce the same ordering for the same list of elements
>>>>>>>> (even across collection types).  This would be safer, but a tiny bit slower
>>>>>>>> than what we have now (We could also potentially develop a way for
>>>>>>>> collections like set to amortize the cost). For those who need to recover
>>>>>>>> speed, the new protocol would also define a property which quickly returns
>>>>>>>> a sequence/iterator using the internal ordering (I arbitrarily called it
>>>>>>>> .arbitraryOrder).
>>>>>>>>
>>>>>>>> I believe it would not be source breaking.
>>>>>>>>
>>>>>>>
>>>>>>> That is indeed something slightly different.
>>>>>>>
>>>>>>> In an ideal world--and my initial understanding of what you were
>>>>>>> suggesting--Set and Dictionary would each have a member like `collection`,
>>>>>>> which would expose the underlying data as a `SetCollection` or
>>>>>>> `DictionaryCollection` that in turn would conform to `Collection`;
>>>>>>> meanwhile, Set and Dictionary themselves would not offer methods such as
>>>>>>> `prefix`, or indexing by subscript, which are not compatible with being
>>>>>>> unordered. For those who want a particular ordering, there'd be something
>>>>>>> like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) ->
>>>>>>> {Set|Dictionary}Collection`.
>>>>>>>
>>>>>>> What you suggest here instead would be minimally source-breaking.
>>>>>>> However, I'm unsure of where these guarantees provide benefit to justify
>>>>>>> the performance cost. Certainly not for `first` or `dropFirst(_:)`, which
>>>>>>> still yields an arbitrary result which doesn't make sense for something
>>>>>>> _unordered_. We *could* have an underscored customization point named
>>>>>>> something like `_customOrderingPass` that is only invoked from
>>>>>>> `elementsEqual` or other such methods to pre-rearrange the internal
>>>>>>> ordering of unordered collections in some deterministic way before
>>>>>>> comparison. Is that what you have in mind?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Something like that.  Whatever we do, there will be a tradeoff
>>>>>>> between speed, correctness, and ergonomics.
>>>>>>>
>>>>>>> My suggestion trades speed for correctness, and provides a way to
>>>>>>> recover speed through additional typing (which is slightly less ergonomic).
>>>>>>>
>>>>>>
>>>>>> You haven't convinced me that this is at all improved in
>>>>>> "correctness." It trades one arbitrary iteration order for another on a
>>>>>> type that tries to model an unordered collection.
>>>>>>
>>>>>>
>>>>>>> We could do something like you suggest. I don’t think the method
>>>>>>> would need to be underscored… the ordering pass could just be a method on
>>>>>>> the protocol which defines it as unordered.  Then we could provide a
>>>>>>> special conformance for things where order really matters based on
>>>>>>> adherence to that protocol.  That might be an acceptable tradeoff.  It
>>>>>>> would give us speed at the cost of having the correct implementation being
>>>>>>> less ergonomic and more error prone (you have to remember to check that it
>>>>>>> is unordered and call the ordering method when it mattered).
>>>>>>>
>>>>>>> I’d still be a bit worried that people would make incorrect generic
>>>>>>> algorithms based on expecting an order from unordered things, but at least
>>>>>>> it would be possible for them check and handle it correctly.  I think I
>>>>>>> could get behind that tradeoff/compromise, given where we are in the swift
>>>>>>> process and Swift's obsession with speed (though I still slightly prefer
>>>>>>> the safer default).  At least the standard library would handle all the
>>>>>>> things correctly, and that is what will affect the majority of programmers.
>>>>>>>
>>>>>>
>>>>>> What is an example of such an "incorrect" generic algorithm that
>>>>>> would be made correct by such a scheme?
>>>>>>
>>>>>>
>>>>>> To start with, the one you gave as an example at the beginning of
>>>>>> this discussion: Two sets with identical elements which have different
>>>>>> internal storage and thus give different orderings as sequences.  You
>>>>>> yourself have argued that the confusion around this is enough of a problem
>>>>>> that we need to make a source-breaking change (renaming it) to warn people
>>>>>> that the results of the ‘elementsEqual’ algorithm are undefined for sets
>>>>>> and dictionaries.
>>>>>>
>>>>>
>>>>> No, I am arguing that the confusion about ‘elementsEqual’ is foremost
>>>>> a problem with its name; the result of this operation is not at all
>>>>> undefined for two sets but actually clearly defined: it returns true if two
>>>>> sets have the same elements in the same iteration order, which is a
>>>>> publicly observable behavior of sets (likewise dictionaries).
>>>>>
>>>>>
>>>>> But it is a behavior which has absolutely no meaning at all because
>>>>> the order does not depend on the elements of the set but on the history of
>>>>> how the set has been reached its current state.
>>>>> So why should I ever use this method on a set?
>>>>> What is the use case?
>>>>>
>>>>
>>>> One example: you can use it to check an instance of Set<Float> to
>>>> determine if it has a NaN value. (The “obvious” way of doing it is not
>>>> guaranteed to work since NaN != NaN.)
>>>>
>>>>
>>>> How would I do that? I'd rather expect to use a property isNaN on Float
>>>> to do that.
>>>>
>>>
>>> set.elementsEqual(set)
>>>
>>>
>>> I see why that would work (thanks to Set being a collection vs a
>>> sequence), but it still feels like a hack.  I definitely wouldn’t want to
>>> be maintaining code with that in it. Especially when compared to something
>>> like:
>>>
>>> set.contains(where: {$0.isNaN})
>>>
>>
>> The purpose of protocols is to enable useful generic code. You cannot use
>> isNaN for code that works on generic Collection, or for that matter, even
>> code that works on Set where Element : Numeric.
>>
>> Much generic code (for example, generic sorting) easily blows up when
>> encountering NaN. If you want an algorithm that is robust enough to handle
>> (or at least reject) that scenario, then you will need to check for it. It
>> is not a “hack” but a very common and accepted way of determining the
>> presence of NaN.
>>
>>
>> Just because a hack is commonly accepted or common doesn’t mean it isn’t
>> a hack.
>>
>
> It’s not a “hack.” NaN is required by IEEE fiat not to be equivalent to
> itself. Asking whether a value is reflexively equivalent to itself is
> guaranteed to be sufficient to detect the presence of NaN in all
> IEEE-compliant contexts, which is to say all past, present, and future
> versions of Swift.
>
>
> The problem isn’t Nan != NaN, but the fact that you are comparing T != T
> (where T may not even be a floating point thing) and assuming it means NaN.
>

I’m not assuming that at all; the example I gave was the use case of
generic sorting, where NaN is problematic *because* NaN != NaN. In a
generic context, I would want to detect the case where *any value*, NaN or
not, is not equivalent to itself. NaN is just a good example, because it is
a value of a built-in type that exhibits this behavior.

Using elementsEqual in this way (in a generic context which may or may not
>> involve floats) is definitely a hack, and is likely to bite you at some
>> point.
>>
>
> How is it definitely a hack when it is blessed by IEEE, and how can it
> bite me?
>
>
> Because it may return false for reasons other than NaN.
>

Which, again, is fine. If I cared about NaN specifically, then I must write
a generic algorithm that’s constrained to floating point.

I don’t see why a non-source-breaking change is suddenly off-limits.
>>>>>>
>>>>>> But more than that, any generic algorithm which is assuming that the
>>>>>> sequence is coming from an ordered source (i.e. many things using
>>>>>> first/last).  Some uses of first are ok because the programmer actually
>>>>>> means ‘any’, but anywhere where they actually mean first/last may be
>>>>>> problematic.
>>>>>>
>>>>>
>>>>> Such as...?
>>>>>
>>>>> Currently, there is no way to test for ordered-ness, so there is no
>>>>>> way for even a careful programmer to mitigate this problem.  By adding a
>>>>>> protocol which states that something is unordered, we can either branch on
>>>>>> it, or create a separate version of an algorithm for things which conform.
>>>>>>
>>>>>
>>>>> It is clearly the case that Swift’s protocol hierarchy fits sets and
>>>>> collections imperfectly; however, it is in the nature of modeling that
>>>>> imperfections are present. The question is not whether it is possible to
>>>>> incur performance, API surface area, and other trade-offs to make the model
>>>>> more faithful, but rather whether this usefully solves any problem. What is
>>>>> the problem being mitigated? As I write above, Swift’s Set and Dictionary
>>>>> types meet the semantic requirements for Collection and moonlight as
>>>>> ordered collections. What is a generic algorithm on an ordered collection
>>>>> that is  “not OK” for Set and Dictionary? (“elementsEqual”, as I’ve said,
>>>>> is not such an example.)
>>>>>
>>>>>
>>>>> On the contrary, `elementsEqual` is exactly such an example, because
>>>>> it makes no sense to use it on a Set.
>>>>>
>>>>> let s1 = Set([1,2,3,4,5,6])
>>>>> let s2 = Set([6,5,4,3,2,1])
>>>>>
>>>>> Both sets have different iteration orders. Comparing those sets with
>>>>> some other collection using `elementsEqual` will give no meaningful result
>>>>> because the order - and therefore the result of `elementsEqual` - is in
>>>>> effect random.
>>>>>
>>>>
>>>> No, it is not such an example; it’s misleadingly named but works
>>>> correctly—that is, its behavior matches exactly the documented behavior,
>>>> which relies on only the semantic guarantees of Sequence, which Set
>>>> correctly fulfills.
>>>>
>>>>
>>>> Fulfills to the letter. Again, what can you do with it if the result is
>>>> random??
>>>>
>>>
>>> The result is not random.
>>>
>>>
>>> It is undefined though.  As you said earlier, by the guarantees we have
>>> been given, it may shift over different builds/runs of a program.  Thus in
>>> one run, it might return true and then false in another (without changing
>>> our code).  As Greg pointed out, it is also possible with the guarantees we
>>> are given, for set/dict to have different orderings with copies of
>>> themselves. (It will happen for sure when deep copying a dictionary with
>>> reference-type keys).
>>>
>>
>> As I wrote to Greg, if that is a possibility for Set, then the
>> implementation is problematic for conformance to Collection and needs to be
>> re-examined.
>>
>>
>> So why not fix both things at once with a single change?
>>
>>
>> I’m not sure why you claim the order of a dictionary will definitiely
>> change on deep copying. But in any case, we are not talking about deep
>> copying here, or at least, I’m not; we’re talking about the notional
>> copying involved in the semantics of passing a set as an argument.
>>
>>
>> Because the ordering of some dictionaries depends on the memory address
>> of the key.  There is no way to copy the key in those cases without
>> changing the ordering.  You can watch their ordering shift in the debugger
>> from run to run.
>>
>
> I’m fine with that. Instances of reference types have identity. A deep
> copy of a dictionary *has different keys* from the original. This may not
> be so from the perspective of a custom == operator, but it most certainly
> is so from the perspective of reference type semantics. So to me it is fine
> (correct, even?) that a deep copy of a dictionary has a different iteration
> order—it has different keys.
>
> Again, this is something entirely different from notional copies made in
> the course of passing values as arguments.
>
> As I keep saying, relying on the behavior of a leaking internal
>>> implementation is a bad plan.
>>>
>>
>> Again, it is not a leaking internal implementation. It is a public API
>> guarantee.
>>
>>
>> The public API guarantee is that there will be some ordering that is
>> stable for a particular instance on a particular run (as long as it is not
>> mutated/copied).  The actual ordering is undefined.
>>
>> Any generic code which depends on a particular ordering will have output
>> which is undefined.
>>
>
> Generic code is incorrect that “depends on a particular ordering” because
> it assumes semantics that are not guaranteed.
>
> Just because a function returns a seemingly predictable value doesn’t mean
>> it is entirely deterministic.  There are lots of vulnerabilities in C/C++
>> due to reliance on undefined behavior.
>>
>>
>> We should add an additional guarantee to set/dict that the order returned
>>> will be the same for the same contents regardless of history (but can be
>>> otherwise arbitrary).  That will fix the behavior for algorithms like
>>> elementsEqual (i.e. They will return the same result across builds/runs).
>>> It will also implicitly provide as a result, the constraint you were
>>> arguing is needed across copies of a collection type.  I agree that that is
>>> an important guarantee.  Why not fix both issues with a single
>>> non-source-breaking change?
>>>
>>
>> As I wrote, the behavior of elementsEqual is not broken and does not
>> require fixing.
>>
>>
>> Your argument for that is tautological though.  You could argue for
>> keeping any bug in the codebase using the same reasoning.
>>
>> Why was elementsEqual created?  Isn’t it meant to check equality of two
>> sequences in a generic context where == isn’t available?
>>
>
> No no no no no no no no. That’s precisely why the name is misleading.
> elementsEqual is *not* simply a mixed-type version of ==. Remember that ==
> as implemented on a concrete sequence type *has no obligation* to use
> elementwise comparison using the element’s implementation of ==. This is
> not merely theoretical: [Float].== does *not* do an elementwise comparison
> using Float.==. By contrast, you are guaranteed a true elementwise
> comparison with elementsEqual regardless of how equivalence is defined for
> the sequence.
>
>
> elementsEqual shouldn’t fully imply == of the original type (only that the
> elements are “equivalent in the same order” which is what it means for
> generic sequences to be equivalent).
>
> But ‘==‘ does NEED to imply elementsEqual.  Otherwise you break the
> meaning of ‘=='
>

No, this is not required. For example:

+0.0 == -0.0 but +0.0.sign != -0.0.sign

Equivalence for the purposes of “==“ does *not* require substitutability
for the purposes of all public APIs. The conforming type gets to decide
what aspects are “salient.”

Again, imperfections of modeling and all that.

>
> Isn’t it problematic that two equivalent sets may not be equivalent in a
>> generic context?
>>
>
> Again, this shows why the name is unfortunate. elementsEqual is not
> supposed to be a generic version of ==, and its name is misleading. Repeat:
> elementsEqual is *not* an alternative for == in either the generic or
> concrete context. Neither one implies the other.
>
>
> What are you basing this very strong belief on?  Has the core team stated
> that elementsEqual should not be used to check the equality of sequences?
>

See Michael Ilseman’s reply for more information on the difference between
the two.

> elementsEqual can only provide a notion of equivalence within the generic
> context, of course (and not equivalence of the original types overall), but
> it is still a notion of equivalence within the context.
>

No, it is **not** a notion of equivalence of the sequence type, but of the
elements and their iteration order. Again, elementsEqual is poorly named
because **it is not a test of equivalence of the receiver to the argument
in any context whatsoever.**

As I said above (and in another email), ‘==‘ has to imply elementsEqual.
> Otherwise you break substitutability.
>

Again, substitutability for the purposes of == does *not* require
substitutability with respect to all public APIs.

What is the benefit of the guarantee you propose that justifies the
>> performance cost?
>>
>>
>> The benefit is that generic algorithms which rely on ordering of
>> unordered things would be much more robust.  It would solve your copy issue
>> as well.
>>
>> You are acting like there is a huge performance cost. There isn’t. For
>> Set, it would still have the traditional performance characteristics for
>> Sets. Any slowdown (by un-cutting corners) would be amortized over
>> insertions. (note: I am talking option #3 from my summary here, not #5,
>> which was the one with an actual performance cost)
>>
>
> I see you skipped responding to this.
>

It’s kind of beyond the scope of the discussion. What’s not disputed is
that there is a performance cost. The question as to whether the degree of
performance hit can be justified by the improvement in how “robust” the
order-dependent methods on Set become presumes that it is at all a goal to
make them “robust” in any way; I disagree. The reason Set has
order-dependent methods is that it is a necessary consequence of supporting
sequential iteration in a protocol-oriented way. It is currently documented
that a user should choose Set only if they explicitly do not care about the
order of iteration. I see no need to parse degrees of not caring.

Why is the source-breaking change of renaming things better?
>>>
>>
>> Because, in my analysis, the problem is that the method is incorrectly
>> named. The problem affects all types that conform to Sequence and not just
>> Set and Dictionary; elementsEqual is a distinct function from ==, and it
>> must either continue to be distinct or cease to exist, but its name does
>> nothing to clarify any distinction.
>>
>>
>> It also won’t fix the original problem you mention as motivation.  People
>> will still be surprised at equivalent sets having different iteration
>> orders regardless of what the function is named.
>>
>
> That is not the motivating problem. The motivating problem is that people
> think that elementsEqual is semantically equivalent to ==. It is a distinct
> method that *does not test* equivalence of two sequences.
>
>
> The motivating problem you originally gave was that people are surprised
> when Set([1,2,3]).elementsEqual(Set[3,2,1]) returns false.  You believe it
> is because they don’t get that elementsEqual works sequentially (on
> Sequence),
>

No, I believe it is because people believe that it is a test of equivalence
of two sets, which it is not.

and I say it is because people expect equivalent sets to produce equivalent
> element orders.
>

The documentation for Set makes it clear that users of the type should have
no such expectation.

> Thanks,
> Jon
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20171017/e4b6ca8d/attachment.html>