<div dir="ltr">On Sun, Apr 23, 2017 at 7:54 AM, Dave Abrahams <span dir="ltr"><<a href="mailto:dabrahams@apple.com" target="_blank">dabrahams@apple.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-"><br>
on Sun Apr 23 2017, Xiaodi Wu <<a href="http://xiaodi.wu-AT-gmail.com" rel="noreferrer" target="_blank">xiaodi.wu-AT-gmail.com</a>> wrote:<br>
<br>
> On Sat, Apr 22, 2017 at 11:00 PM, Dave Abrahams <<a href="mailto:dabrahams@apple.com">dabrahams@apple.com</a>> wrote:<br>
><br>
>><br>
>> >> > That is to say, I would expect the standard library to supply an<br>
>> >> > alternative implementation of equality for Array<T where T :<br>
>> >> > FloatingPoint>.<br>
>> >><br>
>> >> And also for Dictionary? What do you expect to happen when Double is<br>
>> >> used as a dictionary key and it happens to be NaN?<br>
>> ><br>
>> > The proposal is very clear that `Dictionary` and `sort` will always use<br>
>> > level 2 comparison.<br>
>><br>
>> Yes, but I'm not asking about my own proposal :-). My question is, what<br>
>> are *your* expectations for dictionaries or sets, specifically for the<br>
>> insertion of NaN keys and for comparison of whole collections?<br>
><br>
> My expectations for comparison of collections clearly differ from<br>
> yours.<br>
<br>
</span>Maybe, in that you *have* strong expectations when it comes to FP. I'm<br>
still trying to figure out what mine should be.<br>
<span class="gmail-"><br>
> For me, I think it is a reasonable expectation that, for all `x` and<br>
> `y` where `x == y`, `[x] == `[y]`, `[[x]] == [[y]]`, `[x, x, x] == [y,<br>
> y, y]`, etc.<br>
<br>
</span>It seems pretty obvious when you put it that way. Here's an interesting<br>
question: how much of that is because of the way array literals let you<br>
“see through” the value being compared to its elements?<br></blockquote><div><br></div><div>I don't think it's the visual representation. More to do with expectations as to what an array is.</div><div><br></div><div>If `a == b`, then `thingWrappingA == thingWrappingB`, for values of "thing" that are "wrapper-like" (I will blithely leave the term undefined) rather than, say, stateful class instances.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
Also, does the same go for != ? That's telling w.r.t. NaN.<span class="gmail-"><br></span></blockquote><div><br></div><div>Yes, I think so. I'd have to think harder to be more sure of that.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
> To put it more strongly, I think that anything short of that is rather<br>
> inexplicable.<br>
<br>
</span>That's somewhat mitigated if we have to accept x != x (i.e. for NaN).<br>
<span class="gmail-"><br>
> With respect to dictionaries keys or sets, it's a subtly different<br>
> question, I think. When it comes to questions like "does this dictionary<br>
> have this key" or "does this set already have this element," I think what<br>
> the user is really asking about is a closer approximation to identity than<br>
> to equality. I get that this gets complicated if we're talking about sets<br>
> of class instances, so let me move away from the word identity and<br>
> re-phrase.<br>
><br>
> If a user asks if a set (or even an array) contains something, it's<br>
> not exactly identical to asking if a set contains an element _equal<br>
> to_ something. See:<br>
><br>
> * "Does the set of cars in your parking lot contain my car?" =><br>
> `parkingLot.contains(myCar)`<br>
> * "Does the set of cars in your parking lot contain a car that is equal to<br>
> my car?" => `parkingLot.contains { $0 == myCar }`<br>
<br>
</span>I don't think this example is a good illustration of it, but I am<br>
familiar with the principle.<br>
<span class="gmail-"><br>
> Put another way, one can simultaneously hold the thought that (1) a thing<br>
> is itself, obviously; but (2) a thing is not necessarily _equal to_ itself.<br>
<br>
</span> x === y but x != y<br>
<br>
?<br>
<br>
IMO it may be possible, but not without head explosion. Sure, it's the<br>
case for NaN, but that's just because of how IEEE defined it; there is<br>
nothing in daily experience that acts that way. IMO it is much easier<br>
to understand x == y but x !== y, that is, “these two forks are<br>
identical, but they're not the same fork.”</blockquote><div><br></div><div>Yet both these scenarios really do happen with stdlib types.</div><div><br></div><div>I'm of the strong belief that it's only potentially head-exploding for some users because the language tries to paper over scenarios where it might happen and doesn't provide a way even to express this idea in code ("Binary operator '===' cannot be applied to two 'Int' operands").</div><div><br></div><div>If, as I suggest, we put front and center the idea that _some comparisons might fail or return nil_, the resulting design is not in itself guaranteed to explode heads--I think.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">
>> >> >> >> This is a bump in the rug – push it down in one place, it pops up<br>
>> >> >> >> in another. I feel like this proposal at least moves the bump to<br>
>> >> >> >> where<br>
>> >> >> fewer<br>
>> >> >> >> people will trip over it. I think it highly likely that the<br>
>> >> >> intersection of<br>
>> >> >> >> developers who understand enough about floating point to write<br>
>> truly<br>
>> >> >> >> correct concrete code, but won’t know about or discover the<br>
>> >> documented<br>
>> >> >> >> difference in generic code, is far smaller than the set of people<br>
>> who<br>
>> >> >> hit<br>
>> >> >> >> problems with the existing behavior.<br>
>> >> >> >><br>
>> >> >> ><br>
>> >> >> > So, to extend this analogy, I'd rather say that the bump is not in<br>
>> the<br>
>> >> >> rug<br>
>> >> >> > [Comparable] but rather in a section of the floor [FP NaN]. The rug<br>
>> >> might<br>
>> >> >> > overlie the bump, but the bump will always be there and people will<br>
>> >> find<br>
>> >> >> it<br>
>> >> >> > as they walk even if they don't immediately see it.<br>
>> >> >><br>
>> >> >> Correct.<br>
>> >> >><br>
>> >> >> > If we don't want people to trip over the bump while walking on the<br>
>> >> >> > rug, one very good alternative, IMHO, is to shape the rug so that<br>
>> it<br>
>> >> >> > doesn't cover the bump.<br>
>> >> >><br>
>> >> >> At what cost?<br>
>> >> >><br>
>> >> >> More specifically: why is it the right behavior, for our audience, to<br>
>> >> >> trap when Equatable comparison happens to encounter NaN? Will this<br>
>> not<br>
>> >> >> simply "crash" programs in the field that otherwise would have "just<br>
>> >> >> worked?"<br>
>> >> ><br>
>> >> > No, as I propose it, programs in the field would be automatically<br>
>> >> migrated<br>
>> >> > to an alternative set of comparison operators `&==`, `&<`, etc. that<br>
>> >> would<br>
>> >> > work exactly as `==`, `<`, etc. do today.<br>
>> >><br>
>> >> Meaning, for floating point NaN &== NaN is false, and if you want to<br>
>> >> write numeric code that accounts for NaN, you use &==.<br>
>> >><br>
>> >> OK, so... Is &== a protocol requirement, or a protocol extension, or<br>
>> >> neither? If so, to which protocol is it attached?<br>
>> ><br>
>> > Please allow me to refer you to a Gist:<br>
>> > <a href="https://gist.github.com/xwu/e864ffdf343160a8a26839388f677768" rel="noreferrer" target="_blank">https://gist.github.com/xwu/<wbr>e864ffdf343160a8a26839388f6777<wbr>68</a><br>
>> ><br>
>> > In brief, it would be a protocol requirement on Comparable with a default<br>
>> > implementation. The rationale for its being on Comparable is given in the<br>
>> > text.<br>
>><br>
>> I'm sorry, I've seen your gist; I still can't find this rationale.<br>
>><br>
><br>
> "These operators will be defined on Comparable and not on FloatingPoint. A<br>
> key rationale for this design is to permit types other than FloatingPoint,<br>
> including third-party types, to distinguish between signaling and quiet<br>
> comparison of values when some values may be unordered with respect to<br>
> others. (Another rationale for this design is that &< corresponds to what<br>
> is currently spelled as <, which can be used as a predicate for<br>
> Sequence.sorted.)"<br>
<br>
</div></div>Thanks.<br>
<div><div class="gmail-h5"><br>
>> > I am not married to its being a requirement vs. an extension, but my<br>
>> > initial thought here is that there might be reason to provide an<br>
>> > alternative implementation in a conforming type, say for performance<br>
>> > reasons on Float.<br>
>> ><br>
>> >> > I would quibble with the notion that all such generic algorithms<br>
>> >> > currently "just work,"<br>
>> >><br>
>> >> I never claimed they do! They don't, because Equatable.== for floating<br>
>> >> point is not an equivalence relation. That's part of what we aim to<br>
>> >> fix.<br>
>> >><br>
>> >> You are proposing to fix that same problem a different way, one that<br>
>> leaves<br>
>> >> NaNs a bit out-in-the-cold (not necessarily bad), but also explicitly<br>
>> >> modifies generic algorithms so they continue to silently produce<br>
>> >> unspecified results (bad!)<br>
>> ><br>
>> > To clarify, no, I would not have the stdlib's generic algorithms continue<br>
>> > to produce unspecified results. I propose changes to them which align<br>
>> their<br>
>> > behavior with what you and Ben have proposed.<br>
>><br>
>> OK, I guess I must've misunderstood your earlier statements.<br>
>><br>
>> So this, IMO, is not tenable, at least in its current form:<br>
>><br>
>> ,----<br>
>> | The default implementation for contains, elementsEqual, split, and<br>
>> | starts(with:) on Sequence where Iterator.Element : Equatable, and for<br>
>> | index(of:) on Collection where Iterator.Element : Equatable, will<br>
>> | (notionally) use the following predicate:<br>
>> |<br>
>> | {<br>
>> | ($0 &== $1) || (<br>
>> | ($0 <=> $1) == .none &&<br>
>> | ($0 <=> $0) == .none &&<br>
>> | ($1 <=> $1) == .none)<br>
>> | }<br>
>> `----<br>
>><br>
>> The first problem here is that I can't figure out what this means and I<br>
>> doubt any normal user could either.<br>
><br>
> I think it is a little ill-served by the notation. However the concept is<br>
> simple. Sometimes, a thing cannot be compared even to itself.<br>
<br>
</div></div>As noted above, I disagree that that is a simple concept. Is it even<br>
mathematically well-founded? What you are expressing, at least on the<br>
surface, *seems* to be different from the mathematical notion of<br>
incomparability, which—despite its confusing name—is actually simple<br>
<<a href="https://en.wikipedia.org/wiki/Comparability" rel="noreferrer" target="_blank">https://en.wikipedia.org/<wbr>wiki/Comparability</a>>: x is incomparable to y if<br>
neither x > y nor x < y. The most common case is when x == y.</blockquote><div><br></div><div>Hmm, not an expert, but I don't see that definition. MathWorld <<a href="http://mathworld.wolfram.com/ComparableElements.html">http://mathworld.wolfram.com/ComparableElements.html</a>> confirms that the Wikipedia formulation is correct:</div><div><br></div><div>x is incomparable to y if neither x ***>=*** y nor x ***<=*** y.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">In a<br>
strict-weak ordering, every element is incomparable with itself. The<br>
complexity of your predicate suggests that you mean something much more<br>
complicated. Is there any reason not to require that when (x <=> y) ==<br>
nil, (x <=> x) == nil? That would simplify things a lot.<br>
<br>
Also, IMO it is confusing to say, “cannot be compared to itself,” when<br>
in practice that means “you can compare it to itself, and you get nil.”<br>
This is as much a result of comparison as any other.</blockquote><div><br></div><div>Well, x is said to be comparable to x if either x <= x or x >= x. Otherwise, x and x are incomparable.</div><div><br></div><div>I propose we model this by returning a result of type `Comparison?`, such that this scenario produces a nil value rather than a `Comparison` case. In that sense, one might say that there is no comparison result.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">
> The prime example we speak of is NaN != NaN. This is the only major<br>
> concept that the design I propose here would require a user to<br>
> understand.<br>
><br>
> In this notation, if `x` cannot be compared to itself, `x <=> x` is nil.<br>
> For `contains` and other methods that are asking "do you have a thing" more<br>
> than they are asking "do you have a thing that is equivalent to a thing,"<br>
> we'll regard all values that can't be compared even to themselves as "the<br>
> same"; therefore, if `x <=> x` is nil and `y <=> y` is nil, then a<br>
> collection with an element `x` will be regarded as "containing" `y`.<br>
><br>
> How would this change in semantics<br>
>> be reflected in the documentation for these algorithms? How would you<br>
>> describe their requirements and results? All these algorithms currently<br>
>> have simple, understandable descriptions.<br>
>><br>
><br>
> The description would be augmented by an explanation along the lines of<br>
> this: "Some types can represent values that do not compare equal to<br>
> themselves; one example is floating point NaN ("not a number"). For the<br>
> purposes of { contains | split | etc. }, every value not equal to itself is<br>
> considered indistinguishable from every other value not equal to itself."<br>
><br>
>> Secondarily, I can't expect any generic algorithm author to understand<br>
>> what he's getting with this.<br>
>><br>
><br>
> Obviously, we'd have to have real-world data to back it up, but the beauty<br>
> of `Comparison?` being an optional enum is that all an author has to do is<br>
> handle all the cases. The compiler can even help with that.<br>
> In other words, all your generic algorithm author has to do is to<br>
> decide *something* for the question, "What should I do when x <=> y<br>
> returns nil?"<br>
<br>
</div></div>Is that an easy question to answer? It doesn't look that way, to me.</blockquote><div><br></div><div>For behaviors such as "always sort NaN greater than everything else" or "always sort NaN less than everything else," the way to answer the question would be to inspect whether `x` is comparable to `x` and/or `y` is comparable to `y`.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">
><br>
><br>
>> > Any automatically migrated third-party generic code would indeed<br>
>> > continue to exhibit the same behavior as in Swift 3--but not only<br>
>> > do I not consider that to be a problem, I consider it to be a<br>
>> > requirement of source compatibility which is absolutely essential.<br>
>><br>
>> Well, we have not promised to preserve unspecified behaviors, so<br>
>> technically we can handle this either way. And we all seem to be agreed<br>
>> that any code that uses, e.g., the default sort(), *will* change its<br>
>> behavior with respect to NaN. So this is really a matter of degree.<br>
>><br>
>> > It would not, however, be _invisible_ to the reader of the generic<br>
>> > algorithm. The use of my proposed `&==` in a generic context should<br>
>> > stand out and prompt re-evaluation. That is to say, by using a<br>
>> > different spelling, we would have a visible hint in the code that a<br>
>> > generic algorithm may produce unspecified results with NaN.<br>
>><br>
>> That's a very nice feature.<br>
>><br>
>> >>> but the result is that they would behave exactly as they do today and<br>
>> >>> therefore would at least be no more broken.<br>
>> >><br>
>> >> If that's all we acheive, we should do nothing.<br>
>> ><br>
>> > I should hope that it's not all we achieve. But, consider the<br>
>> > following two alternatives: migrated code exhibits identical<br>
>> > behavior to Swift 3, or migrated code silently exhibits different<br>
>> > behavior that is "fixed."<br>
>><br>
>> I think we're both agreed that the latter *will* happen with<br>
>><br>
>> sort(floatsContainingNaN)<br>
>><br>
>> or<br>
>><br>
>> floatsContainingNaN.contains(.<wbr>NaN)<br>
><br>
> Well, I do not agree that `[+0.0].contains(-0.0)` should return `false`,<br>
> but we can discuss that on the side.<br>
<br>
</div></div>I'm not wedded to any particular answer there. The only reason I think<br>
we proposed it is that it corresponds to an IEEE comparison level.<br>
<span class="gmail-"><br>
> Otherwise, the key difference here is that code that's _correctly written_<br>
> for FP *would never use* a statement like<br>
> `floatsContainingNaN.contains(<wbr>.nan)` because with its current behavior it'd<br>
> be equivalent to `false` (or at least, it's not reliably `true`). The same<br>
> cannot be said for `Array<Double>.==`, which can be profitably used when<br>
> the user knows that `[.nan] != [.nan]`.<br>
<br>
</span>Okay, take elementsEqual then. I don't see why array1 == array2 is any<br>
different from array1.elementsEqual(array2).<br></blockquote><div><br></div><div>Indeed, `elementsEqual` requires special consideration. I had the same thought while writing the Gist and commented; "One consequence of this design is that `[Double.nan].contains(.nan)` returns true, as ordinarily expected. Consideration may be given to improving the naming of `elementsEqual` so as not to suggest that all elements of the receiver and argument literally compare `.some(.equal)`."</div><div><br></div><div>I am not suggesting this particular name, but `array1.elementsIndistinguishable(from: array2)` expresses something of the idea.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div class="gmail-h5">
>> > I am very disturbed by the possibility of the latter. It is the<br>
>> > only part of this proposal that keeps me up at night.<br>
>><br>
>> I agree it's concerning, but I don't think fixing just part of this<br>
>> problem is going to help either ;-).<br>
>><br>
>> > As it turns out, some people really do understand how floating<br>
>> > point comparison works, and they might have even carefully written<br>
>> > code that behaves correctly, relying on the current behavior when<br>
>> > things are compared. Please don't "fix" that code. If an array of<br>
>> > type [Float] starts to distinguish between +0.0 and -0.0 as you<br>
>> > propose, I'm quite sure that there is at least some code of my own<br>
>> > that will be quite broken.<br>
>><br>
>> yep, sadly. But IIUC that's inevitable.<br>
>><br>
>> >> > Standard library changes to `sort` and other functions will make them<br>
>> >> > "just work" with no distinguishable difference to the end user as<br>
>> >> > compared to this proposal here.<br>
>> >><br>
>> >> I'm sorry, I don't know what "this proposal here" means. Is that yours<br>
>> >> or the one Ben and I offered? It's certainly different from the results<br>
>> >> of our proposal.<br>
>> >><br>
>> >> The big problem with our proposal, AFAICT, is that<br>
>> >><br>
>> >> floatsIncludingNaNs.sort()<br>
>> >><br>
>> >> works but<br>
>> >><br>
>> >> floatsIncludingNaNs.sort(>)<br>
>> >><br>
>> >> does not. That is a real problem, but it *is* a difference from the<br>
>> >> current behavior, where neither one works.<br>
>> >><br>
>> ><br>
>> > Hmm, I get the sense that some of my replies to you have been lost. I<br>
>> have<br>
>> > explicitly proposed a design where `floatsIncludingNaNs.sort()` produces<br>
>> > the same behavior as what is proposed by you and Ben. I'd like to refer<br>
>> you<br>
>> > again to the fleshed out Gist:<br>
>> ><br>
>> > <a href="https://gist.github.com/xwu/e864ffdf343160a8a26839388f677768" rel="noreferrer" target="_blank">https://gist.github.com/xwu/<wbr>e864ffdf343160a8a26839388f6777<wbr>68</a><br>
>><br>
>> Sorry I misread your earlier remarks. I don't think I missed them.<br>
>><br>
>> >> > It would be an improvement over how the algorithms work today with<br>
>> >> > NaN.<br>
>> >> ><br>
>> >> > The major difference to the end user between what I propose and<br>
>> >> > this proposal here will surface when _new_ code is written that<br>
>> >> > uses `==` in the generic context, when working with types whose<br>
>> >> > values may compare unordered. Since I propose `<=>` to return a<br>
>> >> > value of type `Comparison?`, using the revised operator `==` is an<br>
>> >> > assertion that the result of comparison is not unordered. A user is<br>
>> >> > welcome to use `&==` or a custom predicate if that is not their<br>
>> >> > intention.<br>
>> >><br>
>> >> The problem with this is that there's still no simple way to get an<br>
>> ^^^^^^<br>
>> >> equivalence relation or a total order over all Doubles, including NaNs.<br>
>> ><br>
>> > There is. Given two values x and y, `x &< y || (y <=> y) == nil` is<br>
>> > identical to the `<` that you propose.<br>
>><br>
>> Err, I rest my case?<br>
>><br>
><br>
> It'd be easier with some more syntactic sugar, I guess.<br>
<br>
</div></div>I don't think so. The problem, IMO, is that you're expressing something<br>
hard to understand that falls outside anyone's experience (except maybe<br>
their experience with NaN).<br>
<span class="gmail-"><br>
> But my point above is that it is not difficult to describe what is<br>
> happening in prose. Here, simply, if a value is not equal to itself,<br>
> then it's ordered after all values that are equal to themselves.<br>
><br>
>> >> Now, I'm totally willing to have the discussion about how NaNs have no<br>
>> >> business being used as dictionary keys, or sort keys, or searched for,<br>
>> >> or any of the other things we do with day-to-day values. That's not<br>
>> >> something I really have an opinion on, yet.<br>
>> ><br>
>> > I would not assert that NaN has no business being used here; again, my<br>
>> > alternative design accommodates all of these use cases.<br>
>> ><br>
>> > Where we differ is that, in the case of a generic algorithm, my<br>
>> > alternative design would result in the author of that algorithm either<br>
>> > explicitly accommodating the presence of unordered values or asserting<br>
>> > their absence.<br>
>><br>
>> That is a nice property, except that IIUC you're talking about granting<br>
>> an exception for the most common forms of algorithms.<br>
><br>
> Not that I'm aware of? All algorithms will need to be modified to deal with<br>
> what happens if `(x <=> y) == nil`. I've just listed the ways in which<br>
> stdlib functions will do so.<br>
<br>
</span>An algorithm written in terms of any of the algorithms whose semantics<br>
you're talking about adjusting (sort, contains, elementsEqual...) will<br>
pick up that algorithm's decision without its author making any explicit<br>
choice.</blockquote><div><br></div><div>Sure. However, consider that `sort`, `contains`, `elementsEqual` all have corresponding `sort(by:)`, `contains(where:)`, `elementsEqual(_:by:)`. When I write `sort()`, I'm also saying, "Let the stdlib provide me with a default sort." I don't think that anyone who writes `sort()` is ignorant of the fact that they must accept a default predicate since they are _choosing_ not to specify one themselves.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
>> Aside: I object to characterizing these things as unordered.<br>
><br>
> Sorry, just using the terminology I read about. For instance, IEEE says<br>
> that NaN compares _unordered_ to itself.<br>
><br>
>> For the purposes of the algorithms that are written to accomodate<br>
>> them, they must be very much ordered, if not with respect to each<br>
>> other, at *least* with respect to other values in the space. In<br>
>> other words, there needs to be an underlying strict-weak order or the<br>
>> algorithms won't work.<br>
><br>
> Not a fixed one, though, unless I'm misunderstanding? It would be trivial<br>
> in my design for `sort` to order NaN values to the _end of the array_, as<br>
> opposed to ordering them greater than all other values, with the result<br>
> that NaN comes last whether you sort ascending or descending. Not that I'm<br>
> proposing this behavior, but I don't *think* that it breaks anything.<br>
<br>
</span>Well, that's just like saying you can sort with different predicates.<br>
<br>
I'm not sure it's OK to break the invariant that, in the absence of<br>
equal elements,<br>
<br>
x.sorted(by: <).elementsEqual(x.sorted(by: >).reversed())<span class="gmail-"><br></span></blockquote><div><br></div><div>Aha. A harebrained idea strikes: `elementsEqual` could ignore NaN.</div><div><br></div><div>[.nan, .nan, 42, 42].elementsEqual([42, .nan, 42, .nan]) // true</div><div><br></div><div>Just thinking out loud; not seriously proposing it. I do not have a problem with observing the invariant you state by sorting all NaN values always greater than or less than non-NaN values.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
> Put another way, as far as I can tell, values like NaN only need to have a<br>
> specified sequence with respect to other values _for the purposes of any<br>
> particular operation at hand_. Therefore, I've proposed a design where it's<br>
> the generic algorithm and not the type that makes the decision for how<br>
> these values are placed in sequence.<br>
<br>
</span>Is that a known-useful property, or just a flexibility that is plausibly<br>
useful, someday?<span class="gmail-"><br></span></blockquote><div><br></div><div>This is what permits a generic `Comparable.max` and `Comparable.min` to both avoid spitting out NaN.</div><div><br></div><div>It also enables potentially more advanced algorithms: a partially ordered set S can have more than one maximal element; a mathematically rigorous algorithm could be provided (in a third-party library, say) to return all of them. Any such work in this domain requires the baseline Comparable protocol to recognize this concept of incomparability. It cannot be retroactively bolted on by a third party.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
>> It is not an avoidable problem--this is the bump in the rug that<br>
>> > cannot be smoothed out.<br>
>> ><br>
>> > I would posit that it is not possible to write an arbitrary generic<br>
>> > algorithm that (a) compares floating point values; (b) doesn't account<br>
>> for<br>
>> > NaN; and (c) behaves correctly, where correctly here means that it<br>
>> returns<br>
>> > what an average user would expect who is not thinking of floating point<br>
>> > comparison foibles.<br>
>><br>
>> Existence proof: any algorithm that, internally, uses binary search over<br>
>> a sorted collection of Comparabble values, or stores Comparable values<br>
>> in a tree-based dictionary, but does not care exactly how two distinct<br>
>> values sort, will work just fine<br>
>><br>
>> ...provided average users expect the existence of a distinct -0.0<br>
>> ;-)<br>
>><br>
><br>
> Ha, that's not what I was trying to get at. I fully expect that there will<br>
> be _some_ algorithms that will work out in this way. But, it is not<br>
> possible to say that any particular design of Comparable will smooth over<br>
> wrinkles with NaN such that generic algorithms in general can ignore the<br>
> possibility of NaN and yet handle them "correctly."<br>
<br>
</span>I guess if I agree that min and max should never return NaN unless<br>
that's the only value in the sequence, I have to agree with that<br>
statement.<br>
<span class="gmail-"><br>
>> > For instance, generic `max` produces what to the average user is<br>
>> > nonsense if NaN compares greater than everything.<br>
>> ><br>
>> >> I am, however, concerned that ordinary valid computations can lead to<br>
>> >> NaN and that allowing the appearance of a NaN to turn into a trap<br>
>> >> much later in the program, where it is finally compared with<br>
>> >> something, is not a behavior that would work for ordinary users.<br>
>><br>
>> That is my central worry with anything that turns operations on NaN into<br>
>> a trap. I'd very much appreciate hearing your thoughts.<br>
>><br>
><br>
> There are a limited number of ordinary operations that actually generate<br>
> NaN.<br>
><br>
> If using arithmetic operators, there's 0/0 (which, as everyone already<br>
> knows, traps if you're doing integer math--you'll notice I lobbied to<br>
> remove `/` from protocols refined by both Integer and FloatingPoint, so now<br>
> it's not even possible to accidentally do this in generic code unless you<br>
> unwisely make your own `Divisible` protocol).<br>
<br>
</span>Thanks for that. But people *will* divide Doubles outside of a generic<br>
context, and store the results. It happens a *lot*.<span class="gmail-"><br></span></blockquote><div><br></div><div>Sure. Is your principal worry that people will run into this behavior while writing simple code in the context of using concrete Doubles? It is an inescapable possibility with trapping `==`. It is also clearly different from precedent in a lot of languages to name the default FP comparison operator `&==`. That said, conceptually it seems similar to making `+` trap on overflow. Were the designers of that feature simply not worried about people adding lots of Int8 values?</div><div><br></div><div>Anyway, I look forward to Steve's expanded thoughts on this particular point.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
> Other than that, you'd have to be already working with infinite values and<br>
> the arithmetic operators, or you'd have to invoke some of the trigonometric<br>
> and transcendental functions--but those are specific to floating point and<br>
> not accounting for NaN would be pretty silly there.<br>
><br>
> In general, I really think there's something valuable to trapping when you<br>
> try to propagate NaN through code that doesn't expect it.<br>
<br>
</span>Oh, I agree. The problem is that nobody is talking about stopping<br>
propagation... unless you happen to compare the value somewhere along<br>
the way.<br></blockquote><div><br></div><div>How convenient that (quoting from Wikipedia) it is recorded for all the world that "[f]loating-point operations **other than ordered comparisons** normally propagate a quiet NaN" [emphasis mine]. It is true that the IEEE default is "non-stop" exception handling; the consequence of a design such as mine would be that such IEEE floating-point defaults would need their own distinct notation.</div><div><br></div><div>Simply, all of the following cannot be true:</div><div><br></div><div>(a) `==` gives the same answer in all contexts (i.e., if `a == b` in generic contexts, then `a == b` in non-generic contexts)</div><div>(b) `==` behaves according to IEEE default "non-stop" exception handling rules for FP values and gives the IEEE-prescribed result</div><div>(c) `==` is useful for a generic algorithm that works with both FP and non-FP values and avoids unspecified behavior with FP values</div><div><br></div><div>Currently, (a) and (b) are both true. I propose (a) and (c), providing (b) with a different spelling. You propose (b) and (c).</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
> After all, _something_ has gone wrong if you're copying "NaN GB" of<br>
> data, or you're at "NaN%" progress on a task. And those are innocuous<br>
> examples because it's probable that the value is being used for user<br>
> interface display and not much else. With other uses, certainly<br>
> nothing good can come of an unaccounted-for NaN.<br>
<br>
</span>Now, wait just a darn-tootin' minute there, pardner. Well, if that were<br>
strictly true, we'd just make them always trap, right? Non-signaling<br>
NaNs exist because somebody thought it was “good” to be able to finish a<br>
calculation and get *some* results out of it even if other parts of the<br>
results come from unaccounted-for nonsense.<span class="gmail-"><br></span></blockquote><div><br></div><div>Hmm, I have always assumed that the most pressing motivation is to permit you to compute `sin(cos(tan(atan(acos(asin(x))))))` and deal with NaN at the end instead of checking after every function call, which would make writing out a mathematical formula like this quite untenable.</div><div><br></div><div>It is true that by allowing propagation, you can serialize NaN into some long-term storage format for arbitrary periods of time before dealing with it, but my understanding is that it's actually recommended for placeholders in deserialized data to be __signaling NaNs__. The standard is pretty terse on the topic (unless I'm missing some text), but writes: "Signaling NaNs afford representations for uninitialized variables and arithmetic-like enhancements (such as complex-affine infinities or extremely wide range) that are not in the scope of this standard."</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
> Is it really the correct thing to supply some "default" answer,<br>
> however explainable, when a user unintentionally asks, "Is my number x<br>
> less than not-a-number?" As a healthcare provider, would it be OK for<br>
> me to prescribe NaN doses of medication? An EMR written in Swift might<br>
> tell me it's not less than the minimum dose!<br>
<br>
</span>Well, I'm afraid you haven't really addressed my concern, which is that<br>
NaNs may propagate a long way before we find out they have appeared, if<br>
we find out at all, and because of that propagation, trapping might be<br>
worse than continuing.<br></blockquote><div><br></div><div>I think that this concern would have to be addressed with either empirical data or expertise more than I can offer. It seems that you're arguing that no function should ever halt execution when it encounters NaN.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="gmail-">
>> Again, I'm concerned that NaNs will arise as the result of computations<br>
>> involving external inputs, and that programs that would otherwise<br>
>> harmlessly propagate NaN values (just as non-signaling NaN was<br>
>> designed!) will trap in the field, very far from the original source of<br>
>> the problem, which means bugs will be very hard to correct.<br>
>><br>
><br>
> Is it that common to get NaN as a result of external inputs?<br>
<br>
</span>I honestly don't know.<br>
<span class="gmail-"><br>
> JSON, for example, doesn't even permit NaN.<br>
<br>
</span>Yeah, but you only need to encode zero in your JSON to produce a NaN<br>
from a simple division.<br>
<span class="gmail-"><br>
> I agree that it is possible to propagate NaN in a useful way, and<br>
> indeed I would propose to expand the option to propagate values that<br>
> are unordered with respect to themselves by having operators defined<br>
> on Comparable. However, my disagreement with you here is that we<br>
> should not assume that _unintentional_ propagation is _harmless_<br>
> propagation.<br>
<br>
</span>I didn't say it was harmless. I'm just not sure whether it's as harmful<br>
as trapping far from the source of the problem would be. I honestly<br>
don't know the answers here. I've used very little FP in the<br>
applicatiopn programming I've done, so I don't have a good read on what<br>
goes wrong for people.</blockquote><div><br></div><div>In my limited experience, generally what goes wrong is that feeding NaN into algorithms often leads to nonsensical results. Other than simply displaying the result, if you want to actually use it to do interesting work, you end up taking decisions you do not really intend on the basis of those results.</div></div></div></div>