[swift-evolution] Partial Ranges (was: Strings in Swift 4)

Sun Feb 5 12:16:09 CST 2017

on Sun Feb 05 2017, Jonathan Hull <jhull-AT-gbis.com> wrote:

>>>>> Just out of curiosity, what are the use-cases for an infinite sequence
>>>>> (as opposed to a sequence which is bounded to the type’s representable
>>>>> values)?
>>>> 
>>>> 1. The type may not have an inherent expressible bound (see BigInt,
>
>>>>  UnsafePointer, and *many* real-life Index types).
>>> 
>>> If I understand what you are saying, this is why I was arguing that
>>> these partial ranges should not, by themselves, conform to Sequence.
>>> In cases where you do have a natural expressible bound, then it should
>>> conditionally conform to sequence.  
>> 
>> I'm sorry, I don't understand that last sentence.  The only way to
>> conditionally conform in one particular case is to make the conformance
>> condition include detection of that case. I could guess at what you mean
>> but I'd rather you explain yourself.  Specifically, which partial ranges
>> should conform to Sequence, and which shouldn't, in your view?
>
> Oh, sorry, I was referring to an earlier post I made in this thread,
> but didn’t make that clear.  I’ll give a longer explanation of my
> reasoning behind the statement that partial ranges shouldn’t conform
> to Sequence (on their own) below... but the short version is that the
> conformance, like cake, is a lie... and we are using the trap to lie
> to ourselves about it.

*If* the trap is a lie (and I'm not saying it is), it's a lie we tell
ourselves about Int, because practically speaking, most people can't
program correctly always keeping in mind that Int has finite expressive
range.  This is a conscious decision made by the Swift language a long
time ago.  It is *not* a lie about ranges, but if you make a range based
on Int it will contain a lie, just like everything else that uses Int.

>>> In other cases, you should have to write that conformance yourself if
>>> you want it.
>> 
>> You've asserted that things should be a certain way (which I don't fully
>> understand but I hope you'll explain), but you haven't said anything to
>> clarify *why* you think they should be that way.
>
> I believe that the ill-defineness/fuzziness of this issue that others
> have eluded to and the desire to trap is telling us that we are
> fighting the underlying structure of the types.  

There's no “desire to trap.”  Ideally memory and cycles would be free
and unlimited, and we'd use infinite precision integers everywhere, and
never run out of memory.  

> I get that trapping can save us from an entire class of nasty bugs,
> but I also believe that they should be a last resort in design because
> they have a tendency to let us hide subtle design mistakes from
> ourselves.

We definitely try to avoid introducing situations that can trap where
they can better be prevented by the type system.

> If we continue down this path, we will find more and more edge cases
> that need to be papered over, until the design eventually needs to be
> scrapped or buttressed with compiler magic.  For example, let’s say we
> define a PartialRange as a simple struct with optional endpoints, then
> we conform it to Sequence.  Now we can say ’n…’ and get a sequence.
> But what happens when we say ‘…n’?  

You don't get a Sequence.

> Where does that sequence start if it has an implicit start point of
> negative infinity?  Well, we promised a Sequence, so our only real
> option is to trap (for some types anyway).  For UInt, ‘…n’ is a
> perfectly valid sequence (though we could trap on that as well for
> consistency sake and say that you have to say '0…n’).  We are using
> trapping to let us ignore the fact that we are promising something
> (conformance to Sequence) that isn’t really true...

This is a red herring.  Nobody has proposed making ...n a Sequence.

> In this case, I believe we are trying to shoehorn a number of
> different fuzzily defined requirements that almost, but don’t quite
> fit together:
>
> 1) RangeExpressions should work where Ranges work now
> 2) We want to be able to slice a collection using ’n…’ and ‘…n’, and
>    the implicit boundary should be filled in by the collection
> 3) We want to be able to create a sequence from ’n…’ that can be
>    zipped and iterated over
>
> Looking at our requirements above, let’s start with #1.  Ranges (which
> are countable) conform to Collection.  We keep saying we need these to
> conform to Sequence (because of requirement 3), but I think what we
> really want/mean in a lot of cases is conformance to Collection.  

I have always said they should be Collections, but since Collection
refines Sequence I'm just as happy to argue the point with anyone who
says they shouldn't be Sequences.

> The problem is that we don’t quite have enough information available
> to do that, so PartialRanges cannot be used in generic functions on
> Collection.  This is exposing a weakness of our Sequence/Collection
> protocols around infinite Sequences/Collections that we talked about a
> few months back, but ran out of time to solve (and I didn’t have
> bandwidth at the time to work on it).

We had several discussions about infinite collections last year.  My
conclusion was that they were useful and there was no reason to prohibit
them, so we should lift the reachability requirement for endIndex.

> As others have pointed out, 2 & 3 are similar, but have slightly
> different semantics.  In the case of 2, we have an implicit boundary
> being filled in: ’n…’ means a range from ’n’ up to that boundary. 

Presumably, the type of n needs to have a representation for “infinity”
in some cases, because after all, given `x = n...`,
`x[x.startIndex...x.endIndex]` must be an infinite range.

> We have explicitly stated we don’t want it to trap here, because that
> would make it useless.  

Traps are the response to precondition violations.  There's no reasonable
precondition this could violate, because we simply wouldn't provide the
API if there was.

> In the case of 3, we are giving it a different meaning: ’n…’ means
> starting with n, count higher until you run out of numbers… then,
> crash.  

No, “n...” does not mean that.  There's absolutely no counting until you
combine it with “for...in” in the same way that “n...” doesn't mean
you're going to slice anything until you combine it with a a
collection's subscript.

> Here you are explicitly asking for trapping behavior.  

Definitely not, IMO.  You're asking for an infinite loop, to within the
machine's representational ability.

> These appear very similar at first glance, but need to be thought
> about subtly differently. For example, we have to be extremely careful
> using #3 with functional map/filter/reduce, 

Yes, you need to use lazy algorithms if you don't want your program to
stop.

> and things like ‘count’ would instantly crash (how do you count an
> infinity?).

Whether it's instant is a matter of QOI.

> We need to be really careful with #3 in general, because it breaks so
> many of the implicit guarantees around Collection

What implicit guarantees are you saying Collection gives?

> (thus fulfilling #3, stops us from fulfilling #1). As I showed above,
> it also breaks the guarantees for Sequence when the starting bound is
> missing.
>
> What I am suggesting is a small shift in the design, which lets us
> meet all 3 of the above design goals by aligning them to have the same
> semantics (those of requirement 2).
>
> That is, partial ranges are just partially defined ranges, and they
> don’t give us a sequence or collection by themselves.  In fact, they
> don’t even work as a Range by themselves.  To work as a
> RangeExpression, Sequence, or Collection, we need to be able to
> provide that implicit boundary and make it explicit.  

Again, a partial range of BigInts or Floats is a useful thing I want to
be able to express.  In the former case there's no expressible upper
bound and in the latter the upper bound is unreachable.

> In the case of slicing collections, the collection provides that
> implicit boundary.  In the case of types, we would have a protocol
> that defines the implicit/natural boundaries for that type. Then we
> can conditionally conform to Collection based on that protocol. Types
> which are unable to conform to the protocol (like BigInt) would not
> gain the conditional conformance to Collection/Sequence, and would
> have to do it themselves, if desired.

BigInt is not supposed to conform to Collection; only
LowerBoundedRange<BigInt> is supposed to do that.

> For clarity, in my mind, PartialRange is a struct which has optional
> bounds and conforms to the RangeExpression protocol. RangeExpression
> is a protocol which defines a function that returns an actual Range
> given a set of concrete bounds to replace any missing ones.  In the
> case of slicing a collection, the collection is calling that function
> with its own bounds to get the resulting range. In the case of ‘for i
> in 0…’, the protocol defining the implicit/natural bounds gives us the
> bounds for that function, and the resulting range is used.  Ranges
> themselves conform to RangeExpression by just returning self.
>
> Now, things like ‘…n’ work just fine.  

They would work fine as proposed.  In the proposal the type of “...n” is
different from that of “n...”

What do *you* mean by “work just fine?”  Would they be collections?

> We could even have ‘…’ mean “the whole thing” if we decide we want
> that.  

That's achievable independently of any of the rest of this:
https://github.com/apple/swift/blob/unicode-rethink/test/Prototypes/Unicode.swift#L93

> We can conform to collection in most cases.

What is “we?”  PartialRange?  It seems to me that no matter how you
measure, it can't conform to Collection in most cases, because many
Comparables aren't also countable.

> For things like BigInt, conformance to Sequence would have to be
> provided explicitly (if desired)… but I view that as a good thing.
> When you go to add that conformance, you again run into issues like:
> what does ‘…n’ do?  There are different answers/hacks (maybe you trap,
> maybe you just start at zero or something), each with tradeoffs. It is
> good that those tradeoffs are limited to the edge cases, as opposed to
> spread across all RangeExpressions (breaking the ability to conform to
> collection, for example).  My preference would be not to provide
> Sequence conformance for PartialRanges of BigInt at all, forcing the
> user to think about and provide explicit bounds.

“Forcing users to think about” bounds on the expressive capability of
Int runs counter to Swift's design.  That ship sailed a long time ago.

>>>> 2. I keep repeating variants of this example:
>>>> 
>>>> func listElements<
>>>>   S: Sequence, N: Number
>>>>> (of s: S, numberedFrom start: N) {
>>>>   for (n, e) in zip(start..., s) {
>>>>     print("\(n). \(e)")
>>>>   }
>>>> }
>>>> 
>>>> which avoids incorrect behavior when N turns out to be a type that
>>>> can't represent values high enough to list everything in s—**if and
>>>> only if** `start...` is an unbounded range, rather than one that
>>>> implicitly gets its upper bound from its type.
>>> 
>>> I really worry about the possibility of long-term foot-gunning in this
>>> case.  I showed this to a friend who maintains old/abandoned codebases
>>> for a living, and he said “Oh god, that isn’t going to actually crash
>>> until 5 years in, when it gets passed some rare type of file… and by
>>> then the original programmer will have left.”  
>> 
>> Then either he's using the wrong language, or he needs to come to grips
>> with the realities of software that's both reliable and efficient.
>> Swift explicitly acknowledges that in efficient code there are inherent
>> representational limitations that force a choice between stopping the
>> program and continuing with incorrect behavior.  Swift explicitly
>> chooses to stop the program in these cases.  Practically speaking, every
>> program written in Swift is full of checks for these conditions, and
>> this is nothing new.
>
> Haha, he makes a bunch of money because of other people’s poor choice
> of programming languages/frameworks.  But as a result, he has seen a
> lot of old code and he has a good sense of what will break years down
> the line, and what is difficult to refactor...
>
> As I said above, my main worry is that we are fooling ourselves by
> providing the trap, and actually creating more likelihood of (subtle)
> issues/bugs in the long run.

Then, as Ben suggested, your argument is with `Int`, not with partial ranges.

>>> He hit on exactly what I had been feeling as well (but hadn’t been
>>> able to put into words until I heard his).  The thought is that this
>>> will help the programmer find an error by trapping, 
>> 
>> It will also prevent a program from wandering off into
>> unpredictable/broken behavior in the hands of a user.  If your software
>> keeps running but writes a corrupted file, that's usually much worse
>> than a hard stop.
>
> You forget that my version also has mistake prevention built in… at
> compile time, instead of runtime.

No, I don't forget .  “Preventing mistakes” by creating a reality that
doesn't match anyone's mental model when programming doesn't actually
work.  Programmers subconsciously erase the expressive limitations of
Int from their mental model, in the same way they erase the possibility
of running out of stack space for making a function call... and the
explicitly chosen semantics of Swift encourage that erasure.  Nothing
you've proposed changes the basic computational model for Int.

>>> but because it is dependent on the interactions of two subsystems, it
>>> will often create one of those crashes where the stars have to align
>>> for it to happen (which are my least favorite type of bug/crash to
>>> debug).
>> 
>> You could look at it that way, but even in a programming language such
>> as Python, whose integers are effectively all BigInts, a similar
>> “stars-have-to-align” scenario occurs when you run out of memory.  On
>> modern opeerating systems that effectively means your whole system
>> grinds to a halt. It is no better than a crash, and often leads to a
>> crash (sorry, you have no stack space left with which to make this
>> function call!)  These things are a fact of life.
>
> True, but that doesn’t mean we should add more stars-have-to-align
> cases.

But we *don't* have a language where everything is BigInt.  As Ben said,
if you want to “fix” the behavior of 0..., your quarrel is with Int, and
thus with the basic design of Swift.

-- 
-Dave