[swift-evolution] Partial Ranges (was: Strings in Swift 4)

Ben Cohen ben_cohen at apple.com
Wed Feb 8 20:31:45 CST 2017


Hi Jonathan,

It looks like the majority of this is about finite vs infinite collections. That should be a separate thread, and one we certainly need to have, but it doesn’t need to be mixed together with the wider one-sided ranges discussion. For now, I think it’s reasonable to say that 1) left-one-sided ranges of countable things can be sequences, 2) right-sided ranges shouldn’t be, and 3) if infinite collections end up being a thing, left-sided ranges should be among them.
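To make points 1 and 2 concrete, here is a sketch using the one-sided range syntax as it eventually shipped in Swift 4 (treat the exact spellings as illustrative, since that proposal hadn’t landed at the time of this thread):

```swift
let a = [10, 20, 30, 40, 50]

// A right-sided range can slice a collection, but is not itself a Sequence,
// so there is no way to "start iterating from negative infinity" by mistake.
assert(Array(a[...2]) == [10, 20, 30])

// A left-sided range of a countable type is a Sequence; here zip bounds
// the iteration, so nothing runs away toward infinity.
let pairs = Array(zip(2..., ["x", "y", "z"]))
assert(pairs.map { $0.0 } == [2, 3, 4])
```

The asymmetry is exactly the point: a left-sided countable range has an obvious first element to count from, while a right-sided one does not.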


> On Feb 7, 2017, at 9:18 AM, Jonathan Hull via swift-evolution <swift-evolution at swift.org> wrote:
> 
> 
>>> Oh, sorry, I was referring to an earlier post I made in this thread,
>>> but didn’t make that clear.  I’ll give a longer explanation of my
>>> reasoning behind the statement that partial ranges shouldn’t conform
>>> to Sequence (on their own) below... but the short version is that the
>>> conformance, like cake, is a lie... and we are using the trap to lie
>>> to ourselves about it.
>> 
>> *If* the trap is a lie (and I'm not saying it is), it's a lie we tell
>> ourselves about Int, because practically speaking, most people can't
>> program correctly always keeping in mind that Int has finite expressive
>> range.  This is a conscious decision made by the Swift language a long
>> time ago.  It is *not* a lie about ranges, but if you make a range based
>> on Int it will contain a lie, just like everything else that uses Int.
> 
> I think I touched a 3rd rail here by mentioning trapping.  Trapping is a symptom of a design issue here, not the underlying problem.
> 
> I wasn’t saying that the trap itself is a lie, just that it lets us lie to ourselves about how the types fit together.  Basically, we are trying to shove things into Collection which aren’t collections, and then things like ‘count’ and ‘dropLast()’ start failing. You can no longer write reliable generic code under those conditions.
> 
> I am not arguing against trapping in general here, or that Int shouldn’t trap.  I am saying that PartialRange and Collection don’t mesh correctly, and it is REALLY useful for them to mesh correctly.  I am also saying that problem can be fixed by providing an upper bound (such as Int.max) instead of saying it is infinite (with respect to Collection).  As you pointed out, for something like Int, it won’t even get to the trap before locking up the computer by counting through billions of numbers anyway, so it shouldn’t make much practical difference there.  It will make a big practical difference in being able to call things like ‘count’ and ‘dropLast()’ and write generic code that works for all Collections.
> 
> 
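For concreteness, the “provide an upper bound instead of saying it is infinite” idea could be sketched like this (the `NaturallyBounded` protocol name and shape are my invention, not anything formally proposed):

```swift
// A hypothetical protocol for types that can name their own natural
// upper bound, so that "n..." can desugar to the finite n...maxBound.
protocol NaturallyBounded: Comparable {
    static var maxBound: Self { get }
}
extension Int: NaturallyBounded { static var maxBound: Int { return .max } }
extension UInt8: NaturallyBounded { static var maxBound: UInt8 { return .max } }

// With a natural bound supplied, the result is an ordinary finite
// ClosedRange, so `count`, `dropLast()` etc. are all well-defined.
func from<B: NaturallyBounded & Strideable>(_ n: B) -> ClosedRange<B>
    where B.Stride: SignedInteger {
    return n...B.maxBound
}

assert(from(UInt8(250)).count == 6)          // 250, 251, 252, 253, 254, 255
assert(from(UInt8(250)).dropLast().last == 254)
```

Under this scheme a type like BigInt simply wouldn’t conform to `NaturallyBounded`, so `n…` for BigInt would get no Collection conformance for free.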
> 
>>> If we continue down this path, we will find more and more edge cases
>>> that need to be papered over, until the design eventually needs to be
>>> scrapped or buttressed with compiler magic.  For example, let’s say we
>>> define a PartialRange as a simple struct with optional endpoints, then
>>> we conform it to Sequence.  Now we can say ’n…’ and get a sequence.
>>> But what happens when we say ‘…n’?  
>> 
>> You don't get a Sequence.
>> 
>>> Where does that sequence start if it has an implicit start point of
>>> negative infinity?  Well, we promised a Sequence, so our only real
>>> option is to trap (for some types anyway).  For UInt, ‘…n’ is a
>>> perfectly valid sequence (though we could trap on that as well for
>>> consistency’s sake and say that you have to say '0…n’).  We are using
>>> trapping to let us ignore the fact that we are promising something
>>> (conformance to Sequence) that isn’t really true...
>> 
>> This is a red herring.  Nobody has proposed making ...n a Sequence.
> 
> If ’n…’ and ‘…n’ are represented by different types, then I remove my objection about adhering to Sequence.  I could have sworn the last version I saw had them as a single type (and thus there was no way to enable one without the other), but that may not have been an official version.  If they are separate types, then it isn’t an issue (and trapping when the sequence reaches above Int’s limit is perfectly fine here). 
> 

Don’t worry, you’re not imagining it :)

There have been some prototypes passed around on this list of one-sided ranges where left- and right-sided ranges were covered by the same type. I think one example had it so that the upper and lower bounds were both optional. This has the benefit of simplicity, but misses out on the opportunity for the type system to enforce certain behavior, so I don’t think it would be the final implementation we want.
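To show what the type system buys us, here is a sketch with separate left- and right-sided types (the names are illustrative; what eventually shipped, PartialRangeFrom and PartialRangeThrough, is along these lines, and the conditional-conformance syntax below arrived later, in Swift 4.1):

```swift
// Two distinct types, so each can opt into exactly the capabilities
// that make sense for it.
struct FromRange<Bound: Comparable> { let lowerBound: Bound }
struct ThroughRange<Bound: Comparable> { let upperBound: Bound }

// Only the left-sided type becomes a Sequence, and only when countable.
extension FromRange: Sequence where Bound: Strideable, Bound.Stride: SignedInteger {
    func makeIterator() -> AnyIterator<Bound> {
        var next = lowerBound
        return AnyIterator {
            defer { next = next.advanced(by: 1) }
            return next
        }
    }
}

assert(Array(FromRange(lowerBound: 3).prefix(4)) == [3, 4, 5, 6])
// ThroughRange(upperBound: 3) has no Sequence conformance to misuse:
// "where would it start?" never even arises.
```

With a single optional-bounds type, by contrast, nothing stops you from asking a right-sided range for an iterator.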

> I am still worried about adhering to Collection though.
> 
> 

Will take this up on a separate thread.

>> We had several discussions about infinite collections last year.  My
>> conclusion was that they were useful and there was no reason to prohibit
>> them, so we should lift the reachability requirement for endIndex.
> 
> I agree that it could be useful, but I believe we need to be able to spell infinite collections differently than finite collections in the type system.  That would solve all of these issues.
> 
> 
>>> As others have pointed out, 2 & 3 are similar, but have slightly
>>> different semantics.  In the case of 2, we have an implicit boundary
>>> being filled in: ’n…’ means a range from ’n’ up to that boundary. 
>> 
>> Presumably, the type of n needs to have a representation for “infinity”
>> in some cases, because after all, given `x = n...`,
>> `x[x.startIndex...x.endIndex]` must be an infinite range.
> 
> I could potentially be talked into something like this.  How would it work for things like Int, which don’t currently have a concept of infinity?  Would it be added?
> 
> 

This needs to be part of the Infinite Collection discussion, but you could basically do something like this along similar lines to Optional (not necessarily with this naming…):

enum InfiniteIndex<T: Comparable>: Comparable {
	case finite(T), infinity
	static func < (lhs: InfiniteIndex, rhs: InfiniteIndex) -> Bool {
		switch (lhs, rhs) {
		case let (.finite(a), .finite(b)): return a < b
		case (.finite, .infinity): return true
		default: return false  // .infinity is never less than anything
		}
	}
}

> 
>>> In the case of 3, we are giving it a different meaning: ’n…’ means
>>> starting with n, count higher until you run out of numbers… then,
>>> crash.  
>> 
>> No, “n...” does not mean that.  There's absolutely no counting until you
>> combine it with “for...in” in the same way that “n...” doesn't mean
>> you're going to slice anything until you combine it with a
>> collection's subscript.
> 
> Oh, I had thought you wanted conformance of ’n…’ to sequence so you could stick it in things like zip.  I have no issue at all with passing it around as a PartialRange, and then using it for whatever as a PartialRange.
> 

To try and clarify what Dave is saying: n… does not mean “starting with n, count higher until you run out of numbers” any more than “0…5” means “count from 0 through 5”. Neither starts counting when you create it, any more than it starts slicing. In both cases they do conform to Sequence, so each has a makeIterator() method that returns an iterator whose next() method counts up by one per call, starting from the lower bound. But unlike the iterator for 0…5, the iterator for 0… doesn’t return nil once it passes some point; it just keeps going. The crashing happens because you call next() too many times on a Bound type that can’t increase any further. `for i in n… { }` crashing is to be expected, just like `var i = 0; while true { i += 1 }` is.
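The distinction is easy to see by driving the iterators by hand (again using the spelling that later shipped):

```swift
// Creating a range does no counting; calling next() on its iterator does.
var unbounded = (0...).makeIterator()
assert(unbounded.next() == 0)
assert(unbounded.next() == 1)   // ...and this iterator will never return nil

// A bounded range's iterator eventually returns nil, so loops terminate.
var bounded = (0...2).makeIterator()
while bounded.next() != nil { } // consumes 0, 1, 2, then stops
assert(bounded.next() == nil)
```

The only difference between the two is whether next() has a nil to return; for `0...` over Int, the trap at Int.max is just what “no nil to return” looks like on a finite Bound type.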

> 
>>> These appear very similar at first glance, but need to be thought
>>> about subtly differently. For example, we have to be extremely careful
>>> using #3 with functional map/filter/reduce, 
>> 
>> Yes, you need to use lazy algorithms if you don't want your program to
>> stop.
> 
> Touche.  I kind of feel like we should make sequences/collections from Ranges (and now PartialRanges) lazy by default.
> 

This probably won’t fly. The main reason all map/filter operations aren’t lazy by default is the sometimes-unexpected consequences of passing stateful closures to them (i.e. if the closure mutates some external state, the operation gives different results over multiple passes, a no-no for collections, which are supposed to yield the same elements every time you iterate them). This problem is universal over any kind of sequence, so it would probably be very surprising for some subset of them to be lazy (other than the explicitly lazy ones, of course).
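The stateful-closure hazard is easy to demonstrate with today’s explicitly lazy wrapper:

```swift
// A stateful closure passed to a lazy map runs again on every pass.
var calls = 0
let doubled = (1...3).lazy.map { (x: Int) -> Int in
    calls += 1          // external state mutated by the transform
    return x * 2
}
assert(Array(doubled) == [2, 4, 6])   // first pass: 3 calls
assert(Array(doubled) == [2, 4, 6])   // second pass: 3 more calls
assert(calls == 6)                     // an eager map would have made 3 total
```

An eager map pays the closure cost once and snapshots the result; making ranges lazy by default would silently change how often user closures run.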

> 
>>> We need to be really careful with #3 in general, because it breaks so
>>> many of the implicit guarantees around Collection
>> 
>> What implicit guarantees are you saying Collection gives?
> 
> Basically that I can call the methods enabled by Collection on any collection.  That may no longer be safely true with an infinite collection. I don’t want to have a situation when writing algorithms generic over Collection where some portion of the methods I call (e.g. count) may no longer work.  Suddenly the standard library is full of minefields where I have to know whether the implementation of a method calls things like ‘count' to know if it is safe to call with my collection.
> 
> As a straw man example, let’s say someone creates a library with a version of something like zip on collections, which pre-allocates the array buffer based on the size of the collections which are passed to it. In the calculation of that size, the implementer takes the min of the counts of both collections, which is a completely valid thing to do based on the type. It would trap in the case of an infinite collection, so now the users of the function need to understand a bit of the implementation to know when they can use it safely.
> 
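Spelling out the straw man (this is a hypothetical library function, not anything in the standard library): note that for an infinite collection the failure is in the `count` call itself, which would never return or would trap, before any zipping happens.

```swift
// An eager zip that pre-sizes its buffer from the collections' counts.
// Perfectly reasonable for finite collections; a minefield otherwise.
func eagerZip<C1: Collection, C2: Collection>(
    _ c1: C1, _ c2: C2
) -> [(C1.Element, C2.Element)] {
    var result: [(C1.Element, C2.Element)] = []
    // Valid per the Collection contract today -- and exactly the line
    // that goes wrong if either argument is infinite.
    result.reserveCapacity(Swift.min(c1.count, c2.count))
    var i = c1.startIndex, j = c2.startIndex
    while i != c1.endIndex && j != c2.endIndex {
        result.append((c1[i], c2[j]))
        i = c1.index(after: i)
        j = c2.index(after: j)
    }
    return result
}

assert(eagerZip([1, 2, 3], ["a", "b"]).count == 2)
```

Nothing in the function’s signature warns the caller that it is unsafe with an infinite Collection; that is the “need to understand the implementation” problem.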
> 
>>> That is, partial ranges are just partially defined ranges, and they
>>> don’t give us a sequence or collection by themselves.  In fact, they
>>> don’t even work as a Range by themselves.  To work as a
>>> RangeExpression, Sequence, or Collection, we need to be able to
>>> provide that implicit boundary and make it explicit.  
>> 
>> Again, a partial range of BigInts or Floats is a useful thing I want to
>> be able to express.  In the former case there's no expressible upper
>> bound and in the latter the upper bound is unreachable.
> 
> Sure.  I don’t think I have ever suggested that these wouldn’t be able to be expressed as PartialRanges.  Just that if you want to be able to plug it in as a (non-partial) Range or (finite) Collection, you need a way to provide the missing/implicit bound, because that information is required by those types.
> 
> 
>>> In the case of slicing collections, the collection provides that
>>> implicit boundary.  In the case of types, we would have a protocol
>>> that defines the implicit/natural boundaries for that type. Then we
>>> can conditionally conform to Collection based on that protocol. Types
>>> which are unable to conform to the protocol (like BigInt) would not
>>> gain the conditional conformance to Collection/Sequence, and would
>>> have to do it themselves, if desired.
>> 
>> BigInt is not supposed to conform to Collection; only
>> LowerBoundedRange<BigInt> is supposed to do that.
> 
> I meant PartialRanges of BigInt, not BigInt itself.  I don’t see how it can conform to Collection without an upper bound.
> 
> 
>>> For clarity, in my mind, PartialRange is a struct which has optional
>>> bounds and conforms to the RangeExpression protocol. RangeExpression
>>> is a protocol which defines a function that returns an actual Range
>>> given a set of concrete bounds to replace any missing ones.  In the
>>> case of slicing a collection, the collection is calling that function
>>> with its own bounds to get the resulting range. In the case of ‘for i
>>> in 0…’, the protocol defining the implicit/natural bounds gives us the
>>> bounds for that function, and the resulting range is used.  Ranges
>>> themselves conform to RangeExpression by just returning self.
>>> 
>>> Now, things like ‘…n’ work just fine.  
>> 
>> They would work fine as proposed.  In the proposal the type of “...n” is
>> different from that of “n…”
> This is something I had not realized.  I remove my objection about Sequences, but not Collections.
> 
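For concreteness, the RangeExpression design described above might look like this sketch (names are illustrative; the protocol that eventually shipped in the standard library is similar in spirit, with a relative(to:) requirement):

```swift
// A range expression produces a concrete Range only once the missing
// bound(s) are supplied -- here, by the collection doing the slicing.
protocol MyRangeExpression {
    associatedtype Bound: Comparable
    func relative<C: Collection>(to c: C) -> Range<Bound> where C.Index == Bound
}

// A left-sided expression: the collection fills in the upper bound.
struct MyFrom<Bound: Comparable>: MyRangeExpression {
    let lowerBound: Bound
    func relative<C: Collection>(to c: C) -> Range<Bound> where C.Index == Bound {
        return lowerBound..<c.endIndex
    }
}

let xs = [10, 20, 30, 40, 50]
let r = MyFrom(lowerBound: 2).relative(to: xs)
assert(Array(xs[r]) == [30, 40, 50])
```

A full Range would conform by returning itself, and a `for i in 0…` loop would need something else (like a natural-bounds protocol) to play the role the collection plays here.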
> 
>> What do *you* mean by “work just fine?”  Would they be collections?
> 
> Yes, as long as they are also countable.
> 
> 
>>> We could even have ‘…’ mean “the whole thing” if we decide we want
>>> that.  
>> 
>> That's achievable independently of any of the rest of this:
>> https://github.com/apple/swift/blob/unicode-rethink/test/Prototypes/Unicode.swift#L93
> 
> Good to know
> 
>> 
>>> We can conform to collection in most cases.
>> 
>> What is “we?”  PartialRange?  It seems to me that no matter how you
>> measure, it can't conform to Collection in most cases, because many
>> Comparables aren't also countable.
> 
> Yes, of course.  I didn’t want to complicate my argument by talking about countable everywhere (I only mentioned it once near the beginning).  We are only talking about countable things here.  Non-countable things can still be PartialRanges and RangeExpressions, of course, but they wouldn’t conform to Sequence/Collection out of the box.
> 
> 
>>> For things like BigInt, conformance to Sequence would have to be
>>> provided explicitly (if desired)… but I view that as a good thing.
>>> When you go to add that conformance, you again run into issues like:
>>> what does ‘…n’ do?  There are different answers/hacks (maybe you trap,
>>> maybe you just start at zero or something), each with tradeoffs. It is
>>> good that those tradeoffs are limited to the edge cases, as opposed to
>>> spread across all RangeExpressions (breaking the ability to conform to
>>> collection, for example).  My preference would be not to provide
>>> Sequence conformance for PartialRanges of BigInt at all, forcing the
>>> user to think about and provide explicit bounds.
>> 
>> “Forcing users to think about” bounds on the expressive capability of
>> Int runs counter to Swift's design.  That ship sailed a long time ago.
> 
> I want to force users to think about the fact that conforming to Collection means providing explicit bounds.  If that isn’t possible because the “Collection” you are trying to make doesn’t have those bounds, then you are fighting the type system.
> 
> 
>>>>> He hit on exactly what I had been feeling as well (but hadn’t been
>>>>> able to put into words until I heard his).  The thought is that this
>>>>> will help the programmer find an error by trapping, 
>>>> 
>>>> It will also prevent a program from wandering off into
>>>> unpredictable/broken behavior in the hands of a user.  If your software
>>>> keeps running but writes a corrupted file, that's usually much worse
>>>> than a hard stop.
>>> 
>>> You forget that my version also has mistake prevention built in… at
>>> compile time, instead of runtime.
>> 
>> No, I don't forget.  “Preventing mistakes” by creating a reality that
>> doesn't match anyone's mental model when programming doesn't actually
>> work.  Programmers subconsciously erase the expressive limitations of
>> Int from their mental model, in the same way they erase the possibility
>> of running out of stack space for making a function call... and the
>> explicitly chosen semantics of Swift encourage that erasure.  Nothing
>> you've proposed changes the basic computational model for Int.
> 
> I don’t want to change anything about Int.  What I really want to change (well tweak, really) is the collection model (so that it takes into account finite vs infinite). But I figure that ship sailed with Swift 3.  So instead I am suggesting we tweak this design so that it doesn’t expose those issues.
> 
> Thanks,
> Jon
> 
> 
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
