[swift-evolution] Strings in Swift 4

Sun Feb 5 12:26:04 CST 2017

> On Feb 5, 2017, at 02:22, Jonathan Hull via swift-evolution <swift-evolution at swift.org> wrote:
> 
>>>>> Just out of curiosity, what are the use-cases for an infinite sequence
>>>>> (as opposed to a sequence which is bounded to the type’s representable
>>>>> values)?
>>>> 
>>>> 1. The type may not have an inherent expressible bound (see BigInt,
>>>>  UnsafePointer, and *many* real-life Index types).
>>> 
>>> If I understand what you are saying, this is why I was arguing that
>>> these partial ranges should not, by themselves, conform to Sequence.
>>> In cases where you do have a natural expressible bound, then it should
>>> conditionally conform to sequence.  
>> 
>> I'm sorry, I don't understand that last sentence.  The only way to
>> conditionally conform in one particular case is to make the conformance
>> condition include detection of that case. I could guess at what you mean
>> but I'd rather you explain yourself.  Specifically, which partial ranges
>> should conform to Sequence, and which shouldn't, in your view?
> 
> Oh, sorry, I was referring to an earlier post I made in this thread, but didn’t make that clear.  I’ll give a longer explanation of my reasoning behind the statement that partial ranges shouldn’t conform to Sequence (on their own) below... but the short version is that the conformance, like cake, is a lie... and we are using the trap to lie to ourselves about it.
> 
> 
>>> In other cases, you should have to write that conformance yourself if
>>> you want it.
>> 
>> You've asserted that things should be a certain way (which I don't fully
>> understand but I hope you'll explain), but you haven't said anything to
>> clarify *why* you think they should be that way.
> 
> I believe that the ill-defineness/fuzziness of this issue that others have eluded to and the desire to trap is telling us that we are fighting the underlying structure of the types.  I get that trapping  can save us from an entire class of nasty bugs, but I also believe that they should be a last resort in design because they have a tendency to let us hide subtle design mistakes from ourselves.
> 
> If we continue down this path, we will find more and more edge cases that need to be papered over, until the design eventually needs to be scrapped or buttressed with compiler magic.  For example, let’s say we define a PartialRange as a simple struct with optional endpoints, then we conform it to Sequence.  Now we can say ’n…’ and get a sequence.  But what happens when we say ‘…n’?  

My opinion: ...n should definitely not be a Sequence, and I think any design for one-sided ranges needs to ensure this in the type system not at runtime. There might be practical benefits for not doing this, but I think the bar for accepting that trade-off should be high.

> Where does that sequence start if it has an implicit start point of negative infinity?  Well, we promised a Sequence, so our only real option is to trap (for some types anyway).  For UInt, ‘…n’ is a perfectly valid sequence

...n is not a valid sequence regardless of the type for n. Just because some types might have a "well, I guess that makes sense" value for the lower bound doesn't mean that's the appropriate choice for all uses. I'm not sure why you think UInt has a valid value for ...n but Int doesn't. Why is 0 any more special than Int.min?

> (though we could trap on that as well for consistency sake and say that you have to say '0…n’).  We are using trapping to let us ignore the fact that we are promising something (conformance to Sequence) that isn’t really true...

In what way is it not true? Things conform to Sequence when they have an implementation of makeIterator() that gives you a value (in O(1) time) that yields values via next() based on some sensible documented behavior (though some protocols like Collection impose some further rules on their Sequence conformance). We are not seeking some higher truth here, we are building useful tools.

> In this case, I believe we are trying to shoehorn a number of different fuzzily defined requirements that almost, but don’t quite fit together:
> 
> 1) RangeExpressions should work where Ranges work now
> 2) We want to be able to slice a collection using ’n…’ and ‘…n’, and the implicit boundary should be filled in by the collection
> 3) We want to be able to create a sequence from ’n…’ that can be zipped and iterated over

These definitions don't seem fuzzy to me, they seem sufficiently precise, especially when you put back the part you dropped from my original definition for 3: "The behavior of that sequence’s iterator is that it starts at the lower bound, and increments indefinitely." 1 is an implementation detail IMO rather than a functional goal.

> Looking at our requirements above, let’s start with #1.  Ranges (which are countable) conform to Collection.  We keep saying we need these to conform to Sequence (because of requirement 3), but I think what we really want/mean in a lot of cases is conformance to Collection.  
> The problem is that we don’t quite have enough information available to do that, so PartialRanges cannot be used in generic functions on Collection.  This is exposing a weakness of our Sequence/Collection protocols around infinite Sequences/Collections that we talked about a few months back, but ran out of time to solve (and I didn’t have bandwidth at the time to work on it).

Don't get hung up on the Sequence/Collection difference. I don't think it impacts the question at hand. At some point, this list needs to have a discussion of whether there should be "infinite" collections, breaking the rule that endIndex be "reachable". Having one-sided ranges be collections has the benefit of making it clear they are multi-pass, and also allowing zip(1..., array) to retain the array's random-access-collection-ness, which is a big win (which also requires enhancing zip to cater to this, something that will be a lot easier with conditional conformance). But I don't think this question affects your concerns about infinite ranges eventually trapping, it should happen either way.

> As others have pointed out, 2 & 3 are similar, but have slightly different semantics.  In the case of 2, we have an implicit boundary being filled in: ’n…’ means a range from ’n’ up to that boundary.

No. 'n...' does not mean this. Collection.subscript(_: one-sided range) means this.

> We have explicitly stated we don’t want it to trap here, because that would make it useless.  In the case of 3, we are giving it a different meaning: ’n…’ means starting with n, count higher until you run out of numbers… then, crash.

No, it means starting with n, count upwards. It is up to the type for n as to what to do when a practical upper bound is reached.

>  Here you are explicitly asking for trapping behavior.  

No, we are defining n... to increment from n indefinitely. It is the definition of Int that has the trapping behavior.

> These appear very similar at first glance, but need to be thought about subtly differently. For example, we have to be extremely careful using #3 with functional map/filter/reduce, and things like ‘count’ would instantly crash (how do you count an infinity?).  

The idea of infinite sequences isn't new. 1... is just shorthand for sequence(first: 1) { $0 + 1 }, which you can already write. Yes, you need to be careful with it and functions that consume a sequence in its entirety, like .map or .reversed. .lazy.map is fine though. This problem is not solved by giving 1... an implied upper bound of Int.max unless you think it's OK to create array of nine quintillion elements.

The idea of an infinite collection, and the consequences for count etc, is new for the std lib but can be considered separately from this discussion. If ruled in, countable one-sided ranges should be a collection, if ruled out, they should just be sequences, but either way, the problem with Sequence.map remains.

> We need to be really careful with #3 in general, because it breaks so many of the implicit guarantees around Collection (thus fulfilling #3, stops us from fulfilling #1). As I showed above, it also breaks the guarantees for Sequence when the starting bound is missing.

> 
> What I am suggesting is a small shift in the design, which lets us meet all 3 of the above design goals by aligning them to have the same semantics (those of requirement 2).
> 
> That is, partial ranges are just partially defined ranges, and they don’t give us a sequence or collection by themselves.  In fact, they don’t even work as a Range by themselves.  To work as a RangeExpression, Sequence, or Collection, we need to be able to provide that implicit boundary and make it explicit.  In the case of slicing collections, the collection provides that implicit boundary.  In the case of types, we would have a protocol that defines the implicit/natural boundaries for that type. Then we can conditionally conform to Collection based on that protocol. Types which are unable to conform to the protocol (like BigInt) would not gain the conditional conformance to Collection/Sequence, and would have to do it themselves, if desired.
> 
> For clarity, in my mind, PartialRange is a struct which has optional bounds and conforms to the RangeExpression protocol. RangeExpression is a protocol which defines a function that returns an actual Range given a set of concrete bounds to replace any missing ones.  In the case of slicing a collection, the collection is calling that function with its own bounds to get the resulting range. In the case of ‘for i in 0…’, the protocol defining the implicit/natural bounds gives us the bounds for that function, and the resulting range is used.  Ranges themselves conform to RangeExpression by just returning self.
> 
> Now, things like ‘…n’ work just fine.  We could even have ‘…’ mean “the whole thing” if we decide we want that.  We can conform to collection in most cases.

So in your design the "natural" end point for integers is their maximum value? So when you write 0..., you are actually writing 0...Int.max (because 0 without an explicit type is inferred to be Int). And your view is that this is safer because it won't trap, it will just silently stop when it gets to this number? Implicit silent suppression of overflow is antithetical to the current design of the standard library.

Under most circumstances, hitting that maximum is going to be unintentional and an error on the part of the programmer. In the case of Int, hitting Int.max almost always means something has gone horribly wrong with your loop, which should have terminated long ago. Let's hope not having that trap doesn't result in transferring a quadrillion dollars to someone else's account. In the case of small types where overflow is more likely, implicitly swallowing overflow is even worse. When a programmer chooses to use a small representation, they are signing up for responsibility to handle likely overflow. If they truly want to stop at Int8.max explicitly, they should write that. But they might not – they might have forgotten that they're dealing with a small representation. Silently swallowing these overflows leads to bugs that are very hard to catch or diagnose and can have very unpleasant consequences in production in both the small and large representation cases.

> For things like BigInt, conformance to Sequence would have to be provided explicitly (if desired)… but I view that as a good thing.  When you go to add that conformance, you again run into issues like: what does ‘…n’ do?  There are different answers/hacks (maybe you trap, maybe you just start at zero or something), each with tradeoffs. It is good that those tradeoffs are limited to the edge cases, as opposed to spread across all RangeExpressions (breaking the ability to conform to collection, for example).  My preference would be not to provide Sequence conformance for PartialRanges of BigInt at all, forcing the user to think about and provide explicit bounds.

As a user of BigInt, I have explicitly chosen a type that theoretically has no upper bound, and I should be aware of the consequences of using that type. Having made that choice, I would find being forced to pick an arbitrary upper bound out of thin air extremely annoying and of no practical benefit.

> 
>>>> 2. I keep repeating variants of this example:
>>>> 
>>>> func listElements<
>>>>   S: Sequence, N: Number
>>>>> (of s: S, numberedFrom start: N) {
>>>>   for (n, e) in zip(start..., s) {
>>>>     print("\(n). \(e)")
>>>>   }
>>>> }
>>>> 
>>>> which avoids incorrect behavior when N turns out to be a type that
>>>> can't represent values high enough to list everything in s—**if and
>>>> only if** `start...` is an unbounded range, rather than one that
>>>> implicitly gets its upper bound from its type.
>>> 
>>> I really worry about the possibility of long-term foot-gunning in this
>>> case.  I showed this to a friend who maintains old/abandoned codebases
>>> for a living, and he said “Oh god, that isn’t going to actually crash
>>> until 5 years in, when it gets passed some rare type of file… and by
>>> then the original programmer will have left.”  
>> 
>> Then either he's using the wrong language, or he needs to come to grips
>> with the realities of software that's both reliable and efficient.
>> Swift explicitly acknowledges that in efficient code there are inherent
>> representational limitations that force a choice between stopping the
>> program and continuing with incorrect behavior.  Swift explicitly
>> chooses to stop the program in these cases.  Practically speaking, every
>> program written in Swift is full of checks for these conditions, and
>> this is nothing new.
> 
> Haha, he makes a bunch of money because of other people’s poor choice of programming languages/frameworks.  But as a result, he has seen a lot of old code and he has a good sense of what will break years down the line, and what is difficult to refactor...
> 
> As I said above, my main worry is that we are fooling ourselves by providing the trap, and actually creating more likelihood of (subtle) issues/bugs in the long run.
> 
>>> He hit on exactly what I had been feeling as well (but hadn’t been
>>> able to put into words until I heard his).  The thought is that this
>>> will help the programmer find an error by trapping, 
>> 
>> It will also prevent a program from wandering off into
>> unpredictable/broken behavior in the hands of a user.  If your software
>> keeps running but writes a corrupted file, that's usually much worse
>> than a hard stop.
> 
> You forget that my version also has mistake prevention built in… at compile time, instead of runtime.    
> 
> 
>>> but because it is dependent on the interactions of two subsystems, it
>>> will often create one of those crashes where the stars have to align
>>> for it to happen (which are my least favorite type of bug/crash to
>>> debug).
>> 
>> You could look at it that way, but even in a programming language such
>> as Python, whose integers are effectively all BigInts, a similar
>> “stars-have-to-align” scenario occurs when you run out of memory.  On
>> modern opeerating systems that effectively means your whole system
>> grinds to a halt. It is no better than a crash, and often leads to a
>> crash (sorry, you have no stack space left with which to make this
>> function call!)  These things are a fact of life.
> 
> True, but that doesn’t mean we should add more stars-have-to-align cases.
> 
> 
>>> I think this example also shows how my suggestion of requiring an
>>> extra protocol for conformance to Sequence would actually be much more
>>> effective in preventing the programmer error…
>>> 
>>> You would not have been able to write the function the way you did,
>>> because writing ‘start…’ as a sequence would have required the
>>> addition of ‘where N:FiniteComparable’ (or whatever we call the
>>> protocol providing natural bounds).  In having to write that N needs
>>> bounds, you are much more likely to be thinking of what those bounds
>>> might be.  
>> 
>> I explicitly *don't want* N to need to have expressible bounds.  When we
>> get a BigInt type, it will be especially effective in these scenarios.
> 
> I still maintain that the issue here is with the design of the function (hiding the importance of the type’s bounds) combined with the problems around infinite sequences. See my thoughts on BigInt sequences above…
> 
> It isn’t as pretty, but this version shouldn’t have the landmine (caveat: written in Mail):
> 
> 	func listElements<S: Sequence, N:Number>(of s:S, numberedFrom start:N) {
> 		var bigN = BigInt(start)
> 		for item in s {
> 			print(“\(bigN). \(item)”
> 			bigN += 1
> 		}
> 	}
> 
> I suspect that running into the issue where you had to add ’N:FiniteComparable’ and then not wanting to specify the bounds would have led you to write a less pretty, but more robust version like the one above.
> 
> Thanks,
> Jon
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170205/e775ddb9/attachment.html>