[swift-evolution] Strings in Swift 4

Sun Feb 5 04:22:30 CST 2017

>>>> Just out of curiosity, what are the use-cases for an infinite sequence
>>>> (as opposed to a sequence which is bounded to the type’s representable
>>>> values)?
>>> 
>>> 1. The type may not have an inherent expressible bound (see BigInt,
>>>  UnsafePointer, and *many* real-life Index types).
>> 
>> If I understand what you are saying, this is why I was arguing that
>> these partial ranges should not, by themselves, conform to Sequence.
>> In cases where you do have a natural expressible bound, then it should
>> conditionally conform to sequence.  
> 
> I'm sorry, I don't understand that last sentence.  The only way to
> conditionally conform in one particular case is to make the conformance
> condition include detection of that case. I could guess at what you mean
> but I'd rather you explain yourself.  Specifically, which partial ranges
> should conform to Sequence, and which shouldn't, in your view?

Oh, sorry, I was referring to an earlier post I made in this thread, but didn’t make that clear.  I’ll give a longer explanation of my reasoning behind the statement that partial ranges shouldn’t conform to Sequence (on their own) below... but the short version is that the conformance, like cake, is a lie... and we are using the trap to lie to ourselves about it.

>> In other cases, you should have to write that conformance yourself if
>> you want it.
> 
> You've asserted that things should be a certain way (which I don't fully
> understand but I hope you'll explain), but you haven't said anything to
> clarify *why* you think they should be that way.

I believe that the ill-defineness/fuzziness of this issue that others have eluded to and the desire to trap is telling us that we are fighting the underlying structure of the types.  I get that trapping  can save us from an entire class of nasty bugs, but I also believe that they should be a last resort in design because they have a tendency to let us hide subtle design mistakes from ourselves.

If we continue down this path, we will find more and more edge cases that need to be papered over, until the design eventually needs to be scrapped or buttressed with compiler magic.  For example, let’s say we define a PartialRange as a simple struct with optional endpoints, then we conform it to Sequence.  Now we can say ’n…’ and get a sequence.  But what happens when we say ‘…n’?  Where does that sequence start if it has an implicit start point of negative infinity?  Well, we promised a Sequence, so our only real option is to trap (for some types anyway).  For UInt, ‘…n’ is a perfectly valid sequence (though we could trap on that as well for consistency sake and say that you have to say '0…n’).  We are using trapping to let us ignore the fact that we are promising something (conformance to Sequence) that isn’t really true...

In this case, I believe we are trying to shoehorn a number of different fuzzily defined requirements that almost, but don’t quite fit together:

1) RangeExpressions should work where Ranges work now
2) We want to be able to slice a collection using ’n…’ and ‘…n’, and the implicit boundary should be filled in by the collection
3) We want to be able to create a sequence from ’n…’ that can be zipped and iterated over

Looking at our requirements above, let’s start with #1.  Ranges (which are countable) conform to Collection.  We keep saying we need these to conform to Sequence (because of requirement 3), but I think what we really want/mean in a lot of cases is conformance to Collection.  The problem is that we don’t quite have enough information available to do that, so PartialRanges cannot be used in generic functions on Collection.  This is exposing a weakness of our Sequence/Collection protocols around infinite Sequences/Collections that we talked about a few months back, but ran out of time to solve (and I didn’t have bandwidth at the time to work on it).

As others have pointed out, 2 & 3 are similar, but have slightly different semantics.  In the case of 2, we have an implicit boundary being filled in: ’n…’ means a range from ’n’ up to that boundary. We have explicitly stated we don’t want it to trap here, because that would make it useless.  In the case of 3, we are giving it a different meaning: ’n…’ means starting with n, count higher until you run out of numbers… then, crash.  Here you are explicitly asking for trapping behavior.  These appear very similar at first glance, but need to be thought about subtly differently. For example, we have to be extremely careful using #3 with functional map/filter/reduce, and things like ‘count’ would instantly crash (how do you count an infinity?).  We need to be really careful with #3 in general, because it breaks so many of the implicit guarantees around Collection (thus fulfilling #3, stops us from fulfilling #1). As I showed above, it also breaks the guarantees for Sequence when the starting bound is missing.

What I am suggesting is a small shift in the design, which lets us meet all 3 of the above design goals by aligning them to have the same semantics (those of requirement 2).

That is, partial ranges are just partially defined ranges, and they don’t give us a sequence or collection by themselves.  In fact, they don’t even work as a Range by themselves.  To work as a RangeExpression, Sequence, or Collection, we need to be able to provide that implicit boundary and make it explicit.  In the case of slicing collections, the collection provides that implicit boundary.  In the case of types, we would have a protocol that defines the implicit/natural boundaries for that type. Then we can conditionally conform to Collection based on that protocol. Types which are unable to conform to the protocol (like BigInt) would not gain the conditional conformance to Collection/Sequence, and would have to do it themselves, if desired.

For clarity, in my mind, PartialRange is a struct which has optional bounds and conforms to the RangeExpression protocol. RangeExpression is a protocol which defines a function that returns an actual Range given a set of concrete bounds to replace any missing ones.  In the case of slicing a collection, the collection is calling that function with its own bounds to get the resulting range. In the case of ‘for i in 0…’, the protocol defining the implicit/natural bounds gives us the bounds for that function, and the resulting range is used.  Ranges themselves conform to RangeExpression by just returning self.

Now, things like ‘…n’ work just fine.  We could even have ‘…’ mean “the whole thing” if we decide we want that.  We can conform to collection in most cases.

For things like BigInt, conformance to Sequence would have to be provided explicitly (if desired)… but I view that as a good thing.  When you go to add that conformance, you again run into issues like: what does ‘…n’ do?  There are different answers/hacks (maybe you trap, maybe you just start at zero or something), each with tradeoffs. It is good that those tradeoffs are limited to the edge cases, as opposed to spread across all RangeExpressions (breaking the ability to conform to collection, for example).  My preference would be not to provide Sequence conformance for PartialRanges of BigInt at all, forcing the user to think about and provide explicit bounds.

>>> 2. I keep repeating variants of this example:
>>> 
>>> func listElements<
>>>   S: Sequence, N: Number
>>>> (of s: S, numberedFrom start: N) {
>>>   for (n, e) in zip(start..., s) {
>>>     print("\(n). \(e)")
>>>   }
>>> }
>>> 
>>> which avoids incorrect behavior when N turns out to be a type that
>>> can't represent values high enough to list everything in s—**if and
>>> only if** `start...` is an unbounded range, rather than one that
>>> implicitly gets its upper bound from its type.
>> 
>> I really worry about the possibility of long-term foot-gunning in this
>> case.  I showed this to a friend who maintains old/abandoned codebases
>> for a living, and he said “Oh god, that isn’t going to actually crash
>> until 5 years in, when it gets passed some rare type of file… and by
>> then the original programmer will have left.”  
> 
> Then either he's using the wrong language, or he needs to come to grips
> with the realities of software that's both reliable and efficient.
> Swift explicitly acknowledges that in efficient code there are inherent
> representational limitations that force a choice between stopping the
> program and continuing with incorrect behavior.  Swift explicitly
> chooses to stop the program in these cases.  Practically speaking, every
> program written in Swift is full of checks for these conditions, and
> this is nothing new.

Haha, he makes a bunch of money because of other people’s poor choice of programming languages/frameworks.  But as a result, he has seen a lot of old code and he has a good sense of what will break years down the line, and what is difficult to refactor...

As I said above, my main worry is that we are fooling ourselves by providing the trap, and actually creating more likelihood of (subtle) issues/bugs in the long run.

>> He hit on exactly what I had been feeling as well (but hadn’t been
>> able to put into words until I heard his).  The thought is that this
>> will help the programmer find an error by trapping, 
> 
> It will also prevent a program from wandering off into
> unpredictable/broken behavior in the hands of a user.  If your software
> keeps running but writes a corrupted file, that's usually much worse
> than a hard stop.

You forget that my version also has mistake prevention built in… at compile time, instead of runtime.    

>> but because it is dependent on the interactions of two subsystems, it
>> will often create one of those crashes where the stars have to align
>> for it to happen (which are my least favorite type of bug/crash to
>> debug).
> 
> You could look at it that way, but even in a programming language such
> as Python, whose integers are effectively all BigInts, a similar
> “stars-have-to-align” scenario occurs when you run out of memory.  On
> modern opeerating systems that effectively means your whole system
> grinds to a halt. It is no better than a crash, and often leads to a
> crash (sorry, you have no stack space left with which to make this
> function call!)  These things are a fact of life.

True, but that doesn’t mean we should add more stars-have-to-align cases.

>> I think this example also shows how my suggestion of requiring an
>> extra protocol for conformance to Sequence would actually be much more
>> effective in preventing the programmer error…
>> 
>> You would not have been able to write the function the way you did,
>> because writing ‘start…’ as a sequence would have required the
>> addition of ‘where N:FiniteComparable’ (or whatever we call the
>> protocol providing natural bounds).  In having to write that N needs
>> bounds, you are much more likely to be thinking of what those bounds
>> might be.  
> 
> I explicitly *don't want* N to need to have expressible bounds.  When we
> get a BigInt type, it will be especially effective in these scenarios.

I still maintain that the issue here is with the design of the function (hiding the importance of the type’s bounds) combined with the problems around infinite sequences. See my thoughts on BigInt sequences above…

It isn’t as pretty, but this version shouldn’t have the landmine (caveat: written in Mail):

	func listElements<S: Sequence, N:Number>(of s:S, numberedFrom start:N) {
		var bigN = BigInt(start)
		for item in s {
			print(“\(bigN). \(item)”
			bigN += 1
		}
	}

I suspect that running into the issue where you had to add ’N:FiniteComparable’ and then not wanting to specify the bounds would have led you to write a less pretty, but more robust version like the one above.

Thanks,
Jon

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170205/446fafd3/attachment.html>