[swift-evolution] [Review] SE-0065 A New Model for Collections and Indices

Tue Apr 12 20:39:18 CDT 2016

On Tue, Apr 12, 2016 at 4:27 AM, Brent Royal-Gordon
<brent at architechies.com> wrote:
>>> (On the other hand, it might be that I'm conceiving of the purpose of `limitedBy` differently from you—I think of it as a safety measure, but you may be thinking of it specifically as an automatic truncation mechanism.)
>>
>> Hi Brent,
>>
>> Could you explain what kind of safety do you have in mind?  Swift will
>> guarantee memory safety even if you attempt to advance an index past
>> endIndex using the non-limiting overload.
>
> By "safety" here, I mean what I will call "index safety": not accidentally using an index which would violate the preconditions of the methods or properties you are planning to use it with. I think it's too easy to accidentally overrun the permitted range of indices, and the API should help you avoid doing that.

Hi Brent,

Thank you for the explanation.  Assuming that I am interpreting you
correctly, I can't agree with your conclusions though.

I want to make a point that avoiding precondition violations by
removing preconditions is not the solution.  When you design an API,
it frequently has some constraints on the arguments or on the
execution environment, which, when violated, prevent the API from
performing the operation correctly.  As an API designer, you document
these constraints, but for the implementation you have two choices:

1.  You rigorously check the constraints, and return back some
indicator of the failure back to the caller (a nil return, a thrown
error etc.)  This decision makes the checks guaranteed, permanent,
documented part of your API behavior.  Users can now rely on these
checks for their normal logic.  In this case, the constraints are not
preconditions that the user is required to satisfy.

2.  You perform the operation anyway, relying on the caller to satisfy
the constraints.  The constraints become preconditions.  The
implementation can still check some of the constraints for either QoI
reasons or memory safety reasons.  Apart from preventing memory safety
violations, the implementation is not required to check the
constraints.

Now the question becomes, given a certain API design problem, how do
you make a choice between (1) and (2)?  In some cases, the answer is
clear:

- You can be forced to make your constraints into checks in API
behavior.  Example: a network i/o operation can fail.  This is a
completely normal thing that networks do sometimes, and the caller of
networking code should be expected to deal with networking failures,
or such code can't work on a real network.

- Some preconditions just can't be checked at all, or can't be checked
in time that is acceptable for the intended API purpose.  A trivial
example is UnsafePointer.pointee (we can't know whether the pointer
actually points to an instance of a correct type).  Another example is
sort predicates -- we can't check that the order defined by the
predicate is indeed a strict weak ordering for all values of the type.
We can check that the predicate defines a strict weak order for all
values in a particular dataset, but that requires O(n^2) time while
sort() complexity is O(n log n).

What about those cases when both (1) and (2) are implementable?  There
exists a trade-off between making the API "resilient" to programming
mistakes and hiding bugs by not stopping the application when an issue
happens.  We are very conservative in Swift about hiding errors.

If your APIs document that they check preconditions and trap if they
are not satisfied, then developers can rely on it, and you are not
allowed to weaken remove the checks in future.  If the app can recover
from a precondition failure, the recovery path just becomes a part of
the API behavior.  You can end up with a system where every API can
signal failure back to the calling code.  Now users have to handle
these failures.  There are two ways to handle them:

- force-unwrap, or 'try!'.

- follow the advice of people who advocate never using force-unwrap
and 'try!'.  The code that does so will likely have a lot of untested,
untestable, and dead branches that try to deal defensively with
situations that can't happen, but the API forces them to do so anyway.

I don't think either option leads to good code.  Thus, it is only
reasonable that we design the API based on a purpose that we intend
for this API.

Let's talk about how this applies to Collection.

For example, Collection APIs in question that work with indices are
primitive APIs that the rest of Collection APIs build upon.  One of
these APIs, basically the reason why indices exist, is
Collection.subscript(Index).  Today the behavior is unspecified if the
index is not valid.  If we follow the principle that you outlined:

> not accidentally using an index which would violate the preconditions of the methods or properties you are planning to use it with.

Collection.subscript(Index) should return an optional.  Does this
match your expectations?  If so, how do you imagine even trivial
algorithms written?

for i in data.indices {
  data[i]! = data[i]! * 2   // assume that 'data[i]!' can be made settable.
}

This code has two force-unwraps in it, despite being completely safe.
If you follow the "never force-unwrap" school of thought, then you end
up with two branches, and an error return from this function, that are
dead, untestable code.

If every API can fail, then you can't write useful code.  You need to
have some fundamental basis that you can rely on, and trust to operate
correctly.

The primitive Collection APIs are the basis of the Collection API.  If
they would be burdened with required range checks, then, just as you
are saying, the code would be full of redundant range checks.  For
example, UnsafeBufferPointer would also be burdened with them.

> I think it's too easy to accidentally overrun the permitted range of indices, and the API should help you avoid doing that.

I wholeheartedly agree that APIs should help you to avoid making
mistakes.  There are multiple ways in which they can do so:

1.  (best) Make wrong code not compile.  Swift has a strong type
system that allows us to express complex constraints between types,
preventing some mistakes from being compiled.

2.  Check preconditions where it is possible (implementable), and when
the performance hit is reasonable, and deterministically trap as soon
as the violation was detected.

Unfortunately, validity of indices is a complex dynamic property of
the index-collection instance pair.  We can't encode it in the static
type system.  Consider the following example:

---
var a = [1, 2, 3]
let i: Int = a.check(a.startIndex)
a.removeAll()
a[i] // trap
---

Ok, mutation can invalidate indices.  Can immutability help?

---
let a = [1, 2, 3]
let b: [Int] = []
let i = a.check(a.startIndex)
b[i] // trap
---

Since it is not possible to encode the validity relationship in the
type system, we want to encourage these operations to trap if they
detect violations.

Now let's go back to the demangling algorithm.  We know that we have
weaknesses in the String parsing, and it is important for us to know
how well we are covering these use cases, and how we need to improve.
I think this is a great problem.

I implemented it using three different approaches:
https://gist.github.com/gribozavr/ed95f71b762d25bee2991dd9c0191b34

A high-level point that I'd like to make is that when using
collections, you need to make use of algorithms to the biggest extent
possible.  This allows to express the semantics of your operation in
your domain using a high-level vocabulary.

I would like to draw attention to the following part:

// Approach #3: change Collection.index(_:stepsFrom:limitedBy:) to return an
// optional index.
//
// This method has to perform the range check to stop advancing the index when
// it reaches the limit. Currently it just discards the information about
// whether it reached the limit or not. Instead, it can cheaply return it to
// the caller.
//
// Note that the same logic does not apply to other
// Collection.index(_:stepsFrom:) overloads.

We will change the index(_:stepsFrom:limitedBy:) overload to return an
optional, and we will see what other implications it has, and how it
fits into the rest of the system.

Thanks again, Brent.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/