[swift-evolution] [Draft]: Introducing a striding(by:) method on 3.0 ranges

Sun Apr 10 13:36:33 CDT 2016

Le 10 avr. 2016 à 13:06, Dave Abrahams via swift-evolution <swift-evolution at swift.org> a écrit :
> 
> on Sun Apr 10 2016, Michel Fortin <swift-evolution at swift.org> wrote:
> 
>> Le 10 avr. 2016 à 6:17, Brent Royal-Gordon via swift-evolution <swift-evolution at swift.org> a écrit :
>> 
>>> Remember, conditional branches are relatively slow, and we want to
>> avoid them where we can. If this is, for instance, the test of a loop,
>> the extra branch is not a good thing.
>> 
>> Perhaps it's a bit extreme, but my idea for stride is that it should
>> only have one branch in the loop condition and absolutely no other
>> branch. 
> 
> I agree that would be ideal.  When the stride is statically known, I am
> confident that the optimizer can eliminate the branch.  When the stride
> is not statically known, presumably you need the branch anyway, though
> one might have to hope for the optimizer to copy and split the loop
> (i.e. create two specializations based on sign).
> 
>> The primitive stride could be defined like this:
>> 
>> 	stride(from: 0, compare: <, to: 10, by: 2)
> 
> Now you're adding a function call in lieu of a branch.  We could easily
> build stride so it stores a comparison function based on the sign of the
> stride, if we thought this would be better.  Would it?

In the best case scenario the optimiser inlines everything and you have a straightforward loop with no indirect jump through a function pointer. We're already counting on most of the `stride` mechanics being inlined anyway, so I don't think it's a stretch to assume inlining will work correctly for this. To improve likelihood of inlining, don't make that function depend on a runtime value such as the sign of the stride.

But there's no guaranty with the optimizer. In the worse case we store a function pointer on the stack and do an indirect jump to that function at every loop iteration to do the comparison. I think that's better than adding a branch at every iteration, but that could depend on the CPU cache and branch prediction and various other factors (code locality, etc.). I won't try to predict if the worse case is worse or better than with what we have now since I'm not even sure how to benchmark this.

But let's go back to the current implementation for a minute.

`stride` should work with any integer type. That should include custom ones such as an implementation of BigInt. If for some reason the optimizer cannot inline the `stride < 0` comparison in the `next()` function (where stride is of the generic type Element), it will not be able to deduce that the result is always the same (there's no `pure` annotation in the language). So it will pessimistically evaluate `stride < 0` at every iteration. In itself that comparison could be costly if we're talking about BigInt. And to that you add the cost of branching on the result of that comparison.

So if you take this into account, storing the comparator as part of the stride makes the cost more predictable: not only there is one branch less in `next()`, but you avoid evaluating the condition which has an unknown cost: the `stride < 0` part.

And ideally you should make sure the optimizer can replace the indirect call with a direct call to the comparator. You do that by not making the comparator choice dependent on a runtime value, hence why I suggested having distinct "down" variants for the convenience functions: `stride(from:downTo:by:)` and `stride(from:downThrough:by:)` and `Range.strindingDown(by:)`.

-- 
Michel Fortin
https://michelf.ca