[swift-evolution] [Review] SE-0007 Remove C-style for-loops with conditions and incrementers

Thu Dec 10 23:12:13 CST 2015

Hold the presses.

David, I found the radical differences in our results troubling, so I did some digging. It turns out that the zip+stride code:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += 1
    }

…runs much faster if you actually use both i and j inside the loop:

    var sum = 0
    for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
        if i % 2 == 0 { continue }
        sum += i-j
    }

Weird, right? This is with optimization on (default “production” build). It smells like a compiler quirk.

With that tweak, the zip+stride approach actually clocks in faster than the C-style for. Yes, you read that right: faster. Also smells like a quirk. Am I doing something fantastically stupid in my code? Or maybe it’s just my idiosyncratic taste in indentation? :P

Here’s my test case, which was a command-line app with manual timing, followed by David’s dropped into the same harness, followed by David’s but with sum += i-j instead of sum += 1:

    https://gist.github.com/pcantrell/6bbe80e630d227ed0262 <https://gist.github.com/pcantrell/6bbe80e630d227ed0262>

Point is: no big performance difference here; even a performance advantage (that is probably a compiler artifact).

David and Thorsten, you might want to reconsider your reviews?

Results:

—————— Paul’s comparison ——————

zip+stride

  Iter 0: 0.519110977649689
  Iter 1: 0.503385007381439
  Iter 2: 0.503321051597595
  Iter 3: 0.485216021537781
  Iter 4: 0.524757027626038
  Iter 5: 0.478078007698059
  Iter 6: 0.503880977630615
  Iter 7: 0.498068988323212
  Iter 8: 0.485781013965607
         ——————————————
  Median: 0.524757027626038

C-style

  Iter 0: 0.85480797290802
  Iter 1: 0.879491031169891
  Iter 2: 0.851797997951508
  Iter 3: 0.836017966270447
  Iter 4: 0.863684952259064
  Iter 5: 0.837742984294891
  Iter 6: 0.839070022106171
  Iter 7: 0.849772989749908
  Iter 8: 0.819278955459595
         ——————————————
  Median: 0.863684952259064

Zip+stride takes 0.607579217692143x the time of C-style for

—————— David’s comparison ——————

zip+stride

  Iter 0: 1.15285503864288
  Iter 1: 1.1244450211525
  Iter 2: 1.24192994832993
  Iter 3: 1.02782195806503
  Iter 4: 1.13640999794006
  Iter 5: 1.15879601240158
  Iter 6: 1.12114900350571
  Iter 7: 1.21364599466324
  Iter 8: 1.10698300600052
         ——————————————
  Median: 1.13640999794006

C-style

  Iter 0: 0.375869989395142
  Iter 1: 0.371365010738373
  Iter 2: 0.356527984142303
  Iter 3: 0.384984970092773
  Iter 4: 0.367590010166168
  Iter 5: 0.365644037723541
  Iter 6: 0.384257972240448
  Iter 7: 0.379297018051147
  Iter 8: 0.363133013248444
         ——————————————
  Median: 0.367590010166168

Zip+stride takes 3.09151491202482x the time of C-style for

—————— David’s comparison, actually using indices in the loop ——————

zip+stride

  Iter 0: 0.328687965869904
  Iter 1: 0.332105994224548
  Iter 2: 0.336817979812622
  Iter 3: 0.321089029312134
  Iter 4: 0.338591992855072
  Iter 5: 0.348567008972168
  Iter 6: 0.34687602519989
  Iter 7: 0.34755402803421
  Iter 8: 0.341500997543335
         ——————————————
  Median: 0.338591992855072

C-style

  Iter 0: 0.422354996204376
  Iter 1: 0.427953958511353
  Iter 2: 0.403640985488892
  Iter 3: 0.415378987789154
  Iter 4: 0.403639018535614
  Iter 5: 0.416707038879395
  Iter 6: 0.415345013141632
  Iter 7: 0.417587995529175
  Iter 8: 0.415713012218475
         ——————————————
  Median: 0.403639018535614

Zip+stride takes 0.838848518865867x the time of C-style for

Program ended with exit code: 0

Cheers,

Paul

> On Dec 10, 2015, at 5:36 PM, David Owens II <david at owensd.io> wrote:
> 
> Here’s my basic test case:
> 
> let first = 10000000
> let second = 20000000
> 
> class LoopPerfTests: XCTestCase {
>     
>     func testZipStride() {
>         self.measureBlock {
>             var sum = 0
>             for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
>                 if i % 2 == 0 { continue }
>                 sum += 1
>             }
>             print(sum)
>         }
>     }
>     
>     func testCStyleFor() {
>         self.measureBlock {
>             var sum = 0
>             for var i = first, j = second; i > 0 && j > 0; i -= 1, j -= 2 {
>                 if i % 2 == 0 { continue }
>                 sum += 1
>             }
>             print(sum)
>         }
> 
>     }
> 
> }
> 
> Non-optimized timings:
> testCStyleFor - 0.126s
> testZipStride - 2.189s
> 
> Optimized timings:
> testCStyleFor - 0.008s
> testZipStride - 0.015s
> 
> That’s a lot worse than 34%; even in optimized builds, that’s 2x slower and in debug builds, that’s 17x slower. I think it’s unreasonable to force people to write a more verbose while-loop construct to simply get the performance they need.
> 
> Also, the readability argument is very subjective; for example, I don’t find the zip version more readability. In fact, I think it obscures what the logic of the loop is doing. But again, that’s subjective.
> 
> -David
> 
>> On Dec 10, 2015, at 2:41 PM, Paul Cantrell <cantrell at pobox.com <mailto:cantrell at pobox.com>> wrote:
>> 
>>> Is there any guarantee that these two loops have the exact same runtime performance?
>>> 
>>> for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
>>>    if i % 2 == 0 { continue }
>>>    print(i, j)
>>> }
>>> 
>>> for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
>>>    if i % 2 == 0 { continue }
>>>    print(i, j)
>>> }
>> 
>> 
>> In a quick and dirty test, the second is approximately 34% slower.
>> 
>> I’d say that’s more than acceptable for the readability gain. If you’re in that rare stretch of critical code where the extra 34% actually matters, write it using a while loop instead.
>> 
>> P
>> 
>> 
>> 
>>> On Dec 10, 2015, at 4:07 PM, David Owens II via swift-evolution <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
>>> 
>>> 
>>>> On Dec 10, 2015, at 1:57 PM, thorsten--- via swift-evolution <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
>>>> 
>>>> Yes, performance is one thing neglected by the discussions and the proposal.
>>> 
>>> This is my primary objection to to this proposal; it assumes (or neglects?) that all of the types used can magically be inlined to nothing but the imperative code. This isn’t magical, someone has to implement the optimizations to do this.
>>> 
>>> Is there any guarantee that these two loops have the exact same runtime performance?
>>> 
>>> for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
>>>    if i % 2 == 0 { continue }
>>>    print(i, j)
>>> }
>>> 
>>> for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
>>>    if i % 2 == 0 { continue }
>>>    print(i, j)
>>> }
>>> 
>>> 
>>> And can you guarantee that performance is relatively the same across debug and release builds? Because historically, Swift has suffered greatly in this regard with respects to the performance of optimized versus non-optimized builds.
>>> 
>>> These types of optimizer issues are real-world things I’ve had to deal with (and have written up many blog posts about). I get the desire to simplify the constructs, but we need an escape hatch to write performant code when the optimizer isn’t up to the job.
>>> 
>>> -David
>>> 
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution at swift.org <mailto:swift-evolution at swift.org>
>>> https://lists.swift.org/mailman/listinfo/swift-evolution <https://lists.swift.org/mailman/listinfo/swift-evolution>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20151210/ccfa190f/attachment.html>