[swift-evolution] [Review] SE-0007 Remove C-style for-loops with conditions and incrementers

Stephen Canon scanon at apple.com
Fri Dec 11 10:45:28 CST 2015


If Swift is to be a systems language someday, then we also need to be able to write (parts of) Accelerate in Swift.

– Steve

> On Dec 11, 2015, at 11:27 AM, Erica Sadun via swift-evolution <swift-evolution at swift.org> wrote:
> 
> For many of these number-crunching performance-stretching scenarios, many I suggest once again, that if you're doing serious number crunching that Accelerate or similar approaches is to be preferred? As for c-style-for vs while, the two are mechanically convertible.
> 
> <Screen Shot 2015-12-08 at 3.54.37 PM.png>
> 
> Where heavy performance is not required, for-in is more readable, maintainable, and optimizable to a sufficient extent that I do not see it as a bar to conversion.
> 
> -- E
> 
> 
>> On Dec 11, 2015, at 8:44 AM, Paul Cantrell via swift-evolution <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
>> 
>> Your revised results are now right in line with what I get in my test harness, so that’s reassuring!
>> 
>> I’d quibble with this:
>> 
>>> The optimized builds are still slower than the for-in “equivalent” functionality.
>> 
>> That’s not an accurate summary. Depending on precisely what’s in the loop, the for-in flavor is clocking in anywhere from 80% slower to 20% faster.
>> 
>> None of this performance testing undercuts your entirely valid concerns about syntax. We have, I think, widespread agreement on the list that the C-style for is very rarely used in most Swift code in the wild — but if your usage patterns are unusual and you use it a lot, I can see why you’d be reluctant to part with it!
>> 
>> It’s a question, then, of whether it’s worth having a leaner language at the expense of making some less-common code more verbose when optimized. I’m not sure that any of the C-style audits people have done on the list have been games. Are there other game developers on the list using Swift who could do the audit on their code?
>> 
>> Cheers,
>> 
>> Paul
>> 
>> 
>>> On Dec 11, 2015, at 2:33 AM, David Owens II <david at owensd.io <mailto:david at owensd.io>> wrote:
>>> 
>>> I don’t know what you did, your gist 404s. 
>>> 
>>> Here’s an update with the while-loop: https://gist.github.com/owensd/b438bea90075907fa6ec <https://gist.github.com/owensd/b438bea90075907fa6ec> and using both i and j within the loops. This is a simple OS X framework project with unit tests.
>>> 
>>> Debug Build:
>>> testZipStride - 2.496s
>>> testCStyleFor - 0.210s
>>> testWhileLoop - 0.220s
>>> 
>>> Release Build:
>>> testZipStride - 0.029s
>>> testCStyleFor - 0.018s
>>> testWhileLoop - 0.019s
>>> 
>>> I ran these tests from my MacBook Pro, the previous tests were from my iMac.
>>> 
>>> When you use the sum += (i - j) construct, I think all you are ending up with a hot-path that the optimizer can end up optimizing better (my guess is that the i-j turns into a constant expression - after all, the difference is always 1, but I don’t know enough about the SIL representation to confirm that). If you use a code path where that expression is not constant time (again, assuming my suspicion is correct), the zip+stride is against slower.
>>> 
>>> I would argue the following:
>>> 
>>> The code is not objectively easier to read or understand with the zip+stride construct (arguably, they are not even semantically equivalent).
>>> The debug builds are prohibitively slower, especially in the context of high-performance requirement code (I’m doing a lot of prototyping Swift in the context of games, so yes, performance matters considerably).
>>> The optimized builds are still slower than the for-in “equivalent" functionality.
>>> The optimizer is inconsistent, like all optimizers are (this is a simple truth, not a value judgement - optimizers are not magic, they are code that is run like any other code and can only do as well as they are coded under the conditions they are coded against), at actually producing similar results with code that ends up with slightly different shapes.
>>> There is not functionally equivalent version of the code that I can write that is not more verbose, while requiring artificial scoping constructs, to achieve the same behavior.
>>> 
>>> So no, there is no evidence that I’ve seen to reconsider my opinion that this proposal should not be implemented. If there is evidence to show that my findings are incorrect or a poor summary of the general problem I am seeing, then of course I would reconsider my opinion.
>>> 
>>> -David
>>> 
>>>> On Dec 10, 2015, at 9:12 PM, Paul Cantrell <cantrell at pobox.com <mailto:cantrell at pobox.com>> wrote:
>>>> 
>>>> Hold the presses.
>>>> 
>>>> David, I found the radical differences in our results troubling, so I did some digging. It turns out that the zip+stride code:
>>>> 
>>>>     var sum = 0
>>>>     for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
>>>>         if i % 2 == 0 { continue }
>>>>         sum += 1
>>>>     }
>>>> 
>>>> …runs much faster if you actually use both i and j inside the loop:
>>>> 
>>>>     var sum = 0
>>>>     for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
>>>>         if i % 2 == 0 { continue }
>>>>         sum += i-j
>>>>     }
>>>> 
>>>> Weird, right? This is with optimization on (default “production” build). It smells like a compiler quirk.
>>>> 
>>>> With that tweak, the zip+stride approach actually clocks in faster than the C-style for. Yes, you read that right: faster. Also smells like a quirk. Am I doing something fantastically stupid in my code? Or maybe it’s just my idiosyncratic taste in indentation? :P
>>>> 
>>>> Here’s my test case, which was a command-line app with manual timing, followed by David’s dropped into the same harness, followed by David’s but with sum += i-j instead of sum += 1:
>>>> 
>>>>     https://gist.github.com/pcantrell/6bbe80e630d227ed0262 <https://gist.github.com/pcantrell/6bbe80e630d227ed0262>
>>>> 
>>>> Point is: no big performance difference here; even a performance advantage (that is probably a compiler artifact).
>>>> 
>>>> David and Thorsten, you might want to reconsider your reviews?
>>>> 
>>>> Results:
>>>> 
>>>> —————— Paul’s comparison ——————
>>>> 
>>>> zip+stride
>>>> 
>>>>   Iter 0: 0.519110977649689
>>>>   Iter 1: 0.503385007381439
>>>>   Iter 2: 0.503321051597595
>>>>   Iter 3: 0.485216021537781
>>>>   Iter 4: 0.524757027626038
>>>>   Iter 5: 0.478078007698059
>>>>   Iter 6: 0.503880977630615
>>>>   Iter 7: 0.498068988323212
>>>>   Iter 8: 0.485781013965607
>>>>          ——————————————
>>>>   Median: 0.524757027626038
>>>> 
>>>> C-style
>>>> 
>>>>   Iter 0: 0.85480797290802
>>>>   Iter 1: 0.879491031169891
>>>>   Iter 2: 0.851797997951508
>>>>   Iter 3: 0.836017966270447
>>>>   Iter 4: 0.863684952259064
>>>>   Iter 5: 0.837742984294891
>>>>   Iter 6: 0.839070022106171
>>>>   Iter 7: 0.849772989749908
>>>>   Iter 8: 0.819278955459595
>>>>          ——————————————
>>>>   Median: 0.863684952259064
>>>> 
>>>> Zip+stride takes 0.607579217692143x the time of C-style for
>>>> 
>>>> —————— David’s comparison ——————
>>>> 
>>>> zip+stride
>>>> 
>>>>   Iter 0: 1.15285503864288
>>>>   Iter 1: 1.1244450211525
>>>>   Iter 2: 1.24192994832993
>>>>   Iter 3: 1.02782195806503
>>>>   Iter 4: 1.13640999794006
>>>>   Iter 5: 1.15879601240158
>>>>   Iter 6: 1.12114900350571
>>>>   Iter 7: 1.21364599466324
>>>>   Iter 8: 1.10698300600052
>>>>          ——————————————
>>>>   Median: 1.13640999794006
>>>> 
>>>> C-style
>>>> 
>>>>   Iter 0: 0.375869989395142
>>>>   Iter 1: 0.371365010738373
>>>>   Iter 2: 0.356527984142303
>>>>   Iter 3: 0.384984970092773
>>>>   Iter 4: 0.367590010166168
>>>>   Iter 5: 0.365644037723541
>>>>   Iter 6: 0.384257972240448
>>>>   Iter 7: 0.379297018051147
>>>>   Iter 8: 0.363133013248444
>>>>          ——————————————
>>>>   Median: 0.367590010166168
>>>> 
>>>> Zip+stride takes 3.09151491202482x the time of C-style for
>>>> 
>>>> —————— David’s comparison, actually using indices in the loop ——————
>>>> 
>>>> zip+stride
>>>> 
>>>>   Iter 0: 0.328687965869904
>>>>   Iter 1: 0.332105994224548
>>>>   Iter 2: 0.336817979812622
>>>>   Iter 3: 0.321089029312134
>>>>   Iter 4: 0.338591992855072
>>>>   Iter 5: 0.348567008972168
>>>>   Iter 6: 0.34687602519989
>>>>   Iter 7: 0.34755402803421
>>>>   Iter 8: 0.341500997543335
>>>>          ——————————————
>>>>   Median: 0.338591992855072
>>>> 
>>>> C-style
>>>> 
>>>>   Iter 0: 0.422354996204376
>>>>   Iter 1: 0.427953958511353
>>>>   Iter 2: 0.403640985488892
>>>>   Iter 3: 0.415378987789154
>>>>   Iter 4: 0.403639018535614
>>>>   Iter 5: 0.416707038879395
>>>>   Iter 6: 0.415345013141632
>>>>   Iter 7: 0.417587995529175
>>>>   Iter 8: 0.415713012218475
>>>>          ——————————————
>>>>   Median: 0.403639018535614
>>>> 
>>>> Zip+stride takes 0.838848518865867x the time of C-style for
>>>> 
>>>> Program ended with exit code: 0
>>>> 
>>>> Cheers,
>>>> 
>>>> Paul
>>>> 
>>>>> On Dec 10, 2015, at 5:36 PM, David Owens II <david at owensd.io <mailto:david at owensd.io>> wrote:
>>>>> 
>>>>> Here’s my basic test case:
>>>>> 
>>>>> let first = 10000000
>>>>> let second = 20000000
>>>>> 
>>>>> class LoopPerfTests: XCTestCase {
>>>>>     
>>>>>     func testZipStride() {
>>>>>         self.measureBlock {
>>>>>             var sum = 0
>>>>>             for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0, by: -2)) {
>>>>>                 if i % 2 == 0 { continue }
>>>>>                 sum += 1
>>>>>             }
>>>>>             print(sum)
>>>>>         }
>>>>>     }
>>>>>     
>>>>>     func testCStyleFor() {
>>>>>         self.measureBlock {
>>>>>             var sum = 0
>>>>>             for var i = first, j = second; i > 0 && j > 0; i -= 1, j -= 2 {
>>>>>                 if i % 2 == 0 { continue }
>>>>>                 sum += 1
>>>>>             }
>>>>>             print(sum)
>>>>>         }
>>>>> 
>>>>>     }
>>>>> 
>>>>> }
>>>>> 
>>>>> Non-optimized timings:
>>>>> testCStyleFor - 0.126s
>>>>> testZipStride - 2.189s
>>>>> 
>>>>> Optimized timings:
>>>>> testCStyleFor - 0.008s
>>>>> testZipStride - 0.015s
>>>>> 
>>>>> That’s a lot worse than 34%; even in optimized builds, that’s 2x slower and in debug builds, that’s 17x slower. I think it’s unreasonable to force people to write a more verbose while-loop construct to simply get the performance they need.
>>>>> 
>>>>> Also, the readability argument is very subjective; for example, I don’t find the zip version more readability. In fact, I think it obscures what the logic of the loop is doing. But again, that’s subjective.
>>>>> 
>>>>> -David
>>>>> 
>>>>>> On Dec 10, 2015, at 2:41 PM, Paul Cantrell <cantrell at pobox.com <mailto:cantrell at pobox.com>> wrote:
>>>>>> 
>>>>>>> Is there any guarantee that these two loops have the exact same runtime performance?
>>>>>>> 
>>>>>>> for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
>>>>>>>    if i % 2 == 0 { continue }
>>>>>>>    print(i, j)
>>>>>>> }
>>>>>>> 
>>>>>>> for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
>>>>>>>    if i % 2 == 0 { continue }
>>>>>>>    print(i, j)
>>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> In a quick and dirty test, the second is approximately 34% slower.
>>>>>> 
>>>>>> I’d say that’s more than acceptable for the readability gain. If you’re in that rare stretch of critical code where the extra 34% actually matters, write it using a while loop instead.
>>>>>> 
>>>>>> P
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Dec 10, 2015, at 4:07 PM, David Owens II via swift-evolution <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On Dec 10, 2015, at 1:57 PM, thorsten--- via swift-evolution <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
>>>>>>>> 
>>>>>>>> Yes, performance is one thing neglected by the discussions and the proposal.
>>>>>>> 
>>>>>>> This is my primary objection to to this proposal; it assumes (or neglects?) that all of the types used can magically be inlined to nothing but the imperative code. This isn’t magical, someone has to implement the optimizations to do this.
>>>>>>> 
>>>>>>> Is there any guarantee that these two loops have the exact same runtime performance?
>>>>>>> 
>>>>>>> for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
>>>>>>>    if i % 2 == 0 { continue }
>>>>>>>    print(i, j)
>>>>>>> }
>>>>>>> 
>>>>>>> for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
>>>>>>>    if i % 2 == 0 { continue }
>>>>>>>    print(i, j)
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> And can you guarantee that performance is relatively the same across debug and release builds? Because historically, Swift has suffered greatly in this regard with respects to the performance of optimized versus non-optimized builds.
>>>>>>> 
>>>>>>> These types of optimizer issues are real-world things I’ve had to deal with (and have written up many blog posts about). I get the desire to simplify the constructs, but we need an escape hatch to write performant code when the optimizer isn’t up to the job.
>>>>>>> 
>>>>>>> -David
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> swift-evolution mailing list
>>>>>>> swift-evolution at swift.org <mailto:swift-evolution at swift.org>
>>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution <https://lists.swift.org/mailman/listinfo/swift-evolution>
>>>> 
>>> 
>> 
>>  _______________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org <mailto:swift-evolution at swift.org>
>> https://lists.swift.org/mailman/listinfo/swift-evolution <https://lists.swift.org/mailman/listinfo/swift-evolution>
>  _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org <mailto:swift-evolution at swift.org>
> https://lists.swift.org/mailman/listinfo/swift-evolution <https://lists.swift.org/mailman/listinfo/swift-evolution>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20151211/9271536e/attachment-0001.html>


More information about the swift-evolution mailing list