[swift-evolution] [Review] SE-0007 Remove C-style for-loops with conditions and incrementers
Wallacy
wallacyf at gmail.com
Fri Dec 11 11:11:43 CST 2015
FWIW:
Performance differences in loops exist in any language. And the vast
majority of cases it is only a matter of compiler optimization difference:
Using this example:
http://stackoverflow.com/a/2959105
"For instance, the following code shows that the "do-while" is a bit
faster. This because the "jmp" istruction is not used by the "do-while"
loop."
int main(int argc, char* argv[]) { int i; char x[100];
//"FOR" LOOP:for(i=0; i<100; i++ ) { x[i] = 0; }
//"WHILE" LOOP:
i=0; while(i<100 ) { x[i++] = 0; }
//"DO-WHILE" LOOP:
i=0; do { x[i++] = 0; } while(i<100);
return 0; }
*// "FOR" LOOP:*
010013C8 mov dword ptr [ebp-0Ch],0
010013CF jmp wmain+3Ah (10013DAh)
for(i=0; i<100; i++ )
{ x[i] = 0;
010013D1 mov eax,dword ptr [ebp-0Ch] <<< UPDATE i
010013D4 add eax,1
010013D7 mov dword ptr [ebp-0Ch],eax
010013DA cmp dword ptr [ebp-0Ch],64h <<< TEST
010013DE jge wmain+4Ah (10013EAh) <<< COND JUMP
010013E0 mov eax,dword ptr [ebp-0Ch] <<< DO THE JOB..
010013E3 mov byte ptr [ebp+eax-78h],0
010013E8 jmp wmain+31h (10013D1h) <<< UNCOND JUMP
}
*// "WHILE" LOOP:*
i=0;
010013EA mov dword ptr [ebp-0Ch],0
while(i<100 )
{ x[i++] = 0;
010013F1 cmp dword ptr [ebp-0Ch],64h <<< TEST
010013F5 jge wmain+6Ah (100140Ah) <<< COND JUMP
010013F7 mov eax,dword ptr [ebp-0Ch] <<< DO THE JOB..
010013FA mov byte ptr [ebp+eax-78h],0
010013FF mov ecx,dword ptr [ebp-0Ch] <<< UPDATE i
01001402 add ecx,1
01001405 mov dword ptr [ebp-0Ch],ecx
01001408 jmp wmain+51h (10013F1h) <<< UNCOND JUMP
}
*// "DO-WHILE" LOOP:*
i=0; . 0100140A mov dword ptr [ebp-0Ch],0
do
{ x[i++] = 0;
01001411 mov eax,dword ptr [ebp-0Ch] <<< DO THE JOB..
01001414 mov byte ptr [ebp+eax-78h],0
01001419 mov ecx,dword ptr [ebp-0Ch] <<< UPDATE i
0100141C add ecx,1
0100141F mov dword ptr [ebp-0Ch],ecx
01001422 cmp dword ptr [ebp-0Ch],64h <<< TEST
01001426 jl wmain+71h (1001411h) <<< COND JUMP
}
while(i<100);
Em sex, 11 de dez de 2015 às 06:34, David Owens II via swift-evolution <
swift-evolution at swift.org> escreveu:
> I don’t know what you did, your gist 404s.
>
> Here’s an update with the while-loop:
> https://gist.github.com/owensd/b438bea90075907fa6ec and using both i and
> j within the loops. This is a simple OS X framework project with unit tests.
>
> *Debug Build:*
>
> - testZipStride - 2.496s
> - testCStyleFor - 0.210s
> - testWhileLoop - 0.220s
>
>
> *Release Build:*
>
> - testZipStride - 0.029s
> - testCStyleFor - 0.018s
> - testWhileLoop - 0.019s
>
>
> I ran these tests from my MacBook Pro, the previous tests were from my
> iMac.
>
> When you use the sum += (i - j) construct, I think all you are ending up
> with a hot-path that the optimizer can end up optimizing better (my guess
> is that the i-j turns into a constant expression - after all, the
> difference is always 1, but I don’t know enough about the SIL
> representation to confirm that). If you use a code path where that
> expression is not constant time (again, assuming my suspicion is correct),
> the zip+stride is against slower.
>
> I would argue the following:
>
>
> 1. The code is not objectively easier to read or understand with the
> zip+stride construct (arguably, they are not even semantically equivalent).
> 2. The debug builds are prohibitively slower, especially in the
> context of high-performance requirement code (I’m doing a lot of
> prototyping Swift in the context of games, so yes, performance matters
> considerably).
> 3. The optimized builds are still slower than the for-in “equivalent"
> functionality.
> 4. The optimizer is inconsistent, like all optimizers are (this is a
> simple truth, not a value judgement - optimizers are not magic, they are
> code that is run like any other code and can only do as well as they are
> coded under the conditions they are coded against), at actually producing
> similar results with code that ends up with slightly different shapes.
> 5. There is not functionally equivalent version of the code that I can
> write that is not more verbose, while requiring artificial scoping
> constructs, to achieve the same behavior.
>
>
> So no, there is no evidence that I’ve seen to reconsider my opinion that
> this proposal should not be implemented. If there is evidence to show that
> my findings are incorrect or a poor summary of the general problem I am
> seeing, then of course I would reconsider my opinion.
>
> -David
>
> On Dec 10, 2015, at 9:12 PM, Paul Cantrell <cantrell at pobox.com> wrote:
>
> Hold the presses.
>
> David, I found the radical differences in our results troubling, so I did
> some digging. It turns out that the zip+stride code:
>
> var sum = 0
> for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0,
> by: -2)) {
> if i % 2 == 0 { continue }
> sum += 1
> }
>
> …runs *much* faster if you actually use both i and j inside the loop:
>
> var sum = 0
> for (i, j) in zip(first.stride(to: 0, by: -1), second.stride(to: 0,
> by: -2)) {
> if i % 2 == 0 { continue }
> sum += *i-j*
> }
>
> Weird, right? This is with optimization on (default “production” build).
> It smells like a compiler quirk.
>
> With that tweak, the zip+stride approach actually clocks in faster than
> the C-style for. Yes, you read that right: *faster*. Also smells like a
> quirk. Am I doing something fantastically stupid in my code? Or maybe it’s
> just my idiosyncratic taste in indentation? :P
>
> Here’s my test case, which was a command-line app with manual timing,
> followed by David’s dropped into the same harness, followed by David’s but
> with sum += i-j instead of sum += 1:
>
> https://gist.github.com/pcantrell/6bbe80e630d227ed0262
>
> Point is: *no big performance difference here; even a performance
> advantage* (that is probably a compiler artifact).
>
> David and Thorsten, you might want to reconsider your reviews?
>
> Results:
>
> —————— Paul’s comparison ——————
>
> zip+stride
>
> Iter 0: 0.519110977649689
> Iter 1: 0.503385007381439
> Iter 2: 0.503321051597595
> Iter 3: 0.485216021537781
> Iter 4: 0.524757027626038
> Iter 5: 0.478078007698059
> Iter 6: 0.503880977630615
> Iter 7: 0.498068988323212
> Iter 8: 0.485781013965607
> ——————————————
> Median: 0.524757027626038
>
> C-style
>
> Iter 0: 0.85480797290802
> Iter 1: 0.879491031169891
> Iter 2: 0.851797997951508
> Iter 3: 0.836017966270447
> Iter 4: 0.863684952259064
> Iter 5: 0.837742984294891
> Iter 6: 0.839070022106171
> Iter 7: 0.849772989749908
> Iter 8: 0.819278955459595
> ——————————————
> Median: 0.863684952259064
>
> Zip+stride takes 0.607579217692143x the time of C-style for
>
> —————— David’s comparison ——————
>
> zip+stride
>
> Iter 0: 1.15285503864288
> Iter 1: 1.1244450211525
> Iter 2: 1.24192994832993
> Iter 3: 1.02782195806503
> Iter 4: 1.13640999794006
> Iter 5: 1.15879601240158
> Iter 6: 1.12114900350571
> Iter 7: 1.21364599466324
> Iter 8: 1.10698300600052
> ——————————————
> Median: 1.13640999794006
>
> C-style
>
> Iter 0: 0.375869989395142
> Iter 1: 0.371365010738373
> Iter 2: 0.356527984142303
> Iter 3: 0.384984970092773
> Iter 4: 0.367590010166168
> Iter 5: 0.365644037723541
> Iter 6: 0.384257972240448
> Iter 7: 0.379297018051147
> Iter 8: 0.363133013248444
> ——————————————
> Median: 0.367590010166168
>
> Zip+stride takes 3.09151491202482x the time of C-style for
>
> —————— David’s comparison, actually using indices in the loop ——————
>
> zip+stride
>
> Iter 0: 0.328687965869904
> Iter 1: 0.332105994224548
> Iter 2: 0.336817979812622
> Iter 3: 0.321089029312134
> Iter 4: 0.338591992855072
> Iter 5: 0.348567008972168
> Iter 6: 0.34687602519989
> Iter 7: 0.34755402803421
> Iter 8: 0.341500997543335
> ——————————————
> Median: 0.338591992855072
>
> C-style
>
> Iter 0: 0.422354996204376
> Iter 1: 0.427953958511353
> Iter 2: 0.403640985488892
> Iter 3: 0.415378987789154
> Iter 4: 0.403639018535614
> Iter 5: 0.416707038879395
> Iter 6: 0.415345013141632
> Iter 7: 0.417587995529175
> Iter 8: 0.415713012218475
> ——————————————
> Median: 0.403639018535614
>
> Zip+stride takes 0.838848518865867x the time of C-style for
>
> Program ended with exit code: 0
>
> Cheers,
>
> Paul
>
> On Dec 10, 2015, at 5:36 PM, David Owens II <david at owensd.io> wrote:
>
> Here’s my basic test case:
>
> let first = 10000000
> let second = 20000000
>
> class LoopPerfTests: XCTestCase {
>
> func testZipStride() {
> self.measureBlock {
> var sum = 0
> for (i, j) in zip(first.stride(to: 0, by:
> -1), second.stride(to: 0, by: -2)) {
> if i % 2 == 0 { continue }
> sum += 1
> }
> print(sum)
> }
> }
>
> func testCStyleFor() {
> self.measureBlock {
> var sum = 0
> for var i = first, j = second; i > 0 && j > 0; i -= 1, j -= 2 {
> if i % 2 == 0 { continue }
> sum += 1
> }
> print(sum)
> }
>
> }
>
> }
>
>
> Non-optimized timings:
>
> - testCStyleFor - 0.126s
> - testZipStride - 2.189s
>
>
> Optimized timings:
>
> - testCStyleFor - 0.008s
> - testZipStride - 0.015s
>
>
> That’s a lot worse than 34%; even in optimized builds, that’s 2x slower
> and in debug builds, that’s 17x slower. I think it’s unreasonable to force
> people to write a more verbose while-loop construct to simply get the
> performance they need.
>
> Also, the readability argument is very subjective; for example, I don’t
> find the zip version more readability. In fact, I think it obscures what
> the logic of the loop is doing. But again, that’s subjective.
>
> -David
>
> On Dec 10, 2015, at 2:41 PM, Paul Cantrell <cantrell at pobox.com> wrote:
>
> Is there any guarantee that these two loops have the exact same runtime
> performance?
>
> for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
> if i % 2 == 0 { continue }
> print(i, j)
> }
>
> for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
> if i % 2 == 0 { continue }
> print(i, j)
> }
>
>
> In a quick and dirty test, the second is approximately 34% slower.
>
> I’d say that’s more than acceptable for the readability gain. If you’re in
> that rare stretch of critical code where the extra 34% actually matters,
> write it using a while loop instead.
>
> P
>
>
>
> On Dec 10, 2015, at 4:07 PM, David Owens II via swift-evolution <
> swift-evolution at swift.org> wrote:
>
>
> On Dec 10, 2015, at 1:57 PM, thorsten--- via swift-evolution <
> swift-evolution at swift.org> wrote:
>
> Yes, performance is one thing neglected by the discussions and the
> proposal.
>
>
> This is my primary objection to to this proposal; it assumes (or
> neglects?) that all of the types used can magically be inlined to nothing
> but the imperative code. This isn’t magical, someone has to implement the
> optimizations to do this.
>
> Is there any guarantee that these two loops have the exact same runtime
> performance?
>
> for (i, j) in zip(10.stride(to: 0, by: -1), 20.stride(to: 0, by: -2)) {
> if i % 2 == 0 { continue }
> print(i, j)
> }
>
> for var i = 10, j = 20; i > 0 && j > 0; i -= 1, j -= 2 {
> if i % 2 == 0 { continue }
> print(i, j)
> }
>
>
> And can you guarantee that performance is relatively the same across debug
> and release builds? Because historically, Swift has suffered greatly in
> this regard with respects to the performance of optimized versus
> non-optimized builds.
>
> These types of optimizer issues are real-world things I’ve had to deal
> with (and have written up many blog posts about). I get the desire to
> simplify the constructs, but we need an escape hatch to write performant
> code when the optimizer isn’t up to the job.
>
> -David
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
>
>
>
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20151211/50bca11a/attachment.html>
More information about the swift-evolution
mailing list