<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Oct 12, 2016, at 2:25 AM, Gerriet M. Denkmann via swift-users &lt;<a href="mailto:swift-users@swift.org" class="">swift-users@swift.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">uint64_t nbrBytes = 4e8;<br class="">uint64_t count = 0;<br class="">for( uint64_t byteIndex = 0; byteIndex &lt; nbrBytes; byteIndex++ )<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>count += byteIndex;<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>if ( ( byteIndex &amp; 0xffffffff ) == 0 ) { count += 1.3; } (AAA) <br class="">};<br class=""><br class="">Takes 260 msec.<br class=""><br class="">Btw.: Without the (AAA) line the whole loop is done in 10 μsec. A really clever compiler!<br class="">And with “count += 1” instead of “count += 1.3” it takes 410 msec. Very strange. <br class="">But this is beside the point here.<br class=""><br class=""><br class="">Now Swift:<br class="">let nbrBytes = 400_000_000<br class="">var count = 0<br class="">for byteIndex in 0 ..&lt; nbrBytes<br class="">{<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>count += byteIndex<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>if ( ( byteIndex &amp; 0xffffffff ) == 0 ) {count += Int(1.3);}<br class="">}<br class=""><br class="">takes 390 msec - about 50 % more.<br class=""><br class="">Release build with default options.</div></div></blockquote><br class=""></div><div>You'll need to read the generated assembly code if you want to analyze performance of this sort of small arithmetic loop. 
Performance of this kind of code can be greatly affected by small optimization changes.</div><div><br class=""></div><div>clang's `count += 1` code is vectorized, and the `count += 1.3` code is not. For whatever reason the vectorization does not pay off here, and the vectorized loop runs slower (at least on your machine and on mine). Optimization is hard.</div><div><br class=""></div><div>The Swift loop runs slower because Swift performs arithmetic overflow checks that C does not, and in this case swiftc was unable to optimize them all away.</div><div><br class=""></div><div>If you use &amp;+ instead of +, or compile with -Ounchecked, then Swift won't perform the overflow checks. Unfortunately, in this case you then get the same slower vectorized code from swiftc as you did from clang in the `count += 1` case; presumably both clang and swiftc get this pessimization from LLVM. I couldn't find a way to disable LLVM vectorization from swiftc.</div><br class=""><div class=""><br class=""></div><div class="">-- </div><div class="">Greg Parker <a href="mailto:gparker@apple.com" class="">gparker@apple.com</a> Runtime Wrangler</div><div class=""><br class=""></div><div class=""><br class=""></div></body></html>