[swift-dev] Shouldn't the optimizer make this manual loop-unrolling unnecessary?

Sat Dec 12 08:37:10 CST 2015

Thank you. I've filed:
https://bugs.swift.org/browse/SR-203

On Fri, Dec 11, 2015 at 7:04 PM, Mark Lacey <mark.lacey at apple.com> wrote:

>
> On Dec 11, 2015, at 6:05 AM, Jens Persson via swift-dev <swift-dev at swift.org> wrote:
>
> Correction: The test I'm running actually uses V4<V4<Float>>.
> Manually unrolling the loop makes adding V4<V4<Float>> as fast as adding
> SIMD float4x4.
> Using the (un-unrolled) for loop is about 4 times slower.
> My question is still: Shouldn't the optimizer be able to handle that for
> loop and make my manual unrolling unnecessary?
> /Jens
>
> On Fri, Dec 11, 2015 at 8:28 AM, Jens Persson <jens at bitcycle.com> wrote:
>
>> I've been doing a lot of performance testing related to generic value
>> types and SIMD lately, and I've built Swift from source in order to get an
>> idea of what's coming up optimizer-wise. Things have improved, and the
>> optimizer is impressive overall. But I still see no improvement in the case
>> exemplified below.
>>
>> Manually unrolling the simple for loop makes it ~4 times faster (and
>> exactly as fast as SIMD float4):
>>
>> struct V4<T> {
>>     var elements: (T, T, T, T)
>>     /.../
>>     subscript(index: Int) -> T { /.../ }
>>     /.../
>>     func addedTo(other: V4) -> V4 {
>>         var r = V4()
>>         // Manually unrolling makes code ~ 4 times faster:
>>         // for i in 0 ..< 4 { r[i] = self[i] + other[i] }
>>         r[0] = self[0] + other[0]
>>         r[1] = self[1] + other[1]
>>         r[2] = self[2] + other[2]
>>         r[3] = self[3] + other[3]
>>         return r
>>     }
>>     /.../
>> }
>>
>> Shouldn't the optimizer be able to handle that for loop and make the
>> manual unrolling unnecessary?
>>
>
> In theory, yes. In practice there are some fairly complex phase-ordering
> issues in the SIL optimizer, and certain optimizations (like general loop
> unrolling) are only done in the LLVM optimizer. The LLVM optimizer
> runs after all the SIL-level optimizations, which may mean that SIL-level
> optimization opportunities are exposed by the LLVM optimizer, but by then it
> is too late to do anything about them.
>
>> (Compiled the test with -O -whole-module-optimization; also tried
>> -Ounchecked, but with the same results.)
>>
>
> Would you mind opening an issue on https://bugs.swift.org with a small
> stand-alone test case that compiles successfully, and reporting your results
> there?
>
> Mark
>
>
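The quoted V4 code elides its initializer and subscript, and Mark asks for a small stand-alone test case. Below is a minimal sketch of such a test, written in current Swift syntax rather than the Swift 2 syntax of this thread. Everything not shown in the quoted code is an assumption: the `Summable` protocol, the zero-filling `init()`, the switch-based subscript, the name `addedToLooped` for the loop version, and the `measure` helper are hypothetical stand-ins, not the code attached to SR-203.

```swift
import Dispatch

// Hypothetical protocol standing in for whatever constraints the elided code places on T.
protocol Summable {
    static var zero: Self { get }
    static func + (lhs: Self, rhs: Self) -> Self
}

// Float already provides `zero` and `+`, so it conforms without extra code.
extension Float: Summable {}

struct V4<T: Summable> {
    var elements: (T, T, T, T)

    init(_ x: T, _ y: T, _ z: T, _ w: T) { elements = (x, y, z, w) }
    init() { self.init(.zero, .zero, .zero, .zero) }

    // Hypothetical switch-based subscript; with a constant index the switch can be folded away.
    subscript(index: Int) -> T {
        get {
            switch index {
            case 0: return elements.0
            case 1: return elements.1
            case 2: return elements.2
            case 3: return elements.3
            default: preconditionFailure("V4 index out of range")
            }
        }
        set {
            switch index {
            case 0: elements.0 = newValue
            case 1: elements.1 = newValue
            case 2: elements.2 = newValue
            case 3: elements.3 = newValue
            default: preconditionFailure("V4 index out of range")
            }
        }
    }

    // Loop form: matches the hand-unrolled speed only if the optimizer unrolls it.
    func addedToLooped(_ other: V4) -> V4 {
        var r = V4()
        for i in 0 ..< 4 { r[i] = self[i] + other[i] }
        return r
    }

    // Manually unrolled form, as in the quoted message.
    func addedTo(_ other: V4) -> V4 {
        var r = V4()
        r[0] = self[0] + other[0]
        r[1] = self[1] + other[1]
        r[2] = self[2] + other[2]
        r[3] = self[3] + other[3]
        return r
    }
}

// Makes V4 itself Summable, so V4<V4<Float>> (the type actually benchmarked) compiles.
extension V4: Summable {
    static var zero: V4 { V4() }
    static func + (lhs: V4, rhs: V4) -> V4 { lhs.addedTo(rhs) }
}

// Runs `body` `iterations` times and returns the elapsed wall-clock seconds.
func measure(iterations: Int, _ body: () -> Void) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0 ..< iterations { body() }
    return Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9
}

let row = V4<Float>(1, 2, 3, 4)
let m = V4<V4<Float>>(row, row, row, row)
let iterations = 1_000_000

// Separate accumulators, printed at the end, so the additions cannot simply be discarded.
var loopedSum = V4<V4<Float>>()
var unrolledSum = V4<V4<Float>>()
let loopedTime = measure(iterations: iterations) { loopedSum = loopedSum.addedToLooped(m) }
let unrolledTime = measure(iterations: iterations) { unrolledSum = unrolledSum.addedTo(m) }

print("looped:   \(loopedTime) s, sum[3][3] = \(loopedSum[3][3])")
print("unrolled: \(unrolledTime) s, sum[3][3] = \(unrolledSum[3][3])")
```

One way to run it is to save it as a single file and compile it with the flags mentioned in the thread, `-O -whole-module-optimization` (or `-Ounchecked`). Timings depend on the compiler version and machine, so no particular ratio is implied here; the printed sums let the two variants' results be compared.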

-- 
bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
http://www.bitcycle.com/
Phone: +46-73-753 24 62
E-mail: jens at bitcycle.com