[swift-dev] Shouldn't the optimizer make this manual loop-unrolling unnecessary?
Jens Persson
jens at bitcycle.com
Fri Dec 11 08:05:57 CST 2015
Correction: The test I'm running is actually using V4<V4<Float>>.
Manually unrolling the loop makes adding V4<V4<Float>> as fast as adding
SIMD float4x4.
Using the (un-unrolled) for loop is about 4 times slower.
My question is still: Shouldn't the optimizer be able to handle that for
loop / make my manual unrolling unnecessary?
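
For reference, here is a minimal self-contained sketch of the V4<V4<Float>> case that puts the two variants side by side. The Addable protocol, the four-argument initializer and the names addedToLooped / addedToUnrolled are hypothetical and only there for illustration; the snippet quoted below elides those parts, so the real test code may differ.

```swift
// Hypothetical constraint: the quoted snippet does not show how T is constrained.
protocol Addable {
    init()
    static func + (lhs: Self, rhs: Self) -> Self
}

extension Float: Addable {}

struct V4<T: Addable>: Addable {
    var elements: (T, T, T, T)

    init() { elements = (T(), T(), T(), T()) }
    init(_ a: T, _ b: T, _ c: T, _ d: T) { elements = (a, b, c, d) }

    // Index-based access to the tuple storage (assumed implementation).
    subscript(index: Int) -> T {
        get {
            switch index {
            case 0: return elements.0
            case 1: return elements.1
            case 2: return elements.2
            case 3: return elements.3
            default: fatalError("index out of range")
            }
        }
        set {
            switch index {
            case 0: elements.0 = newValue
            case 1: elements.1 = newValue
            case 2: elements.2 = newValue
            case 3: elements.3 = newValue
            default: fatalError("index out of range")
            }
        }
    }

    // The plain for loop, i.e. the variant the optimizer is asked to unroll.
    func addedToLooped(_ other: V4) -> V4 {
        var r = V4()
        for i in 0 ..< 4 { r[i] = self[i] + other[i] }
        return r
    }

    // The manually unrolled variant.
    func addedToUnrolled(_ other: V4) -> V4 {
        var r = V4()
        r[0] = self[0] + other[0]
        r[1] = self[1] + other[1]
        r[2] = self[2] + other[2]
        r[3] = self[3] + other[3]
        return r
    }

    // Lets V4 itself be used as an element type, as in V4<V4<Float>>.
    static func + (lhs: V4, rhs: V4) -> V4 {
        return lhs.addedToUnrolled(rhs)
    }
}

let a = V4<Float>(1, 2, 3, 4)
let b = V4<Float>(10, 20, 30, 40)
let m = V4<V4<Float>>(a, a, a, a)
let n = V4<V4<Float>>(b, b, b, b)

let looped = m.addedToLooped(n)
let unrolled = m.addedToUnrolled(n)

// Both variants should compute the same sums; only their speed is in question.
print(looped[2].elements == unrolled[2].elements)
```

The sketch only checks that both variants give the same result. It does not measure anything, so the ~4x figure still has to come from timing each method in an optimized build.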
/Jens
On Fri, Dec 11, 2015 at 8:28 AM, Jens Persson <jens at bitcycle.com> wrote:
> I've been doing a lot of performance testing related to generic value
> types and SIMD lately, and I've built Swift from sources in order to get an
> idea of what's coming up optimizer-wise. Things have improved and the
> optimizer is impressive overall. But I still see no improvement in the case
> exemplified below.
>
> Manually unrolling the simple for loop makes it ~4 times faster (and
> exactly as fast as when using SIMD float4):
>
> struct V4<T> {
>     var elements: (T, T, T, T)
>     /.../
>     subscript(index: Int) -> T { /.../ }
>     /.../
>     func addedTo(other: V4) -> V4 {
>         var r = V4()
>         // Manually unrolling makes code ~ 4 times faster:
>         // for i in 0 ..< 4 { r[i] = self[i] + other[i] }
>         r[0] = self[0] + other[0]
>         r[1] = self[1] + other[1]
>         r[2] = self[2] + other[2]
>         r[3] = self[3] + other[3]
>         return r
>     }
>     /.../
> }
>
> Shouldn't the optimizer be able to handle that for loop and make the
> manual unrolling unnecessary?
>
> (Compiled the test with -O -whole-module-optimization; also tried
> -Ounchecked, but with the same results.)
>
> /Jens
>
>
--
bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
http://www.bitcycle.com/
Phone: +46-73-753 24 62
E-mail: jens at bitcycle.com