[swift-users] Zero cost abstraction 2D Iterator (equivalent to two nested for loops) impossible?

Wed Jan 4 02:56:43 CST 2017

Hi,

I'm working on some low-level pixel processing code (stuff that is not
possible to do using standard API or on eg GPU), and I had lots of eg these:

for y in someStartY ..< someStopY {
    for x in someStartX ..< someStopX {
        ... pixels[x, y] ...
    }
}

So I implemented some (value) types like eg IntPoint2D, IntSize2D,
IntRect2D and I made an IntRect2DIterator so that IntRect2D could be a
Sequence over its (discrete) points. With this I could rewrite the above
like so:

for pixelPosAsIntPoint2D in someIntRect2D {
    ... pixels[pixelPosAsIntPoint2D] ...
}

For some reason the first version (two nested for loops for x and y) is
always a bit faster than the abstracted version no matter how I write it
(tried using eg &+ instead of + etc).

Is it possible to write as a zero cost abstraction like this, if so, how?
If not, why?

PS

Note that eg this:

for y in someStartY ..< someStopY {
    for x in someStartX ..< someStopX {
       let pixelPosAsIntPoint2D = IntPoint2D(x: x, y: y)
        ... pixels[pixelPosAsIntPoint2D] ...
    }
}

is exactly as fast as the top example (using just ... pixels[x, y] ...). So
the difference in execution time seems to be due to something in the
Iterator and not eg the pixel accessing subscript taking the 2d int point
type instead of separate x and y ints.

Here is one Iterator variant that I have tested:

struct Rect2DIntPointIterator : IteratorProtocol, Sequence {

    let startX, startY, stopX, stopY: Int

    var px, py: Int

    init(rect: Rect2DInt) {

        startX = rect.origin.x

        startY = rect.origin.y

        stopX = rect.maxX

        stopY = rect.maxY

        px = startX

        py = startY

    }

    mutating func next() -> Point2DInt? {

        defer { px = px &+ 1 }

        if px == stopX {

            px = startX

            py = py &+ 1

            if py == stopY { return nil }

        }

        return Point2DInt(x: px, y: py)

    }

}

And here are typical execution times from an example test:
2.1 seconds using my Iterator (the fastest I can get it, no matter how I
try to rewrite it).
1.5 seconds using nested x, y for loops.

I'm pretty sure my testing is done thoroughly (measuring average of many
runs, using random data, avoiding dead code elimination, whole module
optimization, etc).

I have tried profiling the code and looking at the disassmbly but I'm
failing to understand what's going on.

So the ultimate answer would be in the form of a (2d, Int) Rectangle type
whose (2d, Int) Points can be iterated in a for loop, at zero cost compared
to doing the same using two nested for loops. Or an explanation of why this
is impossible.

DS

/Jens
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-users/attachments/20170104/418a7f73/attachment.html>