[swift-evolution] Pitch: Wrap calls to NSFileHandle and NSData in autorelease pools

Charles Srstka cocoadev at charlessoft.com
Fri Jul 14 12:11:53 CDT 2017


MOTIVATION:

Meet Bob. Bob is a developer with mostly C++ and Java experience, but who has been learning Swift. Bob needs to write an app to parse some proprietary binary data format that his company requires. Bob’s written this app, and it’s worked pretty well on Linux:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = handle.readData(ofLength: bufsize)
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Later, Bob needs to port this same app to macOS. All seems to work well, until Bob tries opening a large file of many gigabytes in size. Suddenly, the simple act of running the app causes Bob’s Mac to completely lock up, beachball, and finally pop up the dreaded “This computer is out of system memory” message. If Bob’s particularly unlucky, things will lock up tight enough that he can’t even recover from there, and he may have to hard-reboot the machine.

What happened?

Experienced Objective-C developers will spot the problem right away: the Foundation APIs that Bob used generate autoreleased objects, which are not released until Bob’s loop finishes. However, Bob’s never programmed in Objective-C, and to him, this behavior is completely undecipherable.

After a copious amount of time spent Googling for answers and asking for help on various mailing lists and message boards, Bob finally gets the recommendation from someone to try wrapping the file handle read in an autorelease pool. So he does:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = autoreleasepool { handle.readData(ofLength: bufsize) }
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Unfortunately, Bob’s program still eats RAM like Homer Simpson at an all-you-can-eat buffet. It turns out the data.withUnsafeBytes call *also* causes the data to be autoreleased. What Bob really needs to do is to wrap the whole thing in an autorelease pool, creating a Pyramid of Doom:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        autoreleasepool {
            let data = handle.readData(ofLength: bufsize)
            
            if data.isEmpty {
                break // error: ‘break’ is allowed only inside a loop, if, do, or switch
            }
            
            data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
                // do something with bytes
            }
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

However, when Bob tries to run this, he now gets a compile error on the ‘break’ statement; it’s no longer possible to break out of the loop, since everything inside the autorelease block is in a closure.

Bob is now regretting his decision not to become an insurance adjuster instead.

Bob’s problem, of course, can be solved by using *two* autorelease pools, one when getting the data, and the next when working with it. But this situation is confusing to newcomers to the language, since autorelease pools are not really part of Swift’s idiom, and aren’t mentioned anywhere in the usual Swift documentation. Thus, without Objective-C experience, autorelease-related issues are completely mysterious and baffling, particularly since Data is a struct, so it isn’t obvious that Objective-C is involved at all when using it. Even for experienced Objective-C developers, autorelease pools in Swift can be awkward since, unlike in Objective-C, they can’t simply be wrapped around a loop body without losing flow control via break and continue.
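For reference, the two-pool workaround might look something like the sketch below. The function name countBytes(at:chunkSize:) and the byte-counting body are hypothetical stand-ins for Bob’s real parsing code, and the closure uses the UnsafeRawBufferPointer form of withUnsafeBytes from newer Swift versions rather than the typed-pointer form shown earlier. The pass-through shim at the top is an assumption added only so the sketch also builds on platforms without the Objective-C runtime.

```swift
import Foundation

#if !canImport(ObjectiveC)
// Assumption: without the Objective-C runtime there is nothing to drain,
// so a pass-through stands in for the real autoreleasepool function.
func autoreleasepool<Result>(invoking body: () throws -> Result) rethrows -> Result {
    return try body()
}
#endif

// Hypothetical helper: reads the file in chunks and returns the total
// number of bytes seen, standing in for real parsing work.
func countBytes(at url: URL, chunkSize: Int = 1024 * 1024) throws -> Int {
    let handle = try FileHandle(forReadingFrom: url)
    defer { handle.closeFile() }
    var total = 0

    while true {
        // Pool 1: covers the read. The Data value is returned out of the closure.
        let data = autoreleasepool { handle.readData(ofLength: chunkSize) }

        if data.isEmpty {
            break // legal here: this is the loop body, not a closure
        }

        // Pool 2: covers the work done with the bytes.
        autoreleasepool {
            data.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) in
                total += bytes.count // do something with bytes
            }
        }
    }

    return total
}
```

Because the emptiness check sits between the two pools, break works again, at the cost of two closures per iteration that a newcomer has no obvious reason to write.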

PROPOSED SOLUTION:

In the Foundation overlay, wrap calls to Objective-C NSFileHandle and NSData APIs that generate autoreleased objects in an autorelease pool, so that they behave the way a user new to the language would expect, and in a manner consistent with how they likely behave on other platforms which lack the Objective-C bridge.

This would likely add a small performance overhead, but it should be negligible compared to the cost of the disk reads that occur when using a FileHandle. In addition, if Data objects are being accessed frequently enough for performance to be an issue, it’s likely that enough of them will be generated to make memory overhead an issue if an autorelease pool is not used.
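As a rough illustration of the idea, and not the actual overlay implementation, the wrapping could be sketched as extensions like the ones below. The names pooledReadData(ofLength:), pooledWithUnsafeBytes(_:) and countBytesPooled(at:chunkSize:) are hypothetical; under this pitch the pool would live inside the existing readData(ofLength:) and withUnsafeBytes entry points, so callers would not see any new API. The shim at the top is again an assumption so the sketch builds where the Objective-C runtime is absent.

```swift
import Foundation

#if !canImport(ObjectiveC)
// Assumption: pass-through stand-in on platforms with no autorelease pools.
func autoreleasepool<Result>(invoking body: () throws -> Result) rethrows -> Result {
    return try body()
}
#endif

extension FileHandle {
    // Hypothetical name. Drains anything autoreleased during the read
    // before handing the Data back to the caller.
    func pooledReadData(ofLength length: Int) -> Data {
        return autoreleasepool { self.readData(ofLength: length) }
    }
}

extension Data {
    // Hypothetical name. Runs the caller's closure inside its own pool.
    func pooledWithUnsafeBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R {
        return try autoreleasepool { try self.withUnsafeBytes(body) }
    }
}

// With the pools hidden inside the calls, Bob's original loop shape works
// unchanged, including the plain break.
func countBytesPooled(at url: URL, chunkSize: Int = 1024 * 1024) throws -> Int {
    let handle = try FileHandle(forReadingFrom: url)
    defer { handle.closeFile() }
    var total = 0

    while true {
        let data = handle.pooledReadData(ofLength: chunkSize)
        if data.isEmpty {
            break
        }
        data.pooledWithUnsafeBytes { (bytes: UnsafeRawBufferPointer) in
            total += bytes.count // do something with bytes
        }
    }

    return total
}
```

The added cost per call is one pool push and pop, which is the overhead being weighed above against the disk read itself.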

IMPACT ON EXISTING CODE:

Code that currently works around these issues with an autorelease pool may end up double-wrapping until these manual workarounds are removed.
	
Charles
