[swift-users] Data races with copy-on-write

Romain Jacquinot rjacquinot at me.com
Tue Dec 5 16:23:31 CST 2017


Also, on the official Swift blog (https://developer.apple.com/swift/blog/?id=10 <https://developer.apple.com/swift/blog/?id=10>), it is stated that:

"One of the primary reasons to choose value types over reference types is the ability to more easily reason about your code. If you always get a unique, copied instance, you can trust that no other part of your app is changing the data under the covers. This is especially helpful in multi-threaded environments where a different thread could alter your data out from under you.
[…]
In Swift, Array, String, and Dictionary are all value types. They behave much like a simple int value in C, acting as a unique instance of that data. You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back. Importantly, you can safely pass copies of values across threads without synchronization. In the spirit of improving safety, this model will help you write more predictable code in Swift.”

Yet, data race can occur here:

public class MyClass {

    private var _myArray: Array<Int> = []
    private var _lock = NSLock()

    public var myArray: Array<Int> {
        lock.lock()
        defer {
            lock.unlock()
        }
        let copy = _myArray
        return copy
    }

    public func doSomethingThatMutatesArray() {
        _lock.lock()
        _myArray.append(1) // data race here: write of size 8 by thread 1
        _lock.unlock()
    }
}


let myClass = MyClass()

func mutateArray() {
    myClass.doSomethingThatMutatesArray()
    var arrayCopy = myClass.myArray
    arrayCopy.append(2) // data race here: read of size 8 by thread 10
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 100_000

q1.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q2.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q3.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

for _ in 0..<iterations {
     mutateArray()
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

It also can occur for instance if I replace the mutateArray() function with the following method:

func enumerateArray() {
    myClass.doSomethingThatMutatesArray()
    for element in myClass.myArray {  // data race here: read of size 8 by thread 5
        let _ = element
    }
}

How can I return a “predictable” copy from the MyClass.myArray getter, so that I can later mutate the copy without synchronization like in the mutateArray() function?

> On Dec 5, 2017, at 9:23 PM, Romain Jacquinot via swift-users <swift-users at swift.org> wrote:
> 
> Hi Jens,
> 
> In the SynchronizedArray class, I use a lock to perform mutating operations on the array. However, my questions concern the case where an array is exposed as a public property of a thread-safe class, like this:
> 
> public class MyClass {
> 
> 	private var _myArray: Array<Int> = []
> 	private var _lock = NSLock()
> 
> 	public var myArray: Array<Int> {
> 		_lock.lock()
> 		defer { _lock.unlock() }
> 		return _myArray
> 	}
> 
> 	public func doSomethingThatMutatesArray() {
> 		_lock.lock()
> 		_myArray.append(1)
> 		_lock.unlock()
> 	}
> }
> 
> Arrays are implemented as structs in Swift. This is a value type, but because Array<T> implements copy-on-write, there is an issue if you do something like this from multiple threads:
> 
> let myClass = MyClass()
> 
> func callFromMultipleThreads() {
> 	let arrayCopy = myClass.myArray
> 	arrayCopy.append(2) // race condition here
> }
> 
> This is quite problematic, because the user of MyClass shouldn’t have to worry about side-effects when using a copy of the value from myArray.
> 
>> On Dec 5, 2017, at 8:22 PM, Jens Alfke <jens at mooseyard.com <mailto:jens at mooseyard.com>> wrote:
>> 
>> 
>> 
>>> On Dec 5, 2017, at 6:24 AM, Michel Fortin via swift-users <swift-users at swift.org <mailto:swift-users at swift.org>> wrote:
>>> 
>>> The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.
>> 
>> I think you’ve misunderstood. The race condition Romain is referring to is when the two threads both access the shared storage, through separate array variables.
>> 
>> Romain:
>> From the thread sanitizer warnings, it sounds like the implementation of Array is not using synchronization around its call(s) to isKnownUniquelyReferenced … which would mean the class is not thread-safe.
>> 
>> It’s pretty common for the regular (mutable) collection classes supplied by a framework to be non-thread-safe; for example consider Foundation and Java. The reason is that the overhead of taking a lock every time you access an array element is pretty high. Generally it’s preferable to use larger-granularity locks, i.e. grab an external lock before performing a number of array operations.
>> 
>> —Jens
> 
> _______________________________________________
> swift-users mailing list
> swift-users at swift.org
> https://lists.swift.org/mailman/listinfo/swift-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-users/attachments/20171205/484d6ca7/attachment.html>


More information about the swift-users mailing list