[swift-dev] Performance issues in automatic reference counting (ARC)?

Michael Gottesman mgottesman at apple.com
Tue Jan 31 12:32:52 CST 2017


> On Jan 31, 2017, at 12:07 AM, Mikio Takeuchi <mikio.takeuchi at gmail.com> wrote:
> 
> Hi Michael,
> 
> > If you are interested in the perf difference with ARC atomics, Roman recently added a mode to the compiler called -assume-single-threaded that uses non-atomic reference counts everywhere.
> 
> I think that is not exactly true. As of now, the -assume-single-threaded option can eliminate atomic reference counts only for reference types.
> 
> I tried -assume-single-threaded when compiling applications as well as the Swift runtime, and found that atomic reference counts were still used for value types, which limited the improvements.
> 
> SIL instructions on value types (such as CopyValue) are not subtypes of RefCountingInst; therefore, they have no mechanism to represent atomicity.

This is not an issue, since copy_value is currently lowered right after SILGen to instructions that /do/ have atomicity. My understanding is that the assume-single-threaded option runs a late pass that sets all of these to non-atomic.
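A toy model of the lowering described above may help. In this sketch, all type and function names are hypothetical and this is not the compiler's actual C++ code: a value operation like copy_value carries no atomicity, lowering turns it into a reference-counting operation that defaults to atomic (assuming, for simplicity, that the operand is a class reference so the copy becomes a retain), and a late pass run under the single-threaded assumption flips every reference-counting operation to non-atomic.

```swift
// Hypothetical, heavily simplified model of SIL instructions; these are not
// the Swift compiler's real (C++) instruction classes.
enum Atomicity: Equatable {
    case atomic
    case nonAtomic
}

enum Inst: Equatable {
    // A value operation: like CopyValue, it has no atomicity field.
    case copyValue(String)
    // Reference-counting operations: like RefCountingInst, they carry atomicity.
    case strongRetain(String, Atomicity)
    case strongRelease(String, Atomicity)
}

// Lowering right after SILGen (assumed here: the operand is a class
// reference, so the copy becomes a retain). New retains default to atomic.
func lowerValueOps(_ insts: [Inst]) -> [Inst] {
    return insts.map { (inst: Inst) -> Inst in
        if case .copyValue(let value) = inst {
            return .strongRetain(value, .atomic)
        }
        return inst
    }
}

// Late pass under a whole-program single-threaded assumption: every
// reference-counting operation is marked non-atomic.
func assumeSingleThreaded(_ insts: [Inst]) -> [Inst] {
    return insts.map { (inst: Inst) -> Inst in
        switch inst {
        case .strongRetain(let value, _):
            return .strongRetain(value, .nonAtomic)
        case .strongRelease(let value, _):
            return .strongRelease(value, .nonAtomic)
        case .copyValue:
            return inst  // value ops are lowered away before this pass in practice
        }
    }
}

let afterSILGen: [Inst] = [.copyValue("%0"), .strongRelease("%0", .atomic)]
let lowered = lowerValueOps(afterSILGen)
let optimized = assumeSingleThreaded(lowered)
print(optimized)
```

Because the pass runs after value operations are gone, it only ever needs to touch instructions that already have an atomicity flag, which is the reason given above for why the missing flag on value operations does not matter today.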

> COW requires reference counts, and because of the lack of information, atomic reference counts are assumed in many places in their codegen and in the runtime.

Again, IMO this is not an issue because of the late pass.
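For readers unfamiliar with the COW pattern under discussion, here is a minimal Swift sketch using the standard library's isKnownUniquelyReferenced; the Storage and COWBuffer names are hypothetical. After the uniqueness check, the storage is reachable only through this one value, which is the kind of thread-locality guarantee that could let a compiler avoid atomic reference counting. The sketch itself does not change how the runtime counts references.

```swift
// A minimal copy-on-write buffer. Storage and COWBuffer are hypothetical
// names; isKnownUniquelyReferenced is the standard library function.
final class Storage {
    var elements: [Int]

    init(_ elements: [Int]) {
        self.elements = elements
    }
}

struct COWBuffer {
    private var storage: Storage

    init(_ elements: [Int] = []) {
        storage = Storage(elements)
    }

    var elements: [Int] {
        return storage.elements
    }

    mutating func append(_ x: Int) {
        // If another value shares the storage, copy it first ("unique" it).
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.elements)
        }
        // Now this value is the storage's only owner, so no other copy
        // (on this thread or any other) can observe the mutation.
        storage.elements.append(x)
    }

    // Identity check, used only to demonstrate sharing.
    func sharesStorage(with other: COWBuffer) -> Bool {
        return storage === other.storage
    }
}

var a = COWBuffer([1, 2])
var b = a          // b shares a's storage: a retain, no copy yet
b.append(3)        // storage is shared, so b copies it before mutating
a.append(4)        // a's storage should be unique again, so it mutates in place
```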

> 
> I made a prototype that returns the correct atomicity based on the compiler option, in order to eliminate atomic reference counts from the generated code. I also modified the value witness functions to eliminate atomic reference counts from them. With these changes, atomic reference counts almost disappeared.

The value witness functions I think are the larger potential issue.

> 
> If it makes sense, I am happy to contribute my changes to the community.  
> 
> I understand there are two problems with my prototype. 
> 1) We may need to introduce a mechanism to represent atomicity for value types as well. This would open an opportunity for the compiler to use non-atomic reference counts for thread-local values.

Again, I do not think that is an issue, since we lower these away today. It may require changes later, though, once these value operations move further back into the compiler.

> 2) We need to either extend the value witness table with non-atomic versions of the functions, or pass atomicity information to the functions as an extra parameter.

Since your option assumes that the whole program is single-threaded, why couldn't we just emit the value witness functions so that they use non-atomic reference counts?
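The trade-off between passing atomicity on every call and deciding it once when the witnesses are emitted can be sketched with a hypothetical model in which a "value witness table" is reduced to a struct holding one copy closure. Real value witness tables are emitted by IRGen and look nothing like this; every name below is illustrative only.

```swift
// Hypothetical model: a "value witness table" reduced to one copy witness.
// Real value witness tables are emitted by IRGen; only the design choice
// between the two approaches is illustrated here.
enum Atomicity {
    case atomic
    case nonAtomic
}

// Records which kind of reference-count operation a witness performed.
final class RefCountLog {
    var atomicOps = 0
    var nonAtomicOps = 0

    func record(_ atomicity: Atomicity) {
        switch atomicity {
        case .atomic: atomicOps += 1
        case .nonAtomic: nonAtomicOps += 1
        }
    }
}

// Option 2 from the message: every call passes atomicity as an extra argument.
struct WitnessesWithAtomicityParameter {
    let copy: (Atomicity) -> Void
}

// Alternative suggested in the reply: the whole program is assumed
// single-threaded, so atomicity is fixed once when the witnesses are emitted
// and the witness signature stays unchanged.
struct Witnesses {
    let copy: () -> Void
}

func emitWitnesses(assumeSingleThreaded: Bool, log: RefCountLog) -> Witnesses {
    let atomicity: Atomicity = assumeSingleThreaded ? .nonAtomic : .atomic
    return Witnesses(copy: { log.record(atomicity) })
}

let refLog = RefCountLog()

let perCall = WitnessesWithAtomicityParameter(copy: { refLog.record($0) })
perCall.copy(.nonAtomic)      // the caller must know and pass the atomicity

let singleThreaded = emitWitnesses(assumeSingleThreaded: true, log: refLog)
singleThreaded.copy()         // non-atomic, decided at emission time

let multiThreaded = emitWitnesses(assumeSingleThreaded: false, log: refLog)
multiThreaded.copy()          // atomic
```

Deciding at emission time keeps the witness signatures unchanged, which fits a whole-program single-threaded assumption; the extra-parameter design only pays off if atomicity can vary within a single program.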

> 
> Since they are not trivial changes, I would like your endorsement before starting serious work.

Send a PR and let's talk about it. [i.e., what Roman said ;)]

> 
> -- Mikio
> 
> 
> 2016-12-18 5:49 GMT+09:00 Michael Gottesman via swift-dev <swift-dev at swift.org>:
> 
>> On Dec 17, 2016, at 11:13 AM, Brian Gesiak via swift-dev <swift-dev at swift.org> wrote:
>> 
>> Hello all!
>> 
>> I really enjoyed Chris Lattner's slides from his talk at IBM <http://researcher.watson.ibm.com/researcher/files/us-lmandel/lattner.pdf>.
>> 
>> The speaker notes mention ARC:
>> 
>> "There are two principal downsides to ARC that people cite: one is the need for atomic increment/decrements, which can be slow." [...] "The performance problems it can cause are real in some important cases"
>> 
>> Can someone point me to a good resource that explains these problems? I guess atomic reference count changes create overhead in multithreaded applications? Are there more detailed explorations into this topic?
> 
> With a proper concurrency model, I believe you can make most reference counting operations local. I have explored this area in the past using what I call thread-local vs. global reference counts, with marked concurrency boundaries mediating the transitions between them (moving from thread-local to atomic, of course, if an object escapes in an undefined way).
> 
> If you are interested in the perf difference with ARC atomics, Roman recently added a mode to the compiler called -assume-single-threaded that uses non-atomic reference counts everywhere.
> 
> There are some interesting optimizations in this area as well. Specifically, even today COW gives a nice guarantee of thread-localness, allowing you to eliminate atomic reference counts once you have a uniqued COW data structure.
> 
> Michael
> 
>> 
>> Thanks!
>> 
>> - Brian Gesiak
>> 
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-dev
> 
> 
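
The thread-local vs. global reference count scheme described in the December message above can be illustrated with a toy Swift model. Everything here is hypothetical: the class and method names are made up, and an NSLock stands in for the atomic read-modify-write operations a real runtime would use once an object is shared.

```swift
import Dispatch
import Foundation

// Hypothetical toy model of "thread-local vs. global" reference counts.
// While the object is known to be thread-local, the count is updated with
// plain arithmetic. After it crosses a marked concurrency boundary, updates
// are synchronized; an NSLock stands in for atomic read-modify-write here.
final class HybridRefCount {
    private(set) var isShared = false
    private var count = 1
    private let lock = NSLock()

    func strongRetain() {
        if isShared {
            lock.lock()
            count += 1
            lock.unlock()
        } else {
            count += 1  // thread-local fast path: no synchronization
        }
    }

    // Returns true when the last reference has been released.
    @discardableResult
    func strongRelease() -> Bool {
        if isShared {
            lock.lock()
            defer { lock.unlock() }
            count -= 1
            return count == 0
        }
        count -= 1
        return count == 0
    }

    // Call at a marked concurrency boundary (or when the object escapes in an
    // unknown way), before any other thread can reach the object. The
    // transition is one-way: local -> shared.
    func markShared() {
        isShared = true
    }

    var currentCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return count
    }
}

let rc = HybridRefCount()
rc.strongRetain()   // local fast path: count becomes 2
rc.markShared()     // about to hand the object to other threads
DispatchQueue.concurrentPerform(iterations: 100) { _ in
    rc.strongRetain()  // synchronized path
}
print(rc.currentCount)
```

The key design point is that the switch to the synchronized path happens before the object is published to other threads, so the fast path is only ever taken while a single thread can reach the object.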


