[swift-dev] Performance issues in automatic reference counting (ARC)?
Roman Levenstein
rlevenstein at apple.com
Tue Jan 31 11:02:20 CST 2017
Hi Mikio,
> On Jan 31, 2017, at 12:07 AM, Mikio Takeuchi via swift-dev <swift-dev at swift.org> wrote:
>
> Hi Michael,
>
> > If you are interested in the perf difference with ARC atomics, Roman recently added a mode to the compiler called -assume-single-threaded that uses non-atomic reference counts everywhere.
>
> I think that is not exactly true. As of now, the -assume-single-threaded option can eliminate atomic reference counts only for reference types.
>
> I tried -assume-single-threaded when compiling applications as well as the Swift runtime, and found that atomic reference counts were still used for value types, which limited the improvements.
That’s correct. -assume-single-threaded does not cover all cases. I think it is even mentioned somewhere in the comments. It was meant as a quick hack to allow some experiments in this direction.
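For readers new to the topic, here is a minimal C++ sketch of the difference being discussed. It is not the Swift runtime's code: the RefCounted struct and the four function names are made up for illustration. The atomic path uses a read-modify-write instruction, while the non-atomic path uses a separate load and store, which is only correct if a single thread ever touches the object.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical object header; NOT the real Swift object layout.
struct RefCounted {
    std::atomic<std::uint32_t> strong{1};  // an object starts with one owner
};

// Atomic retain: a single read-modify-write, safe across threads.
inline void retainAtomic(RefCounted &obj) {
    obj.strong.fetch_add(1, std::memory_order_relaxed);
}

// Non-atomic retain: separate load and store. Concurrent callers could
// lose an update, so this is only valid for single-threaded use.
inline void retainNonAtomic(RefCounted &obj) {
    std::uint32_t old = obj.strong.load(std::memory_order_relaxed);
    obj.strong.store(old + 1, std::memory_order_relaxed);
}

// Atomic release: returns true when the last reference was dropped.
inline bool releaseAtomic(RefCounted &obj) {
    std::uint32_t old = obj.strong.fetch_sub(1, std::memory_order_acq_rel);
    assert(old > 0 && "over-release");
    return old == 1;
}

// Non-atomic release: same contract, single-threaded use only.
inline bool releaseNonAtomic(RefCounted &obj) {
    std::uint32_t old = obj.strong.load(std::memory_order_relaxed);
    assert(old > 0 && "over-release");
    obj.strong.store(old - 1, std::memory_order_relaxed);
    return old == 1;
}
```

Both variants leave the count in the same state when only one thread is involved; the point of the experiment is how much the atomic instructions cost in that case.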
>
> SIL instructions on value types (such as CopyValue) are not subtypes of RefCountingInst, so they have no mechanism to represent atomicity. COW requires reference counts and, because this information is missing, atomic reference counts are assumed in many places in codegen and the runtime.
>
> I made a prototype which returns the right atomicity based on the compiler option in order to eliminate atomic reference counts from generated code. I also modified value witness functions to eliminate atomic reference counts from them. With these changes, atomic reference counts almost disappeared.
Cool! Thanks for working on this!
Could you report some preliminary performance numbers for these changes? It would be interesting to see a comparison of the vanilla Swift compiler, the existing -assume-single-threaded option, and your improvements.
>
> If it makes sense, I am happy to contribute my changes to the community.
>
> I understand there are two problems with my prototype.
> 1) We may need to introduce a mechanism to represent atomicity for value types as well. It will open an opportunity for the compiler to use non-atomic reference counts for thread-local values.
> 2) We need to either extend the value witness table with non-atomic versions of the functions, or pass atomicity information to the functions as an extra parameter.
>
> Since they are not trivial changes, I would like your endorsement before starting serious work.
It would be interesting to see your code in any case. Could you share a link to the branch with your changes?
Regarding the question about future work on the prototype, let’s first look at the code and the implementation approach you took. Then we can discuss its design, etc.
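To make the two options from point 2 of the quoted message concrete, here is a rough C++ sketch. It is only an illustration under my own assumptions: Buffer, WitnessTableA, WitnessTableB and the copy functions are hypothetical names, and the real value witness table has a different shape. Option A adds a second table entry for the non-atomic variant; option B keeps one entry and passes the atomicity as an extra argument.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

enum class Atomicity { Atomic, NonAtomic };

// Hypothetical heap buffer owned by a value type (e.g. COW storage).
struct Buffer {
    std::atomic<std::uint32_t> strong{1};
};

// Retain with the requested atomicity.
inline void retain(Buffer *b, Atomicity a) {
    assert(b != nullptr);
    if (a == Atomicity::Atomic) {
        b->strong.fetch_add(1, std::memory_order_relaxed);
    } else {
        // Separate load and store: valid only for single-threaded use.
        b->strong.store(b->strong.load(std::memory_order_relaxed) + 1,
                        std::memory_order_relaxed);
    }
}

// Option A: the table grows by one entry per witness function.
struct WitnessTableA {
    Buffer *(*copy)(Buffer *);           // atomic variant
    Buffer *(*copyNonAtomic)(Buffer *);  // added non-atomic variant
};

// Option B: the table keeps its size; each function takes a parameter.
struct WitnessTableB {
    Buffer *(*copy)(Buffer *, Atomicity);
};

// "Copying" the value shares the buffer and bumps its reference count.
inline Buffer *copyAtomic(Buffer *b) { retain(b, Atomicity::Atomic); return b; }
inline Buffer *copyNonAtomic(Buffer *b) { retain(b, Atomicity::NonAtomic); return b; }
inline Buffer *copyWith(Buffer *b, Atomicity a) { retain(b, a); return b; }

const WitnessTableA tableA{copyAtomic, copyNonAtomic};
const WitnessTableB tableB{copyWith};
```

Option A avoids a branch inside each witness function at the cost of a larger table; option B keeps the table layout but changes every function's signature. Which trade-off fits the compiler and runtime better is exactly what the design discussion would need to settle.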
-Roman
>
> -- Mikio
>
>
> 2016-12-18 5:49 GMT+09:00 Michael Gottesman via swift-dev <swift-dev at swift.org>:
>
>> On Dec 17, 2016, at 11:13 AM, Brian Gesiak via swift-dev <swift-dev at swift.org> wrote:
>>
>> Hello all!
>>
>> I really enjoyed Chris Lattner's slides from his talk at IBM <http://researcher.watson.ibm.com/researcher/files/us-lmandel/lattner.pdf>.
>>
>> The speaker notes mention ARC:
>>
>> "There are two principal downsides to ARC that people cite: one is the need for atomic increment/decrements, which can be slow." [...] "The performance problems it can cause are real in some important cases"
>>
>> Can someone point me to a good resource that explains these problems? I guess atomic reference count changes create overhead in multithreaded applications? Are there more detailed explorations into this topic?
>
> With a proper concurrency model, I believe you can make most reference counting operations local (my opinion). I have done some explorations in this area in the past using what I call thread local vs global reference counts and using marked concurrency boundaries to mediate transitions in between them (moving from thread local -> atomic of course if one escapes in an undefined way).
>
> If you are interested in the perf difference with ARC atomics, Roman recently added a mode to the compiler called -assume-single-threaded that uses non-atomic reference counts everywhere.
>
> There are some interesting optimizations in this area as well. Specifically, even today, COW gives a nice guarantee of thread-localness, allowing you to eliminate atomic reference counts once you have a uniqued COW data structure.
>
> Michael
>
>>
>> Thanks!
>>
>> - Brian Gesiak
>>
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-dev
>