<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Feb 16, 2017, at 6:48 PM, Jiho Choi via swift-dev &lt;<a href="mailto:swift-dev@swift.org" class="">swift-dev@swift.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi,<div class=""><br class=""></div><div class="">I was curious about the overhead of ARC and started profiling some benchmarks found in the Computer Language Benchmark Game (<a href="http://benchmarksgame.alioth.debian.org/u64q/measurements.php?lang=swift" class="">http://benchmarksgame.alioth.debian.org/u64q/measurements.php?lang=swift</a>). So far, it seems that ARC sequence optimization is surprisingly good and most benchmarks don't have to perform ARC operations as often as I expected. I have some questions regarding this finding.</div><div class=""><br class=""></div><div class="">I compiled all benchmarks with "-O -wmo" flags and counted the number of calls to ARC runtime (e.g., swift_rt_swift_retain) using Pin.</div><div class=""><br class=""></div><div class="">1. Reference counting is considered to have high overhead due to frequent counting operations which also have to be atomic. At least for the benchmarks I tested, it is not the case and there is almost no overhead. Is it expected behavior? Or is it because the benchmarks are too simple (they are all single-file programs)? How do you estimate the overhead of ARC would be?</div></div></div></blockquote><div><br class=""></div>It is possible that the optimizer eliminated many reference counting operations here. Also my understanding is that while atomic operations are more expensive than non-atomic operations, the real cost only comes into play if you actually have contention due to bouncing cache lines. 
In a single-threaded workload the overhead is not that great.</div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">2. I also tried to compile the same benchmarks with "-Xfrontend -assume-single-threaded" to measure the overhead of atomic operations. Looking at the source code of this experimental pass and SIL optimizer's statistic, the pass seems to work as expected to convert all ARC operations in user code into nonatomic. However, even when using this flag, there are some atomic ARC runtime called from the user code (not library). More strangely, SIL output said all ARC operations in the user code have turned into nonatomic. The documentation says ARC operations are never implicit in SIL. So if there is no atomic ARC at SIL-level, I expect the user code would never call atomic ARC runtime. Am I missing something?</div></div></div></blockquote><div><br class=""></div>IRGen still emits atomic reference counting operations when it produces value witness operations. I think there’s a PR open right now to address this: <a href="https://github.com/apple/swift/pull/7421" class="">https://github.com/apple/swift/pull/7421</a></div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">3. Are there more realistic benchmarks available? Swift's official benchmarks also seem pretty small.</div></div></div></blockquote><div><br class=""></div>Contributions are welcome :-)</div><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Jiho</div></div>
</div></blockquote></div><br class=""></body></html>