On 3. Aug 2017, at 20:52, Taylor Swift via swift-evolution <swift-evolution@swift.org> wrote:

In an effort to get this thread back on track, I tried implementing cos(_:) in pure generic Swift code, with the BinaryFloatingPoint protocol. It deviates from the _cos(_:) intrinsic by no more than 5.26362703423544e-11. Adding more terms to the approximation only incurs a small performance penalty, for some reason.

To make the benchmarks fair, and to explore the idea of distributing a Math module without killing people on the cross-module optimization boundary, I enabled some of the unsafe compiler attributes. All of these benchmarks are cross-module calls, as if the math module had been downloaded as a dependency through the SPM.

== Relative execution time (lower is better) ==

llvm intrinsic                   : 3.133
glibc cos()                      : 3.124

no attributes                    : 43.675
with specialization              : 4.162
with inlining                    : 3.108
with inlining and specialization : 3.264

As you can see, the pure Swift generic implementation actually beats the compiler intrinsic (and the glibc cos(), but I guess they're the same thing) when inlining is used, but for some reason generic specialization and inlining don't get along very well.
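A minimal cross-module harness along these lines might look like the sketch below; the GenericMath module name and the timing loop are illustrative placeholders, not necessarily the exact setup behind the numbers above.

import Dispatch
import GenericMath // hypothetical SPM dependency exporting the generic cos<F>(_:)

// Time many cross-module calls to the generic cos, folding the results into a
// checksum so the optimizer cannot simply discard the loop.
let iterations = 10_000_000
var checksum   = 0.0

let start = DispatchTime.now().uptimeNanoseconds
for i in 0 ..< iterations
{
    checksum += GenericMath.cos(Double(i) * 0.0001)
}
let elapsed = Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9

print("elapsed: \(elapsed) s (checksum: \(checksum))")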
Here's the source implementation. It uses a Taylor series (!), which probably isn't optimal, but it does prove that cos() and sin() can be implemented as generics in pure Swift, be distributed as a module outside the stdlib, and still achieve competitive performance with the llvm intrinsics.

@_inlineable
//@_specialize(where F == Float)
//@_specialize(where F == Double)
public
func cos<F>(_ x:F) -> F where F:BinaryFloatingPoint
{
    // reduce the argument to [0, π] with the IEEE remainder, then fold by
    // quadrant so the polynomial is only ever evaluated on [0, π/2]
    let x:F          = abs(x.remainder(dividingBy: 2 * F.pi)),
        quadrant:Int = Int(x * (2 / F.pi))

    switch quadrant
    {
    case 0:
        return  cos(on_first_quadrant: x)
    case 1:
        return -cos(on_first_quadrant: F.pi - x)
    case 2:
        return -cos(on_first_quadrant: x - F.pi)
    case 3:
        return  cos(on_first_quadrant: 2 * F.pi - x)
    default:
        fatalError("unreachable")
    }
}

@_versioned
@_inlineable
//@_specialize(where F == Float)
//@_specialize(where F == Double)
func cos<F>(on_first_quadrant x:F) -> F where F:BinaryFloatingPoint
{
    // Horner evaluation in x² of the truncated series; the coefficients are
    // essentially the Taylor coefficients (-1)^k / (2k)!, lightly tweaked
    let x2:F = x * x
    var y:F  = -0.0000000000114707451267755432394
    for c:F in [ 0.000000002087675698165412591559,
                -0.000000275573192239332256421489,
                 0.00002480158730158702330045157,
                -0.00138888888888888880310186415,
                 0.04166666666666666665319411988,
                -0.4999999999999999999991637437,
                 0.9999999999999999999999914771
               ]
    {
        y = x2 * y + c
    }
    return y
}
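As a quick spot check, the implementation can be compared by hand against inputs whose exact values are known, something like:

// with no libm import in scope, these calls resolve to the generic cos<F> above
print(cos(0.0))           // 1.0
print(cos(1.0))           // ≈ 0.5403023058681398, within the ~5e-11 bound quoted above
print(cos(Double.pi / 3)) // ≈ 0.5
print(cos(Double.pi))     // -1.0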
On Thu, Aug 3, 2017 at 7:04 AM, Stephen Canon via swift-evolution <swift-evolution@swift.org> wrote:

On Aug 2, 2017, at 7:03 PM, Karl Wagner via swift-evolution <swift-evolution@swift.org> wrote:

It's important to remember that computers are mathematical machines, and some functions which are implemented in hardware on essentially every platform (like sin/cos/etc.) are definitely best implemented as compiler intrinsics.

sin/cos/etc. are implemented in software, not hardware. x86 does have the FSIN/FCOS instructions, but (almost) no one actually uses them to implement the sin() and cos() functions; they are a legacy curiosity, both too slow and too inaccurate for serious use today. There are no analogous instructions on ARM or PPC.

– Steve
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Just a guess, but I'd expect inlining implies specialisation. It would be weird if the compiler inlined a chunk of unoptimised generic code into your function.

Pretty cool figures, though.

- Karl
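To illustrate that guess with a sketch (the square(_:) function is hypothetical, not code from the thread): when the client can see an @_inlineable generic body, inlining it at a concretely typed call site effectively yields the specialized code in place, with no separate @_specialize entry point needed, which would be consistent with the "with inlining" row already matching the intrinsic.

// In the math module: the body is exported so clients can inline it.
@_inlineable
public func square<F>(_ x:F) -> F where F:BinaryFloatingPoint
{
    return x * x
}

// In a client module compiled with -O: the call below can be inlined, and the
// F == Double version is then emitted directly at the call site.
let y: Double = square(3.0)
print(y) // 9.0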