[swift-dev] "available externally" vs build time

Michael Gottesman mgottesman at apple.com
Fri Dec 29 14:54:22 CST 2017



> On Dec 28, 2017, at 7:32 PM, Chris Lattner via swift-dev <swift-dev at swift.org> wrote:
> 
> Folks working on the SIL optimizer, particularly those interested in faster builds:
> 
> If I understand the SIL optimizer correctly, it seems that when the current program references an external symbol declared as @_inlinable, SILModule::linkFunction eagerly deserializes the @_inlinable body and splats it into the current module.  That SIL function exists in the current module, gets optimized, inlined, etc. along with existing functions, then gets dropped on the floor at IRGen time if it still exists.
> 
> If this is true, this seems like an incredibly wasteful approach, particularly given how many @_inlinable functions exist in the standard library, and particularly for programs that have lots of small files.

In the past, I had talked about making all linking in of functions lazy; the implementation support was still there. There was a large backlash from other performance people, since laziness adds complexity every place one attempts to analyze a function. I didn't push too hard on this (gotta save your political capital when possible), but I still think such an issue is an API problem, not an optimizer design problem. With the proper API design around looking up functions and function-related instructions, these problems would go away.
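To make the API idea concrete, here is a minimal sketch of what a lazy lookup interface could look like. All names here (Function, LazyFunctionTable, the deserializer callback) are hypothetical stand-ins, not the actual SILModule/SILFunction API: callers that only need a declaration never pay for the body, and a body is materialized at most once, on first request.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Illustrative stand-in for a SIL function; not the real SILFunction.
struct Function {
    std::string name;
    bool hasBody = false;
};

// Sketch of a lazy table: bodies are deserialized on demand, exactly once.
class LazyFunctionTable {
    std::map<std::string, std::unique_ptr<Function>> materialized;
    std::function<std::unique_ptr<Function>(const std::string &)> deserialize;

public:
    unsigned deserializations = 0; // instrumentation for the sketch

    explicit LazyFunctionTable(
        std::function<std::unique_ptr<Function>(const std::string &)> d)
        : deserialize(std::move(d)) {}

    // Requesting a body triggers deserialization only on the first call;
    // analyses that never ask for the body never trigger it at all.
    Function &getWithBody(const std::string &name) {
        auto it = materialized.find(name);
        if (it == materialized.end()) {
            ++deserializations;
            it = materialized.emplace(name, deserialize(name)).first;
        }
        return *it->second;
    }
};
```

The complexity the performance folks objected to lives behind this one entry point: every caller that touches a body goes through `getWithBody`, so "has this been deserialized yet?" never leaks into individual analyses.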

>  Try this:
> 
> $ cat u.swift 
> func f() {
>   print("hello")
> }
> 
> $ swiftc u.swift -emit-sil -o - | wc -l
>     7191
> 
> That is a *TON* of SIL, most having to do with array internals, string internals, and other stuff.  This eats memory and costs a ton of compile time to deserialize and slog this around, which gets promptly dropped on the floor by IRGen.  It also makes the -emit-sil output more difficult to work with...
> 
> 
> Optimized builds are also bad:
> $ swiftc u.swift -emit-sil -o - -O | wc -l
>      861
> 
> If you look at it, only about 70 lines of that is the actual program being compiled, the rest is dropped on the floor by IRGen. This costs a ton of memory and compile time to deserialize and represent this, then even more is wasted running the optimizer on code which was presumably optimized when the stdlib was built.
> 
> I imagine that this approach was inspired by LLVM’s available_externally linkage, which does things the same way.  This is a simple way to make sure that interprocedural optimizations can see the bodies of external functions to inline them, etc.   However, LLVM doesn’t have the benefit of a module system like Swift’s, so it has no choice.
> 
> 
> So here are the questions:  :-)
> 
> 1. It looks like the MandatoryInliner is the biggest culprit at -O0 here: it deserializes the referenced function (MandatoryInlining.cpp:384) and *then* checks to see if the callee is @_transparent.  Would it make sense to change this to check for @_transparent first (which might require a SIL change?), and only deserialize if so?

I think the reason this happened is that, IIRC, a transparent function must have a body and the verifier asserts on this. I imagine you could add an intermediate step. IMO this change is worth it not only for -Onone compile time, but also because SourceKit relies on mandatory inlining for the purposes of diagnostics, so it /could/ speed up the editor experience as well.
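One possible shape for that intermediate step, sketched with hypothetical types (the real serialized SIL record and MandatoryInlining code look quite different): keep the @_transparent bit readable without deserializing the body, and consult it before paying the deserialization cost.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical serialized function record: the transparent bit is stored
// in the header, so it can be read without decoding the body.
struct SerializedRecord {
    std::string name;
    bool isTransparent;
    std::string body; // stays serialized until actually requested
};

int deserializedCount = 0; // instrumentation for the sketch

// Proposed order: flag check first, deserialize only for transparent
// callees. (The current code deserializes first, then checks.)
std::optional<std::string> maybeInlineBody(const SerializedRecord &rec) {
    if (!rec.isTransparent)
        return std::nullopt; // cheap header check, no deserialization
    ++deserializedCount;     // only @_transparent callees pay this cost
    return rec.body;
}
```

Since most stdlib callees at -Onone are @_inlinable but not @_transparent, the common path becomes a flag check instead of a full body deserialization.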

> 
> 2. The performance inliner will have the same issue after this, and deserializing the bodies of all inlinable referenced functions is unavoidable for it.  However, we don’t have to copy the SIL into the current module and burn compile time by subjecting it to all of the standard optimizations again.  Would it make sense to put deserialized function bodies into a separate SIL module, and teach the (few) IPA/IPO optimizations about this fact?  This should be very straight-forward to do for all of the optimizations I’m aware of.

I haven't thought about this completely, but it could potentially cause problems. In general the SIL optimizer assumes that there is one SILModule, so I would be careful. Couldn't you just turn off optimizations on available_externally functions? Also, one could argue that there are cases where optimizing the available_externally functions by themselves saves compile time, since you optimize in one place instead of in multiple places after inlining. That being said, off the top of my head I can't think of any situation where optimizing an imported function in the client module would expose more optimization opportunities than in its original module, beyond cases where there are circular references (e.g. a function in the imported module refers to a function in my module, so I can devirtualize/optimize further in my module and do that once before inlining). But IIRC circular references are no bueno in Swift, so it is not clear to me whether that is a /real/ case. Jordan would know more about this. +CC Jordan.
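The "just turn off optimizations" option could look roughly like this in a function-pass loop, assuming an available-externally flag on functions (names are illustrative, not the actual SILOptimizer pass manager API): skip functions linked in from another module, on the assumption that they were already optimized when their defining module was built.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for a function in the module being compiled.
struct Fn {
    std::string name;
    bool availableExternally; // body copied in from another module
    int optRuns = 0;          // instrumentation for the sketch
};

// Sketch of a pass-manager loop that skips imported bodies: they remain
// visible to IPO/inlining, but are not re-optimized themselves.
void runFunctionPasses(std::vector<Fn> &module) {
    for (auto &f : module) {
        if (f.availableExternally)
            continue; // already optimized in its defining module
        ++f.optRuns;  // placeholder for running the real pass pipeline
    }
}
```

This keeps the single-SILModule assumption intact while recovering most of the compile time, at the cost of forgoing any client-side re-optimization of the imported bodies.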

> 
> I haven’t done any measurements, but this seems like it could be a big speedup, particularly for programs containing a bunch of relatively small files and not using WMO.
> 
> -Chris
> 
> _______________________________________________
> swift-dev mailing list
> swift-dev at swift.org
> https://lists.swift.org/mailman/listinfo/swift-dev
