[swift-dev] Reducing the size of Swift binaries by shortening symbols

Mon Dec 21 00:07:19 CST 2015

Hi Steve, 

> On Dec 20, 2015, at 7:35 AM, Stephen Canon <scanon at apple.com> wrote:
> 
> Nadav, can you clarify what we’re really trying to accomplish here?  "Smaller binaries” isn’t too important of a goal in and of itself.
> 
> Are we trying to:
> – reduce storage used on disk
> – reduce load time
> – reduce loaded memory footprint
> – make emitting swift binaries more efficient
> – something else?
> 
> Yes, I know, “all of the above”, but understanding something about what’s most important would help evaluate the proposal.

> 
> It’s also worth keeping in mind that iOS and OS X have been aggressively adopting pervasive system-wide compression both on disk and in memory.  This trend will continue, and it makes it quite a bit less important for individual components to explicitly adopt compression techniques themselves, except in cases where there’s a lot of special structure that those components can leverage to get better compression than a general-purpose lossless compressor can manage (images and sound are the two obvious examples of this, but also cases like huge arrays of floating-point data where the low-order bits don’t matter, etc).  Linux hasn’t been as aggressive about doing this yet, but pervasive system-level compression is The Future.

Swift is a systems programming language. We’d like to be able to build the whole operating system in Swift. This means that one day your phone will have hundreds of shared libraries (written in Swift) loaded at the same time. Thousands of shared libraries will be stored on disk and updated every time you upgrade the OS or some apps. The string table (linkedit section) is loaded into memory (shared, copy-on-write). In a world where every single process uses multiple Swift libraries, reducing the size of this section is very beneficial.

Disk and network compression can help, but I believe that we have domain-specific information that will allow us to do a better job of compressing this section.

Thanks,
-Nadav

> 
> – Steve
> 
>> On Dec 20, 2015, at 5:17 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>> 
>> + Stephen Canon, because he probably has good ideas in this domain.
>> 
>> On Fri, Dec 18, 2015 at 3:42 PM, Nadav Rotem via swift-dev <swift-dev at swift.org> wrote:
>> 
>>> What’s next?
>>> 
>>> The small experiment I described above showed that compressing the names in the string table has a huge potential for reducing the size of Swift binaries. I’d like for us (Swift developers) to talk about the implications of this change and start working on the two tasks of tightening our existing mangling format and implementing a new compression layer on top.
>> 
>> Hi Nadav,
>> 
>> This is a great start that shows that there is a potential for improvement in our mangled names!
>> 
>> To make this effort more visible, I would suggest creating a bug on https://bugs.swift.org/.
>> 
>> I think we should survey existing solutions that the industry has developed for compressing short messages. What comes to mind:
>> 
>> - header compression in HTTP2:
>> https://http2.github.io/http2-spec/compression.html
>> 
>> - PPM algorithms are among the best-performing compression algorithms for text.
>> 
>> - Arithmetic coding is also a natural starting point for experimentation.
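As a cheap first experiment before writing a full arithmetic coder, one could measure the ideal code length of a corpus under a simple order-0 (per-character frequency) model; an arithmetic coder driven by that same static model gets within about two bits of this total for the whole corpus. A minimal sketch, assuming the corpus is a list of mangled-name strings; `order0_code_length` and `bits_per_char` are hypothetical helpers, and the cost of storing the model itself is not counted.

```python
import math
from collections import Counter
from typing import List


def order0_code_length(corpus: List[str]) -> float:
    """Ideal total bits for the corpus under an order-0 character model.

    The model is fitted to the corpus itself, so this is optimistic: the
    cost of storing the frequency table is ignored and no context is used.
    """
    counts = Counter()
    for name in corpus:
        counts.update(name)  # count every character occurrence
    total = sum(counts.values())
    # Sum of -log2 p(c) over every character occurrence.
    return -sum(n * math.log2(n / total) for n in counts.values())


def bits_per_char(corpus: List[str]) -> float:
    """Average ideal bits per character (0.0 for an empty corpus)."""
    chars = sum(len(name) for name in corpus)
    return order0_code_length(corpus) / chars if chars else 0.0
```

Comparing this figure with the 8 bits per character that a plain string table spends gives a rough idea of what a context-free coder could save; context models such as PPM should be able to do better, since mangled names have strong local structure.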
>> 
>> Since mangled names are drawn from a restricted character set, we could also strip the unused bits first and then try an existing compression algorithm on the resulting binary string.
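A minimal sketch of the bit-stripping step, assuming (hypothetically) that mangled names only use the 64 characters `[A-Za-z0-9_$]`, so each character fits in 6 bits instead of 8. `pack`, `unpack` and `compress_name` are illustrative helpers, and zlib merely stands in for "an existing compression algorithm". The name length has to be stored separately, because up to 6 bits of zero padding in the last byte would otherwise decode as an extra character.

```python
import string
import zlib

# Hypothetical 64-symbol alphabet for mangled names: [A-Za-z0-9_$].
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_$"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
BITS = 6  # 2**6 == 64 == len(ALPHABET)


def pack(name: str) -> bytes:
    """Pack each character into 6 bits; the final byte is zero-padded."""
    acc, nbits, out = 0, 0, bytearray()
    for ch in name:
        acc = (acc << BITS) | INDEX[ch]  # KeyError if ch is outside the alphabet
        nbits += BITS
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); `length` tells us where the padding starts."""
    acc, nbits, chars = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= BITS and len(chars) < length:
            nbits -= BITS
            chars.append(ALPHABET[(acc >> nbits) & (2**BITS - 1)])
        acc &= (1 << nbits) - 1
    return "".join(chars)


def compress_name(name: str) -> bytes:
    # zlib is only a stand-in for "an existing compression algorithm".
    return zlib.compress(pack(name), 9)
```

For a single short name, zlib's own header and Adler-32 checksum (6 bytes) can easily outweigh the savings, so in practice the compressor would run over many names at once, or its output would be compared against the uncompressed form.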
>> 
>> We should also build a scheme that uses the shorter of the compressed and uncompressed names.
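A minimal sketch of the "use the shorter one" rule, assuming a hypothetical marker prefix `_Tz` that the ordinary mangling never produces, so a reader can tell the two forms apart. The compressor is passed in as a plain string-to-string function; `choose_shorter` and `restore` are illustrative names, not existing compiler APIs.

```python
from typing import Callable

# Hypothetical marker for compressed symbols. This assumes the ordinary
# mangling never emits a name that starts with this prefix.
COMPRESSED_PREFIX = "_Tz"


def choose_shorter(name: str, compress: Callable[[str], str]) -> str:
    """Return the compressed form only if it is strictly shorter, marker included."""
    candidate = COMPRESSED_PREFIX + compress(name)
    return candidate if len(candidate) < len(name) else name


def restore(symbol: str, decompress: Callable[[str], str]) -> str:
    """Undo choose_shorter(): only marked symbols are decompressed."""
    if symbol.startswith(COMPRESSED_PREFIX):
        return decompress(symbol[len(COMPRESSED_PREFIX):])
    return symbol
```

Counting the marker in the comparison is the important detail: a name is only rewritten when compression pays for its own tag.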
>> 
>> For running experiments, it would be useful to publish a sample corpus of mangled names that we can use to compare algorithms and approaches.
>> 
>> I also have a concern about making mangled names completely unreadable.  Today, I can frequently at least get a gist of what the referenced entity is without a demangler.  What we could do is make the name consist of a human-readable prefix that encodes just the base name and a compressed suffix that encodes the rest of the information.
>> 
>> _T<length><class name><length><method name><compressed suffix>
>> 
>> We would be able to use references to the class and the method name from the compressed part, so that character data isn't completely wasted.
>> 
>> A scheme that injects human-readable parts would also allow the debugger to match names quickly without decompressing them.
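A minimal sketch of building and reading the readable prefix of the `_T<length><class name><length><method name><compressed suffix>` layout. It assumes decimal lengths as in the current mangling and that identifiers never start with a digit; `make_symbol`, `readable_prefix` and `matches` are hypothetical helpers. The suffix is treated as an opaque string, so back-references from the compressed part into the prefix are not modeled.

```python
import re
from typing import Tuple

_DIGITS = re.compile(r"[0-9]+")


def make_symbol(class_name: str, method_name: str, compressed_suffix: str) -> str:
    """Build _T<length><class name><length><method name><compressed suffix>."""
    return (f"_T{len(class_name)}{class_name}"
            f"{len(method_name)}{method_name}{compressed_suffix}")


def read_identifier(symbol: str, pos: int) -> Tuple[str, int]:
    """Read one <length><identifier> pair starting at `pos`."""
    m = _DIGITS.match(symbol, pos)
    if m is None:
        raise ValueError(f"expected a length at offset {pos}")
    start = m.end()
    end = start + int(m.group())
    if end > len(symbol):
        raise ValueError("identifier runs past the end of the symbol")
    return symbol[start:end], end


def readable_prefix(symbol: str) -> Tuple[str, str, str]:
    """Split a symbol into (class name, method name, compressed suffix)."""
    if not symbol.startswith("_T"):
        raise ValueError("not a _T-prefixed symbol")
    class_name, pos = read_identifier(symbol, 2)
    method_name, pos = read_identifier(symbol, pos)
    return class_name, method_name, symbol[pos:]


def matches(symbol: str, class_name: str, method_name: str) -> bool:
    """Debugger-style lookup that never touches the compressed suffix."""
    try:
        return readable_prefix(symbol)[:2] == (class_name, method_name)
    except ValueError:
        return False
```

A debugger looking for `Foo.bar` would only call `matches`, paying for two short length parses per candidate symbol instead of a full decompression.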
>> 
>> We should also investigate improving the existing mangling scheme to produce shorter results. For example, one idea that comes to mind is encoding identifier lengths below 60 as a single base-60 digit instead of base-10 digits, falling back to base-10 for longer lengths to avoid ambiguity. This would save one character for every identifier longer than 9 characters and shorter than 60, which is actually the common case.
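A minimal sketch of the base-60 length idea, assuming a hypothetical digit alphabet `0-9a-zA-X` (10 + 26 + 24 = 60 symbols) and assuming identifiers never start with a decimal digit, which is what keeps the decimal fallback unambiguous: a leading digit means "read the whole digit run as decimal", anything else is a single base-60 digit. The helper names are illustrative, and a real scheme would also need to check that these letters cannot be confused with other mangling operators at the same position.

```python
import string
from typing import Tuple

# Hypothetical base-60 digit alphabet: 0-9, a-z, A-X.
DIGITS60 = string.digits + string.ascii_lowercase + string.ascii_uppercase[:24]


def encode_length(n: int) -> str:
    """One base-60 character for 0..59, plain decimal for 60 and above."""
    if n < 0:
        raise ValueError("length must be non-negative")
    return DIGITS60[n] if n < 60 else str(n)


def decode_length(s: str, pos: int) -> Tuple[int, int]:
    """Return (length, position just after the length field)."""
    if s[pos] in string.digits:
        # A leading decimal digit is either a length below 10 or a
        # multi-digit decimal length >= 60; read the whole digit run.
        # This relies on identifiers never starting with a digit.
        end = pos
        while end < len(s) and s[end] in string.digits:
            end += 1
        return int(s[pos:end]), end
    value = DIGITS60.find(s[pos])
    if value < 0:
        raise ValueError(f"invalid length character {s[pos]!r}")
    return value, pos + 1


def mangle_identifier(ident: str) -> str:
    return encode_length(len(ident)) + ident


def demangle_identifier(s: str, pos: int = 0) -> Tuple[str, int]:
    length, start = decode_length(s, pos)
    return s[start:start + length], start + length
```

For example, a 20-character identifier is written as `k` followed by the identifier (21 characters) instead of `20` followed by it (22 characters).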
>> 
>> Dmitri
>> 
>> -- 
>> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
>> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
> 
