<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Nadav, can you clarify what we’re really trying to accomplish here? "Smaller binaries” isn’t too important of a goal in and of itself.</div><div class=""><br class=""></div><div class="">Are we trying to:</div><div class="">– reduce storage used on disk</div><div class="">– reduce load time</div><div class="">– reduce loaded memory footprint</div><div class="">– make emitting swift binaries more efficient</div><div class="">– something else?</div><div class=""><br class=""></div><div class="">Yes, I know, “all of the above”, but understanding something about what’s most important would help evaluate the proposal.</div><div class=""><br class=""></div><div class="">It’s also worth keeping in mind that iOS and OS X have been aggressively adopting pervasive system-wide compression both on disk and in memory. This trend will continue, and it makes it quite a bit less important for individual components to explicitly adopt compression techniques themselves, except in cases where there’s a lot of special structure that those components can leverage to get better compression than a general-purpose lossless compressor can manage (images and sound are the two obvious examples of this, but also cases like huge arrays of floating-point data where the low-order bits don’t matter, etc). Linux hasn’t been as aggressive about doing this yet, but pervasive system-level compression is The Future.</div><div class=""><br class=""></div><div class="">– Steve</div><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 20, 2015, at 5:17 AM, Dmitri Gribenko <<a href="mailto:gribozavr@gmail.com" class="">gribozavr@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote">+ Stephen Canon, because he probably has good ideas in this domain.</div><div class="gmail_quote"><br class=""></div><div class="gmail_quote">On Fri, Dec 18, 2015 at 3:42 PM, Nadav Rotem via swift-dev <span dir="ltr" class=""><<a href="mailto:swift-dev@swift.org" target="_blank" class="">swift-dev@swift.org</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><b class=""><u class=""><br class=""></u></b></div><div class=""><b class=""><u class="">What’s next?</u></b></div><div class=""><br class=""></div><div class="">The small experiment I described above showed that compressing the names in the string table has a huge potential for reducing the size of swift binaries. I’d like for us (swift-developers) to talk about the implications of this change and start working on the two tasks of tightening our existing mangling format and on implementing a new compression layer on top. </div></div></blockquote><div class=""><br class=""></div><div class="">Hi Nadav,</div><div class=""><br class=""></div><div class="">This is a great start that shows that there is a potential for improvement in our mangled names!</div><div class=""><br class=""></div><div class="">To make this effort more visible, I would suggest creating a bug on <a href="https://bugs.swift.org/" class="">https://bugs.swift.org/</a> .</div><div class=""><br class=""></div><div class="">I think we survey existing solutions that industry has developed for compressing short messages. What comes to mind:</div><div class=""><br class=""></div><div class="">- header compression in HTTP2:</div><div class=""><a href="https://http2.github.io/http2-spec/compression.html" target="_blank" class="">https://http2.github.io/http2-spec/compression.html</a><br class=""></div></div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">- PPM algorithms are one of the best-performing compression algorithms for text.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">- Arithmetic coding is also a natural starting point for experimentation.</div><div class="gmail_extra"><br class=""></div>Since the input mangled name also comes in a restricted character set, we could also remove useless bits first, and try an existing compression algorithm on the resulting binary string.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">We should also build a scheme that uses shortest one between the compressed and non-compressed names.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">For running experiments it would be useful to publish a sample corpus of mangled names that we will be using for comparing the algorithms and approaches.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">I also have a concern about making mangled names completely unreadable. Today, I can frequently at least get a gist of what the referenced entity is without a demangler. What we could do is make the name consist of a human-readable prefix that encodes just the base name and a compressed suffix that encodes the rest of the information.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">_T<length><class name><length><method name><compressed suffix></div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">We would be able to use references to the class and the method name from the compressed part, so that character data isn't completely wasted.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">This scheme that injects human-readable parts will also allow the debugger to quickly match the names without the need to decompress them.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">We should also investigate improving existing mangling scheme to produce shorter results. For example, one idea that comes to mind is using base-60 instead of base-10 for single-digit numbers that that specify identifier length, falling back to base-10 for longer numbers to avoid ambiguity. This would save one character for every identifier longer than 9 characters and shorter than 60, which is actually the common case.<br clear="all" class=""><div class=""><br class=""></div><div class="">Dmitri</div><div class=""><br class=""></div>-- <br class=""><div class="">main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if<br class="">(j){printf("%d\n",i);}}} /*Dmitri Gribenko <<a href="mailto:gribozavr@gmail.com" target="_blank" class="">gribozavr@gmail.com</a>>*/</div>
</div></div>
</div></blockquote></div><br class=""></body></html>