<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hi Steve, </div><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 20, 2015, at 7:35 AM, Stephen Canon <<a href="mailto:scanon@apple.com" class="">scanon@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Nadav, can you clarify what we’re really trying to accomplish here? "Smaller binaries” isn’t too important of a goal in and of itself.</div><div class=""><br class=""></div><div class="">Are we trying to:</div><div class="">– reduce storage used on disk</div><div class="">– reduce load time</div><div class="">– reduce loaded memory footprint</div><div class="">– make emitting swift binaries more efficient</div><div class="">– something else?</div><div class=""><br class=""></div><div class="">Yes, I know, “all of the above”, but understanding something about what’s most important would help evaluate the proposal.</div></div></div></blockquote><div><br class=""></div><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""></div><div class="">It’s also worth keeping in mind that iOS and OS X have been aggressively adopting pervasive system-wide compression both on disk and in memory. This trend will continue, and it makes it quite a bit less important for individual components to explicitly adopt compression techniques themselves, except in cases where there’s a lot of special structure that those components can leverage to get better compression than a general-purpose lossless compressor can manage (images and sound are the two obvious examples of this, but also cases like huge arrays of floating-point data where the low-order bits don’t matter, etc). Linux hasn’t been as aggressive about doing this yet, but pervasive system-level compression is The Future.</div></div></div></blockquote><div><br class=""></div><div>Swift is a systems programming language. We’d like to be able to build the whole operating system in Swift. This mans that one day your phone will have hundreds of shared libraries (written in swift) loaded all at the same time. Thousands of shared libraries will be saved on disk, and updated every time you upgrade the OS or some apps. The string table (linkedit section) is loaded into memory (shared cow). In a world where every single process uses multiple swift libraries reducing the size of this section is very beneficial. </div><div><br class=""></div><div>Disk and network compressions can help. I believe that we have domain specific information that will allow us do a better job in compressing this section. </div><div><br class=""></div><div>Thanks,</div><div>-Nadav</div><div><br class=""></div><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""></div><div class="">– Steve</div><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Dec 20, 2015, at 5:17 AM, Dmitri Gribenko <<a href="mailto:gribozavr@gmail.com" class="">gribozavr@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_extra"><div class="gmail_quote">+ Stephen Canon, because he probably has good ideas in this domain.</div><div class="gmail_quote"><br class=""></div><div class="gmail_quote">On Fri, Dec 18, 2015 at 3:42 PM, Nadav Rotem via swift-dev <span dir="ltr" class=""><<a href="mailto:swift-dev@swift.org" target="_blank" class="">swift-dev@swift.org</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><b class=""><u class=""><br class=""></u></b></div><div class=""><b class=""><u class="">What’s next?</u></b></div><div class=""><br class=""></div><div class="">The small experiment I described above showed that compressing the names in the string table has a huge potential for reducing the size of swift binaries. I’d like for us (swift-developers) to talk about the implications of this change and start working on the two tasks of tightening our existing mangling format and on implementing a new compression layer on top. </div></div></blockquote><div class=""><br class=""></div><div class="">Hi Nadav,</div><div class=""><br class=""></div><div class="">This is a great start that shows that there is a potential for improvement in our mangled names!</div><div class=""><br class=""></div><div class="">To make this effort more visible, I would suggest creating a bug on <a href="https://bugs.swift.org/" class="">https://bugs.swift.org/</a> .</div><div class=""><br class=""></div><div class="">I think we survey existing solutions that industry has developed for compressing short messages. What comes to mind:</div><div class=""><br class=""></div><div class="">- header compression in HTTP2:</div><div class=""><a href="https://http2.github.io/http2-spec/compression.html" target="_blank" class="">https://http2.github.io/http2-spec/compression.html</a><br class=""></div></div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">- PPM algorithms are one of the best-performing compression algorithms for text.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">- Arithmetic coding is also a natural starting point for experimentation.</div><div class="gmail_extra"><br class=""></div>Since the input mangled name also comes in a restricted character set, we could also remove useless bits first, and try an existing compression algorithm on the resulting binary string.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">We should also build a scheme that uses shortest one between the compressed and non-compressed names.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">For running experiments it would be useful to publish a sample corpus of mangled names that we will be using for comparing the algorithms and approaches.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">I also have a concern about making mangled names completely unreadable. Today, I can frequently at least get a gist of what the referenced entity is without a demangler. What we could do is make the name consist of a human-readable prefix that encodes just the base name and a compressed suffix that encodes the rest of the information.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">_T<length><class name><length><method name><compressed suffix></div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">We would be able to use references to the class and the method name from the compressed part, so that character data isn't completely wasted.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">This scheme that injects human-readable parts will also allow the debugger to quickly match the names without the need to decompress them.</div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">We should also investigate improving existing mangling scheme to produce shorter results. For example, one idea that comes to mind is using base-60 instead of base-10 for single-digit numbers that that specify identifier length, falling back to base-10 for longer numbers to avoid ambiguity. This would save one character for every identifier longer than 9 characters and shorter than 60, which is actually the common case.<br clear="all" class=""><div class=""><br class=""></div><div class="">Dmitri</div><div class=""><br class=""></div>-- <br class=""><div class="">main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if<br class="">(j){printf("%d\n",i);}}} /*Dmitri Gribenko <<a href="mailto:gribozavr@gmail.com" target="_blank" class="">gribozavr@gmail.com</a>>*/</div>
</div></div>
</div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></body></html>