[swift-dev] Reducing the size of Swift binaries by shortening symbols

Sun Dec 20 04:17:31 CST 2015

+ Stephen Canon, because he probably has good ideas in this domain.

On Fri, Dec 18, 2015 at 3:42 PM, Nadav Rotem via swift-dev <
swift-dev at swift.org> wrote:

>
> *What’s next?*
>
> The small experiment I described above showed that compressing the names
> in the string table has a huge potential for reducing the size of swift
> binaries. I’d like for us (swift-developers) to talk about the implications
> of this change and start working on the two tasks of tightening our
> existing mangling format and on implementing a new compression layer on
> top.
>

Hi Nadav,

This is a great start that shows that there is a potential for improvement
in our mangled names!

To make this effort more visible, I would suggest creating a bug on
https://bugs.swift.org/ .

I think we survey existing solutions that industry has developed for
compressing short messages.  What comes to mind:

- header compression in HTTP2:
https://http2.github.io/http2-spec/compression.html

- PPM algorithms are one of the best-performing compression algorithms for
text.

- Arithmetic coding is also a natural starting point for experimentation.

Since the input mangled name also comes in a restricted character set, we
could also remove useless bits first, and try an existing compression
algorithm on the resulting binary string.

We should also build a scheme that uses shortest one between the compressed
and non-compressed names.

For running experiments it would be useful to publish a sample corpus of
mangled names that we will be using for comparing the algorithms and
approaches.

I also have a concern about making mangled names completely unreadable.
Today, I can frequently at least get a gist of what the referenced entity
is without a demangler.  What we could do is make the name consist of a
human-readable prefix that encodes just the base name and a compressed
suffix that encodes the rest of the information.

_T<length><class name><length><method name><compressed suffix>

We would be able to use references to the class and the method name from
the compressed part, so that character data isn't completely wasted.

This scheme that injects human-readable parts will also allow the debugger
to quickly match the names without the need to decompress them.

We should also investigate improving existing mangling scheme to produce
shorter results.  For example, one idea that comes to mind is using base-60
instead of base-10 for single-digit numbers that that specify identifier
length, falling back to base-10 for longer numbers to avoid ambiguity.
This would save one character for every identifier longer than 9 characters
and shorter than 60, which is actually the common case.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20151220/bde5840b/attachment.html>