[swift-dev] Metadata Representation

Saleem Abdulrasool compnerd at compnerd.org
Fri Sep 22 19:39:36 CDT 2017

On Thu, Sep 21, 2017 at 10:28 PM, John McCall <rjmccall at apple.com> wrote:

> On Sep 21, 2017, at 10:10 PM, Saleem Abdulrasool <compnerd at compnerd.org>
> wrote:
> On Thu, Sep 21, 2017 at 5:18 PM, John McCall <rjmccall at apple.com> wrote:
>> On Sep 21, 2017, at 1:26 PM, Saleem Abdulrasool via swift-dev <
>> swift-dev at swift.org> wrote:
>> On Thu, Sep 21, 2017 at 12:04 PM, Joe Groff <jgroff at apple.com> wrote:
>>> On Sep 21, 2017, at 11:49 AM, Saleem Abdulrasool <compnerd at compnerd.org>
>>> wrote:
>>> On Thu, Sep 21, 2017 at 10:53 AM, Joe Groff <jgroff at apple.com> wrote:
>>>> On Sep 21, 2017, at 9:32 AM, Saleem Abdulrasool via swift-dev <
>>>> swift-dev at swift.org> wrote:
>>>> Hello,
>>>> The current layout for the swift metadata for structure types, as
>>>> emitted, seems to be unrepresentable in PE/COFF (at least for x86_64).
>>>> There is a partial listing of the generated code following the message for
>>>> reference.
>>>> When building the standard library, LLVM encounters a relocation which
>>>> cannot be represented.  Tracking down the relocation led to the type
>>>> metadata for SwiftNSOperatingSystemVersion.  The metadata here is
>>>> _T0SC30_SwiftNSOperatingSystemVersionVN.  At +32-bytes we find the
>>>> Kind (1).  So, this is a struct metadata type.  Thus at Offset 1 (+40
>>>> bytes) we have the nominal type descriptor reference.  This is the
>>>> relocation which we fail to represent correctly.  If I'm not mistaken, it
>>>> seems that the field is supposed to be a relative offset to the nominal
>>>> type descriptor.  However, currently, the nominal type descriptor is
>>>> emitted in a different section (.rodata) as opposed to the type descriptor
>>>> (.data).  This cross-section relocation cannot be represented in the file
>>>> format.
>>>> My understanding is that the type metadata will be adjusted during the
>>>> load for the field offsets.  Furthermore, my guess is that the relative
>>>> offset is used to encode the location to avoid a relocation for the load
>>>> address base.  In the case of windows, the based relocations are a given,
>>>> and I'm not sure if there is a better approach to be taken.  There are a
>>>> couple of solutions which immediately spring to mind: moving the nominal
>>>> type descriptor into the (RW) data segment and the other is to adjust the
>>>> ABI to use an absolute relocation which would be rebased.  Given that the
>>>> type metadata may be adjusted means that we cannot emit it into the RO data
>>>> segment.  Is there another solution that I am overlooking which may be
>>>> simpler or better?
>>>> IIRC, this came up when someone was trying to port Swift to Windows on
>>>> ARM as well, and they were able to conditionalize the code so that we used
>>>> absolute pointers on Windows/ARM, and we may have to do the same on Windows
>>>> in general. It may be somewhat more complicated on Win64 since we generally
>>>> assume that relative references can be 32-bit, whereas an absolute
>>>> reference will be 64-bit, so some formats may have to change layout to make
>>>> this work too. I believe Windows' executable loader still ultimately maps
>>>> the final PE image contiguously, so alternatively, you could conceivably
>>>> build a Swift toolchain that used ELF or Mach-O or some other format with
>>>> better support for PIC as the intermediate object format and still linked a
>>>> final PE executable. Using relative references should still be a win on
>>>> Windows both because of the size benefit of being 32-bit and the fact that
>>>> they don't need to be slid when running under ASLR or when a DLL needs to
>>>> be rebased.
>>> Yeah, I tracked down the relativePointer thing.  There is a nice subtle
>>> little warning that it is not fully portable :-).  Would you happen to have
>>> a pointer to where the adjustment for the absolute pointers on WoA is?
>>> You are correct that the image should be contiguously mapped on
>>> Windows.  The idea of MachO as an intermediary is rather intriguing.
>>> Thinking longer term, maybe we want to use that as a global solution?  It
>>> would also provide a nicer autolinking mechanism for ELF which is the one
>>> target which currently is missing this functionality.  However, if I'm not
>>> mistaken, this would require a MachO linker (and the only current viable
>>> MachO linker would be ld64).  The MachO binary would then need to be
>>> converted into ELF or COFF.  This seems like it could take a while to
>>> implement though, but would not really break ABI, so pushing that off to
>>> later may be wise.
>>> Intriguingly, LLVM does support `*-*-win32-macho` as a target triple
>>> already, though I don't know what Mach-O to PE linker (if any) that's
>>> intended to be used with. We implemented relative references using
>>> current-position-relative offsets for Darwin and Linux both because that
>>> still allows for a fairly convenient pointer-like C++ API for working with
>>> relative offsets, and because the established toolchains on those platforms
>>> already have to support PIC so had most of the relocations we needed to
>>> make them work already; is there another base we could use for relative
>>> offsets on Windows that would fit in the set of relocations supported by
>>> standard COFF linkers?
>> Yes, the `-windows-macho` target is used for UEFI :-).  The MachO binary
>> is translated later to PE/COFF as required by the UEFI specification.
>> There are only two relocation types which can be used for relative
>> displacements: __ImageBase relative (IMAGE_REL_*_ADDR32NB) and section
>> relative (IMAGE_REL_*_SECREL) which are relative to the beginning of the
>> section.  The latter is why I mentioned that moving them into the same
>> section could be a solution as that would allow the relative distance to be
>> encoded.  Unfortunately, the section relative relocation is relative to the
>> section within which the symbol is.
>> What's wrong with IMAGE_REL_AMD64_REL32?  We'd have to adjust the
>> relative-pointer logic to store an offset from the end of the relative
>> pointer instead of the beginning, but it doesn't seem to have a section
>> requirement.
> Hmm, is it possible to use RIP relative addressing in data?  If so, yes,
> that could work.
> There's no inherent reason, but I wouldn't put it past the linker to fall
> over and die.  But it should at least be section-agnostic about the target,
> since this is likely to be used for all sorts of PC-relative addressing.
At least MC doesn't seem to like it.  Something like this for example:

  .data
data:
  .long 0

  .section .rodata
  .quad data(%rip)

Bails out due to the unexpected modifier.  Now, theoretically, we could
support that modifier, but it does seem pretty odd.

Now, as it so happens, both PE and PE+ limit the file size to 4GiB.  This
means that the relative difference is guaranteed to fit within 32 bits.
This is where things get really

We cannot generate the relocation because we are emitting the values at
pointer width.  However, the value that we are emitting is a relative
offset, which we just determined to be limited to 32-bits in width.  The
thing is, the IMAGE_REL_AMD64_REL32 doesn't actually seem to care about the
cross-sectionness as you pointed out.  So, rather than emitting a
pointer-width value (`.quad`), we could emit a pad (`.long 0`) and follow
that with the relative displacement (`.long <expr>`).  This would be
representable in the PE/COFF model.
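
To make the padded layout concrete, here is a minimal C++ sketch that models
it on a toy in-memory image.  All names (PaddedRelativeSlot, Image, fieldEnd,
applyRel32, resolve) are hypothetical and are not taken from the Swift runtime
or LLVM.  It assumes the displacement is measured from the byte following the
32-bit field, which is how IMAGE_REL_AMD64_REL32 is defined, and that the pad
precedes the displacement as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical pointer-width slot: a 32-bit pad (`.long 0`) followed by a
// 32-bit displacement (`.long <expr>`).
struct PaddedRelativeSlot {
  std::uint32_t pad;
  std::int32_t disp;
};
static_assert(sizeof(PaddedRelativeSlot) == 8, "slot must stay 64 bits wide");

// Toy stand-in for a contiguously mapped image (assumption for illustration).
struct Image {
  unsigned char bytes[256];
};

// Offset, within the image, of the byte just past the displacement field.
inline std::size_t fieldEnd(std::size_t slotOffset) {
  return slotOffset + offsetof(PaddedRelativeSlot, disp) + sizeof(std::int32_t);
}

// Plays the role of the linker: writes the pad and a displacement measured
// from the end of the displacement field to the target.
inline void applyRel32(Image &image, std::size_t slotOffset,
                       std::size_t targetOffset) {
  assert(slotOffset + sizeof(PaddedRelativeSlot) <= sizeof(image.bytes));
  assert(targetOffset < sizeof(image.bytes));
  std::int64_t delta = static_cast<std::int64_t>(targetOffset) -
                       static_cast<std::int64_t>(fieldEnd(slotOffset));
  // The displacement must fit in a signed 32-bit field.
  assert(delta >= std::numeric_limits<std::int32_t>::min() &&
         delta <= std::numeric_limits<std::int32_t>::max());
  PaddedRelativeSlot slot{0, static_cast<std::int32_t>(delta)};
  std::memcpy(image.bytes + slotOffset, &slot, sizeof(slot));
}

// Plays the role of the runtime: recovers the target address from the slot.
inline const unsigned char *resolve(const Image &image,
                                    std::size_t slotOffset) {
  PaddedRelativeSlot slot;
  std::memcpy(&slot, image.bytes + slotOffset, sizeof(slot));
  return image.bytes + fieldEnd(slotOffset) + slot.disp;
}
```

Because the stored value depends only on the distance between the field and
its target, it stays valid wherever the image is loaded, so no base relocation
is needed for the slot.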

If I understand the layout correctly, the type metadata fields are supposed
to be pointer sized.  I assume that we would like to maintain that across
the formats.  It may be possible to alter the emission to change the
relative pointer emission to emit a pair of longs instead for PE/COFF with
a 64-bit pointer value.  Basically, we cannot truncate the relocation to an
IMAGE_REL_AMD64_REL32 but we could generate the appropriate relocation and
pad to the desired width.

Are there any pitfalls that I should be aware of trying to adjust the
emission to do this?  The only downside that I can see is that the
emission would need to be target dependent (that is, check the output object
format and the target pointer width).
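
As a rough illustration of that target-dependent check, the decision could be
expressed as a small predicate.  This is only a sketch: the enums and the
chooseEmission function are hypothetical names invented here, not part of
Swift's IRGen or LLVM.

```cpp
// Hypothetical names throughout; none of these are Swift or LLVM APIs.
enum class ObjectFormat { COFF, ELF, MachO };

enum class RelativeFieldEmission {
  PointerWidthDifference, // one pointer-width relative value
  PaddedRel32,            // 32-bit pad plus 32-bit displacement
};

// Use the padded form only for PE/COFF output with 64-bit pointers; every
// other combination keeps the pointer-width emission.
constexpr RelativeFieldEmission chooseEmission(ObjectFormat format,
                                               unsigned pointerBits) {
  return (format == ObjectFormat::COFF && pointerBits == 64)
             ? RelativeFieldEmission::PaddedRel32
             : RelativeFieldEmission::PointerWidthDifference;
}
```

Keeping the choice in one place would leave the field pointer-sized on every
target while confining the PE/COFF special case to the emitter.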

Thanks for the hint John!  It seems that was spot on :-).

> John.

Saleem Abdulrasool
compnerd (at) compnerd (dot) org