[swift-dev] z/OS, Swift, and encodings
geoff at wozniak.ca
Wed May 10 21:55:24 CDT 2017
I've been part of the team at IBM that has been porting Swift to
z/OS. Although we have a working version of the compiler and
runtime, we’ve had to implement some horrible hacks to get there
and we’re now in the midst of trying to “right our technical
wrongs”. If we don’t, there is no way our code can possibly be
pushed upstream and we will forever be downstream consumers,
complete with constant merge pains.

Some backstory, quickly.
z/OS is the operating system on IBM’s mainframe systems (“z
Systems” or simply “z”). We had to port LLVM and Clang
(obviously) and implement the backend for the z architecture,
although most of the work was done previously by a team that
ported LLVM/Clang to Linux on z. The LLVM backend is where most
of our code changes have been applied. The changes in Swift are
small in comparison. However, there is one massive elephant in
the room: EBCDIC. The native encoding on mainframes is EBCDIC.
In order to make any progress at the beginning of the project, we
took the huge, and arguably necessary, step of changing the
internal representation of strings, symbols, and the like to
EBCDIC. Our hacks revolve around this. We kept Swift strings
themselves in Unicode, but assumed all Swift source code (and LLVM
IR, SIL, etc.) was EBCDIC and converted accordingly. It was ugly,
but it worked. Obviously this violates the Swift spec and is no
good if you attempt to pull in code from other sources, say, via
the package manager. We are now working to eliminate this hack.
As such, we are starting from the assumption that all (textual)
input must be converted to UTF-8. It’s “UTF-8 inside”. Any
conversions to other codesets are done at system boundaries.
Input must come with a codeset and be converted if necessary
before being processed, and output may be converted if required
(such as messages to stderr). We are working on a solution now that is
minimally invasive and will have little to no performance impact
on other platforms.
This was the most reasonable approach we could come up with.
Demanding that all input and output be Unicode means that anyone
editing files on a z system will have a hard time. It also makes
development very difficult and tedious, for example, when reading
intermediate file output.
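
To illustrate the "UTF-8 inside" idea, here is a minimal C++ sketch
of a conversion done at an input boundary. It is not code from the
port: the names Codeset, ebcdicToAscii, and toInternal are
hypothetical, and the mapping covers only space, digits, and
unaccented Latin letters at their standard EBCDIC positions (as in
code page IBM-1047). A real implementation would presumably use a
complete converter for the codeset in question.

```cpp
#include <cstdint>
#include <string>

// Hypothetical tag carried alongside every textual input.
enum class Codeset { UTF8, EBCDIC };

// Hypothetical, deliberately partial EBCDIC -> ASCII mapping.
// Numeric constants are used instead of character literals because
// the value of a plain literal depends on the compiler's execution
// character set, which is exactly the problem being discussed.
static char ebcdicToAscii(std::uint8_t b) {
  if (b == 0x40) return static_cast<char>(0x20);                              // space
  if (b >= 0xF0 && b <= 0xF9) return static_cast<char>(0x30 + (b - 0xF0));    // 0-9
  if (b >= 0xC1 && b <= 0xC9) return static_cast<char>(0x41 + (b - 0xC1));    // A-I
  if (b >= 0xD1 && b <= 0xD9) return static_cast<char>(0x4A + (b - 0xD1));    // J-R
  if (b >= 0xE2 && b <= 0xE9) return static_cast<char>(0x53 + (b - 0xE2));    // S-Z
  if (b >= 0x81 && b <= 0x89) return static_cast<char>(0x61 + (b - 0x81));    // a-i
  if (b >= 0x91 && b <= 0x99) return static_cast<char>(0x6A + (b - 0x91));    // j-r
  if (b >= 0xA2 && b <= 0xA9) return static_cast<char>(0x73 + (b - 0xA2));    // s-z
  return static_cast<char>(0x3F);                                             // '?' for unmapped bytes
}

// Boundary function: everything past this point sees UTF-8 only.
std::string toInternal(const std::string &bytes, Codeset codeset) {
  if (codeset == Codeset::UTF8)
    return bytes;  // already in the internal encoding
  std::string out;
  out.reserve(bytes.size());
  for (unsigned char c : bytes)
    out.push_back(ebcdicToAscii(c));
  return out;
}
```

With this shape, only the code that reads files or writes messages
needs to know about codesets; the rest of the compiler would work on
UTF-8 regardless of platform.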
All this leads me to some questions/points.
1) Does the skeletal outline provided above seem reasonable to
others? Are we missing something really important?
2) String and character literals in C++ source code are one of our
biggest issues. The only C++11 compliant compiler for z/OS is an
internal version of IBM’s XL C/C++ compiler. It only handles
EBCDIC, currently. This means the literals in C++ source end up
as EBCDIC. If you convert the input that Swift processes to UTF-8,
then comparisons to such literals will fail. The solution we like
the most is to use the ‘u8’ prefix (C++11 for string literals,
C++17 for character literals) on all
string and character literals. It would be a huge change, but
makes the encoding of literals explicit and involves no extra
build configuration. Without the prefix, we have to resort to
much build hackery by defining our own pre-processor. If anyone
has any ideas or tools that could help in this regard, we’d
appreciate some input.
3) Obviously this is not limited to Swift code; we have to touch
LLVM and Clang libraries. Are those mailing lists better places
to discuss this?
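
Regarding point 2, the following is a small C++ sketch of why the
'u8' prefix helps. The helper names equalsUtf8 and isFuncKeyword are
hypothetical and exist only for illustration. The comparison is
written as a template because the element type of a u8 literal is
char up to C++17 and char8_t from C++20 onward.

```cpp
#include <cstddef>
#include <string>

// A u8 literal is UTF-8 no matter what the compiler's native
// execution character set is, so this holds even on an EBCDIC host.
static_assert(static_cast<unsigned char>(u8"func"[0]) == 0x66,
              "u8 literals are encoded as UTF-8");

// Hypothetical helper: byte-wise comparison of a UTF-8 buffer with a
// u8 string literal (N includes the terminating NUL).
template <typename CharT, std::size_t N>
bool equalsUtf8(const std::string &utf8, const CharT (&literal)[N]) {
  if (utf8.size() != N - 1)
    return false;
  for (std::size_t i = 0; i + 1 < N; ++i) {
    if (static_cast<unsigned char>(utf8[i]) !=
        static_cast<unsigned char>(literal[i]))
      return false;
  }
  return true;
}

// Hypothetical use: the token has already been converted to UTF-8.
bool isFuncKeyword(const std::string &tokenUtf8) {
  return equalsUtf8(tokenUtf8, u8"func");
}
```

With a plain "func" literal, the same comparison would depend on the
compiler's execution character set and would fail against UTF-8
input when that set is EBCDIC.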