[swift-dev] z/OS, Swift, and encodings

Wed May 10 21:55:24 CDT 2017

I've been part of the team at IBM that has been porting Swift to 
z/OS.  Although we have a working version of the compiler and 
runtime, we’ve had to implement some horrible hacks to get there, 
and we’re now in the midst of trying to “right our technical 
wrongs”.  If we don’t, there is no way our code can possibly be 
pushed upstream, and we will forever be downstream consumers, 
complete with constant merge pains.

Some backstory, quickly.  z/OS is the operating system on IBM’s 
mainframe systems (“z Systems” or simply “z”).  We had to port 
LLVM and Clang (obviously) and implement the backend for the z 
architecture, although most of that work was done previously by a 
team that ported LLVM/Clang to Linux on z.  The LLVM backend is 
where most of our code changes have been applied.  The changes in 
Swift are small in comparison.

However, there is one massive elephant in the room: EBCDIC.  The 
native encoding on mainframes is EBCDIC.  In order to make any 
progress at the beginning of the project, we took the huge -- and 
arguably necessary -- step of changing the internal representation 
of strings, symbols, and the like to EBCDIC.  Our hacks revolve 
around this.  We kept Swift strings themselves in Unicode, but 
assumed all Swift source code (and LLVM IR, SIL, etc.) was EBCDIC 
and converted accordingly.  It was ugly, but it worked.

Obviously this violates the Swift spec and is no good if you 
attempt to pull in code from other sources, say, via the package 
manager.  We are now working to eliminate this hack.  As such, we 
are starting from the assumption that all (textual) input must be 
converted to UTF-8.  It’s “UTF-8 inside”.  Any conversions to 
other codesets are done at system boundaries.  Input must come 
with a codeset and be converted if necessary before being 
processed, and output may be converted if required (such as 
messages to stderr).  We are working on a solution now that is 
minimally invasive and will have little to no performance impact 
on other platforms.

This was the most reasonable approach we could come up with. 
Demanding that all input and output be Unicode means that anyone 
editing files on a z system will have a hard time.  It also makes 
development very difficult and tedious, for example, when reading 
intermediate file output.
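
To make the “UTF-8 inside” idea concrete, here is a minimal C++ 
sketch of conversion at an input boundary.  It is only an 
illustration, not our actual implementation: the names Codeset, 
ebcdicToAscii, and toInternalUtf8 are hypothetical, and the EBCDIC 
mapping is deliberately partial (letters, digits, and space only, 
which sit at the same positions in code pages 037 and 1047).  A 
real port would use a complete table or the platform's conversion 
services.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tag carried alongside every textual input.
enum class Codeset { Utf8, Ebcdic };

// Partial EBCDIC -> ASCII mapping (letters, digits, space only).
// Returns 0 for bytes this sketch does not cover.  Targets are
// written as numeric code points, not character literals, because
// on an EBCDIC compiler 'a' would itself be an EBCDIC value.
unsigned char ebcdicToAscii(std::uint8_t b) {
    if (b == 0x40) return 0x20;                          // space
    if (b >= 0x81 && b <= 0x89) return 0x61 + (b - 0x81); // a-i
    if (b >= 0x91 && b <= 0x99) return 0x6A + (b - 0x91); // j-r
    if (b >= 0xA2 && b <= 0xA9) return 0x73 + (b - 0xA2); // s-z
    if (b >= 0xC1 && b <= 0xC9) return 0x41 + (b - 0xC1); // A-I
    if (b >= 0xD1 && b <= 0xD9) return 0x4A + (b - 0xD1); // J-R
    if (b >= 0xE2 && b <= 0xE9) return 0x53 + (b - 0xE2); // S-Z
    if (b >= 0xF0 && b <= 0xF9) return 0x30 + (b - 0xF0); // 0-9
    return 0;
}

// Boundary function: everything past this point is UTF-8.
// Returns false if the input contains a byte the sketch cannot map.
bool toInternalUtf8(const std::vector<std::uint8_t>& raw, Codeset cs,
                    std::string& out) {
    out.clear();
    if (cs == Codeset::Utf8) {
        // Already in the internal encoding; passed through as-is
        // (this sketch does not validate the UTF-8).
        out.assign(raw.begin(), raw.end());
        return true;
    }
    for (std::uint8_t b : raw) {
        unsigned char c = ebcdicToAscii(b);
        if (c == 0) return false;
        out.push_back(static_cast<char>(c)); // ASCII is valid UTF-8
    }
    return true;
}

// Usage: "Hello" as EBCDIC bytes becomes UTF-8 at the boundary.
void demoBoundaryDecode() {
    std::string text;
    bool ok = toInternalUtf8({0xC8, 0x85, 0x93, 0x93, 0x96},
                             Codeset::Ebcdic, text);
    assert(ok);
    assert(text == "\x48\x65\x6C\x6C\x6F");
}
```

The point of the shape is that only the boundary function knows 
about codesets; the rest of the compiler sees UTF-8 and nothing 
else.  Output conversion (for example, messages to stderr) would 
be the mirror image at the other edge.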

All this leads me to some questions/points.

1) Does the skeletal outline provided above seem reasonable to 
others?  Are we missing something really important?

2) String and character literals in C++ source code are one of our 
biggest issues.  The only C++11 compliant compiler for z/OS is an 
internal version of IBM’s XL C/C++ compiler.  It only handles 
EBCDIC, currently.  This means the literals in C++ source end up 
as EBCDIC.  If you convert the input that Swift processes to 
UTF-8, then comparisons to such literals will fail.  The solution 
we like the most is to use the ‘u8’ prefix (C++11 for string 
literals, C++17 for character literals) on all string and 
character literals.  It would be a huge change, but it makes the 
encoding of literals explicit and involves no extra build 
configuration.  Without the prefix, we have to resort to much 
build hackery by defining our own preprocessor.  If anyone has any 
ideas or tools that could help in this regard, we’d appreciate 
some input.

3) Obviously this is not limited to Swift code; we have to touch 
LLVM and Clang libraries.  Are those mailing lists better places 
to discuss this?
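
As a rough illustration of point 2, the sketch below shows why a 
‘u8’ literal compares correctly against UTF-8 input no matter what 
the compiler's native encoding is.  The helper names 
(matchesLiteral, isFuncKeyword) are made up for this example and 
are not taken from the Swift or LLVM sources.  The comparison is 
written byte-wise through a template so that it compiles both 
where u8"..." has type const char[] (C++11 to C++17) and where it 
has type const char8_t[] (C++20).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Compare UTF-8 input bytes against a string literal, byte by byte.
// N includes the literal's terminating NUL, hence the N - 1.
template <typename Ch, std::size_t N>
bool matchesLiteral(const std::string& utf8Input, const Ch (&literal)[N]) {
    if (utf8Input.size() != N - 1) return false;
    for (std::size_t i = 0; i + 1 < N; ++i) {
        if (static_cast<unsigned char>(utf8Input[i]) !=
            static_cast<unsigned char>(literal[i])) {
            return false;
        }
    }
    return true;
}

// With the u8 prefix the literal is always the bytes 66 75 6E 63,
// even when the compiler's execution character set is EBCDIC.
bool isFuncKeyword(const std::string& utf8Token) {
    return matchesLiteral(utf8Token, u8"func");
}

void demoU8Literal() {
    // "func" spelled as explicit UTF-8 bytes: matches.
    assert(isFuncKeyword("\x66\x75\x6E\x63"));
    // "func" spelled as EBCDIC bytes: does not match, which is the
    // failure an unprefixed literal would hit on an EBCDIC compiler
    // once the input has been converted to UTF-8.
    assert(!isFuncKeyword("\x86\xA4\x95\x83"));
}
```

An unprefixed "func" in the same position would take whatever 
encoding the compiler uses for ordinary literals, which is exactly 
the mismatch described above.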

-- Geoff