[swift-evolution] [Review] SE-0168: Multi-Line String Literals

Adrian Zubarev adrian.zubarev at devandartist.com
Thu Apr 13 04:28:13 CDT 2017

Now that I've had some sleep, I've revised my opinion about the last line. I took a few hours to collect my thoughts in a gist; here is a formatted version of it: https://gist.github.com/DevAndArtist/dae76d4e3d4e49b1fab22ef7e86a87a9

Simple ‘multi-line string literal’ model

Core features:

1. Omitting (most) backslashes for `"`.
2. Altering the string by implicitly injecting a new line at the end of each line.
Consequences of #1:

To avoid escaping the quote character, the multi-line string literal will be delimited by tripled quotes `"""`, similar to other programming languages.

When a standard string literal contains at least 5 escaped quotes, the multi-line string literal will be shorter: the tripled delimiters add 4 characters, while each quote that no longer needs escaping saves 1.
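The count behind that break-even point can be checked with a small Swift sketch. `literalLengths(of:)` is a hypothetical helper that only measures source text (it does not parse real literals); it is checked against the `<a>` example that follows.

```swift
// Hypothetical helper: source length of the same content written as a
// standard literal (each `"` escaped as `\"`) and as a tripled-quote literal.
func literalLengths(of content: String) -> (escaped: Int, tripled: Int) {
    let quotes = content.filter { $0 == "\"" }.count
    // Standard literal: one backslash per quote plus the 2 delimiter characters.
    let escaped = content.count + quotes + 2
    // Tripled-quote literal: no escapes, but 6 delimiter characters.
    let tripled = content.count + 6
    return (escaped, tripled)
}

// The content of the <a> example, as raw source text (six quote characters).
let tag = #"<a href="\(url)" id="link\(i)" class="link">"#
let lengths = literalLengths(of: tag)
print(lengths.escaped, lengths.tripled)   // 52 50

// escaped - tripled == quotes - 4, so the tripled form wins from 5 quotes on.
for quotes in 3...6 {
    let sample = String(repeating: "\"x", count: quotes)
    let l = literalLengths(of: sample)
    print(quotes, l.escaped - l.tripled)  // differences: -1, 0, 1, 2
}
```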

"<a href=\"\(url)\" id=\"link\(i)\" class=\"link\">"    // With escapes
"""<a href="\(url)" id="link\(i)" class="link">"""      // With tripled literals
Consequences of #2:

To fully support this feature, the design has to make a few compromises for the sake of simplicity and intuitiveness.

This feature needs precise control over leading and trailing spaces.
Additionally, one needs a way to disable new line injection so that code formatting (for example, wrapping a long line across several source lines) is also supported.
Two ways of writing a multi-line string literal:

The single-line version """abc""" is trivial and was already shown above.

The multi-line version comes with a few compromises in favor of simpler rules:

"""   // DS (delimiter start)
foo   // s0
foo   // s1
foo   // s2
"""   // DE (delimiter end)
The string content is always written between the lines DS and DE (delimiter lines).

To avoid the continuation-quotes approach, left (or leading) precision is handled by the closing delimiter (compromise 1). The closing delimiter also drives the indentation algorithm, which calculates the stripping prefix from line DE and strips it from lines s0 to sn.

Right (or trailing) precision of each line from s0 to sn (note that n equals 2 in the example above) is handled by a backslash character (compromise 2).

Right precision comes at the price of losing the implicit new line injection; however, disabling new line injection was itself a requested feature (compromise 3). This means the backslash serves two purposes at once.

New line injection happens only on lines s0 to s(n - 1) (compromise 4, the last one in this design). The last line sn (s2 above) does not inject a new line into the final string. This implies that on this line a backslash handles only right precision; its role is reduced to a single function.
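To make the four compromises concrete, here is a minimal Swift sketch of how a literal's value could be computed under this model. Because the syntax is only proposed, the sketch works on plain strings: the raw content lines s0...sn and the closing-delimiter line DE. `evaluateMultiLineLiteral` is a hypothetical name, and escape sequences, interpolation and diagnostics are deliberately left out.

```swift
// Hypothetical evaluator for the proposed model. Input: the raw content
// lines s0...sn and the closing-delimiter line DE, as plain strings.
func evaluateMultiLineLiteral(contentLines: [String], closingLine: String) -> String {
    // Compromise 1: the indent prefix is the whitespace before `"""` on line DE.
    let indent = String(closingLine.prefix(while: { $0 == " " || $0 == "\t" }))
    var result = ""
    for (index, rawLine) in contentLines.enumerated() {
        // Left precision: remove the indent prefix if the line starts with it.
        var line = rawLine.hasPrefix(indent) ? String(rawLine.dropFirst(indent.count)) : rawLine
        if line.hasSuffix("\\") {
            // Compromises 2 and 3: the backslash keeps the whitespace before it
            // and suppresses the implicit new line.
            line.removeLast()
            result += line
        } else {
            // No right precision: trailing spaces and tabs are stripped.
            while let last = line.last, last == " " || last == "\t" {
                line.removeLast()
            }
            result += line
            // Compromise 4: only s0...s(n-1) inject a new line, never sn.
            if index < contentLines.count - 1 {
                result += "\n"
            }
        }
    }
    return result
}

// s0 and s1 inject new lines; s2 is the last line and does not.
let value = evaluateMultiLineLiteral(
    contentLines: ["    foo", "    bar  ", "    baz"],
    closingLine: "    \"\"\"")
print(value == "foo\nbar\nbaz")   // true

// A trailing backslash keeps the space before it and joins the lines.
let wrapped = evaluateMultiLineLiteral(
    contentLines: ["  foo \\", "  bar"],
    closingLine: "  \"\"\"")
print(wrapped == "foo bar")       // true
```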


Because whitespace is important to these examples, it is explicitly indicated: · is a space, ⇥ is a tab, and ↵ is a newline.

Leading/trailing precision and indent (compromises 1 and 2):

// Nothing to strip in this example (no indent).
let str_1 = """↵  

// No right precision (no backslash) -> whitespaces will be stripped.
let str_2 = """↵  

// Same as `str_2`
let str_3 = """↵  

// Line `DE` of the closing delimiter calculates the indent prefix  
// `··` and strips it from `s0` (left precision).
let str_4 = """↵  

// Line `DE` of the closing delimiter calculates the indent prefix  
// `····` and strips it from `s0` (left precision).
// No right precision (no backslash) -> whitespaces will be stripped.
let str_5 = """↵  

// Line `DE` of the closing delimiter calculates the indent prefix  
// `⇥ ⇥ ` and strips it from `s0` (left precision).
// Right precision is applied (backslash). In this case the literal
// contains only a single line of content, which happens to be   
// also the last line before `DE` -> backslash only serves precision.
let str_6 = """↵  
⇥ ⇥ foo\↵
⇥ ⇥ """

// Line `DE` of the closing delimiter calculates the indent prefix  
// `·⇥ ·⇥ ` and strips it from `s0` (left precision).
// No right precision (no backslash) -> whitespaces will be stripped.
let str_7 = """↵  
·⇥ ·⇥ foo··↵
·⇥ ·⇥ """

let string_1 = "foo"

str_1 == string_1   // => true
str_2 == string_1   // => true
str_3 == string_1   // => true
str_4 == string_1   // => true
str_5 == string_1   // => true
str_6 == string_1   // => true
str_7 == string_1   // => true
An incorrectly indented multi-line string literal, which compiles but emits a warning and provides a fix-it:

let str_8 = """↵  

str_8 == string_1   // => true
warning: missing indentation in multi-line string literal
  Fix-it: Insert "··"
The stripping algorithm calculates the indent prefix from the closing delimiter line DE and tries to strip it from lines s0 to sn; each line that cannot be handled correctly emits its own warning and fix-it.
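A sketch of the stripping step with its diagnostics, assuming (as str_8 suggests) that a line missing part of the prefix keeps its content and only triggers a warning. `IndentDiagnostic` and `stripIndent(_:prefix:)` are hypothetical names, and the fix-it text is modeled simply as the missing part of the prefix.

```swift
// Hypothetical diagnostic type standing in for the compiler's warning + fix-it.
struct IndentDiagnostic {
    let lineIndex: Int          // i of the offending content line s_i
    let fixItInsertion: String  // whitespace the fix-it would insert
}

// Strips the indent prefix taken from line DE. Lines that carry only part of
// the prefix (or none of it) are still accepted, but each one produces a diagnostic.
func stripIndent(_ lines: [String], prefix indent: String)
    -> (lines: [String], diagnostics: [IndentDiagnostic]) {
    var stripped: [String] = []
    var diagnostics: [IndentDiagnostic] = []
    for (index, line) in lines.enumerated() {
        // Count how many leading characters of the line match the prefix.
        var matched = 0
        for (lineChar, indentChar) in zip(line, indent) {
            guard lineChar == indentChar else { break }
            matched += 1
        }
        if matched < indent.count {
            let missing = String(indent.dropFirst(matched))
            diagnostics.append(IndentDiagnostic(lineIndex: index, fixItInsertion: missing))
        }
        // Remove whatever part of the prefix is present, so str_8 still yields "foo".
        stripped.append(String(line.dropFirst(matched)))
    }
    return (stripped, diagnostics)
}

// Like str_8: the content line "foo" lacks the two-space prefix of line DE.
let result = stripIndent(["foo"], prefix: "  ")
print(result.lines)   // ["foo"]
for diagnostic in result.diagnostics {
    print("warning: missing indentation in multi-line string literal (s\(diagnostic.lineIndex))")
    print("  Fix-it: Insert \"\(diagnostic.fixItInsertion)\"")
}
```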

The stripping algorithm removes all trailing whitespace from each line s0 to sn iff there is no right precision, which is annotated with a backslash as in ··foo··\↵. This behavior is essential and aligns well with the precision of a standard string literal " "; otherwise a multi-line string literal like

could contain 3, 10, or even 1000 characters, and the developer could not tell or even roughly guess how many.

The correct fix, as mentioned above, is to strip all whitespace after the last non-whitespace character of each line, unless right precision is explicitly annotated with a backslash:

foo   \
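The per-line rule fits in a few lines of Swift. `applyRightPrecision` is a hypothetical helper that takes an already indent-stripped line. The sketch shows that, without a backslash, a visible `foo` always yields 3 characters however much invisible padding follows it, which is exactly the ambiguity the rule removes.

```swift
// Hypothetical helper for the right-precision rule on one indent-stripped
// content line: returns the text that ends up in the string and whether a
// trailing backslash was present.
func applyRightPrecision(_ line: String) -> (text: String, hasBackslash: Bool) {
    if line.hasSuffix("\\") {
        // Explicit right precision: every space before the backslash is kept.
        return (String(line.dropLast()), true)
    }
    // No backslash: drop all trailing spaces and tabs.
    var trimmed = Substring(line)
    while let last = trimmed.last, last == " " || last == "\t" {
        trimmed = trimmed.dropLast()
    }
    return (String(trimmed), false)
}

// Source lines of 3, 10 and 1000 characters all produce the same 3 characters...
for padding in [0, 7, 997] {
    let line = "foo" + String(repeating: " ", count: padding)
    print(applyRightPrecision(line).text.count)   // 3 each time
}
// ...and with a backslash the developer states exactly where the content ends.
print(applyRightPrecision("foo   \\").text.count) // 6
```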
Disabling new line injection (compromise 3):

The examples so far had only a single content line, so they could not show this behavior yet. New lines are injected into a multi-line string only if it has at least two content lines.

let str_9 = """↵  

let str_10 = """↵  

let string_2 = "foo\nbar"
let string_3 = "foo\nbar\nbaz"

str_9 == string_2  // => true
str_10 == string_3 // => true
To disable new line injection, use a backslash for right precision.

let str_11 = """↵  

let str_12 = """↵  

str_11 == string_2    // => false
str_12 == string_3    // => false

str_11 == "foobar"    // => true
str_12 == "foobarbaz" // => true
Remember that the last content line sn does not automatically inject a new line into the final string!
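A sketch of the injection rule on its own, assuming each line has already been indent-stripped and reduced to its text plus a flag for a trailing backslash. `ContentLine` and `joinContentLines` are hypothetical names; the two calls mirror str_10 and str_12.

```swift
// Hypothetical representation of an already indent-stripped content line.
struct ContentLine {
    let text: String
    let endsWithBackslash: Bool
}

func joinContentLines(_ lines: [ContentLine]) -> String {
    var result = ""
    for (index, line) in lines.enumerated() {
        result += line.text
        // s0...s(n-1) inject "\n" unless a trailing backslash disables it;
        // the last line sn never injects one.
        if index < lines.count - 1 && !line.endsWithBackslash {
            result += "\n"
        }
    }
    return result
}

let plain = ["foo", "bar", "baz"].map { ContentLine(text: $0, endsWithBackslash: false) }
let wrapped = ["foo", "bar", "baz"].map { ContentLine(text: $0, endsWithBackslash: true) }
print(joinContentLines(plain) == "foo\nbar\nbaz")   // true, like str_10
print(joinContentLines(wrapped) == "foobarbaz")     // true, like str_12
```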

New line injection except for the last line (compromise 4):

A standard string literal like "foo" contains only the content between its opening and closing delimiters. The discussion on the mailing list suggests that the multi-line string literal should follow the same route and not inject a new line after the last content line sn. str_9 is a good example of that behavior.

If one wants a new line at the end of the string, there are a few ways to achieve this:

// Natural way:
let str_13 = """↵  

// Remember the last content line does not inject a `\n` character by default
// so there is no need for `\n\` here (but it's possible as well)!
let str_14 = """↵  

str_13 == "foo\nbar\n" // => true
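Without backslashes, the model reduces to joining the content lines with "\n". That is why a blank last content line (str_13) and an explicit `\n` at the end of the last line (which str_14 presumably uses, per its comment) give the same result. A small sketch with the standard `joined(separator:)`, where the arrays stand in for the literals' content lines:

```swift
// Assumption: no backslashes are involved, so the content lines s0...sn are
// simply joined with "\n" and the last line adds nothing of its own.
let str13Lines = ["foo", "bar", ""]   // empty last content line, as in str_13
let str14Lines = ["foo", "bar\n"]     // explicit `\n` on the last line, as in str_14
let str13 = str13Lines.joined(separator: "\n")
let str14 = str14Lines.joined(separator: "\n")
print(str13 == "foo\nbar\n")   // true
print(str14 == "foo\nbar\n")   // true
```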
At first glance the behavior of str_13 seems odd and inconsistent; however, it perfectly mimics the natural way of writing text paragraphs.

[here is a blank line]↵
text text text tex text↵
text text text tex text↵
[here is a blank line]
This is easily expressed with the literal model described above:

let myParagraph = """↵
····text text text tex text↵
····text text text tex text↵

Adrian Zubarev
On 13 April 2017 at 02:39:51, Xiaodi Wu (xiaodi.wu at gmail.com) wrote:

On Wed, Apr 12, 2017 at 5:20 PM, Brent Royal-Gordon <brent at architechies.com> wrote:
Wow, maybe I shouldn't have slept.

Okay, let's deal with trailing newline first. I'm *very* confident that trailing newlines should be kept by default. This opinion comes from lots of practical experience with multiline string features in other languages. In practice, if you're generating files in a line-oriented way, you're usually generating them a line at a time. It's pretty rare that you want to generate half a line and then add more to it in another statement; it's more likely you'll interpolate the data. I'm not saying it doesn't happen, of course, but it happens a lot less often than you would think just sitting by the fire, drinking whiskey and musing over strings.

I know that, if you're pushing for this feature, it's not satisfying to have the answer be "trust me, it's not what you want". But trust me, it's not what you want.

This is not a very good argument. If you are generating files in a line-oriented way, it is the function _emitting_ the string that handles the line-orientedness, not the string itself. That is the example set by `print()`:

print("Hello, world!") // Emits "Hello, world!\n"

Once upon a time, if I recall, this function was called `println`, but it was renamed. This particular example demonstrates why keeping trailing newlines by default is misguided:

  Hello, world!

Under your proposed rules, this emits "Hello, world!\n\n". It is almost certainly not what you want. Instead, it is a misguided attempt by the designers of multiline string syntax to do the job that the designers of `print()` have already accounted for.

If we were to buy your line of reasoning and adapt it for single-line strings, we would arrive at a rather absurd result. If you're emitting multiple single-line strings, you almost certainly want a space to separate them. Again this is exemplified by the behavior of `print()`:

print("Hello", "Brent!")

This emits "Hello Brent!" (and not "HelloBrent!"). But we do not take that reasoning and demand that "This is my string" end with a default trailing space, nor do we have `+` concatenate strings by default with a separating space.
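The `print()` behavior described here can be checked with the standard library alone: `String` conforms to `TextOutputStream`, so `print(_:separator:terminator:to:)` can write into a string. The sketch shows the default terminator and separator, and what happens when the printed value already carries a trailing newline.

```swift
// String conforms to TextOutputStream, so print can append to a String.
var output = ""
print("Hello, world!", to: &output)        // default terminator "\n"
print("Hello", "Brent!", to: &output)      // default separator " "
print(output == "Hello, world!\nHello Brent!\n")   // true

// If a literal kept its trailing newline, print() would add a second one.
var doubled = ""
print("Hello, world!\n", to: &doubled)
print(doubled == "Hello, world!\n\n")      // true

// The terminator is a parameter, so the emitting function stays in control.
var noTerminator = ""
print("Hello, world!", terminator: "", to: &noTerminator)
print(noTerminator == "Hello, world!")     // true
```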

Moving to the other end, I think we could do a leading newline strip *if* we're willing to create multiline and non-multiline modes—that is, newlines are _not allowed at all_ unless the opening delimiter ends its line and the closing delimiter starts its line (modulo indentation). But I'm reluctant to do that because, well, it's weird and complicated. I also get the feeling that, if there's a single-line mode and a multi-line mode, we ought to treat them as truly orthogonal features and allow `"`-delimited strings to use multi-line mode, but I'm really not convinced that's a good idea.

(Note, by the way, that heredocs—a *really* common multiline string design—always strip the leading newline but not the trailing one.)

Adrian cited this example, where I agree that you really don't want the string to be on the same line as the leading delimiter:

let myReallyLongXMLConstantName = """<?xml version="1.0"?>
                                        <book id="bk101" empty="">
                                           <author>John Doe</author>
                                           <title>XML Developer's Guide</title>

But there are lots of places where it works fine. Is there a good reason to force an additional newline in this?

case .isExprSameType(let from, let to):
return """checking a value with optional type \(from) against dynamic type \(to) \
      succeeds whenever the value is non-'nil'; did you mean to use '!= nil'?\

I mean, we certainly could, but I'm not convinced we should. At least, not yet.

In any case, trailing newline definitely stays. Leading newline, I'm still thinking about.

As for other things:

* I see zero reason to fiddle with trailing whitespace. If it's there, it might be significant or it might not be. If we strip it by default and we shouldn't, the developer has no way to protect it. Let's trust the developer. (And their tooling—Xcode, I believe Git, and most linters already have trailing whitespace features. We don't need them too.)

* Somebody asked about `"""`-delimited heredocs. I think it's a pretty syntax, but it's not compatible with single-line use of `"""`, and I think that's probably more important. We can always add heredocs in another way if we decide we want them. (I think `#to(END)` is another very Swifty syntax we could use for heredocs--less lightweight, but it gives us a Google-able keyword.)

* Literal spaces and tabs cannot be backslashed. This is really important because, if you see a backslash after the last visible character in a line, you can't tell just by looking whether the next character is a space, tab, or newline. So the solution is, if it's not a newline, it's not valid at all.
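The rule is easy to check mechanically. A minimal sketch, assuming the line is given without its terminating newline and that `\\` counts as an escaped backslash; `invalidWhitespaceEscapes(in:)` is a hypothetical helper, not the Swift lexer.

```swift
// Returns the character offsets of every backslash that is directly followed
// by a literal space or tab. A backslash at the very end of the line (i.e.
// before the newline) is allowed.
func invalidWhitespaceEscapes(in line: String) -> [Int] {
    let chars = Array(line)
    var offsets: [Int] = []
    var i = 0
    while i < chars.count {
        if chars[i] == "\\", i + 1 < chars.count {
            let next = chars[i + 1]
            if next == " " || next == "\t" {
                offsets.append(i)   // `\` followed by invisible whitespace
            }
            i += 2                  // skip the escaped character (including `\\`)
        } else {
            i += 1
        }
    }
    return offsets
}

print(invalidWhitespaceEscapes(in: "foo\\"))    // [] (backslash before the newline)
print(invalidWhitespaceEscapes(in: "foo\\ "))   // [3]
print(invalidWhitespaceEscapes(in: "a\\\\ b"))  // [] (escaped backslash, then a space)
```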

I'll respond to Jarod separately.

On Apr 12, 2017, at 12:07 PM, John Holdsworth <mac at johnholdsworth.com> wrote:

Finally.. a new Xcode toolchain is available largely in sync with the proposal as is.
(You need to restart Xcode after selecting the toolchain to restart SourceKit)

I personally am undecided whether to remove the first line if it is empty. The new
rules are more consistent but somehow less practical. A blank initial line is almost
never what a user would want and I would tend towards removing it automatically.
This is almost certainly what a user would expect it to do.

I’m less sure the same applies to the trailing newline. If this is a syntax for
multi-line strings, I'd argue that they should normally be complete lines -
particularly since the final newline can so easily be escaped.

        let longstring = """\
            Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod \
            tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, \
            quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\

        print( """\
            Usage: myapp <options>
            Run myapp to do mything
            -myoption - an option
            """ )

(An explicit "\n" in the string should never be stripped, btw.)

Can we have a straw poll for the three alternatives:

1) Proposal as it stands - no magic removal of leading/trailing blank lines.
2) Removal of a leading blank line when indent stripping is being applied.
3) Removal of leading blank line and trailing newline when indent stripping is being applied.

My vote is for the pragmatic path: 2)

(The main intent of this revision was actually removing the link between how the
string started and whether indent stripping was applied which was unnecessary.)
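The three alternatives differ only at the edges of the content. A sketch, assuming indent stripping applies and the input is the raw text between the delimiters; `NewlinePolicy` and `apply(_:to:)` are hypothetical names.

```swift
// Hypothetical names for the three straw-poll options.
enum NewlinePolicy {
    case asProposed               // 1) no removal of leading/trailing newlines
    case stripLeading             // 2) remove a leading blank line
    case stripLeadingAndTrailing  // 3) also remove the trailing newline
}

func apply(_ policy: NewlinePolicy, to content: String) -> String {
    var text = Substring(content)
    if policy != .asProposed, text.hasPrefix("\n") {
        text = text.dropFirst()
    }
    if policy == .stripLeadingAndTrailing, text.hasSuffix("\n") {
        text = text.dropLast()
    }
    return String(text)
}

// Content that begins with a line break and ends with a newline.
let raw = "\nUsage: myapp <options>\nRun myapp to do mything\n"
print(apply(.asProposed, to: raw) == raw)   // true
print(apply(.stripLeading, to: raw) == "Usage: myapp <options>\nRun myapp to do mything\n")   // true
print(apply(.stripLeadingAndTrailing, to: raw) == "Usage: myapp <options>\nRun myapp to do mything")   // true
```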

On 12 Apr 2017, at 17:48, Xiaodi Wu via swift-evolution <swift-evolution at swift.org> wrote:

Agree. I prefer the new rules over the old, but considering common use cases, stripping the leading and trailing newline makes for a more pleasant experience than not stripping either of them.

I think that is generally worth prioritizing over a simpler algorithm or even accommodating more styles. Moreover, a user who wants a trailing or leading newline merely types an extra one if there is newline stripping, so no use cases are made difficult, only a very common one is made more ergonomic.

Brent Royal-Gordon

