[swift-evolution] multi-line string literals.

Dave Abrahams dabrahams at apple.com
Tue May 3 14:27:29 CDT 2016

on Sat Apr 30 2016, Brent Royal-Gordon <swift-evolution at swift.org> wrote:

>> Second, this proposal should explain why it's reinventing the wheel
>> instead of standardizing existing, very successful, prior art. Answer
>> the question: “what compelling advantages does this syntax have over
>> Python's?”
> Sure.
> First of all, I will admit up front that I have not written much
> Python (a couple weeks ago, "much" would have been "any") and I may
> not fully understand their string literals. So I'll start by
> describing my understanding of the design in question; then I'll
> critique the design as I understand it. So if something in this
> section is wrong, please forgive any related mistakes in the critique.
> Python offers a `"""` string which is almost the same as the `"`
> string:
> 	* Every character between the first `"""` and the second `"""`
> is part of its contents.
> 	* Escapes are processed normally.
> 	* There is no special behavior with regard to whitespace.
> The only difference is that a `"""` string allows real, unescaped
> newlines in it, while a `"` string forbids them. (And, of course,
> since the delimiter is `"""`, the strings `"` and `""` are interpreted
> literally.)

IMO you really can't consider `"""` without `r`-prefixed raw strings,
and the apostrophe (single-quote) variants, and how they all compose.
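
As a minimal illustration of how those pieces compose in Python (the
variable names and contents below are made up for the example):

```python
# Illustrative only: how Python's quote styles and the r prefix compose.

# Triple double quotes: escapes are processed, bare " is allowed inside.
double = """She said "hi".\n"""

# Triple apostrophes: the text may hold both ' and " characters.
single = '''It's "quoted" twice.'''

# r prefix: backslashes are kept literally instead of starting escapes.
raw = r"""C:\new\table"""

# Raw plus apostrophe delimiter: the content can even end in """.
raw_single = r'''\d+\s*"""'''

print(repr(double))
print(repr(single))
print(repr(raw))
print(repr(raw_single))
```

The same four spellings exist for single-line strings, so the choice of
delimiter and the choice of escape processing are independent of each
other.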

> This approach is really simple, which is a plus, but it has a number
> of issues.
> A number of aspects of the design combine to make `"""` strings harder
> to read than they should be:
> 	* You can't indent the contents of a `"""` string to match the
> code it's in. This is actually pretty shocking considering how
> sensitive Python is to indentation, and it necessitates a number of
> strange hacks (for instance, Python's `help()` function unindents all
> but the first line of doc strings).

True, but there's a standard function for stripping off the extra
indentation, if that's important to you.  In some use-cases, it makes
sense, but I wouldn't say it's uniformly desirable to add indentation to
a multiline string literal in the first place.
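
A short sketch of that approach, assuming the function meant here is
`textwrap.dedent` from the Python standard library (the `usage` function
and its text are hypothetical):

```python
import textwrap


def usage():
    # The literal is indented to match the surrounding code; dedent()
    # then strips the whitespace common to every line. The backslash
    # after the opening delimiter suppresses the leading newline.
    return textwrap.dedent("""\
        usage: tool [options]
          -h  show help
        """)


print(usage())
```

Indentation beyond the common margin (the two spaces before `-h`) is
preserved, so only the code-alignment whitespace is removed.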

> 	* You can't put all of the contents against the left margin,
> either, because a newline right after the `"""` is counted as part of
> the string's contents. (You can use a backslash to work around this.)



works fine if a leading newline is important to avoid.
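
One form that has this effect in Python is the backslash workaround
mentioned in the quoted text. A minimal sketch (the variable names and
contents are illustrative):

```python
# Without the backslash, the newline after the opening delimiter is
# part of the string.
with_newline = """
first line
second line"""

# A backslash directly after the opening delimiter continues the line,
# so the content starts at the left margin with no leading newline.
without_newline = """\
first line
second line"""

print(repr(with_newline))
print(repr(without_newline))
```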

> 	* The last line of the string also has to have the delimiter in it,
> because again, a newline right before the `"""` is counted as part of
> the string's contents. (You can use a backslash to work around this,
> but the backslash is *not* in the mirror position of the start of the
> string, so good luck remembering it.)



works fine if leading and trailing newlines are important to avoid.  In
my experience with real use-cases, extra newlines are very seldom

> In other words, the first and last lines have to be adulterated by
> adding a `"""`, and the middle lines can't be indented to line up with
> either the surrounding code or the beginning of the first line. If one
> of the selling points of this feature is that you just stick your
> contents in verbatim without alteration, that isn't great.

This argument doesn't make any sense to me.  You *do* just stick your
contents in (between `"""`s) without alteration. The fact that you may
see the quotes on the same line as the content doesn't change that.

> This is such a problem that, in researching `"""` to be sure I
> understood how it works, I came across a Stack Overflow question whose
> answers are full of people recommending a different, more highly
> punctuated, feature instead:
> <http://stackoverflow.com/questions/1520548/how-does-pythons-triple-quote-string-work>

That's because the OP didn't *want* a multiline string in the first
place.  The answer he wanted contained *no newlines*.  He merely picked
the wrong tool for the job.  This is a very weak argument against `"""`.
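
For a long string that should contain no newlines, one tool that fits
the job in Python is adjacent-literal concatenation inside parentheses.
A minimal sketch (the message text is made up):

```python
# Adjacent string literals are joined at compile time, so the source
# spans several lines but the resulting value contains no newline.
message = (
    "this is a long sentence "
    "split across source lines "
    "with no newlines in the value"
)

print(message)
```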

> (There is an alternate design which would fix the beginning and end
> problems: make a newline after the opening delimiter and before the
> closing delimiter mandatory and part of the delimiter. You might then
> choose to fix the indentation problem by taking the whitespace between
> the closing delimiter and the newline before it as the amount of
> indentation for the entire string, and removing that much indentation
> from each line. But that's not what Python does, and it's not what you
> seem to be proposing.)
> String literals are expressions, and in fact, they are expressions
> with no side effects. To do anything useful, they *must* be put into a
> larger expression. Often this expression is an assignment, but it
> could be anything—concatenation, method call, function parameter, you
> name it.

How is this different from array literals?

> This creates a challenge for multiline strings, because they can
> become very large and effectively break up the expression they're
> in. The continuation-quote-based multiline strings I'm proposing are
> aimed primarily at relatively short strings*, where this is less of a
> concern. 

Why are relatively short strings the right target for a multiline string
proposal?  I think this really goes to the crux of the question: **what
is the use case you're aiming at, how does this proposal address that
use case, and why is that use case more important than others?**

> But `"""` aims to be used not only for short strings, but for ones
> which may be many dozens or even hundreds of lines long. You're going
> to end up with code like:
> 	print("""<?xml version="1.0"?>
> 	<catalog>
> 		<book id="bk101" empty="">
> 			...
> 			...
> 			...a hundred more lines of XML with interpolations in it...
> 			...
> 			...
> 		</book>
> 	</catalog>""")
> What does that `)` mean? Who knows? We saw the beginning of the
> expression an hour and a half ago. (It's common to avoid this issue by
> assigning the string to a constant even if it's only going to be used
> once, but that just changes the problem a little—now you're trying to
> remember the name of a local variable declared a hundred lines ago.)

This seems to be an argument that one shouldn't create large literals of
any kind.

> Heredocs cleverly avoid this issue by not trying to put the literal's
> contents in the middle of the expression. Instead, they put a short
> placeholder in the expression, then start the contents on the next
> line. The expression is readable as an expression, while the contents
> of the literal are adjacent but separate. That's why I think they're a
> better solution than `"""` for truly massive string literals.

I don't see how this addresses your complaint.  By the time you get to
the closing delimiter you've still lost track of any context and you
don't know how the string is being used.  Furthermore, there's no reason
to think you'll even recognize the closing delimiter.

> * This is something I am not saying in the proposal, but I really
> should.
> Another problem is that you don't get another choice besides
> `"""`. That's not so bad, though, right? It's such an uncommon
> sequence of characters, surely you'll never encounter it?
> Well, sure...until you try to generate code.
> For instance, suppose you're writing a web app using a barebones Swift
> framework and you have a lot of code like this:
> 	response.send("""<tr>
> 		<td>\(name)</td>
> 		<td>\(value)</td>
> 	</tr>
> 	""")
> Every 90s Perl hacker knows what a pain this is, 

Sorry, where's the pain here?

> and every 90s Perl hacker knows the solution: a template language.

You mean https://github.com/apple/swift/blob/master/utils/gyb.py?

> Hack together some kind of simple syntax for embedding commands in a
> file of content, and then convert it into runnable code with a tool
> that does things like:
> 	print("""
> 	response.send("""\(escapedContent)""")
> 	""")
> ...oh. Wait a minute there.

You seem to understand what you're saying here, but I don't get it.  I
truly don't understand why you'd use `"""` strings for generalized
substitutions like this in your template language.  You get to design
the language.
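
As one possible reading of that point, here is a minimal Python sketch of
a generator that does not depend on `"""` at all: it emits an escaped
single-line literal instead. The `emit_send` helper and the
`response.send` call it writes out are hypothetical.

```python
import ast


def emit_send(content):
    """Return one line of source that passes `content` to response.send."""
    # repr() chooses the quoting and escapes whatever needs escaping,
    # so the content may itself contain """ or newlines.
    return "response.send(" + repr(content) + ")"


content = '<td>"""quoted"""</td>\n'
generated = emit_send(content)
print(generated)

# Check that the emitted literal evaluates back to the original text.
literal = generated[len("response.send("):-1]
round_tripped = ast.literal_eval(literal)
```

Because the tool author controls the generated output, the quoting
strategy can be whatever makes the generator simplest.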

> To get around this, you really need to support, not two delimiters,
> but *n* delimiters. Heredocs let you choose an arbitrary
> delimiter. C++ lets you augment the delimiter with arbitrary
> characters. Perl's `qq` construct lets you choose a single character,
> but it can be almost anything you want (and some of them nest). I'm
> thinking about letting you extend the delimiter with an arbitrary
> number of underscores. All of these solutions have in common that they
> don't just have "primary" and "alternate" delimiters, but an
> effectively endless number of them.
> `"""` does not have this feature—you just have the primary delimiter
> and the alternate delimiter, and if neither of them works for you, you
> have to escape. That isn't ideal.
> `"""` does not offer much help with preventing or diagnosing runaway
> literals or highlighting code with half-written literals. Heredocs
> don't either, but I envision heredocs being used less often than `"""`
> strings would be, since continuation quotes would handle shorter
> strings.
> So, let's talk about this:
>>> (like Python's """ strings) which trick some syntax
>>> highlighters into working some of the time with some contents, we
>>> don't think this occasional, accidental compatibility is a big
>>> enough gain to justify changing the design.
>> I've never seen a syntax highlighter have problems with it, I don't
>> see how it *could* ever cause a problem, and lastly I think it's both
>> naïve and presumptuous to call these effects accidental.
> I call these effects "accidental" because the syntax highlighter was
> not designed to handle the `"""`; it just happens to handle it
> correctly because it misinterprets a `"""` string as an empty `"`
> string, followed by a non-empty `"` string, followed by another empty
> `"` string. It's "accidental" from the perspective of the syntax
> highlighter designer, not the language designer, who probably intended
> that to happen.
> And it only works in a specific subset of cases. It breaks if:
> * The syntax highlighter tries to apply smarter per-language rules.
> * The syntax highlighter assumes that strings are not allowed to be
> multi-line. (This is true of many languages, including C derivatives
> and Swift 2.)
> * The string literal contains any `"` characters, which `"""` is often
> used in order to permit.

Okay, that's fair.
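
That failure mode can be sketched with a toy tokenizer. This is only an
illustration, assuming a highlighter whose sole string rule is "a quote,
then non-quote characters on one line, then a quote"; the regex and the
`naive_strings` helper are made up for the example.

```python
import re

# Toy rule: a " character, any run of non-quote characters on the same
# line, and a closing " character.
NAIVE_STRING = re.compile(r'"[^"\n]*"')


def naive_strings(source):
    """Return the spans a naive highlighter would color as strings."""
    return NAIVE_STRING.findall(source)


# Quote-free content: seen as an empty string, the content, and another
# empty string, so the coloring happens to come out right.
plain = naive_strings('"""hello"""')

# Content containing " characters: the word hi falls outside every
# matched span and would be colored as code.
quoted = naive_strings('"""say "hi" now"""')

print(plain)
print(quoted)
```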

> * The string literal contains any escapes or special features that the
> syntax highlighter misinterprets, like an interpolation which itself
> contains a string literal.
> Yes, it will often work, or at least sort-of work. But I just don't
> see that as very valuable.
> In my opinion, the best thing about `"""` (the language feature) is
> `"""` (the token).
> A sequence of three quote marks is a fantastic token for a feature
> meant to create long string literals. It clearly has something to do
> with string literals, but it cannot be an empty string, because there
> are too many quote marks—that is, it's too long. It's a really clever
> mnemonic which also parses unambiguously.
> I've spoken before in this thread and others about potentially using
> `"""` as an alternate delimiter (which could be extended to `"""""`
> and beyond). I'm also considering the idea that it might be a good
> token for a Perl-style heredoc syntax:
> 	print(""" + e""")
> 	It was a dark and stormy \(timeOfDay) when 
> 	"""
> 	the Swift core team invented the \(interpolation) syntax.
> 	"""
> Nesting could be achieved with a version of whatever alternate
> delimiter syntax we use for `"` strings. For instance, if we adopted
> the `_"foo"_` syntax I sketched:
> 	print(_"""_)
> 	response.send(""")
> 	\(escapedContent)
> 	"""
> 	_"""_
> (P.S. If this post seems way too long to have been written in a couple
> hours, that's because I've been drafting a version of it on and off
> for a day or two; it just so happened that Dave directly asked me to
> confront `"""` today.)

