[swift-evolution] multi-line string literals.

Brent Royal-Gordon brent at architechies.com
Sat Apr 30 22:11:15 CDT 2016


> Second, this proposal should explain why it's reinventing the wheel
> instead of standardizing existing, very successful, prior art.  Answer
> the question: “what compelling advantages does this syntax have over
> Python's?”

Sure.

First of all, I will admit up front that I have not written much Python (a couple of weeks ago, "much" would have been "any") and I may not fully understand its string literals. So I'll start by describing my understanding of the design in question, then critique the design as I understand it. If something in this description is wrong, please forgive any related mistakes in the critique.

Python offers a `"""` string which is almost the same as the `"` string:

	* Every character between the first `"""` and the second `"""` is part of its contents.
	* Escapes are processed normally.
	* There is no special behavior with regard to whitespace.

The only difference is that a `"""` string allows real, unescaped newlines in it, while a `"` string forbids them. (And, of course, since the delimiter is `"""`, the strings `"` and `""` are interpreted literally.)
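
A minimal Python sketch of that understanding (the variable names are mine and purely illustrative): the two literals below differ only in how the newline is written, and compare equal.

```python
# The same contents written with each kind of literal.
# In the triple-quoted form the newline is a real line break, the inner
# double quotes need no escaping, and \t is still processed as an escape.
multi = """first line
second "quoted" word\ttab"""

single = "first line\nsecond \"quoted\" word\ttab"

print(multi == single)  # True
```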

This approach is really simple, which is a plus, but it has a number of issues.



CONTENT FORMATTING

A number of aspects of the design combine to make `"""` strings harder to read than they should be:

	* You can't indent the contents of a `"""` string to match the code it's in. This is actually pretty shocking considering how sensitive Python is to indentation, and it necessitates a number of strange hacks (for instance, Python's `help()` function unindents all but the first line of docstrings).
	* You can't put all of the contents against the left margin, either, because a newline right after the `"""` is counted as part of the string's contents. (You can use a backslash to work around this.)
	* The last line of the string also has to have the delimiter in it, because again, a newline right before the `"""` is counted as part of the string's contents. (You can use a backslash to work around this, but the backslash is *not* in the mirror position of the start of the string, so good luck remembering it.)

In other words, the first and last lines have to be adulterated by adding a `"""`, and the middle lines can't be indented to line up with either the surrounding code or the beginning of the first line. If one of the selling points of this feature is that you just stick your contents in verbatim without alteration, that isn't great.
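
Here is a small Python sketch of those three points. The `usage()` function and its text are made up for illustration; `textwrap.dedent` is the standard-library helper usually paired with the backslash workaround.

```python
import textwrap

def usage():
    # Indenting the contents to match the code puts the indentation inside the
    # string, and the newlines next to the delimiters become contents too.
    naive = """
        Usage: tool [options]
          -h  show help
        """
    # A backslash after the opening delimiter suppresses the first newline;
    # textwrap.dedent then strips the common leading whitespace.
    cleaned = textwrap.dedent("""\
        Usage: tool [options]
          -h  show help
        """)
    return naive, cleaned

naive, cleaned = usage()
print(repr(naive))    # starts with '\n' followed by eight spaces
print(repr(cleaned))  # 'Usage: tool [options]\n  -h  show help\n'
```

Note that the cleanup happens at run time through a library call, not as part of the literal itself.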

This is such a problem that, in researching `"""` to be sure I understood how it works, I came across a Stack Overflow question whose answers are full of people recommending a different, more highly punctuated, feature instead: <http://stackoverflow.com/questions/1520548/how-does-pythons-triple-quote-string-work>

(There is an alternate design which would fix the beginning and end problems: make a newline after the opening delimiter and before the closing delimiter mandatory and part of the delimiter. You might then choose to fix the indentation problem by taking the whitespace between the closing delimiter and the newline before it as the amount of indentation for the entire string, and removing that much indentation from each line. But that's not what Python does, and it's not what you seem to be proposing.)
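
To make that alternate design concrete, here is a rough Python sketch of the stripping rule as a plain function. The name `strip_by_closing_indent` and the error handling are my own assumptions; this models the rule on an already-captured string and is not how any existing language implements it.

```python
def strip_by_closing_indent(raw):
    """Apply the hypothetical rule to `raw`, the text between the delimiters.

    The newline after the opening delimiter and the newline before the closing
    delimiter count as part of the delimiters, and the whitespace in front of
    the closing delimiter is the indentation removed from every line.
    """
    if not raw.startswith("\n"):
        raise ValueError("contents must start on the line after the opening delimiter")
    body, sep, indent = raw[1:].rpartition("\n")
    if indent.strip():
        raise ValueError("closing delimiter must be on a line of its own")
    if not sep:
        return ""  # the closing line directly follows the opening delimiter
    lines = []
    for line in body.split("\n"):
        if line.startswith(indent):
            lines.append(line[len(indent):])
        elif not line.strip():
            lines.append("")  # blank lines need not carry the indentation
        else:
            raise ValueError("line is indented less than the closing delimiter")
    return "\n".join(lines)

raw = """
    <a>
      <b/>
    </a>
    """
result = strip_by_closing_indent(raw)
print(result)
```

With this rule the first and last lines of the contents stay unadulterated, and the relative indentation of `<b/>` survives while the four spaces shared with the closing delimiter are dropped.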



BREAKING UP EXPRESSIONS

String literals are expressions, and in fact, they are expressions with no side effects. To do anything useful, they *must* be put into a larger expression. Often this expression is an assignment, but it could be anything—concatenation, method call, function parameter, you name it.

This creates a challenge for multiline strings, because they can become very large and effectively break up the expression they're in. The continuation-quote-based multiline strings I'm proposing are aimed primarily at relatively short strings*, where this is less of a concern. But `"""` aims to be used not only for short strings, but for ones which may be many dozens or even hundreds of lines long. You're going to end up with code like:

	print("""<?xml version="1.0"?>
	<catalog>
		<book id="bk101" empty="">
			...
			...
			...a hundred more lines of XML with interpolations in it...
			...
			...
		</book>
	</catalog>""")

What does that `)` mean? Who knows? We saw the beginning of the expression an hour and a half ago. (It's common to avoid this issue by assigning the string to a constant even if it's only going to be used once, but that just changes the problem a little—now you're trying to remember the name of a local variable declared a hundred lines ago.)

Heredocs cleverly avoid this issue by not trying to put the literal's contents in the middle of the expression. Instead, they put a short placeholder in the expression, then start the contents on the next line. The expression is readable as an expression, while the contents of the literal are adjacent but separate. That's why I think they're a better solution than `"""` for truly massive string literals.

* This is something I am not saying in the proposal, but I really should.



NESTING

Another problem is that you don't get another choice besides `"""`. That's not so bad, though, right? It's such an uncommon sequence of characters, surely you'll never encounter it?

Well, sure...until you try to generate code.

For instance, suppose you're writing a web app using a barebones Swift framework and you have a lot of code like this:

	response.send("""<tr>
		<td>\(name)</td>
		<td>\(value)</td>
	</tr>
	""")

Every 90s Perl hacker knows what a pain this is, and every 90s Perl hacker knows the solution: a template language. Hack together some kind of simple syntax for embedding commands in a file of content, and then convert it into runnable code with a tool that does things like:

	print("""
	response.send("""\(escapedContent)""")
	""")

...oh. Wait a minute there.

To get around this, you really need to support, not two delimiters, but *n* delimiters. Heredocs let you choose an arbitrary delimiter. C++ lets you augment the delimiter with arbitrary characters. Perl's `qq` construct lets you choose a single character, but it can be almost anything you want (and some of them nest). I'm thinking about letting you extend the delimiter with an arbitrary number of underscores. All of these solutions have in common that they don't just have "primary" and "alternate" delimiters, but an effectively endless number of them.

`"""` does not have this feature: you just have the primary delimiter and the alternate delimiter (in Python, `'''`), and if neither of them works for you, you have to escape. That isn't ideal.
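
The same wall shows up in Python itself once generated code is nested two levels deep. This is only a sketch: the `level1`/`level2`/`level3` names and the `compiles` helper are invented for the example, and using `repr()` as the escaping step is my choice, not something the language prescribes.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

# Level 1: a program that contains a triple-quoted literal.
level1 = 'print("""hello""")'

# Level 2: source that embeds level 1 by switching to the alternate delimiter.
level2 = "src = '''" + level1 + "'''"

# Level 3: both delimiters are already used inside level 2, so wrapping it
# naively yields source that does not compile...
naive_level3 = 'src2 = """' + level2 + '"""'

# ...and the generator has to fall back to escaping.
escaped_level3 = "src2 = " + repr(level2)

print(compiles(level2))          # True
print(compiles(naive_level3))    # False
print(compiles(escaped_level3))  # True
```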



RUNAWAY LITERALS

`"""` does not offer much help with preventing or diagnosing runaway literals or highlighting code with half-written literals. Heredocs don't either, but I envision heredocs being used less often than `"""` strings would be, since continuation quotes would handle shorter strings.



SYNTAX HIGHLIGHTING

So, let's talk about this:

>>  (like Python's """ strings) which trick some syntax
>>  highlighters into working some of the time with some contents, we don't think
>>  this occasional, accidental compatibility is a big enough gain to justify
>>  changing the design.
> 
> I've never seen a syntax highlighter have problems with it, I don't see
> how it *could* ever cause a problem, and lastly I think it's both naïve
> and presumptuous to call these effects accidental.

I call these effects "accidental" because the syntax highlighter was not designed to handle the `"""`; it just happens to handle it correctly because it misinterprets a `"""` string as an empty `"` string, followed by a non-empty `"` string, followed by another empty `"` string. It's "accidental" from the perspective of the syntax highlighter designer, not the language designer, who probably intended that to happen.

And it only works in a specific subset of cases. It breaks if:

	* The syntax highlighter tries to apply smarter per-language rules.
	* The syntax highlighter assumes that strings are not allowed to be multi-line. (This is true of many languages, including C derivatives and Swift 2.)
	* The string literal contains any `"` characters, which `"""` is often used in order to permit.
	* The string literal contains any escapes or special features that the syntax highlighter misinterprets, like an interpolation which itself contains a string literal.

Yes, it will often work, or at least sort-of work. But I just don't see that as very valuable.
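
To illustrate the mechanism, here is a deliberately naive string rule sketched in Python. The regex is a hypothetical stand-in for a highlighter that only knows about `"` strings; it is not taken from any real highlighter.

```python
import re

# Hypothetical rule: a string is a quote, any run of non-quote characters
# (newlines included), and a closing quote. It knows nothing about triple quotes.
NAIVE_STRING = re.compile(r'"[^"]*"')

def naive_string_spans(source):
    """Return the pieces of `source` the naive rule would color as strings."""
    return NAIVE_STRING.findall(source)

def seen_as_code(source):
    """Return what is left once the naive 'strings' are removed."""
    return NAIVE_STRING.sub("", source)

plain = 'x = """hello"""'
quoted = 'x = """say "hi" now"""'

print(naive_string_spans(plain))   # ['""', '"hello"', '""']
print(seen_as_code(plain))         # 'x = '
print(naive_string_spans(quoted))  # ['""', '"say "', '" now"', '""']
print(seen_as_code(quoted))        # 'x = hi'
```

In the first case the three pieces happen to cover the whole literal, so the coloring comes out right. In the second, `hi` falls between two of the pieces and would be colored as code.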



WHAT'S GOOD ABOUT `"""`?

In my opinion, the best thing about `"""` (the language feature) is `"""` (the token).

A sequence of three quote marks is a fantastic token for a feature meant to create long string literals. It clearly has something to do with string literals, but it cannot be an empty string, because there are too many quote marks—that is, it's too long. It's a really clever mnemonic which also parses unambiguously.

I've spoken before in this thread and others about potentially using `"""` as an alternate delimiter (which could be extended to `"""""` and beyond). I'm also considering the idea that it might be a good token for a Perl-style heredoc syntax:

	print(""" + e""")
	It was a dark and stormy \(timeOfDay) when 
	"""
	the Swift core team invented the \(interpolation) syntax.
	"""

Nesting could be achieved with a version of whatever alternate delimiter syntax we use for `"` strings. For instance, if we adopted the `_"foo"_` syntax I sketched:

	print(_"""_)
	response.send(""")
	\(escapedContent)
	"""
	_"""_



(P.S. If this post seems way too long to have been written in a couple of hours, that's because I've been drafting a version of it on and off for a day or two; it just so happened that Dave directly asked me to confront `"""` today.)

-- 
Brent Royal-Gordon
Architechies


