[swift-evolution] multi-line string literals.

Fri May 6 00:52:02 CDT 2016

> As far as mixed whitespace, I think the only sane thing to do would be to only allow leading tabs *or* spaces.  Mixing tabs and spaces in the leading whitespace would be a syntax error.  All lines in the string would need to use tabs or all lines use spaces, you could not have one line with tabs and another with spaces.  This would keep the compiler out of the business of making any assumptions or guesses, would not be a problem often, and would be very easy to fix if it ever happens accidentally.

The sane thing to do would be to require every line be prefixed with *exactly* the same sequence of characters as the closing delimiter line. Anything else (except perhaps a completely blank line, to permit whitespace trimming) would be a syntax error.
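Here is a minimal Swift sketch that contrasts the two rules as checks over the raw source lines of a literal. Neither rule is a real Swift feature. The names `usesOnlyTabsOrOnlySpaces` and `stripClosingIndent`, and the choice to pass in the closing delimiter's indentation as a plain string, are assumptions made only for illustration. Escapes and interpolation are ignored.

```swift
// Sketch only: both rules come from this thread and are not Swift features.
// All names here are hypothetical.

/// Quoted rule: leading whitespace may be tabs or spaces, but the two may never
/// be mixed, either within one line or across lines of the same literal.
func usesOnlyTabsOrOnlySpaces(_ lines: [String]) -> Bool {
    var sawTab = false
    var sawSpace = false
    for line in lines {
        // Only look at the run of whitespace at the start of the line.
        for ch in line.prefix(while: { $0 == " " || $0 == "\t" }) {
            if ch == "\t" { sawTab = true } else { sawSpace = true }
        }
    }
    return !(sawTab && sawSpace)
}

/// Exact-prefix rule: every content line must start with exactly the characters
/// that precede the closing delimiter. A completely empty line is accepted so that
/// editors which trim trailing whitespace don't break the literal.
/// Returns the lines with that prefix removed, or nil if any line violates the rule.
func stripClosingIndent(_ lines: [String], closingIndent: String) -> [String]? {
    var stripped: [String] = []
    for line in lines {
        if line.isEmpty {
            stripped.append("")
        } else if line.hasPrefix(closingIndent) {
            stripped.append(String(line.dropFirst(closingIndent.count)))
        } else {
            return nil // no guessing about tab widths: a mismatch is simply an error
        }
    }
    return stripped
}

// Example: the closing delimiter is indented with four spaces.
let xmlLines = ["    <a>", "      <b/>", "", "    </a>"]
print(stripClosingIndent(xmlLines, closingIndent: "    ") ?? [])
// prints ["<a>", "  <b/>", "", "</a>"]
```

The exact-prefix check never needs to know how wide a tab is. A line either starts with the closing line's characters or it is an error.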

But take a moment to consider the downsides before you leap to adopt this solution.

1. You have introduced tab-space confusion into the equation.

2. You have introduced trailing-newline confusion into the equation.

3. The #escaped and #marginStripped keywords are now specific to multiline strings, but #escaped in particular would also be attractive in single-line strings for tasks like regexes. You will have to invent a different syntax for it there.

4. This form of `"""` can't be used to avoid escaping `"` in a single-line string; you now have to invent a separate mechanism for that.

5. You can't necessarily look at a line and tell whether it's code or string. And—especially with the #escaped-style constructs—the delimiters don't necessarily "pop" visually; they're too small and easy to miss compared to the text they contain. In extremis, you actually have to look at the entire file from top to bottom, counting the `"""`s to figure out whether you're in a string or not. Granted, you *usually* can tell from context, but it's a far cry from what continuation quotes offer.

6. You are now forcing *any* string literal of more than one line to include two extra lines devoted wholly to the quoting syntax. In my Swift-generating example, that would change shorter snippets like this:

        code +=      "    
                     "    static var messages: [HTTPStatus: String] = [
                     ""

Into things like this:

        code +=      """

                         static var messages: [HTTPStatus: String] = [

                     """

To my mind, the second syntax is actually *heavier*, despite not requiring every line to be marked, because it takes two extra lines and additional punctuation.

7. You are also introducing visual ambiguity into the equation—in the above example, the left margin is now ambiguous to the eye (even if it's not ambiguous to the compiler). You could recover it by permitting non-whitespace prefix characters:

        code +=      """
                    |    
                    |    static var messages: [HTTPStatus: String] = [
                    |
                    |"""

...but then we're back to annotating every line, *plus* we have the leading and trailing `"""` lines. Worst of both worlds.

8. In longer examples, you are dividing the expression in half in a way that makes it difficult to read. For instance, consider this code:

        socket.send( 
            """ #escaped #marginStripped 
            <?xml version="1.0"?>
            <catalog>
               <book id="bk101" empty="">
                   <author>\(author)</author>
                   <title>XML Developer's Guide</title>
                   <genre>Computer</genre>
                   <price>44.95</price>
                   <publish_date>2000-10-01</publish_date>
                   <description>An in-depth look at creating applications with XML.</description>
               </book>
            </catalog>
            """.data(using: NSUTF8StringEncoding))

The effect—particularly with even larger literals than this—is not unlike pausing in the middle of reading an article to watch a movie. What were we talking about again?

This problem is neatly avoided by a heredoc syntax, which keeps the expression together and then collects the string below it:

        socket.send(""".data(using: NSUTF8StringEncoding))
            <?xml version="1.0"?>
            <catalog>
               <book id="bk101" empty="">
                   <author>\(author)</author>
                   <title>XML Developer's Guide</title>
                   <genre>Computer</genre>
                   <price>44.95</price>
                   <publish_date>2000-10-01</publish_date>
                   <description>An in-depth look at creating applications with XML.</description>
               </book>
            </catalog>
            """

(I'm assuming there's no need for #escaped or #marginStripped; they're both enabled by default.)
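A heredoc leaves the statement on its own line and pulls the literal's text from the lines beneath it. The sketch below assumes the semantics of the example above: the body is every line after the opener, up to a line containing only an optionally indented `"""`. `collectHeredoc` is a hypothetical name, and margin stripping and escapes are deliberately left out.

```swift
// Sketch only: heredoc semantics are assumed, not an existing Swift feature.

/// Collects the body of a heredoc whose opening `"""` appears on line `openerIndex`.
/// The statement on that line is left intact; the body is every following line up to,
/// but not including, the first line that holds only an (optionally indented) `"""`.
/// Returns nil if no terminator line is found.
func collectHeredoc(_ source: [String], openerIndex: Int) -> (body: [String], terminatorIndex: Int)? {
    var body: [String] = []
    var index = openerIndex + 1
    while index < source.count {
        let line = source[index]
        // Ignore leading whitespace when looking for the terminator.
        if line.drop(while: { $0 == " " || $0 == "\t" }) == "\"\"\"" {
            return (body, index)
        }
        body.append(line)
        index += 1
    }
    return nil // unterminated heredoc
}

let snippet = [
    "socket.send(\"\"\".data(using: NSUTF8StringEncoding))",
    "    <catalog>",
    "    </catalog>",
    "    \"\"\"",
]
if let heredoc = collectHeredoc(snippet, openerIndex: 0) {
    print(heredoc.body.joined(separator: "\n"))
}
```

The opener line is never split, which is the H1 property: the reader sees the whole `socket.send(...)` expression before any of the text.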

* * *

Let's actually talk about heredocs. Leaving aside indentation (which can be applied to either feature) and the traditional token choices (which can be changed), I think these are the pros of heredocs compared to Python triple-quotes:

H1: Doesn't break up expressions, as discussed above.
H2: Literal content formatting is completely unaffected by code formatting, including the first and last lines.

Here are the pros of Python triple-quotes compared to heredocs:

P1: Simpler to explain: "like a string literal, but really big".
P2: Lighter syntactic weight, enough to make `"""` usable as a single-line syntax.
P3: Less trailing-newline confusion.

(There is one other difference: `"""` is simpler to parse, so we might be able to get it in Swift 3, whereas heredocs probably have to wait for Swift 4. But I don't think we should pick one feature over another merely so we can get it sooner. If you plan to eventually introduce both features, as I plan to eventually have both continuation quotes and heredocs, it makes sense to ship each one as soon as you can; it's another thing entirely to choose one feature over the other specifically because you can implement it sooner.)

But the design you're discussing trades away P2 and P3 (and frankly, with the mandatory newlines, part of P1) in an attempt to get H2. So we end up deciding between these two selling points:

* This triple-quotes design: Simpler to explain.
* Heredocs: Doesn't break up expressions.

Simplicity is good, but I really like the code reading benefits of heredocs. Your code is your code and your text is your text. The interface between them is a bit funky, but within their separate worlds, they're both pretty nice.

* * *

Either way, heredocs or multiline-only triple quotes could be tweaked to support indentation by using the indentation of the end delimiter. But as I explained above, I don't think that's a great idea for either triple quotes *or* heredocs—the edge of the indentation is not visually well defined enough.

That's why I came to the conclusion that trying to cram every multiline literal into one syntax is trying to cram too many peg shapes into one hole shape. Indentation should *only* be supported by a dedicated syntax which is also designed for the smallest multiline strings, where indentation support is most useful. A separate feature without indentation support should handle longer strings, where the length alone is so disruptive to the flow of your code that there's just no point even trying to indent them to match (and the break with normal indentation itself assists you in finding the end of the string).

And I think that the best choice for the first feature is continuation quotes, and for the second is heredocs. Triple-quote syntaxes—either Python's or this modification—are jacks of all trades, but masters of none.
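For comparison, here is a rough sketch of how continuation quotes might be read, assuming the semantics used in point 6: the first line contributes the text after its opening `"`, each later line must begin with optional whitespace followed by `"`, the last line ends with a closing `"`, and every line break inside the literal becomes `"\n"`. `continuationLiteral` is a hypothetical name; escapes, interpolation, and single-line literals are not handled.

```swift
// Sketch only: continuation-quote semantics are assumed, not an existing Swift feature.
// Escapes, interpolation, and single-line literals are not handled.

/// `firstFragment` is the text after the opening `"` on the literal's first line.
/// Each later line must start with optional whitespace and a `"`; the last one must
/// also end with the closing `"`. Every line break inside the literal becomes "\n".
/// Returns nil if a line breaks those rules.
func continuationLiteral(firstFragment: String, continuationLines: [String]) -> String? {
    guard !continuationLines.isEmpty else { return nil }
    var pieces = [firstFragment]
    for (offset, line) in continuationLines.enumerated() {
        let afterMargin = line.drop(while: { $0 == " " || $0 == "\t" })
        guard afterMargin.first == "\"" else { return nil } // missing continuation quote
        var text = afterMargin.dropFirst()
        if offset == continuationLines.count - 1 {
            guard text.last == "\"" else { return nil } // missing closing quote
            text = text.dropLast()
        }
        pieces.append(String(text))
    }
    return pieces.joined(separator: "\n")
}

// The point 6 snippet: `code +=      "    ` followed by two continuation lines.
let literal = continuationLiteral(firstFragment: "    ", continuationLines: [
    "             \"    static var messages: [HTTPStatus: String] = [",
    "             \"\"",
])
print(literal ?? "<invalid>")
```

With the point 6 snippet as input, the result is four spaces, a newline, the `static var` line, and a trailing newline supplied by the final `""` line. The leading `"` on each line is what marks the margin, so no column counting is needed to find it.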

-- 
Brent Royal-Gordon
Architechies
