[swift-evolution] multi-line string literals.

Brent Royal-Gordon brent at architechies.com
Thu Apr 28 16:56:32 CDT 2016


> Awesome.  Some specific suggestions below, but feel free to iterate in a pull request if you prefer that.

I've adopted these suggestions in some form, though I also ended up rewriting the explanation of why the feature was designed as it is and fusing it with material from "Alternatives considered".

(Still not sure who I should list as a co-author. I'm currently thinking John, Tyler, and maybe Chris? Who's supposed to go there?)

Multiline string literals

Proposal: SE-NNNN <https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
Author(s): Brent Royal-Gordon <https://github.com/brentdax>
Status: Second Draft
Review manager: TBD
 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#introduction>Introduction

In Swift 2.2, the only means to insert a newline into a string literal is the \n escape. String literals specified in this way are generally ugly and unreadable. We propose a multiline string feature inspired by English punctuation which is a straightforward extension of our existing string literals.

This proposal is one step in a larger plan to improve how string literals address various challenging use cases. It is not meant to solve all problems with escaping, nor to serve all use cases involving very long string literals. See the "Future directions for string literals in general" section for a sketch of the problems we ultimately want to address and some ideas of how we might do so.

Swift-evolution threads: multi-line string literals. (April) <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160418/015500.html>, multi-line string literals (December) <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002349.html>
 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#draft-notes>Draft Notes

Removes the comment feature, which was felt to be an unnecessary complication. This and the backslash feature have been listed as future directions. 

Loosens the specification of diagnostics, suggesting instead of requiring fix-its.

Splits a "Rationale" section out of the "Proposed solution" section.

Adds extensive discussion of other features which wold combine with this one.

I've listed only myself as an author because I don't want to put anyone else's name to a document they haven't seen, but there are others who deserve to be listed (John Holdsworth at least). Let me know if you think you should be included.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#motivation>Motivation

As Swift begins to move into roles beyond app development, code which needs to generate text becomes a more important use case. Consider, for instance, generating even a small XML string:

let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
The string is practically unreadable, its structure drowned in escapes and run-together lines; it looks like little more than line noise. We can improve its readability somewhat by concatenating separate strings for each line and using real tabs instead of \t escapes:

let xml = "<?xml version=\"1.0\"?>\n" + 
          "<catalog>\n" + 
          " <book id=\"bk101\" empty=\"\">\n" + 
          "     <author>\(author)</author>\n" + 
          " </book>\n" + 
          "</catalog>"
However, this creates a more complex expression for the type checker, and there's still far more punctuation than ought to be necessary. If the most important goal of Swift is making code readable, this kind of code falls far short of that goal.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#proposed-solution>Proposed solution

We propose that, when Swift is parsing a string literal, if it reaches the end of the line without encountering an end quote, it should look at the next line. If it sees a quote at the beginning (a "continuation quote"), the string literal contains a newline and then continues on that line. Otherwise, the string literal is unterminated and syntactically invalid.

Our sample above could thus be written as:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          "     <author>\(author)</author>
          " </book>
          "</catalog>"
If the second or subsequent lines had not begun with a quotation mark, or the trailing quotation mark after the </catalog>tag had not been included, Swift would have emitted an error.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#rationale>Rationale

This design is rather unusual, and it's worth pausing a moment to explain why it has been chosen.

The traditional design for this feature, seen in languages like Perl and Python, simply places one delimiter at the beginning of the literal and another at the end. Individual lines in the literal are not marked in any way. 

We think continuation quotes offer several important advantages over the traditional design:

They help the compiler pinpoint errors in string literal delimiting. Traditional multiline strings have a serious weakness: if you forget the closing quote, the compiler has no idea where you wanted the literal to end. It simply continues on until the compiler encounters another quote (or the end of the file). If you're lucky, the text after that quote is not valid code, and the resulting error will at least point you to the next string literal in the file. If you're unlucky, you'll get a seemingly unrelated error several literals later, an unbalanced brace error at the end of the file, or perhaps even code that compiles but does something totally wrong.

(This is not a minor concern. Many popular languages, including C and Swift 2, specifically reject newlines in string literals to prevent this from happening.)

Continuation quotes provide the compiler with redundant information about your intent. If you forget a closing quote, the continuation quotes give the compiler a very good idea of where you meant to put it. The compiler can point you to (or at least very near) the end of the literal, where you want to insert the quote, rather than showing you the beginning of the literal or even some unrelated error later in the file that was caused by the missing quote.

Temporarily unclosed literals don't make editors go haywire. The syntax highlighter has the same trouble parsing half-written, unclosed traditional quotes that the compiler does: It can't tell where the literal is supposed to end and the code should begin. It must either apply heuristics to try to guess where the literal ends, or incorrectly color everything between the opening quote and the next closing quote as a string literal. This can cause the file's coloring to alternate distractingly between "string literal" and "running code".

Continuation quotes give the syntax highlighter enough context to guess at the correct coloration, even when the string isn't complete yet. Lines with a continuation quote are literals; lines without are code. At worst, the syntax highlighter might incorrectly color a few characters at the end of a line, rather than the remainder of the file.

They separate indentation from the string's contents. Traditional multiline strings usually include all of the content between the start and end delimiters, including leading whitespace. This means that it's usually impossible to indent a multiline string, so including one breaks up the flow of the surrounding code, making it less readable. Some languages apply heuristics or mode switches to try to remove indentation, but like all heuristics, these are mistake-prone and murky.

Continuation quotes neatly avoid this problem. Whitespace before the continuation quote is indentation used to format the source code; whitespace after the continuation quote is part of the string literal. The interpretation of the code is perfectly clear to both compiler and programmer.

They improve the ability to quickly recognize the literal. Traditional multiline strings don't provide much visual help. To find the end, you must visually scan until you find the matching delimiter, which may be only one or a few characters long. When looking at a random line of source, it can be hard to tell at a glance whether it's code or literal. Syntax highlighting can help with these issues, but it's often unreliable, especially with advanced, idiosyncratic string literal features like multiline strings.

Continuation quotes solve these problems. To find the end of the literal, just scan down the column of continuation characters until they end. To figure out if a given line of source is part of a literal, just see if it starts with a quote mark. The meaning of the source becomes obvious at a glance.

Nevertheless, the traditional design does has a few advantages:

It is simpler. Although continuation quotes are more complex, we believe that the advantages listed above pay for that complexity.

There is no need to edit the intervening lines to add continuation quotes. While the additional effort required to insert continuation quotes is an important downside, we believe that tool support, including both compiler fix-its and perhaps editor support for commands like "Paste as String Literal", can address this issue. In some editors, new features aren't even necessary; TextMate, for instance, lets you insert a character on several lines simultaneously. And new tool features could also address other issues like escaping embedded quotes.

Naïve syntax highlighters may have trouble understanding this syntax. This is true, but naïve syntax highlighters generally have terrible trouble with advanced string literal constructs; some struggle with even basic ones. While there are some designs (like Python's """ strings) which trick some syntax highlighters into working some of the time with some contents, we don't think this occasional, accidental compatibility is a big enough gain to justify changing the design.

It looks funny—quotes should always be in matched pairs. We aren't aware of another programming language which uses unbalanced quotes in string literals, but there is one very important precedent for this kind of formatting: natural languages. English, for instance, uses a very similar format for quoting multiple lines of dialog by the same speaker. As an English Stack Exchange answer illustrates <http://english.stackexchange.com/a/96613/64636>:

“That seems like an odd way to use punctuation,” Tom said. “What harm would there be in using quotation marks at the end of every paragraph?”

“Oh, that’s not all that complicated,” J.R. answered. “If you closed quotes at the end of every paragraph, then you would need to reidentify the speaker with every subsequent paragraph.

“Say a narrative was describing two or three people engaged in a lengthy conversation. If you closed the quotation marks in the previous paragraph, then a reader wouldn’t be able to easily tell if the previous speaker was extending his point, or if someone else in the room had picked up the conversation. By leaving the previous paragraph’s quote unclosed, the reader knows that the previous speaker is still the one talking.”

“Oh, that makes sense. Thanks!”
In English, omitting the ending quotation mark tells the text's reader that the quote continues on the next line, while including a quotation mark at the beginning of the next line reminds the reader that they're in the middle of a quote.

Similarly, in this proposal, omitting the ending quotation mark tells the code's reader (and compiler) that the string literal continues on the next line, while including a quotation mark at the beginning of the next line reminds the reader (and compiler) that they're in the middle of a string literal.

On balance, we think continuation quotes are the best design for this problem.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#detailed-design>Detailed design

When Swift is parsing a string literal and reaches the end of a line without finding a closing quote, it examines the next line, applying the following rules:

If the next line begins with whitespace followed by a continuation quote, then the string literal contains a newline followed by the contents of the string literal starting on that line. (This line may itself have no closing quote, in which case the same rules apply to the line which follows.)

If the next line contains anything else, Swift raises a syntax error for an unterminated string literal. 

The exact error messages and diagnostics provided are left to the implementers to determine, but we believe it should be possible to provide two fix-its which will help users learn the syntax and correct string literal mistakes:

Insert " at the end of the current line to terminate the quote.

Insert " at the beginning of the next line (with some indentation heuristics) to continue the quote on the next line.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#impact-on-existing-code>Impact on existing code

Failing to close a string literal before the end of the line is currently a syntax error, so no valid Swift code should be affected by this change.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#future-directions-for-multiline-string-literals>Future directions for multiline string literals

We could permit comments before encountering a continuation quote to be counted as whitespace, and permit empty lines in the middle of string literals. This would allow you to comment out whole lines in the literal.

We could allow you to put a trailing backslash on a line to indicate that the newline isn't "real" and should be omitted from the literal's contents.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#future-directions-for-string-literals-in-general>Future directions for string literals in general

There are other issues with Swift's string handling which this proposal intentionally does not address:

Reducing the amount of double-backslashing needed when working with regular expression libraries, Windows paths, source code generation, and other tasks where backslashes are part of the data.

Alternate delimiters or other strategies for writing strings with " characters in them.

Accommodating code formatting concerns like hard wrapping and commenting.

String literals consisting of very long pieces of text which are best represented completely verbatim, with minimal alteration.

This section briefly outlines some future proposals which might address these issues. Combined, we believe they would address most of the string literal use cases which Swift is currently not very good at.

Please note that these are simply sketches of hypothetical future designs; they may radically change before proposal, and some may never be proposed at all. Many, perhaps most, will not be proposed for Swift 3. We are sketching these designs not to propose and refine these features immediately, but merely to show how we think they might be solved in ways which complement this proposal.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#string-literal-modifiers>String literal modifiers

A string literal modifier is a cluster of identifier characters which goes before a string literal and adjusts the way it is parsed. Modifers only alter the interpretation of the text in the literal, not the type of data it produces; for instance, there will never be something like the UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a feature; lowercase characters disable a feature.

Modifiers can be attached to both single-line and multiline literals, and could also be attached to other literal syntaxes which might be introduced in the future. When used with multiline strings, only the starting quote needs to carry the modifiers, not the continuation quotes.

Modifiers are an extremely flexible feature which can be used for many proposes. Of the ideas listed below, we believe the e modifier is an urgent addition which should be included in Swift 3 if at all possible; the others are less urgent and most of them could be deferred, or at least added later if time allows.

Escape disabling: e"\\\" (string with three backslash characters)

Fine-grained escape disabling: i"\(foo)\n" (the string \(foo) followed by a newline); eI"\(foo)\n" (the contents of foo followed by the string \n), b"\w+\n" (the string \w+ followed by a newline)

Alternate delimiters: _ has no lowercase form, so it could be used to allow strings with internal quotes: _"print("Hello, world!")"_, __"print("Hello, world!")"__, etc.

Whitespace normalization: changes all runs of whitespace in the literal to single space characters; this would allow you to use multiline strings purely to improve code formatting.

alert.informativeText =
    W"\(appName) could not typeset the element “\(title)” because 
     "it includes a link to an element that has been removed from this 
     "book."
Localization: 

alert.informativeText =
    LW"\(appName) could not typeset the element “\(title)” because 
      "it includes a link to an element that has been removed from this 
      "book."
Comments: Embedding comments in string literals might be useful for literals containing regular expressions or other code.

Eventually, user-specified string modifiers could be added to Swift, perhaps as part of a hygienic macro system. It might also become possible to change the default modifiers applied to literals in a particular file or scope.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#heredocs-or-other-verbatim-string-literal-features>Heredocs or other "verbatim string literal" features

Sometimes it really is best to just splat something else down in the middle of a file full of Swift source code. Maybe the file is essentially a template and the literals are a majority of the code's contents, or maybe you're writing a code generator and just want to get string data into it with minimal fuss, or maybe people unfamiliar with Swift need to be able to edit the literals. Whatever the reason, the normal string literal syntax is just too burdensome.

One approach to this problem is heredocs. A heredoc allows you to put a placeholder for a literal on one line; the contents of the literal begin on the next line, running up to some delimiter. It would be possible to put multiple placeholders in a single line, and to apply string modifiers to them.

In Swift, this might look like:

print(#to("---") + e#to("END"))
It was a dark and stormy \(timeOfDay) when 
---
the Swift core team invented the \(interpolation) syntax.
END
Another possible approach would be to support traditional multiline string literals bounded by a different delimiter, like """. This might look like:

print("""
It was a dark and stormy \(timeOfDay) when 
""" + e"""
the Swift core team invented the \(interpolation) syntax.
""")
Although heredocs could make a good addition to Swift eventually, there are good reasons to defer them for now. Please see the "Alternatives considered" section for details.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#first-class-regular-expressions>First-class regular expressions

Members of the core team are interested in regular expressions, but they don't want to just build a literal that wraps PCRE or libicu; rather, they aim to integrate regexes into the pattern matching system and give them a deep, Perl 6-style rethink. This would be a major effort, far beyond the scope of Swift 3.

In the meantime, the e modifier and perhaps other string literal modifiers will make it easier to specify regular expressions in string literals for use with NSRegularExpression and other libraries accessible from Swift.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#alternatives-considered>Alternatives considered

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#requiring-no-continuation-character>Requiring no continuation character

The main alternative is to not require a continuation quote, and simply extend the string literal from the starting quote to the ending quote, including all newlines between them. For example:

let xml = "<?xml version=\"1.0\"?>
<catalog>
    <book id=\"bk101\" empty=\"\">
        <author>\(author)</author>
    </book>
</catalog>"
This alternative is extensively discussed in the "Rationale" section above.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#skip-multiline-strings-and-just-support-heredocs>Skip multiline strings and just support heredocs

There are definitely cases where a heredoc would be a better solution, such as generated code or code which is mostly literals with a little Swift sprinkled around. On the other hand, there are also cases where multiline strings are better: short strings in code which is meant to be read. If a single feature can't handle them both well, there's no shame in supporting the two features separately.

It makes sense to support multiline strings first because:

They extend existing syntax instead of introducing new syntax.

They are much easier to parse; heredocs require some kind of mode in the parser which kicks in at the start of the next line, whereas multiline string literals can be handled in the lexer.

As discussed in "Rationale", they offer better diagnostics, code formatting, and visual scannability.

 <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#use-a-different-delimiter-for-multiline-strings>Use a different delimiter for multiline strings

The initial suggestion was that multiline strings should use a different delimiter, """, at the beginning and end of the string, with no continuation characters between. Like heredocs, this might be a good alternative for certain use cases, but it has the same basic flaws as the "no continuation character" solution.

-- 
Brent Royal-Gordon
Architechies

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20160428/833f01c5/attachment.html>


More information about the swift-evolution mailing list