[swift-evolution] multi-line string literals.

Vladimir.S svabox at gmail.com
Fri Apr 29 00:48:25 CDT 2016


On 29.04.2016 1:31, Michael Peternell via swift-evolution wrote:
> is it just me who would prefer a multiline string literal to not require
> a \backslash before each "double quote"?

You are not alone ;-)
But as I understand it, the proposal does not even try to solve the problem
of *as-is* text in sources; it only fights the \n"+ at the end of each line
of the string. That is all that is proposed. I don't feel it is a valuable
improvement, but I am OK with having such a feature in the language.

IMO we need just 2 variants: the current method, where we can use all the
escaped chars, interpolation, \n and closing quotes, and additionally a
feature to paste text *as-is*, without escapes and interpolation.
For example:

let xml = "\
|<?xml version="1.0"?>
|<catalog>
|    <book id="myid" empty="">
|        <author>myAuthor</author>
|        <title>myTitle \tutorial 1\(edition 2)</title>
|    </book>
|</catalog>
"

or

let xml = _"
"<?xml version="1.0"?>
"<catalog>
"    <book id="myid" empty="">
"        <author>myAuthor</author>
"        <title>myTitle \tutorial 1\(edition 2)</title>
"    </book>
"</catalog>

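To make the *as-is* idea a bit more concrete, here is a rough sketch of how the body of the first variant could be decoded. It is only an illustration under my own assumptions: the name decodeMarginLiteral is made up, and the rule "skip indentation, drop the first |, keep the rest verbatim" is just one possible reading of the example above.

```swift
// Hypothetical helper, not part of any proposal: decodes the body of a
// "|"-margin literal. Everything after the first "|" on a line is kept
// verbatim, so no escapes and no interpolation are processed.
func decodeMarginLiteral(_ body: String) -> String {
    var lines: [String] = []
    for line in body.split(separator: "\n", omittingEmptySubsequences: false) {
        // Indentation in front of the margin character belongs to the source.
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        // Lines without a margin character are not part of the literal.
        guard trimmed.first == "|" else { continue }
        lines.append(String(trimmed.dropFirst()))
    }
    return lines.joined(separator: "\n")
}

// The input is written with today's escapes only so that it compiles.
let body = "    |<title>myTitle \\tutorial 1\\(edition 2)</title>\n    |</book>"
let decoded = decodeMarginLiteral(body)
```

The backslashes in the decoded text stay ordinary characters, which is the whole point of the *as-is* variant.
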

>
> Did you ever really use multiline string literals before? I did, and
> it's mostly for quick hacks where I wrote a script or tried something
> out quickly. And maybe I needed to put an XML snippet into a unit test
> case to see if my parser correctly parses or correctly rejects the
> snippet. The current proposal doesn't help this use case in any way. I
> cannot see which use case inspires multiline string literals which
> require double quotes to be escaped... I wouldn't use them if they were
> available. I'd become an Android developer instead ;)
>
> -Michael
>
>> On 28.04.2016 at 23:56, Brent Royal-Gordon via swift-evolution
>> <swift-evolution at swift.org> wrote:
>>
>>> Awesome.  Some specific suggestions below, but feel free to iterate
>>> in a pull request if you prefer that.
>>
>> I've adopted these suggestions in some form, though I also ended up
>> rewriting the explanation of why the feature was designed as it is and
>> fusing it with material from "Alternatives considered".
>>
>> (Still not sure who I should list as a co-author. I'm currently
>> thinking John, Tyler, and maybe Chris? Who's supposed to go there?)
>>
>> Multiline string literals
>>
>> • Proposal: SE-NNNN
>> • Author(s): Brent Royal-Gordon
>> • Status: Second Draft
>> • Review manager: TBD
>>
>> Introduction
>>
>> In Swift 2.2, the only means to insert a newline into a string literal
>> is the \n escape. String literals specified in this way are generally
>> ugly and unreadable. We propose a multiline string feature inspired by
>> English punctuation which is a straightforward extension of our
>> existing string literals.
>>
>> This proposal is one step in a larger plan to improve how string
>> literals address various challenging use cases. It is not meant to
>> solve all problems with escaping, nor to serve all use cases involving
>> very long string literals. See the "Future directions for string
>> literals in general" section for a sketch of the problems we
>> ultimately want to address and some ideas of how we might do so.
>>
>> Swift-evolution threads: multi-line string literals. (April),
>> multi-line string literals (December)
>>
>> Draft Notes
>>
>> • Removes the comment feature, which was felt to be an unnecessary
>> complication. This and the backslash feature have been listed as
>> future directions.
>>
>> • Loosens the specification of diagnostics, suggesting instead of
>> requiring fix-its.
>>
>> • Splits a "Rationale" section out of the "Proposed solution"
>> section.
>>
>> • Adds extensive discussion of other features which would combine with
>> this one.
>>
>> • I've listed only myself as an author because I don't want to put
>> anyone else's name to a document they haven't seen, but there are
>> others who deserve to be listed (John Holdsworth at least). Let me
>> know if you think you should be included.
>>
>> Motivation
>>
>> As Swift begins to move into roles beyond app development, code which
>> needs to generate text becomes a more important use case. Consider,
>> for instance, generating even a small XML string:
>>
>> let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
>>
>> The string is practically unreadable, its structure drowned in escapes
>> and run-together lines; it looks like little more than line noise. We
>> can improve its readability somewhat by concatenating separate strings
>> for each line and using real tabs instead of \t escapes:
>>
>> let xml = "<?xml version=\"1.0\"?>\n" +
>>           "<catalog>\n" +
>>           " <book id=\"bk101\" empty=\"\">\n" +
>>           "     <author>\(author)</author>\n" +
>>           " </book>\n" +
>>           "</catalog>"
>>
>> However, this creates a more complex expression for the type checker,
>> and there's still far more punctuation than ought to be necessary. If
>> the most important goal of Swift is making code readable, this kind of
>> code falls far short of that goal.
>>
>> Proposed solution
>>
>> We propose that, when Swift is parsing a string literal, if it reaches
>> the end of the line without encountering an end quote, it should look
>> at the next line. If it sees a quote at the beginning (a "continuation
>> quote"), the string literal contains a newline and then continues on
>> that line. Otherwise, the string literal is unterminated and
>> syntactically invalid.
>>
>> Our sample above could thus be written as:
>>
>> let xml = "<?xml version=\"1.0\"?>
>>           "<catalog>
>>           " <book id=\"bk101\" empty=\"\">
>>           "     <author>\(author)</author>
>>           " </book>
>>           "</catalog>"
>>
>> If the second or subsequent lines had not begun with a quotation mark,
>> or the trailing quotation mark after the </catalog> tag had not been
>> included, Swift would have emitted an error.
>>
>> Rationale
>>
>> This design is rather unusual, and it's worth pausing a moment to
>> explain why it has been chosen.
>>
>> The traditional design for this feature, seen in languages like Perl
>> and Python, simply places one delimiter at the beginning of the
>> literal and another at the end. Individual lines in the literal are
>> not marked in any way.
>>
>> We think continuation quotes offer several important advantages over
>> the traditional design:
>>
>> • They help the compiler pinpoint errors in string literal delimiting.
>> Traditional multiline strings have a serious weakness: if you forget
>> the closing quote, the compiler has no idea where you wanted the
>> literal to end. It simply continues on until the compiler encounters
>> another quote (or the end of the file). If you're lucky, the text
>> after that quote is not valid code, and the resulting error will at
>> least point you to the next string literal in the file. If you're
>> unlucky, you'll get a seemingly unrelated error several literals
>> later, an unbalanced brace error at the end of the file, or perhaps
>> even code that compiles but does something totally wrong.
>>
>> (This is not a minor concern. Many popular languages, including C and
>> Swift 2, specifically reject newlines in string literals to prevent
>> this from happening.)
>>
>> Continuation quotes provide the compiler with redundant information
>> about your intent. If you forget a closing quote, the continuation
>> quotes give the compiler a very good idea of where you meant to put
>> it. The compiler can point you to (or at least very near) the end of
>> the literal, where you want to insert the quote, rather than showing
>> you the beginning of the literal or even some unrelated error later in
>> the file that was caused by the missing quote.
>>
>> • Temporarily unclosed literals don't make editors go haywire. The
>> syntax highlighter has the same trouble parsing half-written, unclosed
>> traditional quotes that the compiler does: It can't tell where the
>> literal is supposed to end and the code should begin. It must either
>> apply heuristics to try to guess where the literal ends, or
>> incorrectly color everything between the opening quote and the next
>> closing quote as a string literal. This can cause the file's coloring
>> to alternate distractingly between "string literal" and "running
>> code".
>>
>> Continuation quotes give the syntax highlighter enough context to
>> guess at the correct coloration, even when the string isn't complete
>> yet. Lines with a continuation quote are literals; lines without are
>> code. At worst, the syntax highlighter might incorrectly color a few
>> characters at the end of a line, rather than the remainder of the
>> file.
>>
>> • They separate indentation from the string's contents. Traditional
>> multiline strings usually include all of the content between the start
>> and end delimiters, including leading whitespace. This means that it's
>> usually impossible to indent a multiline string, so including one
>> breaks up the flow of the surrounding code, making it less readable.
>> Some languages apply heuristics or mode switches to try to remove
>> indentation, but like all heuristics, these are mistake-prone and
>> murky.
>>
>> Continuation quotes neatly avoid this problem. Whitespace before the
>> continuation quote is indentation used to format the source code;
>> whitespace after the continuation quote is part of the string literal.
>> The interpretation of the code is perfectly clear to both compiler and
>> programmer.
>>
>> • They improve the ability to quickly recognize the literal.
>> Traditional multiline strings don't provide much visual help. To find
>> the end, you must visually scan until you find the matching delimiter,
>> which may be only one or a few characters long. When looking at a
>> random line of source, it can be hard to tell at a glance whether it's
>> code or literal. Syntax highlighting can help with these issues, but
>> it's often unreliable, especially with advanced, idiosyncratic string
>> literal features like multiline strings.
>>
>> Continuation quotes solve these problems. To find the end of the
>> literal, just scan down the column of continuation characters until
>> they end. To figure out if a given line of source is part of a
>> literal, just see if it starts with a quote mark. The meaning of the
>> source becomes obvious at a glance.
>>
>> Nevertheless, the traditional design does have a few advantages:
>>
>> • It is simpler. Although continuation quotes are more complex, we
>> believe that the advantages listed above pay for that complexity.
>>
>> • There is no need to edit the intervening lines to add continuation
>> quotes. While the additional effort required to insert continuation
>> quotes is an important downside, we believe that tool support,
>> including both compiler fix-its and perhaps editor support for
>> commands like "Paste as String Literal", can address this issue. In
>> some editors, new features aren't even necessary; TextMate, for
>> instance, lets you insert a character on several lines simultaneously.
>> And new tool features could also address other issues like escaping
>> embedded quotes.
>>
>> • Naïve syntax highlighters may have trouble understanding this
>> syntax. This is true, but naïve syntax highlighters generally have
>> terrible trouble with advanced string literal constructs; some
>> struggle with even basic ones. While there are some designs (like
>> Python's """ strings) which trick some syntax highlighters into
>> working some of the time with some contents, we don't think this
>> occasional, accidental compatibility is a big enough gain to justify
>> changing the design.
>>
>> • It looks funny—quotes should always be in matched pairs. We aren't
>> aware of another programming language which uses unbalanced quotes in
>> string literals, but there is one very important precedent for this
>> kind of formatting: natural languages. English, for instance, uses a
>> very similar format for quoting multiple lines of dialog by the same
>> speaker. As an English Stack Exchange answer illustrates:
>>
>> “That seems like an odd way to use punctuation,” Tom said. “What harm
>> would there be in using quotation marks at the end of every
>> paragraph?”
>>
>> “Oh, that’s not all that complicated,” J.R. answered. “If you closed
>> quotes at the end of every paragraph, then you would need to
>> reidentify the speaker with every subsequent paragraph.
>>
>> “Say a narrative was describing two or three people engaged in a
>> lengthy conversation. If you closed the quotation marks in the
>> previous paragraph, then a reader wouldn’t be able to easily tell if
>> the previous speaker was extending his point, or if someone else in
>> the room had picked up the conversation. By leaving the previous
>> paragraph’s quote unclosed, the reader knows that the previous speaker
>> is still the one talking.”
>>
>> “Oh, that makes sense. Thanks!”
>>
>> In English, omitting the ending quotation mark tells the text's reader
>> that the quote continues on the next line, while including a quotation
>> mark at the beginning of the next line reminds the reader that they're
>> in the middle of a quote.
>>
>> Similarly, in this proposal, omitting the ending quotation mark tells
>> the code's reader (and compiler) that the string literal continues on
>> the next line, while including a quotation mark at the beginning of
>> the next line reminds the reader (and compiler) that they're in the
>> middle of a string literal.
>>
>> On balance, we think continuation quotes are the best design for this
>> problem.
>>
>> Detailed design
>>
>> When Swift is parsing a string literal and reaches the end of a line
>> without finding a closing quote, it examines the next line, applying
>> the following rules:
>>
>> • If the next line begins with whitespace followed by a continuation
>> quote, then the string literal contains a newline followed by the
>> contents of the string literal starting on that line. (This line may
>> itself have no closing quote, in which case the same rules apply to
>> the line which follows.)
>>
>> • If the next line contains anything else, Swift raises a syntax error
>> for an unterminated string literal.
>>
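The two rules above can be sketched roughly as follows. This is only my reading of them, not the proposed implementation: joinContinuationLines and LiteralError are invented names, escapes are left undecoded (a backslash only keeps the following quote from closing the literal), and anything after the closing quote on a line is ignored.

```swift
// Hypothetical error type: how many lines had been read as part of the
// literal when it turned out to be unterminated.
enum LiteralError: Error, Equatable {
    case unterminated(linesRead: Int)
}

// Hypothetical sketch of the continuation-quote rules. `lines` are source
// lines; the first one is expected to start at the opening quote.
func joinContinuationLines(_ lines: [String]) throws -> String {
    var pieces: [String] = []
    for line in lines {
        // Whitespace before the quote is indentation, not string content.
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        // Rule 2: anything other than a quote here is an error.
        guard trimmed.first == "\"" else {
            throw LiteralError.unterminated(linesRead: pieces.count)
        }
        var content = ""
        var escaped = false
        var closed = false
        for ch in trimmed.dropFirst() {
            if escaped {
                content.append(ch)
                escaped = false
            } else if ch == "\\" {
                content.append(ch)
                escaped = true
            } else if ch == "\"" {
                closed = true   // unescaped quote ends the literal
                break
            } else {
                content.append(ch)
            }
        }
        pieces.append(content)
        // Rule 1: each continued line contributes a newline plus its content.
        if closed { return pieces.joined(separator: "\n") }
    }
    // Ran out of lines without ever seeing a closing quote.
    throw LiteralError.unterminated(linesRead: pieces.count)
}

let lines = [
    "\"<catalog>",
    "    \"  <book id=\\\"bk101\\\">",
    "    \"</catalog>\"",
]
let joined = try? joinContinuationLines(lines)
```

The linesRead count is what would let a diagnostic point near the end of the literal rather than at its beginning.
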
>> The exact error messages and diagnostics provided are left to the
>> implementers to determine, but we believe it should be possible to
>> provide two fix-its which will help users learn the syntax and correct
>> string literal mistakes:
>>
>> • Insert " at the end of the current line to terminate the quote.
>>
>> • Insert " at the beginning of the next line (with some indentation
>> heuristics) to continue the quote on the next line.
>>
>> Impact on existing code
>>
>> Failing to close a string literal before the end of the line is
>> currently a syntax error, so no valid Swift code should be affected by
>> this change.
>>
>> Future directions for multiline string literals
>>
>> • We could permit comments before encountering a continuation quote to
>> be counted as whitespace, and permit empty lines in the middle of
>> string literals. This would allow you to comment out whole lines in
>> the literal.
>>
>> • We could allow you to put a trailing backslash on a line to indicate
>> that the newline isn't "real" and should be omitted from the literal's
>> contents.
>>
>> Future directions for string literals in general
>>
>> There are other issues with Swift's string handling which this
>> proposal intentionally does not address:
>>
>> • Reducing the amount of double-backslashing needed when working with
>> regular expression libraries, Windows paths, source code generation,
>> and other tasks where backslashes are part of the data.
>>
>> • Alternate delimiters or other strategies for writing strings with "
>> characters in them.
>>
>> • Accommodating code formatting concerns like hard wrapping and
>> commenting.
>>
>> • String literals consisting of very long pieces of text which are
>> best represented completely verbatim, with minimal alteration.
>>
>> This section briefly outlines some future proposals which might
>> address these issues. Combined, we believe they would address most of
>> the string literal use cases which Swift is currently not very good
>> at.
>>
>> Please note that these are simply sketches of hypothetical future
>> designs; they may radically change before proposal, and some may never
>> be proposed at all. Many, perhaps most, will not be proposed for Swift
>> 3. We are sketching these designs not to propose and refine these
>> features immediately, but merely to show how we think they might be
>> solved in ways which complement this proposal.
>>
>> String literal modifiers
>>
>> A string literal modifier is a cluster of identifier characters which
>> goes before a string literal and adjusts the way it is parsed.
>> Modifiers only alter the interpretation of the text in the literal, not
>> the type of data it produces; for instance, there will never be
>> something like the UTF-8/UTF-16/UTF-32 literal modifiers in C++.
>> Uppercase characters enable a feature; lowercase characters disable a
>> feature.
>>
>> Modifiers can be attached to both single-line and multiline literals,
>> and could also be attached to other literal syntaxes which might be
>> introduced in the future. When used with multiline strings, only the
>> starting quote needs to carry the modifiers, not the continuation
>> quotes.
>>
>> Modifiers are an extremely flexible feature which can be used for many
>> purposes. Of the ideas listed below, we believe the e modifier is an
>> urgent addition which should be included in Swift 3 if at all
>> possible; the others are less urgent and most of them could be
>> deferred, or at least added later if time allows.
>>
>> • Escape disabling: e"\\\" (string with three backslash characters)
>>
>> • Fine-grained escape disabling: i"\(foo)\n" (the string \(foo)
>> followed by a newline); eI"\(foo)\n" (the contents of foo followed by
>> the string \n); b"\w+\n" (the string \w+ followed by a newline)
>>
>> • Alternate delimiters: _ has no lowercase form, so it could be used
>> to allow strings with internal quotes: _"print("Hello, world!")"_,
>> __"print("Hello, world!")"__, etc.
>>
>> • Whitespace normalization: changes all runs of whitespace in the
>> literal to single space characters; this would allow you to use
>> multiline strings purely to improve code formatting.
>>
>> alert.informativeText = W"\(appName) could not typeset the element “\(title)” because
>>                         "it includes a link to an element that has been removed from this
>>                         "book."
>>
>> • Localization:
>>
>> alert.informativeText = LW"\(appName) could not typeset the element “\(title)” because
>>                         "it includes a link to an element that has been removed from this
>>                         "book."
>>
>> • Comments: Embedding comments in string literals might be useful for
>> literals containing regular expressions or other code.
>>
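As an illustration of the whitespace normalization bullet above, the effect of such a modifier on the literal's contents could be approximated at run time like this. It is a sketch under my own assumptions: normalizeWhitespace is a made-up name, and the proposal does not say exactly which characters count as whitespace, so Character.isWhitespace is used here as a stand-in.

```swift
// Hypothetical helper: replaces every run of whitespace (spaces, tabs,
// newlines) with a single space, as the W modifier is described to do.
func normalizeWhitespace(_ text: String) -> String {
    var result = ""
    var inRun = false
    for ch in text {
        if ch.isWhitespace {
            // Emit one space for the whole run, then skip the rest of it.
            if !inRun { result.append(" ") }
            inRun = true
        } else {
            result.append(ch)
            inRun = false
        }
    }
    return result
}

let message = normalizeWhitespace("could not typeset the element because\n    it includes a link")
```

With this, the line breaks and indentation used to format a long message in source would not show up in the final text.
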
>> Eventually, user-specified string modifiers could be added to Swift,
>> perhaps as part of a hygienic macro system. It might also become
>> possible to change the default modifiers applied to literals in a
>> particular file or scope.
>>
>> Heredocs or other "verbatim string literal" features
>>
>> Sometimes it really is best to just splat something else down in the
>> middle of a file full of Swift source code. Maybe the file is
>> essentially a template and the literals are a majority of the code's
>> contents, or maybe you're writing a code generator and just want to
>> get string data into it with minimal fuss, or maybe people unfamiliar
>> with Swift need to be able to edit the literals. Whatever the reason,
>> the normal string literal syntax is just too burdensome.
>>
>> One approach to this problem is heredocs. A heredoc allows you to put
>> a placeholder for a literal on one line; the contents of the literal
>> begin on the next line, running up to some delimiter. It would be
>> possible to put multiple placeholders in a single line, and to apply
>> string modifiers to them.
>>
>> In Swift, this might look like:
>>
>> print(#to("---") + e#to("END"))
>> It was a dark and stormy \(timeOfDay) when
>> ---
>> the Swift core team invented the \(interpolation) syntax.
>> END
>>
>> Another possible approach would be to support traditional multiline
>> string literals bounded by a different delimiter, like """. This might
>> look like:
>>
>> print("""
>> It was a dark and stormy \(timeOfDay) when
>> """ + e"""
>> the Swift core team invented the \(interpolation) syntax.
>> """)
>>
>> Although heredocs could make a good addition to Swift eventually,
>> there are good reasons to defer them for now. Please see the
>> "Alternatives considered" section for details.
>>
>> First-class regular expressions
>>
>> Members of the core team are interested in regular expressions, but
>> they don't want to just build a literal that wraps PCRE or libicu;
>> rather, they aim to integrate regexes into the pattern matching system
>> and give them a deep, Perl 6-style rethink. This would be a major
>> effort, far beyond the scope of Swift 3.
>>
>> In the meantime, the e modifier and perhaps other string literal
>> modifiers will make it easier to specify regular expressions in string
>> literals for use with NSRegularExpression and other libraries
>> accessible from Swift.
>>
>> Alternatives considered
>>
>> Requiring no continuation character
>>
>> The main alternative is to not require a continuation quote, and
>> simply extend the string literal from the starting quote to the ending
>> quote, including all newlines between them. For example:
>>
>> let xml = "<?xml version=\"1.0\"?>
>> <catalog>
>>  <book id=\"bk101\" empty=\"\">
>>      <author>\(author)</author>
>>  </book>
>> </catalog>"
>>
>> This alternative is extensively discussed in the "Rationale" section
>> above.
>>
>> Skip multiline strings and just support heredocs
>>
>> There are definitely cases where a heredoc would be a better solution,
>> such as generated code or code which is mostly literals with a little
>> Swift sprinkled around. On the other hand, there are also cases where
>> multiline strings are better: short strings in code which is meant to
>> be read. If a single feature can't handle them both well, there's no
>> shame in supporting the two features separately.
>>
>> It makes sense to support multiline strings first because:
>>
>> • They extend existing syntax instead of introducing new syntax.
>>
>> • They are much easier to parse; heredocs require some kind of mode in
>> the parser which kicks in at the start of the next line, whereas
>> multiline string literals can be handled in the lexer.
>>
>> • As discussed in "Rationale", they offer better diagnostics, code
>> formatting, and visual scannability.
>>
>> Use a different delimiter for multiline strings
>>
>> The initial suggestion was that multiline strings should use a
>> different delimiter, """, at the beginning and end of the string, with
>> no continuation characters between. Like heredocs, this might be a
>> good alternative for certain use cases, but it has the same basic
>> flaws as the "no continuation character" solution.
>>
>> -- Brent Royal-Gordon Architechies
>>
>> _______________________________________________ swift-evolution
>> mailing list swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution