[swift-evolution] multi-line string literals.

Vladimir.S svabox at gmail.com
Fri Apr 29 01:01:33 CDT 2016


@Brent, I suggest renaming the proposal to make clear that it does not try 
to solve the problem of character escaping, i.e. text *as-is*; it just 
removes the  \n"+  from the end of each line. I think many readers will 
expect an "as-is" text feature when they start reading your proposal, or 
will ask questions like "why doesn't the multi-line proposal include as-is 
multi-line text?". I feel the title is too generic.

Regarding the proposal itself: I'm ready to support it, provided you add a 
'specification' of your multi-line feature to the title, like "multi-line 
with support of escaping and interpolation", so that we can then have 
another proposal like "multi-line without escaping, with text as-is".

One question: what about trailing spaces/tabs at the end of each line? IMO 
there should be one strict rule to prevent hard-to-find bugs/errors: 
your feature must either trim all trailing spaces or have an explicit 
marker for when to do this.
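
To illustrate the first option, here is a minimal sketch of a "trim 
everything" rule. The helper name trimmingTrailingWhitespace is 
hypothetical and not part of the proposal; it only shows the behaviour I 
have in mind for each line of the literal.

```swift
// Hypothetical helper (not part of the proposal): removes trailing spaces
// and tabs from every line of a multi-line string, keeping the newlines.
func trimmingTrailingWhitespace(_ text: String) -> String {
    let lines = text.split(separator: "\n", omittingEmptySubsequences: false)
    let trimmed = lines.map { line -> String in
        var rest = line
        // Drop spaces and tabs from the end of this line only.
        while let last = rest.last, last == " " || last == "\t" {
            rest.removeLast()
        }
        return String(rest)
    }
    return trimmed.joined(separator: "\n")
}
```

Without a rule like this, two literals that look identical in an editor 
could differ only in invisible trailing characters.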

On 29.04.2016 0:56, Brent Royal-Gordon via swift-evolution wrote:
>> Awesome.  Some specific suggestions below, but feel free to iterate in a
>> pull request if you prefer that.
>
> I've adopted these suggestions in some form, though I also ended up
> rewriting the explanation of why the feature was designed as it is and
> fusing it with material from "Alternatives considered".
>
> (Still not sure who I should list as a co-author. I'm currently thinking
> John, Tyler, and maybe Chris? Who's supposed to go there?)
>
>
>   Multiline string literals
>
>   * Proposal: SE-NNNN
>     <https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
>   * Author(s): Brent Royal-Gordon <https://github.com/brentdax>
>   * Status: *Second Draft*
>   * Review manager: TBD
>
>
>     Introduction
>
> In Swift 2.2, the only means to insert a newline into a string literal is
> the |\n| escape. String literals specified in this way are generally ugly
> and unreadable. We propose a multiline string feature inspired by English
> punctuation which is a straightforward extension of our existing string
> literals.
>
> This proposal is one step in a larger plan to improve how string literals
> address various challenging use cases. It is not meant to solve all
> problems with escaping, nor to serve all use cases involving very long
> string literals. See the "Future directions for string literals in general"
> section for a sketch of the problems we ultimately want to address and some
> ideas of how we might do so.
>
> Swift-evolution threads: multi-line string literals. (April)
> <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160418/015500.html>, multi-line
> string literals (December)
> <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002349.html>
>
>
>     Draft Notes
>
>   *
>
>     Removes the comment feature, which was felt to be an unnecessary
>     complication. This and the backslash feature have been listed as future
>     directions.
>
>   *
>
>     Loosens the specification of diagnostics, suggesting instead of
>     requiring fix-its.
>
>   *
>
>     Splits a "Rationale" section out of the "Proposed solution" section.
>
>   *
>
>     Adds extensive discussion of other features which would combine with
>     this one.
>
>   *
>
>     I've listed only myself as an author because I don't want to put anyone
>     else's name to a document they haven't seen, but there are others who
>     deserve to be listed (John Holdsworth at least). Let me know if you
>     think you should be included.
>
>
>     Motivation
>
> As Swift begins to move into roles beyond app development, code which needs
> to generate text becomes a more important use case. Consider, for instance,
> generating even a small XML string:
>
> let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\"
> empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
>
> The string is practically unreadable, its structure drowned in escapes and
> run-together lines; it looks like little more than line noise. We can
> improve its readability somewhat by concatenating separate strings for each
> line and using real tabs instead of |\t| escapes:
>
> let xml = "<?xml version=\"1.0\"?>\n" +
>           "<catalog>\n" +
>           " <book id=\"bk101\" empty=\"\">\n" +
>           " <author>\(author)</author>\n" +
>           " </book>\n" +
>           "</catalog>"
>
> However, this creates a more complex expression for the type checker, and
> there's still far more punctuation than ought to be necessary. If the most
> important goal of Swift is making code readable, this kind of code falls
> far short of that goal.
>
>
>     Proposed solution
>
> We propose that, when Swift is parsing a string literal, if it reaches the
> end of the line without encountering an end quote, it should look at the
> next line. If it sees a quote at the beginning (a "continuation quote"),
> the string literal contains a newline and then continues on that line.
> Otherwise, the string literal is unterminated and syntactically invalid.
>
> Our sample above could thus be written as:
>
> let xml = "<?xml version=\"1.0\"?>
>           "<catalog>
>           " <book id=\"bk101\" empty=\"\">
>           " <author>\(author)</author>
>           " </book>
>           "</catalog>"
>
> If the second or subsequent lines had not begun with a quotation mark, or
> the trailing quotation mark after the |</catalog>| tag had not been
> included, Swift would have emitted an error.
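
To check my reading: as I understand it, the proposed literal should 
evaluate to the same value as the escaped one-line version from the 
Motivation section. A sketch in today's syntax follows; the value of 
author is a placeholder I made up for illustration.

```swift
// Placeholder value, assumed only for this illustration.
let author = "Jane Doe"

// The value the continuation-quote literal is meant to produce, written
// with syntax that already compiles: one element per source line, joined
// by the newline that each line break would contribute.
let xml = [
    "<?xml version=\"1.0\"?>",
    "<catalog>",
    "\t<book id=\"bk101\" empty=\"\">",
    "\t\t<author>\(author)</author>",
    "\t</book>",
    "</catalog>",
].joined(separator: "\n")
```

Note that there is no newline after the last line, because the closing 
quote ends the literal there.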
>
>
>       Rationale
>
> This design is rather unusual, and it's worth pausing a moment to explain
> why it has been chosen.
>
> The traditional design for this feature, seen in languages like Perl and
> Python, simply places one delimiter at the beginning of the literal and
> another at the end. Individual lines in the literal are not marked in any way.
>
> We think continuation quotes offer several important advantages over the
> traditional design:
>
>  1.
>
>     *They help the compiler pinpoint errors in string literal
>     delimiting.* Traditional multiline strings have a serious weakness: if
>     you forget the closing quote, the compiler has no idea where you wanted
>     the literal to end. It simply continues on until the compiler
>     encounters another quote (or the end of the file). If you're lucky, the
>     text after that quote is not valid code, and the resulting error will
>     at least point you to the next string literal in the file. If you're
>     unlucky, you'll get a seemingly unrelated error several literals later,
>     an unbalanced brace error at the end of the file, or perhaps even code
>     that compiles but does something totally wrong.
>
>     (This is not a minor concern. Many popular languages, including C and
>     Swift 2, specifically reject newlines in string literals to prevent
>     this from happening.)
>
>     Continuation quotes provide the compiler with redundant information
>     about your intent. If you forget a closing quote, the continuation
>     quotes give the compiler a very good idea of where you meant to put it.
>     The compiler can point you to (or at least very near) the /end/ of the
>     literal, where you want to insert the quote, rather than showing you
>     the /beginning/ of the literal or even some unrelated error later in
>     the file that was caused by the missing quote.
>
>  2.
>
>     *Temporarily unclosed literals don't make editors go haywire.* The
>     syntax highlighter has the same trouble parsing half-written, unclosed
>     traditional quotes that the compiler does: It can't tell where the
>     literal is supposed to end and the code should begin. It must either
>     apply heuristics to try to guess where the literal ends, or incorrectly
>     color everything between the opening quote and the next closing quote
>     as a string literal. This can cause the file's coloring to alternate
>     distractingly between "string literal" and "running code".
>
>     Continuation quotes give the syntax highlighter enough context to guess
>     at the correct coloration, even when the string isn't complete yet.
>     Lines with a continuation quote are literals; lines without are code.
>     At worst, the syntax highlighter might incorrectly color a few
>     characters at the end of a line, rather than the remainder of the file.
>
>  3.
>
>     *They separate indentation from the string's contents.* Traditional
>     multiline strings usually include all of the content between the start
>     and end delimiters, including leading whitespace. This means that it's
>     usually impossible to indent a multiline string, so including one
>     breaks up the flow of the surrounding code, making it less readable.
>     Some languages apply heuristics or mode switches to try to remove
>     indentation, but like all heuristics, these are mistake-prone and murky.
>
>     Continuation quotes neatly avoid this problem. Whitespace before the
>     continuation quote is indentation used to format the source code;
>     whitespace after the continuation quote is part of the string literal.
>     The interpretation of the code is perfectly clear to both compiler and
>     programmer.
>
>  4.
>
>     *They improve the ability to quickly recognize the literal.* Traditional
>     multiline strings don't provide much visual help. To find the end, you
>     must visually scan until you find the matching delimiter, which may be
>     only one or a few characters long. When looking at a random line of
>     source, it can be hard to tell at a glance whether it's code or
>     literal. Syntax highlighting can help with these issues, but it's often
>     unreliable, especially with advanced, idiosyncratic string literal
>     features like multiline strings.
>
>     Continuation quotes solve these problems. To find the end of the
>     literal, just scan down the column of continuation characters until
>     they end. To figure out if a given line of source is part of a literal,
>     just see if it starts with a quote mark. The meaning of the source
>     becomes obvious at a glance.
>
> Nevertheless, the traditional design /does/ have a few advantages:
>
>  1.
>
>     *It is simpler.* Although continuation quotes are more complex, we
>     believe that the advantages listed above pay for that complexity.
>
>  2.
>
>     *There is no need to edit the intervening lines to add continuation
>     quotes.* While the additional effort required to insert continuation
>     quotes is an important downside, we believe that tool support,
>     including both compiler fix-its and perhaps editor support for commands
>     like "Paste as String Literal", can address this issue. In some
>     editors, new features aren't even necessary; TextMate, for instance,
>     lets you insert a character on several lines simultaneously. And new
>     tool features could also address other issues like escaping embedded
>     quotes.
>
>  3.
>
>     *Naïve syntax highlighters may have trouble understanding this
>     syntax.* This is true, but naïve syntax highlighters generally have
>     terrible trouble with advanced string literal constructs; some struggle
>     with even basic ones. While there are some designs (like
>     Python's |"""| strings) which trick some syntax highlighters into
>     working some of the time with some contents, we don't think this
>     occasional, accidental compatibility is a big enough gain to justify
>     changing the design.
>
>  4.
>
>     *It looks funny—quotes should always be in matched pairs.* We aren't
>     aware of another programming language which uses unbalanced quotes in
>     string literals, but there /is/ one very important precedent for this
>     kind of formatting: natural languages. English, for instance, uses a
>     very similar format for quoting multiple lines of dialog by the same
>     speaker. As an English Stack Exchange answer illustrates
>     <http://english.stackexchange.com/a/96613/64636>:
>
>         “That seems like an odd way to use punctuation,” Tom said. “What
>         harm would there be in using quotation marks at the end of every
>         paragraph?”
>
>         “Oh, that’s not all that complicated,” J.R. answered. “If you
>         closed quotes at the end of every paragraph, then you would need to
>         reidentify the speaker with every subsequent paragraph.
>
>         “Say a narrative was describing two or three people engaged in a
>         lengthy conversation. If you closed the quotation marks in the
>         previous paragraph, then a reader wouldn’t be able to easily tell
>         if the previous speaker was extending his point, or if someone else
>         in the room had picked up the conversation. By leaving the previous
>         paragraph’s quote unclosed, the reader knows that the previous
>         speaker is still the one talking.”
>
>         “Oh, that makes sense. Thanks!”
>
>     In English, omitting the ending quotation mark tells the text's reader
>     that the quote continues on the next line, while including a quotation
>     mark at the beginning of the next line reminds the reader that they're
>     in the middle of a quote.
>
>     Similarly, in this proposal, omitting the ending quotation mark tells
>     the code's reader (and compiler) that the string literal continues on
>     the next line, while including a quotation mark at the beginning of the
>     next line reminds the reader (and compiler) that they're in the middle
>     of a string literal.
>
> On balance, we think continuation quotes are the best design for this problem.
>
>
>     Detailed design
>
> When Swift is parsing a string literal and reaches the end of a line
> without finding a closing quote, it examines the next line, applying the
> following rules:
>
>  1.
>
>     If the next line begins with whitespace followed by a continuation
>     quote, then the string literal contains a newline followed by the
>     contents of the string literal starting on that line. (This line may
>     itself have no closing quote, in which case the same rules apply to the
>     line which follows.)
>
>  2.
>
>     If the next line contains anything else, Swift raises a syntax error
>     for an unterminated string literal.
>
> The exact error messages and diagnostics provided are left to the
> implementers to determine, but we believe it should be possible to provide
> two fix-its which will help users learn the syntax and correct string
> literal mistakes:
>
>   *
>
>     Insert |"| at the end of the current line to terminate the quote.
>
>   *
>
>     Insert |"| at the beginning of the next line (with some indentation
>     heuristics) to continue the quote on the next line.
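
The two rules above could be sketched roughly as follows. This is only 
my reading of them, not the actual lexer: scanLiteral and ScanError are 
hypothetical names, the input is assumed to be already split into lines 
with the opening quote consumed, and escapes and interpolation are 
ignored to keep it short.

```swift
// Hypothetical error type: carries the index of the line that was left
// unterminated, which is where a fix-it could be attached.
enum ScanError: Error, Equatable {
    case unterminated(line: Int)
}

/// Scans a literal whose opening quote has already been consumed.
/// `lines[0]` is the text right after that opening quote.
func scanLiteral(_ lines: [String]) -> Result<String, ScanError> {
    var pieces: [String] = []
    for (index, line) in lines.enumerated() {
        var body = Substring(line)
        if index > 0 {
            // Rule 1: optional whitespace, then a continuation quote.
            let rest = body.drop(while: { $0 == " " || $0 == "\t" })
            guard rest.first == "\"" else {
                // Rule 2: anything else means the previous line
                // was an unterminated string literal.
                return .failure(.unterminated(line: index - 1))
            }
            body = rest.dropFirst()
        }
        if let close = body.firstIndex(of: "\"") {
            // Closing quote found: the literal ends on this line.
            pieces.append(String(body[..<close]))
            return .success(pieces.joined(separator: "\n"))
        }
        pieces.append(String(body))
    }
    // Ran out of input without ever seeing a closing quote.
    return .failure(.unterminated(line: max(0, lines.count - 1)))
}
```

The line index in the error is where the "insert a closing quote" fix-it 
would apply; the line after it is where the "insert a continuation 
quote" fix-it would apply.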
>
>
>     Impact on existing code
>
> Failing to close a string literal before the end of the line is currently a
> syntax error, so no valid Swift code should be affected by this change.
>
>
>     Future directions for multiline string literals
>
>   *
>
>     We could permit comments before encountering a continuation quote to be
>     counted as whitespace, and permit empty lines in the middle of string
>     literals. This would allow you to comment out whole lines in the literal.
>
>   *
>
>     We could allow you to put a trailing backslash on a line to indicate
>     that the newline isn't "real" and should be omitted from the literal's
>     contents.
>
>
>     Future directions for string literals in general
>
> There are other issues with Swift's string handling which this proposal
> intentionally does not address:
>
>   *
>
>     Reducing the amount of double-backslashing needed when working with
>     regular expression libraries, Windows paths, source code generation,
>     and other tasks where backslashes are part of the data.
>
>   *
>
>     Alternate delimiters or other strategies for writing strings
>     with |"| characters in them.
>
>   *
>
>     Accommodating code formatting concerns like hard wrapping and commenting.
>
>   *
>
>     String literals consisting of very long pieces of text which are best
>     represented completely verbatim, with minimal alteration.
>
> This section briefly outlines some future proposals which might address
> these issues. Combined, we believe they would address most of the string
> literal use cases which Swift is currently not very good at.
>
> Please note that these are simply sketches of hypothetical future designs;
> they may radically change before proposal, and some may never be proposed
> at all. Many, perhaps most, will not be proposed for Swift 3. We are
> sketching these designs not to propose and refine these features
> immediately, but merely to show how we think they might be solved in ways
> which complement this proposal.
>
>
>       String literal modifiers
>
> A string literal modifier is a cluster of identifier characters which goes
> before a string literal and adjusts the way it is parsed. Modifiers only
> alter the interpretation of the text in the literal, not the type of data
> it produces; for instance, there will never be something like the
> UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a
> feature; lowercase characters disable a feature.
>
> Modifiers can be attached to both single-line and multiline literals, and
> could also be attached to other literal syntaxes which might be introduced
> in the future. When used with multiline strings, only the starting quote
> needs to carry the modifiers, not the continuation quotes.
>
> Modifiers are an extremely flexible feature which can be used for many
> purposes. Of the ideas listed below, we believe the |e| modifier is an
> urgent addition which should be included in Swift 3 if at all possible; the
> others are less urgent and most of them could be deferred, or at least
> added later if time allows.
>
>   *
>
>     *Escape disabling*: |e"\\\"| (string with three backslash characters)
>
>   *
>
>     *Fine-grained escape disabling*: |i"\(foo)\n"| (the
>     string |\(foo)| followed by a newline); |eI"\(foo)\n"| (the contents
>     of |foo| followed by the string |\n|), |b"\w+\n"| (the
>     string |\w+| followed by a newline)
>
>   *
>
>     *Alternate delimiters*: |_| has no lowercase form, so it could be used
>     to allow strings with internal quotes: |_"print("Hello,
>     world!")"_|, |__"print("Hello, world!")"__|, etc.
>
>   *
>
>     *Whitespace normalization*: changes all runs of whitespace in the
>     literal to single space characters; this would allow you to use
>     multiline strings purely to improve code formatting.
>
>     alert.informativeText = W"\(appName) could not typeset the element “\(title)” because
>                              "it includes a link to an element that has been removed from this
>                              "book."
>
>   *
>
>     *Localization*:
>
>     alert.informativeText = LW"\(appName) could not typeset the element “\(title)” because
>                               "it includes a link to an element that has been removed from this
>                               "book."
>
>   *
>
>     *Comments*: Embedding comments in string literals might be useful for
>     literals containing regular expressions or other code.
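
For the whitespace normalization item, the effect described (all runs of 
whitespace become single spaces) can be approximated at run time today. 
A minimal sketch follows; normalizingWhitespace is a hypothetical 
function of mine, and a real W modifier would presumably do this at 
compile time instead.

```swift
// Hypothetical run-time stand-in for the sketched `W` modifier:
// collapses every run of whitespace (including newlines) to one space.
func normalizingWhitespace(_ text: String) -> String {
    var result = ""
    var inRun = false  // true while we are inside a whitespace run
    for ch in text {
        if ch.isWhitespace {
            // Emit a single space for the first character of the run only.
            if !inRun { result.append(" ") }
            inRun = true
        } else {
            result.append(ch)
            inRun = false
        }
    }
    return result
}
```

With this, a literal broken over several source lines purely for 
formatting would still produce one line of text.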
>
> Eventually, user-specified string modifiers could be added to Swift,
> perhaps as part of a hygienic macro system. It might also become possible
> to change the default modifiers applied to literals in a particular file or
> scope.
>
>
>       Heredocs or other "verbatim string literal" features
>
> Sometimes it really is best to just splat something else down in the middle
> of a file full of Swift source code. Maybe the file is essentially a
> template and the literals are a majority of the code's contents, or maybe
> you're writing a code generator and just want to get string data into it
> with minimal fuss, or maybe people unfamiliar with Swift need to be able to
> edit the literals. Whatever the reason, the normal string literal syntax is
> just too burdensome.
>
> One approach to this problem is heredocs. A heredoc allows you to put a
> placeholder for a literal on one line; the contents of the literal begin on
> the next line, running up to some delimiter. It would be possible to put
> multiple placeholders in a single line, and to apply string modifiers to them.
>
> In Swift, this might look like:
>
> print(#to("---") + e#to("END"))
> It was a dark and stormy \(timeOfDay) when
> ---
> the Swift core team invented the \(interpolation) syntax.
> END
>
> Another possible approach would be to support traditional multiline string
> literals bounded by a different delimiter, like |"""|. This might look like:
>
> print("""
> It was a dark and stormy \(timeOfDay) when
> """ + e"""
> the Swift core team invented the \(interpolation) syntax.
> """)
>
> Although heredocs could make a good addition to Swift eventually, there are
> good reasons to defer them for now. Please see the "Alternatives
> considered" section for details.
>
>
>       First-class regular expressions
>
> Members of the core team are interested in regular expressions, but they
> don't want to just build a literal that wraps PCRE or libicu; rather, they
> aim to integrate regexes into the pattern matching system and give them a
> deep, Perl 6-style rethink. This would be a major effort, far beyond the
> scope of Swift 3.
>
> In the meantime, the |e| modifier and perhaps other string literal
> modifiers will make it easier to specify regular expressions in string
> literals for use with |NSRegularExpression| and other libraries accessible
> from Swift.
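
As a reminder of the status quo this refers to, here is a small sketch 
of the double-backslashing needed today with NSRegularExpression. The 
names pattern, text and matched are just for illustration.

```swift
import Foundation

// Without an escape-disabling modifier, every backslash meant for the
// regex engine has to be doubled inside the Swift string literal.
let pattern = "\\w+\\n"   // the regex engine receives: \w+\n

let regex = try! NSRegularExpression(pattern: pattern, options: [])
let text = "hello\n"
let range = NSRange(text.startIndex..., in: text)

// True when the pattern matches somewhere in `text`.
let matched = regex.firstMatch(in: text, options: [], range: range) != nil
```

The literal contains eight characters between the quotes, but the 
resulting pattern is only five characters long, which is the mismatch 
the |e| modifier is meant to remove.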
>
>
>     Alternatives considered
>
>
>       Requiring no continuation character
>
> The main alternative is to not require a continuation quote, and simply
> extend the string literal from the starting quote to the ending quote,
> including all newlines between them. For example:
>
> let xml = "<?xml version=\"1.0\"?>
> <catalog>
> <book id=\"bk101\" empty=\"\">
> <author>\(author)</author>
> </book>
> </catalog>"
>
> This alternative is extensively discussed in the "Rationale" section above.
>
>
>       Skip multiline strings and just support heredocs
>
> There are definitely cases where a heredoc would be a better solution, such
> as generated code or code which is mostly literals with a little Swift
> sprinkled around. On the other hand, there are also cases where multiline
> strings are better: short strings in code which is meant to be read. If a
> single feature can't handle them both well, there's no shame in supporting
> the two features separately.
>
> It makes sense to support multiline strings first because:
>
>   *
>
>     They extend existing syntax instead of introducing new syntax.
>
>   *
>
>     They are much easier to parse; heredocs require some kind of mode in
>     the parser which kicks in at the start of the next line, whereas
>     multiline string literals can be handled in the lexer.
>
>   *
>
>     As discussed in "Rationale", they offer better diagnostics, code
>     formatting, and visual scannability.
>
>
>       Use a different delimiter for multiline strings
>
> The initial suggestion was that multiline strings should use a different
> delimiter, |"""|, at the beginning and end of the string, with no
> continuation characters between. Like heredocs, this might be a good
> alternative for certain use cases, but it has the same basic flaws as the
> "no continuation character" solution.
>
> --
> Brent Royal-Gordon
> Architechies
>