[swift-evolution] multi-line string literals.
Vladimir.S
svabox at gmail.com
Fri Apr 29 01:01:33 CDT 2016
@Brent, I suggest renaming the proposal to make clear that it does not try
to solve the character-escaping problem, i.e. text *as-is*; it just removes
the \n"+ from the end of each line. I think many readers will expect an
"as-is" text feature when they start reading your proposal, or will ask
questions like "why doesn't the multi-line proposal include as-is
multi-line text?" I feel the title is too generic.
Regarding the proposal itself: I'm ready to support it (provided you add a
'specification' of your multi-line feature to the title, like "multi-line
with support of escaping and interpolation", so we can then have another
proposal like "multi-line without escaping, with text as-is").
One question: what about trailing spaces/tabs at the end of each line? IMO
there should be one strict rule to prevent hard-to-find bugs: your feature
must either trim all trailing whitespace, or have an explicit marker that
says whether to trim it.
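
To make the "trim everything" option concrete, here is a minimal sketch of
that rule applied to an already-assembled string. The name
trimmingTrailingWhitespace is made up for illustration and is not part of
the proposal; the sketch assumes lines are separated by a plain \n.

```swift
// Hypothetical helper, not part of the proposal: removes trailing spaces
// and tabs from every line of a multi-line string. Assumes "\n" line endings.
func trimmingTrailingWhitespace(_ text: String) -> String {
    let newline: Character = "\n"
    // Keep empty lines so that blank lines in the text survive.
    let lines = text.split(separator: newline, omittingEmptySubsequences: false)
    let trimmed = lines.map { line -> Substring in
        var line = line
        while let last = line.last, last == " " || last == "\t" {
            line.removeLast()
        }
        return line
    }
    return trimmed.joined(separator: "\n")
}
```

Leading whitespace is left alone on purpose, since in the proposal anything
after the continuation quote is meant to be content.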
On 29.04.2016 0:56, Brent Royal-Gordon via swift-evolution wrote:
>> Awesome. Some specific suggestions below, but feel free to iterate in a
>> pull request if you prefer that.
>
> I've adopted these suggestions in some form, though I also ended up
> rewriting the explanation of why the feature was designed as it is and
> fusing it with material from "Alternatives considered".
>
> (Still not sure who I should list as a co-author. I'm currently thinking
> John, Tyler, and maybe Chris? Who's supposed to go there?)
>
>
> Multiline string literals
>
> * Proposal: SE-NNNN
> <https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
> * Author(s): Brent Royal-Gordon <https://github.com/brentdax>
> * Status: *Second Draft*
> * Review manager: TBD
>
>
> Introduction
>
> In Swift 2.2, the only means to insert a newline into a string literal is
> the |\n| escape. String literals specified in this way are generally ugly
> and unreadable. We propose a multiline string feature inspired by English
> punctuation which is a straightforward extension of our existing string
> literals.
>
> This proposal is one step in a larger plan to improve how string literals
> address various challenging use cases. It is not meant to solve all
> problems with escaping, nor to serve all use cases involving very long
> string literals. See the "Future directions for string literals in general"
> section for a sketch of the problems we ultimately want to address and some
> ideas of how we might do so.
>
> Swift-evolution threads: multi-line string literals. (April)
> <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160418/015500.html>, multi-line
> string literals (December)
> <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002349.html>
>
>
> Draft Notes
>
> * Removes the comment feature, which was felt to be an unnecessary
>   complication. This and the backslash feature have been listed as future
>   directions.
> * Loosens the specification of diagnostics, suggesting fix-its instead of
>   requiring them.
> * Splits a "Rationale" section out of the "Proposed solution" section.
> * Adds extensive discussion of other features which would combine with
>   this one.
> * I've listed only myself as an author because I don't want to put anyone
>   else's name to a document they haven't seen, but there are others who
>   deserve to be listed (John Holdsworth at least). Let me know if you
>   think you should be included.
>
>
> Motivation
>
> As Swift begins to move into roles beyond app development, code which needs
> to generate text becomes a more important use case. Consider, for instance,
> generating even a small XML string:
>
> let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
>
> The string is practically unreadable, its structure drowned in escapes and
> run-together lines; it looks like little more than line noise. We can
> improve its readability somewhat by concatenating separate strings for each
> line and using real tabs instead of |\t| escapes:
>
> let xml = "<?xml version=\"1.0\"?>\n" +
> "<catalog>\n" +
> " <book id=\"bk101\" empty=\"\">\n" +
> " <author>\(author)</author>\n" +
> " </book>\n" +
> "</catalog>"
>
> However, this creates a more complex expression for the type checker, and
> there's still far more punctuation than ought to be necessary. If the most
> important goal of Swift is making code readable, this kind of code falls
> far short of that goal.
>
>
> Proposed solution
>
> We propose that, when Swift is parsing a string literal, if it reaches the
> end of the line without encountering an end quote, it should look at the
> next line. If it sees a quote at the beginning (a "continuation quote"),
> the string literal contains a newline and then continues on that line.
> Otherwise, the string literal is unterminated and syntactically invalid.
>
> Our sample above could thus be written as:
>
> let xml = "<?xml version=\"1.0\"?>
>           "<catalog>
>           " <book id=\"bk101\" empty=\"\">
>           " <author>\(author)</author>
>           " </book>
>           "</catalog>"
>
> If the second or subsequent lines had not begun with a quotation mark, or
> the trailing quotation mark after the |</catalog>| tag had not been
> included, Swift would have emitted an error.
>
>
> Rationale
>
> This design is rather unusual, and it's worth pausing a moment to explain
> why it has been chosen.
>
> The traditional design for this feature, seen in languages like Perl and
> Python, simply places one delimiter at the beginning of the literal and
> another at the end. Individual lines in the literal are not marked in any way.
>
> We think continuation quotes offer several important advantages over the
> traditional design:
>
> 1. *They help the compiler pinpoint errors in string literal
> delimiting.* Traditional multiline strings have a serious weakness: if
> you forget the closing quote, the compiler has no idea where you wanted
> the literal to end. It simply continues on until the compiler
> encounters another quote (or the end of the file). If you're lucky, the
> text after that quote is not valid code, and the resulting error will
> at least point you to the next string literal in the file. If you're
> unlucky, you'll get a seemingly unrelated error several literals later,
> an unbalanced brace error at the end of the file, or perhaps even code
> that compiles but does something totally wrong.
>
> (This is not a minor concern. Many popular languages, including C and
> Swift 2, specifically reject newlines in string literals to prevent
> this from happening.)
>
> Continuation quotes provide the compiler with redundant information
> about your intent. If you forget a closing quote, the continuation
> quotes give the compiler a very good idea of where you meant to put it.
> The compiler can point you to (or at least very near) the /end/ of the
> literal, where you want to insert the quote, rather than showing you
> the /beginning/ of the literal or even some unrelated error later in
> the file that was caused by the missing quote.
>
> 2. *Temporarily unclosed literals don't make editors go haywire.* The
> syntax highlighter has the same trouble parsing half-written, unclosed
> traditional quotes that the compiler does: It can't tell where the
> literal is supposed to end and the code should begin. It must either
> apply heuristics to try to guess where the literal ends, or incorrectly
> color everything between the opening quote and the next closing quote
> as a string literal. This can cause the file's coloring to alternate
> distractingly between "string literal" and "running code".
>
> Continuation quotes give the syntax highlighter enough context to guess
> at the correct coloration, even when the string isn't complete yet.
> Lines with a continuation quote are literals; lines without are code.
> At worst, the syntax highlighter might incorrectly color a few
> characters at the end of a line, rather than the remainder of the file.
>
> 3. *They separate indentation from the string's contents.* Traditional
> multiline strings usually include all of the content between the start
> and end delimiters, including leading whitespace. This means that it's
> usually impossible to indent a multiline string, so including one
> breaks up the flow of the surrounding code, making it less readable.
> Some languages apply heuristics or mode switches to try to remove
> indentation, but like all heuristics, these are mistake-prone and murky.
>
> Continuation quotes neatly avoid this problem. Whitespace before the
> continuation quote is indentation used to format the source code;
> whitespace after the continuation quote is part of the string literal.
> The interpretation of the code is perfectly clear to both compiler and
> programmer.
>
> 4. *They improve the ability to quickly recognize the literal.* Traditional
> multiline strings don't provide much visual help. To find the end, you
> must visually scan until you find the matching delimiter, which may be
> only one or a few characters long. When looking at a random line of
> source, it can be hard to tell at a glance whether it's code or
> literal. Syntax highlighting can help with these issues, but it's often
> unreliable, especially with advanced, idiosyncratic string literal
> features like multiline strings.
>
> Continuation quotes solve these problems. To find the end of the
> literal, just scan down the column of continuation characters until
> they end. To figure out if a given line of source is part of a literal,
> just see if it starts with a quote mark. The meaning of the source
> becomes obvious at a glance.
>
> Nevertheless, the traditional design /does/ have a few advantages:
>
> 1. *It is simpler.* Although continuation quotes are more complex, we
> believe that the advantages listed above pay for that complexity.
>
> 2. *There is no need to edit the intervening lines to add continuation
> quotes.* While the additional effort required to insert continuation
> quotes is an important downside, we believe that tool support,
> including both compiler fix-its and perhaps editor support for commands
> like "Paste as String Literal", can address this issue. In some
> editors, new features aren't even necessary; TextMate, for instance,
> lets you insert a character on several lines simultaneously. And new
> tool features could also address other issues like escaping embedded
> quotes.
>
> 3. *Naïve syntax highlighters may have trouble understanding this
> syntax.* This is true, but naïve syntax highlighters generally have
> terrible trouble with advanced string literal constructs; some struggle
> with even basic ones. While there are some designs (like
> Python's |"""| strings) which trick some syntax highlighters into
> working some of the time with some contents, we don't think this
> occasional, accidental compatibility is a big enough gain to justify
> changing the design.
>
> 4. *It looks funny; quotes should always be in matched pairs.* We aren't
> aware of another programming language which uses unbalanced quotes in
> string literals, but there /is/ one very important precedent for this
> kind of formatting: natural languages. English, for instance, uses a
> very similar format for quoting multiple lines of dialog by the same
> speaker. As an English Stack Exchange answer illustrates
> <http://english.stackexchange.com/a/96613/64636>:
>
> “That seems like an odd way to use punctuation,” Tom said. “What
> harm would there be in using quotation marks at the end of every
> paragraph?”
>
> “Oh, that’s not all that complicated,” J.R. answered. “If you
> closed quotes at the end of every paragraph, then you would need to
> reidentify the speaker with every subsequent paragraph.
>
> “Say a narrative was describing two or three people engaged in a
> lengthy conversation. If you closed the quotation marks in the
> previous paragraph, then a reader wouldn’t be able to easily tell
> if the previous speaker was extending his point, or if someone else
> in the room had picked up the conversation. By leaving the previous
> paragraph’s quote unclosed, the reader knows that the previous
> speaker is still the one talking.”
>
> “Oh, that makes sense. Thanks!”
>
> In English, omitting the ending quotation mark tells the text's reader
> that the quote continues on the next line, while including a quotation
> mark at the beginning of the next line reminds the reader that they're
> in the middle of a quote.
>
> Similarly, in this proposal, omitting the ending quotation mark tells
> the code's reader (and compiler) that the string literal continues on
> the next line, while including a quotation mark at the beginning of the
> next line reminds the reader (and compiler) that they're in the middle
> of a string literal.
>
> On balance, we think continuation quotes are the best design for this problem.
>
>
> Detailed design
>
> When Swift is parsing a string literal and reaches the end of a line
> without finding a closing quote, it examines the next line, applying the
> following rules:
>
> 1. If the next line begins with whitespace followed by a continuation
>    quote, then the string literal contains a newline followed by the
>    contents of the string literal starting on that line. (This line may
>    itself have no closing quote, in which case the same rules apply to the
>    line which follows.)
> 2. If the next line contains anything else, Swift raises a syntax error
>    for an unterminated string literal.
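
A rough sketch of how the two rules above might be applied, written as
ordinary Swift over an array of source lines rather than as a change to the
real lexer. Everything here is an assumption made for illustration: the
names parseContinuationLiteral, scanBody and LiteralError are hypothetical,
the input is expected to start at the line whose first non-whitespace
character is the opening quote, leading whitespace before a continuation
quote is treated as optional, escapes are kept raw instead of being decoded,
and interpolation is not handled at all.

```swift
// Hypothetical sketch of the continuation-quote rules; not the real Swift lexer.
enum LiteralError: Error {
    case missingOpeningQuote
    case unterminated(lastGoodLine: Int)
}

// Scans one line's text after its opening or continuation quote.
// Escapes are kept raw: a backslash simply protects the next character.
func scanBody(_ body: Substring) -> (text: String, closed: Bool) {
    var text = ""
    var escaped = false
    for ch in body {
        if escaped {
            text.append(ch)
            escaped = false
        } else if ch == "\\" {
            text.append(ch)
            escaped = true
        } else if ch == "\"" {
            return (text, true)   // an unescaped quote closes the literal
        } else {
            text.append(ch)
        }
    }
    return (text, false)          // end of line reached with no closing quote
}

func parseContinuationLiteral(_ lines: [String]) throws -> String {
    var pieces: [String] = []
    for (index, line) in lines.enumerated() {
        // Whitespace before the quote is source indentation, not content.
        let stripped = line.drop(while: { $0 == " " || $0 == "\t" })
        guard stripped.first == "\"" else {
            if index == 0 { throw LiteralError.missingOpeningQuote }
            // Rule 2: the next line holds something other than a continuation quote.
            throw LiteralError.unterminated(lastGoodLine: index - 1)
        }
        let (text, closed) = scanBody(stripped.dropFirst())
        pieces.append(text)
        // Rule 1: each continued line contributes a newline plus its contents.
        if closed { return pieces.joined(separator: "\n") }
    }
    // Ran out of lines without an unescaped closing quote (or no lines given).
    if lines.isEmpty { throw LiteralError.missingOpeningQuote }
    throw LiteralError.unterminated(lastGoodLine: lines.count - 1)
}
```

Because the error carries the index of the last line that was still part of
the literal, a diagnostic built on it could point near the end of the
literal instead of at its start, which is the behaviour the Rationale
section argues for.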
>
> The exact error messages and diagnostics provided are left to the
> implementers to determine, but we believe it should be possible to provide
> two fix-its which will help users learn the syntax and correct string
> literal mistakes:
>
> * Insert |"| at the end of the current line to terminate the quote.
> * Insert |"| at the beginning of the next line (with some indentation
>   heuristics) to continue the quote on the next line.
>
>
> Impact on existing code
>
> Failing to close a string literal before the end of the line is currently a
> syntax error, so no valid Swift code should be affected by this change.
>
>
> Future directions for multiline string literals
>
> * We could permit comments before encountering a continuation quote to be
>   counted as whitespace, and permit empty lines in the middle of string
>   literals. This would allow you to comment out whole lines in the literal.
> * We could allow you to put a trailing backslash on a line to indicate
>   that the newline isn't "real" and should be omitted from the literal's
>   contents.
>
>
> Future directions for string literals in general
>
> There are other issues with Swift's string handling which this proposal
> intentionally does not address:
>
> * Reducing the amount of double-backslashing needed when working with
>   regular expression libraries, Windows paths, source code generation,
>   and other tasks where backslashes are part of the data.
> * Alternate delimiters or other strategies for writing strings
>   with |"| characters in them.
> * Accommodating code formatting concerns like hard wrapping and commenting.
> * String literals consisting of very long pieces of text which are best
>   represented completely verbatim, with minimal alteration.
>
> This section briefly outlines some future proposals which might address
> these issues. Combined, we believe they would address most of the string
> literal use cases which Swift is currently not very good at.
>
> Please note that these are simply sketches of hypothetical future designs;
> they may radically change before proposal, and some may never be proposed
> at all. Many, perhaps most, will not be proposed for Swift 3. We are
> sketching these designs not to propose and refine these features
> immediately, but merely to show how we think they might be solved in ways
> which complement this proposal.
>
>
> String literal modifiers
>
> A string literal modifier is a cluster of identifier characters which goes
> before a string literal and adjusts the way it is parsed. Modifiers only
> alter the interpretation of the text in the literal, not the type of data
> it produces; for instance, there will never be something like the
> UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a
> feature; lowercase characters disable a feature.
>
> Modifiers can be attached to both single-line and multiline literals, and
> could also be attached to other literal syntaxes which might be introduced
> in the future. When used with multiline strings, only the starting quote
> needs to carry the modifiers, not the continuation quotes.
>
> Modifiers are an extremely flexible feature which can be used for many
> purposes. Of the ideas listed below, we believe the |e| modifier is an
> urgent addition which should be included in Swift 3 if at all possible; the
> others are less urgent and most of them could be deferred, or at least
> added later if time allows.
>
> * *Escape disabling*: |e"\\\"| (string with three backslash characters)
> * *Fine-grained escape disabling*: |i"\(foo)\n"| (the string |\(foo)|
>   followed by a newline); |eI"\(foo)\n"| (the contents of |foo| followed
>   by the string |\n|); |b"\w+\n"| (the string |\w+| followed by a newline)
> * *Alternate delimiters*: |_| has no lowercase form, so it could be used
>   to allow strings with internal quotes: |_"print("Hello, world!")"_|,
>   |__"print("Hello, world!")"__|, etc.
> * *Whitespace normalization*: changes all runs of whitespace in the
>   literal to single space characters; this would allow you to use
>   multiline strings purely to improve code formatting.
>
>     alert.informativeText = W"\(appName) could not typeset the element “\(title)” because
>                              "it includes a link to an element that has been removed from this
>                              "book."
>
> * *Localization*:
>
>     alert.informativeText = LW"\(appName) could not typeset the element “\(title)” because
>                               "it includes a link to an element that has been removed from this
>                               "book."
>
> * *Comments*: Embedding comments in string literals might be useful for
>   literals containing regular expressions or other code.
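
For the whitespace-normalization idea in the list above, this is roughly
what the transformation would amount to if written as a plain function
today. The name normalizingWhitespace is hypothetical, and the sketch takes
the description literally: every run of whitespace, including the newline a
continuation line would introduce, becomes one space, and leading or
trailing runs are collapsed rather than removed.

```swift
// Hypothetical stand-in for the sketched `W` modifier: collapses each run of
// whitespace (spaces, tabs, newlines) into a single space character.
func normalizingWhitespace(_ text: String) -> String {
    var result = ""
    var previousWasWhitespace = false
    for ch in text {
        if ch.isWhitespace {
            // Emit one space for the first character of a run, skip the rest.
            if !previousWasWhitespace { result.append(" ") }
            previousWasWhitespace = true
        } else {
            result.append(ch)
            previousWasWhitespace = false
        }
    }
    return result
}
```

Applied to the alert text above, the line breaks between "because", "it
includes ..." and "book." would each turn into a single space, so the
literal could be wrapped in source without changing the resulting message.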
>
> Eventually, user-specified string modifiers could be added to Swift,
> perhaps as part of a hygienic macro system. It might also become possible
> to change the default modifiers applied to literals in a particular file or
> scope.
>
>
> Heredocs or other "verbatim string literal" features
>
> Sometimes it really is best to just splat something else down in the middle
> of a file full of Swift source code. Maybe the file is essentially a
> template and the literals are a majority of the code's contents, or maybe
> you're writing a code generator and just want to get string data into it
> with minimal fuss, or maybe people unfamiliar with Swift need to be able to
> edit the literals. Whatever the reason, the normal string literal syntax is
> just too burdensome.
>
> One approach to this problem is heredocs. A heredoc allows you to put a
> placeholder for a literal on one line; the contents of the literal begin on
> the next line, running up to some delimiter. It would be possible to put
> multiple placeholders in a single line, and to apply string modifiers to them.
>
> In Swift, this might look like:
>
> print(#to("---") + e#to("END"))
> It was a dark and stormy \(timeOfDay) when
> ---
> the Swift core team invented the \(interpolation) syntax.
> END
>
> Another possible approach would be to support traditional multiline string
> literals bounded by a different delimiter, like |"""|. This might look like:
>
> print("""
> It was a dark and stormy \(timeOfDay) when
> """ + e"""
> the Swift core team invented the \(interpolation) syntax.
> """)
>
> Although heredocs could make a good addition to Swift eventually, there are
> good reasons to defer them for now. Please see the "Alternatives
> considered" section for details.
>
>
> First-class regular expressions
>
> Members of the core team are interested in regular expressions, but they
> don't want to just build a literal that wraps PCRE or libicu; rather, they
> aim to integrate regexes into the pattern matching system and give them a
> deep, Perl 6-style rethink. This would be a major effort, far beyond the
> scope of Swift 3.
>
> In the meantime, the |e| modifier and perhaps other string literal
> modifiers will make it easier to specify regular expressions in string
> literals for use with |NSRegularExpression| and other libraries accessible
> from Swift.
>
>
> Alternatives considered
>
>
> Requiring no continuation character
>
> The main alternative is to not require a continuation quote, and simply
> extend the string literal from the starting quote to the ending quote,
> including all newlines between them. For example:
>
> let xml = "<?xml version=\"1.0\"?>
> <catalog>
> <book id=\"bk101\" empty=\"\">
> <author>\(author)</author>
> </book>
> </catalog>"
>
> This alternative is extensively discussed in the "Rationale" section above.
>
>
> Skip multiline strings and just support heredocs
>
> There are definitely cases where a heredoc would be a better solution, such
> as generated code or code which is mostly literals with a little Swift
> sprinkled around. On the other hand, there are also cases where multiline
> strings are better: short strings in code which is meant to be read. If a
> single feature can't handle them both well, there's no shame in supporting
> the two features separately.
>
> It makes sense to support multiline strings first because:
>
> * They extend existing syntax instead of introducing new syntax.
> * They are much easier to parse; heredocs require some kind of mode in
>   the parser which kicks in at the start of the next line, whereas
>   multiline string literals can be handled in the lexer.
> * As discussed in "Rationale", they offer better diagnostics, code
>   formatting, and visual scannability.
>
>
> Use a different delimiter for multiline strings
>
> The initial suggestion was that multiline strings should use a different
> delimiter, |"""|, at the beginning and end of the string, with no
> continuation characters between. Like heredocs, this might be a good
> alternative for certain use cases, but it has the same basic flaws as the
> "no continuation character" solution.
>
> --
> Brent Royal-Gordon
> Architechies
>
>
>