[swift-evolution] multi-line string literals.

John Holdsworth mac at johnholdsworth.com
Sun May 1 12:15:35 CDT 2016


Thanks Brent for pulling together the proposal and summarising this thread.

I have to say I still feel most drawn to your “continuation quotes” idea and after
some thought the “_” modifier for _”strings”with”quotes”_ also seems sensible.
Most of all, for me the appeal of these approaches is their absolute simplicity.
My only reservation is what external editors will make of these strings as there
is no precedent in another programming language I am aware of.

I’ve updated the "reference toolchain" and PR for testing and review.

http://johnholdsworth.com/swift-LOCAL-2016-05-01-a-osx.tar.gz
https://github.com/apple/swift/pull/2275

This implementation still contains the “e” modifier as an example of how
modifiers would be lexed (I’ll remove it before submission, as it is outside the
scope of this proposal) and one new feature: a \ before a newline causes that
newline to be ignored. In this implementation, modifiers can only be applied to
the first segment of the literal.

This makes the following strings valid to my mind:

        let xml = "\
            "<?xml version=\"1.0\"?>
            "<catalog>
            "   <book id=\"bk101\" empty=\"\">
            "       <author>\(author)</author>
            "       <title>XML Developer's Guide</title>
            "       <genre>Computer</genre>
            "       <price>44.95</price>
            "       <publish_date>2000-10-01</publish_date>
            "       <description>An in-depth look at creating \
                        "applications with XML.</description>
            "   </book>
            "</catalog>
            ""
        print(xml)

        assert( xml == _"<?xml version="1.0"?>
            "<catalog>
            "   <book id="bk101" empty="">
            "       <author>\(author)</author>
            "       <title>XML Developer's Guide</title>
            "       <genre>Computer</genre>
            "       <price>44.95</price>
            "       <publish_date>2000-10-01</publish_date>
            "       <description>An in-depth look at creating applications with XML.</description>
            "   </book>
            "</catalog>
            ""_ )

        try! NSRegularExpression(pattern: e"<([a-zA-Z][\w]*)", options: [])
            .enumerateMatches(in: xml, options: [], range: NSMakeRange(0, xml.utf16.count)) {
                (result, flags, stop) in
                print((xml as NSString).substring( with:result!.range(at: 1)))
        }
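
To make the intended semantics of the first example concrete, here is a minimal sketch of how
the segments of a continuation-quote literal could be joined. It is not the code from the PR:
joinContinuationLines is a hypothetical helper written in plain Swift, it assumes each source
line of the literal is passed in as-is, and it does not model escapes such as \" or
interpolation.

```swift
// Hypothetical helper (not from the PR): joins the raw source lines of a
// continuation-quote literal into the string value they are meant to denote.
// Assumption: escapes other than a trailing backslash are NOT processed.
func joinContinuationLines(_ sourceLines: [String]) -> String {
    var result = ""
    var needsNewline = false  // no newline before the very first segment
    for (index, line) in sourceLines.enumerated() {
        // Indentation before the opening or continuation quote is not content.
        var body = line.drop(while: { $0 == " " || $0 == "\t" })
        // Drop the opening or continuation quote itself.
        if body.first == "\"" { body = body.dropFirst() }
        // The last line also carries the closing quote.
        if index == sourceLines.count - 1, body.last == "\"" {
            body = body.dropLast()
        }
        if needsNewline { result.append("\n") }
        // A backslash before the newline means the newline is ignored.
        if body.last == "\\" {
            body = body.dropLast()
            needsNewline = false
        } else {
            needsNewline = true
        }
        result.append(contentsOf: body)
    }
    return result
}

let joined = joinContinuationLines([
    "\"\\",
    "    \"<a>",
    "    \"  <b/> \\",
    "        \"text",
    "    \"</a>",
    "    \"\"",
])
print(joined)
```

Under these assumptions the first "\ segment contributes nothing, the line ending in \ runs
straight into the following segment, and the final "" segment contributes only the trailing newline.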

I’d not create a lexStringMultilineLiteral() function just yet: the changes are still minor, as you
can see from the PR, and you can’t determine whether a string is multiline until you are halfway
through it. Excuse the “bottom up” approach, but my reasoning is that if the lexer and any changes
to it are kept minimal, the feature will in turn be easy to document and use.
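
As a rough illustration of the "half way through" point, the toy scanner below only learns
whether a literal is multiline once it reaches a line break inside it and looks ahead for a
continuation quote. This is a simplified sketch, not the lexer code in the PR: LiteralKind and
classifyLiteral are hypothetical names, and modifiers and interpolation are ignored.

```swift
// Hypothetical classification of a string literal (not the PR's lexer).
enum LiteralKind { case singleLine, multiLine, unterminated }

// Scans `source`, which is assumed to start at the opening quote.
func classifyLiteral(_ source: String) -> LiteralKind {
    var chars = source.makeIterator()
    guard chars.next() == "\"" else { return .unterminated }
    var escaped = false
    while let c = chars.next() {
        if c.isNewline {
            // Only at this point can we decide: skip the indentation and
            // check whether the next line opens with a continuation quote.
            while let next = chars.next() {
                if next == " " || next == "\t" { continue }
                return next == "\"" ? .multiLine : .unterminated
            }
            return .unterminated
        }
        if escaped { escaped = false; continue }   // skip the escaped character
        if c == "\\" { escaped = true; continue }  // start of an escape
        if c == "\"" { return .singleLine }        // closed on the same line
    }
    return .unterminated
}

let single = classifyLiteral("\"hello\"")
let multi = classifyLiteral("\"<catalog>\n    \"</catalog>\"")
let broken = classifyLiteral("\"<catalog>\nlet x = 1")
```

The line break is checked before the escape state, so a \ directly before the newline still
leads to the look-ahead for a continuation quote.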

John


> On 1 May 2016, at 13:04, L Mihalkovic <laurent.mihalkovic at gmail.com> wrote:
> 
> [couple minutes read]
> 
> I read with great attention this thread, trying to see it from the implementation viewpoint (I know that the compiler structure should not drive the language features). I also revisited the how-to-contribute notes as well as the dev-process description. One of the ideas that stood out in my mind was that when looking at an implementation, enablement changes should be separated from the bulk of the feature, such that reviews can be easier.
> 
> So I tried to elevate this to the rank of a hidden-mandatory-requirement for anything related to this feature. It led me to a staged approach to this feature that would allow a lot of things to be done, OVER TIME.
> 
> When distilling this feature down to the smallest enabling change that would have to be added to the compiler, I came to the following short list:
> 
> * Add a string_multiline_token to the lexer. I realize that the current lexer can be tweaked to work (as per John’s PR), but IMO adding a dedicated "hole" in the parsing code is what will give something working today (no difference from current compiler behavior) while allowing all future changes to be cleanly isolated from anything around them.
> * If one accepts the idea of a hole created by the token, then it stands to reason to have delimiters around it. Looking at the structure of the grammar, I came to the conclusion that _” and “_ were an easy, unambiguous choice (I believe “”” and “”” looked like an equally easy and unambiguous choice).
> * The next choice should be the creation of lexStringMultilineLiteral() and lexMultilineCharacter() methods in the Lexer. Again… bear with me, I do believe it is relevant to what everyone wants this feature to be… The latter method should contain only extensions specific to multiline literals, delegating common use cases to lexCharacter().
> 
> The main point of following this route (or any equivalent) is that:
> * it represents a very clear commitment to multiline string literals
> * it ensures that there is no strong commitment to feature details, while allowing many future scenarios
> * it will remain backward compatible with enhancements to the current string literal syntax (translation?)
> * external contributors will be able to prototype while making sure we stay within strict boundaries for integration with the compiler
> 
> The next equally small step would be to describe the required minimal changes to the Parser, a step I do not want to take now if the compiler experts see no merit at all in the proposed staged approach.
> 
> 
> 
> A thought experiment pushing further down this path shows how the following would be equally possible language features (with roughly equivalent implementation cost):
> 
> let whyOwhy = """\
>     !!    Can't understand what improvements it truly delivers
>     !!        It basically removes a handful of characters
>     !!    It works today
>     !!        But I don't see it as a likable foundations for adding in future enhancements
>     !!\
>     !!    I don't envy the people who will have to support it outside of xcode
>     !!        Or even in xcode (considering how it currently struggles with indents/formatting
>     !!    As for elegance, beauty is in the eye of the beholder, they say.
> """
> var json1 = _"[json]\
>     !!{
>     !!  "file" : "\(wishIhadPlaceholders)_000.md"
>     !!  "desc" : "and why are all examples in xml, i thought it died a while ago ;-)"
>     !!  "rational" : [
>     !!          "Here we go again"
>     !!          "How will xcode help make these workable"
>     !!       ]
>     !!}
> "_
> var json2 = _"[json]\
> {
>   "file" : "\(wishIhadPlaceholders)_000.md"
>   "desc" : "and why are all examples in xml, i thought it died a while ago ;-)"
>   "rational" : [
>           "Here we go again"
>           "How will xcode help make these workable"
>        ]
> }
> "_
> 
>  [_"]  --> start string
>  [_"\] --> start line + ignore spaces until eol (basically swallow \r\n)
>  [!!\] --> ignore everything until eol... basically the gap does not exist
>  ["_]  --> terminate string
>  [_"[TYPEID]\] --> start string, knowing that a verifier or a formatter (or a chain of them) that understands TYPEID can syntax check it, format it, and so on
> 
> 
> IMO splitting these expressions from the current lexing/parsing has other long-term benefits when coupled with the aforementioned idea of contents tagging:
> * allow external dedicated formatters to be created in any editor supporting Swift
> * allow external validators (including in the form of compiler plugins)
> * open a door for an equivalent to Scala's macros for contents marked as [swift]
> 
> Once again I fully appreciate that implementation should not drive language design, but considering the flurry of great ideas, I thought it might in this instance be useful to identify a minimal, noncommittal, direction common to many scenarios, such that a step can be taken that will neither favor nor prohibit any of the proposals, but simply enable them all.
> 
> Thank you for your patience
> Regards
> 
> PS: I am working on a rudimentary implementation that I hope could help people test all the ideas floating in this list. 
> 
> 
>> On Apr 26, 2016, at 8:04 AM, Chris Lattner via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>> On Apr 25, 2016, at 5:22 PM, Brent Royal-Gordon <brent at architechies.com> wrote:
>>>>> 3. It might be useful to make multiline `"` strings trim trailing whitespace and comments like Perl's `/x` regex modifier does.
>>>> 
>>>> If you have modifier characters already, it is easy to build a small zoo full of these useful beasts.
>>> 
>>> Modifiers are definitely a workable alternative, and can be quite flexible, particularly if a future macro system can let you create new modifiers.
>> 
>> Right. I consider modifiers to be highly precedented in other languages, and therefore proven to work.  If we go this way, I greatly prefer prefix to postfix modifiers.
>> 
>>>>> * Alternative delimiters: If a string literal starts with three, or five, or seven, or etc. quotes, that is the delimiter, and fewer quotes than that in a row are simply literal quote marks. Four, six, etc. quotes is a quote mark abutting the end of the literal.
>>>>> 
>>>>> 	let xml: String = """<?xml version="1.0"?>
>>>>> 				"""<catalog>
>>>>> 				"""\t<book id="bk101" empty="">
>>>>> 				"""\t\t<author>\(author)</author>
>>>>> 				"""\t</book>
>>>>> 				"""</catalog>"""
>>>>> 
>>>>> You can't use this syntax to express an empty string, or a string consisting entirely of quote marks, but `""` handles empty strings adequately, and escaping can help with quote marks. (An alternative would be to remove the abutting rule and permit `""""""` to mean "empty string", but abutting quotes seem more useful than long-delimiter empty strings.)
>>>> 
>>>> I agree that there is a need to support alternative delimiters, but subjectively, I find this to be pretty ugly.  It is also a really unfortunate degenerate case for “I just want a large blob of XML” because you’d end up using “"” almost all the time, and you have to use it on every line.
>>> 
>>> On the other hand, the `"""` does form a much larger, more obvious continuation indicator. It is *extremely* obvious that the above line is not Swift code, but something else embedded in it. It's also extremely obvious what its extent is: when you stop seeing `"""`, you're back to normal Swift code.
>> 
>> Right, but it is also heavy weight and ugly.  In your previous email you said about the single quote approach: "The quotation marks on the left end up forming a column that marks the lines as special”, so I don’t see a need for a triple quote syntax to solve this specific problem.
>> 
>>> I *really* don't like the idea of our only alternatives being "one double-quote mark with backslashing" or "use an entire heredoc". Heredocs have their place, but they are a *very* heavyweight quoting mechanism, and relatively short strings with many double-quotes are pretty common. (Consider, for instance, strings containing unparsed JSON.) I think we need *some* alternative to double-quotes, either single-quotes (with the same semantics, just as an alternative) or this kind of quote-stacking.
>> 
>> I agree that this is a real problem that would be great to solve.
>> 
>> If I step back and look at the string literal space we’re discussing, I feel like there are three options:
>> 
>> 1) single and simple multiline strings, using “
>> 2) your triple quote sort of string, specifically tuned to avoid having to escape “ when it occurs once or twice in sequence.
>> 3) heredoc, which is a very general (but also very heavy weight) solution to quoting problems.
>> 
>> I’m trying to eliminate the middle one, so we only have to have "two things”.  Here are some alternative ways to solve the problem, which might have less of an impact on the language:
>> 
>> A) Introduce single quoted string literals to avoid double quote problems specifically, e.g.:   ‘look “here” I say!’.  This is another form of #2 which is less ugly.  It also doesn’t help you if you have both “ and ‘ in your string.
>> 
>> B) Introduce a modifier character that requires a more complex closing sequence to close off the string, see C++ raw string literals for prior art on this approach.  Perhaps something like:
>> 
>> 	 Rxxx”look “ here “ I can use quotes “xxx
>> 
>> That said, I still prefer C) "ignore this issue for now”.  In other words, I wouldn’t want to block progress on improving the string literal situation overall on this issue, because anything we do here is a further extension to a proposal that doesn’t solve this problem.
>> 
>>> 
>>>> For cases like this, I think it would be reasonable to have a “heredoc” like scheme, which does not allow leading indentation, and does work with all the same modifier characters above.  I do not have a preference on a particular syntax, and haven’t given it any thought, but this would allow you to do things like:
>>>> 
>>>> 	let str = <<EOF
>>>> <?xml version="1.0"?>
>>>> <catalog>
>>>> \t<book id="bk101" empty="">
>>>> \t\t<author>\(author)</author>
>>>> \t</book>
>>>> </catalog>
>>>> EOF
>>>> 
>>>> for example.  You could then turn off escaping and other knobs using the modifier character (somehow, it would have to be incorporated into the syntax of course).
>>> 
>>> There are two questions and a suggestion I have whenever heredoc syntax comes up.
>>> 
>>> Q1: Does the heredoc begin immediately, at the next line, or at the next valid place for a statement to start? Heredocs traditionally take the second approach.
>>> 
>>> Q2: Do you permit heredocs to stack—that is, for a single line to specify multiple heredocs?
>>> 
>>> S: During the Perl 6 redesign, they decided to use the delimiter's indentation to determine the base indentation for the heredoc:
>>> 
>>> 	func x() -> String {
>>> 		return <<EOF
>>> 		<?xml version="1.0"?>
>>> 		<catalog>
>>> 		\t<book id="bk101" empty="">
>>> 		\t\t<author>\(author)</author>
>>> 		\t</book>
>>> 		</catalog>
>>> 		EOF
>>> 	}
>>> 
>>> Does that seem like a good approach?
>> 
>> I think that either approach could work, that you have a lot more experience on these topics than I do, and I would expect a vigorous community debate about these topics. :-)
>> 
>> That said, if you look at what we’re discussing:
>> 
>> 1. “Continuation" string literals, to allow a multi-line string literal.  You and I appear to completely agree about this.
>> 2. Heredoc: You and I seem to agree that they are a good “fully general” solution to have, but there are the details you outline above to iron out.
>> 3. Modifier characters:  I’m in favor, but I don’t know where you stand.  There is also still much to iron out here (such as the specific characters).
>> 4. A way to avoid having to escape “ in a non-heredoc literal.  I’m still unconvinced, and think that any solution to this problem will be orthogonal to the problems solved by 1-3 (and therefore can be added after getting experience with the other parts).
>> 
>> If you agree that these are all orthogonal pieces, then treat them as such: I’d suggest that you provide a proposal that just tackles the continuation string literals.  This seems simple, and possible to get in for Swift 3.  After that, we can discuss heredoc and modifiers (if you think they’re a good solution) on their own threads.  If those turn out to be uncontroversial, then perhaps they can get in too.
>> 
>> On the heredoc aspects specifically, unless others chime in with strong opinions about the topics you brought up, I’d suggest that you craft a proposal for adding them with your preferred solution to these.  You can mention the other answers (along with their tradeoffs and rationale for why you picked whatever you think is right) in the proposal, and we can help the community hash it out.
>> 
>> What do you think?
>> 
>> -Chris
>> 
> 
