[swift-evolution] [Review] SE-0168: Multi-Line String Literals
Jarod Long
swift at lng.la
Wed Apr 12 19:51:05 CDT 2017
Thanks Brent, I really appreciate the thoughtful response. Apologies for anything I overlooked previously.
I agree with most of your points, although I still find myself preferring the common-whitespace logic and leading/trailing newline stripping when considering the pros and cons. It doesn't seem likely to gain traction though, so I won't spend more time on it.
Thanks again!
Jarod
On Apr 12, 2017, 16:35 -0700, Brent Royal-Gordon <brent at architechies.com>, wrote:
> > On Apr 12, 2017, at 11:58 AM, Jarod Long via swift-evolution <swift-evolution at swift.org> wrote:
> >
> > On a separate note, I'd like to bring up the de-indentation behavior I described earlier again. I still feel that having the position of the closing delimiter determine how much whitespace is de-indented is not very natural or intuitive, since I don't think there is any precedent in standard Swift styling for indenting a closing delimiter to the same level as its content.
>
> String literal delimiters are very different from other delimiters because they switch the parser into a different mode where characters are interpreted in vastly different ways, and every character has a significant meaning. For instance, it's good practice to put either a space, or a newline and indentation, between array and dictionary literal delimiters or curly brackets and their content, but this is not possible with a string literal because the space would count as part of the content. The closing delimiter works the same way: you can't outdent it, because whitespace is significant inside a string literal and outdenting would change the meaning.
>
> I think that this probably seems way weirder on paper than it really is in practice. I recommend that you try it and see how it feels.
>
> > Stripping the most common whitespace possible from each line seems to be a much more intuitive and flexible solution in terms of formatting, and it's still compatible with the proposed formatting if that's anyone's preference.
>
> I discuss this at length in the Rationale section for indentation stripping. If you'll forgive me for quoting myself:
>
> > We could instead use an algorithm where the longest common whitespace prefix is removed from all lines; in well-formed code, that would produce the same behavior as this algorithm. But when not well-formed—when one line was accidentally indented less than the delimiter, or when a user mixed tabs and spaces accidentally—it would lead to valid, but incorrect and undiagnosable, behavior. For instance, if one line used a tab and other lines used spaces, Swift would not strip indentation from any of the lines; if most lines were indented four spaces, but one line was indented three, Swift would strip three spaces of indentation from all lines. And while you would still be able to create a string with all lines indented by indenting the closing delimiter less than the others, many users would never discover this trick.
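A minimal Swift sketch of the longest-common-whitespace-prefix rule described above, to make its failure modes concrete. The function names are hypothetical, the input is assumed to be the literal's source lines as plain strings, and blank lines get no special treatment.

```swift
/// Returns the leading run of spaces and tabs in `line`.
func leadingWhitespace(_ line: String) -> String {
    return String(line.prefix(while: { $0 == " " || $0 == "\t" }))
}

/// Longest common prefix of two strings, compared character by character.
func commonPrefix(_ a: String, _ b: String) -> String {
    var result = ""
    for (x, y) in zip(a, b) {
        guard x == y else { break }
        result.append(x)
    }
    return result
}

/// Hypothetical "common whitespace" rule: strip the longest whitespace prefix
/// shared by every line. Mistakes are silently absorbed; nothing is reported.
func stripCommonWhitespace(_ lines: [String]) -> [String] {
    guard let first = lines.first else { return [] }
    // Shrink the candidate prefix until every line shares it.
    let shared = lines.dropFirst().reduce(leadingWhitespace(first)) { prefix, line in
        commonPrefix(prefix, leadingWhitespace(line))
    }
    // Every line starts with `shared`, so dropping that many characters removes it exactly.
    return lines.map { String($0.dropFirst(shared.count)) }
}
```

With four-space lines and one three-space line, three spaces come off every line; with one tab-indented line among space-indented ones, nothing is stripped at all. In neither case does the rule have any basis for a diagnostic.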
> Let me provide an example to illustrate what I'm talking about. Suppose you want to say this:
>
> ····xml += """\↵
> ············<book id="bk\(id)">↵
> ················<author>\(author)</author>↵
> ················<title>\(title)</title>↵
> ················<genre>\(genre)</genre>↵
> ················<price>\(price)</price>↵
> ············</book>↵
> ············"""↵
>
> But instead, you miss just one little insignificant character:
>
> ····xml += """\↵
> ···········<book id="bk\(id)">↵
> ················<author>\(author)</author>↵
> ················<title>\(title)</title>↵
> ················<genre>\(genre)</genre>↵
> ················<price>\(price)</price>↵
> ············</book>↵
> ············"""↵
>
> This is the kind of mistake you will almost certainly never notice by hand inspection. You probably can't see the mistake without looking very carefully—and this is with invisible whitespace replaced with visible dots! But in the least-common-whitespace design, it's perfectly valid, and generates this:
>
> <book id="bk\(id)">↵
> ·····<author>\(author)</author>↵
> ·····<title>\(title)</title>↵
> ·····<genre>\(genre)</genre>↵
> ·····<price>\(price)</price>↵
> ·</book>↵
> ·
>
> That is not what you wanted. I'm pretty sure it's almost *never* what you want. But it's valid, it's going to be accepted, and it's going to affect every single line of the literal in a subtle way. (Plus the next line, thanks to that trailing space!) It's not something we can warn about, either, because it's perfectly valid. To fix it, you'll have to notice it's wrong and then work out why that happened.
>
> In the proposed design, on the other hand, we have a single source of truth for indentation: the last line tells us how much we should remove. That means we can actually call a mistake a mistake. The very same example, run through the proposed algorithm, produces this, plus a warning on the first line:
>
> ···········<book id="bk\(id)">↵
> ····<author>\(author)</author>↵
> ····<title>\(title)</title>↵
> ····<genre>\(genre)</genre>↵
> ····<price>\(price)</price>↵
> </book>↵
>
> Notice that there is only one line that comes out incorrectly, that it's the line which has the mistake, that the mistake is large and noticeable in the output, *and* that we were also able to emit a compile-time warning pointing to the exact line of code that was mistaken. That outcome is night-and-day better.
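For comparison, a sketch of the proposed rule under the same assumptions: the whitespace before the closing delimiter is the single source of truth, and any line that doesn't begin with it is left alone and reported. `StrippedLiteral` and `stripDelimiterIndentation` are illustrative names, not compiler APIs, and warnings are modeled simply as line indices.

```swift
/// Output of the delimiter-based rule: stripped lines plus the indices of
/// lines that would receive a warning.
struct StrippedLiteral {
    var lines: [String]
    var warnings: [Int]
}

func stripDelimiterIndentation(_ content: [String], delimiterIndent: String) -> StrippedLiteral {
    var stripped: [String] = []
    var warnings: [Int] = []
    for (index, line) in content.enumerated() {
        if line.starts(with: delimiterIndent) {
            // Remove exactly the delimiter's indentation, nothing more.
            stripped.append(String(line.dropFirst(delimiterIndent.count)))
        } else {
            // Leave the line untouched so the mistake stays visible, and flag it.
            stripped.append(line)
            warnings.append(index)
        }
    }
    return StrippedLiteral(lines: stripped, warnings: warnings)
}
```

Run on a cut-down version of the mistyped XML example (11 spaces before `<book>`, 12 before the delimiter), only the first line is left unstripped and flagged; the other lines lose exactly 12 spaces.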
>
> Now consider mixed tabs and spaces:
>
> ····xml += """\↵
> ············<book id="bk\(id)">↵
> ················<author>\(author)</author>↵
> ········⇥ ····<title>\(title)</title>↵
> ················<genre>\(genre)</genre>↵
> ················<price>\(price)</price>↵
> ············</book>↵
> ············"""↵
>
> (I'm assuming a tab stop of 4, so mentally adjust that example if you need to.)
>
> With your design, the compiler happily removes the common whitespace and writes code which does this:
>
> ····<book id="bk\(id)">↵
> ········<author>\(author)</author>↵
> ⇥ ····<title>\(title)</title>↵
> ········<genre>\(genre)</genre>↵
> ········<price>\(price)</price>↵
> ····</book>↵
> ····
>
> Once again, every line is affected—including lines after this snippet, since there are spaces after the last newline. Once again, there can be no warning. You'll need to notice the problem and then figure out what happened.
>
> By contrast, with the proposed design, you get this, plus a warning:
>
> <book id="bk\(id)">↵
> ····<author>\(author)</author>↵
> ········⇥ ····<title>\(title)</title>↵
> ····<genre>\(genre)</genre>↵
> ····<price>\(price)</price>↵
> </book>↵
>
> Once again, the only line that's affected is the bad line, *and* you get a warning. In this case, I think the warning could probably point you to the exact *character* that causes the problem.
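Because the delimiter's indentation is known exactly, a diagnostic can locate the first character that diverges from it. A minimal sketch, assuming a hypothetical helper `indentationMismatchColumn` that counts characters (not visual tab-expanded columns):

```swift
/// Hypothetical diagnostic helper: the zero-based character offset at which
/// `line` stops matching the closing delimiter's indentation, or nil if the
/// line carries the full indentation.
func indentationMismatchColumn(line: String, delimiterIndent: String) -> Int? {
    var column = 0
    for (expected, actual) in zip(delimiterIndent, line) {
        if expected != actual { return column }   // e.g. a tab where a space was expected
        column += 1
    }
    // All compared characters matched; the line may still be too short.
    return line.count < delimiterIndent.count ? line.count : nil
}
```

For the mixed tabs-and-spaces line above (eight spaces, then a tab), this points at offset 8, the tab itself.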
>
> Basically, common-whitespace-prefix makes the compiler act like a dumb computer that does what you say, not what you want. The proposed algorithm makes the compiler act like a smart human that notices when you ask for something that doesn't make sense and tells you about the problem.
>
> (Also note how, if you want a trailing newline, you still end up having the delimiter on a separate line aligned with the other text anyway! Stripping the common whitespace prefix in practice still ends up looking exactly the same as what you object to.)
>
> > The only functional limitation that I see is that you can't have leading whitespace in the interpreted string if you actually want that. That doesn't seem like a very important use case to me,
>
> We showed an example of this being done in the Rationale section, and it was a *very* plausible example. I don't think it's rare or unnecessary at all; I think it's a really important use case, particularly for generating pretty-printed code or markup.
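As an illustration in the multi-line string syntax that Swift 4 eventually shipped (which, unlike the draft discussed here, excludes the newline before the closing delimiter), indenting a line past the closing delimiter preserves the extra whitespace, which is exactly what a code generator needs. `makeFunction(named:)` is a hypothetical helper.

```swift
/// Emits a small pretty-printed Swift function. The `print` line keeps four
/// spaces because it is indented four columns past the closing delimiter.
func makeFunction(named name: String) -> String {
    return """
        func \(name)() {
            print("hello")
        }
        """
}
```

The other lines lose exactly the closing delimiter's indentation, so the generated code starts flush at column zero.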
>
> > but if we think it is important, it could be supported by something like having a backslash in the leading whitespace at the location where it should be preserved from.
>
> There are good reasons not to allow backslashing of several different varieties of whitespace, and people were really unhappy with designs that required them to modify every line of text. I think this is a non-starter.
>
> > If we're set on the proposed behavior, have we considered what happens if the closing delimiter goes beyond the non-whitespace content of the string?
> >
> > let string = """
> > aa
> > bb
> > cc
> > """
> >
> > Does it strip the non-whitespace characters? Does it strip up to the non-whitespace characters? Does it generate an error?
>
> It strips nothing and generates a warning on each offending line (but not an error, because whitespace problems are usually minor enough that there's no need to interrupt your debugging to fix some indentation). This was covered in the proposal.
>
> (In an example like this, where every line is less indented than the delimiter, we might emit a different warning suggesting that the delimiter's indentation is wrong. That's a quality-of-implementation (QoI) issue, though, not the kind of thing we need to cover in a proposal.)
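One way such a quality-of-implementation choice could look, as a sketch only: if every content line lacks the delimiter's indentation, blame the delimiter once; otherwise warn per line. `IndentationDiagnostic` and `diagnose` are hypothetical names, and the proposal does not specify this logic.

```swift
/// Hypothetical diagnostics for the delimiter-based rule.
enum IndentationDiagnostic: Equatable {
    case lineUnderIndented(line: Int)   // one warning per offending line
    case delimiterOverIndented          // every line is short: blame the delimiter instead
}

func diagnose(_ content: [String], delimiterIndent: String) -> [IndentationDiagnostic] {
    // Indices of content lines that don't begin with the delimiter's indentation.
    let offending = content.indices.filter { !content[$0].starts(with: delimiterIndent) }
    if !content.isEmpty && offending.count == content.count {
        return [.delimiterOverIndented]
    }
    return offending.map { IndentationDiagnostic.lineUnderIndented(line: $0) }
}
```

For content lines `aa`, `bb`, `cc` with a delimiter indented four spaces, this yields a single delimiter warning instead of three line warnings.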
>
> --
> Brent Royal-Gordon
> Architechies
>