[swift-evolution] [Pitch] Raw mode string literals

Chris Lattner clattner at nondot.org
Fri Nov 24 18:15:35 CST 2017

<email reordered a bit below to make responding easier>:

On Nov 24, 2017, at 11:12 AM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
> I think we've circled back to a topic that we've discussed here before. I do agree that having more of this validation at compile time would improve the experience. However, I can see a few drawbacks to the _compiler_ doing the validation:
> - As seen in these discussions about string literals where users want to copy and paste text and have it "just work," supporting only one dialect in regex literals will inevitably lead users to ask for other types of regex literals for each individual flavor of regex they encounter.

Focusing first on the user model instead of implementation details: 

I don’t see why this is desirable at all.  If someone came to the Perl community and said “I want to use unmodified tcl regexp syntax”, the Perl community would politely tell them to buzz off.  They can just use string literals.

Allowing // syntax to support different grammars makes the Swift language more complex for users (independent of implementation details) and I don’t see any benefit to allowing that.  IMO, we’d be much better off having a single blessed syntax, making it work as well as possible, and steering the community strongly towards using it.

Someone wanting to use NSRegularExpression or a bsd regex library or whatever can use string literals, just like they do now.  This has the *advantage* that you don’t look at the code using //’s and think it does something it doesn’t.
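
For illustration, here is a minimal sketch of that status quo, assuming Foundation's NSRegularExpression. The helper name firstMatch(of:in:) is hypothetical and not part of any proposal. The pattern is an ordinary string literal, so backslashes are doubled and a malformed pattern is only detected at run time:

```swift
import Foundation

// Hypothetical helper: returns the first substring of `text` matching `pattern`.
// The pattern is a plain string, so it is only validated when this code runs.
func firstMatch(of pattern: String, in text: String) -> String? {
    // The initializer throws if the pattern is malformed.
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let whole = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole),
          let range = Range(match.range, in: text) else { return nil }
    return String(text[range])
}

// Backslashes are doubled because the pattern lives in a normal string literal.
let digits = firstMatch(of: "\\d+", in: "order 42 shipped")

// An unbalanced paren compiles fine as a string and only fails at run time.
let broken = firstMatch(of: "(", in: "order 42 shipped")
```

Nothing about the call site says which regex dialect the string is written in; that is determined by the library the string is handed to.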

> - In the absence of a `constexpr`-like facility, supporting runtime expressions would mean we'd be writing the same code twice, once in C++ for compile-time validation of literal expressions and another time in Swift for runtime expressions.

Agreed.  There are various ways we could factor this logic, including having the regex parser + tree representation be literally linked into both the compiler and stdlib.  I don’t think the cost is great, and we definitely do such things already.  If we do this right, the functionality can subsume tools like flex as well, which means we’d get a net reduction of complexity in the whole system.

> > 2) I’d like to explore the idea of making // syntax be *patterns* instead of simply literals.  As a pattern, it should be possible to bind submatches directly into variable declarations, eliminating the need to count parens in matches or other gross things.  Here is strawman syntax with a dumb example:
> > if case /([a-zA-Z]+: let firstName) ([a-zA-Z]+: let lastName)/ = getSomeString() {
> >    print(firstName, lastName)
> > }
> This is an interesting idea. But is it significantly more usable

I don’t know if this is the ideal way to do this.  As I mentioned before, I think we need to have a concerted design effort that considers such things.  Regex functionality does fit naturally with pattern matching though, so I don’t think we should discard it too early.

> than the same type having a collection of named matches using the usual Perl syntax?
>   if case /(?<firstName>[a-zA-Z]+) (?<lastName>[a-zA-Z]+)/ = getSomeString() {
>     print(Regex.captured["firstName"], Regex.captured["lastName"])
>   }

Personally, I really don’t like this.  It turns a structured problem into one that violates DRY and loses the structure inherent in the solution.  Also, while theoretically the dictionary could be optimized away, in practice that would be difficult to do without heroics.
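
To make the DRY concern concrete, here is a rough sketch of the named-capture style using what exists today, assuming Foundation's NSRegularExpression and NSTextCheckingResult.range(withName:). The helper namedCaptures(_:pattern:in:) is hypothetical. Each capture name appears once in the pattern and again as a string at the lookup site, and nothing checks that the two agree:

```swift
import Foundation

// Hypothetical helper: collects the requested named captures into a dictionary.
func namedCaptures(_ names: [String], pattern: String, in text: String) -> [String: String]? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let whole = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole) else { return nil }
    var result: [String: String] = [:]
    for name in names {
        // The name is repeated here as a string, separately from the pattern.
        if let range = Range(match.range(withName: name), in: text) {
            result[name] = String(text[range])
        }
    }
    return result
}

let captured = namedCaptures(["firstName", "lastName"],
                             pattern: "(?<firstName>[a-zA-Z]+) (?<lastName>[a-zA-Z]+)",
                             in: "Ada Lovelace")

let first = captured?["firstName"]
// A misspelled key compiles without complaint and silently yields nil.
let typo = captured?["firstname"]
```

With the pattern-binding strawman, firstName and lastName would instead be ordinary local variables, so a misspelling would be a compile-time error and no dictionary would need to exist at run time.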

> > 3) I see regex string matching as the dual to string interpolation.  We already provide the ability for types to specify a default way to print themselves, and it would be great to have default regex’s associated with many types, so you can just say “match an Int here” instead of having to match [0-9]+ and then do a failable conversion to Int outside the regex.
> > 4) I’d like to consider some of the advances that Perl 6 added to its regex grammar.  Everyone knows that modern regex’s aren’t actually regular anyway, so it begs the question of how far to take it.  If nothing else, I appreciate the freeform structure supported (including inline comments) which make them more readable.
> Sounds like we want multiline regex literals :)

Yes, I absolutely do, but I want the // syntax to imply them.  It’s “single line” literal syntax that we should eliminate by default. 
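
As a point of comparison, a free-form, commented pattern can be approximated today with a multi-line string literal plus NSRegularExpression's .allowCommentsAndWhitespace option. This is only a sketch of what is possible with existing Foundation API, not the proposed // syntax, and the helper splitName(_:) is hypothetical:

```swift
import Foundation

// Free-form pattern: unescaped whitespace is ignored and `#` starts a comment,
// but only because of the .allowCommentsAndWhitespace option used below.
let pattern = """
([a-zA-Z]+)   # first name
\\s           # one whitespace character, written explicitly
([a-zA-Z]+)   # last name
"""

// Hypothetical helper: returns the two captured names, or nil if there is no match.
func splitName(_ text: String) -> (first: String, last: String)? {
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.allowCommentsAndWhitespace]) else {
        return nil
    }
    let whole = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole),
          let first = Range(match.range(at: 1), in: text),
          let last = Range(match.range(at: 2), in: text) else { return nil }
    return (String(text[first]), String(text[last]))
}

let name = splitName("Ada Lovelace")
```

Here the free-form behavior is opt-in through a flag and the captures are still counted by position, which is the opposite default from the one argued for above.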

