[swift-evolution] Strings in Swift 4
Chris Eidhof
chris at eidhof.nl
Wed Jan 25 12:52:28 CST 2017
One concern I have with implementing this through pattern matching is that
it might make it really hard to optimize. For example, consider the
following hypothetical fragment:
switch string {
case "x" (let remainder):
return arc4random() % 2 == 0 ? parse1() : parse2()
This would parse a string starting with x, and then continue parsing with
either parse1 or parse2.
This is strictly more powerful (it allows you to implement parsers that are
"dynamically" constructed depending on the input), and therefore less
optimizable.
However, IANACE (I am not a compiler engineer), so hopefully my pessimism
is unjust.
On Wed, Jan 25, 2017 at 7:23 PM, Joe Groff <jgroff at apple.com> wrote:
>
> On Jan 24, 2017, at 9:35 PM, Chris Lattner via swift-evolution <
> swift-evolution at swift.org> wrote:
>
> On Jan 24, 2017, at 12:05 AM, Chris Eidhof via swift-evolution <
> swift-evolution at swift.org> wrote:
>
>
> I agree that being able to implement parsers in a nice way can be a huge
> step forward in being really good at string processing.
>
>
> +1 from me as well, I agree with Joe that Swift can learn a lot from Perl
> 6 grammar’s and we should take the time to do it right. Below I say
> “regex” a lot, but I really mean a more general grammar system (and even
> Perl 5 regex’s aren’t regular :-)
>
> There are a couple of possibilities that come to mind directly:
>
> 1. Build parsers right into the language (like Perl 6 grammars)
> 2. Provide a parser combinator language (e.g. https://github.com/
> davedufresne/SwiftParsec).
> 3. Rely on external tools like bison/yacc/etc.
> 4. Make it easy for people to write hand-written parsers (e.g. by
> providing an NSScanner alternative).
>
>
> My opinion is that #1 is the right path to start with, but it wouldn’t
> preclude doing #2. Here’s my rationale / half-baked thought process:
>
> There are two important use cases for regex's: the literal case (e.g.
> /aa+b*/) and the dynamically computed case. The former is really what
> we’re talking about here, the latter should obviously be handled with some
> sort of Regex type which can be formed from string values or whatever.
> Regex literals in an expression context should default to producing the
> Regex type of course.
>
> This means that when you pass a regex literal into an API call (e.g. split
> on a string), it is really just creating something of Regex type, and
> passing it down. If you wanted to introduce a parser combinator DSL, you
> could totally plug it into the system, by having the combinators produce
> something of the Regex type.
>
> So why bless regex literals with language support at all? I see several
> reasons:
>
> 1. Diagnostics: These will be heavily used by people, and you want to have
> good compiler error and warning messages for them. You want to be able to
> validate the regex at compile time, not wait until runtime to detect
> syntactic mistakes like unbalanced parens.
>
> 2. Syntax Familiarity: To take advantage of people’s familiarity with
> other languages, we should strive to make the basic regex syntax familiar
> and obvious. I’d argue that /aa+b*/ should “just work” and do the thing
> you think it does. Relying on a combinator library to do that would be
> crazy.
>
> 3. Performance: Many regex’s are actually regular, so they can be
> trivially compiled into DFAs. There is a well understood body of work that
> can be simply dropped into the compiler to do this. Regex’s that are not
> regular can be compiled into hybrid DFA/NFA+backtracking schemes, and
> allowing a divide and conquer style of compiler optimization to do this is
> the path that makes the most sense (to me at least). Further, if you
> switch on a string and have a bunch of cases that are regex’s, you’d
> obviously want the compiler to generate a single state machine (like a
> lexer), not check each pattern in series.
>
> 4. Pattern matching greatness: One of the most obnoxious/error prone
> aspects of regex’s in many languages is that when you match a pattern, the
> various matches are dumped into numbered result values (often by the order
> of the parens in the pattern). This is totally barbaric: it begs for off
> by one errors, often breaks as the program is being evolved/maintained,
> etc. It is just as bad as printf/scanf!
>
> You should instead be able to directly bind subexpressions into local
> variables. For example if you were trying to match something like “42:
> Chris”, you should be able to use straw man syntax like this:
>
> case /(let id: \d+): (let name: \w+)/: print(id); print(name)
>
> Unless we were willing to dramatically expand how patterns work, this
> requires baking support into the language.
>
> 5. Scanner/“Formatter" integration: Taking the above one step farther, we
> could have default patterns for known types (and make it extensible to user
> defined types of course). For example, \d+ is the obvious pattern for
> integers, so you should be able to write the above like this (in principle):
>
> case /(let id: Int): (let name: \w+)/: print(id); print(name)
>
> In addition to avoiding having to specify \d+ all the time, this
> eliminates the need for a “string to int” conversion after the pattern is
> matched, because id would be bound as type Int already.
>
>
> Anyway, to summarize, I think that getting regex’s into the language is
> really important and expect them to be widely used. As such, I think it is
> worth burning compiler/language complexity to make them be truly great in
> Swift.
>
>
> Another good reason to bake them into pattern matching is that it would
> make it easier to optimize when you want to match one of multiple patterns.
> Often, you don't want just one grammar, but possibly one of many, and it'd
> be nice if switch-ing over multiple string patterns led to a reasonably
> efficient DFA/NFA/rec-descent machine based on the needs of the grammars
> being matched.
>
> -Joe
>
>
--
Chris Eidhof
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170125/0e224124/attachment.html>
More information about the swift-evolution
mailing list