[swift-evolution] Strings in Swift 4

Chris Eidhof chris at eidhof.nl
Wed Jan 25 09:27:16 CST 2017

Great stuff!

I wonder if built-in grammars (that's what Perl calls them) would work only
for things that are backed by string literals, or if it's worth the
time/effort to make them work for other kind of data as well. For example,
what if you write a grammar to tokenize (yielding some sequence of
`Token`s), and then want to parse those `Token`s? Or what if you want to
parse other kinds of data? Or should we try to make the 80% case work (only
provide grammar/regex literals for Strings) to avoid complexity?

I think it's worth looking at parser combinators. The really cool thing
about them is that they provide a few basic elements, and well-defined ways
to combine those elements. Specifically, it's interesting to look at the
problems surrounding parser combinators and seeing how we could do better:

- Error messages can be cryptic
- Because it's built on top of the language, you can't really do stuff like
binding subexpressions to local variables
- Once you allow something like flatMap in a parser combinator library, you
lose the possibility to optimize by rewriting the parser structure. This is
because the parser that comes out of the rhs of a flatMap can depend on the
parsed output of the lhs.

To me it seems like there's a lot of (exciting) work to be done to get this
right :).

On Wed, Jan 25, 2017 at 6:35 AM, Chris Lattner <sabre at nondot.org> wrote:

> On Jan 24, 2017, at 12:05 AM, Chris Eidhof via swift-evolution <
> swift-evolution at swift.org> wrote:
> I agree that being able to implement parsers in a nice way can be a huge
> step forward in being really good at string processing.
> +1 from me as well, I agree with Joe that Swift can learn a lot from Perl
> 6 grammar’s and we should take the time to do it right.  Below I say
> “regex” a lot, but I really mean a more general grammar system (and even
> Perl 5 regex’s aren’t regular :-)
> There are a couple of possibilities that come to mind directly:
> 1. Build parsers right into the language (like Perl 6 grammars)
> 2. Provide a parser combinator language (e.g. https://github.com/
> davedufresne/SwiftParsec).
> 3. Rely on external tools like bison/yacc/etc.
> 4. Make it easy for people to write hand-written parsers (e.g. by
> providing an NSScanner alternative).
> My opinion is that #1 is the right path to start with, but it wouldn’t
> preclude doing #2.  Here’s my rationale / half-baked thought process:
> There are two important use cases for regex's: the literal case (e.g.
> /aa+b*/) and the dynamically computed case.  The former is really what
> we’re talking about here, the latter should obviously be handled with some
> sort of Regex type which can be formed from string values or whatever.
> Regex literals in an expression context should default to producing the
> Regex type of course.
> This means that when you pass a regex literal into an API call (e.g. split
> on a string), it is really just creating something of Regex type, and
> passing it down.  If you wanted to introduce a parser combinator DSL, you
> could totally plug it into the system, by having the combinators produce
> something of the Regex type.
> So why bless regex literals with language support at all?  I see several
> reasons:
> 1. Diagnostics: These will be heavily used by people, and you want to have
> good compiler error and warning messages for them.  You want to be able to
> validate the regex at compile time, not wait until runtime to detect
> syntactic mistakes like unbalanced parens.
> 2. Syntax Familiarity: To take advantage of people’s familiarity with
> other languages, we should strive to make the basic regex syntax familiar
> and obvious.  I’d argue that /aa+b*/ should “just work” and do the thing
> you think it does.  Relying on a combinator library to do that would be
> crazy.
> 3. Performance: Many regex’s are actually regular, so they can be
> trivially compiled into DFAs.  There is a well understood body of work that
> can be simply dropped into the compiler to do this. Regex’s that are not
> regular can be compiled into hybrid DFA/NFA+backtracking schemes, and
> allowing a divide and conquer style of compiler optimization to do this is
> the path that makes the most sense (to me at least).  Further, if you
> switch on a string and have a bunch of cases that are regex’s, you’d
> obviously want the compiler to generate a single state machine (like a
> lexer), not check each pattern in series.
> 4. Pattern matching greatness: One of the most obnoxious/error prone
> aspects of regex’s in many languages is that when you match a pattern, the
> various matches are dumped into numbered result values (often by the order
> of the parens in the pattern).  This is totally barbaric: it begs for off
> by one errors, often breaks as the program is being evolved/maintained,
> etc.  It is just as bad as printf/scanf!
> You should instead be able to directly bind subexpressions into local
> variables.  For example if you were trying to match something like “42:
> Chris”, you should be able to use straw man syntax like this:
>    case /(let id: \d+): (let name: \w+)/: print(id); print(name)
> Unless we were willing to dramatically expand how patterns work, this
> requires baking support into the language.
> 5. Scanner/“Formatter" integration:  Taking the above one step farther, we
> could have default patterns for known types (and make it extensible to user
> defined types of course). For example, \d+ is the obvious pattern for
> integers, so you should be able to write the above like this (in principle):
>    case /(let id: Int): (let name: \w+)/: print(id); print(name)
> In addition to avoiding having to specify \d+ all the time, this
> eliminates the need for a “string to int” conversion after the pattern is
> matched, because id would be bound as type Int already.
> Anyway, to summarize, I think that getting regex’s into the language is
> really important and expect them to be widely used.  As such, I think it is
> worth burning compiler/language complexity to make them be truly great in
> Swift.
> -Chris

Chris Eidhof
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170125/28e14bb4/attachment.html>

More information about the swift-evolution mailing list