[swift-evolution] Empower String type with regular expression

Chris Lattner clattner at apple.com
Sun Jan 31 21:39:15 CST 2016


> On Jan 31, 2016, at 8:32 AM, Patrick Gili via swift-evolution <swift-evolution at swift.org> wrote:
> 
> There have been several threads that have discussed the notion of regular expression literals. However, I didn't see anyone putting together a formal proposal, and hence I took the liberty of doing so. I would appreciate discussion and comments on the proposal:

I am +1 on the concept of adding regex literals to Swift, but -1 on this proposal.

Specifically, instead of introducing regex literals, I’d suggest that you investigate introducing regexes to the pattern grammar, which is what Swift already uses for matching.  Regexes should be usable in the cases of a switch, for example.  Similarly, they should be able to bind variables directly to subpatterns.
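
As a rough sketch of what using a regex in the cases of a switch could look like with existing tools, the snippet below overloads ~= so that an NSRegularExpression can act as a case pattern. It is written in current Swift syntax; the ~= overload and the classify function are hypothetical helpers for illustration, not an existing API. This approach cannot bind variables to subpatterns, which is the part that would need support in the pattern grammar itself.

```swift
import Foundation

// Hypothetical helper, not part of Swift or Foundation: lets an
// NSRegularExpression appear as a `case` pattern when switching over a String.
func ~= (pattern: NSRegularExpression, value: String) -> Bool {
    let range = NSRange(value.startIndex..., in: value)
    return pattern.firstMatch(in: value, options: [], range: range) != nil
}

// Hypothetical example function: classifies input by the first regex that matches.
func classify(_ input: String) -> String {
    // try! is used only because these fixed patterns are known to be valid.
    let number = try! NSRegularExpression(pattern: "^[0-9]+$")
    let word = try! NSRegularExpression(pattern: "^[A-Za-z]+$")

    switch input {
    case number: return "number"   // evaluated as number ~= input
    case word:   return "word"     // evaluated as word ~= input
    default:     return "other"
    }
}
```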

Further, I highly recommend checking out Perl 6’s regular expressions.  The Perl community has had an obsessive passion for regular expressions, and in Perl 6 they were given the chance to reinvent the wheel based on what they learned.  What they came up with is very powerful, and pretty good all around.

-Chris


> 
> Regular Expression Literals
> 
> Proposal: SE-NNNN <https://github/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
> Author: Patrick Gili <https://github.com/gili-patrick-r>
> Status: Awaiting review
> Review manager: TBD
> 
> Introduction
> The Swift language has no native support for regular expressions, which makes working with regular expressions in Swift awkward, tedious, and error-prone. This proposal describes the addition of regular expression literals to Swift.
> 
> Swift-evolution thread: Empower String type with regular expression <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005126.html>
> 
> Motivation
> The definition of a pattern used by a regular expression is complicated by the fact that Swift does not support raw string literals (i.e., string literals that do not process character escapes or interpolation) or regular expression literals similar to Perl or Ruby. Rather, Swift only supports C-style string literals with support for interpolation. C-style string literals require a backslash to escape a backslash. The patterns representing regular expressions make frequent use of the backslash, and hence patterns represented by a Swift string literal become unreadable and difficult to maintain. For example, consider the following pattern to match and parse URLs:
> 
> ^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
> To define this pattern in Swift, an application needs to define a string, such as
> 
> let pattern = "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$"
> The latter is obviously difficult to read and maintain.
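
To make the escaping cost concrete, here is a minimal sketch that feeds a doubly escaped string literal to NSRegularExpression. The shortened pattern and the matches helper are assumptions made for illustration (they are not from the proposal), and the code uses current Swift syntax.

```swift
import Foundation

// Illustrative pattern, shorter than the proposal's URL pattern.
// The regex itself is  ^https?:\/\/[\da-z\.-]+  but every backslash
// has to be doubled inside an ordinary Swift string literal.
let schemeAndHost = "^https?:\\/\\/[\\da-z\\.-]+"

// Hypothetical helper: true if `pattern` matches somewhere in `input`.
func matches(_ pattern: String, in input: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return false   // an invalid pattern is treated as "no match" in this sketch
    }
    let range = NSRange(input.startIndex..., in: input)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}
```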
> 
> Proposed solution
> The proposed solution is to add regular expression literals to Swift, similar to the regular expression literals supported by other languages, such as Perl and Ruby. Regular expression literals differ from raw string literals in two primary ways. First, regular expression literals support string interpolation of the pattern, which allows an application to construct a regular expression at run-time. Second, a regular expression literal creates an instance of NSRegularExpression, rather than an instance of String. The following example declares a regular expression for the purpose of matching and parsing URIs with a specified scheme:
> 
> let scheme = "https"
> let urlRegex = /^(\(scheme)?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
> To make regular expression literals effective, a literal must be able to replace any explicit creation of an NSRegularExpression instance. The initializer for NSRegularExpression accepts a string describing the pattern and a set of options. To support these options, a regular expression literal accepts a list of zero or more options after the delimiter closing the pattern. For example, consider a regular expression to match and parse email addresses:
> 
> let emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
> This regular expression will match "john.appleseed@apple.com", but not "JOHN.APPLESEED@APPLE.COM". The regular expression has to allow for this, as email addresses are not case sensitive. Rather than adding complexity to the pattern to account for this, the following example achieves the same result:
> 
> let emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/i
> In this case, the "i" following the delimiter closing the pattern maps to NSRegularExpressionOptions.CaseInsensitive.
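
For comparison, the behavior described above can be approximated without literals by passing options to the NSRegularExpression initializer. This is only a sketch in current Swift syntax, where the option is spelled .caseInsensitive; the matchesEmail helper is a hypothetical name introduced for illustration.

```swift
import Foundation

// The email pattern from the text, with each backslash doubled for a string literal.
let emailPattern = "^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$"

// Hypothetical helper: true if `input` matches the email pattern under `options`.
func matchesEmail(_ input: String,
                  options: NSRegularExpression.Options = []) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: emailPattern,
                                               options: options) else {
        return false
    }
    let range = NSRange(input.startIndex..., in: input)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}
```

Without options, the upper-case address is rejected; passing [.caseInsensitive] plays the role of the trailing "i".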
> 
> Detailed design
> The following modification to the grammar of a literal adds support for regular expression literals:
> 
> literal → numeric-literal | string-literal | boolean-literal | regex-literal | nil-literal
> 
> The following grammar describes a regular expression literal:
> 
> regex-literal → static-regex-literal | interpolated-regex-literal
> 
> static-regex-literal → / pattern(opt) / regex-options(opt)
> pattern → pattern-item pattern(opt)
> pattern-item → any Unicode scalar value except /, U+000A, or U+000D
> 
> interpolated-regex-literal → / interpolated-pattern(opt) / regex-options(opt)
> interpolated-pattern → interpolated-pattern-item interpolated-pattern(opt)
> interpolated-pattern-item → \( expression ) | pattern-item
> 
> regex-options → regex-option regex-options(opt)
> regex-option → i | x | q | s | m | d | b
> 
> The following table summarizes the regular expression options:
> 
> Option  NSRegularExpressionOptions
> i       CaseInsensitive
> x       AllowCommentsAndWhitespace
> q       IgnoreMetacharacters
> s       DotMatchesLineSeparators
> m       AnchorsMatchLines
> d       UseUnixLineSeparators
> b       UseUnicodeWordBoundaries
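
The mapping in the table could be sketched as a small lookup from option letters to Foundation's option set. The regexOptions(fromFlags:) function is a hypothetical helper, not something the proposal defines, and the option names use the current lower camel case spellings of NSRegularExpression.Options.

```swift
import Foundation

// Hypothetical helper: translates the one-letter options from the table into
// NSRegularExpression.Options. Returns nil if any letter is not in the table.
func regexOptions(fromFlags flags: String) -> NSRegularExpression.Options? {
    let table: [Character: NSRegularExpression.Options] = [
        "i": .caseInsensitive,
        "x": .allowCommentsAndWhitespace,
        "q": .ignoreMetacharacters,
        "s": .dotMatchesLineSeparators,
        "m": .anchorsMatchLines,
        "d": .useUnixLineSeparators,
        "b": .useUnicodeWordBoundaries,
    ]

    var options: NSRegularExpression.Options = []
    for flag in flags {
        guard let option = table[flag] else { return nil }
        options.insert(option)   // repeated letters are harmless in an option set
    }
    return options
}
```
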
> 
> Impact on existing code
> The use of regular expression literals is opt-in, and hence there is no impact to existing code.
> 
> Alternatives considered
> There have been several threads on the swift-evolution mailing list that have discussed alternatives to regular expression literals.
> 
> String literal suffixes for defining types <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002193.html> discussed string literals with a single character appended after the closing delimiter to denote the type. For comparison, Python uses a prefix for a similar purpose: r"\d+" is a raw string literal, commonly used to write regular expression patterns. However, this approach suffers from two disadvantages: 1) it does not support string interpolation in the pattern, and 2) it uses double quotes as the delimiter, which appear more frequently in regular expressions than the forward slash.
> 
> Multi-line string literals <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002325.html> discussed string literals spanning more than one line. This would be a worthy addition to the regular expression literal discussed in this proposal. We should consider modifying the grammar to support this. However, I wanted to introduce changes incrementally to maintain focus.
