[swift-evolution] Empower String type with regular expression

Patrick Gili gili.patrick.r at gili-labs.com
Sun Jan 31 10:32:28 CST 2016


There have been several threads that have discussed the notion of regular expression literals. However, I didn't see anyone put together a formal proposal, so I took the liberty of doing so. I would appreciate discussion and comments on the proposal:

Regular Expression Literals
Proposal: [SE-NNNN] (https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md)
Author: Patrick Gili (https://github.com/gili-patrick-r)
Status: Awaiting review
Review manager: TBD

Introduction
The Swift language has no native support for regular expressions, which makes working with them in Swift awkward, tedious, and error-prone. This proposal describes the addition of regular expression literals to Swift.

Swift-evolution thread: Empower String type with regular expression <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005126.html>

Motivation
Defining the pattern used by a regular expression is complicated by the fact that Swift supports neither raw string literals (i.e., string literals that do not process character escapes or interpolation) nor regular expression literals like those in Perl or Ruby. Instead, Swift supports only C-style string literals, which process escapes and support interpolation. In a C-style string literal, a backslash must itself be escaped with a backslash. Regular expression patterns make frequent use of the backslash, so patterns written as Swift string literals become unreadable and difficult to maintain. For example, consider the following pattern to match and parse URLs:

^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
To use this pattern in Swift, an application must write it as a string literal such as:

let pattern = "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$"
The latter is obviously difficult to read and maintain.
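For reference, this is roughly what using the escaped pattern looks like with Foundation today: the string is compiled into an NSRegularExpression at run time, and capture groups are extracted by hand. This is a minimal sketch written with current Foundation API names; the `capture` helper is hypothetical and exists only for illustration.

```swift
import Foundation

// The URL pattern from above, written as an ordinary (escaped) string literal.
let pattern = "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$"

// Without regex literals, the pattern is compiled at run time;
// the initializer throws if the pattern is malformed.
let urlRegex = try! NSRegularExpression(pattern: pattern, options: [])

/// Hypothetical helper: returns the text of capture group `group`
/// in the first match of `regex` in `text`, or nil if there is none.
func capture(_ group: Int, in text: String, using regex: NSRegularExpression) -> String? {
    let whole = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole),
          let range = Range(match.range(at: group), in: text) else {
        return nil
    }
    return String(text[range])
}
```

For "https://www.apple.com/swift", capture group 2 is "www.apple" and group 3 is "com"; a string with no dot, such as "not a url!", does not match at all.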

Proposed solution
The proposed solution is to add regular expression literals to Swift, similar to those supported by other languages such as Perl and Ruby. Regular expression literals differ from raw string literals in two primary ways. First, regular expression literals support string interpolation in the pattern, which allows an application to construct a regular expression at run time. Second, a regular expression literal creates an instance of NSRegularExpression rather than an instance of String. The following example declares a regular expression for matching and parsing URLs with a specified scheme:

let scheme = "https"
let urlRegex = /^(\(scheme)?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
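In today's Swift, the closest equivalent of the interpolated literal above is to interpolate into an escaped string and compile the result at run time. The sketch below assumes, as the proposal's wording suggests, that the interpolated value is spliced into the pattern verbatim (as regex syntax, not escaped text); the variable names are illustrative.

```swift
import Foundation

let scheme = "https"

// Hand-written lowering of the interpolated literal
//   /^(\(scheme)?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
// into an escaped string literal with interpolation, compiled at run time.
let interpolatedPattern = "^(\(scheme)?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$"
let schemeRegex = try! NSRegularExpression(pattern: interpolatedPattern)

// With scheme == "https", the result is identical to the hand-escaped
// URL pattern from the Motivation section.
let expected = "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$"
```

If an interpolated value should instead be matched literally, `NSRegularExpression.escapedPattern(for:)` can escape it before it is spliced in.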
To make regular expression literals effective, they must be able to express anything that can be passed to the NSRegularExpression initializer. That initializer accepts a string describing the pattern and a set of options. To support these options, a regular expression literal accepts a list of zero or more option letters after the delimiter that closes the pattern. For example, consider a regular expression to match and parse email addresses:

let emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
This regular expression will match "john.appleseed@apple.com", but not "JOHN.APPLESEED@APPLE.COM". It should match both, because email addresses are not case sensitive. Rather than add complexity to the pattern to account for this, the following example achieves the same result with an option:

let emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/i
In this case, the "i" following the delimiter closing the pattern maps to RegularExpressionOption.CaseInsensitive.
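The `i` suffix corresponds to passing the case-insensitive option to the NSRegularExpression initializer. A minimal sketch of both email regexes with today's API (current Swift spelling `.caseInsensitive`); `matches` is a hypothetical helper:

```swift
import Foundation

let emailPattern = "^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$"

// Equivalent of /.../ with no options.
let strictRegex = try! NSRegularExpression(pattern: emailPattern, options: [])

// Equivalent of /.../i: the trailing "i" selects case-insensitive matching.
let relaxedRegex = try! NSRegularExpression(pattern: emailPattern, options: [.caseInsensitive])

/// Hypothetical helper: true if `regex` matches anywhere in `text`.
func matches(_ regex: NSRegularExpression, _ text: String) -> Bool {
    let whole = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: whole) != nil
}
```

Both regexes accept "john.appleseed@apple.com", but only `relaxedRegex` accepts "JOHN.APPLESEED@APPLE.COM".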

Detailed design
The following modification to the grammar of a literal adds support for regular expression literals:

literal → numeric-literal | string-literal | boolean-literal | regex-literal | nil-literal

The following grammar describes a regular expression literal:

regex-literal → static-regex-literal | interpolated-regex-literal

static-regex-literal → / pattern_opt / regex-options_opt
pattern → pattern-item pattern_opt
pattern-item → any Unicode scalar value except /, U+000A, or U+000D

interpolated-regex-literal → / interpolated-pattern_opt / regex-options_opt
interpolated-pattern → interpolated-pattern-item interpolated-pattern_opt
interpolated-pattern-item → \( expression ) | pattern-item

regex-options → regex-option regex-options_opt
regex-option → i | x | q | s | m | d | b
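To make the static part of the grammar concrete, here is a minimal scanner for static-regex-literal. It is an illustrative sketch, not part of the proposal: `scanRegexLiteral` is a hypothetical name, and it assumes that a backslash consumes the following character, so that `\/` (as used throughout the examples above) does not close the pattern.

```swift
/// Hypothetical scanner for the static-regex-literal production.
/// Returns the pattern text and the option letters, or nil if `source`
/// is not a well-formed literal.
func scanRegexLiteral(_ source: String) -> (pattern: String, options: String)? {
    let scalars = Array(source.unicodeScalars)
    guard let first = scalars.first, first == "/" else { return nil }

    var pattern = String.UnicodeScalarView()
    var i = 1
    while i < scalars.count {
        let c = scalars[i]
        if c == "\n" || c == "\r" { return nil }   // U+000A and U+000D are excluded
        if c == "/" { break }                      // closing delimiter
        if c == "\\" {
            // Assumption: a backslash consumes the next scalar, so "\/" and "\d"
            // stay in the pattern and an escaped slash does not end the literal.
            guard i + 1 < scalars.count,
                  scalars[i + 1] != "\n", scalars[i + 1] != "\r" else { return nil }
            pattern.append(c)
            pattern.append(scalars[i + 1])
            i += 2
            continue
        }
        pattern.append(c)
        i += 1
    }
    guard i < scalars.count else { return nil }    // no closing "/"

    // Everything after the closing "/" must be regex-option letters.
    let optionLetters = String(String.UnicodeScalarView(scalars[(i + 1)...]))
    let allowed: Set<Unicode.Scalar> = ["i", "x", "q", "s", "m", "d", "b"]
    guard optionLetters.unicodeScalars.allSatisfy({ allowed.contains($0) }) else { return nil }
    return (String(pattern), optionLetters)
}
```

Scanning the case-insensitive email literal yields the email pattern and the option string "i"; an unterminated literal, an embedded newline, or an unknown option letter yields nil.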

The following table summarizes the regular expression options:

Option   RegularExpressionOption
------   --------------------------
i        CaseInsensitive
x        AllowCommentsAndWhitespace
q        IgnoreMetacharacters
s        DotMatchesLineSeparators
m        AnchorsMatchLines
d        UseUnixLineSeparators
b        UseUnicodeWordBoundaries

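The table amounts to a small translation step from option letters to an option set. A sketch of that step, assuming the current Swift spellings of the `NSRegularExpression.Options` members; `regexOptions(fromSuffix:)` is a hypothetical name:

```swift
import Foundation

/// Hypothetical helper: translates the letters that follow the closing "/"
/// of a regex literal into NSRegularExpression.Options, per the table above.
/// Returns nil if the suffix contains a letter the grammar does not allow.
func regexOptions(fromSuffix suffix: String) -> NSRegularExpression.Options? {
    var options: NSRegularExpression.Options = []
    for letter in suffix {
        switch letter {
        case "i": options.insert(.caseInsensitive)
        case "x": options.insert(.allowCommentsAndWhitespace)
        case "q": options.insert(.ignoreMetacharacters)
        case "s": options.insert(.dotMatchesLineSeparators)
        case "m": options.insert(.anchorsMatchLines)
        case "d": options.insert(.useUnixLineSeparators)
        case "b": options.insert(.useUnicodeWordBoundaries)
        default: return nil
        }
    }
    return options
}
```

For example, the suffix "im" yields `[.caseInsensitive, .anchorsMatchLines]`, which can be passed directly as the `options:` argument of `NSRegularExpression(pattern:options:)`.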
Impact on existing code
The use of regular expression literals is opt-in, and hence there is no impact to existing code.

Alternatives considered
There have been several threads on the swift-evolution mailing list that have discussed alternatives to regular expression literals.

String literal suffixes for defining types <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002193.html> discussed string literals with a single character appended after the closing delimiter to denote the type. A related idea exists in Python, where the prefix in r"\d+" denotes a raw string literal, which is commonly used to write regular expression patterns. However, this approach suffers from two disadvantages: 1) it does not support string interpolation in the pattern, and 2) it uses double quotes as the delimiter, and double quotes appear in regular expressions more frequently than forward slashes.

Multi-line string literals <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002325.html> discussed string literals spanning more than one line. This would be a worthy addition to the regular expression literals discussed in this proposal, and we should consider modifying the grammar to support it. However, I wanted to introduce changes incrementally to maintain focus.