<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">There have been several threads that have discussed the notion of regular expression literals. However, I didn't see anyone put together a formal proposal, so I took the liberty of doing so. I would appreciate discussion and comments on the proposal:<div class=""><br class=""></div><div class=""><h1 style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 28px; font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class="">Regular Expression Literals</h1><ul style="-webkit-print-color-adjust: exact; margin: 15px 0px; padding-left: 30px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class=""><li style="-webkit-print-color-adjust: exact; margin: 0px;" class="">Proposal: [SE-NNNN] (<a href="https://github/apple/swift-evolution/blob/master/proposals/NNNN-name.md" class="">https://github/apple/swift-evolution/blob/master/proposals/NNNN-name.md</a>)</li><li style="-webkit-print-color-adjust: exact; margin: 0px;" class="">Author: Patrick Gili (<a href="https://github.com/gili-patrick-r" class="">https://github.com/gili-patrick-r</a>)</li><li style="-webkit-print-color-adjust: exact; margin: 0px;" class="">Status: <strong style="-webkit-print-color-adjust: exact; margin-top: 0px;" class="">Awaiting review</strong></li><li style="-webkit-print-color-adjust: exact; margin: 0px;" class="">Review manager: TBD</li></ul><h2 style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 24px; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: rgb(204, 204, 204); font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" 
class="">Introduction</h2><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">The Swift language has no native support for regular expressions, which makes working with regular expressions in Swift awkward, tedious, and error-prone. This proposal describes the addition of regular expression literals to Swift.</p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">Swift-evolution thread: <a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005126.html" style="-webkit-print-color-adjust: exact; color: rgb(65, 131, 196);" class="">Empower String type with regular expression</a></p><h2 style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 24px; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: rgb(204, 204, 204); font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class="">Motivation</h2><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">Defining the pattern for a regular expression is complicated by the fact that Swift supports neither raw string literals (i.e., string literals that do not process character escapes or interpolation) nor regular expression literals similar to those in Perl or Ruby. Rather, Swift only supports C-style string literals with support for interpolation. C-style string literals require a backslash to escape a backslash. Regular expression patterns make frequent use of the backslash, and hence patterns written as Swift string literals become unreadable and difficult to maintain. 
For example, consider the following pattern to match and parse URLs:</p><pre style="-webkit-print-color-adjust: exact; margin-top: 15px; margin-bottom: 15px; background-color: rgb(248, 248, 248); border: 1px solid rgb(204, 204, 204); font-size: 13px; line-height: 19px; overflow: auto; padding: 6px 10px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class=""><code style="-webkit-print-color-adjust: exact; margin: 0px; padding: 0px; border: none; background-color: transparent; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class="">^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
</code></pre><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">To define this pattern in Swift, an application needs to define a string, such as:</p><pre style="-webkit-print-color-adjust: exact; margin-top: 15px; margin-bottom: 15px; background-color: rgb(248, 248, 248); border: 1px solid rgb(204, 204, 204); font-size: 13px; line-height: 19px; overflow: auto; padding: 6px 10px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class=""><code style="-webkit-print-color-adjust: exact; margin: 0px; padding: 0px; border: none; background-color: transparent; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class="">let pattern = "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$"
</code></pre><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">The latter is clearly difficult to read and maintain.</p><h2 style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 24px; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: rgb(204, 204, 204); font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class="">Proposed solution</h2><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">The proposed solution is to add regular expression literals to Swift, similar to the regular expression literals supported by other languages, such as Perl and Ruby. Regular expression literals differ from raw string literals in two primary ways. First, regular expression literals support string interpolation of the pattern, which allows an application to construct a regular expression at run-time. Second, a regular expression literal creates an instance of NSRegularExpression, rather than an instance of String. 
The following example declares a regular expression for the purpose of matching and parsing URIs with a specified scheme:</p><pre style="-webkit-print-color-adjust: exact; margin-top: 15px; margin-bottom: 15px; background-color: rgb(248, 248, 248); border: 1px solid rgb(204, 204, 204); font-size: 13px; line-height: 19px; overflow: auto; padding: 6px 10px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class=""><code style="-webkit-print-color-adjust: exact; margin: 0px; padding: 0px; border: none; background-color: transparent; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class="">let scheme = "https"
let urlRegex = /^(\(scheme)?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
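
// For comparison, a sketch of the same construction without the proposed literal.
// Assumes Foundation is imported; the name urlRegexToday is hypothetical.
// Every backslash in the pattern has to be doubled inside the string literal.
let urlRegexToday = try NSRegularExpression(
    pattern: "^(\(scheme)?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$",
    options: [])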
</code></pre><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">To make regular expression literals effective, it must be possible to replace the creation of any instance of NSRegularExpression. The initializer for NSRegularExpression accepts a string describing the pattern and a set of options. To support these options, a regular expression literal accepts a list of zero or more options after the delimiter closing the pattern. For example, consider a regular expression to match and parse email addresses.</p><pre style="-webkit-print-color-adjust: exact; margin-top: 15px; margin-bottom: 15px; background-color: rgb(248, 248, 248); border: 1px solid rgb(204, 204, 204); font-size: 13px; line-height: 19px; overflow: auto; padding: 6px 10px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class=""><code style="-webkit-print-color-adjust: exact; margin: 0px; padding: 0px; border: none; background-color: transparent; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class="">let emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
</code></pre><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">This regular expression will match "<a href="mailto:john.appleseed@apple.com" class="">john.appleseed@apple.com</a>", but not "<a href="mailto:JOHN.APPLESEED@apple.com" class="">JOHN.APPLESEED@APPLE.COM</a>". The regular expression has to allow for both, as email addresses are not case sensitive. Rather than add complexity to the pattern to account for this, the following example achieves the same result with an option:</p><pre style="-webkit-print-color-adjust: exact; margin-top: 15px; margin-bottom: 15px; background-color: rgb(248, 248, 248); border: 1px solid rgb(204, 204, 204); font-size: 13px; line-height: 19px; overflow: auto; padding: 6px 10px; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class=""><code style="-webkit-print-color-adjust: exact; margin: 0px; padding: 0px; border: none; background-color: transparent; border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class="">let emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/i
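
// A sketch of what the literal above is meant to stand for, assuming Foundation is
// imported; the name emailRegexToday is hypothetical. The option is spelled as in
// current Foundation (.caseInsensitive); older Swift releases spell it .CaseInsensitive.
let emailRegexToday = try NSRegularExpression(
    pattern: "^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$",
    options: [.caseInsensitive])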
</code></pre><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">In this case, the "i" following the delimiter closing the pattern maps to RegularExpressionOption.CaseInsensitive.</p><h2 style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 24px; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: rgb(204, 204, 204); font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class="">Detailed design</h2><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">The following modification to the grammar of a literal adds support for regular expression literals:</p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class=""><em style="-webkit-print-color-adjust: exact;" class="">literal</em> → <em style="-webkit-print-color-adjust: exact;" class="">numeric-literal</em> | <em style="-webkit-print-color-adjust: exact;" class="">string-literal</em> | <em style="-webkit-print-color-adjust: exact;" class="">boolean-literal</em> | <em style="-webkit-print-color-adjust: exact;" class="">regex-literal</em> | <em style="-webkit-print-color-adjust: exact;" class="">nil-literal</em></p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">The following grammar describes a regular expression literal:</p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class=""><em 
style="-webkit-print-color-adjust: exact;" class="">regex-literal</em> → <em style="-webkit-print-color-adjust: exact;" class="">static-regex-literal</em> | <em style="-webkit-print-color-adjust: exact;" class="">interpolated-regex-literal</em></p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class=""><em style="-webkit-print-color-adjust: exact;" class="">static-regex-literal</em> → / <em style="-webkit-print-color-adjust: exact;" class="">pattern</em><sub style="-webkit-print-color-adjust: exact;" class="">opt</sub> / <em style="-webkit-print-color-adjust: exact;" class="">regex-options</em><sub style="-webkit-print-color-adjust: exact;" class="">opt</sub><br style="-webkit-print-color-adjust: exact;" class=""><em style="-webkit-print-color-adjust: exact;" class="">pattern</em> → <em style="-webkit-print-color-adjust: exact;" class="">pattern_item</em> <em style="-webkit-print-color-adjust: exact;" class="">pattern</em><sub style="-webkit-print-color-adjust: exact;" class="">opt</sub><br style="-webkit-print-color-adjust: exact;" class=""><em style="-webkit-print-color-adjust: exact;" class="">pattern_item</em> → any Unicode scalar value except /, U+000A, or U+000D</p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class=""><em style="-webkit-print-color-adjust: exact;" class="">interpolated-regex-literal</em> → / <em style="-webkit-print-color-adjust: exact;" class="">interpolated_pattern</em><sub style="-webkit-print-color-adjust: exact;" class="">opt</sub> / <em style="-webkit-print-color-adjust: exact;" class="">regex-options</em><sub style="-webkit-print-color-adjust: exact;" class="">opt</sub><br style="-webkit-print-color-adjust: exact;" class=""><em style="-webkit-print-color-adjust: exact;" class="">interpolated_pattern</em> → 
<em style="-webkit-print-color-adjust: exact;" class="">interpolated_pattern_item</em> <em style="-webkit-print-color-adjust: exact;" class="">interpolated_pattern</em><sub style="-webkit-print-color-adjust: exact;" class="">opt</sub><br style="-webkit-print-color-adjust: exact;" class=""><em style="-webkit-print-color-adjust: exact;" class="">interpolated_pattern_item</em> → \( <em style="-webkit-print-color-adjust: exact;" class="">expression</em> ) | <em style="-webkit-print-color-adjust: exact;" class="">pattern_item</em></p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class=""><em style="-webkit-print-color-adjust: exact;" class="">regex-options</em> → <em style="-webkit-print-color-adjust: exact;" class="">regex-option</em> <em style="-webkit-print-color-adjust: exact;" class="">regex-options</em><sub style="-webkit-print-color-adjust: exact;" class="">opt</sub><br style="-webkit-print-color-adjust: exact;" class=""><em style="-webkit-print-color-adjust: exact;" class="">regex-option</em> → <strong style="-webkit-print-color-adjust: exact;" class="">i</strong> | <strong style="-webkit-print-color-adjust: exact;" class="">x</strong> | <strong style="-webkit-print-color-adjust: exact;" class="">q</strong> | <strong style="-webkit-print-color-adjust: exact;" class="">s</strong> | <strong style="-webkit-print-color-adjust: exact;" class="">m</strong> | <strong style="-webkit-print-color-adjust: exact;" class="">d</strong> | <strong style="-webkit-print-color-adjust: exact;" class="">b</strong></p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">The following table summarizes the regular expression options:</p><table style="-webkit-print-color-adjust: exact; margin: 15px 0px; padding: 0px; border-collapse: collapse; color: rgb(0, 
0, 0); font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class=""><thead style="-webkit-print-color-adjust: exact;" class=""><tr style="-webkit-print-color-adjust: exact; border-top-width: 1px; border-top-style: solid; border-top-color: rgb(204, 204, 204); margin: 0px; padding: 0px;" class=""><th style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">Option </th><th style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">RegularExpressionOption </th></tr></thead><tbody style="-webkit-print-color-adjust: exact;" class=""><tr style="-webkit-print-color-adjust: exact; border-top-width: 1px; border-top-style: solid; border-top-color: rgb(204, 204, 204); margin: 0px; padding: 0px;" class=""><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">i </td><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">CaseInsensitive </td></tr><tr style="-webkit-print-color-adjust: exact; border-top-width: 1px; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: rgb(248, 248, 248); margin: 0px; padding: 0px;" class=""><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">x </td><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">AllowCommentsAndWhitespace </td></tr><tr style="-webkit-print-color-adjust: exact; border-top-width: 1px; border-top-style: solid; border-top-color: rgb(204, 204, 204); margin: 0px; padding: 0px;" class=""><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">q </td><td 
style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">IgnoreMetacharacters </td></tr><tr style="-webkit-print-color-adjust: exact; border-top-width: 1px; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: rgb(248, 248, 248); margin: 0px; padding: 0px;" class=""><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">s </td><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">DotMatchesLineSeparators </td></tr><tr style="-webkit-print-color-adjust: exact; border-top-width: 1px; border-top-style: solid; border-top-color: rgb(204, 204, 204); margin: 0px; padding: 0px;" class=""><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">m </td><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">AnchorsMatchLines </td></tr><tr style="-webkit-print-color-adjust: exact; border-top-width: 1px; border-top-style: solid; border-top-color: rgb(204, 204, 204); background-color: rgb(248, 248, 248); margin: 0px; padding: 0px;" class=""><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">d </td><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">UseUnixLineSeparators </td></tr><tr style="-webkit-print-color-adjust: exact; border-top-width: 1px; border-top-style: solid; border-top-color: rgb(204, 204, 204); margin: 0px; padding: 0px;" class=""><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); margin: 0px; padding: 6px 13px;" class="">b </td><td style="-webkit-print-color-adjust: exact; border: 1px solid rgb(204, 204, 204); 
margin: 0px; padding: 6px 13px;" class="">UseUnicodeWordBoundaries </td></tr></tbody></table><h2 style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 24px; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: rgb(204, 204, 204); font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class="">Impact on existing code</h2><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">The use of regular expression literals is opt-in, and hence there is no impact on existing code.</p><h2 style="-webkit-print-color-adjust: exact; margin: 20px 0px 10px; padding: 0px; -webkit-font-smoothing: antialiased; cursor: text; position: relative; font-size: 24px; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: rgb(204, 204, 204); font-family: Helvetica, arial, sans-serif; background-color: rgb(255, 255, 255);" class="">Alternatives considered</h2><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class="">Several threads on the swift-evolution mailing list have discussed alternatives to regular expression literals.</p><p style="-webkit-print-color-adjust: exact; margin: 15px 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255);" class=""><a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002193.html" style="-webkit-print-color-adjust: exact; color: rgb(65, 131, 196);" class="">String literal suffixes for defining types</a> discussed string literals with a single character appended after the closing delimiter to denote the type. 
For example, the raw string literal <code style="-webkit-print-color-adjust: exact; margin: 0px 2px; padding: 0px 5px; white-space: nowrap; border: 1px solid rgb(234, 234, 234); background-color: rgb(248, 248, 248); border-top-left-radius: 3px; border-top-right-radius: 3px; border-bottom-right-radius: 3px; border-bottom-left-radius: 3px;" class="">r"\d+"</code> in Python is commonly used to write a regular expression pattern. However, this approach suffers from two disadvantages: 1) it does not support string interpolation in the pattern, and 2) it uses double quotes as the delimiter, which appear more frequently in regular expressions than the forward slash.</p><p style="-webkit-print-color-adjust: exact; margin-top: 15px; margin-right: 0px; margin-left: 0px; font-family: Helvetica, arial, sans-serif; font-size: 14px; background-color: rgb(255, 255, 255); margin-bottom: 0px !important;" class=""><a href="https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002325.html" style="-webkit-print-color-adjust: exact; color: rgb(65, 131, 196);" class="">Multi-line string literals</a> discussed string literals spanning more than one line. This would be a worthy addition to the regular expression literal discussed in this proposal. We should consider modifying the grammar to support this. However, I wanted to introduce changes incrementally to maintain focus.</p></div></body></html>