<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div></div><div><br></div><div>Hi Joshua,</div><div><br></div><div>Thanks for bringing this topic up. It may help to outline why regular expressions were deferred until Swift 5. The work to create a regular expression type itself, and even to add a regex literal protocol, is fairly straightforward and probably could have been done in the Swift 4 timeframe (maybe by making NSRegularExpression a value type and refreshing its API). But there are other design aspects that we need to explore first if Swift-native regular expressions are to be really great. Many of these have compiler impact, or would need to factor in the underlying representation of String in order to be efficient.</div><div><br></div><div>Right now, the top priority for Swift 5 is ABI stability, and String plays a fairly large part in that. We need to finalize the size of String (currently 3 words, but 2 is the likely final size), implement the small string optimization, and decide which parts of String need to be fragile/inlineable and which need to be resilient.</div><div><br></div><div>Since ABI stability takes priority, this gives us time in the meantime to consider the broader design questions of what Swift-native regular expressions would look like. These design considerations probably need to come ahead of designing an API for specific types like a regex, matches, etc.</div><div><br></div><div>Some examples of these kinds of questions include:</div><div><br></div><div>What syntax changes to the usual form of regexes should be considered? For example, what should “.” mean in regular expressions? It would be out of keeping for it to mean a code unit when applied to Swift.String. Alternatively, should regular expressions work on all the views? 
In that case, “.” could mean a grapheme when applied to String, but a code unit when applied to the UTF16 view.</div><div><br></div><div>How can let bindings work on capture groups? That is, rather than having named capture groups like in Perl, can we bind directly to variables in switches etc.? Could types conform to a RegularExpressionCapturable that would consume part of the string and initialize self, so that you could capture not just Substring but any type? You can’t express this in the language today, and it would need compiler integration. This integration could start out more hard-coded in order to deliver value in the Swift 5 release, but would hopefully be generalizable in later releases.</div><div><br></div><div>What other std lib APIs should be changed once we have regular expressions? For example, split ought to work with regexes, e.g. let words = sentence.split(separator: /\W+/). How can this generalize to Collections where possible? E.g. [1,2,3,4].index(of: [2,3]) ought to work just as "abcd".index(of: /bc/) should.</div><div><br></div><div>On Aug 10, 2017, at 7:24 AM, Joshua Alvarado via swift-evolution <<a href="mailto:swift-evolution@swift.org">swift-evolution@swift.org</a>> wrote:<br><br></div><blockquote type="cite"><div><meta http-equiv="Content-Type" content="text/html charset=us-ascii">Hey everyone,<div class=""><br class=""></div><div class="">I would like to pitch an implementation of Regex in Swift and gather all of your thoughts.</div><div class=""><br class=""></div><div class="">Motivation:</div><div class="">In the String Manifesto for Swift 4, addressing regular expressions was not in scope. Swift 5 would be a more fitting version to address the implementation of Regex in Swift. 
NSRegularExpression is a suitable solution for pattern matching, but its API is a poor fit for the future direction of Swift.</div><div class=""><br class=""></div><div class="">Implementation:</div><div class="">The regular expression API will be implemented by a Regex struct, which is a regular expression that you can apply to Unicode strings. The Regex struct will conform to RegexProtocol, a protocol for types that can represent a regular expression. ExpressibleByRegexLiteral will be used to initialize a type from a regex literal, giving an easy-to-use syntax, and a Match struct will be used to represent a match found with a Regex.</div><div class=""><br class=""></div><div class="">Draft of implementation:</div><div class=""><br class=""></div><div class=""><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">protocol</span><span style="font-variant-ligatures: no-common-ligatures" class=""> ExpressibleByRegexLiteral {</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">associatedtype</span><span style="font-variant-ligatures: no-common-ligatures" class=""> RegexLiteralType</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">init</span><span style="font-variant-ligatures: no-common-ligatures" class="">(regexLiteral value: 
</span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Self</span><span style="font-variant-ligatures: no-common-ligatures" class="">.RegexLiteralType)</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">}</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(207, 135, 36);" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">// Structure of information about a match of regex on a string</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">struct</span><span style="font-variant-ligatures: no-common-ligatures" class=""> Match {</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">var</span><span style="font-variant-ligatures: no-common-ligatures" class=""> regex: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Regex</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">var</span><span style="font-variant-ligatures: no-common-ligatures" class=""> start: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" 
class="">String</span><span style="font-variant-ligatures: no-common-ligatures" class="">.</span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Index</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">var</span><span style="font-variant-ligatures: no-common-ligatures" class=""> end: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">String</span><span style="font-variant-ligatures: no-common-ligatures" class="">.</span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Index</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">}</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">protocol</span><span style="font-variant-ligatures: no-common-ligatures" class=""> RegexProtocol {</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">init</span><span style="font-variant-ligatures: no-common-ligatures" class="">(pattern: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">String</span><span style="font-variant-ligatures: no-common-ligatures" class="">) </span><span 
style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">throws</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(207, 135, 36);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">var</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> pattern: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">String</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> { </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">get</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> } </span><span style="font-variant-ligatures: no-common-ligatures" class="">// string representation of the pattern</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(207, 135, 36);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">func</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> search(string: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">String</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">) -> </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Bool</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> </span><span 
style="font-variant-ligatures: no-common-ligatures" class="">// used to check if a match is found at all in the string</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(207, 135, 36);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">func</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> match(string: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">String</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">) -> [</span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Match</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">] </span><span style="font-variant-ligatures: no-common-ligatures" class="">// returns an array of all the matches</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">func</span><span style="font-variant-ligatures: no-common-ligatures" class=""> match(string: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">String</span><span style="font-variant-ligatures: no-common-ligatures" class="">, using: ((</span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Match</span><span style="font-variant-ligatures: no-common-ligatures" class="">) -> </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Void</span><span style="font-variant-ligatures: no-common-ligatures" class="">)) </span><span style="font-variant-ligatures: no-common-ligatures; color: #cf8724" class="">// 
enumerate over matches</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">}</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(195, 89, 0);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">struct</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> Regex: </span><span style="font-variant-ligatures: no-common-ligatures" class="">RegexProtocol</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> {</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">init</span><span style="font-variant-ligatures: no-common-ligatures" class="">(pattern: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Regex</span><span style="font-variant-ligatures: no-common-ligatures" class="">, options: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Regex</span><span style="font-variant-ligatures: no-common-ligatures" class="">.</span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Options</span><span style="font-variant-ligatures: no-common-ligatures" class="">)</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: 
no-common-ligatures; color: #36568a" class="">let</span><span style="font-variant-ligatures: no-common-ligatures" class=""> options: [</span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Regex</span><span style="font-variant-ligatures: no-common-ligatures" class="">.</span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Options</span><span style="font-variant-ligatures: no-common-ligatures" class="">]</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">static</span><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">let</span><span style="font-variant-ligatures: no-common-ligatures" class=""> word: </span><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Regex</span><span style="font-variant-ligatures: no-common-ligatures" class=""> </span><span style="font-variant-ligatures: no-common-ligatures; color: #cf8724" class="">// \w</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(207, 135, 36);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> </span><span style="font-variant-ligatures: no-common-ligatures" class="">// other useful regexes can be added as well</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">}</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div 
style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><div style="margin: 0px; line-height: normal; color: rgb(207, 135, 36);" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">// Examples</span></div></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">let</span><span style="font-variant-ligatures: no-common-ligatures" class=""> regex = \[a-zA-Z]+\</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(232, 35, 0);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">let</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> matches = </span><span style="font-variant-ligatures: no-common-ligatures; color: #587ea8" class="">regex</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">.match(</span><span style="font-variant-ligatures: no-common-ligatures" class="">"Matching words in text."</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">)</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">for</span><span style="font-variant-ligatures: no-common-ligatures" class=""> match </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">in</span><span style="font-variant-ligatures: 
no-common-ligatures" class=""> matches {</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(232, 35, 0);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> print(</span><span style="font-variant-ligatures: no-common-ligatures" class="">"Found a match in string at </span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">\</span><span style="font-variant-ligatures: no-common-ligatures" class="">(</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">match.start</span><span style="font-variant-ligatures: no-common-ligatures" class="">) to </span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">\</span><span style="font-variant-ligatures: no-common-ligatures" class="">(</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">match.end</span><span style="font-variant-ligatures: no-common-ligatures" class="">)"</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class="">)</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">}</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(232, 35, 0);" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">let</span><span style="font-variant-ligatures: no-common-ligatures; color: #000000" class=""> helloStr = </span><span style="font-variant-ligatures: no-common-ligatures" class="">"Hello world"</span></div><div style="margin: 0px; font-size: 11px; line-height: 
normal; font-family: Menlo; min-height: 13px;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""></span><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures; color: #c35900" class="">Regex</span><span style="font-variant-ligatures: no-common-ligatures" class="">.word.match(</span><span style="font-variant-ligatures: no-common-ligatures; color: #587ea8" class="">helloStr</span><span style="font-variant-ligatures: no-common-ligatures" class="">) { match </span><span style="font-variant-ligatures: no-common-ligatures; color: #36568a" class="">in</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> print(</span><span style="font-variant-ligatures: no-common-ligatures; color: #e82300" class="">"Matched </span><span style="font-variant-ligatures: no-common-ligatures" class="">\</span><span style="font-variant-ligatures: no-common-ligatures; color: #e82300" class="">(</span><span style="font-variant-ligatures: no-common-ligatures; color: #587ea8" class="">helloStr</span><span style="font-variant-ligatures: no-common-ligatures" class="">[match.start..&lt;match.end]</span><span style="font-variant-ligatures: no-common-ligatures; color: #e82300" class="">)"</span><span style="font-variant-ligatures: no-common-ligatures" class="">)</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">}</span></div></div><div class=""> </div><div class="">Of course, this is only a scratch implementation I made, meant to open discussion on the topic. I feel the Regex struct itself will need more methods and variables, such as for flags and the number of groups. 
Please provide feedback with improvements to the code, concerns on the topic, or just open up discussion. Thank you!</div><div class=""><br class=""><div class="">
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;">Joshua Alvarado</div><div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;"><a href="mailto:alvaradojoshua0@gmail.com" class="">alvaradojoshua0@gmail.com</a></div></div></div><div class=""><div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class=""><br class=""></div></div></div></blockquote><blockquote type="cite"><div><span>_______________________________________________</span><br><span>swift-evolution mailing list</span><br><span><a href="mailto:swift-evolution@swift.org">swift-evolution@swift.org</a></span><br><span><a href="https://lists.swift.org/mailman/listinfo/swift-evolution">https://lists.swift.org/mailman/listinfo/swift-evolution</a></span><br></div></blockquote></body></html>