[swift-evolution] Empower String type with regular expression

Jens Persson jens at bitcycle.com
Sat Feb 6 09:33:21 CST 2016


I just stumbled on some interesting articles and notes about implementing
regular expressions, written by Russ Cox (Google, Go):
https://swtch.com/~rsc/regexp/


On Wed, Feb 3, 2016 at 2:06 PM, Patrick Gili via swift-evolution <
swift-evolution at swift.org> wrote:

> Hi Howard,
>
> I'm not saying the two methods have to be exclusive. However, you
> asked us to consider converting regex literals into Swift Verbal
> Expressions. My response highlighted potential issues with this approach.
>
> Cheers,
> -Patrick
>
> On Feb 2, 2016, at 6:53 PM, Howard Lovatt <howard.lovatt at gmail.com> wrote:
>
> I don't see that the two have to be exclusive. If the design of the regex
> literal is suitable for both a traditional NSRegularExpression and a verbal
> type implementation then the two can co-exist. It can also be staged, so
> that a literal can be introduced first with a bridge to legacy
> NSRegularExpression and then later a verbal implementation could be added.
> The key is to design a literal that is future-proof.
>
> On 3 February 2016 at 10:33, Patrick Gili <gili.patrick.r at gili-labs.com>
> wrote:
>
>> I don't feel good about this direction for the following reasons:
>> 1) Complexity
>> 2) Maturity? I don't know how Verbal Expressions has been implemented.
>> Does it leverage mature regex open source? Or, has it been written from
>> scratch?
>> 3) Performance? Compiling a regex literal typically results in an FSM of
>> some sort, optimized to parse strings. I wouldn't think that converting a
>> regex literal to Verbal Expressions would yield great performance each time
>> a match or substitution is done.
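
The performance point above rests on a regex being compiled once and then matched many times. A minimal sketch of that usage pattern with the existing NSRegularExpression follows; the pattern, the sample inputs, and the helper name `countMatches` are assumptions made up for illustration, and it says nothing about how Verbal Expressions is implemented.

```swift
import Foundation

// Compile the pattern a single time; the initializer does the pattern parsing.
let digits = try! NSRegularExpression(pattern: "[0-9]+")

// Hypothetical helper: reuses the already-compiled object for every input
// instead of rebuilding the regex on each match.
func countMatches(in text: String) -> Int {
    let range = NSRange(text.startIndex..<text.endIndex, in: text)
    return digits.numberOfMatches(in: text, options: [], range: range)
}

let counts = ["a1b22", "no digits", "7 8 9"].map(countMatches)
print(counts)
```

Any scheme that re-derives the matcher from a literal on every call would give up this reuse, which seems to be the concern raised here.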
>>
>> -Patrick
>>
>> On Feb 2, 2016, at 5:55 PM, Howard Lovatt <howard.lovatt at gmail.com>
>> wrote:
>>
>> The difference is that I am proposing supporting both verbal expressions
>> and regex literals, with literals converted to verbals and the
>> processing happening at the verbal level. The reason for this is that
>> verbals are easy to handle programmatically whilst literals are great for
>> quickly specifying a regex.
>>
>> On Tuesday, 2 February 2016, Patrick Gili <gili.patrick.r at gili-labs.com>
>> wrote:
>>
>>> Hi Howard,
>>>
>>> I don't see how this is very different from the Swift Verbal
>>> Expressions. It would suffer from the same disadvantages I have stated
>>> previously.
>>>
>>> Cheers,
>>> -Patrick
>>>
>>> On Feb 2, 2016, at 1:51 AM, Howard Lovatt via swift-evolution <
>>> swift-evolution at swift.org> wrote:
>>>
>>> Others have suggested a programmatic regex instead of a regex literal;
>>> how about doing both? Something like:
>>>
>>> enum RegexElement {
>>>     case capture(name: String, value: String)
>>>     case special(Special)
>>>     // ...
>>>     enum Special: String {
>>>         case startOfLine = "^"
>>>         // ...
>>>         case endOfLine = "$"
>>>     }
>>> }
>>>
>>> // Define a regexLiteral syntax that the compiler understands that is of
>>> // type Regex and consists of String representations of RegexElements,
>>> // e.g. using forward slash:
>>> //    /<RegexElements>*/
>>>
>>> // Compiled, immutable, thread safe, and bridged to NSRegularExpression
>>> struct Regex: CustomStringConvertible {
>>>     // ... internal compiled representation
>>>     let elements: [RegexElement]
>>>     var description: String {
>>>         // Example. Really returns all the elements converted back to a
>>>         // string
>>>         return RegexElement.Special.startOfLine.rawValue
>>>     }
>>>     init(_ elements: RegexElement...) {
>>>         // Example. Really also compiles the expression
>>>         self.elements = elements
>>>     }
>>>     // init(regexLiteral regex: Regex) {
>>>     // init(concatAll regexes: Regex...) {
>>>     // init(fromString string: String) {
>>>     // ... more inits
>>>     func map<T>(input: String,
>>>                 @noescape mapper: (element: RegexElement) throws -> T)
>>>                 rethrows -> [T] {
>>>         // Example. Really does the matching
>>>         return [try mapper(element: RegexElement.special(.startOfLine))]
>>>     }
>>>     // func flatMap<T>(input: String,
>>>     //     @noescape mapper: (element: RegexElement) throws -> T?)
>>>     //     rethrows -> [T] {
>>>     // func flatMap<S: SequenceType>(input: String,
>>>     //     @noescape mapper: (element: RegexElement) throws -> S)
>>>     //     rethrows -> [S.Generator.Element] {
>>>     // func forEach(input: String,
>>>     //     @noescape eacher: (element: RegexElement) throws -> Void)
>>>     //     rethrows {
>>>     // ... more funcs
>>> }
>>>
>>> // Normally a regex literal
>>> let regex = Regex(RegexElement.special(.startOfLine))
>>> // Returns `["^"]` in example
>>> let asStringArray = regex.map("Example") { element -> String in
>>>     switch element {
>>>     case let .capture(_, v): return v
>>>     case let .special(s): return s.rawValue
>>>     }
>>> }
>>>
>>>
>>> The advantages are:
>>>
>>>    1.   We get a literal type for convenience.
>>>    2.   We get a programmatic type when we need to manipulate regexes.
>>>    3.   Breaking the regex matches into the enum defined elements of
>>>    the regex works well with Swift pattern matching.
>>>
>>> (Above is a very rough sketch!)
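
For readers on a current toolchain, the rough sketch above can be approximated as follows. This is only an illustration under stated assumptions, not the proposed design: the type is renamed `SketchRegex` (a hypothetical name, chosen to avoid confusion with any library type), the `(?<name>...)` rendering of a capture is an assumption, and `map` still just walks the stored elements rather than doing any matching.

```swift
// Hypothetical modern-Swift rendering of the sketch; none of these types
// exist in the standard library.
enum RegexElement {
    case capture(name: String, value: String)
    case special(Special)

    enum Special: String {
        case startOfLine = "^"
        case endOfLine = "$"
    }
}

struct SketchRegex: CustomStringConvertible {
    let elements: [RegexElement]

    // Converts the stored elements back into pattern text.
    var description: String {
        return elements.map { element -> String in
            switch element {
            case let .capture(name, value): return "(?<\(name)>\(value))"
            case let .special(s): return s.rawValue
            }
        }.joined()
    }

    init(_ elements: RegexElement...) {
        self.elements = elements
    }

    // Placeholder: passes each stored element to `mapper`; it does not
    // actually match against `input`.
    func map<T>(_ input: String, _ mapper: (RegexElement) throws -> T) rethrows -> [T] {
        return try elements.map(mapper)
    }
}

let regex = SketchRegex(.special(.startOfLine),
                        .capture(name: "word", value: "[a-z]+"),
                        .special(.endOfLine))
print(regex.description)

let parts = regex.map("Example") { element -> String in
    switch element {
    case let .capture(_, v): return v
    case let .special(s): return s.rawValue
    }
}
print(parts)
```

The `switch` over `RegexElement` in the closure is the part that advantage 3 refers to.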
>>>
>>>
>>> On 2 February 2016 at 16:44, Thorsten Seitz via swift-evolution <
>>> swift-evolution at swift.org> wrote:
>>>
>>>> Something like Scala's extractors or F#'s Active Patterns would be most
>>>> welcome to generalize pattern matching.
>>>>
>>>> http://docs.scala-lang.org/tutorials/tour/extractor-objects.html
>>>> https://en.m.wikibooks.org/wiki/F_Sharp_Programming/Active_Patterns
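
Swift already has a narrower hook in this direction: overloading `~=` lets a custom value appear as a `case` pattern. A minimal sketch, assuming a hypothetical wrapper type `Pattern` around NSRegularExpression; unlike extractors or active patterns it only answers match / no match and cannot bind captured values, which is the gap a generalization would fill.

```swift
import Foundation

// Hypothetical wrapper so a regex can be written directly in a `case`.
struct Pattern {
    let regex: NSRegularExpression
    init(_ pattern: String) {
        // try! is tolerable here only because the patterns are fixed literals.
        regex = try! NSRegularExpression(pattern: pattern)
    }
}

// `switch` evaluates `pattern ~= value` for expression patterns.
func ~= (pattern: Pattern, value: String) -> Bool {
    let range = NSRange(value.startIndex..<value.endIndex, in: value)
    return pattern.regex.firstMatch(in: value, options: [], range: range) != nil
}

func classify(_ input: String) -> String {
    switch input {
    case Pattern("^[0-9]+$"): return "number"
    case Pattern("^[A-Za-z]+$"): return "word"
    default: return "other"
    }
}

print(classify("42"), classify("swift"), classify("a1"))
```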
>>>>
>>>> -Thorsten
>>>>
>>>> On 01.02.2016 at 15:46, James Campbell via swift-evolution <
>>>> swift-evolution at swift.org> wrote:
>>>>
>>>> It would be great if we could create a generic way of making this
>>>> Swifty. You may, let's say, want to implement a matching system for
>>>> structures like JSON or XML (e.g. XQuery).
>>>>
>>>> On Mon, Feb 1, 2016 at 2:43 PM, Patrick Gili via swift-evolution <
>>>> swift-evolution at swift.org> wrote:
>>>>
>>>>> Hi Dany,
>>>>>
>>>>> My response is inline below.
>>>>>
>>>>> Cheers,
>>>>> -Patrick
>>>>>
>>>>> On Jan 31, 2016, at 8:56 PM, Dany St-Amant <dsa.mls at icloud.com> wrote:
>>>>>
>>>>>
>>>>> On Jan 31, 2016, at 4:46 PM, Patrick Gili <gili.patrick.r at gili-labs.com>
>>>>> wrote:
>>>>>
>>>>> Hi Dany,
>>>>>
>>>>> Please find my response inline below.
>>>>>
>>>>> Cheers,
>>>>> -Patrick
>>>>>
>>>>> On Jan 31, 2016, at 3:46 PM, Dany St-Amant via swift-evolution <
>>>>> swift-evolution at swift.org> wrote:
>>>>>
>>>>> This seems to be two proposals in one:
>>>>> 1. Initialize NSRegularExpression with a single String which includes
>>>>> options
>>>>>
>>>>> The ultimate goal, based on the earlier mail in the thread, seems to be
>>>>> to enable, in a future proposal, things like: string ~= replacePattern, if
>>>>> string =~ pattern, decoupled from the legacy Obj-C. Isn’t
>>>>> NSRegularExpression part of the legacy? The conversion of the literal
>>>>> string to a regular expression should probably be part of the proposal for
>>>>> these operators, as that is when we will know how we want the text to be
>>>>> interpreted.
>>>>>
>>>>>
>>>>> I don't see any evidence of NSRegularExpression becoming part of any
>>>>> legacy. Given SE-005, SE-006, and SE-023, the name is probably changing
>>>>> from NSRegularExpression to RegularExpression. However, I don't think the
>>>>> definition of the class will change, only the name.
>>>>>
>>>>> I would like to see a regular expression matching operator,
>>>>> like Ruby and Perl. I was trying to keep the proposal a minimal increment
>>>>> that would buy the biggest bang for the buck. We can already accomplish
>>>>> much of what other languages can do with regard to regular expressions.
>>>>> However, the notion of a regular expression literal isn't something we can
>>>>> work around with a custom library today. Can you suggest something
>>>>> additional that should be in the proposal?
>>>>>
>>>>>
>>>>> Splitting a proposal into smaller ones has its advantages, but here I am
>>>>> just wondering whether we are sure that these future operations will use
>>>>> NSRegularExpression/RegularExpression. And does the currently selected
>>>>> syntax allow for future expansion? It would be bad to introduce something
>>>>> that needs to be torn out or changed in an incompatible way once we
>>>>> really start to use it in its final location.
>>>>>
>>>>> The proposal is focused on search but seems to skip
>>>>> substitution; I am unable to see an option in the proposal to replace all
>>>>> matches instead of only the first one. I, like many others, would expect
>>>>> regular expressions in a language to also support substitution.
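
For reference, the existing NSRegularExpression API can express both cases, which gives a baseline for what a literal syntax would need to cover. A short sketch; the pattern and sample text are assumptions chosen only for illustration.

```swift
import Foundation

let regex = try! NSRegularExpression(pattern: "o")
let text = "foo boo"
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

// Replace every match (the analogue of sed's trailing `g`).
let all = regex.stringByReplacingMatches(in: text, options: [],
                                         range: fullRange, withTemplate: "0")

// Replace only the first match: find it, then splice the replacement in.
var first = text
if let match = regex.firstMatch(in: text, options: [], range: fullRange),
   let range = Range(match.range, in: text) {
    first.replaceSubrange(range, with: "0")
}

print(all, first)
```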
>>>>>
>>>>> As for additions to the proposal, the processing of the string could
>>>>> support any character (within some limits) in place of the slash
>>>>> delimiter. With sed, when replacing a path component, one can do:
>>>>> echo $PWD | sed -e "s:^/usr/local/bin:/opt/share/bin:g", instead of
>>>>> escaping every single slash, which is really handy for making things
>>>>> easier to read.
>>>>>
>>>>> Also, putting aside that I think \(scheme) should not be interpreted
>>>>> in the example, with a syntax allowing such interpretation the variable
>>>>> should be processed to generate proper escaping. If one uses
>>>>> \(filename) one gets "main.c", but one must use \(filename.escaped()) to
>>>>> get the proper "main\.c" and avoid matching "mainac". String.escaped()
>>>>> must produce a format compatible with the one used when converting the
>>>>> regular expression into NSRegularExpression (I am not sure the two
>>>>> syntaxes are the same; I think that at least the handling of / may differ).
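
The "main.c" versus "mainac" problem can be reproduced with today's API. In the sketch below, `NSRegularExpression.escapedPattern(for:)` stands in for the hypothetical `String.escaped()` mentioned above; the anchors and the helper name `matches` are assumptions added for the demonstration.

```swift
import Foundation

let filename = "main.c"

// Interpolated as-is: the "." acts as a wildcard.
let unescaped = try! NSRegularExpression(pattern: "^\(filename)$")

// Escaped before interpolation: the "." only matches a literal dot.
let escaped = try! NSRegularExpression(
    pattern: "^\(NSRegularExpression.escapedPattern(for: filename))$")

// Hypothetical helper: true if the regex matches anywhere in `s`.
func matches(_ regex: NSRegularExpression, _ s: String) -> Bool {
    let range = NSRange(s.startIndex..<s.endIndex, in: s)
    return regex.firstMatch(in: s, options: [], range: range) != nil
}

print(matches(unescaped, "mainac"), matches(escaped, "mainac"), matches(escaped, "main.c"))
```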
>>>>>
>>>>>
>>>>> I agree. Perhaps I went too far with keeping the proposal
>>>>> short-and-sweet. Especially when you consider the rich syntax that Perl
>>>>> supports for substitution.
>>>>>
>>>>>
>>>>> 2. Easily create a String without escaping (\n is not linefeed, but \
>>>>> and n)
>>>>>
>>>>> The ability to not interpret the backslash as an escape can be useful in
>>>>> scenarios other than creating an NSRegularExpression, like creating a
>>>>> Windows pathname, or creating regular expressions which are then given to
>>>>> an external tool. So this part of the proposal should probably be
>>>>> generalized.
>>>>>
>>>>>
>>>>> Generalize it for what? If you're thinking along the lines of raw
>>>>> strings, I agree that we need this capability, as well as multi-line string
>>>>> literals. However, I would just as soon have separate proposals for this.
>>>>>
>>>>>
>>>>> My point/opinion here is that regular expressions are just Strings
>>>>> which are then interpreted, the same way that "Good Morning", "Bonjour", or
>>>>> "Marhaba" (even when using the Arabic script) are just Strings when you
>>>>> assign them to a variable in Swift, and are then interpreted by the
>>>>> intended user. They are not String, frenchString, rightToLeftString. So I
>>>>> do not see why a regular expression should have privileged treatment and
>>>>> have its own language-level syntax. The only difference when writing a
>>>>> regular expression, or a Windows pathname, or any String with a syntax that
>>>>> makes heavy use of backslashes, is that one may want to disable the special
>>>>> meaning of the backslashes to make things more readable.
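
Later Swift releases (5.0 onward) added raw string literals delimited with `#`, which behave much as described here: backslashes lose their special meaning and `\(...)` is not interpolated. A brief sketch; the path and pattern are made-up examples.

```swift
// Ordinary literal: every backslash must be doubled.
let escapedPath = "C:\\Users\\dany\\notes.txt"

// Raw literal: backslashes are taken literally.
let rawPath = #"C:\Users\dany\notes.txt"#

// In a raw literal, \(scheme) stays as written; no interpolation happens.
let rawPattern = #"\d+\.\d+ \(scheme)"#

print(rawPath == escapedPath, rawPattern)
```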
>>>>>
>>>>> On the topic of geeking out the String, there are four main parts, IMHO:
>>>>> - multi-line support
>>>>> - a no-backslash-escaping version (which should also not process
>>>>> the \(variable) format)
>>>>> - inclusion of the String delimiter inside the String
>>>>> - concatenation of backslash/no-backslash versions. Bash example: echo 'echo
>>>>> "$BASH" shows '"$BASH"
>>>>>
>>>>> I’m still trying to retrace the mail thread crumbs on these topics,
>>>>> since before restarting the discussion on these topics, the previous one
>>>>> should be properly summarized, unless such a summary already exists.
>>>>>
>>>>>
>>>>> I think supporting interpolation is important. Both Perl and Ruby
>>>>> support it, and I'm sure there are other languages. One thing I forgot to
>>>>> put into the proposal: an option to disable interpolation or limit it to
>>>>> a single pass.
>>>>>
>>>>> Looking ahead at the other responses, Chris Lattner has suggested that
>>>>> the proposal would have more traction if we can find a way to fold this
>>>>> into Swift's pattern matching. I can't say that I disagree, as this makes
>>>>> regular expressions more Swifty.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Dany
>>>>>
>>>>> Dany
>>>>>
>>>>> On Jan 31, 2016, at 12:18 PM, Patrick Gili via swift-evolution <
>>>>> swift-evolution at swift.org> wrote:
>>>>>
>>>>> Here is the link to the proposal on GitHub:
>>>>>
>>>>>
>>>>> https://github.com/gili-patrick-r/swift-evolution/blob/master/proposals/NNNN-regular-expression-literals.md
>>>>>
>>>>> Cheers,
>>>>> -Patrick
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> swift-evolution at swift.org
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>>
>>>>>
>>>
>>>
>>> --
>>>   -- Howard.
>>>
>>>
>>>
>>
>> --
>>   -- Howard.
>>
>>
>>
>
>
> --
>   -- Howard.
>
>
>
>
>


-- 
bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
http://www.bitcycle.com/
Phone: +46-73-753 24 62
E-mail: jens at bitcycle.com