[swift-evolution] Empower String type with regular expression

Patrick Gili gili.patrick.r at gili-labs.com
Tue Feb 2 17:33:31 CST 2016


I don't feel good about this direction for the following reasons:
1) Complexity
2) Maturity? I don't know how Verbal Expressions has been implemented. Does it build on a mature open-source regex engine, or was it written from scratch?
3) Performance? Compiling a regex literal typically yields an FSM of some sort, optimized to parse strings. I doubt that converting a regex literal to Verbal Expressions would yield great performance each time a match or substitution is done.
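
As a rough illustration of the performance point, here is a minimal sketch in current Swift (not the Swift 2 of this thread), assuming Foundation's NSRegularExpression as the engine. The pattern is compiled once and the compiled object is reused for every match. The helper name countMatches is hypothetical.

```swift
import Foundation

// Compile the pattern once. NSRegularExpression is immutable, so the
// compiled object can be reused for any number of inputs.
let digits = try! NSRegularExpression(pattern: "[0-9]+", options: [])

// Hypothetical helper: counts matches using the already-compiled regex.
func countMatches(in text: String) -> Int {
    let range = NSRange(text.startIndex..., in: text)
    return digits.numberOfMatches(in: text, options: [], range: range)
}

// No recompilation happens per input string.
let counts = ["a1b22", "none", "7 8 9"].map { countMatches(in: $0) }
print(counts)
```

A design that rebuilt the matcher on each call would repeat the compilation step that this sketch pays only once.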

-Patrick

> On Feb 2, 2016, at 5:55 PM, Howard Lovatt <howard.lovatt at gmail.com> wrote:
> 
> The difference is that I am proposing to support both verbal expressions and regex literals, with literals converted to verbals and the processing happening at the verbal level. The reason is that verbals are easy to handle programmatically, whilst literals are great for quickly specifying a regex.
> 
> On Tuesday, 2 February 2016, Patrick Gili <gili.patrick.r at gili-labs.com> wrote:
> Hi Howard,
> 
> I don't see how this is very different from the Swift Verbal Expressions. It would suffer from the same disadvantages I have stated previously.
> 
> Cheers,
> -Patrick
> 
>> On Feb 2, 2016, at 1:51 AM, Howard Lovatt via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>> Others have suggested a programmatic regex instead of a regex literal. How about doing both? Something like:
>> 
>> enum RegexElement {
>>     case capture(name: String, value: String)
>>     case special(Special)
>>     // ...
>>     enum Special: String {
>>         case startOfLine = "^"
>>         // ...
>>         case endOfLine = "$"
>>     }
>> }
>> 
>> // Define a regexLiteral syntax that the compiler understands that is of type Regex and consists of String representations of RegexElements, e.g. using forward slash:
>> //    /<RegexElements>*/
>> 
>> struct Regex: CustomStringConvertible { // Compiled, immutable, thread safe, and bridged to NSRegularExpression
>>     // ... internal compiled representation
>>     let elements: [RegexElement]
>>     var description: String {
>>         return RegexElement.Special.startOfLine.rawValue // Example. Really returns all the elements converted back to a string
>>     }
>>     init(_ elements: RegexElement...) {
>>         self.elements = elements // Example. Really also compiles the expression
>>     }
>>     // init(regexLiteral regex: Regex) {
>>     // init(concatAll regexes: Regex...) {
>>     // init(fromString string: String) {
>>     // ... more inits
>>     func map<T>(input: String, @noescape mapper: (element: RegexElement) throws -> T) rethrows -> [T] {
>>         return [try mapper(element: RegexElement.special(.startOfLine))] // Example. Really does the matching
>>     }
>>     // func flatMap<T>(input: String, @noescape mapper: (element: RegexElement) throws -> T?) rethrows -> [T] {
>>     // func flatMap<S: SequenceType>(input: String, @noescape mapper: (element: RegexElement) throws -> S) rethrows -> [S.Generator.Element] {
>>     // func forEach(input: String, @noescape eacher: (element: RegexElement) throws -> Void) rethrows {
>>     // ... more funcs
>> }
>> 
>> let regex = Regex(RegexElement.special(.startOfLine)) // Normally a regex literal
>> let asStringArray = regex.map("Example") { element -> String in // Returns `["^"]` in example
>>     switch element {
>>     case let .capture(_, v): return v
>>     case let .special(s): return s.rawValue
>>     }
>> }
>> 
>> The advantages are:
>>   We get a literal type for convenience.
>>   We get a programmatic type when we need to manipulate regexes.
>>   Breaking the regex matches into the enum-defined elements of the regex works well with Swift pattern matching.
>> (Above is a very rough sketch!)
>> 
>> 
>> On 2 February 2016 at 16:44, Thorsten Seitz via swift-evolution <swift-evolution at swift.org> wrote:
>> Something like Scala's extractors or F#'s Active Patterns would be most welcome to generalize pattern matching.
>> 
>> http://docs.scala-lang.org/tutorials/tour/extractor-objects.html
>> https://en.m.wikibooks.org/wiki/F_Sharp_Programming/Active_Patterns
>> 
>> -Thorsten 
>> 
>> On 01.02.2016 at 15:46, James Campbell via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>>> It would be great if we could create a generic way of making this Swifty. You might, let's say, want to implement a matching system for structures like JSON or XML (e.g. XQuery).
>>> 
>>> On Mon, Feb 1, 2016 at 2:43 PM, Patrick Gili via swift-evolution <swift-evolution at swift.org> wrote:
>>> Hi Dany,
>>> 
>>> My response is inline below.
>>> 
>>> Cheers,
>>> -Patrick
>>> 
>>>> On Jan 31, 2016, at 8:56 PM, Dany St-Amant <dsa.mls at icloud.com> wrote:
>>>> 
>>>>> 
>>>>> On 31 Jan 2016, at 16:46, Patrick Gili <gili.patrick.r at gili-labs.com> wrote:
>>>>> 
>>>>> Hi Dany,
>>>>> 
>>>>> Please find my response inline below.
>>>>> 
>>>>> Cheers,
>>>>> -Patrick
>>>>> 
>>>>>> On Jan 31, 2016, at 3:46 PM, Dany St-Amant via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>> 
>>>>>> This seems to be two proposals in one:
>>>>>> 1. Initialize NSRegularExpression with a single String which includes options
>>>>>> 
>>>>>> Based on the earlier mail in the thread, the ultimate goal seems to be enabling, in a future proposal, things like string ~= replacePattern or if string =~ pattern, decoupled from the legacy Obj-C. Isn’t NSRegularExpression part of that legacy? The conversion of the literal string into a regular expression should probably be part of the proposal for these operators, as that is when we will know how we want the text to be interpreted.
>>>>> 
>>>>> I don't see any evidence of NSRegularExpression becoming part of any legacy. Given SE-005, SE-006, and SE-023, the name is probably changing from NSRegularExpression to RegularExpression. However, I don't think the definition of the class will change, only the name.
>>>>> 
>>>>> I would like to see a regular expression matching operator, as in Ruby and Perl. I was trying to keep the proposal a minimal increment that would give the biggest bang for the buck. We can already accomplish much of what other languages can do with regard to regular expressions. However, a regular expression literal isn't something we can work around with a custom library today. Can you suggest something additional that should be in the proposal?
>>>> 
>>>> Splitting a proposal into smaller ones has its advantages, but here I am just wondering whether we are sure that these future operations will use NSRegularExpression/RegularExpression. And does the currently selected syntax allow for future expansion? It would be bad to introduce something that needs to be torn out or changed in an incompatible way once we really start to use it in its final location.
>>>> 
>>>> The proposal is focused on search but seems to skip substitution; I am unable to see in the proposal an option to replace all matches instead of only the first one. I, like many others, would expect regular expressions in a language to also support substitution.
>>>> 
>>>> As for additions to the proposal, the processing of the string could support any character (within some limits) in place of the slash delimiter. With sed, when replacing a path component, one can write: echo $PWD | sed -e "s:^/usr/local/bin:/opt/share/bin:g", instead of escaping every single slash, which is really handy for making things easier to read.
>>>> 
>>>> Also, putting aside that I think \(scheme) should not be interpreted in the example, with a syntax allowing such interpretation the variable should be processed to generate proper escaping. If one uses \(filename), one gets "main.c", but one must use \(filename.escaped()) to get the proper "main\.c" and avoid matching "mainac". String.escaped() must produce a format compatible with the one used when converting the regular expression into NSRegularExpression (I am not sure the two syntaxes are the same; I think that at least the handling of / may differ).
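
The escaping concern can be sketched in current Swift. This is a minimal illustration, assuming Foundation's NSRegularExpression.escapedPattern(for:) stands in for the String.escaped() method suggested above, which does not exist. The helper matches(_:_:) is likewise hypothetical.

```swift
import Foundation

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
func matches(_ pattern: String, _ text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return false
    }
    let range = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

let filename = "main.c"

// Interpolating the raw value leaves "." as a wildcard.
let unescapedPattern = "^\(filename)$"

// Escaping first makes "." match only a literal dot.
let escapedPattern = "^\(NSRegularExpression.escapedPattern(for: filename))$"

print(matches(unescapedPattern, "mainac")) // the wildcard dot accepts "a"
print(matches(escapedPattern, "mainac"))
print(matches(escapedPattern, "main.c"))
```

Whether interpolation inside a regex literal should escape automatically or require an explicit call is exactly the design question raised here; the sketch only shows why the distinction matters.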
>>> 
>>> I agree. Perhaps I went too far with keeping the proposal short-and-sweet. Especially when you consider the rich syntax that Perl supports for substitution.
>>> 
>>>>> 
>>>>>> 2. Easily create a String without escaping (\n is not linefeed, but \ and n)
>>>>>> 
>>>>>> The ability to not interpret the backslash as an escape can be useful in scenarios other than creating an NSRegularExpression, such as creating a Windows pathname or creating a regular expression that is then given to an external tool. So this part of the proposal should probably be generalized.
>>>>> 
>>>>> Generalize it for what? If you're thinking along the lines of raw strings, I agree that we need this capability, as well as multi-line string literals. However, I would just as soon have separate proposals for these.
>>>> 
>>>> My point/opinion here is that a regular expression is just a String that is then interpreted, the same way "Good Morning", "Bonjour", or "Marhaba" (even when using the Arabic script) are just Strings when you assign them to a variable in Swift, to be interpreted by the intended user. They are not String, frenchString, rightToLeftString. So I do not see why a regular expression should get privileged treatment and have its own language-level syntax. The only difference when writing a regular expression, a Windows pathname, or any String whose syntax makes heavy use of backslashes is that one may want to disable the special meaning of the backslash to make things more readable.
>>>> 
>>>> On the subject of geeking out String, there are four main parts, IMHO:
>>>> - multi-line support
>>>> - a version with no backslash escaping (which should also not process the \(variable) format)
>>>> - inclusion of the String delimiter inside the String
>>>> - concatenation of the backslash/no-backslash versions. Bash example: echo 'echo "$BASH" shows '"$BASH"
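
For reference, Swift 5 later added raw string literals, delimited with #, which cover the second and third items in the list above: backslashes and \( are taken literally, and quotes can appear unescaped. A short sketch (the variable names are illustrative only):

```swift
// A conventional literal needs every backslash doubled.
let escapedPath = "C:\\Users\\dany\\main.c"

// A raw string literal (Swift 5 and later) takes backslashes literally.
let rawPath = #"C:\Users\dany\main.c"#

// Inside a raw string, \( does not start interpolation...
let name = "scheme"
let literalText = #"\(name)"#

// ...but \#( opts back in to interpolation.
let interpolated = #"\#(name)://"#

// The string delimiter itself can appear unescaped.
let quoted = #"say "hi""#

print(rawPath, literalText, interpolated, quoted)
```

Concatenating the two kinds, as in the Bash example, is plain String concatenation with +.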
>>>> 
>>>> I’m still trying to track down the mail thread crumbs on these topics, since the previous discussion should be properly summarized before restarting it, unless such a summary already exists.
>>> 
>>> I think supporting interpolation is important. Both Perl and Ruby support it, and I'm sure other languages do too. One thing I forgot to put into the proposal: an option to disable interpolation or limit it to a single pass.
>>> 
>>> Looking ahead at the other responses, Chris Lattner has suggested that the proposal would have more traction if we can find a way to fold this into Swift's pattern matching. I can't say I disagree, as this would make regular expressions more Swifty.
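
One way this could look with what Swift already offers is to overload the ~= operator, which switch uses for expression patterns. This is only a sketch in current Swift, assuming NSRegularExpression as the engine; the overload and the classify function are hypothetical and not part of the proposal.

```swift
import Foundation

// Hypothetical overload: lets an NSRegularExpression be used as a `case` pattern.
func ~= (pattern: NSRegularExpression, value: String) -> Bool {
    let range = NSRange(value.startIndex..., in: value)
    return pattern.firstMatch(in: value, options: [], range: range) != nil
}

let integerRegex = try! NSRegularExpression(pattern: "^[0-9]+$", options: [])
let wordRegex = try! NSRegularExpression(pattern: "^[A-Za-z]+$", options: [])

// Each `case` below is an expression pattern, resolved through ~= above.
func classify(_ input: String) -> String {
    switch input {
    case integerRegex: return "integer"
    case wordRegex: return "word"
    default: return "other"
    }
}

print(classify("42"), classify("hello"), classify("4x"))
```

This only answers whether a string matches; binding capture groups inside the case would need more language support, along the lines of the extractors mentioned elsewhere in the thread.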
>>> 
>>>> 
>>>> Regards,
>>>> Dany
>>>> 
>>>>>> Dany
>>>>>> 
>>>>>>> On 31 Jan 2016, at 12:18, Patrick Gili via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>> 
>>>>>>> Here is the link to the proposal on GitHub:
>>>>>>> 
>>>>>>> https://github.com/gili-patrick-r/swift-evolution/blob/master/proposals/NNNN-regular-expression-literals.md
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> -Patrick
>>>>>> 
>>>>>> _______________________________________________
>>>>>> swift-evolution mailing list
>>>>>> swift-evolution at swift.org
>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>> 
>> 
>> -- 
>>   -- Howard.
> 
> 
> 
> -- 
>   -- Howard.
> 
