[swift-evolution] Empower String type with regular expression

Howard Lovatt howard.lovatt at gmail.com
Tue Feb 2 17:53:22 CST 2016


I don't see that the two have to be exclusive. If the design of the regex
literal is suitable for both a traditional NSRegularExpression and a verbal
type implementation then the two can co-exist. It can also be staged, so
that a literal can be introduced first with a bridge to legacy
NSRegularExpression and then later a verbal implementation could be added.
The key is to design a literal that is future-proof.
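
To make the staging idea concrete, here is a rough sketch written against current Swift and Foundation (not the 2016 toolchain). The names `RegexElement`, `Regex`, the `.raw` case, and `matches` are illustrative assumptions, not an existing API; only `NSRegularExpression` is real. Stage one keeps an element list and simply bridges to `NSRegularExpression`; a later native (verbal) engine could replace the bridge without changing the element-level surface.

```swift
import Foundation

// Hypothetical element type, loosely following the RegexElement sketch quoted below.
enum RegexElement {
    case special(Special)
    case raw(String) // Assumption: a fragment of traditional regex syntax, passed through unchanged.

    enum Special: String {
        case startOfLine = "^"
        case endOfLine = "$"
    }

    // The traditional pattern text for this element.
    var pattern: String {
        switch self {
        case .special(let s): return s.rawValue
        case .raw(let text): return text
        }
    }
}

// Stage one: the programmatic form is stored, and matching is delegated to the legacy class.
struct Regex: CustomStringConvertible {
    let elements: [RegexElement]
    private let bridged: NSRegularExpression

    init(_ elements: RegexElement...) throws {
        self.elements = elements
        let pattern = elements.map { $0.pattern }.joined()
        self.bridged = try NSRegularExpression(pattern: pattern, options: [])
    }

    var description: String { return bridged.pattern }

    // True if the pattern matches anywhere in `input`.
    func matches(_ input: String) -> Bool {
        let range = NSRange(location: 0, length: input.utf16.count)
        return bridged.firstMatch(in: input, options: [], range: range) != nil
    }
}

let regex = try Regex(.special(.startOfLine), .raw("[a-z]+"), .special(.endOfLine))
print(regex)                  // ^[a-z]+$
print(regex.matches("abc"))   // true
print(regex.matches("abc1"))  // false
```

Because callers only see `elements` and `matches`, swapping the private `bridged` property for a different engine later would not be a source-breaking change in this sketch.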

On 3 February 2016 at 10:33, Patrick Gili <gili.patrick.r at gili-labs.com>
wrote:

> I don't feel good about this direction for the following reasons:
> 1) Complexity
> 2) Maturity? I don't know how Verbal Expressions has been implemented.
> Does it leverage mature regex open source? Or, has it been written from
> scratch?
> 3) Performance? Compiling a regex literal typically results in an FSM of
> some sort, optimized to parse strings. I wouldn't think that converting a
> regex literal to Verbal Expressions would yield great performance every
> time a match or substitution is done.
>
> -Patrick
>
> On Feb 2, 2016, at 5:55 PM, Howard Lovatt <howard.lovatt at gmail.com> wrote:
>
> The difference is that I am proposing supporting both verbal expressions
> and regex literals, with literals converted to verbals and the processing
> happening at the verbal level. The reason for this is that verbals are
> easy to handle programmatically whilst literals are great for quickly
> specifying a regex.
>
> On Tuesday, 2 February 2016, Patrick Gili <gili.patrick.r at gili-labs.com>
> wrote:
>
>> Hi Howard,
>>
>> I don't see how this is very different from the Swift Verbal Expressions.
>> It would suffer from the same disadvantages I have stated previously.
>>
>> Cheers,
>> -Patrick
>>
>> On Feb 2, 2016, at 1:51 AM, Howard Lovatt via swift-evolution <
>> swift-evolution at swift.org> wrote:
>>
>> Others have suggested a programmatic regex instead of a regex literal; how
>> about doing both? Something like:
>>
>> enum RegexElement {
>>     case capture(name: String, value: String)
>>     case special(Special)
>>     // ...
>>     enum Special: String {
>>         case startOfLine = "^"
>>         // ...
>>         case endOfLine = "$"
>>     }
>> }
>>
>> // Define a regexLiteral syntax that the compiler understands, that is of
>> // type Regex, and that consists of String representations of
>> // RegexElements, e.g. using forward slashes:
>> //    /<RegexElements>*/
>>
>> // Compiled, immutable, thread-safe, and bridged to NSRegularExpression.
>> struct Regex: CustomStringConvertible {
>>     // ... internal compiled representation
>>     let elements: [RegexElement]
>>     var description: String {
>>         // Example only. Really returns all the elements converted back
>>         // to a string.
>>         return RegexElement.Special.startOfLine.rawValue
>>     }
>>     init(_ elements: RegexElement...) {
>>         // Example only. Really also compiles the expression.
>>         self.elements = elements
>>     }
>>     // init(regexLiteral regex: Regex) {
>>     // init(concatAll regexes: Regex...) {
>>     // init(fromString string: String) {
>>     // ... more inits
>>     // Example only. Really does the matching.
>>     func map<T>(input: String,
>>                 @noescape mapper: (element: RegexElement) throws -> T)
>>                 rethrows -> [T] {
>>         return [try mapper(element: RegexElement.special(.startOfLine))]
>>     }
>>     // func flatMap<T>(input: String, @noescape mapper: (element: RegexElement) throws -> T?) rethrows -> [T] {
>>     // func flatMap<S: SequenceType>(input: String, @noescape mapper: (element: RegexElement) throws -> S) rethrows -> [S.Generator.Element] {
>>     // func forEach(input: String, @noescape eacher: (element: RegexElement) throws -> Void) rethrows {
>>     // ... more funcs
>> }
>>
>> // Normally this would be a regex literal.
>> let regex = Regex(RegexElement.special(.startOfLine))
>> // Returns `["^"]` in this example.
>> let asStringArray = regex.map("Example") { element -> String in
>>     switch element {
>>     case let .capture(_, v): return v
>>     case let .special(s): return s.rawValue
>>     }
>> }
>>
>>
>> The advantages are:
>>
>>    1.   We get a literal type for convenience.
>>    2.   We get a programmatic type when we need to manipulate regexes.
>>    3.   Breaking the regex matches into the enum-defined elements of the
>>    regex works well with Swift pattern matching.
>>
>> (Above is a very rough sketch!)
>>
>>
>> On 2 February 2016 at 16:44, Thorsten Seitz via swift-evolution <
>> swift-evolution at swift.org> wrote:
>>
>>> Something like Scala's extractors or F#'s Active Patterns would be most
>>> welcome to generalize pattern matching.
>>>
>>> http://docs.scala-lang.org/tutorials/tour/extractor-objects.html
>>> https://en.m.wikibooks.org/wiki/F_Sharp_Programming/Active_Patterns
>>>
>>> -Thorsten
>>>
>>> On 01.02.2016 at 15:46, James Campbell via swift-evolution <
>>> swift-evolution at swift.org> wrote:
>>>
>>> It would be great if we could create a generic way of making this
>>> swifty. You may, let's say, want to implement a matching system for
>>> structures like JSON or XML (e.g. XQuery).
>>>
>>>
>>>
>>> *___________________________________*
>>>
>>> *James⎥Lead Engineer*
>>>
>>> *james at supmenow.com⎥supmenow.com <http://supmenow.com/>*
>>>
>>> *Sup*
>>>
>>> *Runway East *
>>>
>>> *10 Finsbury Square*
>>>
>>> *London*
>>>
>>> * EC2A 1AF *
>>>
>>> On Mon, Feb 1, 2016 at 2:43 PM, Patrick Gili via swift-evolution <
>>> swift-evolution at swift.org> wrote:
>>>
>>>> Hi Dany,
>>>>
>>>> My response is inline below.
>>>>
>>>> Cheers,
>>>> -Patrick
>>>>
>>>> On Jan 31, 2016, at 8:56 PM, Dany St-Amant <dsa.mls at icloud.com> wrote:
>>>>
>>>>
>>>> On 31 Jan 2016, at 16:46, Patrick Gili <gili.patrick.r at gili-labs.com>
>>>> wrote:
>>>>
>>>> Hi Dany,
>>>>
>>>> Please find my response inline below.
>>>>
>>>> Cheers,
>>>> -Patrick
>>>>
>>>> On Jan 31, 2016, at 3:46 PM, Dany St-Amant via swift-evolution <
>>>> swift-evolution at swift.org> wrote:
>>>>
>>>> This seems to be two proposals in one:
>>>> 1. Initialize NSRegularExpression with a single String which includes
>>>> options
>>>>
>>>> The ultimate goal, based on the earlier mail in the thread, seems to be
>>>> to enable, in a future proposal, things like: string ~= replacePattern,
>>>> if string =~ pattern, decoupled from the legacy Obj-C. Isn’t
>>>> NSRegularExpression part of the legacy? The conversion of the literal
>>>> string to a regular expression should probably be part of the proposal
>>>> for these operators, as that is when we will know how we want the text
>>>> to be interpreted.
>>>>
>>>>
>>>> I don't see any evidence of NSRegularExpression becoming part of any
>>>> legacy. Given SE-005, SE-006, and SE-023, the name is probably changing
>>>> from NSRegularExpression to RegularExpression. However, I don't think the
>>>> definition of the class will change, only the name.
>>>>
>>>> I would like to see a regular expression matching operator, like Ruby
>>>> and Perl have. I was trying to keep the proposal a minimal increment
>>>> that would buy the biggest bang for the buck. We can already accomplish
>>>> much of what other languages can do with regard to regular expressions.
>>>> However, the notion of a regular expression isn't something we can work
>>>> around with a custom library today. Can you suggest something additional
>>>> that should be in the proposal?
>>>>
>>>>
>>>> Splitting a proposal into smaller ones has its advantages, but here I am
>>>> just wondering whether we are sure that these future operations will use
>>>> NSRegularExpression/RegularExpression. And does the currently selected
>>>> syntax allow for future expansion? It would be bad to introduce something
>>>> that needs to be torn out or changed in an incompatible way once we
>>>> really start to use these literals in their final location.
>>>>
>>>> The proposal is focused on search, but seems to skip substitution; I am
>>>> unable to see an option in the proposal to replace all matches instead
>>>> of only the first one. I, like many others, would expect regular
>>>> expressions in a language to also support substitution.
>>>>
>>>> As for additions to the proposal, the processing of the string could
>>>> support any character (within some limits) in place of the slash
>>>> delimiter. With sed, when replacing a path component, one can do:
>>>> echo $PWD | sed -e "s:^/usr/local/bin:/opt/share/bin:g", instead of
>>>> escaping every single slash, which really makes things easier to read.
>>>>
>>>> Also, putting aside that I think \(scheme) should not be interpreted in
>>>> the example, with a syntax allowing such interpretation the variable
>>>> should be processed to generate proper escaping. If one uses \(filename)
>>>> one gets "main.c", but one must use \(filename.escaped()) to get the
>>>> proper "main\.c" and avoid matching "mainac". The String.escaped() output
>>>> must be in a format compatible with the one used when converting the
>>>> regular expression into NSRegularExpression (not sure if the two syntaxes
>>>> are the same; I think that at least the handling of / may differ).
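
A minimal sketch of the escaping concern quoted just above, written against current Swift and Foundation rather than the 2016 toolchain. `NSRegularExpression.escapedPattern(for:)` is an existing Foundation call; the helper name `matches(_:in:)` is hypothetical, and the `String.escaped()` method discussed in the thread is not assumed to exist.

```swift
import Foundation

// Hypothetical helper: true if `pattern` matches anywhere in `input`.
func matches(_ pattern: String, in input: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return false // An invalid pattern is treated as "no match" in this sketch.
    }
    let range = NSRange(location: 0, length: input.utf16.count)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}

let filename = "main.c"

// Plain interpolation: the "." stays a metacharacter and matches any character.
let unescaped = "^\(filename)$"

// Escaped interpolation: the "." is turned into a literal dot before it is spliced in.
let escaped = "^\(NSRegularExpression.escapedPattern(for: filename))$"

print(matches(unescaped, in: "mainac"))  // true
print(matches(escaped, in: "mainac"))    // false
print(matches(escaped, in: "main.c"))    // true
```

This only illustrates why interpolated values need escaping; whether a regex literal should escape by default or require an explicit call is the open design question in the thread.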
>>>>
>>>>
>>>> I agree. Perhaps I went too far with keeping the proposal
>>>> short-and-sweet. Especially when you consider the rich syntax that Perl
>>>> supports for substitution.
>>>>
>>>>
>>>> 2. Easily create a String without escaping (\n is not linefeed, but \
>>>> and n)
>>>>
>>>> The ability to not interpret the backslash as an escape can be useful in
>>>> scenarios other than creating an NSRegularExpression, like creating a
>>>> Windows pathname, or creating regular expressions which are then given to
>>>> an external tool. So this part of the proposal should probably be
>>>> generalized.
>>>>
>>>>
>>>> Generalize it for what? If you're thinking along the lines of raw
>>>> strings, I agree that we need this capability, as well as multi-line
>>>> string literals. However, I would just as soon have separate proposals
>>>> for these.
>>>>
>>>>
>>>> My point/opinion here is that regular expressions are just Strings
>>>> which are then interpreted, the same way "Good Morning", "Bonjour", or
>>>> "Marhaba" (even when using the Arabic script) are just Strings when you
>>>> assign them to a variable in Swift, and are then interpreted by the
>>>> intended user. They are not String, frenchString, rightToLeftString. So I
>>>> do not see why a regular expression should have privileged treatment and
>>>> have its own language-level syntax. The only difference when writing a
>>>> regular expression, or a Windows pathname, or any String in a syntax that
>>>> makes heavy use of backslashes, is that one may want to disable the
>>>> special meaning of the backslashes, to make things more readable.
>>>>
>>>> On the topic of geeking out the String, there are four main parts IMHO:
>>>> - multi-line support
>>>> - a version without backslash escaping (which should also not process
>>>> the \(variable) format)
>>>> - inclusion of the String delimiter inside the String
>>>> - concatenation of the backslash/no-backslash versions. Bash example:
>>>> echo 'echo "$BASH" shows '"$BASH"
>>>>
>>>> I’m still trying to track down the earlier mail threads on these topics,
>>>> since the previous discussions should be properly summarized before the
>>>> discussion is restarted, unless such a summary already exists.
>>>>
>>>>
>>>> I think supporting interpolation is important. Both Perl and Ruby
>>>> support it, and I'm sure other languages do too. One thing I forgot to
>>>> put into the proposal: an option to disable interpolation or limit it to
>>>> a single pass.
>>>>
>>>> Looking ahead at the other responses, Chris Lattner has suggested that
>>>> the proposal would have more traction if we can find a way to fold this
>>>> into Swift's pattern matching. I can't say as I disagree, as this makes
>>>> regular expression more Swifty.
>>>>
>>>>
>>>> Regards,
>>>> Dany
>>>>
>>>> On 31 Jan 2016, at 12:18, Patrick Gili via swift-evolution <
>>>> swift-evolution at swift.org> wrote:
>>>>
>>>> Here is the link to the proposal on GitHub:
>>>>
>>>>
>>>> https://github.com/gili-patrick-r/swift-evolution/blob/master/proposals/NNNN-regular-expression-literals.md
>>>>
>>>> Cheers,
>>>> -Patrick
>>>>
>>>>
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> swift-evolution at swift.org
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>
>>>>
>>>>
>>
>>
>> --
>>   -- Howard.
>>
>>
>>
>
> --
>   -- Howard.
>
>
>


-- 
  -- Howard.