[swift-evolution] Empower String type with regular expression

Howard Lovatt howard.lovatt at gmail.com
Tue Feb 2 00:51:23 CST 2016


Others have suggested a programmatic regex instead of a regex literal. How
about doing both? Something like:

enum RegexElement {
    case capture(name: String, value: String)
    case special(Special)
    // ...

    enum Special: String {
        case startOfLine = "^"
        // ...
        case endOfLine = "$"
    }
}


// Define a regexLiteral syntax that the compiler understands, that is of
// type Regex, and that consists of String representations of RegexElements,
// e.g. using forward slashes:
//
//    /<RegexElements>*/


// Compiled, immutable, thread safe, and bridged to NSRegularExpression
struct Regex: CustomStringConvertible {
    // ... internal compiled representation
    let elements: [RegexElement]

    var description: String {
        // Example. Really returns all the elements converted back to a string
        return RegexElement.Special.startOfLine.rawValue
    }

    init(_ elements: RegexElement...) {
        // Example. Really also compiles the expression
        self.elements = elements
    }

    // init(regexLiteral regex: Regex) {
    // init(concatAll regexes: Regex...) {
    // init(fromString string: String) {
    // ... more inits

    func map<T>(_ input: String,
                mapper: (RegexElement) throws -> T) rethrows -> [T] {
        // Example. Really does the matching
        return [try mapper(RegexElement.special(.startOfLine))]
    }

    // func flatMap<T>(_ input: String,
    //     mapper: (RegexElement) throws -> T?) rethrows -> [T] {
    // func flatMap<S: Sequence>(_ input: String,
    //     mapper: (RegexElement) throws -> S) rethrows -> [S.Element] {
    // func forEach(_ input: String,
    //     eacher: (RegexElement) throws -> Void) rethrows {
    // ... more funcs
}


// Normally a regex literal
let regex = Regex(RegexElement.special(.startOfLine))

// Returns `["^"]` in example
let asStringArray = regex.map("Example") { element -> String in
    switch element {
    case let .capture(_, v): return v
    case let .special(s): return s.rawValue
    }
}


The advantages are:

   1.   We get a literal type for convenience.
   2.   We get a programmatic type when we need to manipulate regexes.
   3.   Breaking the regex matches into the enum defined elements of the
   regex works well with Swift pattern matching.

(Above is a very rough sketch!)
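
To illustrate the third advantage, here is a minimal sketch of how match results expressed as enum elements could be picked apart with Swift pattern matching. The RegexElement type is repeated only so the snippet stands alone; the `matched` array and the `captures` dictionary are hypothetical stand-ins for what a real matcher might produce, not part of any existing API.

```swift
// Hypothetical stand-in mirroring the sketch above; not a real API.
enum RegexElement {
    case capture(name: String, value: String)
    case special(Special)

    enum Special: String {
        case startOfLine = "^"
        case endOfLine = "$"
    }
}

// Assumption: pretend these elements came back from matching some input.
let matched: [RegexElement] = [
    .special(.startOfLine),
    .capture(name: "scheme", value: "https"),
    .capture(name: "host", value: "swift.org"),
    .special(.endOfLine),
]

// Keep only the named captures, binding their payloads via `for case let`.
var captures: [String: String] = [:]
for case let .capture(name, value) in matched {
    captures[name] = value
}
```

Because the elements are plain enum cases, the same results could equally be consumed with `switch` or `if case`, without any regex-specific matching syntax.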


On 2 February 2016 at 16:44, Thorsten Seitz via swift-evolution <
swift-evolution at swift.org> wrote:

> Something like Scala's extractors or F#'s Active Patterns would be most
> welcome to generalize pattern matching.
>
> http://docs.scala-lang.org/tutorials/tour/extractor-objects.html
> https://en.m.wikibooks.org/wiki/F_Sharp_Programming/Active_Patterns
>
> -Thorsten
>
> On 01.02.2016 at 15:46, James Campbell via swift-evolution <
> swift-evolution at swift.org> wrote:
>
> It would be great if we could create a generic way of making this Swifty.
> You may, let's say, want to implement a matching system for structures
> like JSON or XML (e.g. XQuery).
>
> On Mon, Feb 1, 2016 at 2:43 PM, Patrick Gili via swift-evolution <
> swift-evolution at swift.org> wrote:
>
>> Hi Dany,
>>
>> My response is inline below.
>>
>> Cheers,
>> -Patrick
>>
>> On Jan 31, 2016, at 8:56 PM, Dany St-Amant <dsa.mls at icloud.com> wrote:
>>
>>
>> On 31 Jan 2016 at 16:46, Patrick Gili <gili.patrick.r at gili-labs.com>
>> wrote:
>>
>> Hi Dany,
>>
>> Please find my response inline below.
>>
>> Cheers,
>> -Patrick
>>
>> On Jan 31, 2016, at 3:46 PM, Dany St-Amant via swift-evolution <
>> swift-evolution at swift.org> wrote:
>>
>> This seems to be two proposals in one:
>> 1. Initialize NSRegularExpression with a single String which includes
>> options
>>
>> The ultimate goal, based on the earlier mail in the thread, seems to be
>> to allow, in a future proposal, things like: string ~= replacePattern, if
>> string =~ pattern, decoupled from the legacy Obj-C. Isn’t
>> NSRegularExpression part of the legacy? The conversion of the literal
>> string to a regular expression should probably be part of the proposal for
>> these operators, as that is when we will know how we want the text to be
>> interpreted.
>>
>>
>> I don't see any evidence of NSRegularExpression becoming part of any
>> legacy. Given SE-0005, SE-0006, and SE-0023, the name is probably changing
>> from NSRegularExpression to RegularExpression. However, I don't think the
>> definition of the class will change, only the name.
>>
>> I would like to see a regular expression matching operator, as in Ruby
>> and Perl. I was trying to keep the proposal to a minimal increment that
>> would give the biggest bang for the buck. We can already accomplish much
>> of what other languages can do with regard to regular expressions.
>> However, the notion of a regular expression isn't something we can work
>> around with a custom library today. Can you suggest something additional
>> that should be in the proposal?
>>
>>
>> Splitting a proposal into smaller ones has its advantages, but here I am
>> just wondering whether we are sure that these future operations will use
>> NSRegularExpression/RegularExpression. And does the currently selected
>> syntax allow for future expansion? It would be bad to introduce something
>> that needs to be torn out or changed in an incompatible way once we
>> really start to use them in their final location.
>>
>> The proposal is focused on search, but seems to skip substitution; I am
>> unable to see an option in the proposal to replace all matches instead of
>> only the first one. I, like many others, would expect regular expressions
>> in a language to also support substitution.
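
For reference, a minimal sketch of both kinds of substitution using Foundation's existing NSRegularExpression API rather than the proposed literal syntax. The input string and variable names are made up for illustration.

```swift
import Foundation

let input = "one fish, two fish"
// The pattern is a fixed literal, so a failure here would be a programmer error.
let pattern = try! NSRegularExpression(pattern: "fish", options: [])
let whole = NSRange(input.startIndex..., in: input)

// Replace every match in the input.
let all = pattern.stringByReplacingMatches(
    in: input, options: [], range: whole, withTemplate: "cat")

// Replace only the first match by restricting the range to that match.
var first = input
if let match = pattern.firstMatch(in: input, options: [], range: whole) {
    first = pattern.stringByReplacingMatches(
        in: input, options: [], range: match.range, withTemplate: "cat")
}
```

Here `all` becomes "one cat, two cat" while `first` becomes "one cat, two fish"; a literal syntax would presumably need a flag, like the trailing g in sed, to choose between the two.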
>>
>> As for additions to the proposal, the processing of the string could
>> support any character (within some limits) in place of the slash
>> delimiter. With sed, when replacing a path component, one can do:
>> echo $PWD | sed -e "s:^/usr/local/bin:/opt/share/bin:g"
>> instead of escaping every single slash, which makes things much easier
>> to read.
>>
>> Also, putting aside that I think \(scheme) should not be interpreted in
>> the example, with a syntax allowing such interpretation the variable should
>> be processed to generate proper escaping. If one uses \(filename), one
>> gets "main.c", but one must use \(filename.escaped()) to get the proper
>> "main\.c" and avoid matching "mainac". The output of String.escaped() must
>> be compatible with the format used when converting the regular expression
>> into an NSRegularExpression (I am not sure whether the two syntaxes are
>> the same; I think that at least the handling of / may differ).
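
A minimal sketch of the escaping concern, using Foundation's existing NSRegularExpression.escapedPattern(for:) in place of the hypothetical String.escaped(). The helper `matchesWhole` is an assumption introduced only for this illustration.

```swift
import Foundation

let filename = "main.c"
// Backslash-escape regex metacharacters so the text is matched literally.
let escaped = NSRegularExpression.escapedPattern(for: filename)

// Hypothetical helper: true when `pattern` matches all of `text`.
func matchesWhole(_ pattern: String, _ text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: "^" + pattern + "$",
                                               options: []) else {
        return false
    }
    let range = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

let unescapedHitsWrongName = matchesWhole(filename, "mainac") // "." matches "a"
let escapedHitsWrongName = matchesWhole(escaped, "mainac")
let escapedHitsRealName = matchesWhole(escaped, "main.c")
```

With the raw filename, the "." acts as a wildcard and "mainac" matches; with the escaped form only the literal "main.c" does.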
>>
>>
>> I agree. Perhaps I went too far with keeping the proposal
>> short-and-sweet. Especially when you consider the rich syntax that Perl
>> supports for substitution.
>>
>>
>> 2. Easily create a String without escaping (\n is not linefeed, but \ and
>> n)
>>
>> The ability to not interpret the backslash as an escape can be useful in
>> scenarios other than creating an NSRegularExpression, like creating a
>> Windows pathname, or creating regular expressions which are then given to
>> an external tool. So this part of the proposal should probably be
>> generalized.
>>
>>
>> Generalize it for what? If you're thinking along the line of raw strings,
>> I agree that we need this capability, as well as multi-line string
>> literals. However, I would just as soon have separate proposals for this.
>>
>>
>> My point/opinion here is that regular expressions are just Strings
>> which are then interpreted, the same way as "Good Morning", "Bonjour", or
>> "Marhaba" (even when using the Arabic script) are just Strings when you
>> assign them to a variable in Swift, and are then interpreted by the
>> intended user. They are not String, frenchString, rightToLeftString. So I
>> do not see why a regular expression should have privileged treatment and
>> have its own language-level syntax. The only difference when writing a
>> regular expression, a Windows pathname, or any String whose syntax makes
>> heavy use of backslashes is that one may want to disable the special
>> meaning of the backslashes to make things more readable.
>>
>> On the subject of geeking up String, there are four main parts, IMHO:
>> - multi-line support
>> - a version with no backslash escaping (which should also not process the
>> \(variable) format)
>> - inclusion of the String delimiter inside the String
>> - concatenation of the backslash/no-backslash versions. Bash example:
>>   echo 'echo "$BASH" shows '"$BASH"
>>
>> I’m still trying to track down the earlier mail threads on these topics,
>> since before restarting the discussion, the previous ones should be
>> properly summarized, unless such a summary already exists.
>>
>>
>> I think supporting interpolation is important. Both Perl and Ruby support
>> it, and I'm sure there are other languages. One thing I forgot to put into
>> the proposal: an option to disable interpolation or limit it to a single
>> pass.
>>
>> Looking ahead at the other responses, Chris Lattner has suggested that
>> the proposal would have more traction if we can find a way to fold this
>> into Swift's pattern matching. I can't say that I disagree, as this makes
>> regular expressions more Swifty.
>>
>>
>> Regards,
>> Dany
>>
>> Dany
>>
>> On 31 Jan 2016 at 12:18, Patrick Gili via swift-evolution <
>> swift-evolution at swift.org> wrote:
>>
>> Here is the link to the proposal on GitHub:
>>
>>
>> https://github.com/gili-patrick-r/swift-evolution/blob/master/proposals/NNNN-regular-expression-literals.md
>>
>> Cheers,
>> -Patrick
>>
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>


-- 
  -- Howard.