[swift-evolution] String update

Michael Ilseman milseman at apple.com
Tue Jan 16 16:18:34 CST 2018


(Replying to both Eneko and George at once)

>>>> I wonder if it is worth considering (for lack of a better word) *verbose* regular expression for Swift.

>>> 


It is certainly worth thought; even if we don’t go down that path, there are lessons to pick up along the way. I believe “verbal expressions” is basically what you’re describing: https://github.com/VerbalExpressions/SwiftVerbalExpressions
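
To make the idea concrete, here is a minimal sketch of a builder-style pattern in today's Swift. Everything in it (`VerbosePattern`, `digits`, `literal`, `optional`) is a hypothetical name invented for illustration, not the SwiftVerbalExpressions API and not a proposed design. It only assembles conventional regex source text from named pieces:

```swift
// Hypothetical builder type; all names are illustrative assumptions.
struct VerbosePattern {
    // The conventional regex source that this builder accumulates.
    let source: String

    // Exactly `count` digits, spelled as `\d{count}`.
    static func digits(_ count: Int) -> VerbosePattern {
        return VerbosePattern(source: "\\d{\(count)}")
    }

    // A literal string, escaping a handful of common metacharacters.
    static func literal(_ text: String) -> VerbosePattern {
        var escaped = ""
        for character in text {
            if "\\.+*?()[]{}|^$".contains(character) {
                escaped.append("\\")
            }
            escaped.append(character)
        }
        return VerbosePattern(source: escaped)
    }

    // Zero or one occurrence, wrapped in a non-capturing group.
    static func optional(_ pattern: VerbosePattern) -> VerbosePattern {
        return VerbosePattern(source: "(?:\(pattern.source))?")
    }

    // Concatenation of two patterns.
    static func + (lhs: VerbosePattern, rhs: VerbosePattern) -> VerbosePattern {
        return VerbosePattern(source: lhs.source + rhs.source)
    }
}

let areaCodePrefix = VerbosePattern.optional(VerbosePattern.digits(3) + VerbosePattern.literal("-"))
let usPhoneNumber = areaCodePrefix + VerbosePattern.digits(3)
    + VerbosePattern.literal("-") + VerbosePattern.digits(4)
print(usPhoneNumber.source)
```

The sketch deliberately ignores captures and typing, which are the interesting parts of this thread; it only shows that the verbose spelling and the condensed one can describe the same pattern.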


> On Jan 16, 2018, at 11:24 AM, Eneko Alonso via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Thank you for the reply. The part I didn’t understand is whether giving names to the captured groups would be mandatory. Hopefully not.
> 
> Assuming the user does not need names, the groups could be captured in an unlabeled tuple.
> 

I mention this through use of ‘_’.

A construct like (let _ = \d+) could produce an unlabeled tuple element.



Thinking about explicit capture names, etc., is all subject to change based on more investigation and playing around with examples. See my email exchange with John Holdsworth, where most names end up being redundant with destructuring at their only use site. That may have just been overly simplistic examples, but maybe not.
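
A small sketch of what I mean by names being redundant with destructuring, written in today's Swift. The function `splitPhoneNumber` is a hypothetical stand-in for a regex match (it does not even check that the pieces are digits); the point is only the use site:

```swift
// Hypothetical stand-in for a regex match: returns three slices of the input,
// or nil when the input is not shaped like "AAA-RRR-LLLL".
func splitPhoneNumber(_ text: String) -> (area: Substring, routing: Substring, local: Substring)? {
    let parts = text.split(separator: "-", omittingEmptySubsequences: false)
    guard parts.count == 3,
          parts[0].count == 3, parts[1].count == 3, parts[2].count == 4 else {
        return nil
    }
    return (area: parts[0], routing: parts[1], local: parts[2])
}

if let match = splitPhoneNumber("415-555-1234") {
    // The labels declared on the tuple are never mentioned here;
    // the names chosen while destructuring are the ones that matter.
    let (areaCode, exchange, line) = match
    print(areaCode, exchange, line)
}
```

When a match is destructured like this at its only use site, the capture names in the pattern carry no extra information, which is why `_` may turn out to be the common case.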


> Digits could always be inferred to be numeric (Int) and they should always be “exact” (to match "\d"):
> 
> let usPhoneNumber: Regex = (.digits(3) + "-").oneOrZero + .digits(3) + "-" + .digits(4)
> 

What if you want to match a sequence of digits that is too large to fit in an Int? For example, the market cap of any stock in the S&P 500 would overflow Int on 32-bit platforms. Having the default represent a portion of the input (whether that be Substring or just a Range) is more faithful to the purpose of captures, which is matching parts of text. Explicitly specifying a type is syntax for passing the capture into an init that serves as both a capture-validator and a value constructor, which is really just yet another kind of Pattern. (This might be generalizable to use beyond regexes, but that’s a whole other digression.) This also aids discovery, as you know which type’s conformance to RegexSubmatchableiblewobble to check.
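
A rough sketch of the overflow point using only initializers that exist in the standard library today. The protocol `CaptureConstructible` is a hypothetical placeholder for whatever the conformance ends up being called; the input string is made up:

```swift
// The default capture is a slice of the input, so it cannot overflow.
let input = "market cap: 2900000000000 USD"
let digits: Substring = input.split(separator: " ")[2]

// Passing the slice to a failable init both validates it and constructs a value.
let asInt32 = Int32(digits)   // fails: the value does not fit in 32 bits
let asInt64 = Int64(digits)   // succeeds

// Hypothetical protocol: a type opts in to being built from a capture.
protocol CaptureConstructible {
    init?(capture: Substring)
}

extension Int32: CaptureConstructible {
    init?(capture: Substring) {
        // Delegates to the standard library's failable parsing init.
        self.init(capture, radix: 10)
    }
}

print(digits, asInt32 as Any, asInt64 as Any)
```

With Substring as the default, the match itself never fails for numeric reasons; only the explicitly requested conversion can.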

(Note that some way to get slices or ranges will always be important for things like case-insensitive matching: changing case can change the number of graphemes in a string).
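
A quick illustration of the case-mapping point, relying on the standard library's `uppercased()`, which as far as I know applies full Unicode case mapping:

```swift
let original = "straße"
let uppercased = original.uppercased()

// The two strings have different numbers of Characters, so character offsets
// computed against the uppercased copy do not line up with the original text.
print(original, original.count)
print(uppercased, uppercased.count)
```

This is why a case-insensitive match needs to hand back slices or ranges of the original input rather than positions in some case-folded copy.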


> Personally, I like the `.optional` better than `.oneOrZero`:
> 
> let usPhoneNumber = Regex.optional(.digits(3) + "-") + .digits(3) + "-" + .digits(4)
> 
> Would it be possible to support both condensed and extended syntax? 
> 
> let usPhoneNumber = / (\d{3} + "-")? + (\d{3}) + "-" + (\d{4}) /
> 
> Maybe only extended (verbose) syntax would support named groups?
> 

“\d” is just syntax for a built-in character class named “digit”. There will be some way to use a character class, whether built-in or user-defined, in a regex.

For example, in Perl 6, you can say “\d” or “<digit>”, both of which are equivalent. Shortcuts for some built-in character classes are convenient and leverage the collective understanding of regexes amongst developers, and I don’t think they cause harm.
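
As a sketch of "built-in and user-defined classes are the same kind of thing", here is one way to model a character class in today's Swift. `CharacterClass` and `prefixLength` are hypothetical names for illustration, and the `digit` class below covers ASCII digits only, which is narrower than what `\d` means in a Unicode-aware engine:

```swift
// Hypothetical model: a character class is a named predicate over Character.
struct CharacterClass {
    let name: String
    let matches: (Character) -> Bool

    // Stand-in for the built-in class behind `\d` (ASCII digits only in this sketch).
    static let digit = CharacterClass(name: "digit") { ("0"..."9").contains($0) }
}

// A user-defined class is built exactly the same way as the built-in one.
let hexDigit = CharacterClass(name: "hexDigit") { $0.isHexDigit }

// Length of the longest prefix whose characters all belong to the class.
func prefixLength(of text: String, in characterClass: CharacterClass) -> Int {
    return text.prefix(while: characterClass.matches).count
}

print(prefixLength(of: "415-555", in: .digit))
print(prefixLength(of: "ff00zz", in: hexDigit))
```

Under this view, `\d` is nothing more than a shorter spelling for referring to `CharacterClass.digit`.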

> Eneko
> 
> 
>> On Jan 16, 2018, at 10:01 AM, George Leontiev <georgeleontiev at gmail.com> wrote:
>> 
>> @Eneko While it sure seems possible to specify the type, I think this would go against the salient point "If something’s worth capturing, it’s worth giving it a name." Putting the name further away seems like a step backward.
>> 
>> 
>> I could imagine a slightly more succinct syntax where things like .numberFromDigits are replaced by protocol conformance of the bound type:
>> ```
>> extension Int: Regexable {
>>     func baseRegex<T>() -> Regex<T, Int>
>> }
>> let usPhoneNumber = (/let area: Int/.exactDigits(3) + "-").oneOrZero +
>>                     /let routing: Int/.exactDigits(3) + "-" +
>>                     /let local: Int/.exactDigits(4)
>> ```
>> 
>> In this model, the `//` syntax will only be used for initial binding and swifty transformations will build the final regex.
>> 
>> 
>>> On Jan 16, 2018, at 9:20 AM, Eneko Alonso via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>> Could it be possible to specify the regex type ahead avoiding having to specify the type of each captured group?
>>> 
>>> let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)> = /
>>>   (\d{3}?) -
>>>   (\d{3}) -
>>>   (\d{4}) /
>>> 
>>> “Verbose” alternative:
>>> 
>>> let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)> = / 
>>>   .optional(.numberFromDigits(.exactly(3)) + "-") +
>>>   .numberFromDigits(.exactly(3)) + "-" +
>>>   .numberFromDigits(.exactly(4)) /
>>> print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>
>>> 
>>> 
>>> Thanks,
>>> Eneko
>>> 
>>> 
>>>> On Jan 16, 2018, at 8:52 AM, George Leontiev via swift-evolution <swift-evolution at swift.org> wrote:
>>>> 
>>>> Thanks, Michael. This is very interesting!
>>>> 
>>>> I wonder if it is worth considering (for lack of a better word) *verbose* regular expression for Swift.
>>>> 
>>>> For instance, your example:
>>>> ```
>>>> let usPhoneNumber = /
>>>>   (let area: Int? <- \d{3}?) -
>>>>   (let routing: Int <- \d{3}) -
>>>>   (let local: Int <- \d{4}) /
>>>> ```
>>>> would become something like (strawman syntax):
>>>> ```
>>>> let usPhoneNumber = /let area: Int? <- .numberFromDigits(.exactly(3))/ + "-" +
>>>>                     /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
>>>>                     /let local: Int <- .numberFromDigits(.exactly(4))/
>>>> ```
>>>> With this format, I also noticed that your code wouldn't match "555-5555", only "-555-5555", so maybe it would end up being something like:
>>>> ```
>>>> let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
>>>>                     /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
>>>>                     /let local: Int <- .numberFromDigits(.exactly(4))/
>>>> ```
>>>> Notice that `area` is initially a non-optional `Int`, but becomes optional when transformed by the `optional` directive.

That is a good catch and illustrates some of the pitfalls of regexes and the need to pick the right syntax. BTW, when you say optional, does it mean the match didn’t happen or the capture-validation didn’t succeed? In this example, it seems like the inclusive-or of both.
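
To spell out the two meanings, here is a sketch in which they are kept apart instead of both collapsing into `nil`. The type `CaptureOutcome` and the function `captureOutcome(for:)` are hypothetical, and `Int8` is used only because it makes a three-digit capture easy to reject:

```swift
// Hypothetical result type that distinguishes the two ways of "having no value".
enum CaptureOutcome<Value: Equatable>: Equatable {
    case notMatched            // the optional group did not participate in the match
    case invalid(Substring)    // text was matched, but the validating init rejected it
    case value(Value)          // text was matched and converted
}

// `text` is nil when the optional group matched nothing.
func captureOutcome(for text: Substring?) -> CaptureOutcome<Int8> {
    guard let text = text else { return .notMatched }
    guard let value = Int8(text) else { return .invalid(text) }
    return .value(value)
}

print(captureOutcome(for: nil))
print(captureOutcome(for: "555"))
print(captureOutcome(for: "55"))
```

A plain `Int?` for `area` cannot tell the first two cases apart, which is the ambiguity I am asking about.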

>>>> Other directives may be:
>>>> ```
>>>> let decimal = /let beforeDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/ +
>>>>               .optional("." + /let afterDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/)
>>>> ```
>>>> 
>>>> In this world, the `/<--/` format will only be used for explicit binding, and the rest will be inferred from generic `+` operators.
>>>> 
>>>> 
>>>> I also think it would be helpful if `Regex` was generic over all sequence types.
>>>> Going back to the phone example, this would look something like:
>>>> ```
>>>> let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
>>>>                     /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
>>>>                     /let local: Int <- .numberFromDigits(.exactly(4))/
>>>> print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>
>>>> ```
>>>> Note the addition of `UnicodeScalar` to the signature of `Regex`. Other interesting signatures are `Regex<JSONToken, JSONEnumeration>` or `Regex<HTTPRequestHeaderToken, HTTPRequestHeader>`. Building parsers becomes fun!
>>>> 

I think I missed something. What does the `UnicodeScalar` type parameter do?

>>>> - George
>>>> 
>>>>> On Jan 10, 2018, at 11:58 AM, Michael Ilseman via swift-evolution <swift-evolution at swift.org> wrote:
>>>>> 
>>>>> Hello, I just sent an email to swift-dev titled "State of String: ABI, Performance, Ergonomics, and You!" at https://lists.swift.org/pipermail/swift-dev/Week-of-Mon-20180108/006407.html, whose gist can be found at https://gist.github.com/milseman/bb39ef7f170641ae52c13600a512782f. I posted to swift-dev as much of the content is from an implementation perspective, but it also addresses many areas of potential evolution. Please refer to that email for details; here’s the recap from it:
>>>>> 
>>>>> ### Recap: Potential Additions for Swift 5
>>>>> 
>>>>> * Some form of unmanaged or unsafe Strings, and corresponding APIs
>>>>> * Exposing performance flags, and some way to request a scan to populate them
>>>>> * API gaps
>>>>> * Character and UnicodeScalar properties, such as isNewline
>>>>> * Generalizing, and optimizing, String interpolation
>>>>> * Regex literals, Regex type, and generalized pattern match destructuring
>>>>> * Substitution APIs, in conjunction with Regexes.
>>>>> 
>>>>> _______________________________________________
>>>>> swift-evolution mailing list
>>>>> swift-evolution at swift.org
>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
