[swift-evolution] URL Literals

David Sweeris davesweeris at mac.com
Mon Dec 19 12:53:00 CST 2016

> On Dec 19, 2016, at 1:26 AM, Xiaodi Wu <xiaodi.wu at gmail.com> wrote:
> URLs are unlikely to be something that can be validated by regex. See, for instance, this discussion: <https://webkit.org/blog/7086/url-parsing-in-webkit/>. The full spec is here: <https://url.spec.whatwg.org/>. If Swift were to implement parsing of URLs at the level of the compiler or core library, I'd expect it to be the full spec, as we do with Unicode.
> On Mon, Dec 19, 2016 at 2:26 AM, Benjamin Spratling via swift-evolution <swift-evolution at swift.org> wrote:
> Howdy,
> 	Yes, I was also intrigued by the “Regex” validation mentioned in another post.  It could offer a convenient way to get some literals support in without the headaches associated with the constexpr C++ approach.
> 	I’m curious, though, how many types can we imagine being validated by this method?  If it really is just URLs, then I’d actually lean towards making this a compiler magic feature.
> 	Someone else mentioned fetching the URL’s for a preview.  Given that we might be coding “deletes” in URL’s (yes, I recently met a backend developer who coded a delete as a GET), I really highly suggest we not ping people’s API’s artificially.  At least we shouldn’t for non-file-scheme URLs.  IMHO, verifying that a service is active isn’t really the Swift compiler’s job.  It might happen as part of coordinated run-time tests, which sometimes have to be balanced to keep test data correct, something the IDE wouldn’t know how to enforce correctly.
> -Ben
>> On Dec 19, 2016, at 1:41 AM, David Sweeris via swift-evolution <swift-evolution at swift.org> wrote:
>>> On Dec 17, 2016, at 1:12 PM, Micah Hainline via swift-evolution <swift-evolution at swift.org> wrote:
>>> I'd love a fleshed out elegant example for URL that shows what a complete implementation of that special init method would look like. 
>> Sorry this took so long… the weekend kinda got away from me.
>> Anyway, I was thinking something like this (which has been very simplified on account of my regexing being sub-sketchy, and me not knowing exactly what’s valid in an URL anyway):
>> #literalpatterns += (name: “URLLiteralType”, components: (name: url, type: StringLiteralType, pattern: “(http|https)://(www.)?[a-z|A-Z|0-9]+.(com|org|net)(/[a-z|A-Z|0-9]+)*(/[a-z|A-Z|0-9]+.[a-z|A-Z|0-9]+)?”), protocol: ExpressibleByURLLiteral)
>> This would let the compiler know pretty much everything it needs to know… that the “new” type is called “URLLiteralType”, that it starts out life as a young StringLiteralType with a bright future in the computer industry, that in order to succeed it has to match a given pattern, and what protocol a type has to conform to in order to use a URLLiteral. In practice, the compiler would synthesize a struct containing the specified members and validate the literal with the specified pattern before making an “instance” of it (since we’re talking about literals and compile-time code here, I’m pretty sure that “instance” is the wrong terminology… pardon my ignorance):
>> struct URLLiteralType {
>>     let url: StringLiteralType
>> }
>> A tuple would be better, IMHO, but according to the playground, single-element tuples can’t have element labels. As for the implementation of the init function:
>> init(urlLiteral value: URLLiteralType) {
>>     let urlString = value.url
>>     //Do whatever URL is doing now, except there’s no need to check for errors since the compiler pre-validated it for us
>> }
>> If it’d be more useful, the pattern could be split into multiple pieces:
>> #literalpatterns += (name: "URLLiteralType",
>>                      components: ((name: "`protocol`", type: StringLiteralType, pattern: "(http|https)"),
>>                                   (name: _,            type: StringLiteralType, pattern: "://"),
>>                                   (name: "domain",     type: StringLiteralType, pattern: "(www.)?[a-z|A-Z|0-9]+.(com|org|net)"),
>>                                   (name: "path",       type: StringLiteralType, pattern: "(/[a-z|A-Z|0-9]+)*(/[a-z|A-Z|0-9]+.[a-z|A-Z|0-9]+)?")),
>>                      protocol: ExpressibleByURLLiteral)
>> This would result in URLLiteralType looking like this:
>> struct URLLiteralType {
>>     let `protocol`: StringLiteralType
>>     let domain: StringLiteralType
>>     let path: StringLiteralType
>> }
>> And the init would start out like this:
>> init(urlLiteral value: URLLiteralType) {
>>     let protocolType = value.protocol
>>     let domain = value.domain
>>     let path = value.path
>>     //Do whatever with the components
>> }
>> The “base” types of literals like Int or String that don’t refine pre-existing literal types would still need a bit of compiler magic (or at least a different mechanism for becoming actual types), but as long as a type doesn’t take advantage of reference semantics in its stored properties or something, I *think* pretty much any data type could become “literalizeable” with something like this. Oh, and there’s nothing particularly magical about regular expressions as far as this idea is concerned; they’re just usually the first thing that comes to mind when I think of pattern matching in a string. 
>> I know this looks like a lot of code, but the scary-looking parts with the regex stuff only have to be written once for each “type” of literal… types that want to be expressible by such a literal just have to write an init function.
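
The component-splitting idea in the quoted message can be approximated at run time with capture groups. This is only a sketch under assumptions: `URLLiteralComponents` is a hypothetical name, the pattern is a tidied-up and still heavily simplified variant of the one quoted above, and nothing here is checked by the compiler.

```swift
import Foundation

// Hypothetical run-time stand-in for the struct the compiler would synthesize.
struct URLLiteralComponents {
    let `protocol`: String
    let domain: String
    let path: String

    // Returns nil when the literal does not match the simplified pattern.
    init?(_ literal: String) {
        // Group 1: scheme, group 2: domain, group 3: path (possibly empty).
        let pattern = #"\A(http|https)://((?:www\.)?[a-zA-Z0-9]+\.(?:com|org|net))((?:/[a-zA-Z0-9.]+)*)\z"#
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }

        let whole = NSRange(location: 0, length: literal.utf16.count)
        guard let match = regex.firstMatch(in: literal, options: [], range: whole),
              let schemeRange = Range(match.range(at: 1), in: literal),
              let domainRange = Range(match.range(at: 2), in: literal),
              let pathRange = Range(match.range(at: 3), in: literal)
        else { return nil }

        self.`protocol` = String(literal[schemeRange])
        self.domain = String(literal[domainRange])
        self.path = String(literal[pathRange])
    }
}
```

An init like the one quoted above would then read `value.protocol`, `value.domain`, and `value.path` from such a value instead of re-parsing the string.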

It doesn’t have to be regex per se… instead of
#literalpatterns += (name: “URLLiteralType”, components: (name: url, type: StringLiteralType, pattern: “(http|https)://(www.)?[a-z|A-Z|0-9]+.(com|org|net)(/[a-z|A-Z|0-9]+)*(/[a-z|A-Z|0-9]+.[a-z|A-Z|0-9]+)?”), protocol: ExpressibleByURLLiteral)
I probably should’ve written something more like:
#literalpatterns += (name: “URLLiteralType”, components: (name: url, type: StringLiteralType, matching: Regex(“(http|https)://(www.)?[a-z|A-Z|0-9]+.(com|org|net)(/[a-z|A-Z|0-9]+)*(/[a-z|A-Z|0-9]+.[a-z|A-Z|0-9]+)?”)), protocol: ExpressibleByURLLiteral)
where the `matching` argument can be anything that can (“@purely”-ly) use some specified mechanism (I’d vote for the ~= operator) with a literal to test whether it matches. Also, there is no existing `Regex` struct/class/mechanism in Swift, unless you count `NSRegularExpression`. I didn’t want to use that for a couple of reasons… 1) I don’t think it’s part of the stdlib, and 2) it doesn’t have a non-failable init that just takes a string, so using it unmodified would kinda put us in an “it’s turtles all the way down” kind of situation. What I’d started doing was to look for the existing mechanism for specifying literals in the compiler so I could use the existing name for it (somehow I doubt there’s actually an array of patterns called “literalpatterns” in the compiler) and copy the existing methods for specifying a valid literal. After being unsuccessful for some amount of time, I decided I was getting too tired and made up what I sent last night.
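
As a rough sketch of the `matching` idea: assuming a hypothetical `LiteralPattern` wrapper around `NSRegularExpression` (it is not an existing Swift or Foundation type) and a simplified, tidied-up pattern, a `~=` overload could look something like the following. It runs at run time only, and the wrapper’s init is failable for exactly the reason given above.

```swift
import Foundation

// Hypothetical wrapper type; neither the stdlib nor Foundation provides it.
struct LiteralPattern {
    private let regex: NSRegularExpression

    // Failable, because NSRegularExpression's initializer throws on a bad pattern.
    init?(_ pattern: String) {
        // \A and \z anchor the pattern so it must cover the entire candidate.
        guard let regex = try? NSRegularExpression(pattern: "\\A(?:\(pattern))\\z") else {
            return nil
        }
        self.regex = regex
    }

    func matches(_ candidate: String) -> Bool {
        let whole = NSRange(location: 0, length: candidate.utf16.count)
        return regex.firstMatch(in: candidate, options: [], range: whole) != nil
    }
}

// Overloading ~= lets a LiteralPattern appear directly in a `case`.
func ~= (pattern: LiteralPattern, candidate: String) -> Bool {
    return pattern.matches(candidate)
}

// Simplified pattern for illustration only; real URLs are far more varied.
let urlPattern = LiteralPattern(
    #"(http|https)://(www\.)?[a-zA-Z0-9]+\.(com|org|net)(/[a-zA-Z0-9]+)*"#
)!

// Hypothetical helper showing the pattern used through `switch`.
func looksLikeURL(_ candidate: String) -> Bool {
    switch candidate {
    case urlPattern: return true
    default: return false
    }
}
```

Anything else that supplies a `~=` against `String` could be swapped in for `LiteralPattern` without touching `looksLikeURL`, which is the point of going through the operator rather than a regex type.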

The more I think about it, the more I’m starting to be of the opinion that we really ought to have two mechanisms here… One for specifying what constitutes a “base” literal (like `43`, `["foo", "bar"]`, or `true`), and one for types that merely need to perform some sort of validation on existing “base” literals. The first mechanism probably should be fairly arcane and involved, because you’d essentially be able to create new syntaxes, which should be kinda scary and hard to understand because it’s most certainly not an area beginners should be in. The second mechanism (something like that `ExpressibleByValidatedStringLiteral` idea) isn’t nearly as complicated. In the case of URLs, I’d vote for the second approach. We only really need two extra features to implement it (“@constexpr” and the compiler being able to use the REPL to evaluate @constexpr statements), and both of them have uses beyond just getting a few more compile-time checks or allowing for more inits to be non-failable. With both of those in place, getting a URL “literal” becomes just this:
protocol ExpressibleByValidatedStringLiteral {
    init?(stringLiteral value: StringLiteralType)
}
struct URL : ExpressibleByValidatedStringLiteral {
    init?(stringLiteral value: StringLiteralType) {
        //Perform validation here; return nil if it fails
    }
}
var lru: URL = "foo" //Compiler throws this to the init? function, it returns nil, the compiler raises a syntax error
var url: URL = "http://www.some.valid.url.com" //Compiler throws this to the init? function, it returns an optional URL, the compiler unwraps it and does the assignment
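
Without “@constexpr”, the closest thing that runs today is a run-time approximation. The sketch below is an assumption-laden stand-in, not the proposed feature: `CheckedURL` and `init?(validating:)` are hypothetical names, the validation (an http/https scheme plus a host) is deliberately minimal, and an invalid literal traps when the program runs instead of being rejected by the compiler.

```swift
import Foundation

// Hypothetical wrapper; stands in for a URL type adopting the proposed protocol.
struct CheckedURL: ExpressibleByStringLiteral {
    let url: URL

    // Plays the role of the proposed init?(stringLiteral:): nil means "invalid".
    init?(validating string: String) {
        guard let url = URL(string: string),
              url.scheme == "http" || url.scheme == "https",
              url.host != nil
        else { return nil }
        self.url = url
    }

    // The step the compiler would perform in the proposal, done at run time here:
    // call the failable init, unwrap on success, fail loudly otherwise.
    init(stringLiteral value: String) {
        guard let checked = CheckedURL(validating: value) else {
            preconditionFailure("Invalid URL literal: \(value)")
        }
        self = checked
    }
}

// Unwrapped and assigned, as in the `url` example above.
let checkedURL: CheckedURL = "http://www.some.valid.url.com"
```

Writing `let lru: CheckedURL = "foo"` would compile and then trap at run time, which is precisely the gap compile-time evaluation is meant to close.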

I still very much want a way to define custom literals (precisely because it’d let me make new syntaxes), but I’m starting to think that something like the second, disappointingly easy idea, is probably the way to go in this case.

- Dave Sweeris