[swift-evolution] [swift-evolution-announce] [Returned for revision] SE-0089: Renaming String.init<T>(_: T)

Sun Jun 5 04:20:54 CDT 2016

> On 5 Jun 2016, at 4:31 PM, Brent Royal-Gordon <brent at architechies.com> wrote:
> 
> Sorry, I meant to reply to this but forgot.

No worries Brent! Thanks for the thoughtful reply.

> 
>> For developers, like ourselves, it seems straightforward that a string is this simple primitive. We get them in, we process them, and we spit them back out. However, this is a flawed system, as it is one that is made easiest for the programmer, and is really designed for a context where the user is also a programmer. It best suits technical scenarios such as configuration files, environment variables, and command line arguments, as Brent suggests. However, I don’t think this is the best case to design for.
> 
> It is *a* case we need to design for. It is also the *base* case: localization is layered on top of non-localized constructs.
> 
>>> So, here's my version of your table:
>>> 
>>> User-readable, nonlocalized: CustomStringConvertible
>>> User- and machine-readable, nonlocalized: LosslessStringConvertible
>>> User-readable, localized: (nothing)
>>> Developer-readable: CustomDebugStringConvertible
>>> 
>>> (Playground isn't necessarily working with strings, so it doesn't belong in this list.)
>> 
>> The first item in your table ‘User-readable, non-localised’, is the big problem area to me. Ideally in my mind all of these should be moved to other areas, such as the second area that LosslessStringConvertible occupies, which command line arguments and configuration keys certainly could. And user-readable should use a system that always allows localisation to be added progressively, by use of type extensions or protocols.
>> 
>> In a UI application, everything that is displayed should be using a system which allows localisation.
> 
> In theory, yes. In practice? We write code that will be used only once (like an ad-hoc fix for some problem). We write code that will never be exposed to users (like a background process). We write code that is kept in one limited environment (like a company internal app). We write code that simply isn't going to be localized for business reasons (like an app that wouldn't be profitable to translate).
> 
> We write code that constructs strings without caring what their contents are (think of a Markdown converter). We write code that emits strings which are primarily for machines, but formatted to be convenient for humans—particularly human programmers working with the formats—to understand (think of a reporting tool that emits CSV with column headings in English). We write code where we know everyone will understand a certain language (air traffic control is conducted entirely in English worldwide). We write code that's too low-level to be localized. We write unit tests (hopefully).
> 
> And we write code when we're just learning how to program, and printing the result of 1 + 2 in French is the last thing on our minds.
> 
> So yes, in a meticulously-engineered ideal application, you would have little call for "user-readable, nonlocalized". But that's not what people write a lot of the time.

Strings are this very flexible type. Currently the only validations I know of that the String type does are conformance to the various Unicode encodings.

I think it’s similar to pointer safety. A pointer in C can point to anything. The programmer might be sure it’s valid, but the computer isn’t until it dereferences it. It could be null (crash), or it could be pointing to an already deallocated object or something else entirely (worse than a crash). Swift tries to save us from making these mistakes. Similarly, the programmer could be 100% sure her cast will succeed, that this object is of a certain class or conforms to a certain protocol. Swift makes these casts safe, either by crashing immediately or by letting the programmer decide what to do with nil, or it avoids them altogether through generics.
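As a rough sketch of what I mean by safe casts (the `Animal` and `Dog` types are made up purely for illustration):

```swift
// Hypothetical types, only here to demonstrate casting.
class Animal {}
class Dog: Animal {
  func bark() -> String { return "Woof" }
}

let pet: Animal = Dog()

// Conditional cast: the result is an Optional, so a mismatch
// becomes nil rather than undefined behaviour.
if let dog = pet as? Dog {
  print(dog.bark())
}

// A cast that cannot succeed yields nil, and the programmer decides what to do.
let notADog = Animal() as? Dog
print(notADog == nil)

// Generics avoid the cast entirely: the element type is known statically.
func firstElement<T>(of items: [T]) -> T? {
  return items.first
}
let firstDog = firstElement(of: [Dog()])
```

A forced cast (`as!`) would be the "crash immediately" option; I left it out of the sketch since it would simply trap.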

A String in Swift can mean anything. It could be empty, it could be 7000 characters long, it could be formatted incorrectly or contain illegal characters. `String` says as much to me as `id` does in Objective-C. It’s up to me to decide what the meaning is and whether it’s valid yet or not.

I wonder if most data-facing strings could use a string-represented enum or struct instead?

e.g.

enum GitCommand : String {
  case clone = "clone"
  case `init` = "init" // `init` is a keyword, so it needs backticks
  case add = "add"
  case mv = "mv"
  ...
}

which can be conveniently shortened to:

enum GitCommand : String {
  case clone, `init`, add, mv, ...
}

An initialized GitCommand value can only be valid, which leads to clearer and safer code.
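A minimal sketch of how that plays out at the boundary where raw input comes in. It assumes only the four commands named above, and the `parse` helper is hypothetical:

```swift
enum GitCommand: String {
  // Only the commands named above; `init` needs backticks because it is a keyword.
  case clone, `init`, add, mv
}

// Hypothetical helper: turn an untrusted argument into a GitCommand.
// The synthesized failable initializer returns nil for anything unknown.
func parse(_ argument: String) -> GitCommand? {
  return GitCommand(rawValue: argument)
}

if let command = parse("init") {
  // From here on the value is known to be a valid command.
  print(command.rawValue)
}

// A misspelt command never becomes a GitCommand value.
print(parse("comit") == nil)
```

The validation happens once, at the point of conversion, and everything downstream works with a type that cannot hold an invalid command.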

Loose string identifiers such as CSV column headings could use a struct that conforms to RawRepresentable / LosslessStringConvertible. The failable initializer could trim whitespace, validate, and generally normalise the input into an ideal form. There are no `isValidated` flags, and none of the assumptions that come with using a naked String: if you have a CSVHeading value in hand, you know that it is valid:

import Foundation

struct CSVHeading : RawRepresentable {
  typealias RawValue = String

  var rawValue: String

  init?(rawValue: String) {
    // Normalise the input by trimming surrounding whitespace.
    let trimmed = rawValue.trimmingCharacters(in: .whitespaces)

    // Reject anything that contains illegal characters.
    guard trimmed.rangeOfCharacter(from: .illegalCharacters) == nil else {
      return nil
    }

    // More validations here as per https://tools.ietf.org/html/rfc4180

    self.rawValue = trimmed
  }
}

(This raises a point: what’s the difference between the proposed LosslessStringConvertible and RawRepresentable where RawValue == String? They both have a failable init. Are current limitations with typealiases what make it hard to unify them?)

Swift makes this so easy compared to Objective-C, where you would have worried about the overhead of allocating a wrapper object. In Swift, as I understand it, a struct with a single String member should be of similar weight in memory and performance to using that String by itself. A whole bunch of them in a typed Collection would take up the same amount of memory?
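One way to check the memory side of this, as a sketch rather than a benchmark. It assumes a Swift version that provides `MemoryLayout`, and the `Heading` wrapper is a hypothetical stand-in for any single-String struct:

```swift
// Hypothetical wrapper with a single String member.
struct Heading {
  var rawValue: String
}

// Compare the in-memory footprint of the wrapper with a bare String.
let sameSize = MemoryLayout<Heading>.size == MemoryLayout<String>.size

// Stride is what matters for elements stored contiguously in an Array.
let sameStride = MemoryLayout<Heading>.stride == MemoryLayout<String>.stride

print(sameSize, sameStride)
```

This only speaks to layout; it says nothing about the cost of the validation done in a failable initializer.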

Note that the String type would still be used for situations such as parsing and formatting. But I don’t think it needs to be used everywhere that something better constructed is available. And formatting could have a whole range of interesting designs too.

> 
> To be clear: If there is a *low-cost* way to make sure that UI text is localizable by default, I'm all for it. (And I even have an idea or two in that area.) But I don't think bringing localization into the standard library is how you make it low-cost. Remember, Foundation can always add localization to any standard library type it wants through extensions.

I would love to find a low-cost way because I think Swift opens many opportunities to enable it, has an amazing team of library designers in the Swift standard library and from the Cocoa frameworks, and we have a chance here with a blank canvas to raise the bar like with Swift’s Unicode support. Glad to hear you have some ideas — look forward to hearing them!

> 
>> I would argue a command line tool is also a UI application.
> 
> 
> Sure, but see the above. (Plus, command line tools *do* have a stronger legitimate need for non-localized stuff—think of things like command-line switches and environment variables, communicating over filehandles and pipes, "text" that's actually UI like twirlers and progress bars, etc.)
> 
>>> Localization is an obvious hole in our string conversions, but I think the reality here is that localization is part of a higher layer than the standard library. From what I can see, all of the "standard library" APIs which handle localization are actually part of Foundation. I'm sure that, if we build any localization-related features into the language, we'll add basic supporting code to the standard library if needed, but other than that, I don't think the standard library is the right place.
>> 
>> I believe best practices can be put in place with a system no more complicated for the programmer than the one we have now. This could be possible with protocols: core protocols in the standard library that are then fleshed out in a Foundation-level framework above, with Locale / CultureCode / etc types extending or conforming.
> 
> I'm not sure what the purpose would be of having a protocol in the standard library which didn't offer even a lick of the promised functionality without a higher-level framework. What do we gain by having `localizedDescription` in the standard library if nothing written against only the standard library can actually emit a localized description?

I had tried to design something as I was writing the email. I wasn’t thinking of a `localizedDescription` method (which would rely on global state, an issue with Foundation’s current design), but of a context that is used generically or as a type to customise string conversion. Here’s one design idea, but I’m sure there are many others possible:

enum Fruit : String {
  case raspberry, guava, passionFruit
}

// `StringDisplayable`, `Swift.PrintDisplay` and `Foundation.CultureCode.EnglishUS` are hypothetical.
extension Fruit : StringDisplayable {
  // An extension with `Self : RawRepresentable where RawValue == String` could add this by default one day.
  func toDisplayString(context: Swift.PrintDisplay) -> String {
    return rawValue
  }

  func toDisplayString(context: Foundation.CultureCode.EnglishUS) -> String {
    switch self {
    case .raspberry: return "Raspberry"
    case .guava: return "Guava"
    case .passionFruit: return "Passion Fruit"
    }
  }
}
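To show what the call sites might look like, here is a self-contained variation of the same idea. The `PrintDisplay` and `EnglishUS` context types are hypothetical stand-ins I have declared locally; nothing like them exists in the standard library or Foundation:

```swift
// Hypothetical context types, declared locally so the sketch compiles on its own.
struct PrintDisplay {}  // a "technical", non-localised context
struct EnglishUS {}     // a locale context

enum Fruit: String {
  case raspberry, guava, passionFruit
}

extension Fruit {
  // Technical context: fall back to the raw value.
  func toDisplayString(context: PrintDisplay) -> String {
    return rawValue
  }

  // Locale context: a hand-written, user-facing name.
  func toDisplayString(context: EnglishUS) -> String {
    switch self {
    case .raspberry: return "Raspberry"
    case .guava: return "Guava"
    case .passionFruit: return "Passion Fruit"
    }
  }
}

let fruit = Fruit.passionFruit
print(fruit.toDisplayString(context: PrintDisplay())) // prints "passionFruit"
print(fruit.toDisplayString(context: EnglishUS()))    // prints "Passion Fruit"
```

The context is passed explicitly rather than read from global state, and the overload is chosen at compile time, so asking for a context a type does not support is a compile error instead of a silent fallback.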

> 
>>>> I’m not sure if anyone else shares the concern, so I’ll leave it. I do believe it’s important however.
>>> 
>>> I do think this is an important concern, and I also think it's important to ask how interpolation interacts with it. For instance, I think it would be very useful to be able to say "interpolate developer representations" or "interpolate user representations" or "interpolate localized user representations", and have the compiler reject interpolated expressions which don't have the necessary representation.
>> 
>> I like this idea. I think “interpolate localised user representations” should not be distinct from “interpolate user representations”. Instead non-localised is specifically denoted as ‘technical’ or perhaps ‘en-US’. Locales, or more broadly ‘contexts’, are not something additional, instead, everything already has a context, and the context of a string could be made more explicit.
> 
> I mean, you can call it "non-localized" or you can call it "technical", but a rose by any other name smells just as sweet.

Non-localised can mean ‘my language’, ‘US English’, ‘we haven’t localised this yet but might in the future’, or ‘a domain-specific key word or phrase’. Technical means just ‘a domain-specific key word or phrase’, and could have the additional properties of losslessness or robustness (conservative in what you send, liberal in what you accept).

> 
> -- 
> Brent Royal-Gordon
> Architechies
>