[swift-evolution] [Proposal] Foundation Swift Archival & Serialization

Fri Mar 17 16:25:46 CDT 2017

> On Mar 17, 2017, at 2:42 PM, Itai Ferber <iferber at apple.com> wrote:
> 
> On 16 Mar 2017, at 14:29, Matthew Johnson wrote:
> 
> 
> This is a fantastic proposal! I am very much looking forward to robust Swift-native encoding and decoding in Foundation. The compiler-synthesized conformances are especially great! I want to thank everyone who worked on it. It is clear that a lot of work went into the proposal.
> 
> The proposal covers a lot of ground so I’m breaking my comments up by topic in the order they occur in the proposal.
> 
> Thanks for the feedback, Matthew! Responses inline.
> 
> 

And thank you for the responses!

> 
> Encode / Decode only types:
> 
> Brent raised the question of decode only types. Encode only types are also not uncommon when an API accepts an argument payload that gets serialized into the body of a request. The compiler synthesis feature in the proposal makes providing both encoding and decoding easy in common cases but this won’t always work as needed.
> 
> The obvious alternative is to have Decodable and Encodable protocols which Codable refines. This would allow us to omit a conformance we don’t need when it can’t be synthesized.
> 
> If conformances are still synthesized individually (i.e. for just Decodable or just Encodable), it would be way too easy to accidentally conform to one or the other and not realize that you’re not conforming to Codable, since the synthesis is invisible. You’d just be missing half of the protocol.
> 
This is not a mistake people tend to make often, and Swift’s type checking will alert them pretty quickly if they do.  A fix-it could even be offered when the type is declared in the same module where it is used incorrectly.  I really don’t think it’s that big a deal to expect people to understand the difference.  They already need to understand encoders and decoders to make use of these protocols, and this is just the other side of that distinction.
> If the way out of this is to only synthesize conformance to Codable, then it’s much harder to justify the inclusion of Encodable or Decodable since those would require a manual implementation and would much more rarely be used.
> 
I wouldn’t limit synthesis in that way.

This isn’t that big a deal given that synthesis will do the work for us most of the time but I think it’s unfortunate to see these coupled.  There will be times when we have to choose between fatalError and maintaining code we don’t need.  That’s a bad choice to have to make.  I don’t like designs that impose it on me.
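
As a rough sketch of the split, assuming heavily simplified stand-ins for the proposal's types (every name below, including `SketchEncoder`, `SketchDecoder`, `SearchRequest` and `SearchResponse`, is hypothetical and not part of the proposed API), a type could conform to just the direction it supports:

```swift
// Hypothetical stand-ins; the proposal's real Encoder/Decoder are far richer.
final class SketchEncoder { var storage: [String: String] = [:] }
struct SketchDecoder { let storage: [String: String] }

protocol SketchEncodable { func encode(to encoder: SketchEncoder) throws }
protocol SketchDecodable { init(from decoder: SketchDecoder) throws }

// The combined protocol simply refines both halves.
protocol SketchCodable: SketchEncodable, SketchDecodable {}

struct MissingValue: Error { let key: String }

// Encode-only: a request payload that is never read back.
struct SearchRequest: SketchEncodable {
    let query: String
    func encode(to encoder: SketchEncoder) throws {
        encoder.storage["query"] = query
    }
}

// Decode-only: a response that is never sent.
struct SearchResponse: SketchDecodable {
    let total: Int
    init(from decoder: SketchDecoder) throws {
        guard let raw = decoder.storage["total"], let total = Int(raw) else {
            throw MissingValue(key: "total")
        }
        self.total = total
    }
}
```

Neither type needs a `fatalError` stub for the direction it does not support, and passing `SearchRequest` where a decodable value is required fails at compile time.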

> 
> Your reply to Brent mentions using `fatalError` to avoid implementing the direction that isn't needed. I think it would be better if the conformance can reflect what is actually supported by the type. Requiring us to write `fatalError` as a stub for functionality we don’t need is a design problem IMO. I don’t think the extra protocols are really that big a burden. They don’t add any new functionality and are very easy to understand, especially considering the symmetry they would have with the other types you are introducing.
> 
> Coding Keys:
> 
> As others have mentioned, the design of this protocol does not require a value of a conforming type to actually be a valid key (it can return nil for both `intValue` and `stringValue`). This seems problematic to me.
> 
> In the reply to Brent again you mention throwing and `preconditionFailure` as a way to handle incompatible keys. This also seems problematic to me and feels like a design problem. If we really need to support more than one underlying key type and some encoders will reject some key types this information should be captured in the type system. An encoder would only vend a keyed container for keys it actually supports. Ideally the conformance of a type’s CodingKeys could be leveraged to produce a compiler error if an attempt was made to encode this type into an encoder that can’t support its keys. In general, the idea is to produce static errors as close to the origin of the programming mistake as possible.
> 
> I would very much prefer that we don’t defer to runtime assertions or thrown errors, etc for conditions that could be caught statically at compile time given the right design. Other comments have indicated that static guarantees are important to the design (encoders *must* guarantee support of primitives specified by the protocols, etc). Why is a static guarantee of compatible coding keys considered less important?
> 
> I agree that it would be nice to support this in a static way, but while not impossible to represent in the type system, it absolutely explodes the API into a ton of different types and protocols which are not dissimilar. We’ve considered this in the past (see approach #4 in the Alternatives Considered <https://github.com/itaiferber/swift-evolution/blob/637532e2abcbdb9861e424359bb6dac99dc6b638/proposals/XXXX-swift-archival-serialization.md#alternatives-considered> section) and moved away from it for a reason.
> 
> To summarize:
> 
> 1. To statically represent the difference between an encoder which supports string keys and one which supports integer keys, we would have to introduce two different protocol types (say, StringKeyEncoder and IntKeyEncoder)
> 2. Now that there are two different encoder types, the Codable protocol also needs to be split up into two: one version which encodes using a StringKeyEncoder and one version which encodes using an IntKeyEncoder. If you want to support encoding to an encoder which supports both types of keys, we’d need a third Codable protocol which takes something that’s StringKeyEncoder & IntKeyEncoder (because you cannot just conform to both StringCodable and IntCodable; it’s ambiguous when given something that’s StringKeyEncoder & IntKeyEncoder)
> 3. On encoders which support both string and integer keys, you need overloads for encode<T : StringCodable>(…), encode<T : IntCodable>(…), and encode<T : StringCodable & IntCodable>(…) or else the call is ambiguous
> 4. Repeat for both encode<T : …>(_ t: T?, forKey: String) and encode<T : …>(_ t: T?, forKey: Int)
> 5. Repeat for decoders, with all of their overloads as well
> This is not to mention the issue of things which are single-value encodable, which adds additional complexity. Overall, the complexity of this makes it unapproachable and confusing as API, and is a hassle for both consumers and for Encoder/Decoder writers.
> 
> We are willing to make the runtime failure tradeoff for keeping the rest of the API consumable with the understanding that we expect that the vast majority of CodingKey conformances will be automatically generated, and that type authors will generally provide key types which are appropriate for the formats they expect to encode their own types in.
> 
What if we go in a different direction here?  Instead of distinguishing the two statically, why not at least require all keys to support strings?  This is the common use case.  As you have already noted elsewhere, people using Int keys are already kind of in “expert” territory.  I would feel a lot better if at least the common use case was statically safe.  Is it really that important to support Int-only keys?  A type that is primarily intended to be used with Int keys could just stringify its Int values to provide strings.  You could even add a new protocol providing default implementations for coding key types which want to be implemented in terms of Int:

// Every key must provide a string; an integer value remains optional.
public protocol CodingKey {
    var stringValue: String { get }
    init?(stringValue: String)
    var intValue: Int? { get }
    init?(intValue: Int)
}

// Opt-in refinement for key types that are defined in terms of Int.
protocol IntCodingKey: CodingKey {
    var guaranteedIntValue: Int { get }
}

// Default implementations derive the string members from the Int value.
extension IntCodingKey {
    var intValue: Int? { return guaranteedIntValue }
    var stringValue: String { return "\(guaranteedIntValue)" }
    init?(stringValue: String) {
        if let int = Int(stringValue) {
            self.init(intValue: int)
        } else {
            return nil
        }
    }
}

IntCodingKey would not need to appear anywhere in the APIs; it could exist solely to provide the default implementations of the string members.  Libraries that wish to discover that *all* instances of a given CodingKey type have Int values could cast to IntCodingKey to determine that.
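
A minimal usage sketch of that idea follows. The two protocols are redeclared so the snippet stands alone; the key types `FieldKey` and `LabelKey` and the helper `guaranteedInt(of:)` are hypothetical names introduced only for illustration.

```swift
// Redeclared locally so the snippet is self-contained.
protocol CodingKey {
    var stringValue: String { get }
    init?(stringValue: String)
    var intValue: Int? { get }
    init?(intValue: Int)
}

protocol IntCodingKey: CodingKey {
    var guaranteedIntValue: Int { get }
}

extension IntCodingKey {
    var intValue: Int? { return guaranteedIntValue }
    var stringValue: String { return "\(guaranteedIntValue)" }
    init?(stringValue: String) {
        guard let int = Int(stringValue) else { return nil }
        self.init(intValue: int)
    }
}

// Hypothetical Int-backed key type: only the Int members are written by hand.
enum FieldKey: Int, IntCodingKey {
    case id = 1, name = 2
    var guaranteedIntValue: Int { return rawValue }
    init?(intValue: Int) { self.init(rawValue: intValue) }
}

// Hypothetical string-only key type: it never has an Int value.
enum LabelKey: String, CodingKey {
    case title
    var stringValue: String { return rawValue }
    init?(stringValue: String) { self.init(rawValue: stringValue) }
    var intValue: Int? { return nil }
    init?(intValue: Int) { return nil }
}

// A library can discover Int-guaranteed keys with a cast.
func guaranteedInt(of key: CodingKey) -> Int? {
    return (key as? IntCodingKey)?.guaranteedIntValue
}
```

With this shape, `FieldKey.name.stringValue` comes from the default implementation, while `guaranteedInt(of:)` returns nil for any key type that did not opt in to `IntCodingKey`.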

> 
> Keyed Containers:
> 
> Joe raised the alternative of using manual type erasure for the keyed containers rather than abstract classes. Did you explore this direction at all? It feels like a more natural approach for Swift and as Joe noted, it can be designed in such a way that eases migration to existentials in the future if that is the “ideal” design (which you mentioned in your response).
> 
> Joe mentions the type-erased types as an example of where we’ve had to use those because we’re lacking other features — I don’t see how type erasure would be the solution. We’re doing the opposite of type-erasure: we’re trying to offer an abstract type that is generic and specified on a type you give it. The base KeyedEncodingContainer is effectively a type-erased base type, but it’s the generics that we really need.
> 
You’re trying to erase some type information while preserving the key type.  There are several ways to accomplish this.  I don’t understand the specific problem you’re running into in writing a non-class type that does this.  If there is a technical limitation around this I am interested in learning what it is.  But maybe it’s possible to use a different approach to these types.  I would appreciate it if you can elaborate specific technical details behind the belief that a struct like AnyCollection / AnySequence is not viable.
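
To make the question concrete, here is a minimal sketch of the closure-based erasure pattern used by wrappers like AnySequence, applied to a keyed container. It assumes a drastically reduced container protocol with a single `encode` method; `KeyedContainerSketch`, `AnyKeyedContainer` and `DictionaryContainer` are hypothetical names, not types from the proposal.

```swift
// Hypothetical, heavily reduced container protocol.
protocol KeyedContainerSketch {
    associatedtype Key: Hashable
    func encode(_ value: String, forKey key: Key)
}

// Erases the concrete container type while keeping Key in the type.
struct AnyKeyedContainer<Key: Hashable>: KeyedContainerSketch {
    private let encodeImpl: (String, Key) -> Void

    init<Base: KeyedContainerSketch>(_ base: Base) where Base.Key == Key {
        // The closure captures the concrete container and forwards to it.
        encodeImpl = { value, key in base.encode(value, forKey: key) }
    }

    func encode(_ value: String, forKey key: Key) {
        encodeImpl(value, key)
    }
}

// One possible concrete container, backed by a dictionary.
final class DictionaryContainer<Key: Hashable>: KeyedContainerSketch {
    private(set) var storage: [Key: String] = [:]
    func encode(_ value: String, forKey key: Key) { storage[key] = value }
}
```

The wrapper is a struct, the key type stays generic, and the concrete container type does not appear in the wrapper's type. A real container has many more requirements, so this only shows the shape of the approach.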

> 
> Decoding Containers:
> 
> returns: A value of the requested type, if present for the given key and convertible to the requested type.
> 
> Can you elaborate on what “convertible to the requested type” means? I think this is an important detail for the proposal.
> 
> For example, I would expect numeric values to attempt conversion using the SE-0080 failable numeric conversion initializers (decoding JSON was a primary motivating use case for that proposal). If the requested type conforms to RawRepresentable and the encoded value can be converted to RawValue (perhaps using a failable numeric initializer) I would expect the raw value initializer to be used to attempt conversion. If Swift ever gained a standard mechanism for generalized value conversion I would also expect that to be used if a conversion is available from the encoded type to the requested type.
> 
> If either of those conversions fails I would expect something like an “invalid value” error or a “conversion failure” error rather than a “type mismatch” error. The types don’t exactly mismatch, we just have a failable conversion process that did not succeed.
> 
> Keep in mind that no type information is written into the payload, so the interpretation of this is up to the Encoder and its format.
> JSON, for instance, has no delineation between number types. For {"value": 1}, you should be able to decode(…, forKey: .value) the value through any one of the numeric types, since 1 is representable by any of them. However, requesting it as a String should throw a .coderTypeMismatch.
> 
It sounds like the behavior we’ll get for JSON numbers is what I would expect.  But what about encoders that do keep track of the type of a numeric value during serialization?  What behavior is expected by them?  

Protocols are about semantics, not just syntax.  “A value of the requested type, if present for the given key and convertible to the requested type” is a very vague semantic statement.  Should we define more precisely which conversions are valid and which are not?  Or do you think it is important to leave this up to individual decoders to decide?  Mandating failable numeric conversions would make it easier to change a numeric property’s type without necessarily breaking archived data (especially when promoting to a larger numeric type).
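
One way such semantics could look, as a sketch only: a decoder that remembers a stored integer converts it with the failable `init?(exactly:)` initializer and reports a failed conversion separately from a genuine type mismatch. `StoredValue`, `SketchDecodingError` and `decodeInteger` are hypothetical names; the proposal does not define them.

```swift
// Hypothetical error cases distinguishing the two kinds of failure.
enum SketchDecodingError: Error, Equatable {
    case typeMismatch        // the payload holds a different kind of value
    case conversionFailure   // right kind of value, but it does not fit
}

// Hypothetical payload representation that remembers what was stored.
enum StoredValue {
    case integer(Int64)
    case string(String)
}

func decodeInteger<T: BinaryInteger>(_ type: T.Type, from stored: StoredValue) throws -> T {
    guard case .integer(let raw) = stored else {
        throw SketchDecodingError.typeMismatch
    }
    // init?(exactly:) returns nil when the value is not representable in T.
    guard let converted = T(exactly: raw) else {
        throw SketchDecodingError.conversionFailure
    }
    return converted
}
```

Under this rule a value archived as Int8 still decodes after the property is widened to Int, while asking for 300 as a UInt8 fails with a conversion error rather than a type mismatch.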

You didn’t answer my question about decoding RawRepresentable types where the payload is RawValue.  It looks like that won’t happen and instead the RawRepresentable type would need to conform to Codable, which would usually be synthesized to store a single value.  That seems reasonable.

> If you try to ask for 3.14 as an Int, I think it’s valid to get a .coderTypeMismatch — you asked for something of the wrong type altogether. I don’t see much value in providing a different error type to represent the same thing.
> 
> 
> Context:
> 
> I’m glad Brent raised the topic of supporting multiple formats. In his example the difference was remote and local formats. I’ve also seen cases where the same API requires the same model to be encoded differently in different endpoints. This is awful, but it also happens sometimes in the wild. Supporting an application specified encoding context would be very useful for handling these situations (the `codingKeyContext` would not always be sufficient and would usually not be a good way to determine the required encoding or decoding format).
> 
> A `[UserInfoKey: Any]` context was mentioned as a possibility. This would be better than nothing, but why not use the type system to preserve information about the type of the context? I have a slightly different approach in mind. Why not just have a protocol that refines Codable with context awareness?
> 
> public protocol ContextAwareCodable: Codable {
>     associatedtype Context
>     init(from decoder: Decoder, with context: Context?) throws
>     func encode(to encoder: Encoder, with context: Context?) throws
> }
> extension ContextAwareCodable {
>     init(from decoder: Decoder) throws {
>         try self.init(from: decoder, with: nil)
>     }
>     func encode(to encoder: Encoder) throws {
>         try self.encode(to: encoder, with: nil)
>     }
> }
> 
> There are a few issues with this:
> 
> For an Encoder to be able to call encode(to:with:) with the Context type, it would have to know the Context type statically. That means it would have to be generic on the Context type (we’ve looked at this in the past with regards to the encoder declaring the type of key it’s willing to accept)
> 
This is not true.  It could expose a generic top-level method and capture the type of Context there in the internal type used to perform the actual encoding.

> It makes more sense for the Encoder to define what context type it vends, rather than have Codable things discriminate on what they can accept. If you have two types in a payload which require mutually-exclusive context types, you cannot encode the payload at all
> associatedtype requirements cannot be overridden by subclasses. That means if you subclass a ContextAwareCodable type, you cannot require a different context type than your parent, which may be a no-go. This, by the way, is why we don’t have an official associatedtype CodingKeys : CodingKey requirement on Codable
> 
I based my suggestion on real world use cases I have encountered.  In these use cases the context would usually be an enum that all types involved in any given (de)serialization share.

There is a tradeoff here - you get increased type safety but, as you point out, you can’t set up a context where some types involved in the (de)serialization know about one part of the context and others know about a different part of the context.  In my experience the type safety would provide a lot more benefit than the additional flexibility.  IMO we need to stop using Any payload dictionaries when we can avoid them.  Does anyone actually have specific real world use cases where the statically typed approach wouldn’t work?
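
The following is a sketch, under simplifying assumptions, of how a generic top-level method can capture the context type in an internal generic encoder while the encoder protocol itself stays non-generic. Everything here is hypothetical (`ContextSketchEncoder`, `PlainEncodable`, `ContextAwareEncodable`, `encodeTopLevel`, `Endpoint`, `User`, `Envelope`), and the "format" is just a list of strings.

```swift
protocol PlainEncodable {
    func encode(to encoder: ContextSketchEncoder)
}

protocol ContextAwareEncodable: PlainEncodable {
    associatedtype Context
    func encode(to encoder: ContextSketchEncoder, with context: Context?)
}

extension ContextAwareEncodable {
    // Falls back to a nil context when called through the plain path.
    func encode(to encoder: ContextSketchEncoder) {
        self.encode(to: encoder, with: nil)
    }
}

// The encoder protocol itself is not generic over the context.
protocol ContextSketchEncoder: AnyObject {
    func write(_ line: String)
    func context<C>(as type: C.Type) -> C?
}

extension ContextSketchEncoder {
    func encode<Value: PlainEncodable>(_ value: Value) {
        value.encode(to: self)
    }
    // Preferred overload when the static type is context aware.
    func encode<Value: ContextAwareEncodable>(_ value: Value) {
        value.encode(to: self, with: context(as: Value.Context.self))
    }
}

// Internal encoder: this is where the Context type is captured.
final class ContextEncoder<Context>: ContextSketchEncoder {
    let storedContext: Context
    var lines: [String] = []
    init(context: Context) { storedContext = context }
    func write(_ line: String) { lines.append(line) }
    // nil when the value expects a different context type.
    func context<C>(as type: C.Type) -> C? { return storedContext as? C }
}

// Generic top-level entry point.
func encodeTopLevel<Value: PlainEncodable, Context>(_ value: Value, with context: Context) -> [String] {
    let encoder = ContextEncoder(context: context)
    encoder.encode(value)
    return encoder.lines
}

enum Endpoint { case list, detail }

struct User: ContextAwareEncodable {
    let name: String
    func encode(to encoder: ContextSketchEncoder, with context: Endpoint?) {
        encoder.write(context == Endpoint.detail ? "user.fullName=\(name)" : "user.name=\(name)")
    }
}

struct Envelope: PlainEncodable {
    let user: User
    func encode(to encoder: ContextSketchEncoder) {
        encoder.write("envelope")
        encoder.encode(user)  // statically selects the context-aware overload
    }
}
```

In this sketch `User` receives the `Endpoint` value only when the top-level call supplied one; with any other context type it sees nil and uses its default encoding. Note that the top-level value itself goes through the plain overload here, so only nested context-aware values receive the context.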

> 
> Encoders and Decoders would be encouraged to support a top level encode / decode method which is generic and takes an application supplied context. When the context is provided it would be given to all `ContextAwareCodable` types that are aware of this context type during encoding or decoding. The coding container protocols would include an overload for `ContextAwareCodable` allowing the container to know whether the Value understands the context given to the top level encoder / decoder:
> 
> open func encode<Value : ContextAwareCodable>(_ value: Value?, forKey key: Key) throws
> 
> A common top level signature for coders and decoders would look something like this:
> 
> open func encode<Value : Codable, Context>(_ value: Value, with context: Context) throws -> Data
> 
> This approach would preserve type information about the encoding / decoding context. It falls back to the basic Codable implementation when a Value doesn’t know about the current context. The default implementation simply translates this to a nil context allowing ContextAwareCodable types to have a single implementation of the initializer and the encoding method that is used whether they are able to understand the current context or not.
> 
> I amend the above — this means that if you have two types which require different contexts in the same payload, only one of them will get the context and the other silently will not. I’m not sure this is better.
> 
Like I said, there is a tradeoff involved here.  A type can only choose one kind of context to care about, but then we get static type safety.  If that context isn’t available, the type does not need to know about it.  Careful design using existentials for context types would allow users to gradually reduce type safety if necessary.  If a type really wants to know about all contexts it would simply use Any as its context type.  This gives us more type safety when we want it without losing the ability to access arbitrary context data when we want to do that.
> A slightly more type-erased context type would allow all members to look at the context if desired without having to split the protocol, require multiple different types and implementations, etc.
> 
I don’t quite follow you here.  Can you elaborate?

> 
> - Matthew
> 
