[swift-evolution] [Proposal] Foundation Swift Archival & Serialization

Matthew Johnson matthew at anandabits.com
Fri Mar 17 13:27:33 CDT 2017


> On Mar 17, 2017, at 1:15 PM, Itai Ferber via swift-evolution <swift-evolution at swift.org> wrote:
> 
> On 15 Mar 2017, at 22:58, Zach Waldowski wrote:
> 
> 
> Another issue of scale - I had to switch to a native mail client as replying inline severely broke my webmail client. ;-)
> 
> Again, lots of love here. Responses inline.
> On Mar 15, 2017, at 6:40 PM, Itai Ferber via swift-evolution <swift-evolution at swift.org> wrote:
> Proposed solution
> We will be introducing the following new types:
> 
> protocol Codable: Adopted by types to opt into archival. Conformance may be automatically derived in cases where all properties are also Codable.
> 
> FWIW I think this is an acceptable compromise. If the happy path is derived conformances, only-decodable or only-encodable types feel like a lazy way out on the part of a user of the API, and build a barrier to proper testing.
> 
> [snip]
> 
> Structured types (i.e. types which encode as a collection of properties) encode and decode their properties in a keyed manner. Keys may be String-convertible or Int-convertible (or both), and user types which have properties should declare semantic key enums which map keys to their properties. Keys must conform to the CodingKey protocol:
> public protocol CodingKey { <##snip##> }
> 
> A few things here:
> 
> The protocol leaves open the possibility of having both a String and an Int representation, or neither. What should a coder do in either case? Are the representations intended to be mutually exclusive, or not? The protocol design doesn’t seem to particularly match the flavor of Swift; I’d expect something along the lines of a CodingKey enum and the protocol CodingKeyRepresentable. It’s also possible that the concerns of the two are orthogonal enough that they deserve separate container(keyedBy:) requirements.
> 
> The general answer to "what should a coder do" is "what is appropriate for its format". For a format that uses exclusively string keys (like JSON), the string representation (if present on a key) will always be used. If the key has no string representation but does have an integer representation, the encoder may choose to stringify the integer. If the key has neither, it is appropriate for the Encoder to fail in some way.
> 
> On the flip side, for totally flat formats, an Encoder may choose to ignore keys altogether, in which case it doesn’t really matter. The choice is up to the Encoder and its format.
> 
> The string and integer representations are not meant to be mutually exclusive at all, and in fact, where relevant, we encourage providing both types of representations for flexibility.
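> 
> As a sketch of that policy for a string-keyed format (the helper name and error type here are invented for illustration, and this assumes the snipped CodingKey requirements above: an optional stringValue and intValue plus matching failable initializers):
> 
> struct UnencodableKeyError : Error {}
> 
> // Inside a hypothetical string-keyed encoder:
> func storageKey(for key: CodingKey) throws -> String {
>     if let string = key.stringValue {
>         return string               // a string-keyed format prefers the string representation
>     } else if let int = key.intValue {
>         return String(int)          // fall back to stringifying the integer representation
>     } else {
>         throw UnencodableKeyError() // neither representation: fail in a format-appropriate way
>     }
> }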
> 
> As for the possibility of having neither representation, this question comes up often. I’d like to summarize the thought process here by quoting some earlier review (apologies for the poor formatting from my mail client):
> 
> 
> If there are two options, each of which is itself optional, we have 4 possible combinations. But! At the same time we prohibit one combination by what? Runtime error? Why not use a 3-case enum for it? Even further down the rabbit hole there might be a CodingKey<> specialized for a concrete combination, like CodingKey<StringAndIntKey> or just CodingKey<StringKey>, but I’m not sure whether our type system will make it useful or possible…
> 
> public enum CodingKeyValue {
>     case integer(value: Int)
>     case string(value: String)
>     case both(intValue: Int, stringValue: String)
> }
> 
> public protocol CodingKey {
>     init?(value: CodingKeyValue)
>     var value: CodingKeyValue { get }
> }
> 
> I agree that this certainly feels suboptimal. We’ve certainly explored other possibilities before sticking to this one, so let me try to summarize here:
> 
> * Having a concrete 3-case CodingKey enum would preclude the possibility of having neither a stringValue nor an intValue. However, there is a lot of value in having the key types belong to the type being encoded (more safety, impossible to accidentally mix key types, private keys, etc.); if the CodingKey type itself is an enum (which cannot be inherited from), then this prevents differing key types.
> * Your solution as presented is better: CodingKey itself is still a protocol, and the value itself is the 3-case enum. However, since CodingKeyValue is not literal-representable, user keys cannot be enums RawRepresentable by CodingKeyValue. That means that the values must either be dynamically returned, or (for attaining the benefits that we want to give users — easy representation, autocompletion, etc.) the type has to be a struct with static lets on it giving the CodingKeyValues. This certainly works, but is likely not what a developer would have in mind when working with the API; the power of enums in Swift makes them very easy to reach for, and I’m thinking most users would expect their keys to be enums. We’d like to leverage that where we can, especially since RawRepresentable enums are appropriate in the vast majority of use cases.
> * Three separate CodingKey protocols (one for Strings, one for Ints, and one for both). You could argue that this is the most correct version, since it most clearly represents what we’re looking for. However, this means that every method now accepting a CodingKey must be converted into 3 overloads each accepting different types. This explodes the API surface, is confusing for users, and also makes it impossible to use CodingKey as an existential (unless it’s an empty 4th protocol which makes no static guarantees and which the others inherit from).
> * [The current] approach. On the one hand, this allows for the accidental representation of a key with neither a stringValue nor an intValue. On the other, we want to make it really easy to use autogenerated keys, or autogenerated key implementations if you provide the cases and values yourself. The nil value possibility is only a concern when writing stringValue and intValue yourself, which the vast majority of users should not have to do.
> * Additionally, a key word in that sentence above is “generally”. As part of making this API more generalized, we push a lot of decisions to Encoders and Decoders. For many formats, it’s true that having a key with no value is an error, but this is not necessarily true for all formats; for a linear, non-keyed format, it is entirely reasonable to ignore the keys in the first place, or replace them with fixed-format values. The decision of how to handle this case is left up to Encoders and Decoders; for most formats (and for our implementations), this is certainly an error, and we would likely document this and either throw or preconditionFailure. But this is not always the case.
> * In terms of syntax, there’s another approach that would be really nice (but is currently not feasible) — if enums were RawRepresentable in terms of tuples, it would be possible to give implementations for String, Int, (Int, String), (String, Int), etc., making this condition harder to represent by default unless you really mean to.
> 
> Hope that gives some helpful background on this decision. FWIW, the only way to end up with a key having no intValue or stringValue is manually implementing the CodingKey protocol (which should be exceedingly rare) and implementing the methods without switching on self, or in some other way that would let you forget to give a key either value.
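> 
> For illustration (again assuming the snipped CodingKey requirements above), a hand-written key whose members switch exhaustively on self can never end up with neither value:
> 
> enum EntryKeys : CodingKey {
>     case title
>     case body
> 
>     // Every switch below covers all cases, so both representations are always available.
>     var stringValue: String? {
>         switch self {
>         case .title: return "title"
>         case .body:  return "body"
>         }
>     }
> 
>     var intValue: Int? {
>         switch self {
>         case .title: return 1
>         case .body:  return 2
>         }
>     }
> 
>     init?(stringValue: String) {
>         switch stringValue {
>         case "title": self = .title
>         case "body":  self = .body
>         default:      return nil
>         }
>     }
> 
>     init?(intValue: Int) {
>         switch intValue {
>         case 1:  self = .title
>         case 2:  self = .body
>         default: return nil
>         }
>     }
> }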
> 
> 
> Speaking of the mutually exclusive representations - what about serializations that don’t code as one of those two things? YAML can have anything be a “key”, and despite that being not particularly sane, it is a use case.
> 
> We’ve explored this, but at the end of the day, it’s not possible to generalize this to the point where we could represent all possible options on all possible formats because you cannot make any promises as to what’s possible and what’s not statically.
> 
> We’d like to strike a balance here between strong static guarantees on one end (the extreme end of which introduces a new API for every single format, since you can almost perfectly statically express what’s possible and what isn’t) and generalization on the other (the extreme end of which is an empty protocol because there really are encoding formats which are mutually exclusive). So in this case, this API would support producing and consuming YAML with string or integer keys, but not arbitrary YAML.
> 
> 
> For most types, String-convertible keys are a reasonable default; for performance, however, Int-convertible keys are preferred, and Encoders may choose to make use of Ints over Strings. Framework types should provide keys which have both for flexibility and performance across different types of Encoders. It is generally an error to provide a key which has neither a stringValue nor an intValue.
> Could you speak a little more to using Int-convertible keys for performance? I get the feeling int-based keys parallel the legacy of NSCoder’s older design, and I don’t really see anyone these days supporting non-keyed archivers. They strike me as fragile. What use cases other than that are envisioned for ordered archiving?
> 
> We agree that integer keys are fragile, and from years (decades) of experience with NSArchiver, we are aware of the limitations that such encoding offers. For this reason, we will never synthesize integer keys on your behalf. This is something you must put thought into, if using an integer key for archival.
> 
> However, there are use-cases (both in archival and in serialization, but especially so in serialization) where integer keys are useful. Ordered encoding is one such possibility (when the format supports it, integer keys are sequential, etc.), and is helpful for, say, marshaling objects in an XPC context (where both sides are aware of the format, are running the same version of the same code, on the same device) — keys waste time and bandwidth unnecessarily in some cases.
> 
> Integer keys don’t necessarily imply ordered encoding, however. There are binary encoding formats which support integer-keyed dictionaries (read: serialized hash maps) which are more efficient to encode and decode than similar string-keyed ones. In that case, as long as integer keys are chosen with care, the end result is more performant.
> 
> But again, this depends on the application and use case. Defining integer keys requires manual effort because we want thought put into defining them; they are indeed fragile when used carelessly.
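> 
> As a hypothetical illustration (the names and values here are invented), a type opting into integer keys for a compact format would declare them explicitly, leaning on the autogenerated implementations mentioned earlier to map the raw values:
> 
> private enum CodingKeys : Int, CodingKey {
>     case id = 1        // values chosen deliberately and kept stable across releases
>     case timestamp = 2
>     case payload = 3
> }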
> 
> 
> [snip]
> 
> Keyed Encoding Containers
> 
> Keyed encoding containers are the primary interface that most Codable types interact with for encoding and decoding. Through these, Codable types have strongly-keyed access to encoded data by using keys that are semantically correct for the operations they want to express.
> 
> Since semantically incompatible keys will rarely (if ever) share the same key type, it is impossible to mix up key types within the same container (as is possible with String keys), and since the type is known statically, keys get autocompletion by the compiler.
> 
> open class KeyedEncodingContainer<Key : CodingKey> {
> 
> Like others, I’m a little bummed about this part of the design. Your reasoning up-thread is sound, but I chafe a bit on having to reabstract and a little more on having to be a reference type. Particularly knowing that it’s got a bit more overhead involved… I /like/ that NSKeyedArchiver can simply push some state and pass itself as the next encoding container down the stack.
> 
> There’s not much more to be said about why this is a class that I haven’t covered; if it were possible to do otherwise at the moment, then we would.
> 
It is possible using a manually written type-erased wrapper along the lines of AnySequence and AnyCollection.  I don’t recall seeing a rationale for why you don’t want to go this route.  I would still like to hear more on this topic.
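
For concreteness, here is a rough sketch of the kind of wrapper I have in mind; every name below is hypothetical, and the protocol the box forwards to would be the abstraction that Encoders actually implement:

protocol KeyedEncodingContainerProtocol {
    associatedtype Key : CodingKey
    func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
}

// Abstract box so the wrapper can erase the concrete container type.
class _AnyKeyedEncodingContainerBox<Key : CodingKey> {
    func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws {
        fatalError("abstract")
    }
}

final class _KeyedEncodingContainerBox<Container : KeyedEncodingContainerProtocol> : _AnyKeyedEncodingContainerBox<Container.Key> {
    let container: Container
    init(_ container: Container) { self.container = container }
    override func encode<Value : Codable>(_ value: Value?, forKey key: Container.Key) throws {
        try container.encode(value, forKey: key)
    }
}

// The public, value-typed face of the container.
struct AnyKeyedEncodingContainer<Key : CodingKey> {
    private let box: _AnyKeyedEncodingContainerBox<Key>
    init<Container : KeyedEncodingContainerProtocol>(_ container: Container)
        where Container.Key == Key {
        self.box = _KeyedEncodingContainerBox(container)
    }
    func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws {
        try box.encode(value, forKey: key)
    }
}

The cost is an extra allocation for the box, but the public surface stays a value type and concrete Encoders don't have to subclass anything.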

> As for why we do this — this is the crux of the whole API. We not only want to make it easy to use a custom key type that is semantically correct for your type, we want to make it difficult to do the easy but incorrect thing. From experience with NSKeyedArchiver, we’d like to move away from unadorned string (and integer) keys, where typos and accidentally reused keys are common, and impossible to catch statically.
> encode<T : Codable>(_: T?, forKey: String) unfortunately not only encourages code like encode(foo, forKey: "foi") // whoops, typo, but also makes it more difficult to use a semantic key type: encode(foo, forKey: CodingKeys.foo.stringValue). The additional typing and lack of autocompletion make it an active disincentive. encode<T : Codable>(_: T?, forKey: Key) reverses both of these — it makes it impossible to use unadorned strings or accidentally use keys from another type, and nets shorter code with autocompletion: encode(foo, forKey: .foo)
> 
> The side effect of this, namely that keyed containers are classes, is suboptimal, I agree, but necessary.
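> 
> To make the keyed-container usage above concrete, a hypothetical adopter (the type and property names are invented for illustration, using the container API sketched earlier) ends up with something like:
> 
> struct Location : Codable {
>     let latitude: Double
>     let longitude: Double
> 
>     private enum CodingKeys : String, CodingKey {
>         case latitude
>         case longitude
>     }
> 
>     func encode(to encoder: Encoder) throws {
>         let container = encoder.container(keyedBy: CodingKeys.self)
>         try container.encode(latitude, forKey: .latitude)   // autocompleted and statically checked
>         try container.encode(longitude, forKey: .longitude) // no way to slip in a stray string key
>     }
> 
>     // init(from:) omitted here; it mirrors the above with the keyed decoding container.
> }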
> 
> 
> 
> open func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
> 
> Does this win anything over taking a Codable?
> 
> Taking the concrete type over an existential allows for static dispatch on the type within the implementation, and is a performance win in some cases.
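> 
> A small illustration of the difference, outside the coding API entirely (these types are invented just to show the dispatch behavior):
> 
> protocol Shape { func area() -> Double }
> struct Circle : Shape {
>     var radius: Double
>     func area() -> Double { return .pi * radius * radius }
> }
> 
> // Generic: the element type is known statically, so area() can be specialized and devirtualized.
> func totalArea<S : Shape>(_ shapes: [S]) -> Double {
>     return shapes.reduce(0) { $0 + $1.area() }
> }
> 
> // Existential: each element is boxed, and area() goes through dynamic dispatch.
> func totalArea(_ shapes: [Shape]) -> Double {
>     return shapes.reduce(0) { $0 + $1.area() }
> }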
> 
> 
> open func encode(_ value: Bool?, forKey key: Key) throws
> open func encode(_ value: Int?, forKey key: Key) throws
> open func encode(_ value: Int8?, forKey key: Key) throws
> open func encode(_ value: Int16?, forKey key: Key) throws
> open func encode(_ value: Int32?, forKey key: Key) throws
> open func encode(_ value: Int64?, forKey key: Key) throws
> open func encode(_ value: UInt?, forKey key: Key) throws
> open func encode(_ value: UInt8?, forKey key: Key) throws
> open func encode(_ value: UInt16?, forKey key: Key) throws
> open func encode(_ value: UInt32?, forKey key: Key) throws
> open func encode(_ value: UInt64?, forKey key: Key) throws
> open func encode(_ value: Float?, forKey key: Key) throws
> open func encode(_ value: Double?, forKey key: Key) throws
> open func encode(_ value: String?, forKey key: Key) throws
> open func encode(_ value: Data?, forKey key: Key) throws
> 
> What is the motivation behind abandoning the idea of “primitives” from the Alternatives Considered? Performance? Being unable to close the protocol?
> 
> Being unable to close the protocol is the primary reason. Not being able to tell at a glance what the concrete types belonging to this set are is related, and also a top reason.
> 
Looks like we have another strong motivating use case for closed protocols.  I hope that will be in scope for Swift 5.  

It would be great for the auto-generated documentation and “headers” to provide a list of all public or open types inheriting from a closed class or conforming to a closed protocol (when we get them).  This would go a long way towards addressing your second reason.

> 
> In what ways is encoding a value envisioned to fail? I understand wanting to allow maximum flexibility, and being symmetric to `decode` throwing, but there are plenty of “conversion” patterns that are asymmetric in the ways they can fail (Date formatters, RawRepresentable, LosslessStringConvertible, etc.).
> 
> Different formats support different concrete values, even of primitive types. For instance, you cannot natively encode Double.nan in JSON, but you can in plist. Without additional options on JSONEncoder, encode(Double.nan, forKey: …) will throw.
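> 
> As a sketch (the type name is invented here, and the exact error thrown is up to the encoder):
> 
> struct Sample : Codable {
>     let ratio: Double
> }
> 
> do {
>     _ = try JSONEncoder().encode(Sample(ratio: .nan)) // JSON has no representation for NaN
> } catch {
>     // Without an opt-in option on JSONEncoder, the failure surfaces as a thrown error
>     // rather than silently producing invalid JSON.
> }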
> 
> 
> /// For `Encoder`s that implement this functionality, this will only encode the given object and associate it with the given key if it encoded unconditionally elsewhere in the archive (either previously or in the future).
> open func encodeWeak<Object : AnyObject & Codable>(_ object: Object?, forKey key: Key) throws
> 
> Is it correct that if I send a Cocoa-style object graph (with weak backrefs), an encoder could infinitely recurse? Or is a coder supposed to detect that?
> 
> encodeWeak has a default implementation that calls the regular encode<T : Codable>(_: T, forKey: Key); only formats which actually support weak backreferencing should override this implementation, so it should always be safe to call (it will simply unconditionally encode the object by default).
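> 
> In other words, the default is effectively just a forward (a sketch, given the container API above):
> 
> open func encodeWeak<Object : AnyObject & Codable>(_ object: Object?, forKey key: Key) throws {
>     try encode(object, forKey: key) // unconditional encode unless the format overrides this
> }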
> 
> 
> open var codingKeyContext: [CodingKey]
> }
> [snippity snip]
> Alright, those are just my first thoughts. I want to spend a little time marinating in the code from PR #8124 before I comment further. Cheers! I owe you, Michael, and Tony a few drinks for sure.
> 
> Hehe, thanks :)
> 
> 
> Zach Waldowski
> zach at waldowski.me
> 
