[swift-evolution] [Proposal] Foundation Swift Archival & Serialization

Fri Mar 17 13:51:16 CDT 2017

Is there any notion of encoding versions (as in, changes to the JSON
representation, for instance)? I don't know that it is necessarily a good
idea overall, but now is the time to consider it.

On Fri, Mar 17, 2017 at 2:27 PM, Matthew Johnson via swift-evolution <
swift-evolution at swift.org> wrote:

>
> On Mar 17, 2017, at 1:15 PM, Itai Ferber via swift-evolution <
> swift-evolution at swift.org> wrote:
>
> On 15 Mar 2017, at 22:58, Zach Waldowski wrote:
>
> Another issue of scale - I had to switch to a native mail client as
> replying inline severely broke my webmail client. ;-)
>
> Again, lots of love here. Responses inline.
>
> On Mar 15, 2017, at 6:40 PM, Itai Ferber via swift-evolution <
> swift-evolution at swift.org> wrote:
> Proposed solution
> We will be introducing the following new types:
>
> protocol Codable: Adopted by types to opt into archival. Conformance may
> be automatically derived in cases where all properties are also Codable.
>
> FWIW I think this is an acceptable compromise. If the happy path is derived
> conformances, only-decodable or only-encodable types feel like a lazy way
> out on the part of a user of the API, and build a barrier to proper
> testing.
>
> [snip]
>
> Structured types (i.e. types which encode as a collection of properties)
> encode and decode their properties in a keyed manner. Keys may be
> String-convertible or Int-convertible (or both), and user types which have
> properties should declare semantic key enums which map keys to their
> properties. Keys must conform to the CodingKey protocol:
> public protocol CodingKey { <##snip##> }
>
> A few things here:
>
> The protocol leaves open the possibility of having both a String and an Int
> representation, or neither. What should a coder do in either case? Are the
> representations intended to be mutually exclusive, or not? The protocol
> design doesn’t seem to match the flavor of Swift particularly well; I’d
> expect something along the lines of a CodingKey enum and a
> CodingKeyRepresentable protocol. It’s also possible that the concerns of
> the two are orthogonal enough that they deserve separate
> container(keyedBy:) requirements.
>
> The general answer to "what should a coder do" is "what is appropriate for
> its format". For a format that uses exclusively string keys (like JSON),
> the string representation (if present on a key) will always be used. If the
> key has no string representation but does have an integer representation,
> the encoder may choose to stringify the integer. If the key has neither, it
> is appropriate for the Encoder to fail in some way.
>
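
A minimal sketch of the key-resolution rule described above for a string-keyed format. The protocol requirements are snipped in the quoted proposal, so `DraftCodingKey`, `KeyResolutionError`, `jsonKeyName(for:)` and `ExampleKey` are hypothetical stand-ins, assuming both representations are optional as the discussion implies.

```swift
// Hypothetical stand-in for the draft protocol (the real requirements are
// snipped in the quoted text); both representations are assumed optional.
protocol DraftCodingKey {
    var stringValue: String? { get }
    var intValue: Int? { get }
}

// Hypothetical error for a key with neither representation.
enum KeyResolutionError: Error {
    case noRepresentation
}

// How a string-keyed (JSON-like) encoder might choose a key name:
// prefer the string, fall back to a stringified integer, otherwise fail.
func jsonKeyName(for key: DraftCodingKey) throws -> String {
    if let string = key.stringValue {
        return string
    }
    if let int = key.intValue {
        return String(int)
    }
    throw KeyResolutionError.noRepresentation
}

// Simple key type used only to exercise the function.
struct ExampleKey: DraftCodingKey {
    var stringValue: String?
    var intValue: Int?
}
```

A flat, non-keyed format could skip this step entirely and ignore the key, which is why the failure case is left to each encoder.
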
> On the flip side, for totally flat formats, an Encoder may choose to
> ignore keys altogether, in which case it doesn’t really matter. The choice
> is up to the Encoder and its format.
>
> The string and integer representations are not meant to be mutually
> exclusive at all, and in fact, where relevant, we encourage providing both
> types of representations for flexibility.
>
> As for the possibility of having neither representation, this question
> comes up often. I’d like to summarize the thought process here by quoting
> some earlier review (apologies for the poor formatting from my mail client):
>
> If there are two options, each of which is itself optional, we have 4
> possible combinations. But! At the same time we prohibit one combination by
> what? Runtime error? Why not use a 3-case enum for it? Even further down
> the rabbit hole there might be a CodingKey<> specialized for a concrete
> combination, like CodingKey<StringAndIntKey> or just CodingKey<StringKey>,
> but I’m not sure whether our type system will make it useful or possible…
>
> public enum CodingKeyValue {
>     case integer(value: Int)
>     case string(value: String)
>     case both(intValue: Int, stringValue: String)
> }
> public protocol CodingKey {
>     init?(value: CodingKeyValue)
>     var value: CodingKeyValue { get }
> }
>
> I agree that this certainly feels suboptimal. We’ve certainly explored
> other possibilities before sticking to this one, so let me try to summarize
> here:
>
> * Having a concrete 3-case CodingKey enum would preclude the possibility
> of having neither a stringValue nor an intValue. However, there is a lot of
> value in having the key types belong to the type being encoded (more
> safety, impossible to accidentally mix key types, private keys, etc.); if
> the CodingKey type itself is an enum (which cannot be inherited from), then
> this prevents differing key types.
> * Your solution as presented is better: CodingKey itself is still a
> protocol, and the value itself is the 3-case enum. However, since
> CodingKeyValue is not literal-representable, user keys cannot be enums
> RawRepresentable by CodingKeyValue. That means that the values must either
> be dynamically returned, or (for attaining the benefits that we want to
> give users — easy representation, autocompletion, etc.) the type has to be
> a struct with static lets on it giving the CodingKeyValues. This certainly
> works, but is likely not what a developer would have in mind when working
> with the API; the power of enums in Swift makes them very easy to reach
> for, and I’m thinking most users would expect their keys to be enums. We’d
> like to leverage that where we can, especially since RawRepresentable enums
> are appropriate in the vast majority of use cases.
> * Three separate CodingKey protocols (one for Strings, one for Ints, and
> one for both). You could argue that this is the most correct version, since
> it most clearly represents what we’re looking for. However, this means that
> every method now accepting a CodingKey must be converted into 3 overloads
> each accepting different types. This explodes the API surface, is confusing
> for users, and also makes it impossible to use CodingKey as an existential
> (unless it’s an empty 4th protocol which makes no static guarantees and the
> others inherit from).
> * [The current] approach. On the one hand, this allows for the accidental
> representation of a key with neither a stringValue nor an intValue. On the
> other, we want to make it really easy to use autogenerated keys, or
> autogenerated key implementations if you provide the cases and values
> yourself. The nil value possibility is only a concern when writing
> stringValue and intValue yourself, which the vast majority of users should
> not have to do.
> * Additionally, a key word in that sentence bolded above is “generally”.
> As part of making this API more generalized, we push a lot of decisions to
> Encoders and Decoders. For many formats, it’s true that having a key with
> no value is an error, but this is not necessarily true for all formats; for
> a linear, non-keyed format, it is entirely reasonable to ignore the keys in
> the first place, or replace them with fixed-format values. The decision of
> how to handle this case is left up to Encoders and Decoders; for most
> formats (and for our implementations), this is certainly an error, and we
> would likely document this and either throw or preconditionFailure. But
> this is not the case always.
> * In terms of syntax, there’s another approach that would be really nice
> (but is currently not feasible) — if enums were RawRepresentable in terms
> of tuples, it would be possible to give implementations for String, Int,
> (Int, String), (String, Int), etc., making this condition harder to
> represent by default unless you really mean to.
>
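
A rough sketch of the "struct with static lets" alternative from the second bullet above, built on the `CodingKeyValue` enum quoted from the earlier review. The names `ValueCodingKey` and `PersonKey` are hypothetical and exist only for illustration; this is not API from the proposal.

```swift
// The 3-case value type quoted from the earlier review.
enum CodingKeyValue: Equatable {
    case integer(value: Int)
    case string(value: String)
    case both(intValue: Int, stringValue: String)
}

// Hypothetical name for the protocol variant that vends a CodingKeyValue.
protocol ValueCodingKey {
    init?(value: CodingKeyValue)
    var value: CodingKeyValue { get }
}

// CodingKeyValue is not literal-representable, so the key type cannot be a
// RawRepresentable enum; the keys become static constants on a struct.
struct PersonKey: ValueCodingKey, Equatable {
    let value: CodingKeyValue

    static let name = PersonKey(rawKey: .both(intValue: 0, stringValue: "name"))
    static let age = PersonKey(rawKey: .both(intValue: 1, stringValue: "age"))

    private static let all: [PersonKey] = [PersonKey.name, PersonKey.age]

    private init(rawKey: CodingKeyValue) {
        self.value = rawKey
    }

    // Only values matching a declared key are accepted.
    init?(value: CodingKeyValue) {
        guard let match = PersonKey.all.first(where: { $0.value == value }) else {
            return nil
        }
        self = match
    }
}
```

This works, but the boilerplate (the lookup table and the private initializer) illustrates why raw-value enums are the more natural fit for most key types.
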
> Hope that gives some helpful background on this decision. FWIW, the only
> way to end up with a key having no intValue or stringValue is manually
> implementing the CodingKey protocol (which should be *exceedingly* rare)
> and implementing the methods by not switching on self, or some other
> method that would allow you to forget to give a key either value.
>
> Speaking of the mutually exclusive representations - what about
> serializations that don’t code as one of those two things? YAML can have
> anything be a “key”, and despite that being not particularly sane, it is a
> use case.
>
> We’ve explored this, but at the end of the day, it’s not possible to
> generalize this to the point where we could represent all possible options
> on all possible formats because you cannot make any promises as to what’s
> possible and what’s not statically.
>
> We’d like to strike a balance here between strong static guarantees on one
> end (the extreme end of which introduces a new API for every single format,
> since you can almost perfectly statically express what’s possible and what
> isn’t) and generalization on the other (the extreme end of which is an empty
> protocol because there really are encoding formats which are mutually
> exclusive). So in this case, this API would support producing and consuming
> YAML with string or integer keys, but not arbitrary YAML.
>
> For most types, String-convertible keys are a reasonable default; for
> performance, however, Int-convertible keys are preferred, and Encoders may
> choose to make use of Ints over Strings. Framework types should provide
> keys which have both for flexibility and performance across different types
> of Encoders. It is generally an error to provide a key which has neither a
> stringValue nor an intValue.
>
> Could you speak a little more to using Int-convertible keys for
> performance? I get the feeling int-based keys parallel the legacy of
> NSCoder’s older design, and I don’t really see anyone these days supporting
> non-keyed archivers. They strike me as fragile. What other use cases are
> envisioned for ordered archiving than that?
>
> We agree that integer keys are fragile, and from years (decades) of
> experience with NSArchiver, we are aware of the limitations that such
> encoding offers. For this reason, we will never synthesize integer keys on
> your behalf. This is something you must put thought into, if using an
> integer key for archival.
>
> However, there are use-cases (both in archival and in serialization, but
> especially so in serialization) where integer keys are useful. Ordered
> encoding is one such possibility (when the format supports it, integer keys
> are sequential, etc.), and is helpful for, say, marshaling objects in an
> XPC context (where both sides are aware of the format, are running the same
> version of the same code, on the same device) — keys waste time and
> bandwidth unnecessarily in some cases.
>
> Integer keys don’t necessarily imply ordered encoding, however. There are
> binary encoding formats which support integer-keyed dictionaries (read:
> serialized hash maps) which are more efficient to encode and decode than
> similar string-keyed ones. In that case, as long as integer keys are chosen
> with care, the end result is more performant.
>
> But again, this depends on the application and use case. Defining integer
> keys requires manual effort because we want thought put into defining them;
> they are indeed fragile when used carelessly.
>
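
A short sketch of hand-written keys that carry both representations. It assumes the `CodingKey` protocol as it exists in the Swift standard library today (non-optional `stringValue`, optional `intValue`), which may differ from the draft quoted here; `PointKey` and its integer assignments are made up for illustration.

```swift
// Integer values are chosen by hand and must stay stable across versions;
// nothing synthesizes them for you.
enum PointKey: CodingKey {
    case x
    case y

    // Switching on self means the compiler flags any case left without a value.
    var stringValue: String {
        switch self {
        case .x: return "x"
        case .y: return "y"
        }
    }

    var intValue: Int? {
        switch self {
        case .x: return 0
        case .y: return 1
        }
    }

    init?(stringValue: String) {
        switch stringValue {
        case "x": self = .x
        case "y": self = .y
        default: return nil
        }
    }

    init?(intValue: Int) {
        switch intValue {
        case 0: self = .x
        case 1: self = .y
        default: return nil
        }
    }
}
```

An encoder for a compact binary format could read `intValue`, while a JSON encoder would use `stringValue` for the same key.
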
> [snip]
>
> Keyed Encoding Containers
>
> Keyed encoding containers are the primary interface that most Codable
> types interact with for encoding and decoding. Through these, Codable types
> have strongly-keyed access to encoded data by using keys that are
> semantically correct for the operations they want to express.
>
> Since semantically incompatible keys will rarely (if ever) share the same
> key type, it is impossible to mix up key types within the same container
> (as is possible with String keys), and since the type is known statically,
> keys get autocompletion by the compiler.
>
> open class KeyedEncodingContainer<Key : CodingKey> {
>
> Like others, I’m a little bummed about this part of the design. Your
> reasoning up-thread is sound, but I chafe a bit on having to reabstract and
> a little more on having to be a reference type. Particularly knowing that
> it’s got a bit more overhead involved… I /like/ that NSKeyedArchiver can
> simply push some state and pass itself as the next encoding container down
> the stack.
>
> There’s not much more to be said about why this is a class that I haven’t
> covered; if it were possible to do otherwise at the moment, then we would.
>
> It is possible using a manually written type-erased wrapper along the
> lines of AnySequence and AnyCollection.  I don’t recall seeing a rationale
> for why you don’t want to go this route.  I would still like to hear more
> on this topic.
>
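
For readers unfamiliar with the pattern, here is a minimal sketch of a type-erased wrapper in the style of AnySequence, as suggested above. `KeyedSink`, `AnyKeyedSink` and `DictionarySink` are hypothetical, heavily simplified stand-ins and not the proposal's container API; the sketch assumes today's standard-library `CodingKey`, whose `stringValue` is non-optional.

```swift
// Hypothetical, simplified container protocol with an associated key type.
protocol KeyedSink {
    associatedtype Key: CodingKey
    func put(_ value: String, forKey key: Key)
}

// Type-erased wrapper: a struct that hides the concrete sink behind a
// closure, so callers only see the key type.
struct AnyKeyedSink<Key: CodingKey>: KeyedSink {
    private let _put: (String, Key) -> Void

    init<Base: KeyedSink>(_ base: Base) where Base.Key == Key {
        _put = { value, key in base.put(value, forKey: key) }
    }

    func put(_ value: String, forKey key: Key) {
        _put(value, key)
    }
}

// One possible concrete sink; a reference type so the wrapper and the
// creator observe the same storage.
final class DictionarySink<Key: CodingKey>: KeyedSink {
    private(set) var storage: [String: String] = [:]

    func put(_ value: String, forKey key: Key) {
        storage[key.stringValue] = value
    }
}
```

The wrapper itself is a value type, but each forwarded requirement costs a stored closure, so the boilerplate grows with the number of methods on the protocol.
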
> As for *why* we do this — this is the crux of the whole API. We not only
> want to make it easy to use a custom key type that is semantically correct
> for your type, we want to make it difficult to do the easy but incorrect
> thing. From experience with NSKeyedArchiver, we’d like to move away from
> unadorned string (and integer) keys, where typos and accidentally reused
> keys are common, and impossible to catch statically.
> An API like encode<T : Codable>(_: T?, forKey: String) unfortunately not
> only encourages code like encode(foo, forKey: "foi") (whoops, typo); it
> also makes it *more difficult* to use a semantic key type:
> encode(foo, forKey: CodingKeys.foo.stringValue). The additional typing and
> lack of autocompletion make it an active disincentive.
> encode<T : Codable>(_: T?, forKey: Key) reverses both of these: it makes it
> impossible to use unadorned strings or accidentally use keys from another
> type, and nets shorter code with autocompletion: encode(foo, forKey: .foo)
>
> The side effect of this, that keyed containers are classes, is suboptimal,
> I agree, but necessary.
>
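
A small sketch of the strongly-keyed style argued for above, assuming the Codable API as it is available in Swift and Foundation today, which may differ in detail from the draft under review. The `Account` type and its properties are made up for illustration.

```swift
import Foundation

struct Account: Encodable {
    var id: Int
    var owner: String

    // Keys belong to the type being encoded and stay private to it.
    private enum CodingKeys: String, CodingKey {
        case id
        case owner
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(id, forKey: .id)
        try container.encode(owner, forKey: .owner)
        // A bare string such as forKey: "ownr" does not type-check here,
        // so a typo in a key is caught at compile time.
    }
}
```

Because the container is generic over `CodingKeys`, `.id` and `.owner` autocomplete, and keys from another type cannot be passed by accident.
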
>
> open func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
>
> Does this win anything over taking a Codable?
>
> Taking the concrete type over an existential allows for static dispatch on
> the type within the implementation, and is a performance win in some cases.
>
> open func encode(_ value: Bool?, forKey key: Key) throws
> open func encode(_ value: Int?, forKey key: Key) throws
> open func encode(_ value: Int8?, forKey key: Key) throws
> open func encode(_ value: Int16?, forKey key: Key) throws
> open func encode(_ value: Int32?, forKey key: Key) throws
> open func encode(_ value: Int64?, forKey key: Key) throws
> open func encode(_ value: UInt?, forKey key: Key) throws
> open func encode(_ value: UInt8?, forKey key: Key) throws
> open func encode(_ value: UInt16?, forKey key: Key) throws
> open func encode(_ value: UInt32?, forKey key: Key) throws
> open func encode(_ value: UInt64?, forKey key: Key) throws
> open func encode(_ value: Float?, forKey key: Key) throws
> open func encode(_ value: Double?, forKey key: Key) throws
> open func encode(_ value: String?, forKey key: Key) throws
> open func encode(_ value: Data?, forKey key: Key) throws
>
> What is the motivation behind abandoning the idea of “primitives” from the
> Alternatives Considered? Performance? Being unable to close the protocol?
>
> Being unable to close the protocol is the primary reason. Not being able
> to tell at a glance what the concrete types belonging to this set are is
> related, and also a top reason.
>
> Looks like we have another strong motivating use case for closed
> protocols.  I hope that will be in scope for Swift 5.
>
> It would be great for the auto-generated documentation and “headers” to
> provide a list of all public or open types inheriting from a closed class
> or conforming to a closed protocol (when we get them).  This would go a
> long way towards addressing your second reason.
>
>
> In what ways is encoding a value envisioned to fail? I understand wanting to
> allow maximum flexibility, and being symmetric to `decode` throwing, but
> there are plenty of “conversion” patterns that are asymmetric in the ways
> they can fail (Date formatters, RawRepresentable,
> LosslessStringConvertible, etc.).
>
> Different formats support different concrete values, even of primitive
> types. For instance, you cannot natively encode Double.nan in JSON, but
> you can in plist. Without additional options on JSONEncoder, encode(Double.nan,
> forKey: …) will throw.
>
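
A quick sketch of the Double.nan case, assuming Foundation's JSONEncoder as it exists today (the "additional options" correspond to its `nonConformingFloatEncodingStrategy` property). The `Reading` type, the `encodes` helper and the placeholder strings are made up for illustration.

```swift
import Foundation

struct Reading: Codable {
    var value: Double
}

// Returns true when the encoder accepts the value, false when it throws.
func encodes(_ reading: Reading, with encoder: JSONEncoder) -> Bool {
    return (try? encoder.encode(reading)) != nil
}

// Default configuration: NaN has no JSON representation, so encoding throws.
let strictEncoder = JSONEncoder()

// Opt-in configuration: non-conforming floats are written as strings.
let lenientEncoder = JSONEncoder()
lenientEncoder.nonConformingFloatEncodingStrategy = .convertToString(
    positiveInfinity: "inf",
    negativeInfinity: "-inf",
    nan: "nan"
)
```

The same value would encode without special handling in a format that represents NaN natively, which is why `encode` is declared as throwing rather than assumed to succeed.
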
> /// For `Encoder`s that implement this functionality, this will only
> /// encode the given object and associate it with the given key if it is
> /// encoded unconditionally elsewhere in the archive (either previously or
> /// in the future).
> open func encodeWeak<Object : AnyObject & Codable>(_ object: Object?,
>     forKey key: Key) throws
>
> Is it correct that if I send a Cocoa-style object graph (with weak
> backrefs), an encoder could infinitely recurse? Or is a coder supposed to
> detect that?
>
> encodeWeak has a default implementation that calls the regular encode<T :
> Codable>(_: T, forKey: Key); only formats which actually support weak
> backreferencing should override this implementation, so it should always be
> safe to call (it will simply unconditionally encode the object by default).
>
> open var codingKeyContext: [CodingKey]
> }
> [snippity snip]
>
> Alright, those are just my first thoughts. I want to spend a little time
> marinating in the code from PR #8124 before I comment further. Cheers! I
> owe you, Michael, and Tony a few drinks for sure.
>
> Hehe, thanks :)
>
> Zach Waldowski
> zach at waldowski.me
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
>