[swift-evolution] [Proposal] Foundation Swift Archival & Serialization

T.J. Usiyan griotspeak at gmail.com
Fri Mar 17 19:14:47 CDT 2017

This does work and could be the solution, I suppose, but I was thinking
about agreeing on some convention that could actually make it into the
protocols or, at the least, the documentation. As it is, the problem sneaks
up on newcomers and isn't obvious until you actually have a second version
of your format.

On Fri, Mar 17, 2017 at 3:47 PM, Itai Ferber <iferber at apple.com> wrote:

> Do you mean versions of the format, or versions of your type?
> If the latter, this can be done on a case-by-case basis, as needed. You
> can always do something like
> struct Foo : Codable {
>     // Name this as appropriate
>     private let jsonVersion = 1.1
> }
> and have it encode as well.
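To make the convention concrete, here is a minimal sketch of the pattern described above, assuming the Codable API as proposed; the key name `formatVersion` and the supported-version check are illustrative conventions, not part of the proposal:

```swift
import Foundation

// Sketch of the versioning pattern above. The "formatVersion" key and the
// version check are illustrative conventions, not proposal API.
struct Foo: Codable {
    var name: String
    private(set) var formatVersion = 1

    private enum CodingKeys: String, CodingKey {
        case name, formatVersion
    }

    init(name: String) { self.name = name }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let version = try container.decode(Int.self, forKey: .formatVersion)
        guard version <= 1 else {
            throw DecodingError.dataCorruptedError(
                forKey: .formatVersion, in: container,
                debugDescription: "Unsupported format version \(version)")
        }
        name = try container.decode(String.self, forKey: .name)
    }
}

// encode(to:) is still synthesized, so the version is written automatically.
let data = try! JSONEncoder().encode(Foo(name: "bar"))
let decoded = try! JSONDecoder().decode(Foo.self, from: data)
```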
> On 17 Mar 2017, at 11:51, T.J. Usiyan wrote:
> Is there any sense of encoding versions (as in, changes to the JSON
> representation, for instance)? I don't know that it is necessarily a good
> idea overall, but now is the time to consider it.
> On Fri, Mar 17, 2017 at 2:27 PM, Matthew Johnson via swift-evolution <
> swift-evolution at swift.org> wrote:
>> On Mar 17, 2017, at 1:15 PM, Itai Ferber via swift-evolution <
>> swift-evolution at swift.org> wrote:
>> On 15 Mar 2017, at 22:58, Zach Waldowski wrote:
>> Another issue of scale - I had to switch to a native mail client as
>> replying inline severely broke my webmail client. ;-)
>> Again, lots of love here. Responses inline.
>> On Mar 15, 2017, at 6:40 PM, Itai Ferber via swift-evolution <
>> swift-evolution at swift.org> wrote:
>> Proposed solution
>> We will be introducing the following new types:
>> protocol Codable: Adopted by types to opt into archival. Conformance may
>> be automatically derived in cases where all properties are also Codable.
>> FWIW I think this is an acceptable compromise. If the happy path is
>> derived conformances, only-decodable or only-encodable types feel like a
>> lazy way out on the part of a user of the API, and build a barrier to
>> proper testing.
>> [snip]
>> Structured types (i.e. types which encode as a collection of properties)
>> encode and decode their properties in a keyed manner. Keys may be
>> String-convertible or Int-convertible (or both), and user types which have
>> properties should declare semantic key enums which map keys to their
>> properties. Keys must conform to the CodingKey protocol:
>> public protocol CodingKey { <##snip##> }
>> A few things here:
>> The protocol leaves open the possibility of having both a String or Int
>> representation, or neither. What should a coder do in either case? Are the
>> representations intended to be mutually exclusive, or not? The protocol
>> design doesn’t seem to particularly match the flavor of Swift; I’d
>> expect something along the lines of a CodingKey enum and the protocol
>> CodingKeyRepresentable. It’s also possible that the concerns of the two are
>> orthogonal enough that they deserve separate container(keyedBy:)
>> requirements.
>> The general answer to "what should a coder do" is "what is appropriate
>> for its format". For a format that uses exclusively string keys (like
>> JSON), the string representation (if present on a key) will always be used.
>> If the key has no string representation but does have an integer
>> representation, the encoder may choose to stringify the integer. If the key
>> has neither, it is appropriate for the Encoder to fail in some way.
>> On the flip side, for totally flat formats, an Encoder may choose to
>> ignore keys altogether, in which case it doesn’t really matter. The choice
>> is up to the Encoder and its format.
>> The string and integer representations are not meant to be mutually
>> exclusive at all, and in fact, where relevant, we encourage providing both
>> types of representations for flexibility.
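As an illustration of providing both representations, a sketch assuming the final shape of the API (the `Record` type and its key numbering are hypothetical):

```swift
import Foundation

// Hypothetical type whose keys carry both representations: an Int-backed
// CodingKey enum derives intValue from the rawValue, and a stringValue is
// synthesized as well, so string-keyed and integer-keyed formats both work.
struct Record: Codable {
    var id: Int
    var title: String

    enum CodingKeys: Int, CodingKey {
        case id = 1
        case title = 2
    }
}

// Round-trip through a string-keyed format (JSON) still works, because both
// directions use the same synthesized key conformance.
let roundTripData = try! JSONEncoder().encode(Record(id: 7, title: "Intro"))
let back = try! JSONDecoder().decode(Record.self, from: roundTripData)
```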
>> As for the possibility of having neither representation, this question
>> comes up often. I’d like to summarize the thought process here by quoting
>> some earlier review (apologies for the poor formatting from my mail client):
>> If there are two options, each of which is itself optional, we have 4
>> possible combinations. But! At the same time we prohibit one combination by
>> what? Runtime error? Why not use a 3-case enum for it? Even further down
>> the rabbit hole there might be a CodingKey<> specialized for a concrete
>> combination, like CodingKey<StringAndIntKey> or just CodingKey<StringKey>,
>> but I’m not sure whether our type system will make it useful or possible…
>> public enum CodingKeyValue {
>>     case integer(value: Int)
>>     case string(value: String)
>>     case both(intValue: Int, stringValue: String)
>> }
>>
>> public protocol CodingKey {
>>     init?(value: CodingKeyValue)
>>     var value: CodingKeyValue { get }
>> }
>> I agree that this certainly feels suboptimal. We’ve certainly explored
>> other possibilities before sticking to this one, so let me try to summarize
>> here:
>> * Having a concrete 3-case CodingKey enum would preclude the possibility
>> of having neither a stringValue nor an intValue. However, there is a lot of
>> value in having the key types belong to the type being encoded (more
>> safety, impossible to accidentally mix key types, private keys, etc.); if
>> the CodingKey type itself is an enum (which cannot be inherited from), then
>> this prevents differing key types.
>> * Your solution as presented is better: CodingKey itself is still a
>> protocol, and the value itself is the 3-case enum. However, since
>> CodingKeyValue is not literal-representable, user keys cannot be enums
>> RawRepresentable by CodingKeyValue. That means that the values must either
>> be dynamically returned, or (for attaining the benefits that we want to
>> give users — easy representation, autocompletion, etc.) the type has to be
>> a struct with static lets on it giving the CodingKeyValues. This certainly
>> works, but is likely not what a developer would have in mind when working
>> with the API; the power of enums in Swift makes them very easy to reach
>> for, and I’m thinking most users would expect their keys to be enums. We’d
>> like to leverage that where we can, especially since RawRepresentable enums
>> are appropriate in the vast majority of use cases.
>> * Three separate CodingKey protocols (one for Strings, one for Ints, and
>> one for both). You could argue that this is the most correct version, since
>> it most clearly represents what we’re looking for. However, this means that
>> every method now accepting a CodingKey must be converted into 3 overloads
>> each accepting different types. This explodes the API surface, is confusing
>> for users, and also makes it impossible to use CodingKey as an existential
>> (unless it’s an empty 4th protocol which makes no static guarantees and the
>> others inherit from).
>> * [The current] approach. On the one hand, this allows for the accidental
>> representation of a key with neither a stringValue nor an intValue. On the
>> other, we want to make it really easy to use autogenerated keys, or
>> autogenerated key implementations if you provide the cases and values
>> yourself. The nil value possibility is only a concern when writing
>> stringValue and intValue yourself, which the vast majority of users should
>> not have to do.
>> * Additionally, a key word in that sentence bolded above is “generally”.
>> As part of making this API more generalized, we push a lot of decisions to
>> Encoders and Decoders. For many formats, it’s true that having a key with
>> no value is an error, but this is not necessarily true for all formats; for
>> a linear, non-keyed format, it is entirely reasonable to ignore the keys in
>> the first place, or replace them with fixed-format values. The decision of
>> how to handle this case is left up to Encoders and Decoders; for most
>> formats (and for our implementations), this is certainly an error, and we
>> would likely document this and either throw or preconditionFailure. But
>> this is not the case always.
>> * In terms of syntax, there’s another approach that would be really nice
>> (but is currently not feasible) — if enums were RawRepresentable in terms
>> of tuples, it would be possible to give implementations for String, Int,
>> (Int, String), (String, Int), etc., making this condition harder to
>> represent by default unless you really mean to.
>> Hope that gives some helpful background on this decision. FWIW, the only
>> way to end up with a key having no intValue or stringValue is manually
>> implementing the CodingKey protocol (which should be *exceedingly* rare)
>> and implementing the methods by not switching on self, or in some other
>> way that would allow you to forget to give a key either value.
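For completeness, a sketch of what that rare manual implementation looks like when done carefully, with exhaustive switches over self so no case can be left without a value (the key names and numbers are illustrative):

```swift
// A hand-written CodingKey (rarely necessary). Exhaustive switches over
// self make it impossible to forget a value for a newly added case.
enum ManualKeys: CodingKey {
    case name, age

    var stringValue: String {
        switch self {
        case .name: return "name"
        case .age: return "age"
        }
    }
    var intValue: Int? {
        switch self {
        case .name: return 0
        case .age: return 1
        }
    }
    init?(stringValue: String) {
        switch stringValue {
        case "name": self = .name
        case "age": self = .age
        default: return nil
        }
    }
    init?(intValue: Int) {
        switch intValue {
        case 0: self = .name
        case 1: self = .age
        default: return nil
        }
    }
}
```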
>> Speaking of the mutually exclusive representations: what about
>> serializations that don't encode keys as one of those two things? YAML can
>> have anything be a “key”, and despite that being not particularly sane, it
>> is a use case.
>> We’ve explored this, but at the end of the day, it’s not possible to
>> generalize this to the point where we could represent all possible options
>> on all possible formats because you cannot make any promises as to what’s
>> possible and what’s not statically.
>> We’d like to strike a balance here between strong static guarantees on
>> one end (the extreme end of which introduces a new API for every single
>> format, since you can almost perfectly statically express what’s possible
>> and what isn’t) and generalization on the other (the extreme end of which is
>> an empty protocol because there really are encoding formats which are
>> mutually exclusive). So in this case, this API would support producing and
>> consuming YAML with string or integer keys, but not arbitrary YAML.
>> For most types, String-convertible keys are a reasonable default; for
>> performance, however, Int-convertible keys are preferred, and Encoders may
>> choose to make use of Ints over Strings. Framework types should provide
>> keys which have both for flexibility and performance across different types
>> of Encoders. It is generally an error to provide a key which has neither a
>> stringValue nor an intValue.
>> Could you speak a little more to using Int-convertible keys for
>> performance? I get the feeling int-based keys parallel the legacy of
>> NSCoder’s older design, and I don’t really see anyone these days supporting
>> non-keyed archivers. They strike me as fragile. What other use cases are
>> envisioned for ordered archiving than that?
>> We agree that integer keys are fragile, and from years (decades) of
>> experience with NSArchiver, we are aware of the limitations that such
>> encoding offers. For this reason, we will never synthesize integer keys on
>> your behalf. This is something you must put thought into, if using an
>> integer key for archival.
>> However, there are use-cases (both in archival and in serialization, but
>> especially so in serialization) where integer keys are useful. Ordered
>> encoding is one such possibility (when the format supports it, integer keys
>> are sequential, etc.), and is helpful for, say, marshaling objects in an
>> XPC context (where both sides are aware of the format, are running the same
>> version of the same code, on the same device) — keys waste time and
>> bandwidth unnecessarily in some cases.
>> Integer keys don’t necessarily imply ordered encoding, however. There are
>> binary encoding formats which support integer-keyed dictionaries (read:
>> serialized hash maps) which are more efficient to encode and decode than
>> similar string-keyed ones. In that case, as long as integer keys are chosen
>> with care, the end result is more performant.
>> But again, this depends on the application and use case. Defining integer
>> keys requires manual effort because we want thought put into defining them;
>> they are indeed fragile when used carelessly.
>> [snip]
>> Keyed Encoding Containers
>> Keyed encoding containers are the primary interface that most Codable
>> types interact with for encoding and decoding. Through these, Codable types
>> have strongly-keyed access to encoded data by using keys that are
>> semantically correct for the operations they want to express.
>> Since semantically incompatible keys will rarely (if ever) share the same
>> key type, it is impossible to mix up key types within the same container
>> (as is possible with String keys), and since the type is known statically,
>> keys get autocompletion by the compiler.
>> open class KeyedEncodingContainer<Key : CodingKey> {
>> Like others, I’m a little bummed about this part of the design. Your
>> reasoning up-thread is sound, but I chafe a bit on having to reabstract and
>> a little more on having to be a reference type. Particularly knowing that
>> it’s got a bit more overhead involved… I /like/ that NSKeyedArchiver can
>> simply push some state and pass itself as the next encoding container down
>> the stack.
>> There’s not much more to be said about why this is a class that I
>> haven’t covered; if it were possible to do otherwise at the moment, then we
>> would.
>> It is possible using a manually written type-erased wrapper along the
>> lines of AnySequence and AnyCollection.  I don’t recall seeing a rationale
>> for why you don’t want to go this route.  I would still like to hear more
>> on this topic.
>> As for *why* we do this — this is the crux of the whole API. We not only
>> want to make it easy to use a custom key type that is semantically correct
>> for your type, we want to make it difficult to do the easy but incorrect
>> thing. From experience with NSKeyedArchiver, we’d like to move away from
>> unadorned string (and integer) keys, where typos and accidentally reused
>> keys are common, and impossible to catch statically.
>> encode<T : Codable>(_: T?, forKey: String) unfortunately not only
>> encourages code like encode(foo, forKey: "foi") // whoops, typo, but also
>> makes it *more difficult* to use a semantic key type: encode(foo, forKey:
>> CodingKeys.foo.stringValue). The additional typing and lack of
>> autocompletion make it an active disincentive. encode<T : Codable>(_:
>> T?, forKey: Key) reverses both of these: it makes it impossible to use
>> unadorned strings or accidentally use keys from another type, and nets
>> shorter code with autocompletion: encode(foo, forKey: .foo)
>> The side effect of this being the fact that keyed containers are classes
>> is suboptimal, I agree, but necessary.
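The pattern being argued for, sketched with the proposal's container API (the `Point` type and its keys are illustrative):

```swift
import Foundation

// Strongly-keyed encoding: the container is typed by the key enum, so
// unadorned strings and other types' keys are rejected at compile time.
struct Point: Codable {
    var x: Double
    var y: Double

    private enum CodingKeys: String, CodingKey {
        case x, y
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(x, forKey: .x)  // autocompletes; typos don't compile
        try container.encode(y, forKey: .y)
    }
}

// init(from:) is still synthesized, so the round trip works.
let pointData = try! JSONEncoder().encode(Point(x: 1.5, y: -2.0))
let p = try! JSONDecoder().decode(Point.self, from: pointData)
```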
>> open func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
>> Does this win anything over taking a Codable?
>> Taking the concrete type over an existential allows for static dispatch
>> on the type within the implementation, and is a performance win in some
>> cases.
>> open func encode(_ value: Bool?, forKey key: Key) throws
>> open func encode(_ value: Int?, forKey key: Key) throws
>> open func encode(_ value: Int8?, forKey key: Key) throws
>> open func encode(_ value: Int16?, forKey key: Key) throws
>> open func encode(_ value: Int32?, forKey key: Key) throws
>> open func encode(_ value: Int64?, forKey key: Key) throws
>> open func encode(_ value: UInt?, forKey key: Key) throws
>> open func encode(_ value: UInt8?, forKey key: Key) throws
>> open func encode(_ value: UInt16?, forKey key: Key) throws
>> open func encode(_ value: UInt32?, forKey key: Key) throws
>> open func encode(_ value: UInt64?, forKey key: Key) throws
>> open func encode(_ value: Float?, forKey key: Key) throws
>> open func encode(_ value: Double?, forKey key: Key) throws
>> open func encode(_ value: String?, forKey key: Key) throws
>> open func encode(_ value: Data?, forKey key: Key) throws
>> What is the motivation behind abandoning the idea of “primitives” from
>> the Alternatives Considered? Performance? Being unable to close the
>> protocol?
>> Being unable to close the protocol is the primary reason. Not being able
>> to tell at a glance what the concrete types belonging to this set are is
>> related, and also a top reason.
>> Looks like we have another strong motivating use case for closed
>> protocols.  I hope that will be in scope for Swift 5.
>> It would be great for the auto-generated documentation and “headers” to
>> provide a list of all public or open types inheriting from a closed class
>> or conforming to a closed protocol (when we get them).  This would go a
>> long way towards addressing your second reason.
>> What ways is encoding a value envisioned to fail? I understand wanting to
>> allow maximum flexibility, and being symmetric to `decode` throwing, but
>> there are plenty of “conversion” patterns that are asymmetric in the ways
>> they can fail (Date formatters, RawRepresentable,
>> LosslessStringConvertible, etc.).
>> Different formats support different concrete values, even of primitive
>> types. For instance, you cannot natively encode Double.nan in JSON, but
>> you can in plist. Without additional options on JSONEncoder, encode(Double.nan,
>> forKey: …) will throw.
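A quick demonstration of that failure mode. The opt-out shown here uses `nonConformingFloatEncodingStrategy` as it eventually shipped in Foundation; the property name is an assumption relative to this proposal draft:

```swift
import Foundation

struct Sample: Codable { var value: Double }

// JSON has no native representation for NaN, so this throws by default.
var didThrow = false
do {
    _ = try JSONEncoder().encode(Sample(value: .nan))
} catch {
    didThrow = true
}

// With a non-conforming-float strategy configured, encoding succeeds.
let encoder = JSONEncoder()
encoder.nonConformingFloatEncodingStrategy = .convertToString(
    positiveInfinity: "+inf", negativeInfinity: "-inf", nan: "nan")
let nanData = try! encoder.encode(Sample(value: .nan))
```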
>> /// For `Encoder`s that implement this functionality, this will only
>> encode the given object and associate it with the given key if it encoded
>> unconditionally elsewhere in the archive (either previously or in the
>> future).
>> open func encodeWeak<Object : AnyObject & Codable>(_ object: Object?,
>> forKey key: Key) throws
>> Is this correct that if I send a Cocoa-style object graph (with weak
>> backrefs), an encoder could infinitely recurse? Or is a coder supposed to
>> detect that?
>> encodeWeak has a default implementation that calls the regular encode<T
>> : Codable>(_: T, forKey: Key); only formats which actually support weak
>> backreferencing should override this implementation, so it should always be
>> safe to call (it will simply unconditionally encode the object by default).
>> open var codingKeyContext: [CodingKey]
>> }
>> [snippity snip]
>> Alright, those are just my first thoughts. I want to spend a little time
>> marinating in the code from PR #8124 before I comment further. Cheers! I
>> owe you, Michael, and Tony a few drinks for sure.
>> Hehe, thanks :)
>> Zach Waldowski
>> zach at waldowski.me
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution