[swift-evolution] [Proposal] Foundation Swift Archival & Serialization

Itai Ferber iferber at apple.com
Fri Mar 17 14:47:09 CDT 2017


Do you mean versions of the format, or versions of your type?

If the latter, this can be done on a case-by-case basis, as needed. You 
can always do something like

```swift
struct Foo : Codable {
     // Name this as appropriate
     private let jsonVersion = 1.1
}
```

and have it encode as well.
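
As a rough sketch of how that could look end to end: this assumes the JSONEncoder and JSONDecoder types as they later shipped in Foundation, which may differ from the draft under discussion. The `schemaVersion` and `name` properties and the `roundTrip` helper are hypothetical names for illustration. The version is declared as a `var` here so that the synthesized decoder reads it back.

```swift
import Foundation

// Hypothetical type; property names are illustrative only.
struct Foo: Codable {
    var schemaVersion = 1   // travels with the rest of the encoded properties
    var name: String
}

// Encodes a value to JSON and decodes it again (hypothetical helper).
func roundTrip(_ value: Foo) throws -> Foo {
    let data = try JSONEncoder().encode(value)
    return try JSONDecoder().decode(Foo.self, from: data)
}

do {
    let decoded = try roundTrip(Foo(schemaVersion: 2, name: "bar"))
    print("decoded version:", decoded.schemaVersion)
} catch {
    print("coding failed:", error)
}
```

A hand-written init(from:) could then branch on the decoded version to accept older representations.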

On 17 Mar 2017, at 11:51, T.J. Usiyan wrote:

> Is there any sense of encoding versions (as in, changes to the JSON
> representation, for instance?) I don't know that it is necessarily a
> good idea overall but now is the time to consider it.
>
> On Fri, Mar 17, 2017 at 2:27 PM, Matthew Johnson via swift-evolution <
> swift-evolution at swift.org> wrote:
>
>>
>> On Mar 17, 2017, at 1:15 PM, Itai Ferber via swift-evolution <
>> swift-evolution at swift.org> wrote:
>>
>> On 15 Mar 2017, at 22:58, Zach Waldowski wrote:
>>
>> Another issue of scale - I had to switch to a native mail client as
>> replying inline severely broke my webmail client. ;-)
>>
>> Again, lots of love here. Responses inline.
>>
>> On Mar 15, 2017, at 6:40 PM, Itai Ferber via swift-evolution <
>> swift-evolution at swift.org> wrote:
>> Proposed solution
>> We will be introducing the following new types:
>>
>> protocol Codable: Adopted by types to opt into archival. Conformance may
>> be automatically derived in cases where all properties are also Codable.
>>
>> FWIW I think this is an acceptable compromise. If the happy path is
>> derived conformances, only-decodable or only-encodable types feel like a
>> lazy way out on the part of a user of the API, and build a barrier to
>> proper testing.
>>
>> [snip]
>>
>> Structured types (i.e. types which encode as a collection of 
>> properties)
>> encode and decode their properties in a keyed manner. Keys may be
>> String-convertible or Int-convertible (or both), and user types which 
>> have
>> properties should declare semantic key enums which map keys to their
>> properties. Keys must conform to the CodingKey protocol:
>> public protocol CodingKey { <##snip##> }
>>
>> A few things here:
>>
>> The protocol leaves open the possibility of having both a String and an
>> Int representation, or neither. What should a coder do in either case?
>> Are the representations intended to be mutually exclusive, or not? The
>> protocol design doesn’t seem to match the flavor of Swift particularly
>> well; I’d expect something along the lines of a CodingKey enum and the
>> protocol CodingKeyRepresentable. It’s also possible that the concerns of
>> the two are orthogonal enough that they deserve separate
>> container(keyedBy:) requirements.
>>
>> The general answer to "what should a coder do" is "what is 
>> appropriate for
>> its format". For a format that uses exclusively string keys (like 
>> JSON),
>> the string representation (if present on a key) will always be used. 
>> If the
>> key has no string representation but does have an integer 
>> representation,
>> the encoder may choose to stringify the integer. If the key has 
>> neither, it
>> is appropriate for the Encoder to fail in some way.
>>
>> On the flip side, for totally flat formats, an Encoder may choose to
>> ignore keys altogether, in which case it doesn’t really matter. The 
>> choice
>> is up to the Encoder and its format.
>>
>> The string and integer representations are not meant to be mutually
>> exclusive at all, and in fact, where relevant, we encourage providing 
>> both
>> types of representations for flexibility.
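
To illustrate the fallback order described above, here is a minimal sketch. `DraftCodingKey`, `ExampleKey`, `MissingKeyRepresentation`, and `stringKey(for:)` are hypothetical names standing in for the draft protocol with two optional representations. This is one possible policy for a string-keyed format, not the behavior of any particular Encoder.

```swift
// Hypothetical stand-in for the draft protocol: both representations optional.
protocol DraftCodingKey {
    var stringValue: String? { get }
    var intValue: Int? { get }
}

// Hypothetical error for a key that has neither representation.
struct MissingKeyRepresentation: Error {}

// One way a string-keyed (JSON-like) format might resolve a key:
// prefer the string, fall back to a stringified integer, otherwise fail.
func stringKey(for key: DraftCodingKey) throws -> String {
    if let string = key.stringValue { return string }
    if let int = key.intValue { return String(int) }
    throw MissingKeyRepresentation()
}

// Simple concrete key used only to exercise the function.
struct ExampleKey: DraftCodingKey {
    var stringValue: String?
    var intValue: Int?
}

let both = ExampleKey(stringValue: "name", intValue: 1)
let intOnly = ExampleKey(stringValue: nil, intValue: 7)
let neither = ExampleKey(stringValue: nil, intValue: nil)
```

A flat, non-keyed format could skip this resolution entirely and ignore the key.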
>>
>> As for the possibility of having neither representation, this 
>> question
>> comes up often. I’d like to summarize the thought process here by 
>> quoting
>> some earlier review (apologies for the poor formatting from my mail 
>> client):
>>
>> If there are two options, each of which is itself optional, we have 4
>> possible combinations. But! At the same time we prohibit one combination
>> by what? Runtime error? Why not use a 3-case enum for it? Even further
>> down the rabbit hole there might be a CodingKey<> specialized for a
>> concrete combination, like CodingKey<StringAndIntKey> or just
>> CodingKey<StringKey>, but I’m not sure whether our type system will make
>> it useful or possible…
>>
>> public enum CodingKeyValue {
>>     case integer(value: Int)
>>     case string(value: String)
>>     case both(intValue: Int, stringValue: String)
>> }
>> public protocol CodingKey {
>>     init?(value: CodingKeyValue)
>>     var value: CodingKeyValue { get }
>> }
>>
>> I agree that this certainly feels suboptimal. We explored other
>> possibilities before settling on this one, so let me try to summarize
>> here:
>>
>> * Having a concrete 3-case CodingKey enum would preclude the 
>> possibility
>> of having neither a stringValue nor an intValue. However, there is a 
>> lot of
>> value in having the key types belong to the type being encoded (more
>> safety, impossible to accidentally mix key types, private keys, 
>> etc.); if
>> the CodingKey type itself is an enum (which cannot be inherited 
>> from), then
>> this prevents differing key types.
>> * Your solution as presented is better: CodingKey itself is still a
>> protocol, and the value itself is the 3-case enum. However, since
>> CodingKeyValue is not literal-representable, user keys cannot be 
>> enums
>> RawRepresentable by CodingKeyValue. That means that the values must 
>> either
>> be dynamically returned, or (for attaining the benefits that we want 
>> to
>> give users — easy representation, autocompletion, etc.) the type 
>> has to be
>> a struct with static lets on it giving the CodingKeyValues. This 
>> certainly
>> works, but is likely not what a developer would have in mind when 
>> working
>> with the API; the power of enums in Swift makes them very easy to 
>> reach
>> for, and I’m thinking most users would expect their keys to be 
>> enums. We’d
>> like to leverage that where we can, especially since RawRepresentable 
>> enums
>> are appropriate in the vast majority of use cases.
>> * Three separate CodingKey protocols (one for Strings, one for Ints, 
>> and
>> one for both). You could argue that this is the most correct version, 
>> since
>> it most clearly represents what we’re looking for. However, this 
>> means that
>> every method now accepting a CodingKey must be converted into 3 
>> overloads
>> each accepting different types. This explodes the API surface, is 
>> confusing
>> for users, and also makes it impossible to use CodingKey as an 
>> existential
>> (unless it’s an empty 4th protocol which makes no static guarantees 
>> and the
>> others inherit from).
>> * [The current] approach. On the one hand, this allows for the 
>> accidental
>> representation of a key with neither a stringValue nor an intValue. 
>> On the
>> other, we want to make it really easy to use autogenerated keys, or
>> autogenerated key implementations if you provide the cases and values
>> yourself. The nil value possibility is only a concern when writing
>> stringValue and intValue yourself, which the vast majority of users 
>> should
>> not have to do.
>> * Additionally, a key word in that sentence bolded above is “generally”.
>> As part of making this API more generalized, we push a lot of decisions
>> to Encoders and Decoders. For many formats, it’s true that having a key
>> with no value is an error, but this is not necessarily true for all
>> formats; for a linear, non-keyed format, it is entirely reasonable to
>> ignore the keys in the first place, or replace them with fixed-format
>> values. The decision of how to handle this case is left up to Encoders
>> and Decoders; for most formats (and for our implementations), this is
>> certainly an error, and we would likely document this and either throw
>> or preconditionFailure. But this is not always the case.
>> * In terms of syntax, there’s another approach that would be really 
>> nice
>> (but is currently not feasible) — if enums were RawRepresentable in 
>> terms
>> of tuples, it would be possible to give implementations for String, 
>> Int,
>> (Int, String), (String, Int), etc., making this condition harder to
>> represent by default unless you really mean to.
>>
>> Hope that gives some helpful background on this decision. FWIW, the only
>> way to end up with a key having no intValue or stringValue is manually
>> implementing the CodingKey protocol (which should be *exceedingly* rare)
>> and implementing the methods by not switching on self, or some other
>> method that would allow you to forget to give a key either value.
>>
>> Speaking of the mutually exclusive representations - what about
>> serializations that don’t code as one of those two things? YAML can have
>> anything be a “key”, and despite that being not particularly sane, it is
>> a use case.
>>
>> We’ve explored this, but at the end of the day, it’s not possible 
>> to
>> generalize this to the point where we could represent all possible 
>> options
>> on all possible formats because you cannot make any promises as to 
>> what’s
>> possible and what’s not statically.
>>
>> We’d like to strike a balance here between strong static guarantees on
>> one end (the extreme end of which introduces a new API for every single
>> format, since you can almost perfectly statically express what’s
>> possible and what isn’t) and generalization on the other (the extreme
>> end of which is an empty protocol because there really are encoding
>> formats which are mutually exclusive). So in this case, this API would
>> support producing and consuming YAML with string or integer keys, but
>> not arbitrary YAML.
>>
>> For most types, String-convertible keys are a reasonable default; for
>> performance, however, Int-convertible keys are preferred, and 
>> Encoders may
>> choose to make use of Ints over Strings. Framework types should 
>> provide
>> keys which have both for flexibility and performance across different 
>> types
>> of Encoders. It is generally an error to provide a key which has 
>> neither a
>> stringValue nor an intValue.
>>
>> Could you speak a little more to using Int-convertible keys for
>> performance? I get the feeling int-based keys parallel the legacy of
>> NSCoder’s older design, and I don’t really see anyone these days 
>> supporting
>> non-keyed archivers. They strike me as fragile. What other use cases 
>> are
>> envisioned for ordered archiving than that?
>>
>> We agree that integer keys are fragile, and from years (decades) of
>> experience with NSArchiver, we are aware of the limitations that such
>> encoding offers. For this reason, we will never synthesize integer 
>> keys on
>> your behalf. This is something you must put thought into, if using an
>> integer key for archival.
>>
>> However, there are use-cases (both in archival and in serialization, 
>> but
>> especially so in serialization) where integer keys are useful. 
>> Ordered
>> encoding is one such possibility (when the format supports it, 
>> integer keys
>> are sequential, etc.), and is helpful for, say, marshaling objects in 
>> an
>> XPC context (where both sides are aware of the format, are running 
>> the same
>> version of the same code, on the same device) — keys waste time and
>> bandwidth unnecessarily in some cases.
>>
>> Integer keys don’t necessarily imply ordered encoding, however. 
>> There are
>> binary encoding formats which support integer-keyed dictionaries 
>> (read:
>> serialized hash maps) which are more efficient to encode and decode 
>> than
>> similar string-keyed ones. In that case, as long as integer keys are 
>> chosen
>> with care, the end result is more performant.
>>
>> But again, this depends on the application and use case. Defining 
>> integer
>> keys requires manual effort because we want thought put into defining 
>> them;
>> they are indeed fragile when used carelessly.
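
As a small illustration of hand-assigned integer keys: the sketch below assumes the CodingKey conformance synthesis as it later shipped in Swift, where an Int-backed enum uses its raw value as intValue and its case name as stringValue. `MessageKeys` and its cases are hypothetical.

```swift
// Hypothetical key type. The integer values are chosen by hand and must stay
// stable across versions, since a format may write them instead of the names.
enum MessageKeys: Int, CodingKey {
    case id = 1
    case payload = 2
}

// Both representations are available, so an encoder can pick whichever
// suits its format.
let key = MessageKeys.payload
print(key.stringValue, key.intValue ?? -1)
```

Reordering or renumbering the cases later would silently change the encoded form for integer-keyed formats, which is the fragility mentioned above.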
>>
>> [snip]
>>
>> Keyed Encoding Containers
>>
>> Keyed encoding containers are the primary interface that most Codable
>> types interact with for encoding and decoding. Through these, Codable 
>> types
>> have strongly-keyed access to encoded data by using keys that are
>> semantically correct for the operations they want to express.
>>
>> Since semantically incompatible keys will rarely (if ever) share the same
>> key type, it is impossible to mix up key types within the same container
>> (as is possible with String keys), and since the type is known
>> statically, keys get autocompletion by the compiler.
>>
>> open class KeyedEncodingContainer<Key : CodingKey> {
>>
>> Like others, I’m a little bummed about this part of the design. 
>> Your
>> reasoning up-thread is sound, but I chafe a bit on having to 
>> reabstract and
>> a little more on having to be a reference type. Particularly knowing 
>> that
>> it’s got a bit more overhead involved… I /like/ that 
>> NSKeyedArchiver can
>> simply push some state and pass itself as the next encoding container 
>> down
>> the stack.
>>
>> There’s not much more to be said about why this is a class that I 
>> haven’t
>> covered; if it were possible to do otherwise at the moment, then we 
>> would.
>>
>> It is possible using a manually written type-erased wrapper along the
>> lines of AnySequence and AnyCollection.  I don’t recall seeing a 
>> rationale
>> for why you don’t want to go this route.  I would still like to 
>> hear more
>> on this topic.
>>
>> As for *why* we do this — this is the crux of the whole API. We not 
>> only
>> want to make it easy to use a custom key type that is semantically 
>> correct
>> for your type, we want to make it difficult to do the easy but 
>> incorrect
>> thing. From experience with NSKeyedArchiver, we’d like to move away 
>> from
>> unadorned string (and integer) keys, where typos and accidentally 
>> reused
>> keys are common, and impossible to catch statically.
>> encode<T : Codable>(_: T?, forKey: String) unfortunately not only
>> encourages code like encode(foo, forKey: "foi") // whoops, typo
>> but also makes it *more difficult* to use a semantic key type:
>> encode(foo, forKey: CodingKeys.foo.stringValue). The additional typing
>> and lack of autocompletion make it an active disincentive.
>> encode<T : Codable>(_: T?, forKey: Key) reverses both of these: it makes
>> it impossible to use unadorned strings or accidentally use keys from
>> another type, and nets shorter code with autocompletion:
>> encode(foo, forKey: .foo)
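
For a concrete picture of the typed-key call site, here is a minimal sketch written against the Codable API as it later shipped, whose details may differ from the draft quoted here. `Point` and its properties are hypothetical.

```swift
import Foundation

// Hypothetical type for illustration.
struct Point: Encodable {
    var x: Int
    var y: Int

    // Keys belong to the type, so they cannot be confused with another type's keys.
    private enum CodingKeys: String, CodingKey {
        case x, y
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(x, forKey: .x)   // checked at compile time
        try container.encode(y, forKey: .y)
        // try container.encode(x, forKey: "x")  // would not compile: a String is not a CodingKeys
    }
}

do {
    let data = try JSONEncoder().encode(Point(x: 1, y: 2))
    print(String(data: data, encoding: .utf8) ?? "")
} catch {
    print("encoding failed:", error)
}
```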
>>
>> The side effect of this, that keyed containers are classes, is
>> suboptimal, I agree, but necessary.
>>
>>
>> open func encode<Value : Codable>(_ value: Value?, forKey key: Key) throws
>>
>> Does this win anything over taking a Codable?
>>
>> Taking the concrete type over an existential allows for static 
>> dispatch on
>> the type within the implementation, and is a performance win in some 
>> cases.
>>
>> open func encode(_ value: Bool?, forKey key: Key) throws
>> open func encode(_ value: Int?, forKey key: Key) throws
>> open func encode(_ value: Int8?, forKey key: Key) throws
>> open func encode(_ value: Int16?, forKey key: Key) throws
>> open func encode(_ value: Int32?, forKey key: Key) throws
>> open func encode(_ value: Int64?, forKey key: Key) throws
>> open func encode(_ value: UInt?, forKey key: Key) throws
>> open func encode(_ value: UInt8?, forKey key: Key) throws
>> open func encode(_ value: UInt16?, forKey key: Key) throws
>> open func encode(_ value: UInt32?, forKey key: Key) throws
>> open func encode(_ value: UInt64?, forKey key: Key) throws
>> open func encode(_ value: Float?, forKey key: Key) throws
>> open func encode(_ value: Double?, forKey key: Key) throws
>> open func encode(_ value: String?, forKey key: Key) throws
>> open func encode(_ value: Data?, forKey key: Key) throws
>>
>> What is the motivation behind abandoning the idea of “primitives” 
>> from the
>> Alternatives Considered? Performance? Being unable to close the 
>> protocol?
>>
>> Being unable to close the protocol is the primary reason. Not being 
>> able
>> to tell at a glance what the concrete types belonging to this set are 
>> is
>> related, and also a top reason.
>>
>> Looks like we have another strong motivating use case for closed
>> protocols.  I hope that will be in scope for Swift 5.
>>
>> It would be great for the auto-generated documentation and “headers” to
>> provide a list of all public or open types inheriting from a closed
>> class or conforming to a closed protocol (when we get them).  This would
>> go a long way towards addressing your second reason.
>>
>>
>> In what ways is encoding a value envisioned to fail? I understand wanting
>> to allow maximum flexibility, and being symmetric to `decode` throwing,
>> but there are plenty of “conversion” patterns that are asymmetric in the
>> ways they can fail (Date formatters, RawRepresentable,
>> LosslessStringConvertible, etc.).
>>
>> Different formats support different concrete values, even of primitive
>> types. For instance, you cannot natively encode Double.nan in JSON, but
>> you can in plist. Without additional options on JSONEncoder,
>> encode(Double.nan, forKey: …) will throw.
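
A short sketch of that failure mode, assuming JSONEncoder and its nonConformingFloatEncodingStrategy option as they later shipped in Foundation (the draft's options may have been spelled differently). The `encodeJSON` helper and the string spellings chosen for the non-finite values are hypothetical.

```swift
import Foundation

// Returns the encoded JSON text, or nil if encoding throws (hypothetical helper).
func encodeJSON(_ values: [Double], using encoder: JSONEncoder) -> String? {
    guard let data = try? encoder.encode(values) else { return nil }
    return String(data: data, encoding: .utf8)
}

// Default configuration: non-finite values cannot be represented, so encoding throws.
let strict = JSONEncoder()

// Opt-in configuration: non-finite values are written as agreed-upon strings.
let lenient = JSONEncoder()
lenient.nonConformingFloatEncodingStrategy =
    .convertToString(positiveInfinity: "inf", negativeInfinity: "-inf", nan: "nan")

print(encodeJSON([.nan], using: strict) ?? "encoding threw")
print(encodeJSON([.nan], using: lenient) ?? "encoding threw")
```

This is why encode has to be able to throw even for primitive types: whether a given value is representable depends on the format and on how the encoder is configured.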
>>
>> /// For `Encoder`s that implement this functionality, this will only
>> /// encode the given object and associate it with the given key if it is
>> /// encoded unconditionally elsewhere in the archive (either previously
>> /// or in the future).
>> open func encodeWeak<Object : AnyObject & Codable>(_ object: Object?, forKey key: Key) throws
>>
>> Is it correct that if I send a Cocoa-style object graph (with weak
>> backrefs), an encoder could infinitely recurse? Or is a coder supposed to
>> detect that?
>>
>> encodeWeak has a default implementation that calls the regular
>> encode<T : Codable>(_: T, forKey: Key); only formats which actually
>> support weak backreferencing should override this implementation, so it
>> should always be safe to call (it will simply unconditionally encode the
>> object by default).
>>
>> open var codingKeyContext: [CodingKey]
>> }
>> [snippity snip]
>>
>> Alright, those are just my first thoughts. I want to spend a little 
>> time
>> marinating in the code from PR #8124 before I comment further. 
>> Cheers! I
>> owe you, Michael, and Tony a few drinks for sure.
>>
>> Hehe, thanks :)
>>
>> Zach Waldowski
>> zach at waldowski.me
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution

