[swift-evolution] [Proposal] Foundation Swift Encoders
Brent Royal-Gordon
brent at architechies.com
Tue Apr 4 20:18:13 CDT 2017
> On Apr 4, 2017, at 2:43 PM, Itai Ferber <iferber at apple.com> wrote:
> I like the separation between keyed and unkeyed containers (and I think "unkeyed" is a good name, though not perfect), but I'm not quite happy with the unkeyed container API. Encoding a value into an unkeyed container appends it to the container's end; decoding a value from an unkeyed container removes it from the container's front. These are very important semantics that the method names in question do not imply at all.
>
> I think that consistency of phrasing is really important here, and the action words "encode" and "decode" are even more important to connote than the semantics of pushing and popping.
> (Note that there need not be specific directionality to an unkeyed container as long as the ordering of encoded items is eventually maintained on decode.) But on a practical note, names like encodeAtEnd and decodeFromFront (or similar) don't feel like they communicate anything much more useful than the current encode/decode.
>
Yeah; I stopped short of suggesting specific names because I wasn't totally happy with the ones I could think of. (`appendEncoded` and `decodeNext` were my best ideas, but they don't make a matched pair.)
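To be concrete, the signatures I was toying with would have read roughly like this (the protocol and method names here are purely my own sketches, not part of the proposal):

    // Hypothetical spellings only — not the proposal's API.
    protocol AppendingUnkeyedEncodingContainer {
        // "append" carries the add-to-end semantics, but drops the "encode" action word.
        mutating func appendEncoded<T: Encodable>(_ value: T) throws
    }

    protocol ConsumingUnkeyedDecodingContainer {
        // "decodeNext" keeps the action word, but doesn't read as the mirror of appendEncoded.
        mutating func decodeNext<T: Decodable>(_ type: T.Type) throws -> T
    }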
> Certain aspects of `UnkeyedDecodingContainer` also feel like they do the same things as `Sequence` and `IteratorProtocol`, but in different and incompatible ways. And I certainly think that the `encode(contentsOf:)` methods on `UnkeyedEncodingContainer` could use equivalents on the `UnkeyedDecodingContainer`. Still, the design in this area is much improved compared to the previous iteration.
>
> Which aspects of Sequence and IteratorProtocol do you feel like you're missing on UnkeyedDecodingContainer? Keep in mind that methods on UnkeyedDecodingContainer must be able to throw, and an UnkeyedDecodingContainer can hold heterogeneous items whose type is not known, two things that Sequence and IteratorProtocol do not do.
>
Yeah, that's true. One possibility is to use a closure-scoped block which gets passed a sequence that terminates early if it encounters an error:
    self.pets = try decoder.withDecoded(Pet.self) { seq in
        return Array(seq)
        // If there is an error, `seq` terminates early, and once we return control from
        // this closure, `withDecoded` will throw it.
    }
But that's awkward in a couple of different ways.
> In terms of an equivalent to encode(contentsOf:), keep in mind that this would only work if the collection you're decoding is homogeneous, in which case, you would likely prefer to decode an Array over getting an unkeyed container, no? (As soon as conditional conformance arrives in Swift, we will be able to express extension Array : Decodable where Element : Decodable { ... } making decoding homogeneous arrays trivial.)
>
That's true (and I assumed that `Array`s and friends would be `Codable`—we don't even need to wait for conditional conformances), but it's hardly unheard of to write your own `Collection` types when you need different semantics.
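For reference, the conditional conformance Itai mentions would let the standard library spell the decoding half roughly like this once it becomes expressible (a sketch only, written against the proposed unkeyed-container API):

    // Sketch: not valid Swift until conditional conformances land.
    extension Array: Decodable where Element: Decodable {
        public init(from decoder: Decoder) throws {
            var container = try decoder.unkeyedContainer()
            var elements: [Element] = []
            if let count = container.count {
                elements.reserveCapacity(count)
            }
            while !container.isAtEnd {
                elements.append(try container.decode(Element.self))
            }
            self = elements
        }
    }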
Swift has two mutation protocols that are important for this purpose: `RangeReplaceableCollection` and `SetAlgebra`. You could provide methods on `UnkeyedDecodingContainer` that work with them:
    func decodeAll<C: RangeReplaceableCollection>(_ type: C.Type) throws -> C where C.Iterator.Element: Decodable {
        var collection = C()
        if let capacity = self.count {
            collection.reserveCapacity(capacity)
        }
        while !self.isAtEnd {
            collection.append(try self.decode(C.Iterator.Element.self))
        }
        return collection
    }

    func decodeAll<S: SetAlgebra>(_ type: S.Type) throws -> S where S.Element: Decodable {
        var set = S()
        while !self.isAtEnd {
            set.insert(try self.decode(S.Element.self))
        }
        return set
    }

    // Usage:
    let array = try container.decodeAll(ContiguousArray<Pet>.self)
    let set = try container.decodeAll(Set<Pet>.self)
Alternatively, you could take a more informal approach, and simply take a function which takes a sequence and returns a value constructed from it. Most collection types have an initializer like that. (Interestingly, this is basically the same as `withDecoded` above, just with a different signature.)
// I'm showing this as public, but you could hide it behind an AnyIterator instead.
    public final class DecodingIterator<Element: Decodable>: IteratorProtocol, Sequence {
        var container: UnkeyedDecodingContainer
        var decodingError: Error?

        init(_ container: UnkeyedDecodingContainer) {
            self.container = container
        }

        public func next() -> Element? {
            if decodingError != nil || container.isAtEnd {
                return nil
            }
            do {
                return try container.decode(Element.self)
            }
            catch {
                decodingError = error
                return nil
            }
        }
    }

    public func decodeAll<T: Decodable, Result>(_ makeResult: (DecodingIterator<T>) throws -> Result) throws -> Result {
        let iterator = DecodingIterator<T>(self)
        let result = try makeResult(iterator)
        if let error = iterator.decodingError {
            throw error
        }
        return result
    }

    // Usage:
    let array = try container.decodeAll(ContiguousArray<Pet>.init)
    let set = try container.decodeAll(Set<Pet>.init)
    // I'm imagining here a future version of Swift where it's possible to
    // make tuples of `Codable` types `Codable` themselves.
    let dictionary = try container.decodeAll(Dictionary<Pet, Person>.init)
> (Tiny nitpick: I keep finding myself saying "encode into", not "encode to" as the API name suggests. Would that be a better parameter label?)
>
> On a personal note here — I agree with you, and had originally used "into". However, we've reviewed our APIs and more often have balanced from:/to: rather than from:/into: on read/write/streaming calls. We'd like to rein these in a bit and keep them consistent within our naming guidelines, as much as possible.
>
That's fair. Oh well.
> Should there be a way for an `init(from:)` implementation to determine the type of container in the encoder it's just been handed? Or perhaps the better question is, do we want to promise users that all decoders can tell the difference?
>
> I think it would be very rare to need this type of information. If a type wants to encode as an array or as a dictionary conditionally, the context for that would likely be present in userInfo.
>
Yeah, that makes sense. To make that work, though, you might end up having a top-level `Decodable` type read a `version` field or something and write it into the `userInfo` for other types to examine. Nothing necessarily wrong with that, I suppose.
(Hmm...coming back after looking at your `JSONDecoder`, it looks like it might not be possible to do this, or perhaps that it could only be done by storing a reference type into the `userInfo` before you started writing and modifying it during decoding. Is that intentional?)
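Concretely, the reference-type workaround I'm describing would look something like this (`VersionBox`, the userInfo key, and the field names are hypothetical, just to illustrate the pattern):

    import Foundation

    // Hypothetical: a mutable box placed in userInfo before decoding starts.
    final class VersionBox {
        var version: Int = 1
    }

    let versionKey = CodingUserInfoKey(rawValue: "com.example.version")!

    struct Payload: Decodable {
        let value: String
        enum CodingKeys: String, CodingKey { case value, newValue }
        init(from decoder: Decoder) throws {
            let box = decoder.userInfo[versionKey] as? VersionBox
            let container = try decoder.container(keyedBy: CodingKeys.self)
            // Branch on whatever version the enclosing type recorded.
            self.value = try container.decode(String.self, forKey: box?.version == 2 ? .newValue : .value)
        }
    }

    struct Document: Decodable {
        let payload: Payload
        enum CodingKeys: String, CodingKey { case version, payload }
        init(from decoder: Decoder) throws {
            let container = try decoder.container(keyedBy: CodingKeys.self)
            let version = try container.decode(Int.self, forKey: .version)
            // Mutate the shared box so types decoded afterward can see the version.
            (decoder.userInfo[versionKey] as? VersionBox)?.version = version
            self.payload = try container.decode(Payload.self, forKey: .payload)
        }
    }

    let decoder = JSONDecoder()
    decoder.userInfo[versionKey] = VersionBox()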
> If you really must try to decode regardless, you can always try to grab one container type from the decoder, and if it fails, attempt to grab the other container type.
>
But I assume that some decoders might just misinterpret whatever is present? For instance, I could imagine an encoder which doesn't natively support keyed containers, so it fakes it by alternating key and value fields in a flat list; if you wrote with a keyed container and read with an unkeyed container, or vice versa, it would then happily misinterpret whatever was present. So it seems like this would probably be a pattern we'd want to discourage.
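As a toy illustration of that failure mode (the flat format here is entirely made up):

    // Hypothetical flat format: the keyed container wrote alternating keys and values.
    let flatPayload = ["name", "Rex", "age", "3"]

    // If the type is later decoded through an *unkeyed* container, the decoder can hand
    // back the same flat list and "succeed" while silently treating keys as data:
    let misread = flatPayload[0]   // "name" — a key, read as if it were an element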
> * I think it may make sense to class-constrain some of these protocols. `Encodable` and its containers seem to inherently have reference semantics—otherwise data could never be communicated from all those `encode` calls out to the ultimate caller of the API. Class-constraining would clearly communicate this to both the implementer and the compiler. `Decoder` and its containers don't *inherently* have reference semantics, but I'm not sure it's a good idea to potentially copy around a lot of state in a value type.
>
> I don't think class constraints are necessary. You can take a look at the current implementation of JSONEncoder and JSONDecoder here <https://github.com/itaiferber/swift/blob/3c59bfa749adad2575975e47130b28b731f763e0/stdlib/public/SDK/Foundation/JSONEncoder.swift> (note that this is still a rough implementation and will be updated soon). The model I've followed there is that the encoder itself (_JSONEncoder) has reference semantics, but the containers (_JSONKeyedEncodingContainer, _JSONUnkeyedEncodingContainer) are value-type views into the encoder itself.
>
> Keep in mind that during the encoding process, the entities created most often will be containers. Without some additional optimizations in place, you end up with a lot of small, short-lived class allocations as containers are brought into and out of scope.
> By not requiring the class constraints, it's at least possible to make all those containers value types with references to the shared encoder.
>
Okay, I see what you're getting at. But that still makes me think that:
* `Encoder` and `Decoder` should be class-constrained. An `Encoder` *must* share some kind of reference with its containers, and while it would certainly be possible to hide that reference away somewhere inside a value-typed `Encoder`, I can't imagine it's worth penalties like having to pass a larger existential container (`class`-constrained protocol existentials are 2+ words instead of 5+) and having to virtualize reference-counting operations through a value witness table. (See https://github.com/apple/swift/blob/master/docs/ABI.rst#class-existential-containers for details here.)
* There is no need for the `encode` calls to be marked `mutating`. An encoding container is just a view on state within a reference type, so it doesn't mutate the container itself, only the portion of the encoder the container refers to. (It is not possible for an encoding container to hold value-typed state, either, because the encoder never gets a chance to grab that state once you're finished with it.) While these calls do mutate *some* state *somewhere*, they don't mutate the state of the container.
Basically, what I'm saying is that `Encoder` should always be a reference *type* and encoding containers, even if implemented with a value *type*, should always have reference *semantics*.
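A minimal sketch of the shape I mean, with illustrative names (not the proposal's types):

    // The concrete encoder is a reference type that owns the storage.
    final class ToyEncoder {
        var storage: [String: String] = [:]
    }

    // The container is a value type, but it only holds a reference to the encoder,
    // so it has reference *semantics* without itself being a class.
    struct ToyKeyedContainer<Key: CodingKey> {
        let encoder: ToyEncoder

        // Not `mutating`: the struct's own stored properties never change;
        // only the shared encoder's storage does.
        func encode(_ value: String, forKey key: Key) {
            encoder.storage[key.stringValue] = value
        }
    }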
(The `decode` calls on an `UnkeyedDecodingContainer` are a harder case, because it really *is* plausible that the "next element" pointer might be stored in the container rather than the decoder. I'm not totally sure what to do there.)
(Oh, one other thing: Does `UnkeyedDecodingContainer.count` return the total number of elements, if known, or the number of elements *remaining*, if known? The latter sounds more useful to me.)
> Having implemented these myself multiple times, I agree — it can be a pain to repeat these implementations, and if you look at the linked implementations above, funneling to one method from all of those is exactly what I do (and in fact, this can be shortened significantly, which I plan on doing soon).
>
> There is a tradeoff here between ease of use for the end consumer of the API, and ease of coding for the writer of a new Encoder/Decoder, and my argument will always be for the benefit of the end consumer. (There will be orders of magnitude more end consumers of this API than those writing new Encoders and Decoders 😉)
>
This is true—we must put the API's consumer first.
> Think of the experience for the consumer of this API, especially someone learning it for the first time. It can already be somewhat of a hurdle to figure out what kind of container you need, but even once you get a keyed container (which is what we want to encourage), then what? You start typing container.enc... and in the autocomplete list in Xcode, the only thing that shows up is one autocomplete result: encode(value: Encodable, forKey: ...) Consider the clarity (or lack thereof) of that, as opposed to seeing encode(value: Int, forKey: ...), encode(value: String, forKey: ...), etc. Giving users a list of types they're already familiar with helps immensely with pushing them in the right direction and reducing cognitive load. When you see String in that list, you don't have to question whether it's possible to encode a string or not, you just pick it. I have an Int8, can I encode it? Ah, it's in the list, so I can.
>
This is a very interesting argument that I hadn't considered. But I think it still fails, because the list of primitive types doesn't convey what you want developers to understand about the types they can encode.
Imagine being new to the API. You start writing your first `encode(to:)` method, and you get to this point:
    func encode(to encoder: Encoder) throws {
        encoder.
You expect to see some way to start putting types in the encoder, but instead you get this list:
[CodingKey?] codingPath
KeyedEncodingContainer<Key> container(keyedBy: CodingKey.Protocol)
SingleValueEncodingContainer singleValueContainer()
[CodingUserInfoKey : Any] userInfo
UnkeyedEncodingContainer unkeyedContainer()
Okay, two of these are obviously just properties, and the other three ask you to choose a "container". Maybe you read some documentation or tutorials, maybe you just guess, but either way, you figure out that you want a keyed container and that you need a CodingKey enum to key it with.
So you make a keyed container, assign it to a variable, and get to this point:
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        container.
When you ask autocomplete what you can do with your shiny new container, you get this:
[CodingKey?] codingPath
Void encode(value: Bool?, forKey: MyModel.CodingKeys) throws
Void encode(value: Data?, forKey: MyModel.CodingKeys) throws
Void encode(value: Double?, forKey: MyModel.CodingKeys) throws
Void encode(value: Encodable?, forKey: MyModel.CodingKeys) throws
Void encode(value: Float?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int16?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int32?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int64?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int8?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int?, forKey: MyModel.CodingKeys) throws
Void encode(value: String?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt16?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt32?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt64?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt8?, forKey: MyModel.CodingKeys) throws
Void encode(value: UInt?, forKey: MyModel.CodingKeys) throws
Void encodeWeak(object: (AnyObject &…), forKey: MyModel.CodingKeys) throws
KeyedEncodingContainer<CodingKey> nestedContainer(keyedBy: CodingKey.Protocol, forKey: MyModel.CodingKeys)
UnkeyedEncodingContainer nestedUnkeyedContainer(forKey: MyModel.CodingKeys)
Encoder superEncoder()
Encoder superEncoder(forKey: MyModel.CodingKeys)
I think this list would give a new developer an *entirely* incorrect impression of how you're supposed to encode your type. The list contains *ten* integer types, *two* floating-point types, `Bool`, `Data`, `String`, and, oh yeah, the overload that handles literally everything else. The impression I would get is that this supports only the listed types—why else would we list every variety of integer, for instance?—and the types *users* conform to `Encodable`, if I noticed that overload at all. Maybe the nested containers are for coding nested arrays and dictionaries—there's certainly no reason this list would make me believe I can encode those directly.
If we wanted the overload list to help beginners understand which types `encode(_:forKey:)` supports, I would want it to look more like:
[CodingKey?] codingPath
Void encode(value: [Encodable]?, forKey: MyModel.CodingKeys) throws
Void encode(value: [Encodable & Hashable: Encodable]?, forKey: MyModel.CodingKeys) throws
Void encode(value: Data?, forKey: MyModel.CodingKeys) throws
Void encode(value: Date?, forKey: MyModel.CodingKeys) throws
Void encode(value: Double?, forKey: MyModel.CodingKeys) throws
Void encode(value: Encodable?, forKey: MyModel.CodingKeys) throws
Void encode(value: Int?, forKey: MyModel.CodingKeys) throws
Void encode(value: Set<Encodable & Hashable>?, forKey: MyModel.CodingKeys) throws
Void encode(value: String?, forKey: MyModel.CodingKeys) throws
Void encode(value: URL?, forKey: MyModel.CodingKeys) throws
Void encodeWeak(object: (AnyObject &…), forKey: MyModel.CodingKeys) throws
KeyedEncodingContainer<CodingKey> nestedContainer(keyedBy: CodingKey.Protocol, forKey: MyModel.CodingKeys)
UnkeyedEncodingContainer nestedUnkeyedContainer(forKey: MyModel.CodingKeys)
Encoder superEncoder()
Encoder superEncoder(forKey: MyModel.CodingKeys)
These ten overloads more fairly represent the kinds of Swift- and Foundation-provided encodable types people actually use most often. The presence of `Array`, `Dictionary`, and `Set` convey that encoded types may have a complex inner structure. The absence of all the `Int`/`UInt` varieties does mean it's not immediately obvious that it supports every single one, but Swift guidelines highly discourage use of those types anyway unless you *really* need those semantics for some reason.
Or, alternatively, just offer the one overload:
[CodingKey?] codingPath
Void encode(value: Encodable?, forKey: MyModel.CodingKeys) throws
Void encodeWeak(object: (AnyObject &…), forKey: MyModel.CodingKeys) throws
KeyedEncodingContainer<CodingKey> nestedContainer(keyedBy: CodingKey.Protocol, forKey: MyModel.CodingKeys)
UnkeyedEncodingContainer nestedUnkeyedContainer(forKey: MyModel.CodingKeys)
Encoder superEncoder()
Encoder superEncoder(forKey: MyModel.CodingKeys)
And make sure the little two-sentence blurb at the bottom of the autocomplete list mentions some of the compatible types:
Encodes the given value for the given key. Built-in Encodable types include most basic value types in the standard library and Foundation, such as Int, Double, String, Date, and URL, plus Array, Dictionary, and Set when their elements are Encodable.
> Even for advanced users of the API, though, there's something to be said for static guarantees in the overloading. As someone familiar with the API (who might even know all the primitives by heart), I might wonder if the Encoder I'm using has correctly switched on the generic type. (It would have to be a dynamic switch.) Did the implementer remember to switch on Int16 correctly? Or did they forget it and will I be falling into a generic case which is not appropriate here?
>
I get that, which is why I'm *not* suggesting going to `switch`; I'm suggesting that primitives all be handled through single-value containers. An advanced user would know that all of the single-value container's types were explicitly implemented by every encoder.
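That is, a primitive's conformance would funnel through the single-value container, roughly like this (a sketch of the pattern, not the proposal's exact conformance code):

    // Sketch: how an Int16's Encodable conformance would reach the
    // encoder's explicitly implemented Int16 requirement.
    extension Int16 /* : Encodable */ {
        public func encode(to encoder: Encoder) throws {
            var container = encoder.singleValueContainer()
            try container.encode(self)   // resolves to the Int16-specific overload
        }
    }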
> Let's take a step back, though. This is mostly annoying to implement because of the repetition, right? If someone were to put together a proposal for a proper macro system in Swift, which is really what we want here, I wouldn't be sad. 😉
>
Don't tempt me!
> This is already unfortunately a no-go. As mentioned in other emails, you cannot override an associatedtype in a subclass of a class, which means that you cannot require a different container type than your superclass. This is especially problematic in the default case where we'd want to encourage types to use keyed containers — every type should have its own keys, and you'd need to have a different keyed container than your parent, keyed on your keys.
>
Ugh, that's too bad. Class inheritance ruins everything, as usual.
> This might address my last point above, but then what useful interface would EncodingSink and DecodingSource have if a type conforming to EncodingSink could be any one of the containers or even a whole encoder itself?
>
`EncodingSink`'s members are used by the encoder to convert itself into whatever type a given `Encodable` type's `encode(to:)` method wants to write itself into. The `encode(to:)` method itself knows which concrete type it wants to use, and uses the interface of that concrete type to do its work.
(But as you said, the associatedtype issue prevents this approach.)
> I am curious, though, about your comment above on preconditions being unenforceable, because this is certainly something we would want to hammer out before releasing. What cases are you thinking of that are unenforceable?
>
I'm just imagining somebody thinking they can construct several different containers from the same encoder—for instance, making a keyed and an unkeyed container that are peers to one another, or making two keyed containers with different coding key types (with the encoder preventing collisions by including the coding key type alongside the stringValue). When I was first reading the proposal, I was under the impression that coding key types would prevent collisions between, for instance, superclass and subclass keys; I realized a little later in my read-through that you weren't taking that approach, but I could see someone else making the same mistake and not realizing it until they'd already written a lot of code.
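The misuse I'm imagining looks something like this:

    struct Pet: Encodable {
        var name: String
        var favoriteNumbers: [Int]

        enum CodingKeys: String, CodingKey { case name }

        func encode(to encoder: Encoder) throws {
            // Asking the same encoder for two peer containers — at best a runtime
            // precondition failure, and nothing statically prevents it.
            var keyed = encoder.container(keyedBy: CodingKeys.self)
            var unkeyed = encoder.unkeyedContainer()
            try keyed.encode(name, forKey: .name)
            try unkeyed.encode(contentsOf: favoriteNumbers)
        }
    }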
* * *
Apropos of none of this discussion so far, one other thing I just noticed: `JSONEncoder` and `JSONDecoder` have a lot of parallel "strategy" properties, but they use different types for them. Have you considered making a single `JSONCodingStrategy` type which expresses bidirectional rules? That would allow a developer to make a constant containing their strategy and use it as a single source of truth.
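Something along these lines, where `JSONCodingStrategy` is just a strawman name and shape:

    // Hypothetical bidirectional strategy value that both JSONEncoder and
    // JSONDecoder could be configured from.
    struct JSONCodingStrategy {
        enum DateStrategy {
            case deferredToDate
            case secondsSince1970
            case iso8601
        }
        enum DataStrategy {
            case base64
        }

        var dates: DateStrategy
        var data: DataStrategy
    }

    // One constant serves as the single source of truth for both directions.
    let strategy = JSONCodingStrategy(dates: .iso8601, data: .base64)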
--
Brent Royal-Gordon
Architechies