[swift-evolution] [Proposal] Foundation Swift Encoders

Tue Apr 4 03:57:09 CDT 2017

> On Apr 3, 2017, at 1:31 PM, Itai Ferber via swift-evolution <swift-evolution at swift.org> wrote:
> Hi everyone,
> 
> With feedback from swift-evolution and additional internal review, we've pushed updates to this proposal, and to the Swift Archival & Serialization proposal.
> Changes to here mostly mirror the ones made to Swift Archival & Serialization, but you can see a specific diff of what's changed here. Full content below.
> 
> We'll be looking to start the official review process very soon, so we're interested in any additional feedback.
> 
> Thanks!
> 
> — Itai

This is a good revision to a good proposal.

I'm glad `CodingKey`s now require `stringValue`s; I think the intended semantics are now a lot clearer, and key behavior will be much more reliable.

I like the separation between keyed and unkeyed containers (and I think "unkeyed" is a good name, though not perfect), but I'm not quite happy with the unkeyed container API. Encoding a value into an unkeyed container appends it to the container's end; decoding a value from an unkeyed container removes it from the container's front. These are very important semantics that the method names in question do not imply at all. Certain aspects of `UnkeyedDecodingContainer` also feel like they do the same things as `Sequence` and `IteratorProtocol`, but in different and incompatible ways. And I certainly think that the `encode(contentsOf:)` methods on `UnkeyedEncodingContainer` could use equivalents on the `UnkeyedDecodingContainer`. Still, the design in this area is much improved compared to the previous iteration.

(Tiny nitpick: I keep finding myself saying "encode into", not "encode to" as the API name suggests. Would that be a better parameter label?)

I like the functionality of the `userInfo` dictionary, but I'm still not totally satisfied casting out of `Any` all the time. I might just have to get over that, though.

I wonder if `CodingKey` implementations might ever need access to the `userInfo`. I suppose you can just switch to a different set of `CodingKeys` if you do.

Should there be a way for an `init(from:)` implementation to determine the type of container in the encoder it's just been handed? Or perhaps the better question is, do we want to promise users that all decoders can tell the difference?

* * *

I went ahead and implemented a basic version of `Encoder` and `Encodable` in a Swift 3 playground, just to get a feel for this system in action and experiment with a few things. A few observations:

* I think it may make sense to class-constrain some of these protocols. `Encodable` and its containers seem to inherently have reference semantics—otherwise data could never be communicated from all those `encode` calls out to the ultimate caller of the API. Class-constraining would clearly communicate this to both the implementer and the compiler. `Decoder` and its containers don't *inherently* have reference semantics, but I'm not sure it's a good idea to potentially copy around a lot of state in a value type.

* I really think that including overloads for every primitive type in all three container types is serious overkill. In my implementation, the primitive types' `Encodable` conformances simply request a `SingleValueEncodingContainer` and write themselves into it. I can't imagine any coder doing anything in their overloads that wouldn't be compatible with that, especially since they can never be sure when someone will end up using the `Encodable` conformance directly instead of the primitive. So what are all these overloads buying us? Are they just avoiding a generic dispatch and the creation of a new `Encoder` and perhaps a `SingleValueEncodingContainer`? I don't think that's worth the increased API surface, the larger overload sets, or the danger that an encoder might accidentally implement one of the duplicative primitive encoding calls inconsistently with the others.

To be clear: In my previous comments, I suggested that we should radically reduce the number of primitive types. That is not what I'm saying here. I'm saying that we should always use a single value container to encode and decode primitives, and the other container types should always use `Encodable` or `Decodable`. This doesn't reduce the capabilities of the system at all; it just means you only have to write the code to handle a given primitive type one time instead of three.

* And then there's the big idea: Changing the type of the parameter to `encode(to:)` and `init(from:)`.

***

While working with the prototype, I realized that the vast majority of conformances will immediately make a container and then never use the `encoder` or `decoder` again. I also noticed that it's illegal to create more than one container from the same coder, and there are unenforceable preconditions to that effect. So I'm wondering if it would make sense to not pass the coder at all, but instead have the conforming type declare what kind of container it wants:

	extension Pet: Codable {
		init(from container: KeyedDecodingContainer<CodingKeys>) throws {
			name = try container.decode(String.self, forKey: .name)
			age = try container.decode(Int.self, forKey: .age)
		}

		func encode(to container: KeyedEncodingContainer<CodingKeys>) throws {
			try container.encode(name, forKey: .name)
			try container.encode(age, forKey: .age)
		}
	}

	extension Array: Encodable where Element: Encodable {
		init(from container: UnkeyedDecodingContainer) throws {
			self.init()
			while !container.isAtEnd {
				append(try container.decode(Element.self))
			}
		}

		func encode(to container: UnkeyedEncodingContainer) throws {
			container.encode(contentsOf: self)
		}
	}

I think this could be implemented by doing the following:

	1. Adding an associated type to `Encodable` and `Decodable` for the type passed to `encode(to:)`/`init(from:)`.

	2. Creating protocols for the types that are permitted there. Call them `EncodingSink` and `DecodingSource` for now.

	3. Creating *simple* type-erased wrappers for the `Unkeyed*Container` and `SingleValue*Container` protocols and conforming them to `EncodingSink` and `DecodingSource`. These wouldn't need the full generic-subclass dance usually used for type-erased wrappers; they just exist so you can strap initializers to them. In a future version of Swift which allowed initializers on existentials, we could probably get rid of them.

(Incidentally, if our APIs always return a type-erased wrapper around the `Keyed*ContainerProtocol` types, there's no actual need for the underlying protocols to have a `Key` associated type; they can use `CodingKey` existentials and depend on the wrapper to enforce the strong key typing. That would allow us to use a simple type-erased wrapper for `Keyed*Container`, too.)

	4. For advanced use cases where you really *do* need to access the encoder in order to decide which container type to use, we would also need to create a simple type-erased wrapper around `Encoder` and `Decoder` themselves, conforming them to the `Sink`/`Source` protocols.

	5. The Source/Sink parameter would need to be `inout`, unless we *do* end up class-constraining things. (My prototype didn't.)

There are lots of little details that change too, but these are the broad strokes.

Although this technically introduces more types, I think it actually simplifies the design for people who are just using the `Codable` protocol. All they have to know about is the `Codable` protocol, the magic `CodingKeys` type, the three container types (realistically, probably just the `KeyedEncoding/DecodingContainer`), and the top-level encoders they want to use. Most users should never need to know about the members of the `Encoder` protocol; few even need to know about the other two container types. They don't need to do the "create a container" dance. The thing would just work with a minimum of fuss.

Meanwhile, folks who write encoders *do* deal with a bit more complexity, but only because they have to be aware of more type-erased wrappers. In other respects, it's simpler for them, too. Keyed containers don't need to be generic, and they have a layer of Foundation-provided wrappers above them that can help enforce good behavior and (probably) hide the implementation a little bit more. I think that overall, it's probably better for them, too.

Thoughts?

-- 
Brent Royal-Gordon
Architechies