[swift-evolution] [Proposal] Foundation Swift Encoders

Itai Ferber iferber at apple.com
Tue Apr 4 16:43:51 CDT 2017


Hi Brent,

Thanks for your comments and thorough review! :)
Responses inline.

On 4 Apr 2017, at 1:57, Brent Royal-Gordon wrote:

>> On Apr 3, 2017, at 1:31 PM, Itai Ferber via swift-evolution 
>> <swift-evolution at swift.org> wrote:
>> Hi everyone,
>>
>> With feedback from swift-evolution and additional internal review, 
>> we've pushed updates to this proposal, and to the Swift Archival & 
>> Serialization proposal.
>> Changes to here mostly mirror the ones made to Swift Archival & 
>> Serialization, but you can see a specific diff of what's changed 
>> here. Full content below.
>>
>> We'll be looking to start the official review process very soon, so 
>> we're interested in any additional feedback.
>>
>> Thanks!
>>
>> — Itai
>
> This is a good revision to a good proposal.
>
> I'm glad `CodingKey`s now require `stringValue`s; I think the intended 
> semantics are now a lot clearer, and key behavior will be much more 
> reliable.
Agreed.

> I like the separation between keyed and unkeyed containers (and I 
> think "unkeyed" is a good name, though not perfect), but I'm not quite 
> happy with the unkeyed container API. Encoding a value into an unkeyed 
> container appends it to the container's end; decoding a value from an 
> unkeyed container removes it from the container's front. These are 
> very important semantics that the method names in question do not 
> imply at all.
I think consistency of phrasing is really important here, and the action 
words "encode" and "decode" are more important to convey than the 
semantics of pushing and popping.
(Note that an unkeyed container need not have a specific directionality, 
as long as the ordering of encoded items is maintained on decode.) On a 
practical note, names like `encodeAtEnd` and `decodeFromFront` (or 
similar) don't communicate anything much more useful than the current 
`encode`/`decode`.
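
To make the ordering guarantee concrete, here's a minimal sketch against 
the proposed API (the `Pair` type is hypothetical):

    struct Pair: Codable {
        var first: String
        var second: String

        init(from decoder: Decoder) throws {
            var container = try decoder.unkeyedContainer()
            // Values come back in the same order they were encoded.
            first = try container.decode(String.self)
            second = try container.decode(String.self)
        }

        func encode(to encoder: Encoder) throws {
            var container = encoder.unkeyedContainer()
            try container.encode(first)
            try container.encode(second)
        }
    }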

> Certain aspects of `UnkeyedDecodingContainer` also feel like they do 
> the same things as `Sequence` and `IteratorProtocol`, but in different 
> and incompatible ways. And I certainly think that the 
> `encode(contentsOf:)` methods on `UnkeyedEncodingContainer` could use 
> equivalents on the `UnkeyedDecodingContainer`. Still, the design in 
> this area is much improved compared to the previous iteration.
Which aspects of `Sequence` and `IteratorProtocol` do you feel like 
you're missing on `UnkeyedDecodingContainer`? Keep in mind that methods 
on `UnkeyedDecodingContainer` must be able to throw, and an 
`UnkeyedDecodingContainer` can hold heterogeneous items whose type is 
not known, two things that `Sequence` and `IteratorProtocol` do not do.

In terms of an equivalent to `encode(contentsOf:)`, keep in mind that 
this would only work if the collection you're decoding is homogeneous, 
in which case, you would likely prefer to decode an `Array` over getting 
an unkeyed container, no? (As soon as conditional conformance arrives in 
Swift, we will be able to express `extension Array : Decodable where 
Element : Decodable { ... }` making decoding homogeneous arrays 
trivial.)
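
As a rough sketch (not the final implementation; the exact shape depends 
on how conditional conformance lands), that extension would read 
something like:

    extension Array: Decodable where Element: Decodable {
        public init(from decoder: Decoder) throws {
            self.init()
            var container = try decoder.unkeyedContainer()
            while !container.isAtEnd {
                append(try container.decode(Element.self))
            }
        }
    }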

> (Tiny nitpick: I keep finding myself saying "encode into", not "encode 
> to" as the API name suggests. Would that be a better parameter label?)
On a personal note here — I agree with you, and had originally used 
"into". However, we've reviewed our APIs and more often have balanced 
`from:/to:` rather than `from:/into:` on read/write/streaming calls. 
We'd like to rein these in a bit and keep them consistent within our 
naming guidelines, as much as possible.

> I like the functionality of the `userInfo` dictionary, but I'm still 
> not totally satisfied casting out of `Any` all the time. I might just 
> have to get over that, though.
I think this is the closest we can get to a pragmatic balance between 
dynamic needs and static guarantees. :)

> I wonder if `CodingKey` implementations might ever need access to the 
> `userInfo`. I suppose you can just switch to a different set of 
> `CodingKeys` if you do.
I don't think `CodingKey` should ever know about `userInfo` — 
`CodingKey`s should be inert data. If you need to, use the `userInfo` to 
switch to a different set of keys, as you mention.

> Should there be a way for an `init(from:)` implementation to determine 
> the type of container in the encoder it's just been handed? Or perhaps 
> the better question is, do we want to promise users that all decoders 
> can tell the difference?
I think it would be very rare to need this type of information. If a 
type wants to encode as an array or as a dictionary conditionally, the 
context for that would likely be present in `userInfo`.
If you really must decode regardless, you can try to grab one container 
type from the decoder and, if that fails, attempt to grab the other 
container type.
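
As a hedged sketch of that fallback (assuming the decoder permits a 
second request after the first one fails; the `Point` type is 
hypothetical):

    // A Point that accepts either {"x": 1, "y": 2} or [1, 2].
    struct Point: Decodable {
        var x: Double
        var y: Double

        private enum CodingKeys: String, CodingKey {
            case x, y
        }

        init(from decoder: Decoder) throws {
            if let keyed = try? decoder.container(keyedBy: CodingKeys.self) {
                // Dictionary representation.
                x = try keyed.decode(Double.self, forKey: .x)
                y = try keyed.decode(Double.self, forKey: .y)
            } else {
                // Array representation.
                var unkeyed = try decoder.unkeyedContainer()
                x = try unkeyed.decode(Double.self)
                y = try unkeyed.decode(Double.self)
            }
        }
    }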

> * * *
>
> I went ahead and implemented a basic version of `Encoder` and 
> `Encodable` in a Swift 3 playground, just to get a feel for this 
> system in action and experiment with a few things. A few observations:
Lots to unpack here, let's go one by one. :)

> * I think it may make sense to class-constrain some of these 
> protocols. `Encodable` and its containers seem to inherently have 
> reference semantics—otherwise data could never be communicated from 
> all those `encode` calls out to the ultimate caller of the API. 
> Class-constraining would clearly communicate this to both the 
> implementer and the compiler. `Decoder` and its containers don't 
> *inherently* have reference semantics, but I'm not sure it's a good 
> idea to potentially copy around a lot of state in a value type.
I don't think class constraints are necessary. You can take a look at 
the current implementation of `JSONEncoder` and `JSONDecoder` 
[here](https://github.com/itaiferber/swift/blob/3c59bfa749adad2575975e47130b28b731f763e0/stdlib/public/SDK/Foundation/JSONEncoder.swift) 
(note that this is still a rough implementation and will be updated 
soon). The model I've followed there is that the encoder itself 
(`_JSONEncoder`) has reference semantics, but the containers 
(`_JSONKeyedEncodingContainer`, `_JSONUnkeyedEncodingContainer`) are 
value-type views into the encoder itself.

Keep in mind that during the encoding process, the entities created most 
often will be containers. Without some additional optimizations in 
place, you end up with a _lot_ of small, short-lived class allocations 
as containers are brought into and out of scope.
By not requiring the class constraints, it's at least possible to make 
all those containers value types with references to the shared encoder.
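
A minimal sketch of that shape (the names and storage here are made up 
and greatly simplified; the linked implementation is the real thing):

    // Reference-type encoder with value-type container views.
    final class SketchEncoder {
        // Shared mutable storage that every container view writes into.
        var storage: [String: Any] = [:]
    }

    struct SketchKeyedContainer<Key: CodingKey> {
        // A value type: cheap to create and destroy, but it writes
        // through a reference to the shared encoder.
        let encoder: SketchEncoder

        func encode(_ value: Int, forKey key: Key) {
            encoder.storage[key.stringValue] = value
        }

        func encode(_ value: String, forKey key: Key) {
            encoder.storage[key.stringValue] = value
        }
    }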

> * I really think that including overloads for every primitive type in 
> all three container types is serious overkill. In my implementation, 
> the primitive types' `Encodable` conformances simply request a 
> `SingleValueEncodingContainer` and write themselves into it. I can't 
> imagine any coder doing anything in their overloads that wouldn't be 
> compatible with that, especially since they can never be sure when 
> someone will end up using the `Encodable` conformance directly instead 
> of the primitive. So what are all these overloads buying us? Are they 
> just avoiding a generic dispatch and the creation of a new `Encoder` 
> and perhaps a `SingleValueEncodingContainer`? I don't think that's 
> worth the increased API surface, the larger overload sets, or the 
> danger that an encoder might accidentally implement one of the 
> duplicative primitive encoding calls inconsistently with the others.
>
> To be clear: In my previous comments, I suggested that we should 
> radically reduce the number of primitive types. That is not what I'm 
> saying here. I'm saying that we should always use a single value 
> container to encode and decode primitives, and the other container 
> types should always use `Encodable` or `Decodable`. This doesn't 
> reduce the capabilities of the system at all; it just means you only 
> have to write the code to handle a given primitive type one time 
> instead of three.
Having implemented these myself multiple times, I agree — it can be a 
pain to repeat these implementations, and if you look at the linked 
implementation above, funneling all of those overloads into one method 
is exactly what I do (and in fact, this can be shortened significantly, 
which I plan to do soon).
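
Concretely, the funneling looks something like this (an illustrative 
sketch, not the actual Foundation source):

    final class SketchStorage {
        var values: [Any] = []
    }

    struct SketchSingleValueContainer {
        let storage: SketchStorage

        // Every primitive overload funnels into this one helper.
        private func push(_ boxed: Any) {
            storage.values.append(boxed)
        }

        func encode(_ value: Bool)   { push(value) }
        func encode(_ value: Int)    { push(value) }
        func encode(_ value: Int8)   { push(value) }
        func encode(_ value: Int16)  { push(value) }
        func encode(_ value: String) { push(value) }
        func encode(_ value: Double) { push(value) }
        // ...and so on for the remaining primitive types.
    }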

There is a tradeoff here between ease of use for the end consumer of the 
API and ease of implementation for the writer of a new 
`Encoder`/`Decoder`, and my argument will always be for the benefit of 
the end consumer. (There will be orders of magnitude more consumers of 
this API than writers of new `Encoder`s and `Decoder`s 😉)
Think of the experience for the consumer of this API, especially someone 
learning it for the first time. It can already be a hurdle to figure out 
what kind of container you need, but even once you get a keyed container 
(which is what we want to encourage), then what? You start typing 
`container.enc...` and the only autocomplete result Xcode shows is 
`encode(value: Encodable, forKey: ...)`. Consider the clarity (or lack 
thereof) of that, as opposed to seeing `encode(value: Int, forKey: ...)`, 
`encode(value: String, forKey: ...)`, etc. Giving users a list of types 
they are already familiar with helps immensely in pushing them in the 
right direction and reducing cognitive load. When you see `String` in 
that list, you don't have to question whether it's possible to encode a 
string; you just pick it. I have an `Int8`, can I encode it? Ah, it's in 
the list, so I can.

Even for advanced users of the API, though, there's something to be said 
for static guarantees in the overloading. As someone familiar with the 
API (who might even know all the primitives by heart), I might wonder 
whether the `Encoder` I'm using has correctly switched on the generic 
type. (It would have to be a dynamic switch.) Did the implementer 
remember to switch on `Int16` correctly? Or did they forget it, leaving 
me to fall into a generic case that isn't appropriate here?

When it comes to overloads vs. dynamically switching on a generic type, 
I think we would generally prefer the static type safety. As a consumer 
of the API, I want to be sure that the implementer of the `Encoder` I'm 
using was aware of these primitive types in some way, and that the 
compiler helped them make sure they didn't, say, forget to switch on 
`Data.self`. As a writer of `Encoder`s, yes, this is a pain, but it's a 
sacrifice I'm willing to make for the sake of the consumer.
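
For contrast, here is a sketch of the single-generic-entry-point 
alternative (the helper names are hypothetical); nothing stops the 
implementer from forgetting a case, and the compiler can't flag it:

    struct SketchGenericContainer {
        func encodePrimitive(_ value: Any, forKey key: String) {
            // Write the primitive directly to output.
        }

        func encodeGeneric<T: Encodable>(_ value: T, forKey key: String) throws {
            // Recurse into value.encode(to:) with a nested encoder.
        }

        func encode<T: Encodable>(_ value: T, forKey key: String) throws {
            switch value {
            case let v as Int:    encodePrimitive(v, forKey: key)
            case let v as String: encodePrimitive(v, forKey: key)
            // Forgetting `case let v as Int16:` here compiles just fine...
            default:
                try encodeGeneric(value, forKey: key)  // ...and Int16 lands here.
            }
        }
    }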

Let's take a step back, though. This is mostly annoying to implement 
because of the repetition, right? If someone were to put together a 
proposal for a proper macro system in Swift, which is _really_ what we 
want here, I wouldn't be sad. 😉

> * And then there's the big idea: Changing the type of the parameter to 
> `encode(to:)` and `init(from:)`.
>
> ***
>
> While working with the prototype, I realized that the vast majority of 
> conformances will immediately make a container and then never use the 
> `encoder` or `decoder` again. I also noticed that it's illegal to 
> create more than one container from the same coder, and there are 
> unenforceable preconditions to that effect. So I'm wondering if it 
> would make sense to not pass the coder at all, but instead have the 
> conforming type declare what kind of container it wants:
>
> 	extension Pet: Codable {
> 		init(from container: KeyedDecodingContainer<CodingKeys>) throws {
> 			name = try container.decode(String.self, forKey: .name)
> 			age = try container.decode(Int.self, forKey: .age)
> 		}
> 		
> 		func encode(to container: KeyedEncodingContainer<CodingKeys>) throws {
> 			try container.encode(name, forKey: .name)
> 			try container.encode(age, forKey: .age)
> 		}
> 	}
>
> 	extension Array: Encodable where Element: Encodable {
> 		init(from container: UnkeyedDecodingContainer) throws {
> 			self.init()
> 			while !container.isAtEnd {
> 				append(try container.decode(Element.self))
> 			}
> 		}
> 		
> 		func encode(to container: UnkeyedEncodingContainer) throws {
> 			container.encode(contentsOf: self)
> 		}
> 	}
>
> I think this could be implemented by doing the following:
>
> 	1. Adding an associated type to `Encodable` and `Decodable` for the 
> type passed to `encode(to:)`/`init(from:)`.
Unfortunately, this is already a no-go. As mentioned in other emails, you 
cannot override an `associatedtype` in a subclass of a class, which means 
that you cannot require a different container type than your superclass 
does. This is especially problematic in the default case, where we'd want 
to encourage types to use keyed containers — every type should have its 
own keys, and you'd need a different keyed container than your parent, 
keyed on your own keys.
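
Here's a hedged sketch of where that breaks down (the protocol is 
hypothetical, standing in for an associated-type-based `Decodable`):

    protocol SketchDecodable {
        associatedtype DecodingContainer
        init(from container: DecodingContainer) throws
    }

    class Animal: SketchDecodable {
        enum CodingKeys: String, CodingKey { case name }
        let name: String

        required init(from container: KeyedDecodingContainer<CodingKeys>) throws {
            name = try container.decode(String.self, forKey: .name)
        }
    }

    // Animal's conformance fixes DecodingContainer to
    // KeyedDecodingContainer<Animal.CodingKeys>. A subclass cannot re-bind
    // it to a container keyed on its own keys, so it is stuck with Animal's.
    class Pet: Animal { /* would want its own CodingKeys, but cannot */ }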

Along with that, since the `typealias` would have to be at least as 
visible as your type (potentially `public`), your key type would have to 
be at least as visible as your type as well. This would expose your 
type's coding keys, which is prohibitive. (Consider what this would mean 
for frameworks, for instance.)

Finally, this also means that you could not request different container 
types based on context — a type could not offer both a dictionary 
representation and a more efficient array representation, since it can 
only statically request one container type.

> 	2. Creating protocols for the types that are permitted there. Call 
> them `EncodingSink` and `DecodingSource` for now.
>
> 	3. Creating *simple* type-erased wrappers for the `Unkeyed*Container` 
> and `SingleValue*Container` protocols and conforming them to 
> `EncodingSink` and `DecodingSource`. These wouldn't need the full 
> generic-subclass dance usually used for type-erased wrappers; they 
> just exist so you can strap initializers to them. In a future version 
> of Swift which allowed initializers on existentials, we could probably 
> get rid of them.
>
> (Incidentally, if our APIs always return a type-erased wrapper around 
> the `Keyed*ContainerProtocol` types, there's no actual need for the 
> underlying protocols to have a `Key` associated type; they can use 
> `CodingKey` existentials and depend on the wrapper to enforce the 
> strong key typing. That would allow us to use a simple type-erased 
> wrapper for `Keyed*Container`, too.)
>
> 	4. For advanced use cases where you really *do* need to access the 
> encoder in order to decide which container type to use, we would also 
> need to create a simple type-erased wrapper around `Encoder` and 
> `Decoder` themselves, conforming them to the `Sink`/`Source` 
> protocols.
This might address my last point above, but then what useful interface 
would `EncodingSink` and `DecodingSource` have if a type conforming to 
`EncodingSink` could be any one of the containers or even a whole 
encoder itself?

> 	5. The Source/Sink parameter would need to be `inout`, unless we *do* 
> end up class-constraining things. (My prototype didn't.)
>
> There are lots of little details that change too, but these are the 
> broad strokes.
>
> Although this technically introduces more types, I think it actually 
> simplifies the design for people who are just using the `Codable` 
> protocol. All they have to know about is the `Codable` protocol, the 
> magic `CodingKeys` type, the three container types (realistically, 
> probably just the `KeyedEncoding/DecodingContainer`), and the 
> top-level encoders they want to use. Most users should never need to 
> know about the members of the `Encoder` protocol; few even need to 
> know about the other two container types. They don't need to do the 
> "create a container" dance. The thing would just work with a minimum 
> of fuss.
>
> Meanwhile, folks who write encoders *do* deal with a bit more 
> complexity, but only because they have to be aware of more type-erased 
> wrappers. In other respects, it's simpler for them, too. Keyed 
> containers don't need to be generic, and they have a layer of 
> Foundation-provided wrappers above them that can help enforce good 
> behavior and (probably) hide the implementation a little bit more. I 
> think that overall, it's probably better for them, too.
>
> Thoughts?
For what it's worth, the way to introduce these three different types of 
encoding _without_ the use of associated types is to split the `Codable` 
protocol up into three protocols, which [we've tried in the 
past](https://github.com/itaiferber/swift-evolution/blob/swift-archival-serialization/proposals/XXXX-swift-archival-serialization.md#alternatives-considered) 
(bullet #4). Unfortunately, the results are not great — an even bigger 
explosion of types, overloads, etc.

While I agree that the current approach of dynamically requesting 
containers is, well, dynamic, the benefit of not exposing your keys 
publicly and allowing encoding of classes is a big win in comparison, I 
think.

I am curious, though, about your comment above on preconditions being 
unenforceable, because this is certainly something we would want to 
hammer out before releasing. What cases are you thinking of that are 
unenforceable?

> -- 
> Brent Royal-Gordon
> Architechies

Again, thanks for your thorough review! Looking forward to further 
comments. :)