[swift-evolution] [Proposal] Foundation Swift Encoders

Wed Apr 5 15:44:21 CDT 2017

> On 5 Apr 2017, at 19:04, Tony Parker <anthony.parker at apple.com> wrote:
> 
> Hi David,
> 
>> On Apr 4, 2017, at 10:33 PM, David Hart via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>> Very interesting discussion below. Here are a few more points:
>> 
>> On 4 Apr 2017, at 23:43, Itai Ferber via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>>> Hi Brent,
>>> 
>>> Thanks for your comments and thorough review! :)
>>> Responses inline.
>>> 
>>> On 4 Apr 2017, at 1:57, Brent Royal-Gordon wrote:
>>> 
>>> 
>>> On Apr 3, 2017, at 1:31 PM, Itai Ferber via swift-evolution <swift-evolution at swift.org> wrote:
>>> Hi everyone,
>>> 
>>> With feedback from swift-evolution and additional internal review, we've pushed updates to this proposal, and to the Swift Archival & Serialization proposal.
>>> Changes here mostly mirror the ones made to Swift Archival & Serialization, but you can see a specific diff of what's changed here. Full content below.
>>> 
>>> We'll be looking to start the official review process very soon, so we're interested in any additional feedback.
>>> 
>>> Thanks!
>>> 
>>> — Itai
>>> 
>>> This is a good revision to a good proposal.
>>> 
>>> I'm glad `CodingKey`s now require `stringValue`s; I think the intended semantics are now a lot clearer, and key behavior will be much more reliable.
>>> 
>>> Agreed
>>> 
>>> 
>>> I like the separation between keyed and unkeyed containers (and I think "unkeyed" is a good name, though not perfect), but I'm not quite happy with the unkeyed container API. Encoding a value into an unkeyed container appends it to the container's end; decoding a value from an unkeyed container removes it from the container's front. These are very important semantics that the method names in question do not imply at all.
>>> 
>>> I think that consistency of phrasing is really important here, and the action words "encode" and "decode" are even more important to connote than the semantics of pushing and popping.
>>> (Note that there need not be specific directionality to an unkeyed container as long as the ordering of encoded items is eventually maintained on decode.) But on a practical note, names like encodeAtEnd and decodeFromFront (or similar) don't feel like they communicate anything much more useful than the current encode/decode.
>>> 
>>> 
>>> Certain aspects of `UnkeyedDecodingContainer` also feel like they do the same things as `Sequence` and `IteratorProtocol`, but in different and incompatible ways. And I certainly think that the `encode(contentsOf:)` methods on `UnkeyedEncodingContainer` could use equivalents on the `UnkeyedDecodingContainer`. Still, the design in this area is much improved compared to the previous iteration.
>>> 
>>> Which aspects of Sequence and IteratorProtocol do you feel like you're missing on UnkeyedDecodingContainer? Keep in mind that methods on UnkeyedDecodingContainer must be able to throw, and an UnkeyedDecodingContainer can hold heterogeneous items whose type is not known, two things that Sequence and IteratorProtocol do not do.
>>> 
>>> In terms of an equivalent to encode(contentsOf:), keep in mind that this would only work if the collection you're decoding is homogeneous, in which case, you would likely prefer to decode an Array over getting an unkeyed container, no? (As soon as conditional conformance arrives in Swift, we will be able to express extension Array : Decodable where Element : Decodable { ... } making decoding homogeneous arrays trivial.)
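
To make the homogeneous/heterogeneous distinction above concrete, here is a minimal sketch. It assumes the API as it later shipped in Swift (Foundation's `JSONDecoder`, and `Array` being `Decodable` when its elements are), which may differ from the draft under discussion. `Measurement` and its two-element array layout are hypothetical.

```swift
import Foundation

// Hypothetical type stored as a heterogeneous array: ["label", 2.5].
// An unkeyed container is needed because the elements differ in type.
struct Measurement: Decodable {
    var label: String
    var value: Double

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        // Each decode call consumes the next element, in order.
        label = try container.decode(String.self)
        value = try container.decode(Double.self)
    }
}

// Homogeneous case: decode the Array directly, no container handling needed.
let numbers = try JSONDecoder().decode([Int].self, from: Data("[1,2,3]".utf8))

// Heterogeneous case: goes through the unkeyed container above.
let measurement = try JSONDecoder().decode(Measurement.self,
                                           from: Data("[\"width\",2.5]".utf8))
```

In the homogeneous case the caller never touches a container, which is why a decoding counterpart to encode(contentsOf:) would add little there.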
>>> 
>>> 
>>> (Tiny nitpick: I keep finding myself saying "encode into", not "encode to" as the API name suggests. Would that be a better parameter label?)
>>> 
>>> On a personal note here — I agree with you, and had originally used "into". However, we've reviewed our APIs and more often have balanced from:/to: rather than from:/into: on read/write/streaming calls. We'd like to rein these in a bit and keep them consistent within our naming guidelines, as much as possible.
>>> 
>>> 
>>> I like the functionality of the `userInfo` dictionary, but I'm still not totally satisfied casting out of `Any` all the time. I might just have to get over that, though.
>>> 
>>> I think this is the closest we can get to a pragmatic balance between dynamic needs and static guarantees. :)
>>> 
>>> 
>>> I wonder if `CodingKey` implementations might ever need access to the `userInfo`. I suppose you can just switch to a different set of `CodingKeys` if you do.
>>> 
>>> I don't think CodingKey should ever know about userInfo — CodingKeys should be inert data. If you need to, use the userInfo to switch to a different set of keys, as you mention.
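
As a rough illustration of switching key sets through userInfo, the sketch below uses the API as it later shipped (`CodingUserInfoKey`, `JSONEncoder.userInfo`). The `User` type, the `useLegacyKeys` key, and the `user_name` spelling are all hypothetical.

```swift
import Foundation

struct User: Encodable {
    var name: String

    // Hypothetical userInfo key a caller sets to request the legacy key set.
    static let useLegacyKeys = CodingUserInfoKey(rawValue: "useLegacyKeys")!

    // Both key sets are inert data; neither knows about userInfo.
    private enum CodingKeys: String, CodingKey { case name }
    private enum LegacyCodingKeys: String, CodingKey { case name = "user_name" }

    func encode(to encoder: Encoder) throws {
        // The type, not the keys, inspects userInfo and picks a key set.
        if encoder.userInfo[User.useLegacyKeys] as? Bool == true {
            var container = encoder.container(keyedBy: LegacyCodingKeys.self)
            try container.encode(name, forKey: .name)
        } else {
            var container = encoder.container(keyedBy: CodingKeys.self)
            try container.encode(name, forKey: .name)
        }
    }
}

let modernEncoder = JSONEncoder()
let legacyEncoder = JSONEncoder()
legacyEncoder.userInfo[User.useLegacyKeys] = true

let modernData = try modernEncoder.encode(User(name: "Ada"))
let legacyData = try legacyEncoder.encode(User(name: "Ada"))
let modernJSON = String(data: modernData, encoding: .utf8)!
let legacyJSON = String(data: legacyData, encoding: .utf8)!
```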
>>> 
>>> 
>>> Should there be a way for an `init(from:)` implementation to determine the type of container in the encoder it's just been handed? Or perhaps the better question is, do we want to promise users that all decoders can tell the difference?
>>> 
>>> I think it would be very rare to need this type of information. If a type wants to encode as an array or as a dictionary conditionally, the context for that would likely be present in userInfo.
>>> If you really must try to decode regardless, you can always try to grab one container type from the decoder, and if it fails, attempt to grab the other container type.
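
The fallback described above might look like the following sketch, written against the API as it later shipped in Swift rather than the draft in this thread. `Point` and its two accepted JSON shapes are hypothetical.

```swift
import Foundation

// Hypothetical type that may arrive as {"x":1,"y":2} or as [1,2].
struct Point: Decodable, Equatable {
    var x: Int
    var y: Int

    private enum CodingKeys: String, CodingKey { case x, y }

    init(x: Int, y: Int) {
        self.x = x
        self.y = y
    }

    init(from decoder: Decoder) throws {
        // Try the keyed (dictionary-like) container first.
        if let keyed = try? decoder.container(keyedBy: CodingKeys.self) {
            x = try keyed.decode(Int.self, forKey: .x)
            y = try keyed.decode(Int.self, forKey: .y)
        } else {
            // The keyed request failed, so attempt the unkeyed container.
            var unkeyed = try decoder.unkeyedContainer()
            x = try unkeyed.decode(Int.self)
            y = try unkeyed.decode(Int.self)
        }
    }
}

let pointDecoder = JSONDecoder()
let fromObject = try pointDecoder.decode(Point.self,
                                         from: Data("{\"x\":1,\"y\":2}".utf8))
let fromArray = try pointDecoder.decode(Point.self, from: Data("[3,4]".utf8))
```

This relies on the decoder throwing when the requested container type does not match the data, which not every format can promise.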
>>> 
>>> 
>>> * * *
>>> 
>>> I went ahead and implemented a basic version of `Encoder` and `Encodable` in a Swift 3 playground, just to get a feel for this system in action and experiment with a few things. A few observations:
>>> 
>>> Lots to unpack here, let's go one by one. :)
>>> 
>>> 
>>> * I think it may make sense to class-constrain some of these protocols. `Encodable` and its containers seem to inherently have reference semantics—otherwise data could never be communicated from all those `encode` calls out to the ultimate caller of the API. Class-constraining would clearly communicate this to both the implementer and the compiler. `Decoder` and its containers don't *inherently* have reference semantics, but I'm not sure it's a good idea to potentially copy around a lot of state in a value type.
>>> 
>>> I don't think class constraints are necessary. You can take a look at the current implementation of JSONEncoder and JSONDecoder here <https://github.com/itaiferber/swift/blob/3c59bfa749adad2575975e47130b28b731f763e0/stdlib/public/SDK/Foundation/JSONEncoder.swift> (note that this is still a rough implementation and will be updated soon). The model I've followed there is that the encoder itself (_JSONEncoder) has reference semantics, but the containers (_JSONKeyedEncodingContainer, _JSONUnkeyedEncodingContainer) are value-type views into the encoder itself.
>>> 
>>> Keep in mind that during the encoding process, the entities created most often will be containers. Without some additional optimizations in place, you end up with a lot of small, short-lived class allocations as containers are brought into and out of scope.
>>> By not requiring the class constraints, it's at least possible to make all those containers value types with references to the shared encoder.
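
A stripped-down sketch of that model: a reference-type encoder with value-type container views. All names here are hypothetical, and this is not the actual _JSONEncoder code linked above, only the shape of the idea.

```swift
// Reference semantics: one shared storage for the whole encoding pass.
final class SketchEncoder {
    var storage: [String: Any] = [:]

    // Handing out a container allocates no new class instance.
    func keyedContainer() -> SketchKeyedContainer {
        return SketchKeyedContainer(encoder: self)
    }
}

// Value type: a lightweight view holding a reference to the encoder.
struct SketchKeyedContainer {
    let encoder: SketchEncoder

    func encode(_ value: Int, forKey key: String) {
        encoder.storage[key] = value
    }

    func encode(_ value: String, forKey key: String) {
        encoder.storage[key] = value
    }
}

let sketchEncoder = SketchEncoder()
let first = sketchEncoder.keyedContainer()
let second = first  // copies the struct, not the underlying storage

first.encode(1, forKey: "a")
second.encode("two", forKey: "b")
// Both writes land in sketchEncoder.storage.
```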
>>> 
>>> 
>>> * I really think that including overloads for every primitive type in all three container types is serious overkill. In my implementation, the primitive types' `Encodable` conformances simply request a `SingleValueEncodingContainer` and write themselves into it. I can't imagine any coder doing anything in their overloads that wouldn't be compatible with that, especially since they can never be sure when someone will end up using the `Encodable` conformance directly instead of the primitive. So what are all these overloads buying us? Are they just avoiding a generic dispatch and the creation of a new `Encoder` and perhaps a `SingleValueEncodingContainer`? I don't think that's worth the increased API surface, the larger overload sets, or the danger that an encoder might accidentally implement one of the duplicative primitive encoding calls inconsistently with the others.
>>> 
>>> To be clear: In my previous comments, I suggested that we should radically reduce the number of primitive types. That is not what I'm saying here. I'm saying that we should always use a single value container to encode and decode primitives, and the other container types should always use `Encodable` or `Decodable`. This doesn't reduce the capabilities of the system at all; it just means you only have to write the code to handle a given primitive type one time instead of three.
>>> 
>>> Having implemented these myself multiple times, I agree — it can be a pain to repeat these implementations, and if you look at the linked implementations above, funneling to one method from all of those is exactly what I do (and in fact, this can be shortened significantly, which I plan on doing soon).
>>> 
>>> There is a tradeoff here between ease of use for the end consumer of the API, and ease of coding for the writer of a new Encoder/Decoder, and my argument will always be for the benefit of the end consumer. (There will be orders of magnitude more end consumers of this API than those writing new Encoders and Decoders 😉)
>>> Think of the experience for the consumer of this API, especially someone learning it for the first time. It can already be somewhat of a hurdle to figure out what kind of container you need, but even once you get a keyed container (which is what we want to encourage), then what? You start typing container.enc... and the only thing that shows up in Xcode's autocomplete list is a single result: encode(value: Encodable, forKey: ...). Consider the clarity (or lack thereof) of that, as opposed to seeing encode(value: Int, forKey: ...), encode(value: String, forKey: ...), etc. Presenting a list of types that users are already familiar with helps immensely with pushing them in the right direction and reducing cognitive load. When you see String in that list, you don't have to question whether it's possible to encode a string or not, you just pick it. I have an Int8, can I encode it? Ah, it's in the list, so I can.
>>> 
>>> Even for advanced users of the API, though, there's something to be said for static guarantees in the overloading. As someone familiar with the API (who might even know all the primitives by heart), I might wonder if the Encoder I'm using has correctly switched on the generic type. (It would have to be a dynamic switch.) Did the implementer remember to switch on Int16 correctly? Or did they forget it and will I be falling into a generic case which is not appropriate here?
>>> 
>>> When it comes to overloads vs. dynamically switching on a generic type, I think we would generally prefer the static type safety. As a consumer of the API I want to be sure that the implementer of the Encoder I'm using was aware of these primitive types in some way, and that the compiler helped them too to make sure they didn't, say, forget to switch on Data.self. As a writer of Encoders, yes, this is a pain, but a sacrifice I'm willing to make for the sake of the consumer.
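
The difference between the two approaches can be sketched as follows. The protocol and both sink types are hypothetical and far smaller than the real container protocols. The point is only that a missing overload is a compile error, while a missing case in a dynamic switch is silent.

```swift
// Static approach: the protocol lists every primitive, so a conforming type
// that omits one fails to compile.
protocol OverloadedSink {
    mutating func encode(_ value: Int)
    mutating func encode(_ value: Int16)
    mutating func encode(_ value: String)
}

struct StaticSink: OverloadedSink {
    var log: [String] = []
    mutating func encode(_ value: Int) { log.append("Int") }
    mutating func encode(_ value: Int16) { log.append("Int16") }
    mutating func encode(_ value: String) { log.append("String") }
}

// Dynamic approach: one generic entry point that switches at run time.
struct DynamicSink {
    var log: [String] = []
    mutating func encode<T>(_ value: T) {
        switch value {
        case is Int: log.append("Int")
        case is String: log.append("String")
        // Int16 was forgotten here; the compiler cannot notice.
        default: log.append("generic fallback")
        }
    }
}

var staticSink = StaticSink()
staticSink.encode(Int16(7))   // picks the Int16 overload at compile time

var dynamicSink = DynamicSink()
dynamicSink.encode(Int16(7))  // falls through to the default case
```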
>>> 
>>> Let's take a step back, though. This is mostly annoying to implement because of the repetition, right? If someone were to put together a proposal for a proper macro system in Swift, which is really what we want here, I wouldn't be sad. 😉
>>> 
>> There's also an argument of API surface area. As a user or implementer of the API, it's much less intimidating to load the documentation for a protocol and see one central function than many overloads.
>> 
>> I've used many serialization third-party frameworks in Swift. None of them defined all those overloads, and more importantly, I never saw any user of those APIs post an issue to GitHub where the cause could be traced back to the lack of those overloads.
>> 
>> These overloads look to me like remnants of Codable's NSCoding influences instead of an API reimagined for Swift.
>> 
>> For the same reasons, I continue to believe that decode functions should overload on the return type. If we follow the arguments in favor of providing a type argument, then why don't we also have type arguments for encoders: encode(_ value: T?, forKey key: Key, as type: T.Type)? I'm not advocating that: I'm just pushing the argument to its logical conclusion to explain why I don't understand it.
> 
> I don’t see a way for a call to encode to become ambiguous by omitting the type argument, whereas the same is not true for a return value from decode. The two seem fundamentally different.

When decoding to a property, there will be no ambiguity. And for other cases, Swift developers are already quite used to handling that kind of ambiguity, like for literals:

let x: UInt = 10
let y = 20 as CGFloat
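
For illustration, here is a sketch of what decoding by return type could look like, layered on the API as it later shipped. The `decode(forKey:)` helper is hypothetical and not part of the proposal, and `Pet` mirrors the example type used elsewhere in this thread.

```swift
import Foundation

// Hypothetical convenience: the result type is inferred from context.
extension KeyedDecodingContainer {
    func decode<T: Decodable>(forKey key: Key) throws -> T {
        return try decode(T.self, forKey: key)
    }
}

struct Pet: Decodable {
    var name: String
    var age: Int

    private enum CodingKeys: String, CodingKey { case name, age }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Assigning to a property: T is inferred as String, no ambiguity.
        name = try container.decode(forKey: .name)
        // No property to infer from: disambiguate the same way as a literal.
        let years = try container.decode(forKey: .age) as Int
        age = years
    }
}

let pet = try JSONDecoder().decode(Pet.self,
                                   from: Data("{\"name\":\"Rex\",\"age\":3}".utf8))
```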

> - Tony
> 
>>> 
>>> * And then there's the big idea: Changing the type of the parameter to `encode(to:)` and `init(from:)`.
>>> 
>>> ***
>>> 
>>> While working with the prototype, I realized that the vast majority of conformances will immediately make a container and then never use the `encoder` or `decoder` again. I also noticed that it's illegal to create more than one container from the same coder, and there are unenforceable preconditions to that effect. So I'm wondering if it would make sense to not pass the coder at all, but instead have the conforming type declare what kind of container it wants:
>>> 
>>> extension Pet: Codable {
>>>     init(from container: KeyedDecodingContainer<CodingKeys>) throws {
>>>         name = try container.decode(String.self, forKey: .name)
>>>         age = try container.decode(Int.self, forKey: .age)
>>>     }
>>> 
>>>     func encode(to container: KeyedEncodingContainer<CodingKeys>) throws {
>>>         try container.encode(name, forKey: .name)
>>>         try container.encode(age, forKey: .age)
>>>     }
>>> }
>>> 
>>> extension Array: Codable where Element: Codable {
>>>     init(from container: UnkeyedDecodingContainer) throws {
>>>         self.init()
>>>         while !container.isAtEnd {
>>>             append(try container.decode(Element.self))
>>>         }
>>>     }
>>> 
>>>     func encode(to container: UnkeyedEncodingContainer) throws {
>>>         try container.encode(contentsOf: self)
>>>     }
>>> }
>>> 
>>> I think this could be implemented by doing the following:
>>> 
>>> 1. Adding an associated type to `Encodable` and `Decodable` for the type passed to `encode(to:)`/`init(from:)`.
>>> 
>>> This is already unfortunately a no-go. As mentioned in other emails, you cannot override an associatedtype in a subclass of a class, which means that you cannot require a different container type than your superclass. This is especially problematic in the default case where we'd want to encourage types to use keyed containers: every type should have its own keys, and you'd need to have a different keyed container than your parent, keyed on your keys.
>>> 
>>> Along with that, since the typealias would have to be at least as visible as your type (potentially public), it would necessitate that your key type would be at least as public as your type as well. This would expose your type's coding keys, which is prohibitive. (Consider what this would mean for frameworks, for instance.)
>>> 
>>> Finally, this also means that you could not request different container types based on context — a type could not offer both a dictionary representation and a more efficient array representation, since it can only statically request one container type.
>>> 
>>> 
>>> 2. Creating protocols for the types that are permitted there. Call them `EncodingSink` and `DecodingSource` for now.
>>> 
>>> 3. Creating *simple* type-erased wrappers for the `Unkeyed*Container` and `SingleValue*Container` protocols and conforming them to `EncodingSink` and `DecodingSource`. These wouldn't need the full generic-subclass dance usually used for type-erased wrappers; they just exist so you can strap initializers to them. In a future version of Swift which allowed initializers on existentials, we could probably get rid of them.
>>> 
>>> (Incidentally, if our APIs always return a type-erased wrapper around the `Keyed*ContainerProtocol` types, there's no actual need for the underlying protocols to have a `Key` associated type; they can use `CodingKey` existentials and depend on the wrapper to enforce the strong key typing. That would allow us to use a simple type-erased wrapper for `Keyed*Container`, too.)
>>> 
>>> 4. For advanced use cases where you really *do* need to access the encoder in order to decide which container type to use, we would also need to create a simple type-erased wrapper around `Encoder` and `Decoder` themselves, conforming them to the `Sink`/`Source` protocols.
>>> 
>>> This might address my last point above, but then what useful interface would EncodingSink and DecodingSource have if a type conforming to EncodingSink could be any one of the containers or even a whole encoder itself?
>>> 
>>> 
>>> 5. The Source/Sink parameter would need to be `inout`, unless we *do* end up class-constraining things. (My prototype didn't.)
>>> 
>>> There are lots of little details that change too, but these are the broad strokes.
>>> 
>>> Although this technically introduces more types, I think it actually simplifies the design for people who are just using the `Codable` protocol. All they have to know about is the `Codable` protocol, the magic `CodingKeys` type, the three container types (realistically, probably just the `KeyedEncoding/DecodingContainer`), and the top-level encoders they want to use. Most users should never need to know about the members of the `Encoder` protocol; few even need to know about the other two container types. They don't need to do the "create a container" dance. The thing would just work with a minimum of fuss.
>>> 
>>> Meanwhile, folks who write encoders *do* deal with a bit more complexity, but only because they have to be aware of more type-erased wrappers. In other respects, it's simpler for them, too. Keyed containers don't need to be generic, and they have a layer of Foundation-provided wrappers above them that can help enforce good behavior and (probably) hide the implementation a little bit more. I think that overall, it's probably better for them, too.
>>> 
>>> Thoughts?
>>> 
>>> For what it's worth, the way to introduce these three different types of encoding without the use of associated types is to split the Codable protocol up into three protocols, which we've tried in the past <https://github.com/itaiferber/swift-evolution/blob/swift-archival-serialization/proposals/XXXX-swift-archival-serialization.md#alternatives-considered> (bullet #4). Unfortunately, the results are not great — an even bigger explosion of types, overloads, etc.
>>> 
>>> While I agree that the current approach of dynamically requesting containers is, well, dynamic, the benefit of not exposing your keys publicly and allowing encoding of classes is a big win in comparison, I think.
>>> 
>>> I am curious, though, about your comment above on preconditions being unenforceable, because this is certainly something we would want to hammer out before releasing. What cases are you thinking of that are unenforceable?
>>> 
>>> 
>>> -- 
>>> Brent Royal-Gordon
>>> Architechies
>>> 
>>> Again, thanks for your thorough review! Looking forward to further comments. :)
>>> 
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
