[swift-users] Collection underestimatedCount() does _?
Fritz Anderson
fritza at manoverboard.org
Sat Mar 19 16:12:49 CDT 2016
On Mar 19, 2016, at 2:53 PM, Brent Royal-Gordon via swift-users <swift-users at swift.org> wrote:
>>> I might have missed something since a lot of the results are for tests <https://github.com/apple/swift/search?utf8=✓&q=underestimatedCount&type=Code>
>>> But why would `underestimatedCount` ever be used instead of `count`? Don’t most collections have O(1) `count` anyway?
>>
>> Hi Will,
>>
>> This API comes from Sequence, which does not have a count that you can
>> inspect without possibly consuming the sequence.
>
> To add some color:
>
> A sequence merely says that it has a bunch of elements and you can walk through them. It does *not* promise several important things:
>
> * That you can get the elements in the sequence more than once
> * That it knows the number of elements in the sequence and can quickly return them
> * That you can get the number of elements in the sequence without walking through and counting them one by one
>
> This adds up to an ugly conclusion: With just a sequence, it is impossible to efficiently discover the count, and doing so might destroy the sequence.
>
> That's obviously not so great. In particular, it's not so great when you want to load a sequence into an array. The simple, naive way is to do this:
>
> // Notionally
> struct Array<Element> {
> init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
> self.init()
>
> for elem in seq {
> appendElement(elem)
> }
> }
> }
>
> But this may resize the array several times, which would be very slow. The fastest way to do that will be to pre-allocate the exact right number of elements so you don't have to keep resizing the array:
>
> // Notionally
> struct Array<Element> {
> init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
> self.init()
>
> reserveCapacity(seq.count)
> for elem in seq {
> _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
> }
> }
> }
>
> But `seq.count` might consume the sequence, leaving you unable to add any elements. What do we do?
>
> Enter `underestimateCount()`. We can always call `underestimateCount()` and size to whatever it says we should be. For sequences with easily-calculated counts, this should give us a size that's just right. For sequences where they can kind of estimate the right count (for instance, if you're decoding a fixed-size buffer of UTF-8 bytes, and you know how many bytes there are but not exactly how many characters they'll decode to), it will get us at least part of the way there. For sequences whose size is completely unknown, it won't help but it won't hurt much either.
>
> So you do this:
>
> // Notionally
> struct Array<Element> {
> init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
> self.init()
>
> let fastCount = seq.underestimateCount()
> reserveCapacity(fastCount)
>
> // We'll have to pull elements out manually.
> let generator = seq.generate()
>
> // Load all the fast elements
> for _ in 0..<fastCount {
> let elem = generator.next()!
> _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
> }
>
> // Get anything else that's left
> while let elem = generator.next() {
> appendElement(elem)
> }
> }
> }
>
> (If you're looking at Swift 3, remember: SequenceType is now Sequence, Generator is now Iterator, generate() is now makeIterator(), and underestimateCount() is now underestimatedCount().)
>
> And in fact, this is quite similar to some real source code inside Array:
>
> @warn_unused_result
> internal func _copySequenceToNativeArrayBuffer<
> S : SequenceType
> >(source: S) -> _ContiguousArrayBuffer<S.Generator.Element> {
> let initialCapacity = source.underestimateCount()
> var builder =
> _UnsafePartiallyInitializedContiguousArrayBuffer<S.Generator.Element>(
> initialCapacity: initialCapacity)
>
> var generator = source.generate()
>
> // FIXME(performance): use _initializeTo().
>
> // Add elements up to the initial capacity without checking for regrowth.
> for _ in 0..<initialCapacity {
> builder.addWithExistingCapacity(generator.next()!)
> }
>
> // Add remaining elements, if any.
> while let element = generator.next() {
> builder.add(element)
> }
>
> return builder.finish()
> }
>
> Now, collections *do* promise that you can walk through a sequence more than once, and some collections (ones that promise "random access") promise that `count` can be calculated quickly. But because CollectionType conforms to SequenceType, some code may receive a collection but only know it's a sequence. Fortunately, the standard library provides a default implementation of `underestimateCount` for collections:
>
> public func underestimateCount() -> Int {
> return numericCast(count)
> }
>
> So you just don't have to worry about this on collections. Implement `count` and `understimateCount` will take care of itself.
>
Okay, so… maybe I am unfairly treating your summary as the whole story… returning to the example of the contents of /dev/random, or quadrature data from a wireless receiver, which are supposed to be neither finite nor reproducible, but are indisputably one-thing-after-another — sequences in the common sense of the word …
Am I to gather
1: that the standard API requires that
every Sequence type can be instantiated so as to replicate the contents of another? The whole value of Sequence to the examples is that it should be impossible.
2: that the standard API requires of every Sequence that it should serve the expectation of the supposedly-uncoupled API of Array that it is free to buffer the entire contents of any Sequence of the same Element? Need it or not? Bad news for memory and performance.
The answer may be that if the user knows enough to build a Sequence for which laziness is indispensable, she should have better sense than to rely on those two things, required or not.
But a lifetime of Law and debugging have given me an exceptionally dirty mind. If those really are API, I can avoid them, but I can't expect others — including the standard library — to.
— F
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-users/attachments/20160319/34729802/attachment.html>
More information about the swift-users
mailing list