[swift-users] Collection underestimatedCount() does _?

Brent Royal-Gordon brent at architechies.com
Sat Mar 19 14:53:49 CDT 2016


>> I might have missed something since a lot of the results are for tests <https://github.com/apple/swift/search?utf8=✓&q=underestimatedCount&type=Code>
>> But why would `underestimatedCount` ever be used instead of `count`? Don’t most collections have O(1) `count` anyway?
> 
> Hi Will,
> 
> This API comes from Sequence, which does not have a count that you can
> inspect without possibly consuming the sequence.

To add some color:

A sequence merely says that it has a bunch of elements and you can walk through them. It does *not* promise several important things:

* That you can get the elements in the sequence more than once
* That it knows the number of elements in the sequence and can quickly return them
* That you can get the number of elements in the sequence without walking through and counting them one by one

This adds up to an ugly conclusion: With just a sequence, it is impossible to efficiently discover the count, and doing so might destroy the sequence.

That's obviously not so great. In particular, it's not so great when you want to load a sequence into an array. The simple, naive way is to do this:

	// Notionally
	struct Array<Element> {
		init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
			self.init()
			
			for elem in seq {
				appendElement(elem)
			}
		}
	}

But this may resize the array several times, which would be very slow. The fastest way to do that will be to pre-allocate the exact right number of elements so you don't have to keep resizing the array:

	// Notionally
	struct Array<Element> {
		init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
			self.init()
			
			reserveCapacity(seq.count)
			for elem in seq {
				_appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
			}
		}
	}

But `seq.count` might consume the sequence, leaving you unable to add any elements. What do we do?

Enter `underestimateCount()`. We can always call `underestimateCount()` and size to whatever it says we should be. For sequences with easily-calculated counts, this should give us a size that's just right. For sequences where they can kind of estimate the right count (for instance, if you're decoding a fixed-size buffer of UTF-8 bytes, and you know how many bytes there are but not exactly how many characters they'll decode to), it will get us at least part of the way there. For sequences whose size is completely unknown, it won't help but it won't hurt much either.

So you do this:

	// Notionally
	struct Array<Element> {
		init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
			self.init()
			
			let fastCount = seq.underestimateCount()
			reserveCapacity(fastCount)
			
			// We'll have to pull elements out manually.
			let generator = seq.generate()
			
			// Load all the fast elements
			for _ in 0..<fastCount {
				let elem = generator.next()!
				_appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
			}
			
			// Get anything else that's left
			while let elem = generator.next() {
				appendElement(elem)
			}
		}
	}

(If you're looking at Swift 3, remember: SequenceType is now Sequence, Generator is now Iterator, generate() is now makeIterator(), and underestimateCount() is now underestimatedCount().)

And in fact, this is quite similar to some real source code inside Array:

	@warn_unused_result
	internal func _copySequenceToNativeArrayBuffer<
	  S : SequenceType
	>(source: S) -> _ContiguousArrayBuffer<S.Generator.Element> {
	  let initialCapacity = source.underestimateCount()
	  var builder =
	    _UnsafePartiallyInitializedContiguousArrayBuffer<S.Generator.Element>(
	      initialCapacity: initialCapacity)
	
	  var generator = source.generate()
	
	  // FIXME(performance): use _initializeTo().
	
	  // Add elements up to the initial capacity without checking for regrowth.
	  for _ in 0..<initialCapacity {
	    builder.addWithExistingCapacity(generator.next()!)
	  }
	
	  // Add remaining elements, if any.
	  while let element = generator.next() {
	    builder.add(element)
	  }
	
	  return builder.finish()
	}

Now, collections *do* promise that you can walk through a sequence more than once, and some collections (ones that promise "random access") promise that `count` can be calculated quickly. But because CollectionType conforms to SequenceType, some code may receive a collection but only know it's a sequence. Fortunately, the standard library provides a default implementation of `underestimateCount` for collections:

	  public func underestimateCount() -> Int {
	    return numericCast(count)
	  }

So you just don't have to worry about this on collections. Implement `count` and `understimateCount` will take care of itself.

HTH,
-- 
Brent Royal-Gordon
Architechies



More information about the swift-users mailing list