[swift-users] Collection underestimatedCount() does _?
Will Stanton
willstanton1 at yahoo.com
Sat Mar 19 17:27:53 CDT 2016
Hello Brent,
Thanks kindly for the flair!
You gave cases for which `underestimatedCount()` is used:
> For sequences with easily-calculated counts, this should give us a size that's just right. For sequences where they can kind of estimate the right count (for instance, if you're decoding a fixed-size buffer of UTF-8 bytes, and you know how many bytes there are but not exactly how many characters they'll decode to), it will get us at least part of the way there. For sequences whose size is completely unknown, it won't help but it won't hurt much either.
CC’ing swift-dev, I’m wondering:
Was there a reason `underestimatedCount()` was chosen for Sequence instead of `estimatedCount()`. I suppose an overestimate would be bad for iterating `0..<estimatedCount()`, but I couldn’t find any existing loops over `0..<underestimatedCount()`.
Does anything depend on an underestimate of Sequence count? And if not, would it be better to be less restrictive when asking a Sequence for a size estimate?
Since the exact count is returned for collections, the name also isn’t very precise.
Regards,
Will Stanton
> On Mar 19, 2016, at 3:53 PM, Brent Royal-Gordon via swift-users <swift-users at swift.org> wrote:
>
>>> I might have missed something since a lot of the results are for tests <https://github.com/apple/swift/search?utf8=✓&q=underestimatedCount&type=Code>
>>> But why would `underestimatedCount` ever be used instead of `count`? Don’t most collections have O(1) `count` anyway?
>>
>> Hi Will,
>>
>> This API comes from Sequence, which does not have a count that you can
>> inspect without possibly consuming the sequence.
>
> To add some color:
>
> A sequence merely says that it has a bunch of elements and you can walk through them. It does *not* promise several important things:
>
> * That you can get the elements in the sequence more than once
> * That it knows the number of elements in the sequence and can quickly return them
> * That you can get the number of elements in the sequence without walking through and counting them one by one
>
> This adds up to an ugly conclusion: With just a sequence, it is impossible to efficiently discover the count, and doing so might destroy the sequence.
>
> That's obviously not so great. In particular, it's not so great when you want to load a sequence into an array. The simple, naive way is to do this:
>
> // Notionally
> struct Array<Element> {
> init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
> self.init()
>
> for elem in seq {
> appendElement(elem)
> }
> }
> }
>
> But this may resize the array several times, which would be very slow. The fastest way to do that will be to pre-allocate the exact right number of elements so you don't have to keep resizing the array:
>
> // Notionally
> struct Array<Element> {
> init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
> self.init()
>
> reserveCapacity(seq.count)
> for elem in seq {
> _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
> }
> }
> }
>
> But `seq.count` might consume the sequence, leaving you unable to add any elements. What do we do?
>
> Enter `underestimateCount()`. We can always call `underestimateCount()` and size to whatever it says we should be. For sequences with easily-calculated counts, this should give us a size that's just right. For sequences where they can kind of estimate the right count (for instance, if you're decoding a fixed-size buffer of UTF-8 bytes, and you know how many bytes there are but not exactly how many characters they'll decode to), it will get us at least part of the way there. For sequences whose size is completely unknown, it won't help but it won't hurt much either.
>
> So you do this:
>
> // Notionally
> struct Array<Element> {
> init<Seq: SequenceType where Seq.Generator.Element == Element>(_ seq: Seq) {
> self.init()
>
> let fastCount = seq.underestimateCount()
> reserveCapacity(fastCount)
>
> // We'll have to pull elements out manually.
> let generator = seq.generate()
>
> // Load all the fast elements
> for _ in 0..<fastCount {
> let elem = generator.next()!
> _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
> }
>
> // Get anything else that's left
> while let elem = generator.next() {
> appendElement(elem)
> }
> }
> }
>
> (If you're looking at Swift 3, remember: SequenceType is now Sequence, Generator is now Iterator, generate() is now makeIterator(), and underestimateCount() is now underestimatedCount().)
>
> And in fact, this is quite similar to some real source code inside Array:
>
> @warn_unused_result
> internal func _copySequenceToNativeArrayBuffer<
> S : SequenceType
> >(source: S) -> _ContiguousArrayBuffer<S.Generator.Element> {
> let initialCapacity = source.underestimateCount()
> var builder =
> _UnsafePartiallyInitializedContiguousArrayBuffer<S.Generator.Element>(
> initialCapacity: initialCapacity)
>
> var generator = source.generate()
>
> // FIXME(performance): use _initializeTo().
>
> // Add elements up to the initial capacity without checking for regrowth.
> for _ in 0..<initialCapacity {
> builder.addWithExistingCapacity(generator.next()!)
> }
>
> // Add remaining elements, if any.
> while let element = generator.next() {
> builder.add(element)
> }
>
> return builder.finish()
> }
>
> Now, collections *do* promise that you can walk through a sequence more than once, and some collections (ones that promise "random access") promise that `count` can be calculated quickly. But because CollectionType conforms to SequenceType, some code may receive a collection but only know it's a sequence. Fortunately, the standard library provides a default implementation of `underestimateCount` for collections:
>
> public func underestimateCount() -> Int {
> return numericCast(count)
> }
>
> So you just don't have to worry about this on collections. Implement `count` and `understimateCount` will take care of itself.
>
> HTH,
> --
> Brent Royal-Gordon
> Architechies
More information about the swift-users
mailing list