[swift-dev] Very slow Set<String>(arrayOfStrings) for some arrayOfStrings

Jens Persson jens at bitcycle.com
Fri Mar 4 07:18:09 CST 2016


If you make a WrappedString type where you can choose to use .hash instead
of .hashValue, then the below test(wrappedStrings) will be fast if .hash is
used, and slow if .hashValue is used. Demo-program:

import QuartzCore
struct WrappedString : Hashable {
    let wrapped : String
    init(_ s: String) { wrapped = s }
    var hashValue: Int {
        return wrapped.hash // <-- NOTE: Will make test(wrappedStrings)
fast.
        // return wrapped.hashValue // <-- NOTE: Will make
test(wrappedStrings) slow.
    }
}
func ==(lhs: WrappedString, rhs: WrappedString) -> Bool { return
lhs.wrapped == rhs.wrapped }
func test<T: Hashable>(array: [T]) {
    let t0 = CACurrentMediaTime()
    let set = Set(array)
    let t1 = CACurrentMediaTime()
    print("Created set of", set.count, " elements in", t1 - t0, "seconds.")
}
// Download http://sloppyfocus.com/strings.zip, unzip and set the correct
path here:
let path = "/Users/jens/strings.txt"
do {
    let strings = try String.init(contentsOfFile:
path).componentsSeparatedByString("\n")
    print("Loaded", strings.count, "strings.") // Prints: Loaded 88379
strings.
    let wrappedStrings = strings.map { WrappedString($0) }
    test(wrappedStrings) // Prints: Created set of 88379 strings in
0.0923392080003396 seconds.
    test(strings)        // Prints: Created set of 88379 strings in
9.25633556599496 seconds.
}
catch let e { fatalError(String(e)) }

/Jens


On Fri, Mar 4, 2016 at 11:13 AM, Jens Persson <jens at bitcycle.com> wrote:

> (Correction: 0.07 seconds, not 0.7 seconds.)
>
> On Fri, Mar 4, 2016 at 10:33 AM, Jens Persson <jens at bitcycle.com> wrote:
>
>> Another thing I noticed is that making a Set<NSString> (rather than a
>> Set<String>) is fast ...
>>
>> That is (for those same particular strings) this is fast (0.7 s):
>> let setOfNSStrings = Set<NSString>(strings.map { $0 as NSString })
>>
>> while this is slow (9.5 s):
>> let setOfNSStrings = Set<String>(strings)
>>
>> /Jens
>>
>>
>> On Fri, Mar 4, 2016 at 9:48 AM, Jens Persson <jens at bitcycle.com> wrote:
>>
>>> Are you already tracking/looking into this in someway or should I file a
>>> bug?
>>> /Jens
>>>
>>> On Thu, Mar 3, 2016 at 5:55 AM, Arnold <aschwaighofer at apple.com> wrote:
>>>
>>>> Yes this does not explain what you are seeing. (I could repeat this on
>>>> my end)
>>>>
>>>> If you look at the two runs under instruments: one run with the slow
>>>> strings file and one with the fast strings file (I tested with space
>>>> appended to each line); you will notice that we spend a lot more time in
>>>> the Set's find function doing NSString comparisons. The only explanation I
>>>> have for this is hash collisions leading to more comparisons as we are
>>>> probing non empty buckets. Why one of the files would trigger this and the
>>>> other not makes no sense to me.
>>>> This will need looking at.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Mar 2, 2016, at 6:43 PM, Jens Persson <jens at bitcycle.com> wrote:
>>>>
>>>> Just thought I should mention that I tried another thing:
>>>>
>>>> Using my original example:
>>>>
>>>> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
>>>> Keeping the code as it is but adding a space to the end of every line
>>>> in the actual text file (strings.txt).
>>>>
>>>> This actually makes the slow test(strings) run almost as fast as the
>>>> fast test(caseSwappedStrings), 0.09 seconds, so about a hundred times
>>>> faster just because the text file has an extra space at the end of each
>>>> line.
>>>>
>>>> I just can't see how Arnold Schwaighofer's explanation explains that
>>>> ... An added space character at the end of each line of that text file is
>>>> just one more ASCII character, how can that make test(strings) so much
>>>> faster?
>>>>
>>>> And other operations on the text file makes the slow test faster too,
>>>> like eg:
>>>> Inserting a space as the first char of each line, 0.9 seconds.
>>>> Removing all spaces, 0.9 seconds.
>>>> Replacing all spaces with "-" (replacing the dashes back to spaces made
>>>> it slow again), 0.9 seconds.
>>>> Replacing all "a" with "*", 0.9 seconds.
>>>> Swapping case (now in the actual file instead of programmatically), 2.3
>>>> seconds.
>>>> (I only tried operations that kept the resulting strings unique and
>>>> their characters plain ASCII.)
>>>>
>>>> /Jens
>>>>
>>>> On Wed, Mar 2, 2016 at 6:28 PM, Arnold Schwaighofer <
>>>> aschwaighofer at apple.com> wrote:
>>>>
>>>>> That is the difference between a “String” type instance that can use
>>>>> the ascii fast path and NSString backed “String” type instances.
>>>>>
>>>>> When the code translates the string (the map function that calls upper
>>>>> case) it becomes a natively backed “String” that isASCII.
>>>>>
>>>>> // The strings in strings.txt are not strange in any way, they all
>>>>> match ^[a-zA-Z ]+$.
>>>>>
>>>>> Set the isASCII bit and things are blazing fast.
>>>>>
>>>>> The other path is that componentsSeparatedByString(separator: String)
>>>>> -> [String] returns an “[String]” where every “String” is backed by
>>>>> “NSString”. When it comes to do comparison we call a NSString function on
>>>>> the NSString and this is where the difference in performance comes from.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> > On Mar 2, 2016, at 12:15 AM, Jens Persson via swift-dev <
>>>>> swift-dev at swift.org> wrote:
>>>>> >
>>>>> > I guess it would be possible to isolate the problem by analyzing my
>>>>> example, and thereby making it possible to write a smaller benchmark which
>>>>> doesn't need that big textfile. I didn't manage to do that however, hence
>>>>> my question about the problem here.
>>>>> > /Jens
>>>>> >
>>>>> > On Wed, Mar 2, 2016 at 9:11 AM, Jens Persson <jens at bitcycle.com>
>>>>> wrote:
>>>>> > Thanks, but I'll have to invite anyone with more time and experience
>>>>> to do that.
>>>>> >
>>>>> > One (of many) thing(s) I wouldn't know how include/handle is the 1.3
>>>>> MB text file for the particular arrayOfStrings that makes
>>>>> Set(arrayOfStrings) slow. It seems a bit unnecessary/bloating to put it in
>>>>> the code base.
>>>>> >
>>>>> > /Jens
>>>>> >
>>>>> > On Tue, Mar 1, 2016 at 11:58 PM, Nadav Rotem <nrotem at apple.com>
>>>>> wrote:
>>>>> > Hi Jens,
>>>>> >
>>>>> > Thanks for reporting this issue. I don’t know what’s going on but
>>>>> we’ll take a look.
>>>>> >
>>>>> > I think it would be great if you could add this program as a
>>>>> benchmark under swift/benchmarks/.  This will allow us to track the
>>>>> performance of this test and ensure that we don’t regress.
>>>>> >
>>>>> > Thanks,
>>>>> > Nadav
>>>>> >
>>>>> >> On Mar 1, 2016, at 5:01 AM, Jens Persson via swift-dev <
>>>>> swift-dev at swift.org> wrote:
>>>>> >>
>>>>> >> For some particular arrays of strings, creating a
>>>>> Set<String>(arrayOfStrings) takes about 100 to 200 times longer than for
>>>>> other very similar arrays of strings (equally many unique simple [a-zA-Z ]+
>>>>> strings).
>>>>> >>
>>>>> >> I've put together a tiny program to demonstrate the problem here:
>>>>> >>
>>>>> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
>>>>> >>
>>>>> >> Is this due to a bug / performance problem in Set or can it be
>>>>> explained (and solved) in some way?
>>>>> >>
>>>>> >> /Jens
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> swift-dev mailing list
>>>>> >> swift-dev at swift.org
>>>>> >> https://lists.swift.org/mailman/listinfo/swift-dev
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>>>> > http://www.bitcycle.com/
>>>>> > Phone: +46-73-753 24 62
>>>>> > E-mail: jens at bitcycle.com
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>>>> > http://www.bitcycle.com/
>>>>> > Phone: +46-73-753 24 62
>>>>> > E-mail: jens at bitcycle.com
>>>>> >
>>>>> > _______________________________________________
>>>>> > swift-dev mailing list
>>>>> > swift-dev at swift.org
>>>>> > https://lists.swift.org/mailman/listinfo/swift-dev
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>>> http://www.bitcycle.com/
>>>> Phone: +46-73-753 24 62
>>>> E-mail: jens at bitcycle.com
>>>>
>>>>
>>>
>>>
>>> --
>>> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>> http://www.bitcycle.com/
>>> Phone: +46-73-753 24 62
>>> E-mail: jens at bitcycle.com
>>>
>>>
>>
>>
>> --
>> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>> http://www.bitcycle.com/
>> Phone: +46-73-753 24 62
>> E-mail: jens at bitcycle.com
>>
>>
>
>
> --
> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> http://www.bitcycle.com/
> Phone: +46-73-753 24 62
> E-mail: jens at bitcycle.com
>
>


-- 
bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
http://www.bitcycle.com/
Phone: +46-73-753 24 62
E-mail: jens at bitcycle.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20160304/18f1a006/attachment.html>


More information about the swift-dev mailing list