[swift-dev] Very slow Set<String>(arrayOfStrings) for some arrayOfStrings

Jens Persson jens at bitcycle.com
Fri Mar 4 03:33:51 CST 2016


Another thing I noticed is that making a Set<NSString> (rather than a
Set<String>) is fast ...

That is (for those same particular strings) this is fast (0.7 s):
let setOfNSStrings = Set<NSString>(strings.map { $0 as NSString })

while this is slow (9.5 s):
let setOfNSStrings = Set<String>(strings)

/Jens


On Fri, Mar 4, 2016 at 9:48 AM, Jens Persson <jens at bitcycle.com> wrote:

> Are you already tracking/looking into this in someway or should I file a
> bug?
> /Jens
>
> On Thu, Mar 3, 2016 at 5:55 AM, Arnold <aschwaighofer at apple.com> wrote:
>
>> Yes this does not explain what you are seeing. (I could repeat this on my
>> end)
>>
>> If you look at the two runs under instruments: one run with the slow
>> strings file and one with the fast strings file (I tested with space
>> appended to each line); you will notice that we spend a lot more time in
>> the Set's find function doing NSString comparisons. The only explanation I
>> have for this is hash collisions leading to more comparisons as we are
>> probing non empty buckets. Why one of the files would trigger this and the
>> other not makes no sense to me.
>> This will need looking at.
>>
>> Sent from my iPhone
>>
>> On Mar 2, 2016, at 6:43 PM, Jens Persson <jens at bitcycle.com> wrote:
>>
>> Just thought I should mention that I tried another thing:
>>
>> Using my original example:
>>
>> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
>> Keeping the code as it is but adding a space to the end of every line in
>> the actual text file (strings.txt).
>>
>> This actually makes the slow test(strings) run almost as fast as the fast
>> test(caseSwappedStrings), 0.09 seconds, so about a hundred times faster
>> just because the text file has an extra space at the end of each line.
>>
>> I just can't see how Arnold Schwaighofer's explanation explains that ...
>> An added space character at the end of each line of that text file is just
>> one more ASCII character, how can that make test(strings) so much faster?
>>
>> And other operations on the text file makes the slow test faster too,
>> like eg:
>> Inserting a space as the first char of each line, 0.9 seconds.
>> Removing all spaces, 0.9 seconds.
>> Replacing all spaces with "-" (replacing the dashes back to spaces made
>> it slow again), 0.9 seconds.
>> Replacing all "a" with "*", 0.9 seconds.
>> Swapping case (now in the actual file instead of programmatically), 2.3
>> seconds.
>> (I only tried operations that kept the resulting strings unique and their
>> characters plain ASCII.)
>>
>> /Jens
>>
>> On Wed, Mar 2, 2016 at 6:28 PM, Arnold Schwaighofer <
>> aschwaighofer at apple.com> wrote:
>>
>>> That is the difference between a “String” type instance that can use the
>>> ascii fast path and NSString backed “String” type instances.
>>>
>>> When the code translates the string (the map function that calls upper
>>> case) it becomes a natively backed “String” that isASCII.
>>>
>>> // The strings in strings.txt are not strange in any way, they all match
>>> ^[a-zA-Z ]+$.
>>>
>>> Set the isASCII bit and things are blazing fast.
>>>
>>> The other path is that componentsSeparatedByString(separator: String) ->
>>> [String] returns an “[String]” where every “String” is backed by
>>> “NSString”. When it comes to do comparison we call a NSString function on
>>> the NSString and this is where the difference in performance comes from.
>>>
>>>
>>>
>>>
>>> > On Mar 2, 2016, at 12:15 AM, Jens Persson via swift-dev <
>>> swift-dev at swift.org> wrote:
>>> >
>>> > I guess it would be possible to isolate the problem by analyzing my
>>> example, and thereby making it possible to write a smaller benchmark which
>>> doesn't need that big textfile. I didn't manage to do that however, hence
>>> my question about the problem here.
>>> > /Jens
>>> >
>>> > On Wed, Mar 2, 2016 at 9:11 AM, Jens Persson <jens at bitcycle.com>
>>> wrote:
>>> > Thanks, but I'll have to invite anyone with more time and experience
>>> to do that.
>>> >
>>> > One (of many) thing(s) I wouldn't know how include/handle is the 1.3
>>> MB text file for the particular arrayOfStrings that makes
>>> Set(arrayOfStrings) slow. It seems a bit unnecessary/bloating to put it in
>>> the code base.
>>> >
>>> > /Jens
>>> >
>>> > On Tue, Mar 1, 2016 at 11:58 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>> > Hi Jens,
>>> >
>>> > Thanks for reporting this issue. I don’t know what’s going on but
>>> we’ll take a look.
>>> >
>>> > I think it would be great if you could add this program as a benchmark
>>> under swift/benchmarks/.  This will allow us to track the performance of
>>> this test and ensure that we don’t regress.
>>> >
>>> > Thanks,
>>> > Nadav
>>> >
>>> >> On Mar 1, 2016, at 5:01 AM, Jens Persson via swift-dev <
>>> swift-dev at swift.org> wrote:
>>> >>
>>> >> For some particular arrays of strings, creating a
>>> Set<String>(arrayOfStrings) takes about 100 to 200 times longer than for
>>> other very similar arrays of strings (equally many unique simple [a-zA-Z ]+
>>> strings).
>>> >>
>>> >> I've put together a tiny program to demonstrate the problem here:
>>> >>
>>> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
>>> >>
>>> >> Is this due to a bug / performance problem in Set or can it be
>>> explained (and solved) in some way?
>>> >>
>>> >> /Jens
>>> >>
>>> >> _______________________________________________
>>> >> swift-dev mailing list
>>> >> swift-dev at swift.org
>>> >> https://lists.swift.org/mailman/listinfo/swift-dev
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>> > http://www.bitcycle.com/
>>> > Phone: +46-73-753 24 62
>>> > E-mail: jens at bitcycle.com
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>> > http://www.bitcycle.com/
>>> > Phone: +46-73-753 24 62
>>> > E-mail: jens at bitcycle.com
>>> >
>>> > _______________________________________________
>>> > swift-dev mailing list
>>> > swift-dev at swift.org
>>> > https://lists.swift.org/mailman/listinfo/swift-dev
>>>
>>>
>>
>>
>> --
>> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>> http://www.bitcycle.com/
>> Phone: +46-73-753 24 62
>> E-mail: jens at bitcycle.com
>>
>>
>
>
> --
> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> http://www.bitcycle.com/
> Phone: +46-73-753 24 62
> E-mail: jens at bitcycle.com
>
>


-- 
bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
http://www.bitcycle.com/
Phone: +46-73-753 24 62
E-mail: jens at bitcycle.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20160304/96963eac/attachment.html>


More information about the swift-dev mailing list