[swift-dev] Very slow Set<String>(arrayOfStrings) for some arrayOfStrings
Arnold
aschwaighofer at apple.com
Wed Mar 2 22:55:36 CST 2016
Yes this does not explain what you are seeing. (I could repeat this on my end)
If you look at the two runs under instruments: one run with the slow strings file and one with the fast strings file (I tested with space appended to each line); you will notice that we spend a lot more time in the Set's find function doing NSString comparisons. The only explanation I have for this is hash collisions leading to more comparisons as we are probing non empty buckets. Why one of the files would trigger this and the other not makes no sense to me.
This will need looking at.
Sent from my iPhone
> On Mar 2, 2016, at 6:43 PM, Jens Persson <jens at bitcycle.com> wrote:
>
> Just thought I should mention that I tried another thing:
>
> Using my original example:
> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
> Keeping the code as it is but adding a space to the end of every line in the actual text file (strings.txt).
>
> This actually makes the slow test(strings) run almost as fast as the fast test(caseSwappedStrings), 0.09 seconds, so about a hundred times faster just because the text file has an extra space at the end of each line.
>
> I just can't see how Arnold Schwaighofer's explanation explains that ... An added space character at the end of each line of that text file is just one more ASCII character, how can that make test(strings) so much faster?
>
> And other operations on the text file makes the slow test faster too, like eg:
> Inserting a space as the first char of each line, 0.9 seconds.
> Removing all spaces, 0.9 seconds.
> Replacing all spaces with "-" (replacing the dashes back to spaces made it slow again), 0.9 seconds.
> Replacing all "a" with "*", 0.9 seconds.
> Swapping case (now in the actual file instead of programmatically), 2.3 seconds.
> (I only tried operations that kept the resulting strings unique and their characters plain ASCII.)
>
> /Jens
>
>> On Wed, Mar 2, 2016 at 6:28 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>> That is the difference between a “String” type instance that can use the ascii fast path and NSString backed “String” type instances.
>>
>> When the code translates the string (the map function that calls upper case) it becomes a natively backed “String” that isASCII.
>>
>> // The strings in strings.txt are not strange in any way, they all match ^[a-zA-Z ]+$.
>>
>> Set the isASCII bit and things are blazing fast.
>>
>> The other path is that componentsSeparatedByString(separator: String) -> [String] returns an “[String]” where every “String” is backed by “NSString”. When it comes to do comparison we call a NSString function on the NSString and this is where the difference in performance comes from.
>>
>>
>>
>>
>> > On Mar 2, 2016, at 12:15 AM, Jens Persson via swift-dev <swift-dev at swift.org> wrote:
>> >
>> > I guess it would be possible to isolate the problem by analyzing my example, and thereby making it possible to write a smaller benchmark which doesn't need that big textfile. I didn't manage to do that however, hence my question about the problem here.
>> > /Jens
>> >
>> > On Wed, Mar 2, 2016 at 9:11 AM, Jens Persson <jens at bitcycle.com> wrote:
>> > Thanks, but I'll have to invite anyone with more time and experience to do that.
>> >
>> > One (of many) thing(s) I wouldn't know how include/handle is the 1.3 MB text file for the particular arrayOfStrings that makes Set(arrayOfStrings) slow. It seems a bit unnecessary/bloating to put it in the code base.
>> >
>> > /Jens
>> >
>> > On Tue, Mar 1, 2016 at 11:58 PM, Nadav Rotem <nrotem at apple.com> wrote:
>> > Hi Jens,
>> >
>> > Thanks for reporting this issue. I don’t know what’s going on but we’ll take a look.
>> >
>> > I think it would be great if you could add this program as a benchmark under swift/benchmarks/. This will allow us to track the performance of this test and ensure that we don’t regress.
>> >
>> > Thanks,
>> > Nadav
>> >
>> >> On Mar 1, 2016, at 5:01 AM, Jens Persson via swift-dev <swift-dev at swift.org> wrote:
>> >>
>> >> For some particular arrays of strings, creating a Set<String>(arrayOfStrings) takes about 100 to 200 times longer than for other very similar arrays of strings (equally many unique simple [a-zA-Z ]+ strings).
>> >>
>> >> I've put together a tiny program to demonstrate the problem here:
>> >> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
>> >>
>> >> Is this due to a bug / performance problem in Set or can it be explained (and solved) in some way?
>> >>
>> >> /Jens
>> >>
>> >> _______________________________________________
>> >> swift-dev mailing list
>> >> swift-dev at swift.org
>> >> https://lists.swift.org/mailman/listinfo/swift-dev
>> >
>> >
>> >
>> >
>> > --
>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>> > http://www.bitcycle.com/
>> > Phone: +46-73-753 24 62
>> > E-mail: jens at bitcycle.com
>> >
>> >
>> >
>> >
>> > --
>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>> > http://www.bitcycle.com/
>> > Phone: +46-73-753 24 62
>> > E-mail: jens at bitcycle.com
>> >
>> > _______________________________________________
>> > swift-dev mailing list
>> > swift-dev at swift.org
>> > https://lists.swift.org/mailman/listinfo/swift-dev
>
>
>
> --
> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> http://www.bitcycle.com/
> Phone: +46-73-753 24 62
> E-mail: jens at bitcycle.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20160302/993dbdba/attachment.html>
More information about the swift-dev
mailing list