[swift-dev] Very slow Set<String>(arrayOfStrings) for some arrayOfStrings
Jens Persson
jens at bitcycle.com
Fri Mar 4 04:13:24 CST 2016
(Correction: 0.07 seconds, not 0.7 seconds.)
On Fri, Mar 4, 2016 at 10:33 AM, Jens Persson <jens at bitcycle.com> wrote:
> Another thing I noticed is that making a Set<NSString> (rather than a
> Set<String>) is fast ...
>
> That is (for those same particular strings) this is fast (0.7 s):
> let setOfNSStrings = Set<NSString>(strings.map { $0 as NSString })
>
> while this is slow (9.5 s):
> let setOfNSStrings = Set<String>(strings)
>
> /Jens
>
>
> On Fri, Mar 4, 2016 at 9:48 AM, Jens Persson <jens at bitcycle.com> wrote:
>
>> Are you already tracking/looking into this in someway or should I file a
>> bug?
>> /Jens
>>
>> On Thu, Mar 3, 2016 at 5:55 AM, Arnold <aschwaighofer at apple.com> wrote:
>>
>>> Yes this does not explain what you are seeing. (I could repeat this on
>>> my end)
>>>
>>> If you look at the two runs under instruments: one run with the slow
>>> strings file and one with the fast strings file (I tested with space
>>> appended to each line); you will notice that we spend a lot more time in
>>> the Set's find function doing NSString comparisons. The only explanation I
>>> have for this is hash collisions leading to more comparisons as we are
>>> probing non empty buckets. Why one of the files would trigger this and the
>>> other not makes no sense to me.
>>> This will need looking at.
>>>
>>> Sent from my iPhone
>>>
>>> On Mar 2, 2016, at 6:43 PM, Jens Persson <jens at bitcycle.com> wrote:
>>>
>>> Just thought I should mention that I tried another thing:
>>>
>>> Using my original example:
>>>
>>> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
>>> Keeping the code as it is but adding a space to the end of every line in
>>> the actual text file (strings.txt).
>>>
>>> This actually makes the slow test(strings) run almost as fast as the
>>> fast test(caseSwappedStrings), 0.09 seconds, so about a hundred times
>>> faster just because the text file has an extra space at the end of each
>>> line.
>>>
>>> I just can't see how Arnold Schwaighofer's explanation explains that ...
>>> An added space character at the end of each line of that text file is just
>>> one more ASCII character, how can that make test(strings) so much faster?
>>>
>>> And other operations on the text file makes the slow test faster too,
>>> like eg:
>>> Inserting a space as the first char of each line, 0.9 seconds.
>>> Removing all spaces, 0.9 seconds.
>>> Replacing all spaces with "-" (replacing the dashes back to spaces made
>>> it slow again), 0.9 seconds.
>>> Replacing all "a" with "*", 0.9 seconds.
>>> Swapping case (now in the actual file instead of programmatically), 2.3
>>> seconds.
>>> (I only tried operations that kept the resulting strings unique and
>>> their characters plain ASCII.)
>>>
>>> /Jens
>>>
>>> On Wed, Mar 2, 2016 at 6:28 PM, Arnold Schwaighofer <
>>> aschwaighofer at apple.com> wrote:
>>>
>>>> That is the difference between a “String” type instance that can use
>>>> the ascii fast path and NSString backed “String” type instances.
>>>>
>>>> When the code translates the string (the map function that calls upper
>>>> case) it becomes a natively backed “String” that isASCII.
>>>>
>>>> // The strings in strings.txt are not strange in any way, they all
>>>> match ^[a-zA-Z ]+$.
>>>>
>>>> Set the isASCII bit and things are blazing fast.
>>>>
>>>> The other path is that componentsSeparatedByString(separator: String)
>>>> -> [String] returns an “[String]” where every “String” is backed by
>>>> “NSString”. When it comes to do comparison we call a NSString function on
>>>> the NSString and this is where the difference in performance comes from.
>>>>
>>>>
>>>>
>>>>
>>>> > On Mar 2, 2016, at 12:15 AM, Jens Persson via swift-dev <
>>>> swift-dev at swift.org> wrote:
>>>> >
>>>> > I guess it would be possible to isolate the problem by analyzing my
>>>> example, and thereby making it possible to write a smaller benchmark which
>>>> doesn't need that big textfile. I didn't manage to do that however, hence
>>>> my question about the problem here.
>>>> > /Jens
>>>> >
>>>> > On Wed, Mar 2, 2016 at 9:11 AM, Jens Persson <jens at bitcycle.com>
>>>> wrote:
>>>> > Thanks, but I'll have to invite anyone with more time and experience
>>>> to do that.
>>>> >
>>>> > One (of many) thing(s) I wouldn't know how include/handle is the 1.3
>>>> MB text file for the particular arrayOfStrings that makes
>>>> Set(arrayOfStrings) slow. It seems a bit unnecessary/bloating to put it in
>>>> the code base.
>>>> >
>>>> > /Jens
>>>> >
>>>> > On Tue, Mar 1, 2016 at 11:58 PM, Nadav Rotem <nrotem at apple.com>
>>>> wrote:
>>>> > Hi Jens,
>>>> >
>>>> > Thanks for reporting this issue. I don’t know what’s going on but
>>>> we’ll take a look.
>>>> >
>>>> > I think it would be great if you could add this program as a
>>>> benchmark under swift/benchmarks/. This will allow us to track the
>>>> performance of this test and ensure that we don’t regress.
>>>> >
>>>> > Thanks,
>>>> > Nadav
>>>> >
>>>> >> On Mar 1, 2016, at 5:01 AM, Jens Persson via swift-dev <
>>>> swift-dev at swift.org> wrote:
>>>> >>
>>>> >> For some particular arrays of strings, creating a
>>>> Set<String>(arrayOfStrings) takes about 100 to 200 times longer than for
>>>> other very similar arrays of strings (equally many unique simple [a-zA-Z ]+
>>>> strings).
>>>> >>
>>>> >> I've put together a tiny program to demonstrate the problem here:
>>>> >>
>>>> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
>>>> >>
>>>> >> Is this due to a bug / performance problem in Set or can it be
>>>> explained (and solved) in some way?
>>>> >>
>>>> >> /Jens
>>>> >>
>>>> >> _______________________________________________
>>>> >> swift-dev mailing list
>>>> >> swift-dev at swift.org
>>>> >> https://lists.swift.org/mailman/listinfo/swift-dev
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>>> > http://www.bitcycle.com/
>>>> > Phone: +46-73-753 24 62
>>>> > E-mail: jens at bitcycle.com
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>>> > http://www.bitcycle.com/
>>>> > Phone: +46-73-753 24 62
>>>> > E-mail: jens at bitcycle.com
>>>> >
>>>> > _______________________________________________
>>>> > swift-dev mailing list
>>>> > swift-dev at swift.org
>>>> > https://lists.swift.org/mailman/listinfo/swift-dev
>>>>
>>>>
>>>
>>>
>>> --
>>> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>>> http://www.bitcycle.com/
>>> Phone: +46-73-753 24 62
>>> E-mail: jens at bitcycle.com
>>>
>>>
>>
>>
>> --
>> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
>> http://www.bitcycle.com/
>> Phone: +46-73-753 24 62
>> E-mail: jens at bitcycle.com
>>
>>
>
>
> --
> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> http://www.bitcycle.com/
> Phone: +46-73-753 24 62
> E-mail: jens at bitcycle.com
>
>
--
bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
http://www.bitcycle.com/
Phone: +46-73-753 24 62
E-mail: jens at bitcycle.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20160304/b86d8171/attachment.html>
More information about the swift-dev
mailing list