[swift-dev] Very slow Set<String>(arrayOfStrings) for some arrayOfStrings

Jens Persson jens at bitcycle.com
Sat Mar 5 02:40:10 CST 2016


Ah, great!

On Saturday, March 5, 2016, Arnold Schwaighofer <aschwaighofer at apple.com>
wrote:

> Fixed in
> https://github.com/apple/swift/commit/ab5defc1248d0703215a810f705da4e717d10600
> .
>
> The original test case runs now as:
>
> ./TestStrings
> Loaded 88380 strings into an array of strings.
> Created set of 88380 strings in 0.0794822110037785 seconds.
> Created set of 88380 strings in 0.110139190001064 seconds
>
>
> > On Mar 4, 2016, at 8:57 AM, Jens Persson <jens at bitcycle.com
> <javascript:;>> wrote:
> >
> > Thanks!
> >
> > On Fri, Mar 4, 2016 at 4:25 PM, Arnold Schwaighofer <
> aschwaighofer at apple.com <javascript:;>> wrote:
> > Tracked by: https://bugs.swift.org/browse/SR-877
> >
> >
> > > On Mar 4, 2016, at 6:10 AM, Arnold via swift-dev <swift-dev at swift.org
> <javascript:;>> wrote:
> > >
> > > Yes. NSString's hash is not the same function as String's hashValue.
> To make matters more confusing when you call hashValue  on NSString type
> you actually get
> > > NSStrings hash.
> > >
> > > I have confirmed that we are indeed getting a lot more collisions with
> the bad Strings  and I will be looking into this.
> > >
> > >
> > > Sent from my iPhone
> > >
> > > On Mar 4, 2016, at 5:18 AM, Jens Persson <jens at bitcycle.com
> <javascript:;>> wrote:
> > >
> > >> If you make a WrappedString type where you can choose to use .hash
> instead of .hashValue, then the below test(wrappedStrings) will be fast if
> .hash is used, and slow if .hashValue is used. Demo-program:
> > >>
> > >> import QuartzCore
> > >> struct WrappedString : Hashable {
> > >>     let wrapped : String
> > >>     init(_ s: String) { wrapped = s }
> > >>     var hashValue: Int {
> > >>         return wrapped.hash // <-- NOTE: Will make
> test(wrappedStrings) fast.
> > >>         // return wrapped.hashValue // <-- NOTE: Will make
> test(wrappedStrings) slow.
> > >>     }
> > >> }
> > >> func ==(lhs: WrappedString, rhs: WrappedString) -> Bool { return
> lhs.wrapped == rhs.wrapped }
> > >> func test<T: Hashable>(array: [T]) {
> > >>     let t0 = CACurrentMediaTime()
> > >>     let set = Set(array)
> > >>     let t1 = CACurrentMediaTime()
> > >>     print("Created set of", set.count, " elements in", t1 - t0,
> "seconds.")
> > >> }
> > >> // Download http://sloppyfocus.com/strings.zip, unzip and set the
> correct path here:
> > >> let path = "/Users/jens/strings.txt"
> > >> do {
> > >>     let strings = try String.init(contentsOfFile:
> path).componentsSeparatedByString("\n")
> > >>     print("Loaded", strings.count, "strings.") // Prints: Loaded
> 88379 strings.
> > >>     let wrappedStrings = strings.map { WrappedString($0) }
> > >>     test(wrappedStrings) // Prints: Created set of 88379 strings in
> 0.0923392080003396 seconds.
> > >>     test(strings)        // Prints: Created set of 88379 strings in
> 9.25633556599496 seconds.
> > >> }
> > >> catch let e { fatalError(String(e)) }
> > >>
> > >> /Jens
> > >>
> > >>
> > >> On Fri, Mar 4, 2016 at 11:13 AM, Jens Persson <jens at bitcycle.com
> <javascript:;>> wrote:
> > >> (Correction: 0.07 seconds, not 0.7 seconds.)
> > >>
> > >> On Fri, Mar 4, 2016 at 10:33 AM, Jens Persson <jens at bitcycle.com
> <javascript:;>> wrote:
> > >> Another thing I noticed is that making a Set<NSString> (rather than a
> Set<String>) is fast ...
> > >>
> > >> That is (for those same particular strings) this is fast (0.7 s):
> > >> let setOfNSStrings = Set<NSString>(strings.map { $0 as NSString })
> > >>
> > >> while this is slow (9.5 s):
> > >> let setOfNSStrings = Set<String>(strings)
> > >>
> > >> /Jens
> > >>
> > >>
> > >> On Fri, Mar 4, 2016 at 9:48 AM, Jens Persson <jens at bitcycle.com
> <javascript:;>> wrote:
> > >> Are you already tracking/looking into this in someway or should I
> file a bug?
> > >> /Jens
> > >>
> > >> On Thu, Mar 3, 2016 at 5:55 AM, Arnold <aschwaighofer at apple.com
> <javascript:;>> wrote:
> > >> Yes this does not explain what you are seeing. (I could repeat this
> on my end)
> > >>
> > >> If you look at the two runs under instruments: one run with the slow
> strings file and one with the fast strings file (I tested with space
> appended to each line); you will notice that we spend a lot more time in
> the Set's find function doing NSString comparisons. The only explanation I
> have for this is hash collisions leading to more comparisons as we are
> probing non empty buckets. Why one of the files would trigger this and the
> other not makes no sense to me.
> > >> This will need looking at.
> > >>
> > >> Sent from my iPhone
> > >>
> > >> On Mar 2, 2016, at 6:43 PM, Jens Persson <jens at bitcycle.com
> <javascript:;>> wrote:
> > >>
> > >>> Just thought I should mention that I tried another thing:
> > >>>
> > >>> Using my original example:
> > >>>
> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
> > >>> Keeping the code as it is but adding a space to the end of every
> line in the actual text file (strings.txt).
> > >>>
> > >>> This actually makes the slow test(strings) run almost as fast as the
> fast test(caseSwappedStrings), 0.09 seconds, so about a hundred times
> faster just because the text file has an extra space at the end of each
> line.
> > >>>
> > >>> I just can't see how Arnold Schwaighofer's explanation explains that
> ... An added space character at the end of each line of that text file is
> just one more ASCII character, how can that make test(strings) so much
> faster?
> > >>>
> > >>> And other operations on the text file makes the slow test faster
> too, like eg:
> > >>> Inserting a space as the first char of each line, 0.9 seconds.
> > >>> Removing all spaces, 0.9 seconds.
> > >>> Replacing all spaces with "-" (replacing the dashes back to spaces
> made it slow again), 0.9 seconds.
> > >>> Replacing all "a" with "*", 0.9 seconds.
> > >>> Swapping case (now in the actual file instead of programmatically),
> 2.3 seconds.
> > >>> (I only tried operations that kept the resulting strings unique and
> their characters plain ASCII.)
> > >>>
> > >>> /Jens
> > >>>
> > >>> On Wed, Mar 2, 2016 at 6:28 PM, Arnold Schwaighofer <
> aschwaighofer at apple.com <javascript:;>> wrote:
> > >>> That is the difference between a “String” type instance that can use
> the ascii fast path and NSString backed “String” type instances.
> > >>>
> > >>> When the code translates the string (the map function that calls
> upper case) it becomes a natively backed “String” that isASCII.
> > >>>
> > >>> // The strings in strings.txt are not strange in any way, they all
> match ^[a-zA-Z ]+$.
> > >>>
> > >>> Set the isASCII bit and things are blazing fast.
> > >>>
> > >>> The other path is that componentsSeparatedByString(separator:
> String) -> [String] returns an “[String]” where every “String” is backed by
> “NSString”. When it comes to do comparison we call a NSString function on
> the NSString and this is where the difference in performance comes from.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> > On Mar 2, 2016, at 12:15 AM, Jens Persson via swift-dev <
> swift-dev at swift.org <javascript:;>> wrote:
> > >>> >
> > >>> > I guess it would be possible to isolate the problem by analyzing
> my example, and thereby making it possible to write a smaller benchmark
> which doesn't need that big textfile. I didn't manage to do that however,
> hence my question about the problem here.
> > >>> > /Jens
> > >>> >
> > >>> > On Wed, Mar 2, 2016 at 9:11 AM, Jens Persson <jens at bitcycle.com
> <javascript:;>> wrote:
> > >>> > Thanks, but I'll have to invite anyone with more time and
> experience to do that.
> > >>> >
> > >>> > One (of many) thing(s) I wouldn't know how include/handle is the
> 1.3 MB text file for the particular arrayOfStrings that makes
> Set(arrayOfStrings) slow. It seems a bit unnecessary/bloating to put it in
> the code base.
> > >>> >
> > >>> > /Jens
> > >>> >
> > >>> > On Tue, Mar 1, 2016 at 11:58 PM, Nadav Rotem <nrotem at apple.com
> <javascript:;>> wrote:
> > >>> > Hi Jens,
> > >>> >
> > >>> > Thanks for reporting this issue. I don’t know what’s going on but
> we’ll take a look.
> > >>> >
> > >>> > I think it would be great if you could add this program as a
> benchmark under swift/benchmarks/.  This will allow us to track the
> performance of this test and ensure that we don’t regress.
> > >>> >
> > >>> > Thanks,
> > >>> > Nadav
> > >>> >
> > >>> >> On Mar 1, 2016, at 5:01 AM, Jens Persson via swift-dev <
> swift-dev at swift.org <javascript:;>> wrote:
> > >>> >>
> > >>> >> For some particular arrays of strings, creating a
> Set<String>(arrayOfStrings) takes about 100 to 200 times longer than for
> other very similar arrays of strings (equally many unique simple [a-zA-Z ]+
> strings).
> > >>> >>
> > >>> >> I've put together a tiny program to demonstrate the problem here:
> > >>> >>
> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
> > >>> >>
> > >>> >> Is this due to a bug / performance problem in Set or can it be
> explained (and solved) in some way?
> > >>> >>
> > >>> >> /Jens
> > >>> >>
> > >>> >> _______________________________________________
> > >>> >> swift-dev mailing list
> > >>> >> swift-dev at swift.org <javascript:;>
> > >>> >> https://lists.swift.org/mailman/listinfo/swift-dev
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> > >>> > http://www.bitcycle.com/
> > >>> > Phone: +46-73-753 24 62
> > >>> > E-mail: jens at bitcycle.com <javascript:;>
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> > >>> > http://www.bitcycle.com/
> > >>> > Phone: +46-73-753 24 62
> > >>> > E-mail: jens at bitcycle.com <javascript:;>
> > >>> >
> > >>> > _______________________________________________
> > >>> > swift-dev mailing list
> > >>> > swift-dev at swift.org <javascript:;>
> > >>> > https://lists.swift.org/mailman/listinfo/swift-dev
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> > >>> http://www.bitcycle.com/
> > >>> Phone: +46-73-753 24 62
> > >>> E-mail: jens at bitcycle.com <javascript:;>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> > >> http://www.bitcycle.com/
> > >> Phone: +46-73-753 24 62
> > >> E-mail: jens at bitcycle.com <javascript:;>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> > >> http://www.bitcycle.com/
> > >> Phone: +46-73-753 24 62
> > >> E-mail: jens at bitcycle.com <javascript:;>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> > >> http://www.bitcycle.com/
> > >> Phone: +46-73-753 24 62
> > >> E-mail: jens at bitcycle.com <javascript:;>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> > >> http://www.bitcycle.com/
> > >> Phone: +46-73-753 24 62
> > >> E-mail: jens at bitcycle.com <javascript:;>
> > >>
> > > _______________________________________________
> > > swift-dev mailing list
> > > swift-dev at swift.org <javascript:;>
> > > https://lists.swift.org/mailman/listinfo/swift-dev
> >
> >
> >
> >
> > --
> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> > http://www.bitcycle.com/
> > Phone: +46-73-753 24 62
> > E-mail: jens at bitcycle.com <javascript:;>
> >
>
>

-- 
bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
http://www.bitcycle.com/
Phone: +46-73-753 24 62
E-mail: jens at bitcycle.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20160305/e84701a4/attachment.html>


More information about the swift-dev mailing list