[swift-dev] Very slow Set<String>(arrayOfStrings) for some arrayOfStrings
Jens Persson
jens at bitcycle.com
Fri Mar 4 10:57:10 CST 2016
Thanks!
On Fri, Mar 4, 2016 at 4:25 PM, Arnold Schwaighofer <aschwaighofer at apple.com
> wrote:
> Tracked by: https://bugs.swift.org/browse/SR-877
>
>
> > On Mar 4, 2016, at 6:10 AM, Arnold via swift-dev <swift-dev at swift.org>
> wrote:
> >
> > Yes. NSString's hash is not the same function as String's hashValue. To
> make matters more confusing when you call hashValue on NSString type you
> actually get
> > NSStrings hash.
> >
> > I have confirmed that we are indeed getting a lot more collisions with
> the bad Strings and I will be looking into this.
> >
> >
> > Sent from my iPhone
> >
> > On Mar 4, 2016, at 5:18 AM, Jens Persson <jens at bitcycle.com> wrote:
> >
> >> If you make a WrappedString type where you can choose to use .hash
> instead of .hashValue, then the below test(wrappedStrings) will be fast if
> .hash is used, and slow if .hashValue is used. Demo-program:
> >>
> >> import QuartzCore
> >> struct WrappedString : Hashable {
> >> let wrapped : String
> >> init(_ s: String) { wrapped = s }
> >> var hashValue: Int {
> >> return wrapped.hash // <-- NOTE: Will make test(wrappedStrings)
> fast.
> >> // return wrapped.hashValue // <-- NOTE: Will make
> test(wrappedStrings) slow.
> >> }
> >> }
> >> func ==(lhs: WrappedString, rhs: WrappedString) -> Bool { return
> lhs.wrapped == rhs.wrapped }
> >> func test<T: Hashable>(array: [T]) {
> >> let t0 = CACurrentMediaTime()
> >> let set = Set(array)
> >> let t1 = CACurrentMediaTime()
> >> print("Created set of", set.count, " elements in", t1 - t0,
> "seconds.")
> >> }
> >> // Download http://sloppyfocus.com/strings.zip, unzip and set the
> correct path here:
> >> let path = "/Users/jens/strings.txt"
> >> do {
> >> let strings = try String.init(contentsOfFile:
> path).componentsSeparatedByString("\n")
> >> print("Loaded", strings.count, "strings.") // Prints: Loaded 88379
> strings.
> >> let wrappedStrings = strings.map { WrappedString($0) }
> >> test(wrappedStrings) // Prints: Created set of 88379 strings in
> 0.0923392080003396 seconds.
> >> test(strings) // Prints: Created set of 88379 strings in
> 9.25633556599496 seconds.
> >> }
> >> catch let e { fatalError(String(e)) }
> >>
> >> /Jens
> >>
> >>
> >> On Fri, Mar 4, 2016 at 11:13 AM, Jens Persson <jens at bitcycle.com>
> wrote:
> >> (Correction: 0.07 seconds, not 0.7 seconds.)
> >>
> >> On Fri, Mar 4, 2016 at 10:33 AM, Jens Persson <jens at bitcycle.com>
> wrote:
> >> Another thing I noticed is that making a Set<NSString> (rather than a
> Set<String>) is fast ...
> >>
> >> That is (for those same particular strings) this is fast (0.7 s):
> >> let setOfNSStrings = Set<NSString>(strings.map { $0 as NSString })
> >>
> >> while this is slow (9.5 s):
> >> let setOfNSStrings = Set<String>(strings)
> >>
> >> /Jens
> >>
> >>
> >> On Fri, Mar 4, 2016 at 9:48 AM, Jens Persson <jens at bitcycle.com> wrote:
> >> Are you already tracking/looking into this in someway or should I file
> a bug?
> >> /Jens
> >>
> >> On Thu, Mar 3, 2016 at 5:55 AM, Arnold <aschwaighofer at apple.com> wrote:
> >> Yes this does not explain what you are seeing. (I could repeat this on
> my end)
> >>
> >> If you look at the two runs under instruments: one run with the slow
> strings file and one with the fast strings file (I tested with space
> appended to each line); you will notice that we spend a lot more time in
> the Set's find function doing NSString comparisons. The only explanation I
> have for this is hash collisions leading to more comparisons as we are
> probing non empty buckets. Why one of the files would trigger this and the
> other not makes no sense to me.
> >> This will need looking at.
> >>
> >> Sent from my iPhone
> >>
> >> On Mar 2, 2016, at 6:43 PM, Jens Persson <jens at bitcycle.com> wrote:
> >>
> >>> Just thought I should mention that I tried another thing:
> >>>
> >>> Using my original example:
> >>>
> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
> >>> Keeping the code as it is but adding a space to the end of every line
> in the actual text file (strings.txt).
> >>>
> >>> This actually makes the slow test(strings) run almost as fast as the
> fast test(caseSwappedStrings), 0.09 seconds, so about a hundred times
> faster just because the text file has an extra space at the end of each
> line.
> >>>
> >>> I just can't see how Arnold Schwaighofer's explanation explains that
> ... An added space character at the end of each line of that text file is
> just one more ASCII character, how can that make test(strings) so much
> faster?
> >>>
> >>> And other operations on the text file makes the slow test faster too,
> like eg:
> >>> Inserting a space as the first char of each line, 0.9 seconds.
> >>> Removing all spaces, 0.9 seconds.
> >>> Replacing all spaces with "-" (replacing the dashes back to spaces
> made it slow again), 0.9 seconds.
> >>> Replacing all "a" with "*", 0.9 seconds.
> >>> Swapping case (now in the actual file instead of programmatically),
> 2.3 seconds.
> >>> (I only tried operations that kept the resulting strings unique and
> their characters plain ASCII.)
> >>>
> >>> /Jens
> >>>
> >>> On Wed, Mar 2, 2016 at 6:28 PM, Arnold Schwaighofer <
> aschwaighofer at apple.com> wrote:
> >>> That is the difference between a “String” type instance that can use
> the ascii fast path and NSString backed “String” type instances.
> >>>
> >>> When the code translates the string (the map function that calls upper
> case) it becomes a natively backed “String” that isASCII.
> >>>
> >>> // The strings in strings.txt are not strange in any way, they all
> match ^[a-zA-Z ]+$.
> >>>
> >>> Set the isASCII bit and things are blazing fast.
> >>>
> >>> The other path is that componentsSeparatedByString(separator: String)
> -> [String] returns an “[String]” where every “String” is backed by
> “NSString”. When it comes to do comparison we call a NSString function on
> the NSString and this is where the difference in performance comes from.
> >>>
> >>>
> >>>
> >>>
> >>> > On Mar 2, 2016, at 12:15 AM, Jens Persson via swift-dev <
> swift-dev at swift.org> wrote:
> >>> >
> >>> > I guess it would be possible to isolate the problem by analyzing my
> example, and thereby making it possible to write a smaller benchmark which
> doesn't need that big textfile. I didn't manage to do that however, hence
> my question about the problem here.
> >>> > /Jens
> >>> >
> >>> > On Wed, Mar 2, 2016 at 9:11 AM, Jens Persson <jens at bitcycle.com>
> wrote:
> >>> > Thanks, but I'll have to invite anyone with more time and experience
> to do that.
> >>> >
> >>> > One (of many) thing(s) I wouldn't know how include/handle is the 1.3
> MB text file for the particular arrayOfStrings that makes
> Set(arrayOfStrings) slow. It seems a bit unnecessary/bloating to put it in
> the code base.
> >>> >
> >>> > /Jens
> >>> >
> >>> > On Tue, Mar 1, 2016 at 11:58 PM, Nadav Rotem <nrotem at apple.com>
> wrote:
> >>> > Hi Jens,
> >>> >
> >>> > Thanks for reporting this issue. I don’t know what’s going on but
> we’ll take a look.
> >>> >
> >>> > I think it would be great if you could add this program as a
> benchmark under swift/benchmarks/. This will allow us to track the
> performance of this test and ensure that we don’t regress.
> >>> >
> >>> > Thanks,
> >>> > Nadav
> >>> >
> >>> >> On Mar 1, 2016, at 5:01 AM, Jens Persson via swift-dev <
> swift-dev at swift.org> wrote:
> >>> >>
> >>> >> For some particular arrays of strings, creating a
> Set<String>(arrayOfStrings) takes about 100 to 200 times longer than for
> other very similar arrays of strings (equally many unique simple [a-zA-Z ]+
> strings).
> >>> >>
> >>> >> I've put together a tiny program to demonstrate the problem here:
> >>> >>
> http://sloppyfocus.com/slowSetFromParticularButSeeminglyNormalArrayOfStrings.html
> >>> >>
> >>> >> Is this due to a bug / performance problem in Set or can it be
> explained (and solved) in some way?
> >>> >>
> >>> >> /Jens
> >>> >>
> >>> >> _______________________________________________
> >>> >> swift-dev mailing list
> >>> >> swift-dev at swift.org
> >>> >> https://lists.swift.org/mailman/listinfo/swift-dev
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> >>> > http://www.bitcycle.com/
> >>> > Phone: +46-73-753 24 62
> >>> > E-mail: jens at bitcycle.com
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> >>> > http://www.bitcycle.com/
> >>> > Phone: +46-73-753 24 62
> >>> > E-mail: jens at bitcycle.com
> >>> >
> >>> > _______________________________________________
> >>> > swift-dev mailing list
> >>> > swift-dev at swift.org
> >>> > https://lists.swift.org/mailman/listinfo/swift-dev
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> >>> http://www.bitcycle.com/
> >>> Phone: +46-73-753 24 62
> >>> E-mail: jens at bitcycle.com
> >>>
> >>
> >>
> >>
> >> --
> >> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> >> http://www.bitcycle.com/
> >> Phone: +46-73-753 24 62
> >> E-mail: jens at bitcycle.com
> >>
> >>
> >>
> >>
> >> --
> >> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> >> http://www.bitcycle.com/
> >> Phone: +46-73-753 24 62
> >> E-mail: jens at bitcycle.com
> >>
> >>
> >>
> >>
> >> --
> >> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> >> http://www.bitcycle.com/
> >> Phone: +46-73-753 24 62
> >> E-mail: jens at bitcycle.com
> >>
> >>
> >>
> >>
> >> --
> >> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> >> http://www.bitcycle.com/
> >> Phone: +46-73-753 24 62
> >> E-mail: jens at bitcycle.com
> >>
> > _______________________________________________
> > swift-dev mailing list
> > swift-dev at swift.org
> > https://lists.swift.org/mailman/listinfo/swift-dev
>
>
--
bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
http://www.bitcycle.com/
Phone: +46-73-753 24 62
E-mail: jens at bitcycle.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-dev/attachments/20160304/7076e659/attachment.html>
More information about the swift-dev
mailing list