[swift-dev] Very slow Set<String>(arrayOfStrings) for some arrayOfStrings

Jens Persson jens at bitcycle.com
Wed Mar 2 18:02:13 CST 2016


The following slight modification of the extension, however, makes
test(strings) run as fast as test(caseSwappedStrings) (i.e. 0.07 seconds):

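extension String {
    // Splits the string on "\n" by walking its Unicode scalars and
    // collecting each line in a String.UnicodeScalarView.
    // Note: any text after the last "\n" is not appended to `lines`.
    func componentsSeparatedByNewLineCharacter() -> [String] {
        var lines = [String]()
        var currStr = String.UnicodeScalarView()
        let newLineUCS = UnicodeScalar("\n")
        for ucs in self.unicodeScalars {
            switch ucs {
            case newLineUCS:
                // The only change from the earlier version: append a space.
                lines.append(String(currStr) + " ")
                currStr.removeAll()
            default:
                currStr.append(ucs)
            }
        }
        return lines
    }
}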
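The test(strings) and test(caseSwappedStrings) functions are not shown in this thread. A minimal sketch of what such a harness might look like, assuming it simply times building a Set<String> from the split lines (the function body, the Dispatch-based timing and the sample input are all assumptions, not the original code):

```swift
import Dispatch

// Hypothetical reconstruction of the thread's test(_:) harness:
// it times Set<String>(array) and returns the number of unique strings.
func test(_ strings: [String]) -> Int {
    let start = DispatchTime.now().uptimeNanoseconds
    let set = Set<String>(strings)   // hashes every element of the array
    let elapsed = DispatchTime.now().uptimeNanoseconds - start
    print("Built a set of \(set.count) unique strings in \(Double(elapsed) / 1e9) s")
    return set.count
}

// Stand-in input; in the thread the lines come from a file on disk.
let lines = ["alpha", "beta", "alpha", "gamma"]
let uniqueCount = test(lines)
```

With the real data, the interesting comparison is the same call on the file-loaded lines versus the case-swapped ones.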

Note that the only change is that a space is appended to each line
(+ " ").

So I guess that, for some reason, adding that space sets the String's
isASCII bit. But the strange thing is that if I try to remove the space
again, no matter how I do that, the test(strings) test goes back to taking
2.3 seconds (instead of 0.07 seconds).

It's almost as if there is a cached version of the original String (one
with its isASCII bit cleared) that is reused as soon as I modify the string
in a way that makes it identical to the original.
If so, I'm guessing that String.init(contentsOfFile: path) is to blame (it
creates an NSString-backed String with its isASCII bit cleared), because I'm
unable to reproduce the slow (now 2.3 seconds) behavior without loading
from disk.
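If the NSString backing is indeed the cause, one possible workaround to try is rebuilding each loaded line as a fresh, natively stored String before hashing it. A sketch under that assumption (the helper names isAllASCII and nativeCopy are hypothetical, and allSatisfy / String(decoding:as:) need Swift 4.2 or later, newer than the toolchain discussed here):

```swift
// Hypothetical helpers; not part of the original thread.

/// True if every Unicode scalar in `s` is in the ASCII range (U+0000...U+007F).
func isAllASCII(_ s: String) -> Bool {
    return s.unicodeScalars.allSatisfy { $0.isASCII }
}

/// Builds a new String from the UTF-8 bytes of `s`.
/// Assumption: the copy does not share storage with an NSString-backed
/// original; whether that restores a fast hashing path is untested here.
func nativeCopy(_ s: String) -> String {
    return String(decoding: Array(s.utf8), as: UTF8.self)
}

let loaded = ["first line", "second line"]   // stand-in for lines read from a file
let copies = loaded.map(nativeCopy)
let uniqueLines = Set<String>(copies)
```

String internals changed substantially in later Swift versions, so treat this as an experiment to measure rather than a guaranteed fix.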

/Jens


On Wed, Mar 2, 2016 at 10:02 PM, Jens Persson <jens at bitcycle.com> wrote:

> Interesting, thanks!
> I tried using this extension
> extension String {
>     func componentsSeparatedByNewLineCharacter() -> [String] {
>         var lines = [String]()
>         var currStr = String.UnicodeScalarView()
>         let newLineUCS = UnicodeScalar("\n")
>         for ucs in self.unicodeScalars {
>             switch ucs {
>             case newLineUCS:
>                 lines.append(String(currStr))
>                 currStr.removeAll()
>             default: currStr.append(ucs)
>             }
>         }
>         return lines
>     }
> }
> instead of componentsSeparatedByString("\n")
>
> This made the slow non-caseSwapped test(strings) run in 2.3 seconds
> instead of the previous 9.5 seconds, but that is still relatively slow
> compared to the 0.066 seconds of the test(caseSwappedStrings).
>
> Is there a way to make sure a String in Swift has the isASCII bit set
> (provided the original string contains only ASCII of course)?
>
> /Jens
>
>
> On Wed, Mar 2, 2016 at 7:24 PM, Daniel Duan via swift-dev <swift-dev at swift.org> wrote:
>
>> Arnold Schwaighofer via swift-dev <swift-dev <at> swift.org> writes:
>>
>> >
>> > That is the difference between a “String” type instance that can use the
>> > ASCII fast path and NSString-backed “String” type instances.
>> >
>>
>> This makes total sense now :) I was very mystified by this issue and
>> thought at some point it was a weird bias in the hashing function.
>>
>> Thanks for the insight Arnold.
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev at swift.org
>> https://lists.swift.org/mailman/listinfo/swift-dev
>>
>
>
>
> --
> bitCycle AB | Smedjegatan 12 | 742 32 Östhammar | Sweden
> http://www.bitcycle.com/
> Phone: +46-73-753 24 62
> E-mail: jens at bitcycle.com
>
>

