[swift-evolution] [Draft proposal] Faster/lower-level external String initialization

Charles Kissinger crk at akkyra.com
Wed Feb 3 15:09:31 CST 2016


> On Feb 3, 2016, at 10:18 AM, Zach Waldowski <zach at waldowski.me> wrote:
> 
> Charles —
> 
> This certainly makes a lot of sense. My primary response is that I think
> the bad behavior of reserveCapacity should be reported by one of us as a
> bug.

From the little bit of poking around that I’ve done, the problem might not lie with reserveCapacity() itself. It *appears* that when calling String.append(_:Character), each character is converted to a String and then concatenated. So the savings in memory allocations that reserveCapacity() provides might be swamped by the per-character allocations of temporary strings. I only took a quick look, though, and haven’t verified this.
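Here is a minimal sketch of the two call patterns being compared. It is written in present-day Swift rather than the Swift 2 spelling used in this thread, so it uses `append(contentsOf:)` instead of `appendContentsOf`. The function names are hypothetical, and the input is assumed to be ASCII. The sketch makes no claim about how many allocations either path performs in any particular Swift version.

```swift
// Hypothetical helpers. They assume every byte is ASCII (< 0x80). For other
// bytes the two functions would disagree, because Unicode.Scalar(UInt8)
// treats a byte as a Latin-1 code point, while String(decoding:as:) reads UTF-8.

/// Pattern 1: reserve capacity, then append one Character per byte.
func appendPerCharacter(_ bytes: [UInt8]) -> String {
    var result = ""
    result.reserveCapacity(bytes.count)
    for byte in bytes {
        // Each byte is wrapped in a Character and appended individually.
        result.append(Character(Unicode.Scalar(byte)))
    }
    return result
}

/// Pattern 2: reserve capacity, decode the whole run at once, append it once.
func appendDecodedRun(_ bytes: [UInt8]) -> String {
    var result = ""
    result.reserveCapacity(bytes.count)
    // The run is decoded into a temporary String, which is then appended.
    result.append(contentsOf: String(decoding: bytes, as: UTF8.self))
    return result
}
```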

> My second thought is that the extra method should be proposed
> separately; whereas the current proposal surfaces things that already
> exist, what you need is purely additive but would require underlying
> changes. I don't see a point in implementing it now for API completeness
> if it can't make good on its performance; that's the exact predicament
> we're in today with reserveCapacity and append/appendContentsOf.

Fair enough. We can always revisit the “API completeness” argument when the proposal actually undergoes review.
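For reference, a completeness-only version of the proposed method can be built on the existing public API. This is a hypothetical sketch in present-day Swift, where `Sequence` and `UnicodeCodec` replace Swift 2's `SequenceType` and `UnicodeCodecType`. It decodes one scalar at a time, which is exactly the per-element path that would not deliver the hoped-for performance. Replacing invalid input with U+FFFD is an assumed policy, not something the thread specifies.

```swift
extension String {
    /// Hypothetical API: decodes `codeUnits` using `encoding` and appends the
    /// resulting Unicode scalars to this string. Each invalid code unit
    /// sequence is replaced with U+FFFD (an assumed error policy).
    mutating func append<S: Sequence, Encoding: UnicodeCodec>(
        contentsOf codeUnits: S,
        encoding: Encoding.Type
    ) where S.Element == Encoding.CodeUnit {
        var codec = Encoding()
        var iterator = codeUnits.makeIterator()
        // A plain `break` inside `switch` would only leave the switch,
        // so the loop is labeled.
        decoding: while true {
            switch codec.decode(&iterator) {
            case .scalarValue(let scalar):
                unicodeScalars.append(scalar)
            case .emptyInput:
                break decoding
            case .error:
                unicodeScalars.append("\u{FFFD}")
            }
        }
    }
}
```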

Thanks again for putting together the proposal and code!

—CK

> 
> Zach Waldowski
> zach at waldowski.me
> 
> On Tue, Feb 2, 2016, at 03:24 AM, Charles Kissinger wrote:
>> 
>>> On Feb 1, 2016, at 8:53 PM, Zach Waldowski via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>> That'd seem reasonable.
>>> 
>>> I guess I'm not entirely sold on the benefit of the extra method here,
>>> and all the weight on maintenance that'd entail. Obviously I get the
>>> benefit of skipping the storage reservation, but I can't imagine a
>>> scenario where building something up using
>>> `appendContentsOf(_:encoding:)` would be that much better than plain
>>> concatenation. I'd love to hear an example, though.
>> 
>> Zach,
>> 
>> Here’s a real-world example:
>> 
>> I have a case where I am assembling a String from five short ASCII
>> character sequences scattered around different parts of each line of an
>> input file. The maximum length of the resulting String is predictable, so
>> in an ideal world I could create an empty string, call
>> String.reserveCapacity() and then suck up all of the ASCII character
>> sequences with a series of String.appendContentsOf(_:encoding:) calls, all
>> with just a single memory allocation per String. (But as you mentioned,
>> it would appear to require a significant change in the String
>> implementation for things to be that efficient.)
>> 
>> Obviously, the alternative approach of instantiating a string for each of
>> the subsequences and concatenating them would involve a minimum of six
>> allocations. It matters in my case, because the input files are large
>> (sometimes millions of lines).
>> 
>> Right now, my approach is to allocate a byte buffer, assemble the
>> substrings in it, null-terminate and call String.fromCString(). That
>> performs reasonably well, but it still involves an extra copy of the
>> characters and the byte buffer allocation, neither of which would be
>> necessary with the String.appendContentsOf(_:encoding:) method.
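The byte-buffer workaround described above can be sketched as follows. The sketch assumes the field positions are already known as ranges into the line's bytes. `assembleFields(from:fields:)` is a hypothetical name, and `String(cString:)` stands in for Swift 2's `String.fromCString()`. The fields are assumed to contain no NUL bytes, because the first NUL ends the string.

```swift
/// Hypothetical helper: copies several ASCII fields out of one input line
/// into a single byte buffer, NUL-terminates it, and builds a String from it.
func assembleFields(from line: [UInt8], fields: [Range<Int>]) -> String {
    var buffer: [UInt8] = []
    // One allocation sized for all fields plus the terminator.
    buffer.reserveCapacity(fields.reduce(1) { $0 + $1.count })
    for range in fields {
        buffer.append(contentsOf: line[range])
    }
    buffer.append(0) // NUL terminator expected by String(cString:)
    // The buffer always holds at least one byte, so baseAddress is non-nil.
    // String(cString:) then copies the bytes into the String's own storage.
    return buffer.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }
}
```

For example, with `line = Array("id=42;name=abc;x".utf8)`, the field ranges `3..<5` and `11..<14` yield `"42abc"`. Present-day Swift can skip the terminator by using `String(decoding: buffer, as: UTF8.self)`, but the copy through the intermediate buffer remains.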
>> 
>> I hope that example was clear. If single-character String.append() became
>> more efficient, that would reduce the need for the function I’m
>> proposing. And if Swift strings were to get short-string optimization it
>> would make this all much easier, but I have no idea if that is in the
>> cards.
>> 
>> —CK
>> 
>>> 
>>> Cheers!
>>> Zach Waldowski
>>> zach at waldowski.me
>>> 
>>> On Mon, Feb 1, 2016, at 08:36 PM, Charles Kissinger via swift-evolution
>>> wrote:
>>>> 
>>>>> On Feb 1, 2016, at 2:07 PM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
>>>>> 
>>>>> 
>>>>> on Mon Feb 01 2016, Zach Waldowski <swift-evolution at swift.org> wrote:
>>>>> 
>>>>>> Due to the semantics of _StringCore and _StringBuffer (as far as I
>>>>>> understand them), such a method would not be more efficient than
>>>>>> creating another String with the new initializer and concatenating the
>>>>>> two, and would require more significant plumbing changes to
>>>>>> _StringBuffer.
>>>>> 
>>>>> We are very interested in making significant plumbing changes to String, FWIW.
>>>>> 
>>>> 
>>>> In that case, perhaps it would make sense to add String.append() for code
>>>> unit sequences over the existing plumbing just for completeness of the
>>>> API, on the assumption that efficiency would come later when String gets
>>>> its makeover.
>>>> 
>>>> —CK
>>>> 
>>>>>> 
>>>>>> 
>>>>>> It would be good to shop this proposal around, though; maybe someone on
>>>>>> the core team will want to chime in.
>>>>>> 
>>>>>> Cheers,
>>>>>> Zachary Waldowski
>>>>>> zach at waldowski.me
>>>>>> 
>>>>>> On Mon, Feb 1, 2016, at 03:07 AM, Charles Kissinger wrote:
>>>>>>> It occurred to me that this proposal provides a way to efficiently
>>>>>>> initialize Strings from UTF code unit sequences, but it doesn’t provide a
>>>>>>> way to *append* code unit sequences to existing strings. String has an
>>>>>>> existing method to append Character sequences:
>>>>>>> 
>>>>>>> String.appendContentsOf<S : SequenceType where S.Generator.Element ==
>>>>>>> Character>(_: S)
>>>>>>> 
>>>>>>> The equivalent for code units would presumably be:
>>>>>>> 
>>>>>>> String.appendContentsOf<S : SequenceType, Encoding : UnicodeCodecType
>>>>>>> where Encoding.CodeUnit == S.Generator.Element>(_: S, encoding:
>>>>>>> Encoding.Type)
>>>>>>> 
>>>>>>> Is there any interest in adding that to the proposal? It would only have
>>>>>>> a lot of value if it could be implemented in a more efficient way than
>>>>>>> just calling String.append() for each decoded Character. From looking at
>>>>>>> the code, that might not be straightforward.
>>>>>>> 
>>>>>>> —CK
>>>>>>> 
>>>>>>>> On Jan 26, 2016, at 3:14 PM, Zach Waldowski via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>>> 
>>>>>>>> Since this seems to have gone quiet, and the code was already done, I've
>>>>>>>> posted the PR to Swift itself:
>>>>>>>> 
>>>>>>>> https://github.com/apple/swift/pull/1109
>>>>>>>> 
>>>>>>>> The existing proposal PR:
>>>>>>>> 
>>>>>>>> https://github.com/apple/swift-evolution/pull/101
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Sincerely,
>>>>>>>> Zachary Waldowski
>>>>>>>> zach at waldowski.me
>>>>>>>> 
>>>>>>>> On Wed, Jan 20, 2016, at 06:08 PM, Zach Waldowski via swift-evolution
>>>>>>>> wrote:
>>>>>>>>> Thanks, Dave.
>>>>>>>>> 
>>>>>>>>> I definitely wasn't hard to convince on this. The change has already
>>>>>>>>> been made to the proposal, its PR, and the pending PR to the stdlib.
>>>>>>>>> 
>>>>>>>>> Cheers!
>>>>>>>>> Zach Waldowski
>>>>>>>>> zach at waldowski.me
>>>>>>>>> 
>>>>>>>>> On Wed, Jan 20, 2016, at 01:23 PM, Dave Abrahams via swift-evolution
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> on Fri Jan 15 2016, Zach Waldowski via swift-evolution
>>>>>>>>>> <swift-evolution at swift.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Charles -
>>>>>>>>>>> 
>>>>>>>>>>> I shared the same concern, and mentioned it in the proposal. I thought
>>>>>>>>>>> `decode(_:as:)` was too simple, to the point of being
>>>>>>>>>>> non-descriptive,
>>>>>>>>>> 
>>>>>>>>>> The names of methods don't need to be descriptive.  It's the use-sites
>>>>>>>>>> (and secondarily, declarations) that need to be clear.  Trying to make
>>>>>>>>>> the names of methods descriptive by themselves just hurts readability at
>>>>>>>>>> the use-site.
>>>>>>>>>> 
>>>>>>>>>> -Dave
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> swift-evolution mailing list
>>>>>>>>>> swift-evolution at swift.org
>>>>>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>>>> 
>>>>> 
>>>>> -- 
>>>>> -Dave
>>>>> 
>>>> 
>> 


