[swift-evolution] [Draft proposal] Faster/lower-level external String initialization

Charles Kissinger crk at akkyra.com
Tue Feb 2 02:24:19 CST 2016


> On Feb 1, 2016, at 8:53 PM, Zach Waldowski via swift-evolution <swift-evolution at swift.org> wrote:
> 
> That'd seem reasonable.
> 
> I guess I'm not entirely sold on the benefit of the extra method here,
> and all the weight on maintenance that'd entail. Obviously I get the
> benefit of skipping the storage reservation, but I can't imagine a
> scenario where building something up using
> `appendContentsOf(_:encoding:)` would be that much better than plain
> concatenation. I'd love to hear an example, though.

Zach,

Here’s a real-world example:

I have a case where I am assembling a String from five short ASCII character sequences scattered around different parts of each line of an input file. The maximum length of the resulting String is predictable, so in an ideal world I could create an empty string, call String.reserveCapacity() and then suck up all of the ASCII character sequences with a series of String.appendContentsOf(_, encoding:), all with just a single memory allocation per String. (But as you mentioned, it would appear to require a significant change in the String implementation for things to be that efficient.)
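
A minimal sketch of that ideal pattern, written with later Swift spellings (`reserveCapacity`, `unicodeScalars.append(contentsOf:)`) rather than the proposed `appendContentsOf(_:encoding:)`, which is not assumed to exist here. The helper name `assembleReserved`, its parameters, and the sample line are hypothetical, and the byte-to-scalar mapping is only valid because the input is assumed to be ASCII. Whether this results in a single allocation depends on the String implementation.

```swift
// Hypothetical helper: reserve once, then append ASCII byte runs in place.
// Assumes every byte in `fields` is ASCII (0...127).
func assembleReserved(_ fields: [ArraySlice<UInt8>], maxLength: Int) -> String {
    var result = String()
    result.reserveCapacity(maxLength)  // one up-front capacity request
    for field in fields {
        // Map each ASCII byte to the Unicode scalar with the same value.
        let scalars = field.lazy.map { (byte: UInt8) -> Unicode.Scalar in
            Unicode.Scalar(byte)
        }
        result.unicodeScalars.append(contentsOf: scalars)
    }
    return result
}

// Illustrative input: three fields scattered around one line of a file.
let reservedLine = Array("id=42;name=swift;kind=lang".utf8)
let reservedFields = [reservedLine[3..<5], reservedLine[11..<16], reservedLine[22..<26]]
print(assembleReserved(reservedFields, maxLength: 16))
```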

Obviously, the alternative approach of instantiating a string for each of the subsequences and concatenating them would involve a minimum of six allocations. It matters in my case, because the input files are large (sometimes millions of lines).
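
For comparison, the concatenation alternative might look like the sketch below. The helper name `assembleByConcatenation` and the sample data are hypothetical, and `String(decoding:as:)` is a later Swift spelling, not the initializer from this proposal. Each field becomes its own temporary String before being appended, which is where the per-field allocations described above come from in the String implementation under discussion.

```swift
// Hypothetical helper: build one temporary String per field, then concatenate.
func assembleByConcatenation(_ fields: [ArraySlice<UInt8>]) -> String {
    var result = ""
    for field in fields {
        // One temporary String per subsequence.
        let piece = String(decoding: field, as: UTF8.self)
        result += piece
    }
    return result
}

// Illustrative input only.
let concatLine = Array("id=42;name=swift;kind=lang".utf8)
let concatFields = [concatLine[3..<5], concatLine[11..<16], concatLine[22..<26]]
print(assembleByConcatenation(concatFields))
```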

Right now, my approach is to allocate a byte buffer, assemble the substrings in it, null-terminate and call String.fromCString(). That performs reasonably well, but it still involves an extra copy of the characters and the byte buffer allocation, neither of which would be necessary with the String.appendContentsOf(_, encoding:) method. 
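
A rough sketch of that workaround, using the later `String(cString:)` spelling in place of `String.fromCString()`. The helper name `assembleViaBuffer`, its parameters, and the sample data are hypothetical. It shows the two costs mentioned above: the byte buffer allocation and the extra copy when the String is created from the buffer.

```swift
// Hypothetical helper: gather the fields in a byte buffer, null-terminate,
// then create the String from the C string.
func assembleViaBuffer(_ fields: [ArraySlice<UInt8>], maxLength: Int) -> String {
    var buffer = [UInt8]()
    buffer.reserveCapacity(maxLength + 1)  // the separate buffer allocation
    for field in fields {
        buffer.append(contentsOf: field)
    }
    buffer.append(0)  // null terminator
    // The buffer always holds at least the terminator, so baseAddress is non-nil.
    return buffer.withUnsafeBufferPointer { bytes in
        String(cString: bytes.baseAddress!)  // copies the characters again
    }
}

// Illustrative input only.
let bufferLine = Array("id=42;name=swift;kind=lang".utf8)
let bufferFields = [bufferLine[3..<5], bufferLine[11..<16], bufferLine[22..<26]]
print(assembleViaBuffer(bufferFields, maxLength: 16))
```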

I hope that example was clear. If single-character String.append() became more efficient, that would reduce the need for the function I’m proposing. And if Swift strings were to get short-string optimization it would make this all much easier, but I have no idea if that is in the cards.

—CK

> 
> Cheers!
> Zach Waldowski
> zach at waldowski.me
> 
> On Mon, Feb 1, 2016, at 08:36 PM, Charles Kissinger via swift-evolution
> wrote:
>> 
>>> On Feb 1, 2016, at 2:07 PM, Dave Abrahams via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>> 
>>> on Mon Feb 01 2016, Zach Waldowski <swift-evolution at swift.org> wrote:
>>> 
>>>> Due to the semantics of _StringCore and _StringBuffer (as far as I
>>>> understand them), such a method would not be more efficient than
>>>> creating another String with the new initializer and concatenating the
>>>> two, and would require more significant plumbing changes to
>>>> _StringBuffer.
>>> 
>>> We are very interested in making significant plumbing changes to String, FWIW.
>>> 
>> 
>> In that case, perhaps it would make sense to add String.append() for code
>> unit sequences over the existing plumbing just for completeness of the
>> API, on the assumption that efficiency would come later when String gets
>> its makeover.
>> 
>> —CK
>> 
>>>> 
>>>> 
>>>> It would be good to shop around for this proposal, though; maybe if
>>>> someone on the core team wants to chime in.
>>>> 
>>>> Cheers,
>>>> Zachary Waldowski
>>>> zach at waldowski.me
>>>> 
>>>> On Mon, Feb 1, 2016, at 03:07 AM, Charles Kissinger wrote:
>>>>> It occurred to me that this proposal provides a way to efficiently
>>>>> initialize Strings from UTF code unit sequences, but it doesn’t provide a
>>>>> way to *append* code unit sequences to existing strings. String has an
>>>>> existing method to append Character sequences:
>>>>> 
>>>>> String.appendContentsOf<S : SequenceType where S.Generator.Element ==
>>>>> Character>(_: S)
>>>>> 
>>>>> The equivalent for code units would presumably be:
>>>>> 
>>>>> String.appendContentsOf<S : SequenceType, Encoding: UnicodeCodecType
>>>>> where Encoding.CodeUnit == S.Generator.Element>(_: S, encoding:
>>>>> Encoding.Type)
>>>>> 
>>>>> Is there any interest in adding that to the proposal? It would only have
>>>>> a lot of value if it could be implemented in a more efficient way than
>>>>> just calling String.append() for each decoded Character. From looking at
>>>>> the code, that might not be straightforward.
>>>>> 
>>>>> —CK
>>>>> 
>>>>>> On Jan 26, 2016, at 3:14 PM, Zach Waldowski via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>> 
>>>>>> Since this seems to have gone quiet, and the code was already done, I've
>>>>>> posted the PR to Swift itself:
>>>>>> 
>>>>>> https://github.com/apple/swift/pull/1109
>>>>>> 
>>>>>> The existing proposal PR:
>>>>>> 
>>>>>> https://github.com/apple/swift-evolution/pull/101
>>>>>> 
>>>>>> -- 
>>>>>> Sincerely,
>>>>>> Zachary Waldowski
>>>>>> zach at waldowski.me
>>>>>> 
>>>>>> On Wed, Jan 20, 2016, at 06:08 PM, Zach Waldowski via swift-evolution
>>>>>> wrote:
>>>>>>> Thanks, Dave.
>>>>>>> 
>>>>>>> I definitely wasn't hard to convince on this. The change has already
>>>>>>> been made to the proposal, its PR, and the pending PR to the stdlib.
>>>>>>> 
>>>>>>> Cheers!
>>>>>>> Zach Waldowski
>>>>>>> zach at waldowski.me
>>>>>>> 
>>>>>>> On Wed, Jan 20, 2016, at 01:23 PM, Dave Abrahams via swift-evolution
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> on Fri Jan 15 2016, Zach Waldowski via swift-evolution
>>>>>>>> <swift-evolution at swift.org> wrote:
>>>>>>>> 
>>>>>>>>> Charles -
>>>>>>>>> 
>>>>>>>>> I shared the same concerns, and mention them in the proposal. I thought
>>>>>>>>> `decode(_:as:)` to be too simple to the point of being
>>>>>>>>> non-descriptive,
>>>>>>>> 
>>>>>>>> The names of methods don't need to be descriptive.  It's the use-sites
>>>>>>>> (and secondarily, declarations) that need to be clear.  Trying to make
>>>>>>>> the names of methods descriptive by themselves just hurts readability at
>>>>>>>> the use-site.
>>>>>>>> 
>>>>>>>> -Dave
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> swift-evolution mailing list
>>>>>>>> swift-evolution at swift.org
>>>>>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>>>> 
>>> 
>>> -- 
>>> -Dave
>>> 
>> 


