[swift-evolution] Strings in Swift 4
dabrahams at apple.com
Fri Feb 24 15:40:28 CST 2017
Sent from my moss-covered three-handled family gradunza
> On Feb 23, 2017, at 2:04 PM, Ted F.A. van Gaalen <tedvgiosdev at gmail.com> wrote:
>> On 23 Feb 2017, at 02:24, Dave Abrahams <dabrahams at apple.com> wrote:
>> Equally a non-starter. All known threadsafe schemes that require caches to be updated upon non-mutating operations have horrible performance issues, and further this would penalize all string code by reserving space for the cache and filling it even for the vast majority of operations that don't require random access.
> Well, maybe “caching” is not the right description for what I've suggested.
> It is more like:
> let all strings be stored as they are now, but as soon as you want to work with
> random accessing parts of a string just “lift the string out of normal optimised string storage”
> and then add (temporarily) a Character array so one can work with this array directly ”
That's a cache.
> which implies that all other strings remain as they are. ergo: efficiency
> is only reduced for the “elevated” strings,
You have to add that temporary array somewhere. The performance of every string is penalized for that storage, and also for the cost of throwing it out upon mutation. Every branch counts.
> Using e.g. str.freeSpace(), if necessary, would then place the String back
> in its normal storage domain, thereby disposing the Character array
> associated with it.
Avoiding hidden dynamic storage overhead that needs to be freed is an explicit goal of the design (see the section on String and Substring).
>> Trust me, we've gotten lots of such suggestions and thought through the implications of each one very carefully.
> That’s good, because it means, that a lot of people are interested in this subject and wish to help.
> Of course you’ll get many of suggestions that might not be very useful,
> perhaps like this one... but sometimes suddenly someone
> comes along with things that might never have occurred to you.
> That is the beautiful nature of ideas…
But at some point, I hope you'll understand, I also have to say that I think all the simple schemes have been adequately explored and the complex ones all seem to have this basic property of relying on caches, which has unacceptable performance, complexity, and, yes, usability costs. Analyzing and refuting each one in detail begins to be a waste of time after that. I'm not really willing to go further down this road unless someone has an implementation and experimental evidence that demonstrates it as non-problematic.
>> I'm afraid you will have to accept being disappointed about this.
> Well, like most developers, I am a stubborn kind of guy..
> Luckily Swift is very flexible like Lego, so I rolled my own convenience struct.
> If I need direct access on a string I simply copy the string to it.
> it permits things like this: (and growing)
> let strabc = "abcdefghjiklmnopqrstuvwxyz"
> let strABC = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> var abc = TGString(strabc)
> var ABC = TGString(strABC)
> func test()
> // as in Basic: left$, mid$, right$
> // ranges and concatenation:
> print(abc.left(5) + ABC.mid(6,6) + abc[10...25])
> // eat anything:
> let d:Double = -3.14159
> let n:Int = 1234
> let str = abc[15..<17].asString // Copy to to normal Swift String
> let s = "\(abc[12..<20])" // interpolate to normal Swift String.
> abc[3..<5] = TGString("34") // if lengths don't match:
> abc[8...9] = ABC[24...25] // length of dest. string is altered.
> abc = TGString("$$$$") // if src l > 1 will insert remainder after dest.12 here
> abc = TGString("") // empty removes character at pos.
> abc.insert(at: 3, string: ABC[0..<3])
> kinda hoped that this could be builtin in Swift strings
> Anyway, I’ve made myself what I wanted, which happily co-exists
> alongside normal Swift strings. Performance and storage
> aspects of my struct TGString are not very important, because
> I am not using this on thousands of strings.
> Simply want to use a string as a plain array, that’s all,
> which is implemented in almost every PL on this planet.
>> More generally, there's a reason that the collection model has bidirectional and random access distinctions: important data structures are inherently not random access.
> I don’t understand the above line: definition of “important data structures” <> “inherently”
Important data structures are those from the classical CS literature upon which every practical programming language (and even modern CPU hardware) is based, e.g. hash tables. Based on the properties of modern string processing, strings fall into the same category. "Inherent" means that performance characteristics are tied to the structure of the data or problem being solved. You can't sort in better than O(N log N) worst case (mythical quantum computers don't count here), and that's been proven mathematically. Similarly it's easy to prove that the constraints of our problem mean that counting the characters in a string will always be O(N) worst case where N is the length of the representation. That means strings are inherently not random access.
>> Heroic attempts to present the illusion that they are randomly-accessible are not going to fly.
> ?? Accessing discrete elements directly
All collections have direct access via indices. You mean randomly, via arbitrary integers.
> in an array is not an illusion to me.
> (e.g. I took the 4th and 7th eggs from the container)
It's not an illusion when they're stored in an array.
If you have to walk down an aisle of differently sized cereal boxes to pick the 5th box of SuperBoomCrisp Flakes off the shelf in the grocery store, that's not random access (even if you're willing to drop the boxes into an array for later lookups as you go, as you're proposing). That's what Strings are like.
>> These abstractions always break down, leaking the true non-random-access nature in often unpredictable ways, penalizing lots of code for the sake of a very few use-cases, and introducing complexity that is hard for the optimizer to digest and makes it painful (sometimes impossible) to grow and evolve the library.
> Is an Array an abstraction? of what?
A randomly-accessible homogeneous tail growable collection. But the abstraction in question here isn't Array; it's RandomAccessCollection.
> I don’t get this either. most components in the real world can be accessed randomly.
Actually that's far from being true in the real world. See the grocery store above. Computer memory is very unlike most things in the real world. Ask any roboticist.
>> This should be seen as a general design philosophy: Swift presents abstractions that harmonize with, rather than hide, the true nature of things.
> The true nature of things is a very vague and subjective criterium,
Not at all; see my explanation above.
> how can you harmonise with that, let alone with abstractions?
> e.g. for me: “the true nature of things” for an array is that it has direct accessible discrete elements…
Arrays definitely support random access. Strings are not arrays.
> Sorry, with respect, we have a difference of opinion here.
That's fine. Please don't be offended that I don't wish to argue it further. It's been an interesting exercise while I'm on vacation and I hoped it would lay out some general principles that would be useful to others in future even if you are not convinced, but when I get back to work next week I'll have to focus on other things.
> Thanks btw for the link to this article about tagged pointers, very interesting.
> it inspired me to (have) read other things in this domain as well.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the swift-evolution