[swift-evolution] Strings in Swift 4

Ben Cohen ben_cohen at apple.com
Sun Feb 12 19:49:01 CST 2017


Hi Ted,

Dave is on vacation for the next two weeks, so this is a reply on behalf of both him and me:

> On Feb 12, 2017, at 10:17, "Ted F.A. van Gaalen" <tedvgiosdev at gmail.com> wrote:

>> On 11 Feb 2017, at 18:33, Dave Abrahams <dabrahams at apple.com> wrote:
>> 
>> All of these examples should be efficiently and expressively handled by the pattern matching API mentioned in the proposal. They definitely do not require random access or integer indexing. 
>> 
> Hi Dave, 
> then I am very interested to know how to unpack aString (e.g. read from a file record such as in the previous example:
> 123534-09EVXD4568,991234,89ABCYELLOW12AGRAINESYTEMZ3453 ) 
> without using direct subscripting like str[n1…n2) ? 

If you look again at the code I sent previously, it demonstrates how you can use lengths to move forward through a string without needing random access for your particular use case.

> (which btw is for me the most straightforward and ideal method) 
> conditions:
>    -The source string contains fields of known position (offset) and length, concatenated together
>     without any separators (like in a CSV)
>   -the  contents of each field is unpredictable. 
>    which excludes the use of pattern-matching. 

Pattern matching isn’t just about matching known contents. Think of the regex “...”. This is a pattern that matches any 3 characters. While full regex support is out of scope for the current discussions, the intention is for the pattern matching part of the proposal to handle this kind of use case.
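
As a rough illustration of matching by length rather than by content, here is a minimal sketch using Foundation's existing NSRegularExpression. This is not the pattern matching API the proposal has in mind; the function name `splitFields` and the 3- and 2-character field widths are made up for the example. Note also that in this engine "." matches a Unicode code point, not necessarily a full Character.

```swift
import Foundation

/// Splits the start of a record into a 3-character field and a 2-character
/// field using the pattern "^(...)(..)". Returns nil if the record is too short.
/// (Hypothetical helper; the field widths are illustrative only.)
func splitFields(_ record: String) -> (String, String)? {
    guard let regex = try? NSRegularExpression(pattern: "^(...)(..)") else { return nil }
    // Convert the String's index range into the NSRange the regex API expects.
    let whole = NSRange(record.startIndex..<record.endIndex, in: record)
    guard let match = regex.firstMatch(in: record, options: [], range: whole),
          let first = Range(match.range(at: 1), in: record),
          let second = Range(match.range(at: 2), in: record)
    else { return nil }
    return (String(record[first]), String(record[second]))
}
```

The contents of the fields play no role in the match; only their lengths do.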

>    -the source string needs to be unpacked in independent strings. 
> 
> I made this example: (the comments also stress my point) 
> 

Here is another way of implementing your example in a form that doesn’t require random access.

Putting aside pattern matching for now, assume that there is an API on String that lets you drop a specific-length prefix from a Substring (for now in Swift 3, that's a String). An API like this (probably taking any pattern as its argument, not just a length) is likely to be proposed to evolution soon once we move into that phase of the 4.0 String project.

// this particular API/implementation for demonstration only,
// not necessarily quite what will be proposed
extension Collection where SubSequence == Self {
    /// Drop n elements from the front of `self` in-place,
    /// returning the dropped prefix.
    mutating func dropPrefix(_ n: IndexDistance) -> SubSequence {
        // nature of error handling/swallowing/trapping/optional
        // returning here TBD...
        let newStart = index(startIndex, offsetBy: n)
        defer { self = self[newStart..<endIndex] }
        return self[startIndex..<newStart]
    }
}
// soon...
extension String: Collection { }

Given this, here’s your example code written using it (compacted a little for brevity):

struct Product {
    var id, group, name, description, currency: String
    var inStock, ordered, price: Int
    
    var priceFormatted: String {
        let whole = price / 100
        let cents = price % 100
        // pad the cents so that e.g. 2205 prints as 22.05 rather than 22.5
        return currency + " \(whole)." + (cents < 10 ? "0" : "") + "\(cents)"
    }
    
    init(inputrecord: String) {
        // note, no copying will occur here, as String is
        // copy-on-write and there’s no writing happening
        var record = inputrecord
        
        id          =     record.dropPrefix(10)
        group       =     record.dropPrefix(4)
        name        =     record.dropPrefix(16)
        description =     record.dropPrefix(30)
        inStock     = Int(record.dropPrefix(10))!
        ordered     = Int(record.dropPrefix(10))!
        price       = Int(record.dropPrefix(10))!
        currency    =     record.dropPrefix(1)
    }
}
let record = "123A.534.CMCU3Arduino Due     Arm 32-bit Micro controller.  000000034100000005680000002250$"
let product = Product(inputrecord: record)
print("=== Product data for the item with ID: \(product.id) ====")
print("group          : \(product.group)")
print("name           : \(product.name)")
print("description    : \(product.description)")
print("items in stock : \(product.inStock)")
print("items ordered  : \(product.ordered)")
print("price per item : \(product.priceFormatted)")
print("=========================================================")

Now, other use cases might not have such a straightforward solution. But for the example here, this approach ought to suffice, or be a starting point for similar cases needing error handling, skipped regions etc.
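
For a version with error handling, here is a minimal sketch assuming a Swift 4 or later toolchain, where String's SubSequence is Substring, so the extension is placed on Substring directly. The optional-returning `dropPrefix`, the `ParsedProduct` type and the shortened record layout are illustrative choices, not what will necessarily be proposed.

```swift
// Sketch only: `dropPrefix` and `ParsedProduct` are illustrative names,
// not standard library API.
extension Substring {
    /// Removes the first `n` characters and returns them, or returns nil
    /// (leaving `self` untouched) if fewer than `n` characters remain.
    mutating func dropPrefix(_ n: Int) -> Substring? {
        guard n >= 0,
              let newStart = index(startIndex, offsetBy: n, limitedBy: endIndex)
        else { return nil }
        let head = self[startIndex..<newStart]
        self = self[newStart..<endIndex]
        return head
    }
}

struct ParsedProduct {
    var id, group, name: String
    var inStock: Int

    /// Fails (returns nil) on a short record or a non-numeric stock field.
    init?(record: String) {
        // A Substring is a view onto `record`; the fields are only copied
        // when they are converted to String below.
        var rest = Substring(record)
        guard let id = rest.dropPrefix(10),
              let group = rest.dropPrefix(4),
              let name = rest.dropPrefix(16),
              let stockField = rest.dropPrefix(10),
              let inStock = Int(stockField)
        else { return nil }
        self.id = String(id)
        self.group = String(group)
        self.name = String(name)
        self.inStock = inStock
    }
}
```

A skipped region can be handled the same way by discarding the result, e.g. `_ = rest.dropPrefix(30)`.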

> Isn’t that an elegant solution or what? 

Unfortunately not. Adding integer subscripting to String via an extension that uses index(_:offsetBy:) is a commonly proposed idea that we strongly caution against. Strings use an opaque index rather than integers for a reason; it’s not an oversight.

The reason being: if ever your string contains more than just ASCII characters, then advancing a String's startIndex to the nth element becomes a linear-time operation, because Characters are variable length. As a result, every one of your uses of that subscript takes linear time. If you use them in a loop, then code that looks linear is actually (probably accidentally) quadratic.

Now, sometimes, when the String knows it only contains ASCII, it might be able to do the advance in constant time. But we still recommend against these kinds of extensions, to avoid performance pitfalls if you ever handle strings where this isn’t the case. There are other techniques, like the one shown above, that achieve the same goal just as well.
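
To make the pitfall concrete, here is a minimal sketch. The `offset:` subscript is a hypothetical stand-in for the kind of extension being cautioned against, and the two counting functions are illustrative names; both return the same result, but the first restarts from startIndex on every iteration.

```swift
// Illustration only: this is the kind of extension being cautioned against.
extension String {
    subscript(offset n: Int) -> Character {
        // Walks n characters forward from the start on every call.
        return self[index(startIndex, offsetBy: n)]
    }
}

/// Looks linear, but is quadratic: each subscript call is itself a linear walk.
func countSpacesByOffset(_ s: String) -> Int {
    var spaces = 0
    for i in 0..<s.count {   // s.count is also a linear-time walk
        if s[offset: i] == " " { spaces += 1 }
    }
    return spaces
}

/// Linear: a single pass that advances an opaque index one step at a time.
func countSpacesByIndex(_ s: String) -> Int {
    var spaces = 0
    var i = s.startIndex
    while i != s.endIndex {
        if s[i] == " " { spaces += 1 }
        i = s.index(after: i)
    }
    return spaces
}
```

In practice, a plain `for character in s` loop gives the same single pass without handling indices at all.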


> I might start a very lengthy discussion here about the threshold of where and how
> to protect the average programmer (like me :o) from falling into language pitfalls
> and to what extent these have an effect on working with a PL. One cannot make
> a PL idiot-proof. Of course, I agree a lot of it makes sense, and also the “intelligence”
> of the Swift compiler (sometimes it almost feels as if it sits next to me looking at
> the screen and shaking its head from time to time) But hey, remember most of
> us in our profession have a brain too. 
> (btw, if you know of a way to let Xcode respect in-between spaces when auto-formatting please let me know, thanks)
> 
> @Ben Cohen:
> Hi, you wrote:
> "p.s. as someone who has worked in a bank with thousands of ancient file formats, no argument from me that COBOL rules :)"
> Although still the most part of accounting software is Cobol (mostly because it is too expensive 
> and risky to convert to newer technologies) I don’t think that Cobol rules and that new apps definitely should
> not be written in Cobol. I wouldn’t be doing Swift if I thought otherwise.  
> If I would be doing a Cobol project again, It would be with same enjoyment as say,
> a 2017 mechanical engineer, working on a steam locomotive of a touristic railroad.

Indeed. It was in this nostalgic spirit that my comment was meant.

> which I would do with dedication as well. However, never use this comparison
> at the hiring interview..:o)
> 
> 
> Kind Regards
> TedvG
> 
>> Sent from my moss-covered three-handled family gradunza
>> 
>> On Feb 9, 2017, at 5:09 PM, Ted F.A. van Gaalen <tedvgiosdev at gmail.com> wrote:
>> 
>>> 
>>>> On 10 Feb 2017, at 00:11, Dave Abrahams <dabrahams at apple.com> wrote:
>>>> 
>>>> 
>>>> on Thu Feb 09 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com> wrote:
>>>> 
>>>>> Hello Shawn
>>>>> Just google with any programming language name and “string manipulation”
>>>>> and you have enough reading for a week or so :o)
>>>>> TedvG
>>>> 
>>>> That truly doesn't answer the question.  It's not, “why do people index
>>>> strings with integers when that's the only tool they are given for
>>>> decomposing strings?”  It's, “what do you have to do with strings that's
>>>> hard in Swift *because* you can't index them with integers?”
>>> 
>>> Hi Dave,
>>> Ok. here are just a few examples: 
>>> Parsing and validating an ISBN code? or a (freight) container ID? or EAN13 perhaps? 
>>> of many of the typical combined article codes and product IDs that many factories and shops use? 
>>> 
>>> or: 
>>> 
>>> E.g. processing legacy files from IBM mainframes:
>>> extract fields from ancient data records read from very old sequential files,
>>> say, a product data record like this from a file from 1978 you’d have to unpack and process:   
>>> 123534-09EVXD4568,991234,89ABCYELLOW12AGRAINESYTEMZ3453
>>> into:
>>> 123, 534, -09, EVXD45, 68,99, 1234,99, ABC, YELLOW, 12A, GRAIN, ESYSTEM, Z3453.
>>> product category, pcs, discount code, product code, price Yen, price $, class code, etc… 
>>> in Cobol and PL/1 records are nearly always defined with a fixed field layout like this.:
>>> (storage was limited and very, very expensive, e.g. XML would be regarded as a 
>>> "scandalous waste" even the commas in CSV files! ) 
>>> 
>>> 01  MAILING-RECORD.
>>>        05  COMPANY-NAME            PIC X(30).
>>>        05  CONTACTS.
>>>            10  PRESIDENT.
>>>                15  LAST-NAME       PIC X(15).
>>>                15  FIRST-NAME      PIC X(8).
>>>            10  VP-MARKETING.
>>>                15  LAST-NAME       PIC X(15).
>>>                15  FIRST-NAME      PIC X(8).
>>>            10  ALTERNATE-CONTACT.
>>>                15  TITLE           PIC X(10).
>>>                15  LAST-NAME       PIC X(15).
>>>                15  FIRST-NAME      PIC X(8).
>>>        05  ADDRESS                 PIC X(15).
>>>        05  CITY                    PIC X(15).
>>>        05  STATE                   PIC XX.
>>>        05  ZIP                     PIC 9(5).
>>> 
>>> These are all character data fields here, except for the numeric ZIP field , however in Cobol it can be treated like character data. 
>>> So here I am, having to get the data of these old Cobol production files
>>> into a brand new Swift based accounting system of 2017, what can I do?   
>>> 
>>> How do I unpack these records and bring the data into a Swift structure or class? 
>>> (In Cobol I don’t have to because of the predefined fixed format record layout).
>>> 
>>> AFAIK there are no similar record structures with fixed fields like this available in Swift?
>>> 
>>> So, the only way I can think of right now is to do it like this:
>>> 
>>> // mailingRecord is a Swift structure
>>> struct MailingRecord
>>> {
>>>     var  companyName: String = "no Name"
>>>      var contacts: CompanyContacts
>>>      .
>>>      etc.. 
>>> }
>>> 
>>> // recordStr was read here with ASCII encoding
>>> 
>>> // unpack data in to structure’s properties, in this case all are Strings
>>> mailingRecord.companyName                       = recordStr[ 0..<30]
>>> mailingRecord.contacts.president.lastName  = recordStr[30..<45]
>>> mailingRecord.contacts.president.firstName = recordStr[45..<53]
>>> 
>>> 
>>> // and so on..
>>> 
>>> Ever worked for e.g. a bank with thousands of these files, their formats unchanged for years?
>>> 
>>> Are any alternative, convenient and simpler methods present in Swift? 
>>> 
>>> Kind Regards
>>> TedvG
>>> ( example of the above Cobol record borrowed from here: 
>>>  http://www.3480-3590-data-conversion.com/article-reading-cobol-layouts-1.html ) 
>>> 
>>> 
>>>     
>>> 
>>>> 
>>>>>> On 9 Feb 2017, at 16:48, Shawn Erickson <shawnce at gmail.com> wrote:
>>>>>> 
>>>>>> I also wonder what folks are actually doing that require indexing
>>>>>> into strings. I would love to see some real world examples of what
>>>>>> and why indexing into a string is needed. Who is the end consumer of
>>>>>> that string, etc.
>>>>>> 
>>>>>> Do folks have some examples?
>>>>>> 
>>>>>> -Shawn
>>>>>> 
>>>>>> On Thu, Feb 9, 2017 at 6:56 AM Ted F.A. van Gaalen via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>> Hello Hooman
>>>>>> That invalidates my assumptions, thanks for evaluating
>>>>>> it's more complex than I thought.
>>>>>> Kind Regards
>>>>>> Ted
>>>>>> 
>>>>>>> On 8 Feb 2017, at 00:07, Hooman Mehr <hooman at mac.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 7, 2017, at 12:19 PM, Ted F.A. van Gaalen via swift-evolution <swift-evolution at swift.org> wrote:
>>>>>>>> 
>>>>>>>> I now assume that:
>>>>>>>>      1. -= a “plain” Unicode character (codepoint?)  can result in one glyph.=-
>>>>>>> 
>>>>>>> What do you mean by “plain”? Characters in some Unicode scripts are
>>>>>>> by no means “plain”. They can affect (and be affected by) the
>>>>>>> characters around them, they can cause glyphs around them to
>>>>>>> rearrange or combine (like ligatures) or their visual
>>>>>>> representation (glyph) may float in the same space as an adjacent
>>>>>>> glyph (and seem to be part of the “host” glyph), etc. So, the
>>>>>>> general relationship of a character and its corresponding glyph (if
>>>>>>> there is one) is complex and depends on context and surrounding
>>>>>>> characters.
>>>>>>> 
>>>>>>>>      2. -= a  grapheme cluster always results in just a single glyph, true? =- 
>>>>>>> 
>>>>>>> False
>>>>>>> 
>>>>>>>>      3. The only thing that I can see on screen or print are glyphs (“carvings”,visual elements that stand on their own )
>>>>>>> 
>>>>>>> The visible effect might not be a visual shape. It may be for example, the way the surrounding shapes change or re-arrange.
>>>>>>> 
>>>>>>>>     4.  In this context, a glyph is a humanly recognisable visual form of a character,
>>>>>>> 
>>>>>>> Not in a straightforward one to one fashion, not even in Latin / Roman script.
>>>>>>> 
>>>>>>>>     5. On this level (the glyph, what I can see as a user) it is not relevant and also not detectable
>>>>>>>>         with how many Unicode scalars (codepoints ?), grapheme, or even on what kind
>>>>>>>>         of encoding the glyph was based upon.
>>>>>>> 
>>>>>>> False
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> -Dave
>>> 
> 
