[swift-evolution] Strings in Swift 4

Fri Feb 10 13:47:50 CST 2017

Hi Ted,

Here’s a sketch of one way to handle this kind of processing without requiring integer indexing. Hopefully not too buggy though I haven’t tested it extensively :). 

Here I’m stashing the parsed values in a dictionary, but you could also insert them directly into a proper data structure at the point where the dictionary entry is set (or keep building the dictionary, then use it to populate your data structure, along with some more data validation and error handling).

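import Foundation
// Before Swift 4 this sketch also needed `extension String: Collection { }`;
// String has been a Collection since Swift 4.

// Ordered (field name, length) pairs. This type was spelled DictionaryLiteral before Swift 5.
let fieldLengths: KeyValuePairs<String, Int> = [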
    "CompanyName":30,
    "PresidentLastName":15,
    "PresidentFirstName":8,
    "VPMarketingLastName":15,
    "VPMarketingFirstName":8,
    "AlternateContactTitle":10,
    "AlternateContactLastName":15,
    "AlternateContactFirstName":8,
    "Address":15,
    "City":15,
    "State":2,
    "Zip":5,
]

var data = "Premier Properties            Murray         Mitch   Ricky          Roma    Office MgrWilliamson     John    350 Fifth Av   New York       NY10118"
var keyedRecord: [String:String] = [:]

for (key, length) in fieldLengths {
    // prefix(_:) returns at most `length` characters and never traps on short input.
    let field = data.prefix(length)

    guard field.count == length
    else { fatalError("Input too short while reading \(key)") }
    // or however you want to handle it

    keyedRecord[key] = field.trimmingCharacters(in: CharacterSet.whitespaces)

    // dropFirst(_:) returns a Substring, so convert back to String.
    data = String(data.dropFirst(length))
}
guard data.isEmpty
else { fatalError("Input too long") }

print(keyedRecord)
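
For the second option mentioned earlier (using the dictionary to populate a data structure, with some validation), a minimal sketch might look like the following. The `MailingAddress` type, the `RecordError` cases and `makeAddress(from:)` are names made up for illustration; only the dictionary keys come from the code above.

```swift
// Hypothetical target type; the property names are assumptions for illustration.
struct MailingAddress {
    var companyName: String
    var city: String
    var state: String
    var zip: String
}

// Hypothetical error type describing what was wrong with a record.
enum RecordError: Error, Equatable {
    case missingField(String)
    case invalidZip(String)
}

func makeAddress(from record: [String: String]) throws -> MailingAddress {
    // Look up a required key; fail with a descriptive error if it is absent or empty.
    func required(_ key: String) throws -> String {
        guard let value = record[key], !value.isEmpty else {
            throw RecordError.missingField(key)
        }
        return value
    }

    let zip = try required("Zip")
    // The COBOL layout declares ZIP as PIC 9(5): exactly five ASCII digits.
    guard zip.count == 5, zip.allSatisfy({ $0.isASCII && $0.isNumber }) else {
        throw RecordError.invalidZip(zip)
    }

    return try MailingAddress(
        companyName: required("CompanyName"),
        city: required("City"),
        state: required("State"),
        zip: zip
    )
}
```

Throwing an error here instead of calling fatalError lets the caller decide whether a bad record should be skipped, logged, or should stop the import.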

I think it’s worth noting how seductive it is, with the integer indexing, to perform unchecked indexing into the data: recordStr[ 0..<30] is great until you have to process a corrupt record. Working in terms of higher-level APIs encourages handling of the failure cases. As an added bonus, when you upgrade your system and now the incoming data turns out to be utf8, your system doesn’t crash when a bored intern inserts some emoji into the president’s name.
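
To make the two failure cases concrete, here is a small sketch. The helper name `fixedField` and both sample strings are made up for illustration; `prefix(_:)`, `count` and `utf8` are standard library API.

```swift
/// Returns the first `length` Characters of `input`, or nil if the input is too short.
/// (`fixedField` is a hypothetical helper name, not a standard library API.)
func fixedField(_ input: String, length: Int) -> Substring? {
    let field = input.prefix(length)      // never traps, even on short input
    return field.count == length ? field : nil
}

let truncated = "Premier Prop"            // corrupt: shorter than the 30-Character field
let withEmoji = "Mitch 😀 "               // 8 Characters, but 11 UTF-8 bytes

print(fixedField(truncated, length: 30) == nil)       // true
print(fixedField(withEmoji, length: 8)?.count ?? 0)   // 8
print(withEmoji.utf8.count)                           // 11
```

The corrupt record yields nil instead of a crash, and the emoji counts as one Character, so the eight-Character field is still measured correctly even though it occupies more than eight bytes.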

There is still definitely room to make this easier/more discoverable for users:

- The “patterns” concept that is briefly touched on in the string manifesto would hopefully provide another way of expressing this, with patterns matching fixed numbers of characters.
- The need to walk over the field multiple times (first prefix, then count, then dropFirst) should be better handled by some of the other scanning APIs mentioned in the manifesto, e.g. if let field = data.dropPrefix(lengthPattern). Note that if the underlying String held only ASCII/Latin1, these should still be constant-time operations under the hood.
- Another approach is to provide generic operations on Collection that chunk collections into subsequences of given lengths and serve them up, possibly via a lazy view. This would have the advantage of not requiring mutable state in the loop.
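
The chunking idea in the last bullet could be sketched roughly as follows. `chunked(lengths:)` is a hypothetical name, not a standard library or manifesto API, and this version builds an array eagerly instead of serving a lazy view. The mutable state still exists, but it is confined to the helper and no longer appears in the caller's loop. Lengths are assumed to be non-negative.

```swift
extension Collection {
    /// Splits the collection into consecutive subsequences of the given lengths.
    /// Returns nil if the collection is too short or has elements left over.
    /// (`chunked(lengths:)` is a hypothetical name, not a standard library API.)
    func chunked(lengths: [Int]) -> [SubSequence]? {
        var chunks: [SubSequence] = []
        var start = startIndex
        for length in lengths {
            // Advance once per field; returns nil instead of trapping on short input.
            guard let end = index(start, offsetBy: length, limitedBy: endIndex) else {
                return nil
            }
            chunks.append(self[start..<end])
            start = end
        }
        // Leftover elements mean the input was longer than the layout allows.
        return start == endIndex ? chunks : nil
    }
}

let record = "Acme Corp NY10118"          // made-up sample: 10 + 2 + 5 Characters
if let fields = record.chunked(lengths: [10, 2, 5]) {
    print(fields.map { String($0) })      // ["Acme Corp ", "NY", "10118"]
}
```

Because it is written against Collection, the same helper would work on an array of bytes as well as on a String.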

But the above is what we can achieve with the tools we have today.
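
With today's tools the triple walk (prefix, then count, then dropFirst) can also be reduced to a single one using `index(_:offsetBy:limitedBy:)`. This is only a sketch: `takeField(_:)` is a hypothetical helper name, not the `dropPrefix` API the manifesto hints at.

```swift
extension Substring {
    /// Removes and returns the first `length` Characters, or returns nil
    /// (leaving `self` untouched) if fewer than `length` remain.
    /// (`takeField(_:)` is a hypothetical name, not the manifesto's API.)
    mutating func takeField(_ length: Int) -> Substring? {
        // One walk over the field instead of prefix + count + dropFirst.
        guard let end = index(startIndex, offsetBy: length, limitedBy: endIndex) else {
            return nil
        }
        let field = self[..<end]
        self = self[end...]
        return field
    }
}

var rest = Substring("NY10118")           // made-up tail of a record
let state = rest.takeField(2)
let zip = rest.takeField(5)
let extra = rest.takeField(1)             // nothing left, so this is nil
print(state ?? "?", zip ?? "?", extra == nil, rest.isEmpty)   // NY 10118 true true
```

Working on Substring means each field shares storage with the original record, so no copies are made until a field is converted to String.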

p.s. as someone who has worked in a bank with thousands of ancient file formats, no argument from me that COBOL rules :)

> On Feb 10, 2017, at 9:20 AM, Ted F.A. van Gaalen via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Please see in-line response below
>> On 10 Feb 2017, at 03:56, Shawn Erickson <shawnce at gmail.com> wrote:
>> 
>> 
>> On Thu, Feb 9, 2017 at 5:09 PM Ted F.A. van Gaalen <tedvgiosdev at gmail.com> wrote:
>>> On 10 Feb 2017, at 00:11, Dave Abrahams <dabrahams at apple.com> wrote:
>>> 
>>> 
>>> on Thu Feb 09 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com> wrote:
>>> 
>>>> Hello Shawn
>>>> Just google with any programming language name and “string manipulation”
>>>> and you have enough reading for a week or so :o)
>>>> TedvG
>>> 
>>> That truly doesn't answer the question.  It's not, “why do people index
>>> strings with integers when that's the only tool they are given for
>>> decomposing strings?”  It's, “what do you have to do with strings that's
>>> hard in Swift *because* you can't index them with integers?”
>> 
>> Hi Dave,
>> OK, here are just a few examples: 
>> Parsing and validating an ISBN code? Or a (freight) container ID? Or EAN13 perhaps? 
>> Or many of the typical combined article codes and product IDs that many factories and shops use? 
>> 
>> or: 
>> 
>> E.g. processing legacy files from IBM mainframes:
>> extract fields from ancient data records read from very old sequential files,
>> say, a product data record like this from a file from 1978 you’d have to unpack and process:   
>> 123534-09EVXD4568,991234,89ABCYELLOW12AGRAINESYTEMZ3453
>> into:
>> 123, 534, -09, EVXD45, 68,99, 1234,99, ABC, YELLOW, 12A, GRAIN, ESYSTEM, Z3453.
>> product category, pcs, discount code, product code, price Yen, price $, class code, etc… 
>> In Cobol and PL/1, records are nearly always defined with a fixed field layout like this:
>> (storage was limited and very, very expensive; e.g. XML would be regarded as a 
>> "scandalous waste", even the commas in CSV files!) 
>> 
>> 01  MAILING-RECORD.
>>        05  COMPANY-NAME            PIC X(30).
>>        05  CONTACTS.
>>            10  PRESIDENT.
>>                15  LAST-NAME       PIC X(15).
>>                15  FIRST-NAME      PIC X(8).
>>            10  VP-MARKETING.
>>                15  LAST-NAME       PIC X(15).
>>                15  FIRST-NAME      PIC X(8).
>>            10  ALTERNATE-CONTACT.
>>                15  TITLE           PIC X(10).
>>                15  LAST-NAME       PIC X(15).
>>                15  FIRST-NAME      PIC X(8).
>>        05  ADDRESS                 PIC X(15).
>>        05  CITY                    PIC X(15).
>>        05  STATE                   PIC XX.
>>        05  ZIP                     PIC 9(5).
>> 
>> These are all character data fields here, except for the numeric ZIP field; however, in Cobol it can be treated like character data. 
>> So here I am, having to get the data of these old Cobol production files
>> into a brand new Swift based accounting system of 2017, what can I do?   
>> 
>> How do I unpack these records and bring the data into a Swift structure or class? 
>> (In Cobol I don’t have to because of the predefined fixed format record layout).
>> 
>> AFAIK there are no similar record structures with fixed fields like this available in Swift?
>> 
>> So, the only way I can think of right now is to do it like this:
>> 
>> // mailingRecord is a Swift structure
>> struct MailingRecord
>> {
>>     var companyName: String = "no Name"
>>     var contacts: CompanyContacts
>>      .
>>      etc.. 
>> }
>> 
>> // recordStr was read here with ASCII encoding
>> 
>> // unpack data in to structure’s properties, in this case all are Strings
>> mailingRecord.companyName                       = recordStr[ 0..<30]
>> mailingRecord.contacts.president.lastName  = recordStr[30..<45]
>> mailingRecord.contacts.president.firstName = recordStr[45..<53]
>> 
>> 
>> // and so on..
>> 
>> Ever worked for e.g. a bank with thousands of these files, their formats unchanged for years?
>> 
>> Any alternative, more convenient and simpler methods present in Swift? 
>> These look like examples of fixed data formats
> Hi Shawn,
> No, it could also be a UTF-8 String.
>   
>> that could be parsed from a byte buffer into strings, etc. 
> How would you do that? Could you please provide an example of how to do this with a byte buffer? 
> E.g. read from flat ASCII file —> unpack fields —> store in structure props? 
> 
> 
>> Likely little need to force them via a higher order string concept,
> What do you mean here with “high order string concept” ??
> Swift is a high level language, I expect to do this with Strings directly,
> instead of being forced to use low-level coding with byte arrays etc.
> (I have/want no time for that)
> Surely, one doesn’t have to resort to that in a high level language like Swift? 
> If I am certain that all characters in a file etc. are of fixed width, even in UTF-32
> (in the above example I am 100% sure of that), then 
> using str[n1..<n2] is in that case legitimate, because there are no
> grapheme characters involved.
> Therefore IMHO direct String subscripting should be available in Swift 
> for all Unicode types, and the responsibility whether or not to use
> this feature is with the programmer, not the language designer.
> 
>> at least not until unpacked from its compact byte form.
> I am sorry, but to me, it all sounds a bit like:
> “why solve the problem with simple solution, when one can make it much
> more complicated?” Be more pragmatic.
> 
> 
> TedvG, 
>> 
>> -Shawn 
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution