[swift-evolution] Fixed-sized arrays

Sun Jun 26 00:13:43 CDT 2016

[I’ve seen the WWDC 2016 Keynote and State of the Platforms videos.  I haven’t seen any others so I don’t spoil myself before typing my ideas down.  Apologies if this has already been covered.]

Is there any problem to adding fixed-sized arrays?  From what I glanced here, it’s more like no one has gotten around to it and less like the very idea is hated.

Obviously, the main advantages are that object storage (or pointer storage for reference types) is on the stack and that the shape (i.e. the number of extents and range for each extent) is fixed at compile time.  This type of, well, type can be used when the baggage of length changing isn’t needed  Such arrays are also a classic type since we started system-programming languages.  (I was surprised by their absence in Swift when I first read the books.)  They can be mapped to a vector processing unit’s built-ins.

This:

> struct ArrayOf5<T> {
> 
>     let count = 5
> 
>     var first: T
>     var second: T
>     var third: T
>     var fourth: T
>     var fifth: T
> 
>     init(array: [T]) {
>         first = array[0]
>         second = array[1]
>         third = array[2]
>         fourth = array[3]
>         fifth = array[4]
>     }
> 
>     subscript(index: Int) -> T {
>         get {
>             var preResult: T?
>             switch index {
>             case 0:
>                 preResult = first
>             case 1:
>                 preResult = second
>             case 2:
>                 preResult = third
>             case 3:
>                 preResult = fourth
>             case 4:
>                 preResult = fifth
>             default:
>                 preResult = nil
>             }
>             return preResult!
>         }
> 
>         set {
>             func doAssign(destination: UnsafeMutablePointer<T>) {
>                 destination.memory = newValue
>             }
> 
>             switch index {
>             case 0:
>                 doAssign(&first)
>             case 1:
>                 doAssign(&second)
>             case 2:
>                 doAssign(&third)
>             case 3:
>                 doAssign(&fourth)
>             case 4:
>                 doAssign(&fifth)
>             default:
>                 doAssign(nil)
>             }
>         }
>     }
> 
> }

shouldn’t be better supported than directly having built-in fixed-sized arrays.  (I’ve wondered if we should have a library or built-in arrays.  I’ve decided that even if we add a library, we still need the built-in.)  Although I parametrized the type, altering the shape (right now 1 dimension of 5 elements) doesn’t scale since it requires lexical change.  Internally, a built-in array type would be something like this, modulo optimizations and not having to come up with names for each element.

—————

[The following sections are my ideas about implementation; we don’t have to use them.]

What should array type expressions look like in code?

Since generics (currently) only use type-based parameters, we can’t do something C++-ish like “array<T, 1, 4, 3>”.  I did have a thought of “[MyType; 5]”, but that seemed too close to current arrays.  I was shocked to discover that Rust uses that syntax.  However, it’s still to bad to use now because ‘[MyType]” would mean something different that it does currently (zero-dimensional array of a single unit instead of one-dimensional array of arbitrary number of elements).

Since arrays are product types (I think), let’s use multiplication syntax:

> var myArray: MyType * [6]  // This is an array of six elements, each of `MyType`
> var myArray2: MyType * [6] * [5]  // This is an array of five elements, each a six-element array of `MyType`
> var myArray3: MyType * [3, 4]  // This is an array of three-by-four, each element of `MyType`

The “[ SHAPE ]” syntax affects the type expression to its left.  An empty list is not allowed since, although array-literal syntax allows empty lists, the subscript operation does not.  Note, to keep the syntax regular, the first subscript when using “myArray2” is for five elements, i.e. “0..<5”, and the second (if any) is for six, the reverse lexical order of declaration.  This is because we don’t have C’s cute “declaration equals usage” type syntax.  Also note that “myArray3” requires a single subscript call with two coordinates in it to dereference.  Supplying only one coordinate is nonsensical.  (“myArray2” is the other way when you want to dereference a “MyType” value.)

Of course, not using a shape expression gives you a zero-dimensional array (i.e. a regular variable) of your base type by default.  (So, it would actually be "0 + extents(MyType.self)" dimensions.)

Or we could just do it the same way C does it (i.e. “MyType[8]”), and preserve nested arrays having the same lexical order in declaration and dereference.  We could even forbid multi-dimensional arrays and just do nested single-dimensional arrays like C (and move multi-dimensionality to wrapper structs), but I feel that would be giving up on improving the abstraction.

——

What about row vs. column order?

If you stick with chaining one-dimensional arrays, you get only row-order, just like C (which can only do nested one-dimensional arrays).  If you use a multi-extent shape expression, you can override the order:

> var myArray1: MyType * [3, 4] * #rank[0, 1]  // Default; row-order
> var myArray2: MyType * [3, 4] * #rank[1, 0]  // Reversed; column-order
> var myArray3: MyType * [3, 4, 5] * #rank[0, 2, 1]  // Scrambled

The array literal attached to the rank command has to contain every value in 0..<count exactly once.  The position of the “0” value is the spacing with the longest span, “count - 1” is the adjacent elements level.

A rank command can only appear directly after a shape expression or an element type expression, and not after another rank command.  The command’s literal must have the same length as the number of extents of the corresponding shape expression.  A rank command following an element type expression, whether it’s a non-static-array type or a static-array type hidden behind a type alias, is treated as a zero-dimensional array so “#rank[]” is the only legal rank command for them.

The rank command does not pierce type aliases; “#rank[]” is the only valid command for them.  The alias itself can have a rank command, which affects all objects declared with that alias, and cannot be overridden.  (If the alias doesn’t have a rank command, the index storage priority is fixed to the default.)  Should a rank-stripping command be added?  Should an rank command that pierces aliases and/or overrides previous ranks be added?

Types that differ in number of shape commands, number of extents within corresponding commands, size of corresponding extents, index storage priorities, or element type are different types.  We should have a function that can reshape an array if the total number of the outermost non-static-array objects are the same.  (The function could reshape a non-static-array type to a static array of one element with any nesting.)

(If we give up on improving the abstraction compared to C, this rank order would be simulated in a wrapper struct that rearranges the indices.)

——

What about initialization?

I didn’t use initialization expressions before because it’s hard.  We’ll have to work on this.

Zero-dimensional arrays (i.e. a regular non-static-array type) are iniitalized as usual.

A one-dimensional static-array gets initialized like a current dynamic array.  But the syntax given below will also be accepted (with just one coordinate, of course).

For higher-dimensional arrays, we need something like:

> [(0, 0, 0): myFirstValue, (2, 4, 1): mySecondValue, …]
> [(0, 0): myFirstValue, (0, 1): mySecondValue, default: myDefaultValue]

and/or:

> var myArray: MyType * [3, 4] = { return 2 * $0 + 7 * $1 }

(We can use “myParam1, myParam2 in” syntax for that last one too.)  Can/could we have an array expression that is part (XX, YY) and part $0/$etc formula?  Maybe with “_” and/or “case” and/or “where” to differentiate which elements get the $0/$etc and which ones get a “default:” phrase?  Maybe the _/case/where could be used within incomplete (XX, YY)-style indexing?

——

Should there be flexible-ending arrays?  Other value types?

Let’s say that at most one extent in a shape expression can be “any” instead of a compile-time integer expression.  Let’s restrict the “any” dimension to be the rank-0 one, using “#rank” to override if you didn’t put the “any” first.  Let’s call these flexible-ending static-arrays.

Flexible-ending static-arrays could be used like the “MyType[]” expression in C.  Frequently it can be used as a function parameter without having to specialize for each possible array length.  In other words, for a given static array type “MyType * […, N, …]”, it can be passed into a function parameter that is declared as “MyType * […, any, …]”.  Could there be an automatic way to pass in the value of “N”, or would we have to resort to using another function parameter or some other out-of-band information and trusting the user to synchronize correctly?

It should be possible to pass in different-shaped arrays to said function parameters with a reshape call, as long as the total number of elements is a multiple of the span across the “any”-ed extent.  Should such a function parameter also take a "[MyInnerType]” dynamic array objects, where “MyInnerType” is what you get after stripping the outermost, “any”-ed, extent?

A flexible-ending struct is one with a flexible-ending type as its last stored property.  A flexible-ending enum is a enum with attributed cases and at least one case ends its tuple with a flexible-ending type.  Flexible-ending types cannot appear anywhere else as a stored property in a struct or enum.  Should we use “final” to mark a struct’s flexible ending property?  Somehow adapt this for enums too?  (Right now, “final” is only used for classes.)  Flexible-ending structs and enums can be used in function parameters too, allowing different lengths for the same parameter.  Like static-arrays, a struct or enum that matches a flexible function parameter except that its would-be flexible part is fixed can be passed through said parameter.  (It’s the same rule since all flexible struct and enums have to have a nested flexible array at their end.)

Besides as function parameters, a flexible-ending object can be declared inside a function/method or as a global.  I’m not sure how to do heap versions yet.  If the object is immutable (i.e. “let”), the initialization block has to cover all the elements without using $0/$etc or “default:” so the size can be determined.  I’m not sure how mutable objects are supposed to work here.  I’ll need your ideas the most here, assuming we keep this feature.  It was inspired by seeing code in Ada (discriminated record) and C (“MyType[1]” as last field in struct, then allocate with malloc) doing this.

Oh yeah, a flexible-ending static-array, including those that are parts of flexible-ending structs and enums, must have a inflexible base type.  (Otherwise lies madness.)

(If we keep this feature, maybe it should be limited to one-dimensional arrays.)

— 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20160626/29625ba7/attachment-0001.html>