[swift-evolution] [Pitch] New Version of Array Proposal

Sun Jul 23 02:08:45 CDT 2017

> On Jul 22, 2017, at 9:14 PM, Taylor Swift <kelvin13ma at gmail.com> wrote:
> 
> Hi, I’ve been watching this proposal for a while but didn’t get to read it in detail until now. I really like this idea and it’s very comprehensive and well-thought out. Some feedback:
> 
> 1. I got very confused reading through the document the first time through. You’re introducing a huge amount of new syntax and jargon, and not defining all of them before first using it. For example, what does `a[[;3]]` and `b[[;1, 4]]` after the seventh paragraph mean? This bounds-omission syntax isn’t explained until the Array Values section, and it’s buried in the middle of a paragraph.

I had to rearrange other sections of the document due to this problem. I saw this, but couldn’t figure out how to resolve it at that time; the sections seemed too mutually dependent (so I get this problem either way).

> 2. Using the semicolon as the bounds separator feels a bit weird, especially when combined with the colons in the initializers. In English, semicolons bind closer than colons do, so `[13, 2; [;0, 0]: 0, [;0, 1]: 0, default: -1]` reads strangely. Also, where does the type annotation go? Having a semicolon after a left square bracket “[;” also makes for ugly code texture <http://elm-lang.org/blog/compilers-as-assistants>, something to think about considering I’m going to be staring at a lot of these things. Then again, I’m struggling to think up an alternative syntax.

I would prefer to drop the semicolons to describe the arrays used to index locations, but I don’t know if using standard array literals would make parsing harder for implementors. And we could need different confirmation code that the indexes are valid for the shape of the array.

Within a FSA literal, the element type is calculated from the expression type, like in standard array and dictionary literals. That’s real easy if the literal uses a function term (It’s the closure’s return type.), but having mismatch element types in other kinds of terms is an error (I think) like it is for the other composite literals.

For your specific example, you can use storage order: `[13, 2; 0, 0, default: -1]`.

> 3. Some of the syntax rules could benefit from providing a little explanation, as the rationale is not always immediately obvious, for example:
> 
> - Allowing a single number for the left side of the dictionary literal pair, but not a comma-separated index list.
> 
> [5, 2; 2, 1: 5]
> 
> of course, the comma character is already reserved for separating key-element pairs, but I had to think for a while before realizing that.
> 
> - `func` not being allowed to coexist with other initialization pairs. My first thought was “why can’t `func:`initialize all the elements that weren’t covered by anything else, like `default:` does? I guess this is to ensure an efficient implementation of the closure initializer, but again, this is non-obvious.

Yeah, allowing a function term to share with others would either mean elements get double initialized (or one takes priority, and what defines the order/priority) or the implementation logic to loop over every element needs extra tests to skip elements with their own initialization term.

> 4. Please define terms like  “term”, “array-index term” and “single-number term” and “index set”, among others, preferably in boldface at the top of the document. This proposal is hard enough to follow without having to puzzle out what an “array-index term” is. (Isn’t every index an array index term?)
> 
> 5. There’s a lot of awkward and confusing language that I had to go back and reread many times over to figure out what it meant. Examples: “The number to the left of the colon is the index of the element to receive the value on the right of the colon.” (what???), “with an explicit bound (set)” (first thought: Why is “set” in parentheses? Does it not matter whether you set an explicit bound?). Again, this proposal is hard enough to follow without having to puzzle out that “(set)” refers to a list of multiple indices, not setting a single index.
> 
> 6. “The element type of the array is determined by either the return type of a function term's closure, or by the type consolidated from all the term values (like in dictionary and standard-array literals).” That second part is going to kill the type checker, since I’m guessing in practice FSAs are going to have much longer literals than flexible arrays or dictionaries do right now.

Hopefully, most literals will use default or function terms.

> 7. You should add a section in the FAQ justifying the use of `as` for FSA shape conversion. Why not initializer syntax?
> let d = [4, 3; 1, 2, 3, default: 0]
> let e = [4, 3; Double](d)
> I agree that `as` is better than magical initializers, but there are legitimate arguments against `as` too. Right now, `as` is reserved for literals and class types. Applying an `as` cast to a value type is an entirely new use-context for the keyword.

I don’t think any compound type has initializers; I didn’t add them to FSAs to avoid the impression that custom initializers are possible.

Theoretically, some reshaping could be expressed as a library function:

let f = reshape<2, 3, 2>(e)  // [2, 3, 2; Double]

but that requires complete generics

func reshape<N…: Int, M…: Int, T>(_ x:  […M; T]) -> […N; T] where #product(N…) == #product(M…) {
return […N; func: { x[ reindex<…M>($0) ] }
}

and it doesn’t work when the element types are different (even if the inner non-array types are the same). Then “as” can be reserved for just element-wise casts (which your “e” is, not a reshape). Speaking of which, maybe introduce a new “as”:

let g = e as@ Float

where you give only the element type and the compiler figures out the shape. We would probably need “as@?” and “as@!” too. I mentioned that these could be library generic functions too someday.

> 8. What is a “sub-object”??? This word started randomly appearing halfway through the document with no definition.

That should be a computer-science standard term-of-art. Here, it’s a tuple member or FSA element. The other product types, classes and structures, have them too, but are hidden behind encapsulation.

> 9. I don’t see the value in having both nested FSAs and multi-dimensional FSAs. Aren’t they the same thing? For example, in the code snippet

Why does any language with multi-dimensional arrays (like Fortran or Ada) have them? By this standard, no language should have multi-dimensional arrays. They exist because of data modeling. Co-equal coordinates in the model should be co-equal in their representation in the program. Yes, they are implemented the same way underneath. We don’t copy C everywhere, so why not take this opportunity to do better. Also, just using nesting could imply that the intermediate array types have a meaning, but they might not if they’re just implementation quirks and not part of the abstract model.

Nested arrays are not my solution for multi-coordinate indexing; use multi-dimensional arrays for that. I mention nested arrays because:

Nested arrays fundamentally cannot be banned. (What if an element type is hidden behind a type-alias and is conditionally an array type?)
I need the definition to explain the “inner non-array type”
I need the inner non-array type to explain which pairings of FSAs for reshaping are legal. (And a similar reason for tuple conversion.) Note the two types can have different nesting levels.
I need to explain that empty arrays cannot be an array element type. (Should this be changed? What happens with tuples or structures containing empty tuples/structures as members? What about empty tuples/sturctures in “Array”? Banning empty arrays means we don’t have to worry about every array element being at the same address. The other way to solve this is to make them one byte (or word) long.)

> ```
> let a = [;1, 2, 3, 4]
> assert(a[0] == 1)
> assert(a[1] == 2)
> assert(a[2] == 3)
> assert(a[3] == 4)
> let b = a as [2, 2; Int]
> assert(b[0, 0] == 1)
> assert(b[0, 1] == 2)
> assert(b[1, 0] == 3)
> assert(b[1, 1] == 4)
> let c = a as [2; [2; Int]]
> assert(c[0][0] == 1)
> assert(c[0][1] == 2)
> assert(c[1][0] == 3)
> assert(c[1][1] == 4)
> ```
> 
> There’s three syntaxes which accomplish two unique things. I lean towards disallowing FSA nesting and instead allowing incomplete index lists to partially unnest multidimensional FSAs. Let’s reserve “[][][]...” for flexible array chained dereferencing.

I don’t understand what your incomplete index list idea is. And as I said, the chaining technique is less I desire it and more I can’t ban it and keep Swift features orthogonal.

> 10. I don’t see much value in runtime DI. It seems to add a lot of complexity to the compiler with little benefit.

In other languages, people initialize their arrays in loops with run-time indexes. This frustrates compile-time DI. C doesn’t have this problem because it doesn’t have DI and is willing to risk undefined behavior during uninitialized reads. So it’s either work in the confines of static DI, allow dynamic DI (hopefully under as limited circumstances as possible), or give up on checking for FSAs and allow undefined behavior (which would blow a big hole in static DI). Maybe this can be pushed to version 2. 

> 11. This should have defined behavior:
> 
> let data = [2, 2; 1, 2, 4, 8]
> for (i, x) in data.enumerated()
> {
>     total += x
> }

FSAs are compound types, not named types, so there are no methods. (Just like tuples, FSAs don’t follow any protocols.) But since they’re a built-in, I can customize the “for-in” loop to cover all the elements in an implementation-optimized order. (The implementation has the option of multi-threading if it can see there’s no cross-iteration contamination.) Since you can’t control the order without manual looping, the “#indexOf” expression exists to let you know where you are during an iteration loop.

I first used the iteration variable as the operand to “#indexOf” to determine which array instance to track. Then I saw that the for-loop implements its iteration variable as a pattern in the grammar instead of something simpler. And what happens if a wildcard is used as the iteration variable? And would people question why when the pattern has multiple iteration variables, all are equally valid for “#indexOf”? I moved the operand to be the label because we usually don’t have multiple labels on statements (Is that even legal in Swift?) and can optimize which loops need to be indexable.

> “with” methods should be used sparingly and not be part of common idioms like iterating through an array as a buffer.

I need some library function to convert a FSA into a Collection, and to represent the “T[]” function parameter type concept from C, which represents any array size (instead of being locked for one length, or since we have multi-dimensional arrays, one shape). For safety, the callback function uses the buffer version so you have a pointer and size. You generally should be using the for-loop, though.

Using an unsafe buffer pointer forces the data into addressable memory, which isn’t necessarily the case of an instance’s location by default. (C optimizes the other way. Objects are addressable by default, and that is disabled, and any optimizations by being un-aliased activated, by the “restrict” qualifier.)

We should have some way to cull specific indexes in loops. (Like “lock index 2 to column 5 but iterate over all the qualifying subset of elements.”) I don’t know if this has to be a new base operation on a loop, or if it can be done through a library function, or if it can be done through a library function after we complete generics. But I think it can wait until version 2.

> 12. Can we rename the `cardinality` type property to `nestingCount` or `nestingLevels` something similar? That seems a lot clearer and more descriptive of that the property actually represents. I’d also bet that close to no one knows what cardinality is.

The nesting-level is “layers.” “Cardinality” is the number of co-equal coordinates. But that second name probably needs some work, although it can be synthesized from the length of “dimensions.” The “cardinality” and “elementCount” properties work around us not having (integer) compile-time constants yet; otherwise we can derive them from compile-time expressions on “dimensions."

— 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com 

> On Sat, Jul 22, 2017 at 3:41 PM, Daryle Walker via swift-evolution <swift-evolution at swift.org <mailto:swift-evolution at swift.org>> wrote:
> It’s at <https://gist.github.com/CTMacUser/cfffa526b971d0e1f3a079f53c6819bb <https://gist.github.com/CTMacUser/cfffa526b971d0e1f3a079f53c6819bb>>.
> 
> * Try to clarify that fixed-size arrays are a new kind of compound type, not a (ridiculously magical) library generic type.
> * Change the separator between the bounds and element type from a colon to a semicolon.
> * Specify that violating the bounds of an array during indexing is a run-time error.
> * Reword how the mapping of static-indexing elements for multi-dimensional arrays works.
> * Completely redo array values/literals. A literal for a fixed-size array always includes a semicolon, which separates the bounds from the values. (The separator in FSA types was changed to a semicolon to match.) A value can be a plain expression or a dictionary expression where the right side of the colon is the value and the left side is the index of the target element or “default” for all un-targeted elements. The left side can also be “func”, with the right side being an initialization closure.
> * Move the “Reshaping Arrays” section and add an example.
> * Right now, deterministic initialization is unchanged, so an initializing loop has to be changed to initializing the array with a function term, moving the loop to the closure.
> * Remove the “parallel” declaration attribute. Add a future note about it and the “fragmentary” attribute.
> * Change the for-loop example to conform to deterministic initialization. Reword how the flattening buffer functions work.
> * Add examples to element-wise casting.
> * Reword tuple conversion section, and add an example.
> * Reword various vector-mode attribute sections. Note that there need to be two ABI additions for FSA, one for non-vectorized FSAs and one for vectorized FSAs. These two kinds of arrays need conversion functions at our (i.e. the ABI) level, but are transparent for the user.
> 
>     let v1: @vector [3; Int] = [; 1, 3, 5]
>     let v2: [3; Int] = v1  // Not a type-mismatch error
> 
> * Due to FSA’s literals including the bounds, or using automatic bounds mode, array-placeholder syntax is not longer needed for type annotations and has been removed.
> * Renamed “nestings” to “layers”.
> 
> — 
> Daryle Walker
> Mac, Internet, and Video Game Junkie
> darylew AT mac DOT com 
> 
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org <mailto:swift-evolution at swift.org>
> https://lists.swift.org/mailman/listinfo/swift-evolution <https://lists.swift.org/mailman/listinfo/swift-evolution>
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.swift.org/pipermail/swift-evolution/attachments/20170723/e1c25a13/attachment.html>