[swift-evolution] [Draft] scope-based submodules

Wed Mar 1 00:55:57 CST 2017

> On Feb 24, 2017, at 11:34 AM, Matthew Johnson via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Scope-based submodules
> 
> 	• Proposal: SE-NNNN
> 	• Authors: Matthew Johnson
> 	• Review Manager: TBD
> 	• Status: Awaiting review

Well, this is certainly comprehensive! Sorry about the delay in answering; I've been hosting a house guest and haven't had a lot of free time.

> The primary goal of this proposal is to introduce a unit of encapsulation within a module that is larger than a file as a means of adding explicit structure to a large program. All other goals are subordinate to this goal and should be considered in light of it.

I agree with this as the primary goal of a submodule system.

> Some other goals of this proposal are:
> 
> 	• Submodules should help us to manage and understand the internal dependencies of a large, complex system.
> 	• Submodules should be able to collaborate with peer submodules without necessarily being exposed to the rest of the module.
> 	• A module should not be required to expose its internal submodule structure to users when symbols are exported.
> 	• It should be possible to extract a submodule from existing code with minimal friction. The only difficulty should be breaking any circular dependencies.

One goal I don't see mentioned here is "segment the API surface exposed to importing code". The `UIGestureRecognizerSubclass` use case has been thoroughly discussed, but I think there are probably a lot of cases where there are two "sides" to an API and it'd often be helpful to hide one unless it's needed. `URLProtocol` and `URLProtocolClient` come to mind; the many weird little classes and symbols related to `NSAtomicStore` and `NSIncrementalStore` might be another.

(I'm not necessarily suggesting that the Foundation and Core Data overlays should move these into submodules—I'm suggesting that, if they were implemented in a Swift with a submodule feature, they would be candidates for submodule encapsulation.)

> Submodule names form a hierarchical path:
> 
> 	• The fully qualified name of the submodule specified by Submodule.InnerSubmodule is: MyModuleName.Submodule.InnerSubmodule.
> 	• In this example, InnerSubmodule is a child of Submodule.
> 	• A submodule may not have the same name as any of its ancestors. This follows the rule used by types.

Does being in a nested submodule have any semantic effect, or is it just a naming trick?

> Submodules may not be extended. They form strictly nested scopes.
> 
> 	• The only way to place code in a submodule is with a submodule declaration at the top of a file.
> 	• All code in a file exists in a single submodule.

I'm a big supporter of the 1-to-N submodule-to-file approach.

> There are several other ways to specify which submodule the top-level scope of a file is in. All of these alternatives share a crucial problem: you can’t tell what submodule your code is in by looking at the file. 
> 
> The alternatives are:
> 
> 	• Use a manifest file. This would be painful to maintain.
> 	• Use file system paths. This is too tightly coupled to physical organization. Appendix A discusses file system independence in more detail.
> 	• Leave this up to the build system. This makes it more difficult for a module to support multiple build systems.

I'm going to push back on this a little. I don't like the top-of-file `submodule` declaration for several reasons:

	1. Declarations in a Swift file are almost always order-independent. There certainly aren't any that must be the first thing in the file to be valid.

	2. Swift usually keeps configuration stuff out of source files so you can copy and paste snippets of code or whole files around with minimum fuss. Putting `submodule` declarations in files means that developers would need to open and modify those files if they wanted to copy them to a different project. (It's worth noting that your own goal of making it easy to extract submodules into separate modules is undermined by submodule declarations inside files.)

	3. However you're organizing your source code—whether in the file system, an IDE project, or whatever else—it's very likely that you will end up organizing files by submodule. That means either information about submodules will have to be specified twice—once in a canonical declaration and again in source file organization—and kept in sync, or IDEs and tooling will have to interpret the `submodule` declarations in source files and reflect that information in their UIs.

	4. Your cited reason for rejecting build system-based approaches is that "This makes it more difficult for a module to support multiple build systems", but Swift has this same problem in *many* other parts of its design. For instance, module names and dependencies are build system concerns, despite the fact that this makes it harder to support multiple build systems. I can only conclude that supporting multiple build systems with a single code base is, in the long term, a non-goal, presumably because the Xcode/SwiftPM story is expected to improve in some way.

I'm still a fan of build-system-based approaches because I think they're better about these issues. The only way that they're worse is that—as you note—it may not be clear which submodule a particular file is in. But I think this is basically a UI problem for editors.

> Top-level export
> 
> All export statements consist of an access modifier, the export keyword, and a submodule name:
> 
> open export ChildSubmodule

Is the access control keyword mandatory?

If we do indeed use `submodule` statements, could we attach the attributes to them, rather than having a separate statement in a different file?

What does it mean if a `public` or `open` symbol is in a submodule which is not `export`ed?

> 	• A submodule may be published under a different external name using the export as NewName syntax*.

What's the use case for this feature?

> 	• @implicit causes symbols from the submodule to be implicitly imported when the module is imported.
> 	• @inline causes the symbols from the submodule to appear as if they had been declared directly within the top-level submodule.

So if you write `@implicit public export Bar` in module `Foo`, then writing `import Foo` also imports `Foo.Bar.Baz` *as* `Foo.Bar.Baz`, whereas `@inline public export Bar` copies `Foo.Bar.Baz` into `Foo`, so it imports as `Foo.Baz`?

What's the use case for supporting both of these behaviors?

> Exports within the module
> 
> A submodule may bound the maximum visibility of any of its descendant submodules by explicitly exporting it:

I'm not sure how valuable this feature is in this kind of submodule design.

***

To avoid being coy, here's the export control model that *I* think would make the most sense for this general class of submodule system designs:

1. A submodule with `public` or `open` symbols is importable from outside the module. There is no need to separately mark the submodule as importable.

2. Normally, references within the module to submodule symbols need to be prefixed with the submodule name. (That is, in top-level `Foo` code, you need to write `Bar.Baz` to access `Foo.Bar.Baz`). As a convenience, you can import a submodule, which makes the submodule's symbols available to that file as though they were top-level module symbols.

3. When you import a submodule, you can mark it with `@exported`; this indicates that the symbols in that submodule should be aliased and, if `public` or `open`, re-exported to other modules.

4. There are no special facilities for renaming submodules or implicitly importing submodules.
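
As a rough illustration of point 2 in today's Swift: a caseless enum can stand in for a submodule, and a typealias for the import. This is only an emulation under that assumption; `Bar`, `Baz`, and `describe()` are placeholder names, and a real submodule import would presumably alias every symbol at once rather than one at a time.

```swift
// A caseless enum stands in for submodule `Bar` of module `Foo`.
// (Emulation only; Swift has no submodule feature.)
enum Bar {
    struct Baz {
        var value: Int
        func describe() -> String { return "Baz(\(value))" }
    }
}

// Point 2, default: code outside the "submodule" spells out the prefix.
let qualified = Bar.Baz(value: 1)

// Point 2, convenience: an "import" behaves like an unprefixed alias.
// Here a typealias plays that role for a single symbol.
typealias Baz = Bar.Baz
let unqualified = Baz(value: 2)

print(qualified.describe())
print(unqualified.describe())
```

Both spellings name the same type; the alias only changes how the file refers to it.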

> Importing submodules
> 
> Submodules are imported in exactly the same way as an external module by using an import statement.

Okay, but what exactly does importing *do*? Set up un-prefixed private aliases for the submodule's internal-and-up APIs?

> There are a few additional details that are not applicable for external modules:
> 
> 	• Circular imports are not allowed.

Why not? In this design, all submodules are evaluated at once, so I'm not sure why circular imports would be a problem.
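
For comparison, Swift already resolves mutually dependent declarations within a single module without complaint. A small sketch, using caseless enums as stand-ins for two submodules (`Models`, `Views`, and their members are made-up names, not anything from the proposal):

```swift
// Two stand-in "submodules" whose declarations refer to each other.
enum Models {
    struct User {
        var name: String
        // Refers to a declaration in `Views`, which appears later in the file.
        func makeLabel() -> Views.Label { return Views.Label(text: name) }
    }
}

enum Views {
    struct Label {
        var text: String
        // Refers back to a declaration in `Models`.
        func makeUser() -> Models.User { return Models.User(name: text) }
    }
}

let label = Models.User(name: "Ada").makeLabel()
print(label.makeUser().name)
```

This only shows that circular references between namespaces type-check today; whether the proposal has some other reason to forbid circular imports is a separate question.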

> // `Grandparent` and all of its descendants can see `Child1` (fully qualified: `Grandparent.Parent.Child1`)
> // This reads: `Child1` is scoped to `Grandparent`.
> 
> scoped(Grandparent) export Child1
> 
> // `Child2` is visible throughout the module but may not be exported for use by clients.
> // This reads: `Child2` is scoped to the module.
> 
> scoped(module) export Child2
> 
> With parameterization, scoped has the power to specify all access levels that Swift has today:
> 
> `scoped`                                      == `private` (Swift 3)
> `scoped(file)`                                == `private` (Swift 2 & 4?) == `fileprivate` (Swift 3)
> `scoped(submodule)`                           == `internal`
> `scoped(public) scoped(internal, inherit)`*   == `public`
> `scoped(public)`                              == `open`
> 
> The parameterization of scoped also allows us to reference other scopes that we cannot in today’s system, specifically extensions: scoped(extension) and outer types: scoped(TypeName).

What is the purpose of creating more verbose aliases for existing access levels? I can't think of one, which means that these are redundant.

And if we remove them as redundant, the remaining access control levels look like:

	scoped
	private
	scoped(TypeName)
	internal
	scoped(SomeModule)
	scoped(module)
	scoped(extension)
	public
	open

There's just no logic to the use of the `scoped` keyword here—it doesn't really mean anything other than "we didn't want to assign a keyword to this access level".

***

I think we need to go back to first principles here. The reason to introduce a new access level is that we believe that a submodule is a large enough unit of code that it will simultaneously need to encapsulate some of its implementation details from other submodules, *and* have some of its own implementation details encapsulated from the rest of the submodule. Thus, we need at least three access levels within a submodule: one that exposes an API to other submodules, one that exposes an API throughout a submodule, and one that exposes it to only part of a submodule.

What we do *not* need is a way to allow access only from certain other named submodules. The goal is to separate external and internal interfaces, not to micromanage who can access what.

Basically, that means we need one of two things. Keeping all existing keywords the same (i.e., not removing either `private` or `fileprivate`) and using `semi-` as a placeholder, we want to have either:

	private: surrounding scope
	fileprivate: surrounding file
	semi-internal: surrounding submodule
	internal: surrounding module
	public: all modules (no subclassing)
	open: all modules (with subclassing)

Or:

	private: surrounding scope
	fileprivate: surrounding file
	internal: surrounding submodule
	semi-public: surrounding module
	public: all modules (no subclassing)
	open: all modules (with subclassing)

The difference between the two is that, with `semi-internal` below `internal`, submodule APIs are exposed by default to other submodules; with `semi-public` above `internal`, submodule APIs are encapsulated by default from other submodules.
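
A small model of the two ladders may make the difference in defaults easier to see. This is a sketch, not a suggestion for how a compiler would implement it: the `Scope` enum, the two dictionaries, and `defaultIsVisibleAcrossSubmodules` are invented for illustration, and `semi-internal` and `semi-public` are still placeholders.

```swift
// Widest region of code that can see a declaration, ordered narrow to wide.
enum Scope: Int, Comparable {
    case scope, file, submodule, module, allModules

    static func < (lhs: Scope, rhs: Scope) -> Bool {
        return lhs.rawValue < rhs.rawValue
    }
}

// Each design maps `internal` (the default access level) to a different scope.
let semiInternalDesign: [String: Scope] = [
    "private": .scope, "fileprivate": .file, "semi-internal": .submodule,
    "internal": .module, "public": .allModules, "open": .allModules,
]
let semiPublicDesign: [String: Scope] = [
    "private": .scope, "fileprivate": .file, "internal": .submodule,
    "semi-public": .module, "public": .allModules, "open": .allModules,
]

// Is an unannotated (default `internal`) declaration visible to peer submodules?
func defaultIsVisibleAcrossSubmodules(in design: [String: Scope]) -> Bool {
    guard let reach = design["internal"] else { return false }
    return reach >= .module
}

print(defaultIsVisibleAcrossSubmodules(in: semiInternalDesign)) // exposed by default
print(defaultIsVisibleAcrossSubmodules(in: semiPublicDesign))   // encapsulated by default
```

The model ignores the subclassing distinction between `public` and `open`, since it doesn't affect which code can see a declaration.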

I think encapsulating by default is the right decision, so we want the `semi-public` design. But there's also a second reason to use that design: We can anticipate another use case for it. The library resilience design document discusses the idea of "resilience domains" (groups of libraries whose versions are always matched, and which therefore don't need to use resilient representations of each other's data structures) and the idea of having "SPIs", basically APIs that are only public to certain clients. I think these ideas could be conflated, so that a semi-public API would be available both to other submodules in the module and to other libraries in your resilience domain, and that this feature could be used to expose SPIs.

So, that leaves an important question: what the hell do you call this thing? My best suggestions are `confidential` and `privileged`; in the context of information, these are both used to describe information which *is* shared, but only within a select group. (Think, for instance, of attorney-client privilege: You can share this information with your lawyer, but not with anyone else.)

So in short, I suggest adding a single access level to the existing system:

	private
	fileprivate
	internal
	confidential/privileged
	public
	open

This is orthogonal to any other simplification of the access control system, like removing `private` or `fileprivate`.

> Appendix A: file system independence

I think we need to decide: Is a translation unit of some sort—whether it's a physical on-disk file or some simulacrum like a database record or just a separate string—something intrinsic to Swift? I think it should be; it simplifies a lot of parts of the language that would otherwise require nesting and explicit scoping.

If translation units are an intrinsic part of Swift, then this section is not really necessary. If translation units aren't, then we need to rethink a lot of things that are already built in.

-- 
Brent Royal-Gordon
Architechies