[swift-evolution] [Draft] scope-based submodules

Thu Mar 2 14:58:43 CST 2017

> On Mar 1, 2017, at 12:55 AM, Brent Royal-Gordon <brent at architechies.com> wrote:
> 
>> On Feb 24, 2017, at 11:34 AM, Matthew Johnson via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>> Scope-based submodules
>> 
>> 	• Proposal: SE-NNNN
>> 	• Authors: Matthew Johnson
>> 	• Review Manager: TBD
>> 	• Status: Awaiting review
> 
> Well, this is certainly comprehensive! Sorry about the delay in answering; I've been hosting a house guest and haven't had a lot of free time.

Hi Brent, thanks for providing feedback on the proposal!

> 
>> The primary goal of this proposal is to introduce a unit of encapsulation within a module that is larger than a file as a means of adding explicit structure to a large program. All other goals are subordinate to this goal and should be considered in light of it.
> 
> I agree with this as the primary goal of a submodule system.
> 
>> Some other goals of this proposal are:
>> 
>> 	• Submodules should help us to manage and understand the internal dependencies of a large, complex system.
>> 	• Submodules should be able to collaborate with peer submodules without necessarily being exposed to the rest of the module.
>> 	• A module should not be required to expose its internal submodule structure to users when symbols are exported.
>> 	• It should be possible to extract a submodule from existing code with minimal friction. The only difficulty should be breaking any circular dependencies.
> 
> One goal I don't see mentioned here is "segment the API surface exposed to importing code".

Yeah, this is certainly a goal.  I didn’t explicitly state it because it feels intrinsic to any submodule solution.  Do you think it’s important to call it out explicitly?

> The `UIGestureRecognizerSubclass` use case has been thoroughly discussed, but I think there are probably a lot of cases where there are two "sides" to an API and it'd often be helpful to hide one unless it's needed. `URLProtocol` and `URLProtocolClient` come to mind; the many weird little classes and symbols related to `NSAtomicStore` and `NSIncrementalStore` might be another.

Submodules could certainly address these use cases, but I’m not sure they are the best solution we can come up with.  I’ve been kicking around an idea I’m calling “symbol groups”.  A symbol group would be a super lightweight way to isolate symbols that should not be imported by default.  When you do need them you would say something like this:

`import UIKit including UIGestureRecognizerSubclass`

The basic idea is that these symbols should be explicitly requested on import, but otherwise are considered part of the same submodule.  There are two reasons I think a really lightweight mechanism like this is a better solution for this use case.  First, a submodule feels like a pretty heavy solution to isolate a handful of symbols that are all related to the same type.  Second, a submodule establishes a scope boundary and that will often be undesirable and complicate the implementation.  Symbol groups avoid both of those problems.

I haven’t given too much thought to how symbol groups would be declared and how symbols would be placed in them.  What do you think of the concept?

> 
> (I'm not necessarily suggesting that the Foundation and Core Data overlays should move these into submodules—I'm suggesting that, if they were implemented in a Swift with a submodule feature, they would be candidates for submodule encapsulation.)
> 
>> Submodule names form a hierarchical path:
>> 
>> 	• The fully qualified name of the submodule specified by Submodule.InnerSubmodule is: MyModuleName.Submodule.InnerSubmodule.
>> 	• In this example, InnerSubmodule is a child of Submodule.
>> 	• A submodule may not have the same name as any of its ancestors. This follows the rule used by types.
> 
> Does being in a nested submodule have any semantic effect, or is it just a naming trick?

It does have a semantic effect.  Ancestor submodules are allowed to bound the visibility of descendants.  They also form a scope boundary that I would like to be able to reference when bounding the visibility of a submodule as well as in access modifiers.

> 
>> Submodules may not be extended. They form strictly nested scopes.
>> 
>> 	• The only way to place code in a submodule is with a submodule declaration at the top of a file.
>> 	• All code in a file exists in a single submodule.
> 
> I'm a big supporter of the 1-to-N submodule-to-file approach.
> 
>> There are several other ways to specify which submodule the top-level scope of a file is in. All of these alternatives share a crucial problem: you can’t tell what submodule your code is in by looking at the file. 
>> 
>> The alternatives are:
>> 
>> 	• Use a manifest file. This would be painful to maintain.
>> 	• Use file system paths. This is too tightly coupled to physical organization. Appendix A discusses file system independence in more detail.
>> 	• Leave this up to the build system. This makes it more difficult for a module to support multiple build systems.
> 
> I'm going to push back on this a little. I don't like the top-of-file `submodule` declaration for several reasons:
> 
> 	1. Declarations in a Swift file are almost always order-independent. There certainly aren't any that must be the first thing in the file to be valid.

Sure, but we also don’t put any semantic information outside of source files today either.

> 
> 	2. Swift usually keeps configuration stuff out of source files so you can copy and paste snippets of code or whole files around with minimum fuss. Putting `submodule` declarations in files means that developers would need to open and modify those files if they wanted to copy them to a different project. (It's worth noting that your own goal of making it easy to extract submodules into separate modules is undermined by submodule declarations inside files.)

The goal is to extract submodules from code within the same module.  The files are going to need to be configured as part of a new submodule one way or another.  I don’t see the declaration as burdensome.

> 
> 	3. However you're organizing your source code—whether in the file system, an IDE project, or whatever else—it's very likely that you will end up organizing files by submodule. That means either information about submodules will have to be specified twice—once in a canonical declaration and again in source file organization—and kept in sync, or IDEs and tooling will have to interpret the `submodule` declarations in source files and reflect that information in their UIs.

The problem with using the file system is that even if I physically organize my files by submodule I might want to have a submodule with sub-folders all containing code that is in that submodule (not in descendant submodules).  This makes it tricky to use a file system convention for defining submodule boundaries.

If you put the information in the build system it is distant from the code.  I would prefer to just tell the build system where to find the source files for my module and have it work out how they are organized into submodules.  Requiring the declaration to be at the top of the file should help with this.

> 
> 	4. Your cited reason for rejecting build system-based approaches is that "This makes it more difficult for a module to support multiple build systems", but Swift has this same problem in *many* other parts of its design. For instance, module names and dependencies are build system concerns, despite the fact that this makes it harder to support multiple build systems. I can only conclude that supporting multiple build systems with a single code base is, in the long term, a non-goal, presumably by improving the Xcode/SwiftPM story in some way.

Fair enough.  This was a secondary point in my mind.  The primary point is that I think semantically relevant information belongs in source files.  In this case, it belongs at the *top* of the source file so a reader has the necessary context before seeing the rest of the code.

> 
> I'm still a fan of build-system-based approaches because I think they're better about these issues. The only way that they're worse is that—as you note—it may not be clear which submodule a particular file is in. But I think this is basically a UI problem for editors.

Comments above aside, I would not want to see the mechanism for placing files in a submodule sink the proposal.  I am happy to defer to the judgment of the core team and the community on this issue.

> 
>> Top-level export
>> 
>> All export statements consist of an access modifier, the export keyword, and a submodule name:
>> 
>> open export ChildSubmodule
> 
> Is the access control keyword mandatory?

It is mandatory.  It specifies the bound of the export.

> 
> If we do indeed use `submodule` statements, could we attach the attributes to them, rather than having a separate statement in a different file?

I considered this.  The problem is that the `submodule` statement is specified in every file that resides in the submodule.  We don’t want to require annotations to be repeated everywhere.  

Secondarily, the proper way to think about this is that the submodule is a declaration introduced in the scope of its parent, just like any declaration.  However, in this case the scopes are logical rather than physical / lexical.  This is why the parent specifies annotations using an `export` statement.  The `export` statement in the parent is semantically equivalent to a declaration.  We allow the declaration to be elided if the parent doesn’t have any annotations to add because the children already state the name of the submodule.

> 
> What does it mean if a `public` or `open` symbol is in a submodule which is not `export`ed?

It means exactly the same thing as having a `public` or `open` symbol in a `private` type.  The availability is bounded by the containing scope.
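
The analogy can be checked in today's Swift. The following is a minimal sketch, not part of the proposal; the names `Counter` and `countTwice` are made up for illustration. The `public` members of a `private` type are still only reachable where the type itself is reachable:

```swift
// `Counter` is private to this file, so its `public` members are only
// usable inside this file, whatever their own modifier says.
private struct Counter {
    public var value = 0

    public mutating func increment() {
        value += 1
    }
}

// An internal function may use the private type in its body,
// as long as the type does not appear in its signature.
func countTwice() -> Int {
    var counter = Counter()
    counter.increment()
    counter.increment()
    return counter.value
}
```

A `public` symbol in a non-exported submodule would behave the same way under the proposal: the modifier states the widest visibility the symbol could have, and the containing scope caps it.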

> 
>> 	• A submodule may be published under a different external name using the export as NewName syntax*.
> 
> What's the use case for this feature?

One of the stated goals of the proposal is "A module should not be *required* to expose its internal submodule structure to users when symbols are exported."

The idea is that submodules form scope boundaries inside the module.  The structure of these scope boundaries is really an implementation detail.  Outside the module they form an interface provided to users of the module.  The interface exposed to users should not have to be coupled to implementation details.  Renaming allows us to freely use submodules to form scope boundaries in the implementation while still exposing the preferred interface externally to users.  

This also allows the submodule structure of a module to evolve over time without breaking users.  The external interface can be stable while the internal structure changes to meet the needs of the implementation.

> 
>> 	• @implicit causes symbols from the submodule to be implicitly imported when the module is imported.
>> 	• @inline causes the symbols from the submodule to appear as if they had been declared directly within the top-level submodule.
> 
> So if you write `@implicit public export Bar` in module `Foo`, then writing `import Foo` also imports `Foo.Bar.Baz` *as* `Foo.Bar.Baz`, whereas `@inline public export Bar` copies `Foo.Bar.Baz` into `Foo`, so it imports as `Foo.Baz`?
> 
> What's the use case for supporting both of these behaviors?

That’s the idea.  It seems like an arbitrary and unnecessary limitation to couple implicit import to also inlining the symbols.  I would be willing to let this go if there was enough resistance that it compromised the reception of the proposal as a whole.  It could always be added later.
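
To make the distinction concrete, here is a small model of the naming rules as described in this exchange. It is only a sketch: `ExportStyle`, `ExportedSymbol`, and `resolve` are hypothetical names, and it assumes that a plain export requires the client to import the submodule explicitly while both attributes make the symbols available through `import Foo`.

```swift
// Hypothetical model of the three export flavours discussed above.
enum ExportStyle {
    case plain      // `public export Bar`
    case implicit   // `@implicit public export Bar`
    case inline     // `@inline public export Bar`
}

struct ExportedSymbol {
    let qualifiedName: String
    // True when `import Foo` alone is enough to see the symbol (assumption).
    let visibleViaModuleImport: Bool
}

func resolve(symbol: String, in submodule: String, of module: String,
             style: ExportStyle) -> ExportedSymbol {
    switch style {
    case .plain:
        // Keeps the submodule in the name and needs an explicit import.
        return ExportedSymbol(qualifiedName: "\(module).\(submodule).\(symbol)",
                              visibleViaModuleImport: false)
    case .implicit:
        // Same name, but imported along with the module.
        return ExportedSymbol(qualifiedName: "\(module).\(submodule).\(symbol)",
                              visibleViaModuleImport: true)
    case .inline:
        // Appears as if declared directly in the top-level module.
        return ExportedSymbol(qualifiedName: "\(module).\(symbol)",
                              visibleViaModuleImport: true)
    }
}
```

With these assumptions, `resolve(symbol: "Baz", in: "Bar", of: "Foo", style: .implicit)` keeps the name `Foo.Bar.Baz`, while `.inline` yields `Foo.Baz`.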

> 
>> Exports within the module
>> 
>> A submodule may bound the maximum visibility of any of its descendent submodules by explicitly exporting it:
> 
> I'm not sure how valuable this feature is in this kind of submodule design.

I think this is a very important feature.  I would like to be able to have peer submodules collaborate in ways that are not visible beyond the parent.

One of the big goals I have is to allow us to realize the logical benefits of encapsulation modules provide without the physical separation that results in additional build complexity and performance degradation (at least until we have whole program optimization and maybe even then).  Some modules can be quite large.  The ability to use layers of encapsulation to structure a large module is invaluable IMO.  Most of us do this logically already.  We just don’t have a way to state these boundaries in the language.

> 
> ***
> 
> To avoid being coy, here's the export control model that *I* think would make the most sense for this general class of submodule system designs:
> 
> 1. A submodule with `public` or `open` symbols is importable from outside the module. There is no need to separately mark the submodule as importable.

This is an interesting approach.  But it runs against the grain of layered scopes of encapsulation.  

I also *like* the idea of having an `Exports.swift` file or something like that at the top level of a module which lists all of the submodules visible externally and performs any name mapping required.  This will require minimal maintenance and provide important information to newcomers on a project.

> 
> 2. Normally, references within the module to submodule symbols need to be prefixed with the submodule name. (That is, in top-level `Foo` code, you need to write `Bar.Baz` to access `Foo.Bar.Baz`). As a convenience, you can import a submodule, which makes the submodule's symbols available to that file as though they were top-level module symbols.

One of the goals I have for this proposal is to help identify internal dependencies between submodules.  I think requiring `import` statements helps to facilitate this goal.  But maybe I could be convinced that requiring fully qualified names is enough.  I’m not sure though.

> 
> 3. When you import a submodule, you can mark it with `@exported`; this indicates that the symbols in that submodule should be aliased and, if `public` or `open`, re-exported to other modules.

I don’t understand what the purpose of this would be given your point 1 above.  I definitely don’t like the idea of a symbol being exposed outside the module with more than one fully qualified name.

> 
> 4. There are no special facilities for renaming submodules or implicitly importing submodules.

Is there a reason you think renaming is not important?  Do you really believe that the names users see should be coupled to the scope boundaries a module uses in its implementation?

> 
>> Importing submodules
>> 
>> Submodules are imported in exactly the same way as an external module by using an import statement.
> 
> Okay, but what exactly does importing *do*? Set up un-prefixed private aliases for the submodule's internal-and-up APIs?

That, and it also makes the APIs available in the first place.  The symbols in other submodules are not available without import.

> 
>> There are a few additional details that are not applicable for external modules:
>> 
>> 	• Circular imports are not allowed.
> 
> Why not? In this design, all submodules are evaluated at once, so I'm not sure why circular imports would be a problem.

One of the goals of submodules is to help manage internal dependencies.  Is there a specific reason you believe circular imports should be allowed?  Do you have use cases that are not possible without circular dependencies?
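
For what it's worth, a rule like this is cheap to check mechanically. The sketch below is an illustration only: the proposal does not specify an algorithm, and the function name `hasImportCycle` and the dictionary representation of the import graph are assumptions. It uses an ordinary depth-first search over "submodule imports submodule" edges:

```swift
// Maps each submodule name to the submodules it imports (hypothetical input).
func hasImportCycle(_ imports: [String: [String]]) -> Bool {
    enum Mark { case inProgress, done }
    var marks: [String: Mark] = [:]

    // Returns true if a cycle is reachable from `name`.
    func visit(_ name: String) -> Bool {
        if let mark = marks[name] {
            // Reaching a node that is still being visited means a cycle.
            return mark == .inProgress
        }
        marks[name] = .inProgress
        for dependency in imports[name] ?? [] {
            if visit(dependency) {
                return true
            }
        }
        marks[name] = .done
        return false
    }

    for name in imports.keys.sorted() {
        if visit(name) {
            return true
        }
    }
    return false
}
```

For example, `["Model": ["Storage"], "Storage": ["Model"]]` is rejected, while `["UI": ["Model"], "Model": []]` is accepted.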

> 
>> // `Grandparent` and all of its descendents can see `Child1` (fully qualified: `Grandparent.Parent.Child1`)
>> // This reads: `Child1` is scoped to `Grandparent`.
>> 
>> scoped(Grandparent) export Child1
>> 
>> // `Child2` is visible throughout the module but may not be exported for use by clients.
>> // This reads: `Child2` is scoped to the module.
>> 
>> scoped(module) export Child2
>> 
>> With parameterization, scoped has the power to specify all access levels that Swift has today:
>> 
>> `scoped`                                      == `private` (Swift 3)
>> `scoped(file)`                                == `private` (Swift 2 & 4?) == `fileprivate` (Swift 3)
>> `scoped(submodule)`                           == `internal`
>> `scoped(public) scoped(internal, inherit)`*   == `public`
>> `scoped(public)`                              == `open`
>> 
>> The parameterization of scoped also allows us to reference other scopes that we cannot in today’s system, specifically extensions: scoped(extension) and outer types: scoped(TypeName).
> 
> What is the purpose of creating more verbose aliases for existing access levels? I can't think of one, which means that these are redundant.

I’m going to request that we move this topic to the thread I started for my new access control proposal.  It elaborates extensively on the rationale.  We can discuss further there.

As far as this proposal goes, I really don’t want access control to get in the way of the larger proposal.  The basic design works regardless of what access control looks like.  It would be a shame if we don’t have both submodule and module wide scopes available but that is somewhat orthogonal to the larger design of submodules.  

The aspect of the proposal that relies most heavily on this is exports bounded by an ancestor submodule.  As you have already noted, the basic design still works without that feature.  It would also be possible to redesign that feature so that it doesn’t rely on access control.  I think that would be unfortunate but it is possible.

> 
> And if we remove them as redundant, the remaining access control levels look like:
> 
> 	scoped
> 	private
> 	scoped(TypeName)
> 	internal
> 	scoped(SomeModule)
> 	scoped(module)
> 	scoped(extension)
> 	public
> 	open
> 
> There's just no logic to the use of the `scoped` keyword here—it doesn't really mean anything other than "we didn't want to assign a keyword to this access level”.

I’m looking forward to your feedback on the new access control proposal.

> 
> ***
> 
> I think we need to go back to first principles here. The reason to introduce a new access level is that we believe that a submodule is a large enough unit of code that it will simultaneously need to encapsulate some of its implementation details from other submodules, *and* have some of its own implementation details encapsulated from the rest of the submodule. Thus, we need at least three access levels within a submodule: one that exposes an API to other submodules, one that exposes an API throughout a submodule, and one that exposes it to only part of a submodule.
> 
> What we do *not* need is a way to allow access only from certain other named submodules. The goal is to separate external and internal interfaces, not to micromanage who can access what.

I think the difference in our perspectives is that you’re viewing submodules as a basically flat system whose names may happen to form a hierarchy.  I’m viewing submodules as a true hierarchy of scopes, kind of like Russian dolls.  This is very different from something like `friend` in C++.  You are *not* allowed to give access to any arbitrary submodule.  You are not even allowed to give access to any *single* submodule.  What you are allowed to do is bound visibility to a *family* of submodules that all have a common ancestor (the bound).

This is not micromanaging who gets to access what.  It is saying that we can form scopes that increase in size from files to submodules to parent submodules to the entire module.  I want truly hierarchical submodule scopes, not just a flat sea of submodule scopes.
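
The "Russian dolls" rule can be stated as a simple prefix check on submodule paths. The following is a minimal sketch of that rule as I read it from the text above; `SubmodulePath` and `isVisible(boundedBy:from:)` are hypothetical names and not part of the proposal:

```swift
struct SubmodulePath: Equatable {
    let components: [String]

    // Builds a path from a dotted name such as "MyModule.Grandparent.Parent".
    init(_ dotted: String) {
        components = dotted.split(separator: ".").map(String.init)
    }

    // True when `self` is `ancestor` itself or one of its descendants.
    func isWithin(_ ancestor: SubmodulePath) -> Bool {
        return components.starts(with: ancestor.components)
    }
}

// A declaration whose visibility is bounded by `bound` can be seen from
// `user` only if `user` lives inside the family rooted at `bound`.
func isVisible(boundedBy bound: SubmodulePath, from user: SubmodulePath) -> Bool {
    return user.isWithin(bound)
}
```

Note what the model cannot express: there is no way to name an unrelated submodule as the bound for a single client. The bound is always an enclosing scope, so the visible set is always a whole subtree.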

> 
> Basically, that means we need one of two things. Keeping all existing keywords the same—i.e., not removing either `private` or `fileprivate`— and using `semi-` as a placeholder, we want to either have:
> 
> 	private: surrounding scope
> 	fileprivate: surrounding file
> 	semi-internal: surrounding submodule
> 	internal: surrounding module
> 	public: all modules (no subclassing)
> 	open: all modules (with subclassing)
> 
> Or:
> 
> 	private: surrounding scope
> 	fileprivate: surrounding file
> 	internal: surrounding submodule
> 	semi-public: surrounding module
> 	public: all modules (no subclassing)
> 	open: all modules (with subclassing)
> 
> The difference between the two is that, with `semi-internal` below `internal`, submodule APIs are exposed by default to other submodules; with `semi-public` above `internal`, submodule APIs are encapsulated by default from other submodules.

Either of these would be perfectly workable with the basic design I have proposed.  But they both adopt the “flat sea of submodule scopes” perspective.  This is far better than what we have now, but it is also not ideal.  Truly hierarchical submodule scopes give us a lot more flexibility to structure the implementation of our module.

> 
> I think encapsulating by default is the right decision, so we want the `semi-public` design.
> But there's also a second reason to use that design: We can anticipate another use case for it. The library resilience design document discusses the idea of "resilience domains"—groups of libraries whose versions are always matched, and which therefore don't need to use resilient representations of each others' data structures—and the idea of having "SPIs", basically APIs that are only public to certain clients. I think these ideas could be conflated, so that a semi-public API would be available both to other submodules in the module and to other libraries in your resilience domain, and that this feature could be used to expose SPIs.
> 
> So, that leaves an important question: what the hell do you call this thing? My best suggestions are `confidential` and `privileged`; in the context of information, these are both used to describe information which *is* shared, but only within a select group. (Think, for instance, of attorney-client privilege: You can share this information with your lawyer, but not with anyone else.)

This brings to mind the SE-0025 debate.  I can’t say for sure, but it’s quite likely that both of those terms were discussed in that thread. :)

In any case, I agree that identifying a good name for this is worthwhile.

> 
> So in short, I suggest adding a single access level to the existing system:
> 
> 	private
> 	fileprivate
> 	internal
> 	confidential/privileged
> 	public
> 	open
> 
> This is orthogonal to any other simplification of the access control system, like removing `private` or `fileprivate`.
> 
>> Appendix A: file system independence
> 
> I think we need to decide: Is a translation unit of some sort—whether it's a physical on-disk file or some simulacrum like a database record or just a separate string—something intrinsic to Swift? I think it should be; it simplifies a lot of parts of the language that would otherwise require nesting and explicit scoping.
> 
> If translation units are an implicit part of Swift, then this section is not really necessary. If translation units aren't, then we need to rethink a lot of things that are already built in.

I tend to agree.  I included this section mostly because Robert and Jaden emphasized file system independence quite heavily in their submodule thread.  I wanted to make it very clear that while files play a role in this design it is not coupled to the physical file system at all.  Files don’t have any role that isn’t already given to them by Swift.

I really appreciate all of your feedback Brent!  I’m looking forward to continuing the discussion.

> 
> -- 
> Brent Royal-Gordon
> Architechies
>