[swift-evolution] Proposal to improve C pointer type import

Florent Bruneau florent.bruneau at intersec.com
Tue Feb 7 14:34:14 CST 2017


Hi Charlie,

Thanks for your answer.

> On 7 Feb 2017, at 18:23, Charlie Monroe <charlie at charliemonroe.net> wrote:
> 
>> 
>> On Feb 7, 2017, at 5:56 PM, Florent Bruneau via swift-evolution <swift-evolution at swift.org> wrote:
>> 
>> Anyone interested in that subject?
>> 
>>> On 31 Jan 2017, at 09:16, Florent Bruneau via swift-evolution <swift-evolution at swift.org> wrote:
>>> 
>>> Hi swift-evolution, 
>>> 
>>> For the last few weeks, I've been working on introducing some Swift in a pure-C codebase. While the Clang importer makes the process quite smooth, there are still some rough edges.
>>> 
>>> Here is a (lengthy) proposal resulting from that experience.
>>> Rendered version: https://gist.github.com/Fruneau/fa83fe87a316514797c1eeaaaa2e5012
>>> 
>>> Introduction
>>> =======
>>> 
>>> Directly importing C APIs is a core feature of the Swift compiler. In that process, C pointers are systematically imported as `Unsafe*Pointer` Swift objects. However, in C we distinguish between pointers that reference a single object and those that point to an array of objects. In the case of a single object of type `T`, the Swift compiler should be able to import the parameter `T *` as an `inout T`, and `T const *` as `T`. Since the compiler cannot make the distinction between these kinds of pointers by itself, we propose adding a C pointer qualifier for that purpose.
>>> 
>>> Motivation
>>> =======
>>> 
>>> Let's consider the following C API:
>>> 
>>> ```c
>>> typedef struct sb_t {
>>>  char * _Nonnull data;
>>>  int len;
>>>  int size;
>>> } sb_t;
>>> 
>>> /** Append the string \p str to \p sb. */
>>> void sb_adds(sb_t * _Nonnull sb, const char * _Nonnull str);
>>> 
>>> /** Append the content of \p other to \p sb. */
>>> void sb_addsb(sb_t * _Nonnull sb, const sb_t * _Nonnull other);
>>> 
>>> /** Returns the amount of available memory of \p sb. */
>>> int sb_avail(const sb_t * _Nonnull sb);
>>> ```
>>> 
>>> This is imported in Swift as follows:
>>> 
>>> ```swift
>>> struct sb_t {
>>>  var data: UnsafeMutablePointer<Int8>
>>>  var len: Int32
>>>  var size: Int32
>>> }
>>> 
>>> func sb_adds(_ sb: UnsafeMutablePointer<sb_t>, _ str: UnsafePointer<Int8>)
>>> func sb_addsb(_ sb: UnsafeMutablePointer<sb_t>, _ other: UnsafePointer<sb_t>)
>>> func sb_avail(_ sb: UnsafePointer<sb_t>) -> Int32
>>> ```
>>> 
>>> `sb_adds()` takes two pointers: the first is supposed to point to a single object named `sb`, which will be mutated in order to append the content of `str`, which points to a C string. So we have two kinds of pointers: the first points to a single object, the second to a buffer. But both are represented using `Unsafe*Pointer`. Swift cannot actually tell those two kinds of pointers apart, since the C language provides no way to express the difference.
>>> 
>>> `sb_addsb()` takes two objects of type `sb_t`. The first is mutated by the function, which appends the content of the second one, which is `const`. The constness is properly reflected in Swift. However, using the imported API in Swift can be surprising, since Swift requires an `inout` argument in order to build an `Unsafe*Pointer` object:
>>> 
>>> ```swift
>>> var sb = sb_t(...)
>>> let sb2 = sb_t(...)
>>> sb_addsb(&sb, &sb2) // error: cannot pass immutable value as inout argument: 'sb2' is a 'let' constant
> 
> This is because your declaration is const sb_t * _Nonnull other... See http://stackoverflow.com/questions/1143262/what-is-the-difference-between-const-int-const-int-const-and-int-const
> 
> Change it to "const sb_t * const _Nonnull other" and you get a non-mutable pointer and you can use it with let.

Actually, no. In the case of a function argument, the difference between `const sb_t *` and `const sb_t * const` is the same as between a `var` argument and a `let` argument in Swift < 3: it only affects the mutability of the variable inside the function definition, and has no effect on the outside. When imported in Swift, both result in the exact same function prototype.

```c
void sb_addsb(sb_t *self, const sb_t *other);
void sb_addsb2(sb_t *self, const sb_t * const other);
void sb_addsb3(sb_t * const self, const sb_t * const other);
```

```swift
public func sb_addsb(_ self: UnsafeMutablePointer<sb_t>!, _ other: UnsafePointer<sb_t>!)
public func sb_addsb2(_ self: UnsafeMutablePointer<sb_t>!, _ other: UnsafePointer<sb_t>!)
public func sb_addsb3(_ self: UnsafeMutablePointer<sb_t>!, _ other: UnsafePointer<sb_t>!)
```
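
The same point can be checked on the C side alone. Here is a minimal sketch (the function `sb_len_sum` and the typedef `sb_len_sum_f` are made up for this example, they are not part of our code base): a prototype without the top-level `const` and a definition with it are compatible declarations of the same function, and both match the same function pointer type.

```c
#include <assert.h>

typedef struct sb_t {
    char *data;
    int len;
    int size;
} sb_t;

/* Hypothetical function, declared without a top-level const on its
 * parameters. */
int sb_len_sum(const sb_t *a, const sb_t *b);

/* Same function, defined with a top-level const on both parameters.
 * The compiler accepts this as compatible with the prototype above:
 * the extra const only prevents reassigning `a` and `b` in the body. */
int sb_len_sum(const sb_t * const a, const sb_t * const b)
{
    assert(a && b);
    return a->len + b->len;
}

/* The function pointer type does not mention the top-level const
 * either, yet sb_len_sum can be assigned to it. */
typedef int (*sb_len_sum_f)(const sb_t *, const sb_t *);
```

Since the top-level `const` is not part of the function type, the importer has nothing to base a different Swift signature on.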

> 
>>> sb_addsb(&sb, sb2) // cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
> 
> If the other parameter is const, why not just take in the struct vs. pointer to it? Yes, you run into the risk of copying the structure, but since the structure (unless it's really small and fits into registers on some architectures) gets passed by reference and if the compiler is smart enough during optimization, it won't copy it anyway... (At least from what I remember reading.)

There are cases where you just cannot pass the structure by value. For example, a structure ending with a variable-size buffer:

```c
struct with_trailing_buf {
    int len;
    char buf[];
};
```
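
To make the problem concrete, here is a rough sketch of how such a structure is typically allocated and used. The helpers `wtb_new` and `wtb_equals` are hypothetical names invented for this example. The payload lives past the end of the declared fields, so a by-value copy, which only carries `sizeof(struct with_trailing_buf)` bytes, would leave it behind.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct with_trailing_buf {
    int len;
    char buf[];
};

/* Hypothetical helper: allocate the header and the payload in a single
 * block. Returns NULL if the allocation fails. */
struct with_trailing_buf *wtb_new(const char *s)
{
    size_t len = strlen(s);
    struct with_trailing_buf *w = malloc(sizeof(*w) + len + 1);

    if (!w) {
        return NULL;
    }
    w->len = (int)len;
    memcpy(w->buf, s, len + 1);
    return w;
}

/* Hypothetical helper: has to take a pointer, because the bytes of
 * `buf` are not part of what a by-value copy would transfer. */
int wtb_equals(const struct with_trailing_buf *w, const char *s)
{
    assert(w && s);
    return (size_t)w->len == strlen(s)
        && memcmp(w->buf, s, (size_t)w->len) == 0;
}
```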

Anyway, the question here isn't really whether we should rewrite our whole code base, but rather: can we make the Clang importer run smoothly even on non-Apple codebases? In C there are certainly as many coding conventions as there are developers, so if we want more developers to be willing to use safer languages without throwing away their existing code bases, I think we need to make the importer more flexible.

> 
>>> var sb3 = sb_t(...)
>>> sb_addsb(&sb, &sb3) // works
>>> ```
>>> 
>>> ```swift
>>> sb_avail(&sb2) // cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
>>> ```
>>> 
>>> 
>>> However, Swift also provides the `swift_name()` attribute, which allows remapping a C function to a Swift method, including mapping one of the parameters to `self:`:
>>> 
>>> ```c 
>>> __attribute__((swift_name("sb_t.add(self:string:)")))
>>> void sb_adds(sb_t * _Nonnull sb, const char * _Nonnull str);
>>> __attribute__((swift_name("sb_t.add(self:other:)")))
>>> void sb_addsb(sb_t * _Nonnull sb, const sb_t * _Nonnull other);
>>> __attribute__((swift_name("sb_t.avail(self:)")))
>>> int sb_avail(const sb_t * _Nonnull sb);
>>> ```
> 
> While I do feel your pain dealing with structs imported from C, nothing is stopping you from making an extension of that struct and implementing these methods on it... Yes, it's a lot of boilerplate, but it can be in a separate file until you migrate your C code into Swift, whereas the suggested solution generates so many annotations that it's IMHO unreadable for anyone hoping to use the API from pure C...

My problem is exactly that amount of boilerplate. We are still investigating the possibility of switching from C to Swift (at least for some parts of our codebase), but we cannot afford to rewrite the whole code, nor to spend months writing overlays for every C library we want to be able to use from Swift. And, though this is just my opinion, I think that adding qualifiers such as `_Nonnull`, `_Nullable` (and the proposed `_Ref`) not only improves interoperability, but also helps make the APIs self-documenting and helps produce safer code even in C (since the qualifiers offer the opportunity for new static analysis heuristics).
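
On the readability concern for pure C users, one possible approach is to hide the qualifiers behind short macros. The sketch below is only an illustration: `SB_NONNULL` and `SB_REF` are names made up for this example, `_Ref` is not implemented by any compiler (so `SB_REF` expands to nothing here), and the body of `sb_avail` is an assumed implementation, not our actual code.

```c
#include <assert.h>

/* Expand to Clang's nullability qualifier when it is available, and to
 * nothing on other compilers. */
#if defined(__has_feature)
#  if __has_feature(nullability)
#    define SB_NONNULL _Nonnull
#  endif
#endif
#ifndef SB_NONNULL
#  define SB_NONNULL
#endif

/* Assumption: `_Ref` is only a proposal, so this hypothetical macro is
 * empty. It would expand to `_Ref` on a compiler that supported it. */
#define SB_REF

typedef struct sb_t {
    char * SB_NONNULL data;
    int len;
    int size;
} sb_t;

/* Illustrative body only: assumes the available memory is simply the
 * allocated size minus the used length. */
int sb_avail(const sb_t * SB_REF SB_NONNULL sb)
{
    assert(sb->len <= sb->size);
    return sb->size - sb->len;
}
```

The prototype still reads as plain C on compilers that know nothing about the qualifiers, while Clang keeps the nullability information for the importer.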

> 
>>> 
>>> ```swift
>>> struct sb_t {
>>>  var data: UnsafeMutablePointer<Int8>
>>>  var len: Int32
>>>  var size: Int32
>>> 
>>>  mutating func add(string: UnsafePointer<Int8>)
>>>  mutating func add(other: UnsafePointer<sb_t>)
>>>  func avail() -> Int32
>>> }
>>> ```
>>> 
>>> With that attribute used, there is no need to convert the parameter mapped to `self:` to an `Unsafe*Pointer`. As a consequence, we have an improved API:
>>> 
>>> ```swift
>>> sb2.avail() // This time it works!
>>> ```
>>> 
>>> But we also have some inconsistent behavior since only `self:` is affected by this:
>>> 
>>> ```swift
>>> sb.add(other: &sb2)  // error: cannot pass immutable value as inout argument: 'sb2' is a 'let' constant
>>> sb.add(other: sb2) // cannot convert value of type 'sb_t' to expected argument type 'UnsafePointer<sb_t>!'
>>> ```
>>> 
>>> 
>>> What we observe here is that mapping an argument to `self:` is enough for the compiler to be able to change its semantics. As soon as it knows the pointer is actually the pointer to a single object, it can deal with it without exposing it as an `Unsafe*Pointer`, making the API safer and less surprising.
>>> 
>>> 
>>> Proposed solution
>>> ================
>>> 
>>> A new qualifier could be added to inform the compiler that a pointer points to a single object. The Swift compiler could then use that new piece of information to generate APIs that directly use the object type instead of the pointer type. We propose the introduction of a new qualifier named `_Ref`, semantically similar to a C++ reference. That is:
>>> 
>>> * `_Ref` is applied with the same grammar as the `_Nonnull`/`_Nullable` family.
>>> * A pointer tagged `_Ref` cannot be used to access more than the single pointed-to object.
>>> * A pointer tagged `_Ref` is non-owning.
>>> 
>>> Parameters qualified with `_Ref` would then be imported in Swift as follows:
>>> 
>>> * `T * _Ref _Nonnull` is imported as `inout T`
>>> * `T * _Ref _Nullable` is imported as `inout T?`
>>> * `T const * _Ref _Nonnull` is imported as `T`
>>> * `T const * _Ref _Nullable` is imported as `T?`
>>> 
>>> Example
>>> =======
>>> 
>>> In the context of the provided example from the motivation section:
>>> 
>>> ```c
>>> typedef struct sb_t {
>>>  char * _Nonnull data;
>>>  int len;
>>>  int size;
>>> } sb_t;
>>> 
>>> /** Append the string \p str to \p sb. */
>>> void sb_adds(sb_t * _Ref _Nonnull sb, const char * _Nonnull str);
>>> 
>>> /** Append the content of \p other to \p sb. */
>>> void sb_addsb(sb_t * _Ref _Nonnull sb, const sb_t * _Ref _Nonnull other);
>>> 
>>> /** Returns the amount of available memory of \p sb. */
>>> int sb_avail(const sb_t * _Ref _Nonnull sb);
>>> ```
>>> 
>>> Would be imported as follows:
>>> 
>>> ```swift
>>> struct sb_t {
>>>  var data: UnsafeMutablePointer<Int8>
>>>  var len: Int32
>>>  var size: Int32
>>> }
>>> 
>>> func sb_adds(_ sb: inout sb_t, _ str: UnsafePointer<Int8>)
>>> func sb_addsb(_ sb: inout sb_t, _ other: sb_t)
>>> func sb_avail(_ sb: sb_t) -> Int32
>>> ```
>>> 
>>> Impact on existing code
>>> =================
>>> 
>>> This proposal has no impact on existing code since it proposes additive changes only. However, opting in for the `_Ref` qualifier on APIs already exposed in Swift will impact the generated code.
>>> 
>>> * For `const` pointers, the change is always source-incompatible
>>> * For non-`const` pointers, the change will be source-compatible everywhere we use the `&object` syntax to pass the argument from a plain object, but will break sources that passed an `Unsafe*Pointer` as argument.
>>> 
>>> 
>>> Alternatives considered
>>> ===================
>>> 
>>> We considered using two families of qualifiers instead of `_Ref`:
>>> 
>>> - one family to specify the kind of pointer: single object or array
>>> - one family to declare the ownership
>>> 
>>> This approach has the clear advantage of being more flexible, but it was found to be less expressive. Considering that C APIs should already use nullability qualifiers on every single pointer, forcing two additional qualifiers on every pointer would be painful and would hurt the readability of the C APIs.
>>> 
>>> `_Ref`, on the other hand, is short and leverages a concept already known by developers, but is also more specific to a particular use case.
>>> 
>>> 
>>> Discussion
>>> ========
>>> 
>>> * Safety: won't this make developers think they are calling safe APIs from Swift while the API is actually unsafe?
>>> 
>>> There is certainly a risk that a C API makes improper use of `_Ref` (in particular, that it breaks the non-owning part of the contract). However, this kind of safety issue is already present when using the `swift_name()` attribute on a function and mapping one of its pointer parameters to `self:`, or when using the nullability qualifiers.
>>> 
>>> * What about pointers stored in structures? or pointers returned by functions?
>>> 
>>> As a qualifier, `_Ref` could also be used on pointers that are not arguments of a function:
>>> 
>>> ```c
>>> typedef struct {
>>>  sb_t * _Ref obj;
>>> } sb_ptr_t;
>>> 
>>> sb_t * _Ref sb_get_singleton(void);
>>> ```
>>> 
>>> Swift, however, cannot import those as `sb_t` but will still be forced to use `Unsafe*Pointer<sb_t>` since `sb_t` is a structure and as such is not stored by reference.
>>> 
>>> We could also imagine a standard `Reference<T>` type that would wrap a pointer to a `T` (and could expose the API of `T` on it).
>>> 
>>> * What about function pointers that take a `_Ref` object?
>>> 
>>> When an API takes a function pointer whose type includes a `_Ref` qualified parameter, the qualifier applies:
>>> 
>>> ```c
>>> void take_cb(int (*a)(sb_t const * _Ref _Nonnull sb, sb_t * _Ref _Nonnull other));
>>> ```
>>> 
>>> ```swift
>>> func cb(sb: sb_t, other: inout sb_t) -> Int32 {
>>>  ...
>>> }
>>> 
>>> take_cb(cb)
>>> ```
>>> 
>>> Swift guarantees we cannot break the non-owning contract and that we respect the constness of the parameter. This is safer than using the `Unsafe*Pointer`-based alternative.
>>> 
>>> * Other use cases than Swift's?
>>> 
>>> The `_Ref` qualifier could be used by static analysis to check that functions don't access memory they shouldn't: as long as some code manipulates memory through a `_Ref`-qualified pointer, it shouldn't access memory addresses below that pointer or above that pointer plus the stride of the type (an exception remains for types ending with a zero-length array).
>>> 
>>> * What about pointers to arrays of objects?
>>> 
>>> This is another topic. We could imagine a `_Array` qualifier that could take an optional length.
>>> 
>>> ```c
>>> /* The number of elements is statically known or passed as argument */
>>> int main(int argc, char ** _Array(argc) argv);
>>> 
>>> /* The number of elements is unknown. */
>>> int puts(const char * _Array str);
>>> ```
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution at swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution