[swift-evolution] [Discussion] Swift for Data Science / ML / Big Data analytics

Chris Lattner sabre at nondot.org
Thu Nov 9 01:55:34 CST 2017


> On Nov 2, 2017, at 4:39 AM, Richard Wei <rxrwei at gmail.com> wrote:
>> Question though: how does it work?  Say the first np.array call threw a python exception:
>> 
>>> try Python.do {
>>>         let a = np.array([1, 2, 3])
>>>         let b = np.array([[2], [4]])
>>>         print(a.dot(b)) // matrix mul with incompatible shapes
>>>     }
>> 
>> 
>> We can definitely make the python glue code notice it, catch it and squirrel it away somewhere, but without compiler hacks we couldn’t make it jump out of the closure.  This means that np.array would have to return something, and the calls below it would still execute, or am I missing something?
> 
> We make PythonObjects internally nullable (only in the exception-caught state). The second np.array would just return a null PythonObject. 
> 
> To be specific, we define three states in the python overlay:
> - Normal state: PythonObjects are guaranteed to be non-null. Any exception traps.
> - Exception-catching state: PythonObjects are still guaranteed to be non-null. Any exception triggers the exception-caught state.
> - Exception-caught state: PythonObjects are nullable — all python expressions return a null PythonObject.
> 
> The exception-catching state is entered during the execution of Python.do’s body.


I don’t think that will work well in practice: if you’re intermixing Swift and Python calls, it would be extremely surprising to see calls execute but silently produce no value.  This seems similar to the ObjC “messaging nil” problem.
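
To make the concern concrete, here is a minimal self-contained sketch of the three-state scheme with no Python involved.  MockOverlay, MockObject, runCatching and the log are all hypothetical names invented for illustration, and they model the proposal only roughly.

```swift
// Hypothetical stand-in for a Python object; `value == nil` models the
// "null PythonObject" of the exception-caught state.
struct MockObject {
  let value: Int?
}

enum OverlayState { case normal, catching, caught }

// Hypothetical overlay with the three states described in the quoted text.
final class MockOverlay {
  private(set) var state = OverlayState.normal
  private(set) var log: [String] = []

  // Model of a Python call that may raise.
  func call(_ name: String, raises: Bool = false, result: Int = 0) -> MockObject {
    switch state {
    case .caught:
      // Every later expression silently yields a null object.
      log.append("\(name): skipped")
      return MockObject(value: nil)
    case .catching where raises:
      state = .caught
      log.append("\(name): raised")
      return MockObject(value: nil)
    case .normal where raises:
      fatalError("uncaught Python exception in \(name)")
    default:
      log.append("\(name): ok")
      return MockObject(value: result)
    }
  }

  // Rough model of Python.do: returns true if the body ran without raising.
  func runCatching(_ body: () -> Void) -> Bool {
    state = .catching
    body()
    let ok = state != .caught
    state = .normal
    return ok
  }
}

let overlay = MockOverlay()
var sideEffects = 0
let succeeded = overlay.runCatching {
  let a = overlay.call("np.array", raises: true)
  sideEffects += 1                      // ordinary Swift code still runs
  let b = overlay.call("np.array", result: 2)
  _ = (a, b)
}
```

In this model the Swift statement after the failing call still runs, and the second call hands back a null object without complaint, which is the surprising part.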

That said, I’ve been experimenting with another approach that seems to work well in practice.  Due to the language mismatch, a Swift programmer is going to have to decide whether they care about a throwing call or not.  As such, I think it makes sense to make this explicit with some kind of syntax - either a postfix operator like ^ or a method like “throwing”.  This allows you to write (when we have language sugar) “a.foo()” when you don’t care, or “a.throwing.foo()” when you want to handle the case where foo throws an exception.

This sounds yucky, but actually works pretty well in practice because the “try” warnings (about the absence/excess of try) dovetail really well.  Without language support we get something like this:

  // import pickle
  let pickle = Python.import("pickle")

  // file = open(filename)
  guard let file = try? Python.open.throwing.call(args: filename) else {
    fatalError("""
       Didn't find data file at \(filename)!
       Update the DataPath at the top of the file.\n
       """)
  }

  // blob = file.read()
  let blob = file.get(member: "read").call(args: [])

  // a, b, c = pickle.loads(blob)
  let (a, b, c) = pickle.get(member: "loads").call(args: blob).get3Tuple()
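
The shape of the “throwing” view can be sketched without the Python runtime.  This is only a toy model under stated assumptions: MockRef, ThrowingMockRef, MockError and the knownFiles set are hypothetical names, not part of the prototype.  The point is that the plain method traps and needs no “try”, while the same method reached through “.throwing” is marked throws, so the compiler’s try diagnostics apply.

```swift
// Hypothetical error and value types; not the real Python overlay.
enum MockError: Error, Equatable {
  case exception(String)
}

struct MockRef {
  // Files that "exist" in this toy model.
  static let knownFiles: Set<String> = ["data.pkl"]

  // Shared implementation: the throwing form does the real work.
  func throwingOpen(_ filename: String) throws -> String {
    guard MockRef.knownFiles.contains(filename) else {
      throw MockError.exception("IOError: \(filename)")
    }
    return "<file \(filename)>"
  }

  // Non-throwing form: traps on failure, needs no `try`.
  func open(_ filename: String) -> String {
    return try! throwingOpen(filename)
  }

  // View whose methods are marked `throws`, so the compiler demands `try`.
  var throwing: ThrowingMockRef { return ThrowingMockRef(self) }
}

struct ThrowingMockRef {
  private let base: MockRef
  init(_ base: MockRef) { self.base = base }

  func open(_ filename: String) throws -> String {
    return try base.throwingOpen(filename)
  }
}

let ref = MockRef()
let found = ref.open("data.pkl")                      // no `try` needed
let missing = try? ref.throwing.open("missing.pkl")   // nil instead of a trap

var caught: MockError? = nil
do {
  _ = try ref.throwing.open("missing.pkl")
} catch let error as MockError {
  caught = error
} catch {
  // No other error type is thrown in this sketch.
}
```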


When we allow sugaring ‘x.get(member: "foo")’ into ‘x.foo’ and allow sugaring ‘x.call(args: a, b, c)’ into ‘x(a, b, c)’, we’ll get this code:

  // import pickle
  let pickle = Python.import("pickle")

  // file = open(filename)
  guard let file = try? Python.open.throwing(filename) else {
    fatalError("""
       Didn't find data file at \(filename)!
       Update the DataPath at the top of the file.\n
       """)
  }

  // blob = file.read()
  let blob = file.read()

  // a, b, c = pickle.loads(blob)
  let (a, b, c) = pickle.loads(blob).get3Tuple()


Which is pretty nice.   We can talk about making tuple destructuring extensible later.  :-)
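
For a feel of what the two bits of sugar amount to, here is a small sketch.  It assumes a toolchain with the @dynamicMemberLookup and @dynamicCallable attributes, which later Swift releases provide (Swift 5 or newer has both).  MockDynamic and its stored properties are hypothetical; nothing here talks to Python.

```swift
// Hypothetical dynamic value; stands in for a Python object.
@dynamicMemberLookup
@dynamicCallable
struct MockDynamic {
  var members: [String: MockDynamic] = [:]
  var function: (([Int]) -> Int)? = nil

  // Explicit forms, mirroring get(member:) and call(args:).
  func get(member: String) -> MockDynamic {
    guard let m = members[member] else { fatalError("no member \(member)") }
    return m
  }
  func call(args: [Int]) -> Int {
    guard let f = function else { fatalError("not callable") }
    return f(args)
  }

  // Sugar: `x.foo` is rewritten to `x[dynamicMember: "foo"]`.
  subscript(dynamicMember name: String) -> MockDynamic {
    return get(member: name)
  }

  // Sugar: `x(a, b, c)` is rewritten to
  // `x.dynamicallyCall(withArguments: [a, b, c])`.
  func dynamicallyCall(withArguments args: [Int]) -> Int {
    return call(args: args)
  }
}

let sum = MockDynamic(function: { $0.reduce(0, +) })
let module = MockDynamic(members: ["sum": sum])

let explicit = module.get(member: "sum").call(args: [1, 2, 3])
let sugared = module.sum(1, 2, 3)
```

Both spellings go through the same get/call path, so the sugar changes only how the call site reads.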


Here’s the prototype that backs the code above - it is hacky, has known suboptimality, and probably isn’t self contained.  I’ll clean this up and write up a proposal for the two bits of sugar above when I have time.

-Chris




/// This represents the result of a failable operation when working with
/// python values.
public enum PythonError : Error {
  /// This represents a null IUO being passed into a PyRef.  This should only
  /// occur when working with C APIs.
  case nullValue
  /// This represents an exception thrown out of a Python API.  This can occur
  /// on calls.
  case exception(_ : PyRef)
}

/// Reference to a Python value.  This is always non-null and always owning of
/// the underlying value.
public final class PyRef : PythonObjectionable {
  private var state : UnsafeMutablePointer<PyObject>

  var borrowedPyObject : UnsafeMutablePointer<PyObject> {
    return state
  }
  var ownedPyObject : UnsafeMutablePointer<PyObject> {
    return py_INCREF(state)
  }
  public init(ownedThrowing: UnsafeMutablePointer<PyObject>!) throws {
    if let owned = ownedThrowing {
      state = owned
    } else {
      throw PythonError.nullValue
    }
  }
  public convenience
  init(borrowedThrowing: UnsafeMutablePointer<PyObject>!) throws {
    try self.init(ownedThrowing: borrowedThrowing)
    py_INCREF(state)
  }
  deinit {
    py_DECREF(state)
  }

  public convenience init(owned: UnsafeMutablePointer<PyObject>!) {
    try! self.init(ownedThrowing: owned)
  }
  public convenience init(borrowed: UnsafeMutablePointer<PyObject>!) {
    try! self.init(borrowedThrowing: borrowed)
  }
  public convenience init?(python: UnsafeMutablePointer<PyObject>) {
    self.init(borrowed: python)
  }
  public func toPythonObject() -> PyRef {
    return self
  }


  /// Return a version of this value that throws when an error occurs on its
  /// next use.
  public var throwing : ThrowingPyRef {
    return ThrowingPyRef(self)
  }

  public func throwingGet(member: String) throws -> PyRef {
    // PyObject_GetAttrString returns a new (owned) reference.
    return try PyRef(ownedThrowing: PyObject_GetAttrString(state, member))
  }
  public func get(member: String) -> PyRef {
    return try! throwingGet(member: member)
  }
  public func throwingGet(dictMember: PythonObjectionable) throws -> PyRef {
    return try PyRef(borrowedThrowing:
      PyDict_GetItem(state, dictMember.toPythonObject().borrowedPyObject))
  }

  public func get(dictMember: PythonObjectionable) -> PyRef {
    return try! throwingGet(dictMember: dictMember)
  }
  /// Swift subscripts cannot throw yet, so model this as returning an optional
  /// reference.
  public subscript(throwing idx : PythonObjectionable) -> PyRef? {
    // PyObject_GetItem returns a new (owned) reference.
    let item = PyObject_GetItem(self.state,
                                idx.toPythonObject().borrowedPyObject)
    return try? PyRef(ownedThrowing: item)
  }

  public subscript(idx : PythonObjectionable) -> PyRef {
    return self[throwing: idx]!
  }

  public func throwingGet(tupleItem: Int) throws -> PyRef {
    return try PyRef(borrowedThrowing: PyTuple_GetItem(state, tupleItem))
  }

  public func get(tupleItem: Int) -> PyRef {
    return try! throwingGet(tupleItem: tupleItem)
  }
  // Helpers for destructuring tuples
  public func get2Tuple() -> (PyRef, PyRef) {
    return (get(tupleItem: 0), get(tupleItem: 1))
  }
  public func get3Tuple() -> (PyRef, PyRef, PyRef) {
    return (get(tupleItem: 0), get(tupleItem: 1), get(tupleItem: 2))
  }

  /// Call self, which must be a Python Callable.
  public
  func throwingCall(args: [PythonObjectionable],
                    kwargs: [(PythonObjectionable,PythonObjectionable)] = [])
    throws -> PyRef {
    // Make sure state errors are not around.
    assert(PyErr_Occurred() == nil, "Python threw an error but wasn't handled")

    let kwdict = kwargs.isEmpty ? nil : pyDict(kwargs)

    // Python calls always return a non-null value when successful.  If the
    // Python function produces the equivalent of C "void", it returns the None
    // value.  A null result of PyObject_Call happens when there is an error,
    // like 'self' not being a Python callable.
    let result = PyObject_Call(state, pyTuple(args), kwdict)

    // Translate a Python exception into a Swift error if one was thrown.  Check
    // this before wrapping the result, because a raised exception also makes
    // the result null.  PyErr_Occurred returns a borrowed reference, so retain
    // it in a PyRef before clearing the error indicator.
    if let exception = PyErr_Occurred() {
      let error = PyRef(borrowed: exception)
      PyErr_Clear()
      throw PythonError.exception(error)
    }

    return try PyRef(ownedThrowing: result)
  }

  /// Call self, which must be a Python Callable.
  public
  func call(args: [PythonObjectionable],
            kwargs: [(PythonObjectionable,PythonObjectionable)] = []) -> PyRef {
    return try! throwingCall(args: args, kwargs: kwargs)
  }
  /// Call self, which must be a Python Callable.
  public
  func call(args: PythonObjectionable...,
            kwargs: [(PythonObjectionable,PythonObjectionable)] = []) -> PyRef {
    return try! throwingCall(args: args, kwargs: kwargs)
  }

  // Run the specified closure on the borrowed pointer, guaranteeing the
  // underlying object isn't deallocated while the closure runs.
  public func borrowedMap<T>(_ fn: (UnsafeMutablePointer<PyObject>)->T) -> T {
    return withExtendedLifetime(self) {
      return fn(borrowedPyObject)
    }
  }
}


/// Reference to a Python value.  This always throws when handed a null object
/// or when a call produces a Python exception.
public struct ThrowingPyRef {
  private var state : PyRef

  public init(_ value : PyRef) {
    state = value
  }
  public init(owned: UnsafeMutablePointer<PyObject>!) throws {
    state = try PyRef(ownedThrowing: owned)
  }
  public init(borrowed: UnsafeMutablePointer<PyObject>!) throws {
    state = try PyRef(borrowedThrowing: borrowed)
  }

  public func get(member: String) throws -> PyRef {
    return try state.throwingGet(member: member)
  }

  public func get(dictMember: PythonObjectionable) throws -> PyRef {
    return try state.throwingGet(dictMember: dictMember)
  }
  /// Swift subscripts cannot throw yet, so model this as returning an optional
  /// reference.
  public subscript(idx : PythonObjectionable) -> PyRef? {
    return state[throwing: idx]
  }

  public func get(tupleItem: Int) throws -> PyRef {
    return try state.throwingGet(tupleItem: tupleItem)
  }

  public func get2Tuple() throws -> (PyRef, PyRef) {
    return try (get(tupleItem: 0), get(tupleItem: 1))
  }
  public func get3Tuple() throws -> (PyRef, PyRef, PyRef) {
    return try (get(tupleItem: 0), get(tupleItem: 1), get(tupleItem: 2))
  }

  /// Call self, which must be a Python Callable.
  public func call(args: [PythonObjectionable],
                   kwargs: [(PythonObjectionable,PythonObjectionable)] = [])
                   throws -> PyRef {
    return try state.throwingCall(args: args, kwargs: kwargs)
  }

  /// Call self, which must be a Python Callable.
  public func call(args: PythonObjectionable...,
                   kwargs: [(PythonObjectionable,PythonObjectionable)] = [])
                   throws -> PyRef {
    return try state.throwingCall(args: args, kwargs: kwargs)
  }
}

extension ThrowingPyRef : PythonObjectionable {
  public init?(python: UnsafeMutablePointer<PyObject>) {
    self.init(PyRef(python: python)!)
  }
  public func toPythonObject() -> PyRef {
    return state
  }
}

let builtinsObject = PyEval_GetBuiltins()!

public enum Python {
  public static func `import`(_ name: String) -> PyRef {
    return PyRef(owned: PyImport_ImportModule(name)!)
  }

  public static var builtins : PyRef {
    return PyRef(borrowed: builtinsObject)
  }

  // TODO: Make the Python type itself dynamically callable, so that things like
  // "Python.open" naturally resolve to Python.get(member: "open") and all the
  // builtin functions are therefore available naturally and don't have to be
  // enumerated here.
  public static var open : PyRef { return builtins["open"] }
  public static var repr : PyRef { return builtins["repr"] }
}

/// Make "print(pyref)" print a pretty form of the value.
extension PyRef : CustomStringConvertible {
  public var description: String {
    return String(python: self.call(member: "__str__"))!
  }
}

// Make PyRef's show up nicely in the Xcode Playground results sidebar.
extension PyRef : CustomPlaygroundQuickLookable {
  public var customPlaygroundQuickLook: PlaygroundQuickLook {
    return .text(description)
  }
}

//===----------------------------------------------------------------------===//
// Helpers working with PyObjects
//===----------------------------------------------------------------------===//

// Create a Python tuple object with the specified elements.
public func pyTuple(_ vals : [PythonObjectionable])
  -> UnsafeMutablePointer<PyObject> {
  let t = PyTuple_New(vals.count)!
  for (idx, elt) in vals.enumerated() {
    PyTuple_SetItem(t, idx, elt.toPythonObject().ownedPyObject)
  }
  return t
}

public func pyTuple(_ vals : PythonObjectionable...)
  -> UnsafeMutablePointer<PyObject> {
  return pyTuple(vals)
}

public func pyList(_ vals : PythonObjectionable...)
  -> UnsafeMutablePointer<PyObject> {
  return pyList(vals)
}
public func pyList(_ vals : [PythonObjectionable])
  -> UnsafeMutablePointer<PyObject> {
  let list = PyList_New(vals.count)!
  for (idx, elt) in vals.enumerated() {
    PyList_SetItem(list, idx, elt.toPythonObject().ownedPyObject)
  }
  return list
}

private func pyDict(_ elts : [(PythonObjectionable,PythonObjectionable)])
  -> UnsafeMutablePointer<PyObject> {
  let dict = PyDict_New()!
  for (key, val) in elts {
    PyDict_SetItem(dict, key.toPythonObject().ownedPyObject,
                   val.toPythonObject().ownedPyObject)
  }
  return dict
}

public func pySlice(_ start: PythonObjectionable,
                    _ end: PythonObjectionable,
                    _ step : PythonObjectionable? = nil)
  -> UnsafeMutablePointer<PyObject> {
  let stepv = step.flatMap { $0.toPythonObject().ownedPyObject }

  return PySlice_New(start.toPythonObject().ownedPyObject,
                     end.toPythonObject().ownedPyObject, stepv)!
}

