[swift-evolution] [Discussion] Swift for Data Science / ML / Big Data analytics

Chris Lattner clattner at nondot.org
Sat Oct 28 16:10:40 CDT 2017


On Oct 28, 2017, at 9:45 AM, Maxim Veksler via swift-evolution <swift-evolution at swift.org> wrote:
> 
> Hey Guys,
> 
> The big data and machine learning world is dominated by Python, Scala and R.
> 
> I'm a Swifter at heart, but not so much by tools of the trade.

Hi Max,

I’m very interested in this topic, with a specific focus on Python.  It isn’t the immediate thing on my priority list to deal with, but I hope that we get to push on this.

In short, I think we should build a simple Swift/Python interop story.  This sort of thing has been built numerous times for many languages (owing to Python’s great support for embed-ability), including things like PyObjC, boost.python, and many others.

In Swift, it is straightforward to make this example (http://cs231n.github.io/python-numpy-tutorial/#numpy-arrays) look something like this:

	let np = Python.import("numpy")   // Returns a value of type Python.Object.
	let a = np.array([1, 2, 3])
	print(type(a))    // Whether we want to support type(x) or use the Swift equivalent would be up for discussion of course!
	print(a.shape)
	print(a[0], a[1], a[2])
	a[0] = 5
	print(a)

	let b = np.array([[1,2,3],[4,5,6]])
	print(b.shape)
	print(b[0, 0], b[0, 1], b[1, 0])

… which is to say, exactly identical to the Python version except that new variables need to be declared with let/var.  This can be done by blessing Python.Object (which is identical to “PyObject*” at the machine level) with some special dynamic name lookup behavior:  Dot syntax turns into a call to PyObject_GetAttrString, subscripts turn into PyObject_GetItem, calls turn into PyObject_Call, etc.  ARC would be implemented with INCREF etc.
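
The mapping described above can be sketched in plain Swift without linking against Python at all. This is only an illustration under stated assumptions: MockPyObject and its members are hypothetical names, it does not wrap a real PyObject*, and it merely records which C API call each Swift operation would correspond to. It assumes a toolchain that supports the @dynamicMemberLookup and @dynamicCallable attributes, which the message itself does not name as the mechanism.

```swift
// Hypothetical stand-in for Python.Object. A real bridge would hold a
// PyObject* and call into the CPython C API; this mock only logs the
// call that each Swift operation would map to.
@dynamicMemberLookup
@dynamicCallable
final class MockPyObject {
    let label: String
    var attributes: [String: MockPyObject] = [:]
    var items: [MockPyObject] = []
    // Log of the C API calls a real bridge would issue on this object.
    private(set) var trace: [String] = []

    init(_ label: String) { self.label = label }

    // obj.name  ->  PyObject_GetAttrString(obj, "name")
    subscript(dynamicMember name: String) -> MockPyObject {
        trace.append("PyObject_GetAttrString(\(label), \"\(name)\")")
        return attributes[name] ?? MockPyObject("None")
    }

    // obj[i]      ->  PyObject_GetItem(obj, i)
    // obj[i] = v  ->  PyObject_SetItem(obj, i, v)
    subscript(index: Int) -> MockPyObject {
        get {
            trace.append("PyObject_GetItem(\(label), \(index))")
            return items[index]
        }
        set {
            trace.append("PyObject_SetItem(\(label), \(index))")
            items[index] = newValue
        }
    }

    // obj(args...)  ->  PyObject_Call(obj, args, NULL)
    @discardableResult
    func dynamicallyCall(withArguments args: [MockPyObject]) -> MockPyObject {
        let argList = args.map { $0.label }.joined(separator: ", ")
        trace.append("PyObject_Call(\(label), (\(argList)))")
        return MockPyObject("\(label)(\(argList))")
    }
}

// Usage: the same shape as the numpy snippet, against the mock.
let np = MockPyObject("np")
let array = MockPyObject("np.array")
np.attributes["array"] = array

let a = np.array(MockPyObject("[1, 2, 3]"))   // attribute lookup, then call
a.items = [MockPyObject("1"), MockPyObject("2"), MockPyObject("3")]

let first = a[0]                // subscript read
a[0] = MockPyObject("5")        // subscript write
print(first.label, a.items[0].label)
for line in np.trace + array.trace + a.trace { print(line) }
```

A real implementation would also need the reference counting mentioned above (INCREF/DECREF tied to the wrapper's lifetime) and error handling for failed lookups, both of which this mock leaves out.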

If we do this, the vast majority of the Python ecosystem should be directly usable from within Swift code, with only a few major syntactic differences (e.g. ranges work differently).  We would add failable inits to the primitive datatypes like Int/String/etc to convert Python.Object values into them, and add the corresponding non-failable conversions from those primitives to Python.Object.
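
As a rough sketch of the conversion idea, the snippet below uses a hypothetical MockPyValue enum in place of a real Python.Object (an assumption made so the example runs without Python). Converting a dynamic value into Int or String can fail, so those initializers are failable; converting a Swift Int or String into the dynamic value always succeeds, so those are not.

```swift
// Hypothetical stand-in for a dynamically typed Python value.
enum MockPyValue: Equatable {
    case int(Int)
    case string(String)
    case none
}

extension Int {
    // Failable: the dynamic value may not hold an integer.
    init?(_ value: MockPyValue) {
        guard case .int(let n) = value else { return nil }
        self = n
    }
}

extension String {
    // Failable: the dynamic value may not hold a string.
    init?(_ value: MockPyValue) {
        guard case .string(let s) = value else { return nil }
        self = s
    }
}

extension MockPyValue {
    // Non-failable: every Swift Int or String has a representation here.
    init(_ n: Int) { self = .int(n) }
    init(_ s: String) { self = .string(s) }
}

// Usage
let seven = MockPyValue(7)
let name = MockPyValue("numpy")

if let n = Int(seven) {
    print("got an Int:", n)
}
if String(seven) == nil {
    print("seven is not a String")
}
if let s = String(name) {
    print("got a String:", s)
}
```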

Overall, I think it will provide a really nice experience, and allow us to leverage the vast majority of the Python ecosystem directly in Swift code. This project would also have a much narrower impact on the Swift compiler than the ObjC importer (since it works completely differently).  For a first cut, I don’t think we would have to worry about Swift classes subclassing Python classes, for example.

-Chris





> 
> I'd appreciate a constructive discussion on how that could be changed.
> 
> While R is a non-goal for obvious reasons, I'd argue that since both Scala and Python are general-purpose languages, taking them on head to head might be low-hanging fruit.
> 
> To support the claim, I'd like to point to projects such as:
> 
>  - Hadoop, Spark, and Hive, which are all huge ecosystems that are entirely JVM-based.
>  - Apache Parquet, a highly efficient column-based storage format for big data analytics, implemented in Java and C++.
>  - Apache Arrow, a physical memory spec that big data systems can use to allow zero transformations on data transferred between systems, and which (for obvious reasons) focuses on JVM-to-C interoperability.
> 
> There is also Python's Buffer Protocol, which ensures Python's predominance (for the time being) as a prime candidate for data science related projects: https://jeffknupp.com/blog/2017/09/15/python-is-the-fastest-growing-programming-language-due-to-a-feature-youve-never-heard-of/
> 
> While Swift's Memory Ownership manifesto touches similar turf, discussing copy on write and optimizing memory access overhead, IMHO it takes a system-level perspective, targeting projects such as kernel code. I'd suggest that viewing the problem from the perspective of an efficient CPU/GPU data-crunching machine might shed a different light on the requirements and use cases.
> 
> 
> I'd be happy to learn more, and have a constructive discussion on the subject.
> 
> 
> Thank you,
> Max.
>  
> 
> -- 
> puıɯ ʎɯ ɯoɹɟ ʇuǝs
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
