[swift-evolution] [Discussion] Swift for Data Science / ML / Big Data analytics

Maxim Veksler maxim at vekslers.org
Sat Oct 28 11:44:51 CDT 2017


Hey Guys,

The big data and machine learning world is dominated by Python, Scala and R.

I'm a Swifter at heart, but not so much by tools of the trade.

I'd appreciate a constructive discussion on how that could be changed.

While R is a non-goal for obvious reasons, I'd argue that since both Scala
and Python are general-purpose languages, taking them on head to head might
be low-hanging fruit.

To make the claim, I'd like to reference projects such as:

 - Hadoop, Spark and Hive, all huge ecosystems which are entirely JVM
based.
 - Apache Parquet, a highly efficient column-based storage format for big
data analytics, which was implemented in Java and C++.
 - Apache Arrow, a physical memory spec that big data systems can use to
allow zero transformations on data transferred between systems, and which
(for obvious reasons) focused on JVM-to-C interoperability.

There is also Python's Buffer Protocol, which ensures Python's predominance
(for the time being) as a prime candidate for data science related projects:
https://jeffknupp.com/blog/2017/09/15/python-is-the-fastest-growing-programming-language-due-to-a-feature-youve-never-heard-of/
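
To make the buffer protocol point concrete, here is a minimal sketch using
only the Python standard library. It is not taken from the linked post; the
names column, view and window are made up for illustration. It shows two
objects sharing one block of memory, with no copy made when slicing:

```python
import array

# A typed, contiguous buffer of 64-bit floats (stands in for a data column).
column = array.array("d", [1.0, 2.0, 3.0, 4.0])

# memoryview uses the buffer protocol to expose the same memory, no copy.
view = memoryview(column)

# Slicing a memoryview yields another view onto the same memory.
window = view[1:3]

# A write through the view is visible to the original owner of the memory.
window[0] = 20.0

total = sum(column)
```

Libraries such as NumPy rely on the same protocol to wrap memory owned by
other objects, which is the kind of sharing I have in mind for Swift.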

While Swift's Memory Ownership manifesto touches similar turf, discussing
copy-on-write and optimizing memory access overhead, it IMHO takes a
systems-level perspective, targeting projects such as kernel code. I'd
suggest that viewing the problem from the perspective of an efficient
CPU/GPU data crunching machine might shed a different light on the
requirements and use cases.


I'd be happy to learn more, and have a constructive discussion on the
subject.


Thank you,
Max.


-- 
puıɯ ʎɯ ɯoɹɟ ʇuǝs