<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">If you’re going to build on top of something like Spark, it seems you’d have better luck wrapping the JNI and using Swift protocols to automate away as much as possible of the boilerplate of creating JNI classes dynamically. </div><div class=""><br class=""></div><div class="">~Robert Widmann</div><br class=""><div><blockquote type="cite" class=""><div class="">On Oct 18, 2016, at 9:40 AM, Robert Goodman via swift-evolution <<a href="mailto:swift-evolution@swift.org" class="">swift-evolution@swift.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="socmaildefaultfont" dir="ltr" style="font-family:"Helvetica Neue", Helvetica, Arial, sans-serif;font-size:10.5pt"><div class="socmaildefaultfont" dir="ltr" style="font-family:"Helvetica Neue", Helvetica, Arial, sans-serif;font-size:10.5pt"><div dir="ltr" class=""> </div>
<div dir="ltr" class=""><div class="">I know that there has been some discussion around improving reflection in Swift, and I wanted to add to it by describing some work I have been trying to do in Swift. I have been investigating using Swift to create a framework that provides a programming API to process data and execute functions in parallel on a cluster. The framework needs to be able to instantiate these functions on the cluster workers and have the data processed by the functions. The plan is to use one of the existing cluster managers, such as Spark or Storm. So far, I have been looking at Spark. There would be a predefined set of supported functions, such as map, filter, join, etc., as defined by the cluster manager.</div>
<div class="">In my experiments, I have run into a number of issues that I haven't been able to solve because of the limited support for reflection in Swift. In describing the issues, I'm going to use APIs based on Spark, since that is the cluster manager I have been working with.</div>
<div class=""> </div>
<div class="">Parameter and Return types</div>
<div class=""> </div>
<div class="">The following is an example of a Swift class that maps to the RDD class in Spark.</div>
<div class=""> </div>
<div class="">public class RDD <T> {<br class=""> public func collect() throws -> [T] {<br class=""> ....<br class=""> }<br class="">}</div>
<div class=""> </div>
<div class="">The value of T could be any basic type to a class. Even if the types are limited to basic types and known Spark types, the list of possibilities is large. From one of the Spark examples, T would be</div>
<div class=""> </div>
<div class=""> Tuple2<Int32, Tuple3<Int32, Int32, Double>></div>
<div class=""> </div>
<div class="">The possible combinations of types is too large to be hard coded given Spark supports Tuples with 22 different types. I can get the type of T in a string, but haven't found a way to instantiate the type using the string. Is there some way around this problem?</div>
<div class=""><br class="">User-Defined Functions</div>
<div class=""> </div>
<div class="">A programmer would define functions that will be executed on a cluster to process data. The programmer doesn't need to do special packaging of functions that run on a cluster. The programmer would code a filter function against the cluster the same way as the filter function for a Swift array. For instance, for a filter method such as the following:</div>
<div class=""> </div>
<div class="">let result = RDD.filter({ (value) -> Bool in<br class=""> return value > 15<br class="">})</div>
<div class=""> </div>
<div class="">The framework would need to be able to do reflection on the function to get the information needed to instantiate and call the function on the cluster workers. Following is some of the information needed:</div>
<div class=""> </div>
<div class=""> Module name<br class=""> Class/Struct name<br class=""> Function name<br class=""> Parameter names and type information<br class=""> <br class="">Once on the cluster the framework would need to do the following:</div>
<div class=""> </div>
<div class=""> 1. Instantiate the parameters. Again, a parameter could be a basic type to a class.<br class=""> 2. Dynamically load/import the module containing the function.<br class=""> 3. Find the function in the module that matches the signature.<br class=""> 4. Call the function.<br class=""> 5. Handle the return type.</div>
<div class=""> </div>
<div class="">With the existing Swift support for reflection, I couldn't get all of the information that is needed and what information I could get wasn't in a very convenient form. In some cases, I needed to parse a string to get the different parameter types. Even if I had the information, I didn't see a way to use the information to load the module and execute the function. My plans are to require the programmer to pass the location of modules and dependencies that need to be deployed to the cluster workers on application startup. Given the limitations of reflections in Swift, I don't see how this framework could be implemented. Since this needs to run on Linux, I want to avoid any solution that uses Objective C.</div></div>
<div dir="ltr" class=""> </div>
<div class="mail-signature-container" dir="ltr"> Thanks</div>
<div class="mail-signature-container" dir="ltr"> Bob<br class=""><br class="">Robert Goodman</div></div></div><br class="">
</div></blockquote></div><br class=""></body></html>