Support `JavaRDD<T>` #3

dfdx · 2015-05-24T23:09:52Z

This is a feature request and discussion issue. Currently the only way to create a new RDD is to call `parallelize()`, which boxes every value in the collection into a `JuliaObject` and returns the corresponding `JavaRDD<JuliaObject>`. In practice, however, we will need to deal with custom RDDs, i.e. `JavaRDD<T>`.

The simplest way to handle this is to restrict `T` to be either a byte array or, as a special case, a string. This would let us call things like `textFile` and get `RDD{String}`, which is already enough for real applications.
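To make the "byte array or string" restriction concrete, here is a minimal sketch of how a string element could be treated as a special case of a byte array. The class `ByteOrString` and its `encode`/`decode` helpers are hypothetical names used only for illustration, and UTF-8 is an assumed encoding, not something the project has settled on.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ByteOrString {
    // Hypothetical helper: a string element becomes a plain byte array.
    // UTF-8 is an assumption made for this sketch.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Inverse of encode(): rebuild the string from its UTF-8 bytes.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the lines of a text file; no Spark dependency here.
        List<String> lines = Arrays.asList("first line", "second line", "");
        List<byte[]> wire = new ArrayList<>();
        for (String line : lines) {
            wire.add(encode(line));
        }
        // Every element survives the round trip through byte[].
        for (byte[] b : wire) {
            System.out.println(b.length + " bytes: " + decode(b));
        }
    }
}
```

With this restriction the Julia side only ever needs to understand one payload type, raw bytes, plus the convention that some of them are strings.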

A more interesting and trickier way is to support custom serializers/deserializers. Say, we could ask interested users to implement some kind of `JuliaSerializer<T>`, which transforms `T` into a byte array on the Java side, and a matching `convert` method that constructs the corresponding object on the Julia side.
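The custom serializer idea could look roughly like the sketch below. Only the name `JuliaSerializer<T>` comes from the proposal above; the `serialize` method, the `Point` type, `PointSerializer`, and the length-prefixed `frame` helper are all assumptions made up for illustration, not an existing API.

```java
import java.nio.ByteBuffer;

public class SerializerSketch {
    // Hypothetical shape of the proposed interface: T -> byte[] on the Java side.
    interface JuliaSerializer<T> {
        byte[] serialize(T value);
    }

    // Example user-defined element type (illustrative only).
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // A user-supplied serializer: two 8-byte doubles, big-endian
    // (ByteBuffer's default byte order).
    static final class PointSerializer implements JuliaSerializer<Point> {
        @Override
        public byte[] serialize(Point p) {
            return ByteBuffer.allocate(16).putDouble(p.x).putDouble(p.y).array();
        }
    }

    // Assumed framing: a 4-byte length prefix followed by the payload,
    // so a reader knows where each element ends.
    static <T> byte[] frame(JuliaSerializer<T> serializer, T value) {
        byte[] payload = serializer.serialize(value);
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    public static void main(String[] args) {
        byte[] framed = frame(new PointSerializer(), new Point(1.5, -2.0));
        ByteBuffer in = ByteBuffer.wrap(framed);
        int length = in.getInt();
        System.out.println("payload length = " + length);
        System.out.println("x = " + in.getDouble() + ", y = " + in.getDouble());
    }
}
```

The matching `convert` method on the Julia side would read the same layout back. Note that `ByteBuffer` writes big-endian by default, so the reader would have to account for byte order.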

I'm currently looking at the PySpark implementation of their default `AutoBatchedSerializer`, but any ideas are welcome.
