This repository has been archived by the owner on Dec 30, 2019. It is now read-only.
This is a feature request / discussion issue. Currently the only way to create a new RDD is to call `parallelize()`, which boxes every value in the collection into a `JuliaObject` and returns the corresponding `JavaRDD<JuliaObject>`. In practice, however, we will need to deal with custom RDDs, i.e. `JavaRDD<T>`.
The simplest way to deal with this is to restrict `T` to a byte array or, as a special case, a string. This would let us call things like `textFile` and get an `RDD{String}`, which is already enough for real applications.
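To make the byte-array/string restriction concrete, here is a minimal Java-side sketch. It assumes each element is written as a 4-byte big-endian length followed by its raw bytes; that framing and the `ByteFraming` class are hypothetical illustrations, not existing Spark.jl code. Strings are handled as the special case of UTF-8 bytes, so lines coming from something like `textFile` could travel through the same path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: each element is framed as <4-byte big-endian length><bytes>.
final class ByteFraming {
    private ByteFraming() {}

    // Write every byte array as a length-prefixed frame.
    static void writeFrames(Iterable<byte[]> items, DataOutputStream out) throws IOException {
        for (byte[] item : items) {
            out.writeInt(item.length); // DataOutputStream writes ints big-endian
            out.write(item);
        }
        out.flush();
    }

    // Read frames until the stream runs out.
    static List<byte[]> readFrames(DataInputStream in) throws IOException {
        List<byte[]> frames = new ArrayList<>();
        while (true) {
            int len;
            try {
                len = in.readInt();
            } catch (EOFException endOfStream) {
                return frames; // no further length prefix: done
            }
            byte[] buf = new byte[len];
            in.readFully(buf); // throws EOFException if a frame is truncated
            frames.add(buf);
        }
    }

    // String special case: UTF-8 encode each value, then frame it.
    static byte[] encodeStrings(List<String> values) {
        List<byte[]> encoded = new ArrayList<>();
        for (String s : values) {
            encoded.add(s.getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writeFrames(encoded, new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    // Inverse of encodeStrings.
    static List<String> decodeStrings(byte[] data) {
        List<String> values = new ArrayList<>();
        try {
            for (byte[] frame : readFrames(new DataInputStream(new ByteArrayInputStream(data)))) {
                values.add(new String(frame, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

The Julia side would only need to mirror this framing (read a big-endian length, then that many bytes), which is what keeps the byte-array/string option cheap compared to general custom types.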
A more interesting and trickier way is to support custom serializers/deserializers. For example, we could ask interested users to implement some kind of `JuliaSerializer<T>` that transforms `T` into a byte array on the Java side, plus a corresponding `convert` method that constructs the matching object on the Julia side.
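A rough sketch of what the Java half of such a contract might look like. Only the name `JuliaSerializer<T>` comes from the proposal; the `toBytes` method, the `Point` example type, `PointSerializer`, `Utf8StringSerializer` and `Serializers.serializeAll` are all hypothetical. The interface extends `Serializable` on the assumption that instances would be captured in functions Spark ships to executors.

```java
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: only the name JuliaSerializer<T> comes from the proposal.
// It extends Serializable because Spark ships functions (and what they capture) to executors.
interface JuliaSerializer<T> extends Serializable {
    byte[] toBytes(T value);
}

// Strings as the built-in special case: plain UTF-8 bytes.
final class Utf8StringSerializer implements JuliaSerializer<String> {
    @Override
    public byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
}

// Example user type (hypothetical).
final class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Example user serializer (hypothetical): 16 bytes per point.
final class PointSerializer implements JuliaSerializer<Point> {
    @Override
    public byte[] toBytes(Point p) {
        // ByteBuffer defaults to big-endian: 8 bytes for x, then 8 bytes for y.
        return ByteBuffer.allocate(16).putDouble(p.x).putDouble(p.y).array();
    }
}

final class Serializers {
    private Serializers() {}

    // Apply a serializer element by element (outside Spark, for illustration).
    static <T> List<byte[]> serializeAll(Iterable<T> values, JuliaSerializer<T> serializer) {
        List<byte[]> out = new ArrayList<>();
        for (T v : values) {
            out.add(serializer.toBytes(v));
        }
        return out;
    }
}
```

Inside Spark, the same object could presumably be applied with something like `rdd.map(serializer::toBytes)` to get a `JavaRDD<byte[]>`. The matching Julia-side `convert` for `Point` would then have to read two big-endian `Float64` values; keeping the Java encoder and the Julia decoder in sync is the burden this option puts on users.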
I'm currently looking at PySpark's implementation of its default `AutoBatchedSerializer`, but any ideas are welcome.
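For reference, PySpark's `AutoBatchedSerializer` groups elements into batches and adapts the batch size as it goes: as far as I can tell it doubles the batch while a serialized batch is smaller than `bestSize` (default `1 << 16` bytes) and halves it when a batch exceeds ten times that. Below is a Java sketch of the same idea, modeled on that logic rather than ported from it; the `AutoBatcher` class, its constructor and the length-prefixed output format are assumptions for illustration.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Sketch of PySpark-style adaptive batching (class name and API are hypothetical).
final class AutoBatcher<T> {
    private final Function<List<T>, byte[]> batchSerializer; // serializes one whole batch
    private final int bestSize;                              // target batch size in bytes

    AutoBatcher(Function<List<T>, byte[]> batchSerializer, int bestSize) {
        this.batchSerializer = batchSerializer;
        this.bestSize = bestSize;
    }

    // Writes <4-byte length><batch bytes> frames; returns the element count of each batch.
    List<Integer> dump(Iterator<T> items, OutputStream sink) {
        DataOutputStream out = new DataOutputStream(sink);
        List<Integer> batchCounts = new ArrayList<>();
        int batch = 1;
        try {
            while (items.hasNext()) {
                List<T> chunk = new ArrayList<>(batch);
                while (chunk.size() < batch && items.hasNext()) {
                    chunk.add(items.next());
                }
                byte[] bytes = batchSerializer.apply(chunk);
                out.writeInt(bytes.length);
                out.write(bytes);
                batchCounts.add(chunk.size());
                if (bytes.length < bestSize) {
                    batch *= 2; // batches are small: grow
                } else if (bytes.length > (long) bestSize * 10 && batch > 1) {
                    batch /= 2; // batches are far too large: shrink
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return batchCounts;
    }
}
```

With a batch serializer that emits 10 bytes per element and `bestSize = 40`, ten elements are written as batches of 1, 2, 4 and 3: the batch grows while batches stay under 40 bytes and holds at 4 once a batch reaches exactly 40. Batching amortizes per-element overhead for small values without letting a single batch grow unboundedly large.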