[Support] Request: Ability to Invoke Scala Code for Operations in oneTable #353
Comments
Do you mean you want to call the OneTable classes directly from Scala? I've done something similar locally in the Docker demo with the notebook, but I will need to look into how to get the same jars available for Glue. |
I am interested in running OneTable on Glue for my synchronization process. Specifically, I am looking for an efficient way to translate metadata. I was hoping there might be a way to invoke Scala code and use the jar files within Glue to accomplish this. That would make it much easier for customers to schedule their jobs, since they would no longer need to run shell commands manually. Please let me know if there are any possibilities or recommendations here. Thank you for considering this request. |
I still don't really understand the request. There is functionality to add jars to your glue jobs with We also have a dockerized demo you can run which has a scala notebook https://github.com/apache/incubator-xtable/tree/main/demo |
Hello @the-other-tim-brown, thank you for your response. I understand that we can add jar files, but do you have any examples of Glue code with OneTable that you could share? I'm also curious whether there is a way to invoke this from PySpark. We have numerous jobs running on Glue, primarily in Python. Do you happen to have any PySpark examples? If not, it might be worth adding some examples to the website, onetable.dev. Thank you for your assistance. |
I have not worked with Glue before. Can you help me understand where it differs from regular Scala code? There is no support for Python at this time, but that is tracked in this issue: #253 |
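Until a Python wrapper exists, one possible stopgap from a Python job is to write the dataset config and shell out to the bundled utilities jar. The following is only a minimal sketch under stated assumptions: the `--datasetConfig` flag and the config keys (`sourceFormat`, `targetFormats`, `datasets`, `tableBasePath`, `tableName`) follow the project README as I understand it, while the jar path, bucket, and table name are hypothetical placeholders. It also assumes a `java` executable is available in the job environment, which has not been verified on Glue in this thread.

```python
import shlex


def render_dataset_config(source_format, target_formats, table_base_path, table_name):
    """Render a dataset config as YAML text without needing a YAML library."""
    lines = [f"sourceFormat: {source_format}", "targetFormats:"]
    lines += [f"  - {fmt}" for fmt in target_formats]
    lines += [
        "datasets:",
        "  -",
        f"    tableBasePath: {table_base_path}",
        f"    tableName: {table_name}",
    ]
    return "\n".join(lines) + "\n"


def build_sync_command(jar_path, config_path):
    """Build the argument list for running the bundled utilities jar."""
    return ["java", "-jar", str(jar_path), "--datasetConfig", str(config_path)]


# Hypothetical values, for illustration only.
config_text = render_dataset_config(
    "HUDI", ["ICEBERG"], "s3://my-bucket/tables/orders", "orders"
)
command = build_sync_command("/tmp/utilities-bundled.jar", "/tmp/my_config.yaml")

print(config_text)
print(shlex.join(command))
```

The sketch only prints the config and the command. To actually run the sync, the config text would be written to the config path and the command passed to `subprocess.run(command, check=True)`.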
Hello @the-other-tim-brown, I'm curious about the Python wrapper ticket (#253). Do you think it would enable us to use OneTable in PySpark? I haven't delved deeply into Scala and Glue, but I believe we can collaborate effectively here by providing some examples. This could help spread awareness among companies about using OneTable on AWS. Many customers are eager to use OneTable on AWS Glue. Perhaps in our free time we could collaborate and get some examples up and running. |
Hey @the-other-tim-brown, I started some work with Glue and OneTable on Delta Streamer. Here is what I am doing; I am 99% sure it's a jar issue.
Step 1: Upload the jars to S3.
Step 2: Upload the sample dataset to the test folder. Link: https://drive.google.com/drive/folders/1BwNEK649hErbsWcYLZhqCWnaXFX3mIsg?usp=share_link
Step 3: Create a Glue job with Delta Streamer and OneTable.
On the Glue side, make sure to add all the jars as shown in the image above. The Hudi tables are created. I think the sync tool fails with the same error as before, which most likely means a jar issue.
Error
|
I think, as Sagar was saying, there is a NullPointerException. |
For Glue Scala, should I create a separate ticket, or is this thread fine? |
This ticket is fine. There is no NullPointerException thrown here. If you inspect the stacktrace you'll see a |
I've run into a problem with our AWS setup. AWS uses Spark 3.3-amzn, while my local environment runs Spark 3.4, and my jar files are built for Spark 3.4. Could you advise on the best way to build the jar files so that they are compatible with Spark 3.3-amzn on AWS? |
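A quick way to make the mismatch described above explicit is to compare the major and minor components of the two Spark versions before submitting a job. This is only a sketch: both version strings below are hypothetical placeholders, and it assumes that a jar compiled against one Spark minor release may fail to load classes on another. In a PySpark job, the runtime value would typically come from `spark.version`.

```python
import re


def major_minor(version):
    """Extract (major, minor) from a Spark version string such as '3.3.0-amzn-1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_spark_line(runtime_version, build_version):
    """Return True when both versions share the same major.minor release line."""
    return major_minor(runtime_version) == major_minor(build_version)


runtime = "3.3.0-amzn-1"  # hypothetical runtime version reported by the cluster
built_for = "3.4.1"       # hypothetical version the jars were compiled against

print(same_spark_line(runtime, built_for))
```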
Let me try this today; not sure if it works. |
That didn't work. By the way, how would I build for a specific Spark version, @the-other-tim-brown or @sagarlakshmipathy? Shouldn't the jar that I build locally work on Glue? |
You can change the versions used in the |
Guess what: it works for Iceberg but failed for Delta. |
Is this a bug for Delta, then? I'm not sure why it failed for Delta. |
The "parent pom" as it is sometimes referred is where we keep all the version information: https://github.com/apache/incubator-xtable/blob/main/pom.xml#L29 Updating here will set the versions in the If the delta error is still a class not found or something related to the catalyst package, it is going to be some packaging error. This is something we'll have to solve in the future. Ideally we would be able to use the new Delta Kernel library. I tried using their "standalone java" library in the past but it was missing some necessary features for partitioning. |
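To see which versions a parent pom currently pins before rebuilding, the `<properties>` block can be read with a few lines of Python. This is a sketch under assumptions: the property names and values in the sample below are made-up placeholders, not the project's actual settings, and the helper simply lists every property whose name ends in `.version`.

```python
import xml.etree.ElementTree as ET

# Maven poms use this default XML namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Made-up sample pom, for illustration only.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <properties>
    <spark.version>3.4.1</spark.version>
    <delta.version>2.4.0</delta.version>
    <maven.compiler.target>8</maven.compiler.target>
  </properties>
</project>
"""


def version_properties(pom_text):
    """Return every <properties> entry whose name ends in '.version'."""
    root = ET.fromstring(pom_text)
    props = root.find("m:properties", POM_NS)
    if props is None:
        return {}
    result = {}
    for child in props:
        name = child.tag.split("}", 1)[-1]  # strip the '{namespace}' prefix
        if name.endswith(".version"):
            result[name] = (child.text or "").strip()
    return result


print(version_properties(SAMPLE_POM))
```

Running the same helper on the real `pom.xml` text shows which properties to change before rebuilding the jars for a different runtime.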
Thank you very much. I wholeheartedly agree that addressing this issue in the future is crucial. From my testing, OneTable appears to be working as expected, which is promising. It would be tremendously helpful to complement this with some Python examples. I'm aware that there is an open ticket for Python-related tasks. I just wanted to say thank you for all the help. |
Hello,
I hope this message finds you well.
I'm currently experimenting with OneTable and doing some proof-of-concept work. Along the way I've run into a few questions, and I was hoping for some guidance. Please forgive me if some of these questions seem basic; I'm still learning about OneTable.
One of the queries I have pertains to the invocation of Scala code instead of the following command:
The reason behind this inquiry is that, during regular write operations, I typically utilize Scala code to execute tasks on AWS Glue for table synchronization.
I would greatly appreciate any insights or suggestions you might have regarding this matter.
Thank you for your time and assistance.