
Error while creating Hudi format from source Iceberg table using Hive Catalog #443

amnchauhan opened this issue May 22, 2024 · 26 comments

amnchauhan commented May 22, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

Hi all, I am trying to convert my Iceberg table, which uses a Hive catalog, to Hudi format as the target, but I am getting the error below while configuring the catalog via the --icebergCatalogConfig option. The catalog config file is:
catalogImpl: org.apache.iceberg.hive.HiveCatalog
catalogName: prod_iceberg
catalogOptions:
  uri: <thrift metastore uri>
  warehouse: s3a://prod_iceberg/warehouse
But if I use the Hadoop catalog and pass catalogImpl as org.apache.iceberg.hadoop.HadoopCatalog, I get no such error. Is there anything we need to configure for the Hive catalog?
Attaching a screenshot for your reference.

Are you willing to submit PR?

  • I am willing to submit a PR!
  • I am willing to submit a PR but need help getting started!

Code of Conduct

amnchauhan added the bug label on May 22, 2024
@the-other-tim-brown (Contributor)

@amnchauhan can you try adding the iceberg-hive-runtime jar to your classpath? https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-hive-runtime/1.4.2
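For example, something along these lines (a sketch; the jar locations are illustrative, and the main class in the bundled utilities jar is org.apache.xtable.utilities.RunSync):

# Fetch the jar from Maven Central and put it on the classpath next to the bundled jar
wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-hive-runtime/1.4.2/iceberg-hive-runtime-1.4.2.jar -P ./lib/
java -cp ./lib/iceberg-hive-runtime-1.4.2.jar:./xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar \
  org.apache.xtable.utilities.RunSync --datasetConfig my_config.yaml --icebergCatalogConfig icebergCatalog.yaml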


amnchauhan commented May 25, 2024

> @amnchauhan can you try adding the iceberg-hive-runtime jar to your classpath? https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-hive-runtime/1.4.2

@the-other-tim-brown after adding iceberg-hive-runtime-1.4.2.jar, I tried both:
1. java -cp ./path/to/iceberg-hive-runtime-1.4.2.jar mainClass --icebergCatalogConfig icebergCatalog.yaml
2. java -cp ./path/to/iceberg-hive-runtime-1.4.2.jar -jar xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --icebergCatalogConfig icebergCatalog.yaml
The issue is still not resolved and I am getting the same error. I also tried adding a Class-Path attribute in the manifest file, but still the same error. Please let me know if I'm missing anything.

@the-other-tim-brown (Contributor)

@amnchauhan I think I was looking at the wrong module. Can you try adding this one to the classpath instead? https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-hive-metastore/1.4.2

@amnchauhan (Author)

> @amnchauhan I think I was looking at the wrong module. Can you try adding this one to the classpath instead? https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-hive-metastore/1.4.2

@the-other-tim-brown I have tried with the iceberg-hive-metastore jar and am still getting the same error.

@the-other-tim-brown (Contributor)

Can you paste the stacktraces so they are easier to inspect and copy locally? The class in the screenshot looks like the one in the iceberg jars.


amnchauhan commented May 27, 2024

@the-other-tim-brown
[root incubator-xtable-main]# java -cp ./lib/iceberg-spark-runtime-3.2_2.12-1.4.2.jar:iceberg-hive-metastore-1.4.2.jar -jar /home/incubator-xtable-main/utilities/target/utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig /home/incubator-xtable-main/my_config.yaml --icebergCatalogConfig /home/incubator-xtable-main/icebergCatalogConfig.yaml --hadoopConfig /home//incubator-xtable-main/my_hadoop_conf.xml
2024-05-27 21:51:59 INFO  org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3a://spark-iceberg/streaming_catalog/streaming_iceberg_pp for following table formats [HUDI]
2024-05-27 21:51:59 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3a://spark-iceberg/streaming_catalog/streaming_iceberg_pp/data
2024-05-27 21:51:59 WARN  org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-27 21:52:00 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-05-27 21:52:01 INFO  org.apache.xtable.hudi.HudiTableManager:73 - Hudi table does not exist, will be created on first sync
2024-05-27 21:52:01 INFO  org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-05-27 21:52:01 ERROR org.apache.xtable.utilities.RunSync:171 - Error running sync for s3a://spark-iceberg/streaming_catalog/streaming_iceberg_pp
java.lang.IllegalArgumentException: Cannot initialize Catalog implementation org.apache.iceberg.hive.HiveCatalog: Cannot find constructor for interface org.apache.iceberg.catalog.Catalog
        Missing org.apache.iceberg.hive.HiveCatalog [java.lang.ClassNotFoundException: org.apache.iceberg.hive.HiveCatalog]
        at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:224) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergTableManager.lambda$getCatalog$6(IcebergTableManager.java:116) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660) ~[?:1.8.0_412]
        at org.apache.xtable.iceberg.IcebergTableManager.getCatalog(IcebergTableManager.java:113) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:56) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.initSourceTable(IcebergConversionSource.java:81) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getSourceTable(IcebergConversionSource.java:60) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getCurrentSnapshot(IcebergConversionSource.java:121) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
Caused by: java.lang.NoSuchMethodException: Cannot find constructor for interface org.apache.iceberg.catalog.Catalog
        Missing org.apache.iceberg.hive.HiveCatalog [java.lang.ClassNotFoundException: org.apache.iceberg.hive.HiveCatalog]
        at org.apache.iceberg.common.DynConstructors.buildCheckedException(DynConstructors.java:250) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.common.DynConstructors.access$200(DynConstructors.java:32) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.common.DynConstructors$Builder.buildChecked(DynConstructors.java:220) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:221) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        ... 11 more
        Suppressed: java.lang.ClassNotFoundException: org.apache.iceberg.hive.HiveCatalog
                at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_412]
                at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_412]
                at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_412]
                at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_412]
                at java.lang.Class.forName0(Native Method) ~[?:1.8.0_412]
                at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_412]
                at org.apache.iceberg.common.DynConstructors$Builder.impl(DynConstructors.java:149) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:221) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergTableManager.lambda$getCatalog$6(IcebergTableManager.java:116) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660) ~[?:1.8.0_412]
                at org.apache.xtable.iceberg.IcebergTableManager.getCatalog(IcebergTableManager.java:113) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:56) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergConversionSource.initSourceTable(IcebergConversionSource.java:81) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergConversionSource.getSourceTable(IcebergConversionSource.java:60) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergConversionSource.getCurrentSnapshot(IcebergConversionSource.java:121) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]

my_hadoop_conf.xml is configured as:

<?xml version="1.0" encoding="UTF-8"?>
<!--
       ~ Licensed to the Apache Software Foundation (ASF) under one or more
  ~ contributor license agreements.  See the NOTICE file distributed with
  ~ this work for additional information regarding copyright ownership.
  ~ The ASF licenses this file to You under the Apache License, Version 2.0
  ~ (the "License"); you may not use this file except in compliance with
  ~ the License.  You may obtain a copy of the License at
  ~
  ~     http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
-->
<configuration>

  <!-- Default file system for AWS S3/S3A scheme, s3:// -->
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.aws.credentials.provider</name>
    <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>####</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>####</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>####</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.ssl.channel.mode</name>
    <value>default_jsse_with_gcm</value>
  </property>
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>true</value>
  </property>
</configuration>

icebergCatalogConfig.yaml is configured as:

catalogImpl: org.apache.iceberg.hive.HiveCatalog
catalogName: iceberg
catalogOptions:
  warehouse: s3a://iceberg/
  uri: thrift://hive-metastore.hive-metastore:9083

@the-other-tim-brown (Contributor)

@amnchauhan it looks like -cp and -jar do not work together: when -jar is used, the JVM takes its classpath solely from the jar's manifest and ignores -cp. Can you try with just -cp? You may want to consider building a bundled jar with all of the dependencies you require.
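To illustrate the difference (the jar and class names here are placeholders):

# With -jar, the classpath comes only from the jar's manifest; -cp is silently ignored
java -cp extra.jar -jar app.jar                 # extra.jar is NOT visible to the application
# With -cp alone, every listed jar is visible; the main class must be named explicitly
java -cp extra.jar:app.jar com.example.Main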


amnchauhan commented May 28, 2024

> @amnchauhan it looks like -cp and -jar do not work together: when -jar is used, the JVM takes its classpath solely from the jar's manifest and ignores -cp. Can you try with just -cp? You may want to consider building a bundled jar with all of the dependencies you require.

@the-other-tim-brown I have created a bundled jar by adding the dependencies in pom.xml and also tried adding all dependencies to my classpath as below, but the issue still persists.

[root@incubator-xtable-main]# java -cp ./xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:/lib/iceberg-hive-metastore-1.4.2.jar:/lib/iceberg-hive-runtime-1.4.2.jar org.apache.xtable.utilities.RunSync -d my_config.yaml -i icebergCatalogConfig.yaml --hadoopConfig my_hadoop_conf.xml 
2024-05-28 17:56:02 INFO  org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3a://spark-iceberg-test/streaming_test_catalog/streaming_iceberg_test_pp for following table formats [HUDI]
2024-05-28 17:56:02 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3a://spark-iceberg-test/streaming_test_catalog/streaming_iceberg_test_pp/data
2024-05-28 17:56:02 WARN  org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-28 17:56:02 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-05-28 17:56:03 INFO  org.apache.xtable.hudi.HudiTableManager:73 - Hudi table does not exist, will be created on first sync
2024-05-28 17:56:03 INFO  org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-05-28 17:56:03 ERROR org.apache.xtable.utilities.RunSync:171 - Error running sync for s3a://spark-iceberg-test/streaming_test_catalog/streaming_iceberg_test_pp
java.lang.IllegalArgumentException: Cannot initialize Catalog implementation org.apache.iceberg.hive.HiveCatalog: Cannot find constructor for interface org.apache.iceberg.catalog.Catalog
        Missing org.apache.iceberg.hive.HiveCatalog [java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/UnknownDBException]
        at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:224) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergTableManager.lambda$getCatalog$6(IcebergTableManager.java:116) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660) ~[?:1.8.0_412]
        at org.apache.xtable.iceberg.IcebergTableManager.getCatalog(IcebergTableManager.java:113) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:56) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.initSourceTable(IcebergConversionSource.java:81) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getSourceTable(IcebergConversionSource.java:60) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getCurrentSnapshot(IcebergConversionSource.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
Caused by: java.lang.NoSuchMethodException: Cannot find constructor for interface org.apache.iceberg.catalog.Catalog
        Missing org.apache.iceberg.hive.HiveCatalog [java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/UnknownDBException]
        at org.apache.iceberg.common.DynConstructors.buildCheckedException(DynConstructors.java:250) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.common.DynConstructors.access$200(DynConstructors.java:32) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.common.DynConstructors$Builder.buildChecked(DynConstructors.java:220) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:221) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        ... 11 more
        Suppressed: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/UnknownDBException
                at java.lang.Class.forName0(Native Method) ~[?:1.8.0_412]
                at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_412]
                at org.apache.iceberg.common.DynConstructors$Builder.impl(DynConstructors.java:149) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:221) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergTableManager.lambda$getCatalog$6(IcebergTableManager.java:116) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660) ~[?:1.8.0_412]
                at org.apache.xtable.iceberg.IcebergTableManager.getCatalog(IcebergTableManager.java:113) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:56) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergConversionSource.initSourceTable(IcebergConversionSource.java:81) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergConversionSource.getSourceTable(IcebergConversionSource.java:60) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.iceberg.IcebergConversionSource.getCurrentSnapshot(IcebergConversionSource.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
                at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.api.UnknownDBException
                at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_412]
                at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_412]
                at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) ~[?:1.8.0_412]
                at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_412]
                ... 15 more

@the-other-tim-brown (Contributor)

@amnchauhan I will create a sample fork this weekend that you can use, apologies for the delay

@the-other-tim-brown (Contributor)

@amnchauhan check out this branch #456


amnchauhan commented Jun 3, 2024

Hi @the-other-tim-brown, thanks for your update. After testing with #456 I think there's some dependency error; attaching a stacktrace file for your reference.
stacktrace.txt


alberttwong commented Jun 4, 2024

This isn't exactly the same. I'm using xtable to convert Hudi to Iceberg, but you should be able to adapt it. #459 and https://github.com/alberttwong/incubator-xtable/tree/main/demo-s3

I use pyspark --packages (which resolves dependencies through Apache Ivy) to download all the correct Java dependencies, and then add the Ivy classpath to my shell to run the Java app.
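A sketch of that approach (the coordinate shown is illustrative; Spark caches the resolved jars under ~/.ivy2/jars by default):

# Let Ivy resolve the Iceberg Hive runtime and all of its transitive dependencies
pyspark --packages org.apache.iceberg:iceberg-hive-runtime:1.4.2
# Then reuse the cached jars when launching the utility
java -cp "$HOME/.ivy2/jars/*:./xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar" \
  org.apache.xtable.utilities.RunSync --datasetConfig my_config.yaml --icebergCatalogConfig icebergCatalogConfig.yaml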


the-other-tim-brown commented Jun 6, 2024

> Hi @the-other-tim-brown, thanks for your update. After testing with #456 I think there's some dependency error; attaching a stacktrace file for your reference. stacktrace.txt

@amnchauhan running mvn -pl xtable-utilities dependency:tree shows that the dependency is there: org.datanucleus:datanucleus-api-jdo:jar:4.2.4:compile. I have pushed a small update to my branch to include <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> so that the META-INF/services files are properly included in the jar. I noticed that they were not when inspecting locally, and it looks like that is required for detecting which implementation to instantiate.
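For context, that transformer sits inside the maven-shade-plugin configuration roughly like this (a sketch; the surrounding plugin block will differ per project):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <!-- Merges META-INF/services files from all shaded jars so that
           java.util.ServiceLoader can discover every implementation -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    </transformers>
  </configuration>
</plugin>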

@amnchauhan (Author)

> > Hi @the-other-tim-brown, thanks for your update. After testing with #456 I think there's some dependency error; attaching a stacktrace file for your reference. stacktrace.txt
>
> @amnchauhan running mvn -pl xtable-utilities dependency:tree shows that the dependency is there: org.datanucleus:datanucleus-api-jdo:jar:4.2.4:compile. I have pushed a small update to my branch to include <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> so that the META-INF/services files are properly included in the jar. I noticed that they were not when inspecting locally, and it looks like that is required for detecting which implementation to instantiate.

Hi @the-other-tim-brown, I have tested it and am getting exactly the same error; attaching a stacktrace file for your reference.
stacktrace (1).txt

@the-other-tim-brown (Contributor)

@amnchauhan I am unable to reproduce this error locally. I am running the docker demo in the repo to spin up HMS and then running with the shaded jar built off of my branch. I am not sure how to trigger this JDO related path. Is there anything special I need to do when configuring HMS?

@amnchauhan (Author)

> @amnchauhan I am unable to reproduce this error locally. I am running the docker demo in the repo to spin up HMS and then running with the shaded jar built off of my branch. I am not sure how to trigger this JDO related path. Is there anything special I need to do when configuring HMS?

Hi @the-other-tim-brown, can you please share your metastore-site.xml or hive-site.xml if possible?

@the-other-tim-brown (Contributor)

> > @amnchauhan I am unable to reproduce this error locally. I am running the docker demo in the repo to spin up HMS and then running with the shaded jar built off of my branch. I am not sure how to trigger this JDO related path. Is there anything special I need to do when configuring HMS?
>
> Hi @the-other-tim-brown, can you please share your metastore-site.xml or hive-site.xml if possible?

@amnchauhan you can find the demo here: https://github.com/apache/incubator-xtable/blob/main/demo/docker-compose.yaml - I am not sure where to find the other information but it must be some defaults

@lordicecream

I am also facing this exact same issue. @the-other-tim-brown, in the above docker-compose file the Hive Metastore version seems to be 4.0.0; is there some chance that a Hive Metastore version mismatch could be the cause? I am using 3.1.0.

@the-other-tim-brown (Contributor)

> I am also facing this exact same issue. @the-other-tim-brown, in the above docker-compose file the Hive Metastore version seems to be 4.0.0; is there some chance that a Hive Metastore version mismatch could be the cause? I am using 3.1.0.

@lordicecream or @amnchauhan do you have any details for how I can set up a local version of HMS to simulate your environments? I am not well versed in the Iceberg and Hive code, so I will need to iterate a lot to try to solve this.

@lordicecream

@the-other-tim-brown I am using the standard Apache Hive Metastore docker image with v3.1.0, backed by Postgres, with some certs added for connecting to my S3-compatible storage.
For my other Hive connectors, like my query engines, I can just configure the correct URL and have not seen any issues across the board.
Like @amnchauhan, I have also been able to convert Hudi to Iceberg... but when the source is Iceberg, maybe there is some fundamental lack of clarity about how Iceberg connects to the Hive Metastore and whether any extra configs need to be added.

@lordicecream

@the-other-tim-brown do you need any more details? Please let me know

@the-other-tim-brown (Contributor)

> @the-other-tim-brown do you need any more details? Please let me know

@lordicecream can you provide your local docker setup or some build file? Anything that can speed up setup will help. Also let me know what you have tried with respect to the packaging within your projects so I am not going down a path that has already been tried.


lordicecream commented Jul 12, 2024

HMS-setup.txt
@the-other-tim-brown I have a k8s cluster in which I have deployed Hive Metastore using this https://github.com/naushadh/hive-metastore/blob/main/Dockerfile
On top of this image I added S3 certs and deployed HMS as a k8s deployment using the configs I have attached.
Then I am connecting to this HMS and S3 using query engines like Spark.
Please let me know if any additional details are required.


amnchauhan commented Jul 12, 2024

> HMS-setup.txt @the-other-tim-brown I have a k8s cluster in which I have deployed Hive Metastore using this https://github.com/naushadh/hive-metastore/blob/main/Dockerfile On top of this image I added S3 certs and deployed HMS as a k8s deployment using the configs I have attached. Then I am connecting to this HMS and S3 using query engines like Spark. Please let me know if any additional details are required.
@the-other-tim-brown I'm also using a similar setup for HMS. Basically, I'm reading data from Kafka and writing it into an S3 MinIO bucket as an Iceberg table using Spark Streaming; for this I'm using the Iceberg runtime jar (iceberg-spark-runtime-3.2_2.12-1.4.3.jar) with Spark version 3.2.4. From Spark itself I'm connecting to HMS, which is also running in a k8s cluster, same as @lordicecream, with a similar configuration.


amnchauhan commented Jul 19, 2024

> > HMS-setup.txt @the-other-tim-brown I have a k8s cluster in which I have deployed Hive Metastore using this https://github.com/naushadh/hive-metastore/blob/main/Dockerfile On top of this image I added S3 certs and deployed HMS as a k8s deployment using the configs I have attached. Then I am connecting to this HMS and S3 using query engines like Spark. Please let me know if any additional details are required.
>
> @the-other-tim-brown I'm also using a similar setup for HMS. Basically, I'm reading data from Kafka and writing it into an S3 MinIO bucket as an Iceberg table using Spark Streaming; for this I'm using the Iceberg runtime jar (iceberg-spark-runtime-3.2_2.12-1.4.3.jar) with Spark version 3.2.4. From Spark itself I'm connecting to HMS, which is also running in a k8s cluster, same as @lordicecream, with a similar configuration.

@the-other-tim-brown I was able to resolve the above issue by adding the following jars to the classpath:
java -cp "./avro-1.11.3.jar:./datanucleus-core-4.1.17.jar:./datanucleus-api-jdo-4.2.4.jar:./datanucleus-enhancer-3.1.1.jar:./datanucleus-rdbms-4.1.19.jar:./javax.jdo-3.2.0-m3.jar:./asm-3.0.jar:./jdo-api-3.0.1.jar:./hive-exec-3.1.0.jar:./xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar" \
  org.apache.xtable.utilities.RunSync -d my_config.yaml -i icebergCatalogConfig.yaml -p my_hadoop_conf.xml
After running this I can see the .hoodie folder, but the problem is that when I try to query this Hudi table using Trino I get no data, while I am able to query the same table from Spark. Is there any additional configuration we need to set for Trino?

@the-other-tim-brown (Contributor)

Thanks for the update @amnchauhan. There is another issue with querying from Trino, since reading the Hudi table requires the Hudi Metadata Table (MDT) on the read side and that is not working: #460
