When spark-sql (using the hive catalog type) creates Iceberg data in S3, it doesn't generate version-hint.text #464

Closed
3 of 4 tasks
alberttwong opened this issue Jun 7, 2024 · 3 comments
Labels
bug Something isn't working

alberttwong commented Jun 7, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Please describe the bug 🐞

I'm using Iceberg via spark-sql as usual. When I write the Iceberg data, no version-hint.text file is generated.

related #463 (comment)

Using iceberg hive

spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.5.2,org.apache.iceberg:iceberg-aws-bundle:1.5.2,org.apache.hadoop:hadoop-client:2.10.2,com.amazonaws:aws-java-sdk-s3:1.11.271,org.apache.hadoop:hadoop-aws:2.10.2 \
    --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
    --conf spark.sql.defaultCatalog=iceberg \
    --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.iceberg.warehouse=s3://warehouse \
    --conf spark.sql.catalog.iceberg.type=hive

CREATE SCHEMA iceberg_db LOCATION 's3a://warehouse/';
CREATE TABLE iceberg_db.taxis 
(
  vendor_id bigint,
  trip_id bigint,
  trip_distance float,
  fare_amount double,
  store_and_fwd_flag string
)
PARTITIONED BY (vendor_id) ;
INSERT INTO iceberg_db.taxis VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');

Running the XTable sync (Iceberg to Hudi and Delta) against that table then gives this error:

root@spark:/opt/xtable/jars# export AWS_SECRET_ACCESS_KEY=password
root@spark:/opt/xtable/jars# export AWS_ACCESS_KEY_ID=admin
root@spark:/opt/xtable/jars# export ENDPOINT=http://minio:9000
root@spark:/opt/xtable/jars# export AWS_REGION=us-east-1
root@spark:/opt/xtable/jars# cd /opt/xtable/jars/; java -jar xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig xtable_iceberg.yaml -p core-site.xml
WARNING: Runtime environment or build system does not support multi-release JARs. This will impact location-based features.
2024-06-07 19:54:51 INFO  org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3a://warehouse/taxis for following table formats [HUDI, DELTA]
2024-06-07 19:54:51 INFO  org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3a://warehouse/taxis
2024-06-07 19:54:51 WARN  org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-06-07 19:54:51 WARN  org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-06-07 19:54:52 WARN  org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
2024-06-07 19:54:52 INFO  org.apache.xtable.hudi.HudiTableManager:73 - Hudi table does not exist, will be created on first sync
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/xtable/jars/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-06-07 19:54:53 INFO  org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3a`
2024-06-07 19:54:53 INFO  org.apache.spark.sql.delta.DeltaLog:60 - Creating initial snapshot without metadata, because the directory is empty
2024-06-07 19:54:54 INFO  org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=95c3e958-7fec-4917-bb9d-28bdf4504d33] Created snapshot InitialSnapshot(path=s3a://warehouse/taxis/_delta_log, version=-1, metadata=Metadata(bd69b4e8-e7de-4df2-b8fe-6ada0e1d0cc8,null,null,Format(parquet,Map()),null,List(),Map(),Some(1717790094004)), logSegment=LogSegment(s3a://warehouse/taxis/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-06-07 19:54:54 INFO  org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-06-07 19:54:54 INFO  org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-06-07 19:54:54 WARN  org.apache.iceberg.hadoop.HadoopTableOperations:325 - Error reading version hint file s3a://warehouse/taxis/metadata/version-hint.text
java.io.FileNotFoundException: No such file or directory: s3a://warehouse/taxis/metadata/version-hint.text
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3801) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3652) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5288) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$executeOpen$6(S3AFileSystem.java:1578) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.executeOpen(S3AFileSystem.java:1576) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1550) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:997) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.hadoop.HadoopTableOperations.findVersion(HadoopTableOperations.java:318) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:104) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:94) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergTableManager.lambda$getTable$1(IcebergTableManager.java:58) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at java.util.Optional.orElseGet(Unknown Source) [?:?]
        at org.apache.xtable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:58) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.initSourceTable(IcebergConversionSource.java:81) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getSourceTable(IcebergConversionSource.java:60) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getCurrentSnapshot(IcebergConversionSource.java:121) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
2024-06-07 19:54:54 ERROR org.apache.xtable.utilities.RunSync:171 - Error running sync for s3a://warehouse/taxis
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at location: s3a://warehouse/taxis
        at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:97) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergTableManager.lambda$getTable$1(IcebergTableManager.java:58) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at java.util.Optional.orElseGet(Unknown Source) ~[?:?]
        at org.apache.xtable.iceberg.IcebergTableManager.getTable(IcebergTableManager.java:58) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.initSourceTable(IcebergConversionSource.java:81) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getSourceTable(IcebergConversionSource.java:60) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.iceberg.IcebergConversionSource.getCurrentSnapshot(IcebergConversionSource.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
        at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
sh-5.1# mc alias set warehouse http://minio:9000 admin password
Added `warehouse` successfully.
sh-5.1# mc ls -r warehouse
[2024-06-07 20:01:59 UTC]     0B STANDARD warehouse/taxis/_delta_log/
[2024-06-07 20:00:59 UTC] 1.5KiB STANDARD warehouse/taxis/data/vendor_id=1/00000-10-e0fc3ef1-3606-4591-bcf6-d72b25747380-0-00001.parquet
[2024-06-07 19:59:51 UTC] 1.5KiB STANDARD warehouse/taxis/data/vendor_id=1/00000-5-865f6992-e612-49a0-a8db-a27fd0f7d02a-0-00001.parquet
[2024-06-07 20:00:59 UTC] 1.5KiB STANDARD warehouse/taxis/data/vendor_id=2/00000-10-e0fc3ef1-3606-4591-bcf6-d72b25747380-0-00002.parquet
[2024-06-07 19:59:51 UTC] 1.5KiB STANDARD warehouse/taxis/data/vendor_id=2/00000-5-865f6992-e612-49a0-a8db-a27fd0f7d02a-0-00002.parquet
[2024-06-07 19:59:24 UTC] 1.4KiB STANDARD warehouse/taxis/metadata/00000-77bdf818-507f-48f1-971a-1898c294bf49.metadata.json
[2024-06-07 19:59:51 UTC] 2.4KiB STANDARD warehouse/taxis/metadata/00001-ef7e1726-2ec0-4582-bef8-9c37f96e2909.metadata.json
[2024-06-07 20:00:59 UTC] 3.4KiB STANDARD warehouse/taxis/metadata/00002-15d50d43-f5d8-4faa-b2b7-f41aca3f758f.metadata.json
[2024-06-07 19:59:51 UTC] 7.0KiB STANDARD warehouse/taxis/metadata/ae7384d8-6b4a-4fd6-bbc2-e9b621ba9e0b-m0.avro
[2024-06-07 20:00:59 UTC] 7.0KiB STANDARD warehouse/taxis/metadata/ebef7e50-73aa-4428-98aa-6ad0a8ed7802-m0.avro
[2024-06-07 19:59:51 UTC] 4.1KiB STANDARD warehouse/taxis/metadata/snap-1202211864160811787-1-ae7384d8-6b4a-4fd6-bbc2-e9b621ba9e0b.avro
[2024-06-07 20:00:59 UTC] 4.2KiB STANDARD warehouse/taxis/metadata/snap-3391991307049980362-1-ebef7e50-73aa-4428-98aa-6ad0a8ed7802.avro
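
A quick way to tell which layout a listing like the one above represents is to look at the metadata file names. As far as I know, Iceberg's Hadoop tables write `vN.metadata.json` files plus a `version-hint.text` pointer, while catalog-managed tables (Hive, for example) write `NNNNN-<uuid>.metadata.json` files and keep the current pointer in the catalog instead. The sketch below assumes that naming convention; the function name `classify_iceberg_listing` is made up for illustration and is not part of `mc`, Iceberg, or XTable.

```shell
# Hypothetical helper: reads an object listing (e.g. `mc ls -r` output) on
# stdin and prints one word describing the Iceberg metadata layout it sees.
classify_iceberg_listing() {
  listing=$(cat)
  if printf '%s\n' "$listing" | grep -q 'metadata/version-hint\.text'; then
    # Path-based (Hadoop) table: HadoopTables.load can find the current version.
    echo hadoop
  elif printf '%s\n' "$listing" | grep -Eq 'metadata/[0-9]{5}-[0-9a-f-]+\.metadata\.json'; then
    # Catalog-managed naming: the pointer lives in the catalog, not in the bucket.
    echo catalog
  else
    echo none
  fi
}

# Usage (assumes the `warehouse` alias set above):
#   mc ls -r warehouse | classify_iceberg_listing
```

For the listing above this would print `catalog`, which matches the missing version-hint.text reported in the log.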


alberttwong commented Jun 7, 2024

Maybe this is the hint: #431 (comment). It suggests switching the catalog type from hive to hadoop.

@dipankarmazumdar

@alberttwong - yeah, looks like the same thing. You are using a Hive catalog to create the Iceberg table, but there are no Hive configs set. So if you are not bound to Hive, you can use a file-system-based catalog like the Hadoop catalog in Iceberg.

@alberttwong

The gist is that you have to use type=hadoop, or else Iceberg won't generate the version-hint.text file.
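
For reference, a sketch of what the adjusted invocation might look like. It is the same command as in the report with only the catalog type switched to hadoop. Changing the warehouse scheme from `s3://` to `s3a://` is my assumption (the Hadoop catalog goes through the Hadoop FileSystem API, and the XTable log above reads from `s3a://warehouse`), so adjust it to your setup.

```shell
# Same packages and settings as in the original report.
# Changed: catalog type hive -> hadoop.
# Assumption: warehouse scheme s3:// -> s3a:// so hadoop-aws handles the path.
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.5.2,org.apache.iceberg:iceberg-aws-bundle:1.5.2,org.apache.hadoop:hadoop-client:2.10.2,com.amazonaws:aws-java-sdk-s3:1.11.271,org.apache.hadoop:hadoop-aws:2.10.2 \
    --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
    --conf spark.sql.defaultCatalog=iceberg \
    --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.iceberg.warehouse=s3a://warehouse \
    --conf spark.sql.catalog.iceberg.type=hadoop
```

With a Hadoop catalog the table will most likely be laid out under the warehouse by namespace and table name, so the basePath given to XTable may need to change to match.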
