feat: support storage options for spark read and write #2990

Merged (4 commits) on Oct 11, 2024

@SaintBacchus SaintBacchus commented Oct 9, 2024

#2854
Set the storage options in spark-defaults.conf:

spark.sql.catalog.lance com.lancedb.lance.spark.LanceCatalog
spark.sql.catalog.lance.access_key_id AKAKAKAKAKAKAKAKAKAKAKAKAKAKAKAKAKAKAKAK
spark.sql.catalog.lance.secret_access_key SKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSK
spark.sql.catalog.lance.aws_region region
spark.sql.catalog.lance.aws_endpoint https://Endpoint
spark.sql.catalog.lance.virtual_hosted_style_request true
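
To make the key layout above concrete, here is a minimal sketch of how catalog-scoped entries might be reduced to a plain storage-options map. It is not the code from this PR: the class `StorageOptionsSketch` and the method `extractStorageOptions` are hypothetical names, and the only assumption taken from the configuration is that options live under the `spark.sql.catalog.lance.` prefix.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: this helper is hypothetical and not part of the Lance Spark connector.
final class StorageOptionsSketch {
    // Prefix taken from the configuration keys shown in the PR description.
    static final String CATALOG_PREFIX = "spark.sql.catalog.lance.";

    /** Keeps only entries under the catalog prefix and strips the prefix from each key. */
    static Map<String, String> extractStorageOptions(Map<String, String> sparkConf) {
        Map<String, String> options = new TreeMap<>();
        for (Map.Entry<String, String> entry : sparkConf.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(CATALOG_PREFIX)) {
                options.put(key.substring(CATALOG_PREFIX.length()), entry.getValue());
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new TreeMap<>();
        // The catalog class entry has no trailing dot, so it is not treated as an option.
        conf.put("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceCatalog");
        conf.put("spark.sql.catalog.lance.aws_region", "region");
        conf.put("spark.sql.catalog.lance.virtual_hosted_style_request", "true");
        conf.put("spark.app.name", "demo");
        System.out.println(extractStorageOptions(conf));
    }
}
```

With this shape, unrelated Spark settings and the catalog class entry itself are ignored, and only the suffixes (for example `aws_region`) would be handed on as storage options.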

You can then read and write Lance datasets in an object store:

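// toDF on a Seq needs the session's implicits (spark-shell imports them automatically).
import spark.implicits._

val df = Seq(
  ("Alice", 1),
  ("Bob", 2)
).toDF("name", "id")

// Write the DataFrame as a Lance dataset in the object store.
df.write.format("lance").option("path", "s3://test/lance/demo.lance").save()

// Read the dataset back and print up to 10 rows.
val data = spark.read.format("lance").option("path", "s3://test/lance/demo.lance").load()

data.show(10)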

@github-actions github-actions bot added the enhancement and java labels Oct 9, 2024
github-actions bot commented Oct 9, 2024

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@SaintBacchus SaintBacchus changed the title from "feat: Support storage options for spark read and write" to "feat: support storage options for spark read and write" Oct 9, 2024

LuQQiu commented Oct 9, 2024

@SaintBacchus thanks for contributing to the Java API! Awesome! Could you help fix the "cargo clippy --all-targets -- -D warnings" errors shown in the test https://github.com/lancedb/lance/actions/runs/11254749789/job/31292838444?pr=2990

import java.util.HashMap;
import java.util.Map;

public class SparkOptions {

@LuQQiu LuQQiu Oct 9, 2024

Can we put SparkOptions inside LanceConfig?
LanceConfig is expected to be the central place for Spark-related configuration.
Could it hold an Optional of ReadOptions and an Optional of WriteParams?
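
One way to picture this suggestion is the sketch below. It is only an illustration of the shape being proposed, not the code that was merged: `LanceConfigSketch`, `ReadOptionsStub`, and `WriteParamsStub` are hypothetical stand-ins, and the real `LanceConfig`, `ReadOptions`, and `WriteParams` classes have their own fields and builders.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for ReadOptions; it carries only the storage options.
final class ReadOptionsStub {
    final Map<String, String> storageOptions;
    ReadOptionsStub(Map<String, String> storageOptions) {
        this.storageOptions = storageOptions;
    }
}

// Hypothetical stand-in for WriteParams; it carries only the storage options.
final class WriteParamsStub {
    final Map<String, String> storageOptions;
    WriteParamsStub(Map<String, String> storageOptions) {
        this.storageOptions = storageOptions;
    }
}

// Hypothetical config holder: one place for the dataset URI plus optional read/write settings.
final class LanceConfigSketch {
    private final String datasetUri;
    private final Optional<ReadOptionsStub> readOptions;
    private final Optional<WriteParamsStub> writeParams;

    private LanceConfigSketch(String datasetUri,
                              Optional<ReadOptionsStub> readOptions,
                              Optional<WriteParamsStub> writeParams) {
        this.datasetUri = datasetUri;
        this.readOptions = readOptions;
        this.writeParams = writeParams;
    }

    // Config for a read path: only read options are present.
    static LanceConfigSketch forRead(String uri, Map<String, String> storageOptions) {
        return new LanceConfigSketch(
            uri, Optional.of(new ReadOptionsStub(storageOptions)), Optional.empty());
    }

    // Config for a write path: only write params are present.
    static LanceConfigSketch forWrite(String uri, Map<String, String> storageOptions) {
        return new LanceConfigSketch(
            uri, Optional.empty(), Optional.of(new WriteParamsStub(storageOptions)));
    }

    String datasetUri() { return datasetUri; }
    Optional<ReadOptionsStub> readOptions() { return readOptions; }
    Optional<WriteParamsStub> writeParams() { return writeParams; }
}
```

The appeal of this layout is that readers and writers would receive a single config object and unwrap only the part they need, instead of passing a separate options class alongside it.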

SaintBacchus commented Oct 10, 2024

OK, I will fix the unit tests later.

@LuQQiu LuQQiu left a comment

LGTM

@LuQQiu LuQQiu merged commit 0b9840a into lancedb:main Oct 11, 2024
7 checks passed