BUG: Generated S3 bucket paths for Spark support are valid but not unique #184

Open
ekoniec1 opened this issue Jul 2, 2024 · 0 comments
Labels
bug Something isn't working

ekoniec1 commented Jul 2, 2024

Description

When using Spark or PySpark data delivery pipelines, generated artifacts reference s3a://spark-infrastructure as the S3 bucket in which the Spark SQL warehouse, event logs, and other data are stored. Because S3 bucket names must be unique across all AWS accounts within an AWS partition, this name can be claimed by at most one account, so deploying Spark pipelines to a non-local environment consistently fails until developers update every reference to use a unique bucket name. To mitigate bucket name collisions, consider generating bucket names with a different naming convention, such as prepending the project name (e.g. s3a://my-aissemble-project-spark-infrastructure).
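
The proposed naming convention could be sketched roughly as follows. This is only an illustration, not aiSSEMBLE's actual generation logic: `project_bucket_name` and `bucket_uri` are hypothetical helpers, and the sanitizing rules are one assumed way to satisfy the S3 constraints that bucket names are 3 to 63 characters long and start and end with a lowercase letter or digit.

```python
import re

DEFAULT_SUFFIX = "spark-infrastructure"  # bucket name currently generated, per this issue
MAX_BUCKET_LENGTH = 63  # S3 bucket names must be 3-63 characters long


def project_bucket_name(project_name: str, suffix: str = DEFAULT_SUFFIX) -> str:
    """Hypothetical helper: prefix the shared bucket name with the project name."""
    # Lowercase the name and collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", project_name.lower()).strip("-")
    if not slug:
        raise ValueError("project name has no characters usable in a bucket name")
    # Leave room for the suffix and the joining hyphen.
    room = MAX_BUCKET_LENGTH - len(suffix) - 1
    if room < 1:
        raise ValueError("suffix leaves no room for a project prefix")
    slug = slug[:room].strip("-")
    return f"{slug}-{suffix}"


def bucket_uri(project_name: str) -> str:
    """Build the s3a:// URI for the project's bucket (hypothetical helper)."""
    return f"s3a://{project_bucket_name(project_name)}"


if __name__ == "__main__":
    print(bucket_uri("my-aissemble-project"))
    # prints: s3a://my-aissemble-project-spark-infrastructure
```

A project-name prefix only makes a collision unlikely; it does not guarantee that the name is free, so a deployment could still need a manual override.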

Steps to Reproduce

  1. Create an aiSSEMBLE project with Spark or PySpark data delivery pipelines that rely on S3
  2. Deploy to a non-local environment

Expected Behavior

While it is reasonable to expect developers to make some manual changes to support non-local deployment (e.g. creating sealed secrets for AWS credentials), generating S3 bucket names that are likely to be unique will improve deployment velocity and reduce potential sources of confusion.

Actual Behavior

Non-local deployment of Spark or PySpark data delivery pipelines that reference S3 will always fail without manual intervention.
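
As a stopgap, the manual intervention can be scoped by listing every generated file that still contains the hard-coded bucket. The sketch below is an assumption-laden illustration: `find_bucket_references` is a hypothetical helper, and it simply scans all readable text files under a project root rather than relying on any known aiSSEMBLE file layout.

```python
from pathlib import Path

HARD_CODED_URI = "s3a://spark-infrastructure"  # value reported in this issue


def find_bucket_references(root: str, needle: str = HARD_CODED_URI):
    """Yield (path, line number, line) for each line under root that contains needle."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()


if __name__ == "__main__":
    # "." is a placeholder for the project root.
    for path, number, line in find_bucket_references("."):
        print(f"{path}:{number}: {line}")
```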

Additional Context

N/A

@ekoniec1 ekoniec1 added the bug Something isn't working label Jul 2, 2024