Unable to specify extras in a python wheel installation for Databricks Asset Bundles #1602
Actually workaround 2 does not work... I tried splitting the repo into two packages, but it seems like DABs cleans up the Artifacts config:
Deploy run:
Expected behavior:
Thanks for reporting the issue, @aabilov-dataminr. We'll take a look at the cleanup of the artifacts config. As for proper extras support, we'll take a look as well. If this works at the API level, we should keep the extras suffix intact when we glob to find the wheel file.
## Changes
The prepare stage, which performs the cleanup, is now executed once before every build, so artifacts built into the same folder are correctly kept. Fixes workaround 2 from this issue: #1602
## Tests
Added a unit test.
@aabilov-dataminr the fix to support workaround 2 has been merged and released in version 0.224.1, please give it a try. In the meantime, I'm verifying whether the Databricks backend supports providing libraries with extras; I'll keep this issue updated.
We use another workaround for installing extra dependencies for our integration tests: after specifying the wheel file(s) as a cluster dependency, we install the extra dependencies at runtime with a subprocess call.

The test resource config:

```yaml
targets:
  test:
    sync:
      include:
        - ../dist/*.whl
    resources:
      jobs:
        integration-test:
          name: integration-test
          tasks:
            - task_key: "main"
              spark_python_task:
                python_file: ${workspace.file_path}/tests/entrypoint.py
              libraries:
                - whl: ../dist/*.whl
              ...
          job_clusters:
            - job_cluster_key: test-cluster
              new_cluster:
                ...
                spark_env_vars:
                  DIST_FOLDER_PATH: ${workspace.file_path}/dist
...
```

databricks.yml:

```yaml
...
artifacts:
  default:
    type: whl
    path: .
...
```

and the entrypoint.py file:

```python
import os
import subprocess
import sys

if __name__ == "__main__":
    # no bytecode io
    sys.dont_write_bytecode = True
    # install extra dependencies, workaround for https://github.com/databricks/cli/issues/1602
    dist_folder = os.environ.get("DIST_FOLDER_PATH")
    if dist_folder is None:
        raise KeyError(
            "The env variable DIST_FOLDER_PATH is not set but is needed to run the tests."
        )
    wheel_files = [
        os.path.join(dist_folder, f)
        for f in os.listdir(dist_folder)
        if f.endswith(".whl")
    ]
    for wheel_file in wheel_files:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", f"{wheel_file}[test]"]
        )
```
Describe the issue
When packaging a wheel in Python, it's standard practice to put some libraries in extras groups. This is commonly used in GPU/ML experimentation repositories to scope dependency groups to specific use-cases or workflows.
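For context, extras groups are declared in `pyproject.toml` under `[project.optional-dependencies]`. A minimal sketch (package and dependency names are hypothetical, not from the reporter's project):

```toml
[project]
name = "mypkg"
version = "0.1.0"
dependencies = ["pandas"]

[project.optional-dependencies]
# installed only when requested, e.g. `pip install "mypkg[train]"`
train = ["torch"]
test = ["pytest"]
```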
When attempting to specify an extras group in the libraries config for a DABs project, the bundle build throws an error:
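As a sketch (paths hypothetical), the failing configuration is a libraries entry that appends an extras suffix to the wheel glob:

```yaml
libraries:
  # the "[train]" extras suffix is what the CLI fails to handle
  - whl: ./dist/*.whl[train]
```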
Hoping that this can be resolved! The only possible workarounds as of now are very destructive to standard Python packaging workflows:
If there are other/better workarounds, I'd love to hear them!
Configuration (shortened for brevity)

In `pyproject.toml`:

In `databricks.yml` (shortened for brevity):

Steps to reproduce the behavior

`databricks bundle deploy`
Expected Behavior

Instead of attempting to find a local file `./dist/*.whl[train]`, the bundle should correctly identify that `[train]` is an extras group and install the extras appropriately. This is standard behavior for Python wheels.

Actual Behavior

Bundle build fails because the wheel file can't be found.
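One plausible mechanism for the failure, assuming the CLI passes the whole string to a filesystem glob: in glob syntax, `[train]` is a character class matching a single character from {t, r, a, i, n}, not an extras group, so the pattern can never match a real wheel filename. A small illustration:

```python
import glob
import os
import tempfile

# Create a fake dist folder containing one wheel (hypothetical name).
with tempfile.TemporaryDirectory() as dist:
    wheel = os.path.join(dist, "mypkg-0.1.0-py3-none-any.whl")
    open(wheel, "w").close()

    # Plain glob finds the wheel.
    plain = glob.glob(os.path.join(dist, "*.whl"))
    # With the extras suffix, "[train]" becomes a glob character class:
    # the pattern now requires ".whl" followed by one of t/r/a/i/n,
    # so no actual wheel file can match.
    with_extras = glob.glob(os.path.join(dist, "*.whl[train]"))

    print(len(plain))        # 1
    print(len(with_extras))  # 0
```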
OS and CLI version
OS X, Databricks CLI v0.219.0
Is this a regression?
No