Add paths field to bundle sync configuration #1694

Merged
merged 20 commits into from
Aug 21, 2024

@pietern pietern commented Aug 19, 2024

Changes

This field allows a user to configure paths to synchronize to the workspace.

Allowed values are relative paths to files and directories anchored at the directory where the field is set. If one or more values traverse up the directory tree (to an ancestor of the bundle root directory), the CLI will dynamically determine the root path to use to ensure that the file tree structure remains intact.

For example, given a databricks.yml in my_bundle that includes:

sync:
  paths:
    - ../common
    - .

Then upon synchronization, the workspace will look like:

.
├── common
│   └── lib.py
└── my_bundle
    ├── databricks.yml
    └── notebook.py

If the field is not set, behavior remains unchanged.

Tests

  • Newly added unit tests for the mutators and under bundle/tests.
  • Manually confirmed a bundle without this configuration works the same.
  • Manually confirmed a bundle with this configuration works.

Base automatically changed from sync-paths to main August 19, 2024 15:47

pietern commented Aug 19, 2024

#1695 needs to merge before this one; I broke it out of this one to keep this one focused.

@pietern pietern marked this pull request as ready for review August 19, 2024 16:02
// are synchronized to the workspace. It can be an ancestor to [BundleRoot],
// but not a descendant; that is, [SyncRoot] must contain [BundleRoot].
SyncRoot vfs.Path
SyncRootPath string
Contributor

SyncRootPath is the remote path where files will be synced to, correct? The naming seems a bit confusing when I read the rest of the code.

Contributor Author

I added a comment to clarify; this is the local path to the sync root (the same as SyncRoot.Native()).

// If the path does not exist, it returns an empty string.
//
// See "sync_infer_root_internal_test.go" for examples.
func (m *syncInferRoot) computeRoot(path string, root string) string {
Contributor

If I understand correctly, it finds a path to which both path and root belong, correct?

Contributor

If so, would it be simpler to make both paths absolute, find a common prefix path and then make it relative again?

Contributor Author

Yes, that's correct. It finds the longest prefix of root that contains path.

Your suggestion might work, but it would break if path traverses above the filesystem root (e.g. ../../.. on /tmp). This case is handled correctly here (that is, it fails and returns an empty string), because it deals with path components one by one.

Separately, we still allow the bundle root path to be relative or absolute and we don't force it to be absolute anywhere. We could consider doing this but we'd need to pay attention to what happens on the presentation side of errors and warnings. It is possible we'd all of a sudden show full paths where today we show just databricks.yml.


pietern commented Aug 21, 2024

Need to take a look at:

  • Trampoline code
  • Metadata computation


pietern commented Aug 21, 2024

After discussing the metadata computation with @shreyas-goenka, we decided to back out the last commit, which changed the metadata computation to be relative to the sync root. It turns out that the relative job path is relative to the bundle root path in the Git section of the metadata (the relative path of the bundle root inside the Git repository).

@pietern pietern added this pull request to the merge queue Aug 21, 2024
Merged via the queue into main with commit 6e8cd83 Aug 21, 2024
5 checks passed
@pietern pietern deleted the bundle-sync-paths branch August 21, 2024 15:41
andrewnester added a commit that referenced this pull request Aug 21, 2024
CLI:
 * Added filtering flags for cluster list commands ([#1703](#1703)).

Bundles:
 * Remove reference to "dbt" in the default-sql template ([#1696](#1696)).
 * Pause continuous pipelines when 'mode: development' is used ([#1590](#1590)).
 * Add configurable presets for name prefixes, tags, etc. ([#1490](#1490)).
 * Report all empty resources present in error diagnostic ([#1685](#1685)).
 * Improves detection of PyPI package names in environment dependencies ([#1699](#1699)).
 * [DAB] Add support for requirements libraries in Job Tasks ([#1543](#1543)).
 * Add paths field to bundle sync configuration ([#1694](#1694)).

Internal:
 * Add `import` option for PyDABs ([#1693](#1693)).
 * Make fileset take optional list of paths to list ([#1684](#1684)).
 * Pass through paths argument to libs/sync ([#1689](#1689)).
 * Correctly mark package names with versions as remote libraries ([#1697](#1697)).
 * Share test initializer in common helper function ([#1695](#1695)).
 * Make `pydabs/venv_path` optional ([#1687](#1687)).
 * Use API mocks for duplicate path errors in workspace files extensions client ([#1690](#1690)).
 * Fix prefix preset used for UC schemas ([#1704](#1704)).
github-merge-queue bot pushed a commit that referenced this pull request Aug 22, 2024
pietern added a commit that referenced this pull request Sep 9, 2024
Library glob expansion happens during deployment. Before that, all entries that
refer to local paths in resource definitions are made relative to the _sync
root_. Before #1694, they were made relative to the _bundle root_. This PR
didn't update the library glob expansion code to use the sync root path.

If you were using the sync paths setting with library globs, the CLI would fail
to expand the globs seeing as the code was using the wrong path to anchor those
globs on.

This change fixes the issue.
github-merge-queue bot pushed a commit that referenced this pull request Sep 9, 2024
## Changes

Library glob expansion happens during deployment. Before that, all
entries that refer to local paths in resource definitions are made
relative to the _sync root_. Before #1694, they were made relative to
the _bundle root_. This PR didn't update the library glob expansion code
to use the sync root path.

If you were using the sync paths setting with library globs, the CLI
would fail to expand the globs because the code was using the wrong path
to anchor those globs.

This change fixes the issue.

## Tests

Manually confirmed that this fixes the issue reported in #1755.
github-merge-queue bot pushed a commit that referenced this pull request Sep 27, 2024
## Changes

After introducing the `SyncRootPath` field on the bundle (#1694), the
previous `RootPath` became ambiguous. Does it mean the bundle root path
or the sync root path? This PR renames the field to `BundleRootPath` to
remove the ambiguity.

## Tests

n/a

---------

Co-authored-by: shreyas-goenka <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Feature] Do not warn about no files to sync