Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not fail the bundle build when there is component failure #3741

Closed
wants to merge 0 commits into from

Conversation

rishabh6788
Copy link
Collaborator

Description

In the current scenario in case of any plugin build failure the complete distribution build is discarded and it keeps failing till the component owner fixes it. This can take from a few days to weeks blocking everyone else to continue with their testing and development using bundle build with their latest changes.
This PR adds support to continue the distribution build in case of any plugin failure so that a usable bundle artifact is generated with successful plugins installed.

Issues Resolved

List any issues this PR will resolve, e.g. Closes [...].

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@codecov
Copy link

codecov bot commented Jul 14, 2023

Codecov Report

Merging #3741 (b42b182) into main (969d3ba) will decrease coverage by 0.09%.
The diff coverage is 33.33%.

@@            Coverage Diff             @@
##             main    #3741      +/-   ##
==========================================
- Coverage   91.54%   91.46%   -0.09%     
==========================================
  Files         182      182              
  Lines        5420     5427       +7     
==========================================
+ Hits         4962     4964       +2     
- Misses        458      463       +5     
Impacted Files Coverage Δ
src/run_build.py 83.01% <33.33%> (-8.29%) ⬇️

@rishabh6788
Copy link
Collaborator Author

Below are the artifacts from my preliminary testing.
Command: ./build.sh manifests/3.0.0/opensearch-3.0.0.yml -d tar -p linux -a x64

Final output: 2023-07-14 22:20:45 INFO There are failed components: ['opensearch-reports', 'security', 'performance-analyzer', 'opensearch-observability', 'k-NN', 'neural-search', 'geospatial', 'anomaly-detection', 'cross-cluster-replication', 'notifications-core', 'notifications', 'alerting', 'sql']

Directory Structure:

[opensearch@03f09db82b03 opensearch-build]$ cd tar/builds/opensearch/
[opensearch@03f09db82b03 opensearch]$ ls -l
total 260
drwxrwxr-x 2 opensearch opensearch   4096 Jul 14 22:14 core-plugins
drwxrwxr-x 2 opensearch opensearch   4096 Jul 14 22:14 dist
-rw-rw-r-- 1 opensearch opensearch 248259 Jul 14 22:20 manifest.yml
drwxrwxr-x 3 opensearch opensearch   4096 Jul 14 22:14 maven
drwxrwxr-x 2 opensearch opensearch   4096 Jul 14 22:17 plugins

[opensearch@03f09db82b03 opensearch]$ cd plugins/
[opensearch@03f09db82b03 plugins]$ ls -l
total 283908
-rw-rw-r-- 1 opensearch opensearch   3276543 Jul 14 22:17 opensearch-job-scheduler-3.0.0.0.zip
-rw-rw-r-- 1 opensearch opensearch 287440506 Jul 14 22:16 opensearch-ml-3.0.0.0.zip

[opensearch@03f09db82b03 opensearch]$ cd dist/
[opensearch@03f09db82b03 dist]$ ls -l
total 240100
-rw-rw-r-- 1 opensearch opensearch 245854998 Jul 14 22:14 opensearch-min-3.0.0-linux-x64.tar.gz

build-manifest.yml.zip
@peterzhuamazon @prudhvigodithi @gaiksaya

@rishabh6788
Copy link
Collaborator Author

@dblock Do you think it is a good idea to not block the distribution build in case of certain plugin failures?

src/run_build.py Outdated
@@ -69,12 +70,20 @@ def main() -> int:
builder.export_artifacts(build_recorder)
except:
logging.error(f"Error building {component.name}, retry with: {args.component_command(component.name)}")
raise
if component.name in ['OpenSearch', 'common-utils', 'OpenSearch-Dashboards']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I like this approach.
There are so many dependencies between plugins, such as AD based on JS/ML, neural based on KNN, and more.

I would say just add everything to the failed list, and the users can figure out the issue by checking failed list, instead of hard coding which one should hard fail which one should be added to list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also say let us add a param to hard fail no matter what, so user can choose where 1 component fail immediately fail the entire thing, or get added to the fail list. Which is similar to your approach but does not have a hardcode list, but depending on user input params.

@dblock
Copy link
Member

dblock commented Jul 17, 2023

I would add a --continue-on-error option or something like that to achieve this behavior. Then you can decide separately when it makes sense to use it or not.

@rishabh6788
Copy link
Collaborator Author

I would add a --continue-on-error option or something like that to achieve this behavior. Then you can decide separately when it makes sense to use it or not.

Thanks @dblock for the comment, it makes sense to keep it optional.
Will update the code.

@prudhvigodithi
Copy link
Collaborator

--continue-on-error

+1 to go with --continue-on-error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants