need --purge option to remove and compress output for long-term storage #50

Open
RickKessler opened this issue May 11, 2021 · 4 comments
Labels
enhancement New feature or request pippin 2.0

Comments

@RickKessler
Collaborator

No description provided.

@djbrout
Collaborator

djbrout commented May 11, 2021

1_SIM: recursively tar folders
2_LCFIT:
- allow Pippin to continue after submit_batch_jobs.sh --purge
- recursive tar option for archiving
3_CLASS:
- reduce log output (need to ask the people who implemented the classifiers to help)
- recursive tar option for archiving
4_AGG:
- gzip the .csv and .key files immediately
- recursive tar option for archiving
5_MERGE:
- recursive tar option for archiving
6_BIASCOR:
- allow Pippin to continue after submit_batch_jobs.sh --purge
- recursive tar option for archiving
7_CREATE_COV:
- allow the covariance and correlation matrices to be gzipped immediately. May also have to implement this for the CosmoMC covariance matrices.
- recursive tar option for archiving
8_COSMOMC:
- remove all chain log files except the last one
- concatenate the chains and gzip them, as is done in 9_ANALYSE (but don't delete the original walkers until archiving; they can be gzipped in the meantime)
- archiving: remove the walkers and recursively tar
9_ANALYSE:
- don't recopy the chains here; just make a symlink
- remove parentheses and blank spaces from filenames
- recursive tar option for archiving (maybe not needed, but can't hurt?)
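A minimal sketch of the "recursive tar option for archiving", assuming each stage directory (e.g. 2_LCFIT) holds one subdirectory per task and that archiving means packing each subdirectory into a .tar.gz and then deleting it. The helper `archive_stage` and its `remove_original` flag are hypothetical, not existing Pippin options. Python's `tarfile` recurses into subdirectories by default, so each archive keeps the full tree.

```python
import shutil
import tarfile
from pathlib import Path


def archive_stage(stage_dir, remove_original=True):
    """Pack every subdirectory of a stage directory into <name>.tar.gz.

    Hypothetical helper for illustration only. Symlinks inside the tree
    are stored as links, not followed (tarfile's default).
    """
    stage_dir = Path(stage_dir)
    archives = []
    # sorted() materialises the listing first, so the .tar.gz files created
    # inside the loop are never picked up as inputs. Symlinked dirs are skipped.
    subdirs = sorted(p for p in stage_dir.iterdir() if p.is_dir() and not p.is_symlink())
    for sub in subdirs:
        archive = sub.with_name(sub.name + ".tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            # arcname makes member paths relative to the stage directory
            tar.add(sub, arcname=sub.name)
        if remove_original:
            shutil.rmtree(sub)  # only reached after the archive closed cleanly
        archives.append(archive)
    return archives
```

For example, `archive_stage("/path/to/MY_JOB/2_LCFIT")` (hypothetical path) would turn each task folder into a single compressed file. Loose files at the top level of the stage are left in place.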

@djbrout
Collaborator

djbrout commented May 11, 2021

cfg.yml should have a max_output option (to prevent someone from accidentally generating terabytes of data).
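One way a max_output guard could work, sketched under assumptions: max_output is read from cfg.yml as a string such as "500GB" (the unit syntax is invented here for illustration), and the job's output directory is measured and compared against it. `parse_size`, `dir_size` and `check_output_limit` are hypothetical helpers, not part of Pippin.

```python
import os

# Hypothetical unit table for a max_output value such as "500GB".
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert '500GB', '1.5TB' or a plain byte count into an int of bytes."""
    text = str(text).strip().upper()
    # Try two-letter suffixes before "B" so "5GB" is not read as "5G" + "B".
    for unit in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return int(float(text[: -len(unit)]) * _UNITS[unit])
    return int(float(text))


def dir_size(path):
    """Sum the sizes of regular files below path, ignoring symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def check_output_limit(output_dir, max_output):
    """Raise if output_dir is larger than max_output; return bytes used."""
    used = dir_size(output_dir)
    limit = parse_size(max_output)
    if used > limit:
        raise RuntimeError(
            f"{output_dir} holds {used} bytes, more than max_output={max_output}"
        )
    return used
```

Calling something like `check_output_limit("/path/to/MY_JOB", "500GB")` (hypothetical path) between stages would stop a runaway job before it fills the disk.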

@djbrout
Collaborator

djbrout commented May 11, 2021

cfg.yml should have parallel output locations for SCRATCH and PROJECT. Each of these directories should contain an ALLOWED.TXT: for SCRATCH it will be *, but for PROJECT it will be PANTHEON or something similar that only permits certain outputs to be written, so the project area is not overrun.
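A sketch of the ALLOWED.TXT check, assuming the file holds one shell-style glob pattern per line that is matched against the job name, so `*` in SCRATCH allows everything and a pattern like `PANTHEON*` in PROJECT (an assumed example) allows only matching jobs. The file format and the function names are assumptions. A missing ALLOWED.TXT is treated as allowing nothing, so a misconfigured PROJECT area fails closed.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def load_allowed(root):
    """Return the glob patterns listed in <root>/ALLOWED.TXT (assumed format).

    One pattern per line; blank lines are ignored. A missing file yields
    no patterns, so the area rejects every job.
    """
    allowed_file = Path(root) / "ALLOWED.TXT"
    if not allowed_file.is_file():
        return []
    return [ln.strip() for ln in allowed_file.read_text().splitlines() if ln.strip()]


def is_output_allowed(root, job_name):
    """True if job_name matches any pattern in root's ALLOWED.TXT (case-sensitive)."""
    return any(fnmatchcase(job_name, pat) for pat in load_allowed(root))


def require_allowed(root, job_name):
    """Raise before any output is written to a disallowed area."""
    if not is_output_allowed(root, job_name):
        raise PermissionError(f"{job_name!r} is not listed in {root}/ALLOWED.TXT")
```

Glob patterns keep the common cases (`*` for everything, a name prefix for a project) to a single line each; a stricter scheme could require exact names instead.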

@OmegaLambda1998
Member

More suggestions from @RickKessler (#111)

Under 3_CLASS/XYZ:

- compress output.log
- compress predictions.csv
- remove the /dump directory unless the user requests it (it contains a huge .pickle file that is likely not used)
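A minimal sketch of these three clean-up steps for one classifier directory, using only the standard library. `gzip_in_place`, `tidy_classifier_output` and the `keep_dump` flag are hypothetical names; where the flag would come from (e.g. a classifier option) is an assumption. Downstream readers would need to accept `predictions.csv.gz`; pandas, for instance, infers gzip compression from the `.gz` extension in `read_csv`.

```python
import gzip
import shutil
from pathlib import Path


def gzip_in_place(path):
    """Compress path to path.gz and delete the uncompressed original."""
    path = Path(path)
    gz_path = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()  # only reached if compression finished without error
    return gz_path


def tidy_classifier_output(job_dir, keep_dump=False):
    """Hypothetical post-processing for one 3_CLASS/XYZ directory."""
    job_dir = Path(job_dir)
    for name in ("output.log", "predictions.csv"):
        f = job_dir / name
        if f.exists():
            gzip_in_place(f)
    dump = job_dir / "dump"
    if dump.is_dir() and not keep_dump:
        shutil.rmtree(dump)  # drops the large, likely unused .pickle file
```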

In stage 7_CREATE_COV, suppress /cosmomc output unless specifically requested.
