Best way to ensure availability of older releases for reproducibility? #238

Open
jchodera opened this issue Jun 27, 2015 · 6 comments

@jchodera
Member

What is the best way for us to ensure reproducibility by making older releases available in perpetuity?

Keeping packages only on binstar is not necessarily safe; we may want to archive older release versions somewhere.

Alternatively, we could simply have a different version of the conda package for each release (e.g. keep both openmm-6.2 and openmm-6.3 recipes). This will ensure that older release versions are rebuilt and reuploaded should they happen to accidentally be removed from binstar.

Having older releases available is very useful since we can simply specify a series of conda install commands with specific release version numbers in a paper, allowing users to quickly grab the exact versions of the packages used for a given work.
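As a rough illustration of the pinned-install idea above, the sketch below generates one `conda install` command per package from a table of versions. The `omnia` channel and `openmm` 6.2 come from this thread; the function name and the `mdtraj` version are hypothetical, chosen only for the example.

```python
from typing import Dict, List


def pinned_install_commands(versions: Dict[str, str], channel: str = "omnia") -> List[str]:
    """Return one `conda install` command per package, pinned to a version.

    Uses conda's `name=version` spec. The channel default is an assumption
    based on the omnia binstar channel mentioned in this thread.
    """
    return [
        f"conda install -c {channel} {name}={version}"
        for name, version in sorted(versions.items())
    ]


if __name__ == "__main__":
    # Hypothetical version table for a paper; only openmm 6.2 appears in the thread.
    paper_environment = {"openmm": "6.2", "mdtraj": "1.4.2"}
    for command in pinned_install_commands(paper_environment):
        print(command)
```

Commands like these only work for as long as the pinned versions stay on the channel, which is the concern raised here.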

@rmcgibbo
Contributor

Why is binstar unsafe? With mdtraj for example, I see all of the old versions https://binstar.org/omnia/mdtraj/files.

Also, the canonical source is the git history, no?

@jchodera
Member Author

> Why is binstar unsafe? With mdtraj for example, I see all of the old versions https://binstar.org/omnia/mdtraj/files.

It's mostly a human thing. For gaff2xml, for example, you suggested I clean out some old files and I did before we realized that these were important for reproducibility.

I've also had issues with automated push scripts accidentally "disappearing" old files, but I think this is because of a weird binstar bug: when the --force flag was used to overwrite an existing package and binstar complained about Invalid dist_id, it would delete the old package but not upload the new one. That bug was confined to the -dev packages, but I don't trust binstar to be free of bugs (it really isn't), so I don't want it to be the only place we store packages.

We can certainly back up the binstar repositories, but we'd have to figure that out.

> Also, the canonical source is the git history, no?

Yes, you could go through the git history, find the exact release version of each package (assuming you can figure out what the package is now called if it has changed names, like gaff2xml), and install them manually. But having something like a "safe" repository of historical packages (or the build scripts to easily regenerate those packages) would certainly make me feel safer.
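One possible starting point for the backup idea, sketched under assumptions: keep a local mirror of the package files, record a SHA-256 manifest of it, and compare manifests over time to notice packages that have disappeared or changed. The function names and the mirror directory layout are hypothetical, and the sketch does not talk to binstar at all; it only inspects files already copied to disk.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(archive_dir: Path) -> Dict[str, str]:
    """Map each conda package file (*.tar.bz2) under archive_dir to its SHA-256 digest."""
    manifest = {}
    for path in sorted(archive_dir.rglob("*.tar.bz2")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(archive_dir).as_posix()] = digest
    return manifest


def missing_or_changed(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List files from the old manifest that are absent or differ in the new one."""
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

Running `missing_or_changed` on a manifest saved earlier and a freshly built one would flag cases like the deleted gaff2xml files or the failed `--force` overwrite described above, so the affected packages can be restored or rebuilt.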

@rmcgibbo
Contributor

Have you used zenodo? Might solve your problem, and they give you a doi.

@jchodera
Member Author

jchodera commented Jul 6, 2015

Zenodo may be a good idea.

In other news, it looks like the pymbar 2 binstar packages somehow went away:
https://binstar.org/omnia/pymbar/files

I'm going to add back a separate pymbar2 conda recipe to ensure these don't disappear.

@jchodera
Member Author

jchodera commented Jul 9, 2016 via email

@mpharrigan
Contributor

Feel free to reopen

@jchodera jchodera reopened this Jul 9, 2016