ProjectTemplate and backward compatibility for existing projects #272

Open
jeromyanglim opened this issue Aug 21, 2018 · 1 comment

@jeromyanglim
Contributor

I think that ProjectTemplate should be a solid and reliable basis for building data analysis projects.

If a user creates a project with a set of data import functions in place, that project should not break six months or a year later because the data import rules have changed. For many years, ProjectTemplate has provided this kind of stability.

Specifically, I think that any change to data import rules should not break existing data analysis projects.

In contrast, the following changes do break existing code:

  • converting data.frames to tibbles
  • converting to tidyverse data import functions (see post on readcsv, tidyverse)

As a general rule, data import functions have to make a wide range of choices around variable names, variable types, row names, NA conversion, tibbles versus data.frames, strings versus factors, use of metadata and so on.

Thus, the starting assumption should be that whenever you change a data import function, you will break existing code. If the tests do not break, the more likely explanation is that the tests are not thorough enough.
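To make this concrete, here is a minimal sketch using only base R. The two sets of `read.csv` arguments are illustrative stand-ins for an "old" and a "new" import rule; they are not ProjectTemplate's actual readers. The same file yields different column names and column types, and a downstream helper written against the old rule silently stops working.

```r
# A small CSV whose first column name is not a syntactic R name.
csv_text <- "subject id,group\n1,a\n2,b\n"

# Illustrative "old" rule: names are sanitised, strings become factors.
old_import <- read.csv(text = csv_text, check.names = TRUE,
                       stringsAsFactors = TRUE)

# Illustrative "new" rule: names kept verbatim, strings stay character.
new_import <- read.csv(text = csv_text, check.names = FALSE,
                       stringsAsFactors = FALSE)

names(old_import)        # "subject.id" "group"
names(new_import)        # "subject id" "group"
class(old_import$group)  # "factor"
class(new_import$group)  # "character"

# Hypothetical downstream code written against the old rule.
group_levels <- function(d) levels(d$group)

group_levels(old_import)  # "a" "b"
group_levels(new_import)  # NULL, with no error or warning
```

Note that the failure in the last line is silent, which is the kind of breakage a thin test suite would not catch.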

That said, several new data import functions do offer benefits: readxl removes dependencies on Java, Perl, etc., and readr is faster than read.csv.

Possible resolutions

So, what happens if the ProjectTemplate community decides, for example, that readxl would be a better Excel import function because it does not require external dependencies?

Use project version number to choose data import function. I suppose the code could have a conditional that checks config$version. Thus, any modification to the data import rules would be wrapped in a condition so that the new import function only applies to projects created with a later version.
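A rough sketch of what such a conditional might look like. The helper name `choose_xlsx_reader`, the labels `"legacy"` and `"readxl"`, and the cutoff version `"0.9.0"` are all hypothetical and chosen only for illustration; the only assumption about ProjectTemplate is that the project's version is available as a string such as config$version.

```r
# Hypothetical helper: decide which Excel reader a project should use,
# based on the version string recorded in the project's config.
# The cutoff "0.9.0" is an assumed value, not a real release boundary.
choose_xlsx_reader <- function(project_version, cutoff = "0.9.0") {
  # compareVersion() compares version components numerically and
  # returns -1, 0 or 1.
  if (utils::compareVersion(as.character(project_version), cutoff) >= 0) {
    "readxl"   # projects created at or after the cutoff get the new reader
  } else {
    "legacy"   # older projects keep the import rules they were built on
  }
}

choose_xlsx_reader("0.8.2")   # "legacy"
choose_xlsx_reader("0.9.0")   # "readxl"
choose_xlsx_reader("0.10.0")  # "readxl"
```

Comparing with `compareVersion` rather than plain string comparison matters here: as strings, "0.10.0" sorts before "0.9.0".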

Implement a function like archive.project(): This could create some kind of localised version of ProjectTemplate in a folder in the project. I'm not quite sure how this would work.

Anyway, I don't really have the solution to this tension between improving data import functions and maintaining backwards compatibility. But I just thought I'd post this to emphasise the value of backwards compatibility and stability as a counterpoint to the desire to improve data import functions.

@Hugovdberg
Collaborator

There is an option to dump the code when creating a new project, but I guess it makes more sense to do that in an archive.project function. I'm wondering to what extent our current project layout is compatible with packrat. But it seems to me that archive.project could be a thin wrapper around packrat::init and packrat::snapshot (depending on whether packrat was already initialised in the project). I guess you could combine it with devtools::install_version if you need to install an older version in the packrat library, although I haven't tried this.
The major advantage of this is that we don't have to incorporate logic to simulate all the different versions of ProjectTemplate, which is bound to break, and it allows us to diverge from previous choices instead of going out of our way to maintain backward compatibility with all previous versions.
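
A rough, untested-against-real-projects sketch of what such a wrapper might look like. archive.project does not exist in ProjectTemplate; the check for packrat/packrat.lock as a sign that packrat is already initialised is an assumption, and the `init` and `snapshot` arguments are added here only so the logic can be exercised without packrat installed.

```r
# Hypothetical sketch of archive.project(); not part of ProjectTemplate.
# `init` and `snapshot` default to the packrat functions. Because R
# evaluates default arguments lazily, packrat is only needed when the
# defaults are actually used.
archive.project <- function(project = ".",
                            init = packrat::init,
                            snapshot = packrat::snapshot) {
  # Assumption: an existing packrat/packrat.lock file means packrat has
  # already been initialised for this project.
  lockfile <- file.path(project, "packrat", "packrat.lock")
  if (file.exists(lockfile)) {
    snapshot(project = project)
    invisible("snapshot")
  } else {
    # enter = FALSE avoids switching the current session into packrat mode.
    init(project = project, enter = FALSE)
    invisible("init")
  }
}
```

Pinning an older ProjectTemplate release inside the packrat library (for example with devtools::install_version) is left out of the sketch.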
