Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Project suggestion: Convert ProjectTemplate to tidyverse #230

Open
rsangole opened this issue Apr 21, 2018 · 7 comments
Open

Project suggestion: Convert ProjectTemplate to tidyverse #230

rsangole opened this issue Apr 21, 2018 · 7 comments
Assignees
Milestone

Comments

@rsangole
Copy link
Collaborator

rsangole commented Apr 21, 2018

@Hugovdberg - making a space for you to document your thoughts on this. I recall you mentioned this in #220 .

@rsangole rsangole added the Under investigation Default label before an issue is classified as Bug/Feature/... label Apr 23, 2018
@Hugovdberg Hugovdberg changed the title Convert ProjectTemplate to tidyverse Project suggestion: Convert ProjectTemplate to tidyverse Apr 28, 2018
@rsangole rsangole added Internal and removed Under investigation Default label before an issue is classified as Bug/Feature/... labels Apr 29, 2018
@Hugovdberg Hugovdberg added this to the 1.0 milestone May 2, 2018
@Hugovdberg
Copy link
Collaborator

Ok, I'll do this in steps, it appears I can't write this proposal all in one go..

A large part of the R community, among which I suspect a large part of the ProjectTemplate users, are migrating to the tidyverse packages by Hadley Wickham. A major advantage of these packages is their uniform interface to functions (like the convention to always take data as a first argument) in combination with the high performance.
A few of the major packages in the tidyverse that can be useful in the internals of ProjectTemplate are:

  • dplyr: famously well at manipulating data.frames with clear verbs as function names (mutate, filter, etc.)
  • purrr: replacements for the apply family of functions for looping over vectors
  • tibble: enhanced print functions for data.frames, and a replacement function data_frame which also handles list-type columns better.
  • readr: High performace import functions for several file types
  • readxl: High performance import function for Excel files without external dependencies

Because inside several functions there is actually some data manipulation, and of course we import several types of files. There are some caveats though:

  • Not all file types can be read in using the tidyverse, some default variable names will become inconsistent between file types if we don't take some precautions.

@rsangole
Copy link
Collaborator Author

rsangole commented May 3, 2018

Thanks for getting this kicked off. As I think of things, I'll append too.

One thing that I'd like to consider is using assertive as the pipe-able tool of choice to writing robust unit tests. If you haven't used it before, give it a go. It's a very expressive and tidyverse friendly package.

@Hugovdberg
Copy link
Collaborator

@rsangole I haven't used assertive yet, but I would say that changing the tests is something to be done with much care. We should NOT convert library functions to tidyverse while at the same time changing the tests. If assertive is a great improvement over testthat then we should either:

  1. build a complete set of redundant tests and only after all existing cases are covered remove the old tests, and then start changing the code; or
  2. Update the library code to the tidyverse and then prove using the old tests that nothing has changed we don't want to change, and only then change the tests (again by building redundant tests and then removing the old tests).

I think that option 2 is the best if we don't want interface changes. If we do accept interface changes we might also first convert the tests and then change the library code.

Does assertive integrate with the RStudio build tools and devtools::test() by the way?

@KentonWhite
Copy link
Owner

We would also need to consider if assertive will work with travis.

@rsangole
Copy link
Collaborator Author

rsangole commented May 3, 2018

Agree with both @Hugovdberg and @KentonWhite . We should not try and 'fix' what currently works. I was of the opinion that:

  • After moving existing library code to tidyverse, if the existing tests work, do not change the test codes at all.
  • Investigate assertive as a method to write any new tests we write for new features/bug removal etc, as assertive is more programmer friendly (like other tidyverse tools).
  • If assertive works with requirements of devtools or travis, use assertive as an alternate way of writing tests.

Does assertive integrate with the RStudio build tools and devtools::test() by the way?
We would also need to consider if assertive will work with travis.

I don't know answers to either of these questions. I can read up and check.

@bugsysiegals
Copy link
Contributor

Any ETA on how far off this is from realization? I've had to rewrite a bunch of code because the field names being imported are now with a dot between them, which may not be a bad option, but would be nice to import without doing this to avoid a lot of code rewrite. I don't have a huge amount of extra time but if there's anything I can do to help, let me know.

@KentonWhite
Copy link
Owner

A ways off. There are issues with backwards compatibility. Not every project is tidyverse compliate. Bioconductor, which represents a large user base from the biological sciences, are not tidyverse compliant. Moving everything over to tidyverse causes problems for those users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants