Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full package review for epichains v0.1.0 release #122

Merged
merged 23 commits into from
Dec 15, 2023
Merged

Full package review for epichains v0.1.0 release #122

merged 23 commits into from
Dec 15, 2023

Conversation

jamesmbaazam
Copy link
Member

@jamesmbaazam jamesmbaazam commented Dec 4, 2023

This is a full package review in preparation for the release of {epichains} v0.1.0 on GitHub.

Note the following

Please self-assign as a reviewer under the "Reviewers" tab on the right side of this window.

Expectations

All kinds of reviews are welcome either as line comments or comment blocks:

  • Ideas for model enhancements
  • Enhancements to enable integration with other packages
  • Code improvements
  • Comments to improve usability
  • Grammatical issues

Deadline

I'm on leave until Dec 11 and will attend to reviews when I return and before the Christmas break on Dec 21. Ideally, all reviews should be posted by Dec 12, but please let me know if you need more time.

What's next after your review?

  • Only major comments will be addressed before the minor release, after which other issues will be addressed in future release cycles.
  • This PR will be closed when all relevant issues have been resolved. Closing this PR will not close any issues raised but will serve as a reference for raising issues to be resolved in subsequent pull requests.
  • Some issues have already been raised in earlier reviews and this full review is a move to solicit agreement or extra comments. If your comments relate to the already raised issues, please link them using appropriate GitHub features.

Thanks and looking forward to your reviews.

@pratikunterwegs
Copy link
Collaborator

pratikunterwegs commented Dec 5, 2023

Thanks @jamesmbaazam - will be reviewing this Weds 6 Dec - Thurs 7 Dec hoping to get this done tomorrow. Happy to coordinate with other reviewers.

Copy link
Member

@Bisaloo Bisaloo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you work on this! I think it's overall in good shape. I've made some comments inline relating to design, performance, documentation & good practices.

In terms of design & usability, I have to main concerns:

  • Some ... (e.g., in likelihood()) are passed to child functions several levels deep. This makes it very difficult to know which arguments this function is taking and where they are documented.
    The nobs_offspring argument is a good example of that. It's required if you passed a distribution that epichains doesn't know but it's not really explicit from the function documentation. And as far as I can tell, you cannot arrive to the help page describing it from links in likelihood().
    I am not yet completely sure about the solution for this. Maybe make these arguments explicit rather than implicit in the ellipses?

  • The fact that we have 3 very similar functions, with similar names, similar scopes and similar arguments feels confusing to me.
    From a conceptual point of view, simulate_tree() feels (is?) the same as simulate_from_pop(pop = Inf) and simulate_summary() feels like a downstream analysis function that should be applied to the output of simulate_tree().
    I understand they are separate for technical reasons. I understand that simulate_summary() exists to circumvent the memory issue that arises from the fact we're working with exponential growth processes. I understand simulate_from_pop() and simulate_tree() have slightly different implementations.
    But it's still probably a problem to let technical considerations inform the design of our user interface. Even if we had to, e.g., internally dispatch to simulate_from_pop() or simulate_tree(), there may be value in having a single user-facing wrapper function.
    Tangentially but on a related note, I don't follow why the output of simulate_from_pop() has fewer columns than the one from simulate_tree().

DESCRIPTION Outdated Show resolved Hide resolved
DESCRIPTION Outdated Show resolved Hide resolved
DESCRIPTION Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
R/borel.r Outdated Show resolved Hide resolved
R/stat_likelihoods.R Outdated Show resolved Hide resolved
tests/testthat/test-likelihood.R Outdated Show resolved Hide resolved
tests/testthat/test-stat_likelihoods.R Outdated Show resolved Hide resolved
vignettes/projecting_incidence.Rmd Outdated Show resolved Hide resolved
R/simulate.r Outdated Show resolved Hide resolved
R/epichains.R Outdated Show resolved Hide resolved
R/epichains.R Outdated Show resolved Hide resolved
R/epichains.R Outdated Show resolved Hide resolved
R/epichains.R Outdated Show resolved Hide resolved
R/epichains.R Outdated Show resolved Hide resolved
R/epichains.R Outdated Show resolved Hide resolved
@adamkucharski adamkucharski self-requested a review December 8, 2023 08:27
Copy link
Member

@adamkucharski adamkucharski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting all this together. Have added some comments mostly on vignettes – in particular, think there are some tweaks we could made to improve entry point for new users and highlight complementary aspects of other packages.

vignettes/interventions.Rmd Outdated Show resolved Hide resolved
vignettes/interventions.Rmd Outdated Show resolved Hide resolved
vignettes/interventions.Rmd Show resolved Hide resolved
vignettes/epichains.Rmd Outdated Show resolved Hide resolved
vignettes/branching_process_literature.Rmd Outdated Show resolved Hide resolved
README.Rmd Outdated Show resolved Hide resolved
vignettes/epichains.Rmd Outdated Show resolved Hide resolved
vignettes/projecting_incidence.Rmd Outdated Show resolved Hide resolved
vignettes/projecting_incidence.Rmd Outdated Show resolved Hide resolved
Copy link
Collaborator

@pratikunterwegs pratikunterwegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jamesmbaazam for opening this review. I think the package looks mostly alright. I haven't looked into the tests and the vignettes, and I'm laying out some general points below.

General

  1. I wonder whether the three simulate_*() functions can be rolled into one. I'm not able to tell why simulate_tree() and simulate_tree_from_pop() are different. It should be possible to have both an initial number of cases, and a finite population size.
  2. The simulate_summary() function can probably be removed in favour of a summary.epichains() method for other simulate_*() outputs. This would probably cut down on the code and I don't feel that simulate_tree() is too slow to run.
  3. I am not convinced that this package benefits from S3 classes, as they are very thin wrappers around standard structures. One good reason to have a custom class is for methods such as summary.epichains() (see point above), or methods to link it with {igraph}/{tidygraph} or {epicontacts}. This could still be done by returning data.frames in an edgelist format (from, to, edge strength) which can be ingested by e.g. igraph::graph_from_data_frame() - this opens the door to easy plotting.
  4. The vignette on interventions should be rephrased or removed, as the examples are not modelling reactive interventions, but rather scenarios with different pathogen or public-health parameters. It should be possible to implement an intervention that reduces a model parameter between some timepoints (see <rate_interventions> in {epidemics}), but this would require you to keep track of the model time separately from the infection's generation time, and move the code in a more (discrete?) time-based direction.

General technical

  1. Bring package test coverage up to 100%;
  2. Update the WORDLIST and enable R CMD check failure on spelling errors;
  3. Run {styler};
  4. Ensure that the GH workflows are up to date with {packagetemplate}, I haven't checked if they are;
  5. Go through the spring cleaning tasklist and implement any fixes needed; see also the {usethis} release checklist: e.g. add examples for all exported functions;
  6. Suggest numeric vectors should never have infinite values, check all instances of checkmate::assert_*() and especially do not let the upper value of numerics be Inf;
  7. Set all optional arguments to NULL rather than missing, check instances of this, e.g. R/likelihood.R;
  8. Mention the type of all function arguments and returns in the documentation. E.g. for likelihood(), @param chains A numeric vector representing chain sizes or lengths.;
  9. Be consistent with the use of return();
  10. Prefer the use of checkmate::assert_*() or stopifnot(checkmate::test_*()) rather than if(!checkmate::test_*()) stop();
  11. Avoid comments inside temporary functions generated within other function bodies;
  12. Prefer rbind(matrix, matrix) and use matrices rather than data.frames in while loops --- this is likely more efficient --- you can convert to data.frame at the end;
  13. Set a seed for consistency, and set parameter values that result in sizeable outbreaks in In function documentation examples, otherwise examples often return empty <epichains_tree>s as the outbreak does not take off --- this might be realistic but not very informative to users;
  14. Set seeds for consistency in the vignettes;
  15. There is often a mention of "chain sizes and lengths" - please explain the difference to users (and developers) if there is one. For example, when mentioning "a vector of chain sizes and lengths", the meaning is unclear - is it a single vector, and "size" and "length" are interchangeable, or is it two different vectors? Is the 'length' the same as the diameter of a network?,
  16. Contributing.md: Change references to {bpmodels}

Epichains classes

I don't think the classes add much and should be considered for removal, unless a suite of methods are planned that actually leverage the object signature.
Until they are removed:

  • I would not allow the <epichains_tree> to inherit all the classes of the input data, as these could pull along classes such as <data.table>, <tbl>, etc.
  • <epichains_summary> validator: needs to be beefed up, currently only checks for the class
  • <epichains_tree> validator:
    • Move these checks, or add these checks, to the constructor as well
    • Add checks to assert that other class members are of the expected types, this will be useful if/when you implement an as.epichains_*() method.
  • format.epichains_summary():
    • What is the vector being printed? Is it the vector of chain sizes?
    • The maximum is not being printed as the list element name being accessed is wrong

R/simulate.r Outdated Show resolved Hide resolved
R/simulate.r Outdated Show resolved Hide resolved
R/simulate.r Outdated Show resolved Hide resolved
R/simulate.r Outdated Show resolved Hide resolved
R/simulate.r Outdated Show resolved Hide resolved
R/epichains-package.R Outdated Show resolved Hide resolved
R/epichains.R Show resolved Hide resolved
R/helpers.R Outdated Show resolved Hide resolved
R/likelihood.R Show resolved Hide resolved
R/simulate.r Show resolved Hide resolved
@jamesmbaazam
Copy link
Member Author

  • The fact that we have 3 very similar functions, with similar names, similar scopes and similar arguments feels confusing to me.
    From a conceptual point of view, simulate_tree() feels (is?) the same as simulate_from_pop(pop = Inf) and simulate_summary() feels like a downstream analysis function that should be applied to the output of simulate_tree().
    I understand they are separate for technical reasons. I understand that simulate_summary() exists to circumvent the memory issue that arises from the fact we're working with exponential growth processes. I understand simulate_from_pop() and simulate_tree() have slightly different implementations.
    But it's still probably a problem to let technical considerations inform the design of our user interface. Even if we had to, e.g., internally dispatch to simulate_from_pop() or simulate_tree(), there may be value in having a single user-facing wrapper function.
    Tangentially but on a related note, I don't follow why the output of simulate_from_pop() has fewer columns than the one from simulate_tree().

Thanks for raising this. We have decided, after some brainstorming, to combine simulate_tree() and simulate_tree_from_pop() into a single function and make simulate_summary() internal (with a better name). I will detail this in the design document, which has been in a draft state for a while now. Your review will be welcomed.

@codecov-commenter
Copy link

codecov-commenter commented Dec 15, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (30b68a0) 98.64% compared to head (41adfb9) 99.11%.
Report is 42 commits behind head on main.

❗ Current head 41adfb9 differs from pull request most recent head a3b8022. Consider uploading reports for the commit a3b8022 to get more accurate results

Files Patch % Lines
R/epichains.R 98.50% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #122      +/-   ##
==========================================
+ Coverage   98.64%   99.11%   +0.46%     
==========================================
  Files           8        8              
  Lines         518      562      +44     
==========================================
+ Hits          511      557      +46     
+ Misses          7        5       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jamesmbaazam jamesmbaazam reopened this Dec 15, 2023
@jamesmbaazam jamesmbaazam merged commit f5eed5b into main Dec 15, 2023
8 checks passed
@jamesmbaazam jamesmbaazam deleted the review branch December 15, 2023 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants