Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to convert a data.frame back into an epichains object #277

Open
bquilty25 opened this issue Aug 29, 2024 · 3 comments
Open

Add ability to convert a data.frame back into an epichains object #277

bquilty25 opened this issue Aug 29, 2024 · 3 comments

Comments

@bquilty25
Copy link

I'm doing an analysis where I use simulate_chains() to generate an epidemic, which is then "pruned" afterwards to simulate the effect of testing and isolation.

This involves using as.data.frame() on the resulting epichains object, applying a set of functions, and then summarising the results using the tidyverse. However it would be nice if there were the ability to convert the data.frame back into an epichains object, so that it can be summarised using functions within the package as well as be interoperable with epicontacts for plotting, if possible.

@jamesmbaazam
Copy link
Member

Thanks for raising this @bquilty25. This is indeed a nice workflow to have. I will address this issue in the coming week and point you to how to access the new feature.

@sbfnk
Copy link
Contributor

sbfnk commented Sep 3, 2024

Are you sue you need to call as.data.frame()? The epichains objects are data frames, too, so I would have thought it's possible to use tidyverse functions directly (I think).

@jamesmbaazam
Copy link
Member

jamesmbaazam commented Sep 3, 2024

Here's my initial assessment.

  • You don't need to change it to a <data.frame> for use with tidyverse but even if you do, the attributes are retained. In that case, we just need a simple as.epichains() method to to convert back to <epichains> (this is in the scope of this issue).

  • You lose the attributes only when you invalidate the class by removing a protected column, hence, you can't print() currently as it does not transfer to the next method. Here, we need to transfer to the next printing method (See related issues: Use next print() method when <epichains> is invalidated #278 and Add custom subsetting method #274).

library(epichains)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
set.seed(32)
chains_pois_offspring <- simulate_chains(
    n_chains = 10,
    statistic = "size",
    offspring_dist = rpois,
    stat_threshold = 10,
    generation_time = function(n) rep(3, n),
    lambda = 2
)

# Print fresh object
chains_pois_offspring
#> `<epichains>` object
#> 
#> < epichains head (from first known infector) >
#> 
#>    chain infector infectee generation time
#> 11     1        1        2          2    3
#> 12     1        1        3          2    3
#> 13     2        1        2          2    3
#> 14     2        1        3          2    3
#> 15     3        1        2          2    3
#> 16     3        1        3          2    3
#> 
#> 
#> Number of chains: 10
#> Number of infectors (known): 9
#> Number of generations: 4
#> Use `as.data.frame(<object_name>)` to view the full output in the console.
# attributes of the fresh object
attributes(chains_pois_offspring)
#> $names
#> [1] "chain"      "infector"   "infectee"   "generation" "time"      
#> 
#> $class
#> [1] "epichains"  "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129
#> 
#> $n_chains
#> [1] 10
#> 
#> $statistic
#> [1] "size"
#> 
#> $offspring_dist
#> function (n, lambda) 
#> .Call(C_rpois, n, lambda)
#> <bytecode: 0x1113f18e8>
#> <environment: namespace:stats>
#> 
#> $stat_threshold
#> [1] 10
#> 
#> $track_pop
#> [1] FALSE

# change it to a data.frame and inspect the attributes
chains_df <- as.data.frame(chains_pois_offspring)
attributes(chains_df)
#> $names
#> [1] "chain"      "infector"   "infectee"   "generation" "time"      
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
#>  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
#>  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
#>  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
#>  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
#>  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
#> [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
#> [127] 127 128 129
#> 
#> $n_chains
#> [1] 10
#> 
#> $statistic
#> [1] "size"
#> 
#> $offspring_dist
#> function (n, lambda) 
#> .Call(C_rpois, n, lambda)
#> <bytecode: 0x1113f18e8>
#> <environment: namespace:stats>
#> 
#> $stat_threshold
#> [1] 10
#> 
#> $track_pop
#> [1] FALSE
 
# which attributes are lost?
setdiff(names(attributes(chains_pois_offspring)), names(attributes(chains_df)))
#> character(0)


## Use original epichains object in a tidyverse pipeline and remove a protected column
obj_modified <- chains_pois_offspring |> 
    mutate(isolated = ifelse(generation > 2, TRUE, FALSE)) |> 
    select(-c(generation, infector))


# what is the class?
class(obj_modified)
#> [1] "epichains"  "data.frame"

# Which attributes are lost?
setdiff(names(attributes(chains_pois_offspring)), names(attributes(obj_modified)))
#> [1] "n_chains"       "statistic"      "offspring_dist" "stat_threshold"
#> [5] "track_pop"

# Can we print the modified object?
try(obj_modified)
#> Error in .validate_epichains(x): object does not contain the correct columns

Created on 2024-09-03 with reprex v2.1.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants