[BUG]: function power_prune() does not behave correctly when SNPs in a given outcome-study subset have varying sample sizes #557

Open
phageghost opened this issue Aug 29, 2024 · 0 comments
phageghost commented Aug 29, 2024

Please make sure that this is a bug! If you have questions about how to use TwoSampleMR please use the Discussions function instead.

Describe the bug (required)

When running power_prune() with method = 2 and dist.outcome = "continuous", outcome data in which SNPs within an outcome-study subset have differing samplesize values produces a length mismatch between an intermediate data.frame and the iv.se vector used to populate a column in that data.frame.
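The class of failure can be illustrated in base R, independent of TwoSampleMR. This is only a sketch of the symptom, not the actual internals of power_prune(): `dat_sub` and `iv_se` are hypothetical stand-ins, and the lengths 61 and 47 are taken from the error message reported below.

```r
# Hypothetical stand-in for the intermediate data.frame (61 rows).
dat_sub <- data.frame(SNP = paste0("rs", seq_len(61)))

# Hypothetical stand-in for iv.se, holding only 47 values.
iv_se <- rep(0.05, 47)

# Assigning a column whose length differs from nrow() raises an error;
# capture the message instead of stopping the script.
res <- tryCatch(
  {
    dat_sub$iv.se <- iv_se
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

In an English locale the captured message has the same form as the one in the traceback: the replacement length is compared against the number of rows in the data.frame.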

Describe the current behaviour you observe (required)

library(TwoSampleMR)

# JWT_TOKEN must hold a valid OpenGWAS JSON Web Token
bmi_exp_dat <- extract_instruments(outcomes = "ieu-a-2", opengwas_jwt = JWT_TOKEN)
ao <- available_outcomes(opengwas_jwt = JWT_TOKEN)
chd_studies <- subset(ao, trait == "Coronary heart disease")
chd_out_dat <- extract_outcome_data(snps = bmi_exp_dat$SNP, outcomes = chd_studies$id, opengwas_jwt = JWT_TOKEN)
dat <- harmonise_data(
  exposure_dat = bmi_exp_dat,
  outcome_dat = chd_out_dat
)
# Fails once an outcome subset contains SNPs with differing sample sizes
dat <- power_prune(dat, method = 2, dist.outcome = "continuous")

Extracting data for 79 SNP(s) from 5 GWAS(s)

Finding proxies for 10 SNPs in outcome ieu-a-9

Extracting data for 10 SNP(s) from 1 GWAS(s)

Finding proxies for 47 SNPs in outcome ieu-a-6

Extracting data for 47 SNP(s) from 1 GWAS(s)

Finding proxies for 1 SNPs in outcome ebi-a-GCST000998

Extracting data for 1 SNP(s) from 1 GWAS(s)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ebi-a-GCST000998 (ebi-a-GCST000998)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-6 (ieu-a-6)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-8 (ieu-a-8)

Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-9 (ieu-a-9)

[1] 1
[1] "Body mass index Coronary heart disease"
[1] "identifying best powered summary set: Body mass index || id:ieu-a-2 ieu-a-2 Coronary heart disease || id:ebi-a-GCST000998 ebi-a-GCST000998"
[1] "identifying best powered summary set: Body mass index || id:ieu-a-2 ieu-a-2 Coronary heart disease || id:ieu-a-6 ieu-a-6"
Error in $<-.data.frame(*tmp*, "iv.se", value = c(0.0513059534662941, : replacement has 47 rows, data has 61
Traceback:

  1. power_prune(dat, method = 2, dist.outcome = "continuous")
  2. $<-(*tmp*, "iv.se", value = c(0.0513059534662941, 0.0514040302606308,
    . 0.0513042674065994, 0.0513422440048546, 0.0513025815131199, 0.0513506947075708,
    . 0.0513017386287024, 0.0627769011116508, 0.051311855984565, 0.0513625327048284,
    . 0.0513169169076932, 0.051333797472922, 0.0512966821944926, 0.0514133598706217,
    . 0.0513143862589525, 0.0515334001685128, 0.0513819986293536, 0.0714917801900777,
    . 0.0513456237852549, 0.0513186042148732, 0.0513034244390844, 0.0513067965584807,
    . 0.0513557671326049, 0.0512882581256275, 0.0513219793286036, 0.0629381381083169,
    . 0.0512924696412343, 0.0513101693429018, 0.0513253551083434, 0.0513312643261322,
    . 0.0513879274355384, 0.0513000529844945, 0.0514286374686093, 0.0513076396922313,
    . 0.0513583039088733, 0.0513380202177548, 0.0513160733165124, 0.0514583829379987,
    . 0.0513354864458105, 0.0513084828675495, 0.0513371755854206, 0.0512916272551145,
    . 0.0747620792897615, 0.0512975248296999, 0.0627722684149263, 0.0569081998241544,
    . 0.051317760540479))
  3. $<-.data.frame(*tmp*, "iv.se", value = c(0.0513059534662941,
    . 0.0514040302606308, 0.0513042674065994, 0.0513422440048546, 0.0513025815131199,
    . 0.0513506947075708, 0.0513017386287024, 0.0627769011116508, 0.051311855984565,
    . 0.0513625327048284, 0.0513169169076932, 0.051333797472922, 0.0512966821944926,
    . 0.0514133598706217, 0.0513143862589525, 0.0515334001685128, 0.0513819986293536,
    . 0.0714917801900777, 0.0513456237852549, 0.0513186042148732, 0.0513034244390844,
    . 0.0513067965584807, 0.0513557671326049, 0.0512882581256275, 0.0513219793286036,
    . 0.0629381381083169, 0.0512924696412343, 0.0513101693429018, 0.0513253551083434,
    . 0.0513312643261322, 0.0513879274355384, 0.0513000529844945, 0.0514286374686093,
    . 0.0513076396922313, 0.0513583039088733, 0.0513380202177548, 0.0513160733165124,
    . 0.0514583829379987, 0.0513354864458105, 0.0513084828675495, 0.0513371755854206,
    . 0.0512916272551145, 0.0747620792897615, 0.0512975248296999, 0.0627722684149263,
    . 0.0569081998241544, 0.051317760540479))
  4. stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
    . "replacement has %d rows, data has %d"), N, nrows), domain = NA)
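To check whether a harmonised dataset contains outcome subsets with varying sample sizes, one can count the distinct sample sizes per outcome. The sketch below runs on made-up toy data; it assumes the column names id.outcome and samplesize.outcome as they appear in harmonise_data() output, and `toy`, `n_distinct` and `varying` are illustrative names only.

```r
# Toy data (made-up values): out-A has one sample size, out-B has two.
toy <- data.frame(
  SNP = paste0("rs", 1:6),
  id.outcome = c("out-A", "out-A", "out-A", "out-B", "out-B", "out-B"),
  samplesize.outcome = c(1000, 1000, 1000, 2000, 2500, 2000)
)

# Number of distinct sample sizes within each outcome study.
n_distinct <- vapply(
  split(toy$samplesize.outcome, toy$id.outcome),
  function(x) length(unique(x)),
  integer(1)
)

# Outcome studies whose SNPs do not all share one sample size.
varying <- names(n_distinct)[n_distinct > 1]
print(varying)
```

Replacing `toy` with the harmonised `dat` from the reproduction above would list the outcome ids that may be affected.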

Describe the behaviour you expect (required)

power_prune() should return a pruned data frame without errors.

R code to reproduce the issue (required)

See above

Contribute a solution (optional)

PR 556

System information

  • Ubuntu 24.04 LTS
  • R version 4.4.1 (2024-06-14) -- "Race for Your Life"

Additional context

This was discovered while extending an example in the documentation to test the power_prune() function. The function has no complete end-to-end example, because the rest of the example on that doc page deals with only a single outcome subset.

@phageghost phageghost added the bug label Aug 29, 2024