[Independent Enrichment Analysis] Bug in background? #740

Open
kvittingseerup opened this issue Dec 9, 2021 · 2 comments

kvittingseerup commented Dec 9, 2021

I've recently started using your Appyter for Independent Enrichment Analysis to analyze the Enrichr catalog with a custom background.

But because I kept getting very large odds ratios and very small p-values, I got suspicious. I therefore tested the first 10 genes of the 2019 Human WikiPathway NRF2 pathway WP2884, using the first 20 genes of that gene set as the background. The result can be found here. As seen from the notebook, the odds ratio for the NRF2 pathway WP2884 is calculated to be Inf and the p-value is 6.56e-32. That does not seem like it should be the case if the background were taken into account?

Did I input the genes wrongly or something similar?

Contributor

lachmann12 commented Dec 10, 2021

Inf odds ratios are possible if the input gene set is a subset of a gene set in the gene set library. This is due to the formula given a contingency table:

a b
c d

odds ratio = ad/bc

In the case of a subset, bc will be 0. In the Enrichr code this is handled by dividing by max(1, bc) instead of bc, which results in a very large value.
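
As a rough sketch of that guard in R: the helper name `odds_ratio_guarded` and the cell counts below are made up for illustration, and this is not the actual Enrichr code, only the max(1, bc) idea described above.

```r
# Hypothetical helper, NOT the Enrichr implementation: odds ratio for the
# contingency table (a b / c d), with the denominator guarded by max(1, bc).
odds_ratio_guarded <- function(a, b, c, d) {
  (a * d) / max(1, b * c)
}

# Illustrative counts only: b = 0, so bc = 0.
plain   <- (10 * 19980) / (0 * 10)              # unguarded ad/bc gives Inf
guarded <- odds_ratio_guarded(10, 0, 10, 19980) # guarded version gives ad = 199800
print(c(plain = plain, guarded = guarded))
```

With the guard, the result is finite but still equal to ad, which is large whenever d is large.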

Not sure if there are some other issues with the background correction, though.

@kvittingseerup
Author

I think the problem is better illustrated by the p-value. With the dataset I mentioned above, the fisher.test would (in R code) look something like:

# 2x2 table with the 20-gene background applied (filled column-wise)
m1 <- matrix(c(0, 0, 10, 10), ncol = 2, byrow = FALSE)
broom::tidy(fisher.test(m1))
  estimate  p.value conf.low conf.high method                             alternative
         0       1        0       Inf Fisher's Exact Test for Count Data two.sided  

If, on the other hand, the background was not used, you would end up with something like:

# Same table, but with roughly 20,000 genes (all known genes) as background
m2 <- matrix(c(2e4, 0, 10, 10), ncol = 2, byrow = FALSE)
broom::tidy(fisher.test(m2))
  estimate  p.value conf.low conf.high method                             alternative
       Inf 6.50e-32    3103.       Inf Fisher's Exact Test for Count Data two.sided  

I've tested this with some of my real data, and the odds ratios and p-values reported by Independent Enrichment Analysis are very similar to what I get when using a Fisher test with all known genes as background instead of the provided subset.

You can also see this by running the example dataset you provide as both foreground and background (Appyter found here). There I still get very significant results with very high OR even though there should be no enrichment.
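
For reference, a minimal sketch of what a background-aware test might look like. The function name `enrich_with_background` and the gene identifiers are placeholders I made up, and this is only my assumption of the expected behaviour, not the Appyter's code: the input and the gene set are both restricted to the background before the table is built.

```r
# Hypothetical helper: Fisher's exact test restricted to a custom background.
enrich_with_background <- function(input, gene_set, background) {
  background <- unique(background)
  input      <- intersect(input, background)     # drop genes outside the background
  gene_set   <- intersect(gene_set, background)
  in_both    <- length(intersect(input, gene_set))
  input_only <- length(setdiff(input, gene_set))
  set_only   <- length(setdiff(gene_set, input))
  neither    <- length(background) - in_both - input_only - set_only
  tab <- matrix(c(in_both, input_only, set_only, neither), nrow = 2, byrow = TRUE)
  list(table = tab, test = fisher.test(tab))
}

# Placeholder identifiers standing in for the first 20 genes of a gene set.
genes <- paste0("GENE", 1:20)

# First 10 genes as input, all 20 as both gene set and background.
res_subset <- enrich_with_background(genes[1:10], genes, genes)
# Same list used as both foreground and background.
res_same   <- enrich_with_background(genes, genes[1:5], genes)

print(res_subset$test$p.value)
print(res_same$test$p.value)
```

In both cases one margin of the table is zero, so only one table is possible and the p-value is 1, i.e. no enrichment.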

u8sand changed the title from "Bug in background of Independent Enrichment Analysis?" to "[Independent Enrichment Analysis] Bug in background?" on Jul 27, 2022