STAR indexes contain alternative haplotypes #331

Open
mblue9 opened this issue Sep 23, 2020 · 3 comments

Comments

@mblue9

mblue9 commented Sep 23, 2020

Hello,

I'm helping some researchers analyse zebrafish data and just discovered that the zebrafish STAR index (danRer11) in EU contains the alternative haplotypes. As far as I know the alts shouldn't be included; see the STAR manual here: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

"Generally, patches and alternative haplotypes should not be included in the genome."

Including them means that reads mapping to those regions get falsely low mapping scores and are discarded. In the samples I'm looking at, about 10% of reads seem to map to the alts.

Is it possible to remove the alts from the indexes?

The human hg38 STAR index looks like it also contains the alts.

@bgruening
Member

:( not good. We have been using the regular UCSC genomes, as we always do :(

@mblue9
Author

mblue9 commented Sep 25, 2020

:( not good. We have been using the regular UCSC genomes, as we always do :(

Thanks for the reply! I know it's extra hassle :( but would it be possible to make a version that excludes the alts from the UCSC genomes before indexing?
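
For illustration, here is a minimal sketch of what that filtering step could look like. It assumes UCSC-style sequence names, where alternate haplotypes end in `_alt`, fix patches in `_fix`, and older assemblies such as hg19 use `_hapN`; the pattern would need checking against the actual danRer11 and hg38 FASTA headers. The names `ALT_NAME`, `drop_alt_records` and `filter_fasta` are made up for this example and are not part of any existing tool.

```python
import re
from typing import Iterable, Iterator

# Assumption: UCSC-style names. Alternate haplotypes end in "_alt",
# fix patches in "_fix", and older assemblies (e.g. hg19) use "_hapN".
ALT_NAME = re.compile(r"_(alt|fix|hap\d+)$")


def drop_alt_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping records whose name looks like an alt or patch."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            keep = ALT_NAME.search(name) is None
        if keep:
            yield line


def filter_fasta(src: str, dst: str) -> None:
    """Stream an uncompressed FASTA from src to dst without the alt/patch records."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(drop_alt_records(fin))
```

Unplaced and unlocalized scaffolds (names such as `chrUn_...` or `..._random`) do not match the pattern, so they are kept. The filtered FASTA would then be the input for building the index.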

My understanding is that most people would want the version without alts for STAR (and most other aligners) as Devon Ryan says in this post.

https://www.biostars.org/p/330596/

STAR manual recommends to exclude haplotypes and patches from reference genome while keeping unplaced scaffolds when aligning RNA-seq reads. Is the same recommended for aligning ChIP-seq reads with Bowtie2? Do Bowtie2 index take that into account?

The recommendations for STAR apply to all aligners except BWA mem and novoalign. This is also true for all types of sequencing experiments.

@wm75
Member

wm75 commented Sep 25, 2020

Related blog post by Heng Li on this issue for the human genome: https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use
So maybe we could even optimize hg19 a bit?
