add extra exercise
ctb committed Apr 29, 2024
1 parent c9480ff commit 65a5180
Showing 1 changed file with 32 additions and 5 deletions.
37 changes: 32 additions & 5 deletions docs/comparing-metagenomes.md
@@ -121,19 +121,46 @@ If you plot this via MDS, you'll see a clear separation:
Points to discuss:

* what does this all mean, in ~microbial terms? Hint: ask Mani to
revisit how the test data sets were generated!
revisit how the test data sets were generated! Alternatively,
go on to the next section!

## Extra: examining taxonomy

<!--
If we quickly run our [taxonomy analysis](single-metagenomes-taxonomy.md) on
one of the other samples, we can maybe start to see some of the reasons for
the differences in diversity but not richness:

## Comparing based on taxonomy
```
mamba activate tax
sourmash scripts fastgather ../data/tutorial_other/CD240.sig.zip \
../databases/gtdb-rs214-k31.zip -o CD240.x.gtdb-rs214.fastgather.csv -c 16
sourmash gather ../data/tutorial_other/CD240.sig.zip \
../databases/gtdb-rs214-k31.zip -o CD240.x.gtdb-rs214.gather.csv \
--picklist CD240.x.gtdb-rs214.fastgather.csv:match_name:ident
sourmash tax metagenome -g CD240.x.gtdb-rs214.gather.csv \
-t ../single-metag/gtdb-rs214.lineages.sqldb -F human
```
mamba create -y -n workshop-r r-base r-tidyverse r-vegan r-ape r-rcolorbrewer

You should see:
```
sample name proportion cANI lineage
----------- ---------- ---- -------
CD240 42.2% 94.0% d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides uniformis
CD240 19.5% 94.5% d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
CD240 12.6% 94.1% d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides;s__Parabacteroides distasonis
CD240 11.7% 91.2% d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Ruminococcus_E;s__Ruminococcus_E bromii_B
CD240 11.4% - unclassified
CD240 2.6% 91.4% d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii_D
```

That's right - both samples have similar species, but the abundances of those
species are quite different.

-->
Note that in this case that's not an accident: the dataset was created
specifically to contain only five species ;).
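
To make the "diversity but not richness" point concrete, here is a minimal Python sketch. It assumes the proportions printed above for CD240, leaves out the unclassified fraction, and compares them against a made-up, perfectly even five-species sample. The helper names `richness` and `shannon` are hypothetical and are not part of sourmash, and the Shannon index is just one common choice of diversity measure, not necessarily the one used elsewhere in this tutorial.

```python
import math

# Proportions (percent) copied from the `sourmash tax metagenome` output
# shown above for CD240; the "unclassified" row is deliberately left out.
cd240 = {
    "Bacteroides uniformis": 42.2,
    "Bacteroides fragilis": 19.5,
    "Parabacteroides distasonis": 12.6,
    "Ruminococcus_E bromii_B": 11.7,
    "Faecalibacterium prausnitzii_D": 2.6,
}


def richness(abundances):
    """Count how many species have a non-zero abundance."""
    return sum(1 for value in abundances if value > 0)


def shannon(abundances):
    """Shannon index (natural log) after renormalizing abundances to sum to 1."""
    total = sum(abundances)
    proportions = [value / total for value in abundances if value > 0]
    return -sum(p * math.log(p) for p in proportions)


observed = list(cd240.values())
# Hypothetical comparison sample: the same five species, all equally abundant.
even = [1.0] * len(observed)

print("richness:", richness(observed), "vs", richness(even))
print("shannon: ", round(shannon(observed), 3), "vs", round(shannon(even), 3))
```

Both samples have a richness of five, but the skewed CD240 abundances give a lower Shannon index than the even sample, which is the pattern described above: same species, different abundances.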

---
