MaAslin2-Tutorial #5173

Open
wants to merge 15 commits into
base: main

Conversation


@renu-pal renu-pal commented Jul 16, 2024

Tutorial draft on Maaslin2. Would love any suggestions or changes on this.

@shiltemann shiltemann marked this pull request as ready for review July 30, 2024 09:57
@shiltemann
Member

Hi @renu-pal, thanks for your contribution! I have taken it out of draft mode so that our tests can run and people will know they can review it :)

@renu-pal
Author

> Hi @renu-pal, thanks for your contribution! I have taken it out of draft mode so that our tests can run and people will know they can review it :)

Sounds great! :)

@shiltemann shiltemann requested a review from a team August 8, 2024 09:31
Collaborator

@paulzierep paulzierep left a comment

Looks great! It would be nice to extend it a bit and show different options and some more output.

>
{: .agenda}

# Get the data
Collaborator

Where does the data come from? What does it contain? Was it used in other studies? Are there other studies that suggest ideal MaAsLin2 parameters for this kind of data?

Author

I couldn't find studies that used both the HMP2 data and the MaAsLin2 tool, so instead I mentioned studies that used MaAsLin2, along with the parameters they used. If that does not feel right, please let me know.

Collaborator

Maybe it's enough to add that this was the initial data used to demonstrate MaAsLin2?

@paulzierep
Collaborator

Can you look into the linting issues as well?

@bgruening
Member

@renu-pal do you need any help here?

@renu-pal
Author

renu-pal commented Sep 8, 2024

> @renu-pal do you need any help here?

Definitely, @bgruening :). Could you please go through the tutorial and let me know if you find anything wrong?

>
{: .agenda}

# Get the data
Collaborator

- **Sensitivity** measures how well the methods detect true signals (higher sensitivity is better).
- **False discovery rate (FDR)** measures the proportion of false positives among detected signals (lower FDR is better).
- MaAsLin2 is the clear standout for both differential abundance detection and multivariable association detection, showing high sensitivity and maintaining a low FDR.
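
To make the two metrics above concrete, here is a minimal sketch of how they could be computed from benchmark counts. The function names and the example counts are hypothetical and are not taken from the tutorial or from any MaAsLin2 benchmark.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true signals that were detected: TP / (TP + FN)."""
    total_true = true_positives + false_negatives
    # Return 0.0 when there are no true signals, to avoid dividing by zero.
    return true_positives / total_true if total_true else 0.0


def false_discovery_rate(true_positives: int, false_positives: int) -> float:
    """Fraction of detected signals that are false: FP / (TP + FP)."""
    detected = true_positives + false_positives
    # Return 0.0 when nothing was detected, to avoid dividing by zero.
    return false_positives / detected if detected else 0.0


# Hypothetical counts, for illustration only.
print(sensitivity(true_positives=80, false_negatives=20))
print(false_discovery_rate(true_positives=80, false_positives=5))
```

A method that detects more of the true signals raises the first value, and a method that reports fewer spurious associations lowers the second.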

Collaborator

Maybe we can add something like:
In this regard, MaAsLin2 can be seen as a Swiss army knife for differential analysis of microbiome data. With some text processing, various omics data types could be used as input, e.g. from these GTN tutorials:

Then check the existing GTN tutorials and add those that provide matching data:

  • bla
  • blumb

description: Galaxy Training Network Material
synopsis: Galaxy Training Network Material. See https://training.galaxyproject.org
items:
- name: The new topic
Collaborator

Suggested change
- name: The new topic
- name: The new topic

?

| **Phyloseq + DESeq2**| Strong for RNA-seq and transcriptomics; integrates with Phyloseq | Lacks compositionality awareness | While DESeq2 works for microbiome data, MaAsLin2 offers more suitable options for compositional data and covariate handling. |
| **Limma-Voom** | Effective for RNA-seq and microarray data, handles low counts | Not tailored for compositional microbiome data | Limma-Voom is well-suited for gene expression, but MaAsLin2 better accounts for the unique characteristics of microbiome data. |

- ANCOM-BC and MaAsLin2 outperform general-purpose tools like DESeq2 and limma-voom when it comes to microbiome data. This is due to their handling of the compositional nature of microbiome data and the sparsity typical of microbial datasets. [PMID: 36617187](https://pubmed.ncbi.nlm.nih.gov/36617187/)
Collaborator

Can you please replace all links to studies with citations, as shown here:

The currently available studies used Illumina sequencing, generating short reads. Longer read lengths, generated by third-generation sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), make it **easier and more practical to identify strains with fewer reads**. MinION (from Oxford Nanopore) is a portable, real-time device for ONT sequencing. Several proof-of-principle studies have shown the **utility of ONT long-read sequencing from metagenomic samples for pathogen identification** ({% cite CIUFFREDA20211497 %}).

- **Taxonomy (or features) file**: \
This file is tab-delimited.\
Formatted with features as columns and samples as rows.\
The transposition of this format is also okay.\
Collaborator

Is it? Can our wrapper work with both?
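
Whether the wrapper accepts both orientations is left open here. If it turned out to accept only one, the table could be transposed beforehand. The following is a minimal sketch, assuming a rectangular tab-delimited table; `transpose_tsv` and the example table are hypothetical and not part of the tutorial or the tool.

```python
import csv
import io


def transpose_tsv(text: str) -> str:
    """Swap rows and columns of a rectangular tab-delimited table."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # zip(*rows) turns the i-th column into the i-th row.
    # It assumes every row has the same number of fields.
    transposed = zip(*rows)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(transposed)
    return out.getvalue()


# Hypothetical table: samples as rows, features as columns.
example = "sample\tfeatA\tfeatB\nS1\t0.1\t0.9\nS2\t0.4\t0.6\n"
print(transpose_tsv(example))
```

Applying the function twice returns the original table, so the same helper covers both directions.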


MaAsLin2 requires the following input files:

- **Taxonomy (or features) file**: \
Collaborator

Suggested change
- **Taxonomy (or features) file**: \
- **Features file**: \

I would like to keep it more generic. Maybe add some examples at the end of the list, such as an OTU/ASV abundance table, a MAGs abundance matrix, a taxonomy table, or a gene count matrix.

Formatted with features as columns and samples as rows.\
The transposition of this format is also okay.

The Taxonomy file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
Collaborator

Suggested change
The Taxonomy file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
The feature file can contain samples not included in the metadata file (or vice versa). For both cases, those samples not included in both files will be removed from the analysis.
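
The sample-matching behaviour described in this suggestion can be illustrated with a small sketch. It only mirrors the described outcome (samples missing from either file are dropped) and is not MaAsLin2's actual implementation; `keep_shared_samples` and the example data are hypothetical.

```python
def keep_shared_samples(features: dict, metadata: dict) -> tuple[dict, dict]:
    """Keep only the samples that appear in both tables.

    Both arguments map a sample ID to that sample's row of values.
    """
    # Preserve the order in which samples appear in the feature table.
    shared = [sample for sample in features if sample in metadata]
    kept_features = {sample: features[sample] for sample in shared}
    kept_metadata = {sample: metadata[sample] for sample in shared}
    return kept_features, kept_metadata


# Hypothetical data: S3 has no metadata and S4 has no feature values.
features = {"S1": [0.1, 0.9], "S2": [0.4, 0.6], "S3": [0.5, 0.5]}
metadata = {"S1": {"group": "case"}, "S2": {"group": "control"}, "S4": {"group": "case"}}

kept_features, kept_metadata = keep_shared_samples(features, metadata)
print(sorted(kept_features))
```

Only S1 and S2 remain in both tables, because S3 and S4 each appear in just one of them.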

4 participants