fixed broken links into readme
Militeee committed Apr 12, 2024
1 parent 6ebe35b commit 3748925
Showing 5 changed files with 31 additions and 179 deletions.
README.md (13 changes: 6 additions & 7 deletions)
@@ -2,10 +2,9 @@

MIDAA is a package designed for performing Deep Archetypal Analysis on multiomics data. The documentation can be find here [https://sottorivalab.github.io/midaa/](https://sottorivalab.github.io/midaa/)

#### *The package is under active development, expect breaking changes (we just changed the tool name ;) ) and incomplete documentat for a bit*
#### *I'll try my best to speed this up, if something is broken or you need help please open an issue, dontt be shy!*
<br />

*The package is under active development, expect breaking changes (we just changed the tool name ;) ) and incomplete documentation for a bit*
*I'll try my best to speed this up, if something is broken or you need help please open an issue, do not be shy!*
<br/><br/>
<img src="https://github.com/sottorivalab/daario/blob/69f8399cadfcb10ba1bc483cd4405b823efda64c/logo.png?raw=true" width="200px" align="left">


@@ -19,7 +18,7 @@ git clone https://github.com/sottorivalab/midaa.git
# you need poetry installed
poetry install
```
<br />
<br/><br/>


## Quick Start
@@ -36,9 +35,9 @@ midaa leans the matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{Z}$ in an amort
The network is implemented in a Variational Autoencdoer fashion, so we have and encoding and decoding function as well as probabilistic definition of the matrix factorization problem above.
Both the encoder and the decoder have a shared portion where data fusion occurs and an independent piece where modality specific encoding and decoding takes place.

If you are happy with that we have some cool tutorials that will show you how to use MIDAA on real [multi-omics data](https://sottorivalab.github.io/midaa/scMulti_multimodal.ipynb).
If you are happy with that we have some cool tutorials that will show you how to use MIDAA on real [multi-omics data](https://sottorivalab.github.io/daario/scMulti_multimodal.ipynb).

Otherwise, the best way to start is to read [this](https://sottorivalab.github.io/midaa/midaa_long_form.html) or the companion [paper](https://www.biorxiv.org/content/10.1101/2024.04.05.588238v1) and understand what MIDAA actually does in details and what are the [parameters](https://sottorivalab.github.io/midaa/implementation_and_parameters.ipynb) you can play with.
Otherwise, the best way to start is to read [this](https://sottorivalab.github.io/daario/midaa_long_form.html) or the companion [paper](https://www.biorxiv.org/content/10.1101/2024.04.05.588238v1) and understand what MIDAA actually does in details and what are the [parameters](https://sottorivalab.github.io/daario/implementation_and_parameters.ipynb) you can play with.


A minimal example to run the tool:
docs/implementation_and_parameters.ipynb (6 changes: 3 additions & 3 deletions)
@@ -82,7 +82,7 @@
"source": [
"Let me also introduce the 4 main parameters of MIDAA:\n",
"* The input data, it should be provided as a list of numpy arrays, one for each modality\n",
"* A normalization factor, this is especially useful when you work with raw counts. The normalization factors are modality specific and are applied before computing the likelihood. For instance if we call $\\beta$ the output of the last layer of the decoder and the normalization factors as $\\nu$$ and our likelihood of choice is Poisson then the rate of the Poisson is gonna be computed as $exp(\\beta) * \\nu$. \n",
"* A normalization factor, this is especially useful when you work with raw counts. The normalization factors are modality specific and are applied before computing the likelihood. For instance if we call $\\beta$ the output of the last layer of the decoder and the normalization factors as $\\nu$ and our likelihood of choice is Poisson then the rate of the Poisson is gonna be computed as $exp(\\beta) * \\nu$. \n",
"* The likelihood used to compute the reconstruction loss of the data, we currently support: Gaussian (G), Poisson (P), Negative Binomial (NB), Categorical (C), Bernoulli (B), Beta (Beta). Again likelihoods are modality specific and are list of strings\n",
"* The number of archetypes to fit"
]
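The normalization bullet can be made concrete with a small NumPy sketch. It assumes a Poisson likelihood and, purely for illustration, one normalization factor per sample (e.g. a library size) broadcast across features; `poisson_rate` and `poisson_log_likelihood` are hypothetical helpers, not part of MIDAA's API.

```python
import math

import numpy as np


def poisson_rate(beta: np.ndarray, nu: np.ndarray) -> np.ndarray:
    """Poisson rate exp(beta) * nu for one modality.

    beta: decoder output, shape (n_samples, n_features).
    nu:   normalization factors, shape (n_samples,). Assumption for this
          sketch: one factor per sample, broadcast over all features.
    """
    return np.exp(beta) * nu[:, None]


def poisson_log_likelihood(counts: np.ndarray, rate: np.ndarray) -> float:
    """Sum over all entries of log Poisson(counts | rate)."""
    log_k_factorial = np.vectorize(math.lgamma, otypes=[float])(counts + 1.0)  # log(k!)
    return float(np.sum(counts * np.log(rate) - rate - log_k_factorial))


# Toy data: 2 samples x 3 features, the second sample sequenced 10x deeper.
beta = np.zeros((2, 3))        # exp(0) = 1 everywhere
nu = np.array([1.0, 10.0])
rate = poisson_rate(beta, nu)  # [[1, 1, 1], [10, 10, 10]]
counts = np.array([[1, 0, 2], [9, 12, 10]])
loglik = poisson_log_likelihood(counts, rate)
```

With the same decoder output, the deeper-sequenced sample gets a ten times larger expected count, which is the point of applying the factor before the likelihood rather than asking the decoder to learn sequencing depth.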
@@ -104,7 +104,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training Parameters\n",
"## Training Parameters\n",
"\n",
"The two main parameters you can change are the number of steps and the learning rate. In our simulations we find that learning rates around 1e-3 and 1e-4 work well. Regarding the number of steps, they really depend on the problem but generally >500 is enough to get a decent model. We just want to highlight that for us a step menas a complete epoch, so the number of actual gradient iterations will be dependent on the batch size and number of samples."
]
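Because a step here is a full epoch, the number of optimizer updates grows with the dataset. A back-of-the-envelope sketch, assuming the last, smaller mini-batch of each epoch is kept rather than dropped; the function name and the dataset and batch sizes in the example are illustrative, not taken from MIDAA.

```python
import math


def gradient_iterations(n_steps: int, n_samples: int, batch_size: int) -> int:
    """Total optimizer updates when every step is one full pass (epoch).

    Assumes the final partial mini-batch is kept, so each epoch has
    ceil(n_samples / batch_size) updates.
    """
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return n_steps * batches_per_epoch


# 500 steps on 10,000 samples with mini-batches of 512: 20 updates per epoch
print(gradient_iterations(500, 10_000, 512))  # 10000
```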
@@ -383,7 +383,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Network parameters "
"## Network parameters "
]
},
{
docs/index.md (6 changes: 2 additions & 4 deletions)
@@ -6,10 +6,8 @@
:hidden:
midaa_long_form.ipynb
scRNA_single_modality.ipynb
multiomics_10x_single_cell.ipynb
multimodal_bulk_CLL.ipynb
implementation_and_parameters.ipynb
scMulti_multimodal.ipynb
changelog.md
contributing.md
conduct.md
File renamed without changes.
docs/scMulti_multimodal.ipynb (185 changes: 20 additions & 165 deletions)
@@ -8,6 +8,13 @@
"# MIDAA 101 (on 10X multiome)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10x Multiomics analysis\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -799,7 +806,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generating new data\n",
"## Generating new data\n",
"\n",
"But want I would like to show in this last part of the tutorial which is less straightforward to use is how we can use our model in a generative fashion to simulate synthetic data. The idea is that we specify some coordinates in the simplex and then we sample from teh corresponding Dirichlet distribution. For instance in this example here we will sample a dataste mostly composed by 2 archetypes"
]
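The sampling step described in this cell can be sketched with NumPy's Dirichlet sampler. The number of archetypes (4) and the concentration values are illustrative assumptions; the sketch only produces simplex coordinates, and passing them through the trained decoder to obtain synthetic omics profiles depends on MIDAA's own API, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_archetypes = 4  # hypothetical: number of archetypes in the fitted model
n_samples = 1000  # number of synthetic points to draw

# Concentration parameters: large on archetypes 0 and 1, small on the rest,
# so most draws fall close to the edge joining archetypes 0 and 1.
alpha = np.array([10.0, 10.0, 0.5, 0.5])

# Each row is a point on the simplex: non-negative weights summing to 1.
weights = rng.dirichlet(alpha, size=n_samples)

print(weights.shape)         # (1000, 4)
print(weights.mean(axis=0))  # close to alpha / alpha.sum()
```

The expected position of each draw is `alpha / alpha.sum()`, while the overall scale of `alpha` controls the spread: larger values pull samples tighter around that point.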
@@ -906,7 +913,7 @@
},
{
"cell_type": "code",
"execution_count": 101,
"execution_count": 111,
"metadata": {},
"outputs": [
{
@@ -930,91 +937,38 @@
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>seqnames</th>\n",
" <th>start</th>\n",
" <th>end</th>\n",
" <th>width</th>\n",
" <th>strand</th>\n",
" <th>score</th>\n",
" <th>replicateScoreQuantile</th>\n",
" <th>groupScoreQuantile</th>\n",
" <th>Reproducibility</th>\n",
" <th>GroupReplicate</th>\n",
" <th>...</th>\n",
" <th>distToGeneStart</th>\n",
" <th>peakType</th>\n",
" <th>distToTSS</th>\n",
" <th>nearestTSS</th>\n",
" <th>GC</th>\n",
" <th>idx</th>\n",
" <th>count</th>\n",
" <th>selected</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>chr17:58281664-58282164</th>\n",
" <td>chr17</td>\n",
" <td>58281664</td>\n",
" <td>58282164</td>\n",
" <td>501</td>\n",
" <td>*</td>\n",
" <td>218.00801</td>\n",
" <td>0.988</td>\n",
" <td>0.979</td>\n",
" <td>2</td>\n",
" <td>C13._.cd34_multiome_rep1</td>\n",
" <td>...</td>\n",
" <td>12058</td>\n",
" <td>Promoter</td>\n",
" <td>978</td>\n",
" <td>uc061tab.1</td>\n",
" <td>0.5329</td>\n",
" <td>8061</td>\n",
" <td>1988.0</td>\n",
" <td>True</td>\n",
" <td>-0.009705</td>\n",
" <td>0.389574</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" seqnames start end width strand score \\\n",
"chr17:58281664-58282164 chr17 58281664 58282164 501 * 218.00801 \n",
"\n",
" replicateScoreQuantile groupScoreQuantile \\\n",
"chr17:58281664-58282164 0.988 0.979 \n",
"\n",
" Reproducibility GroupReplicate ... \\\n",
"chr17:58281664-58282164 2 C13._.cd34_multiome_rep1 ... \n",
"\n",
" distToGeneStart peakType distToTSS nearestTSS \\\n",
"chr17:58281664-58282164 12058 Promoter 978 uc061tab.1 \n",
"\n",
" GC idx count selected mean std \n",
"chr17:58281664-58282164 0.5329 8061 1988.0 True -0.009705 0.389574 \n",
"\n",
"[1 rows x 21 columns]"
" distToGeneStart peakType\n",
"chr17:58281664-58282164 12058 Promoter"
]
},
"execution_count": 101,
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We look at the gene promoters \n",
"ad_atac_hv.var[ad_atac_hv.var[\"nearestGene\"] == \"MPO\"]"
"ad_atac_hv.var[ad_atac_hv.var[\"nearestGene\"] == \"MPO\"][[\"distToGeneStart\", \"peakType\"]]"
]
},
{
"cell_type": "code",
"execution_count": 102,
"execution_count": 109,
"metadata": {},
"outputs": [
{
@@ -1038,143 +992,44 @@
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>seqnames</th>\n",
" <th>start</th>\n",
" <th>end</th>\n",
" <th>width</th>\n",
" <th>strand</th>\n",
" <th>score</th>\n",
" <th>replicateScoreQuantile</th>\n",
" <th>groupScoreQuantile</th>\n",
" <th>Reproducibility</th>\n",
" <th>GroupReplicate</th>\n",
" <th>...</th>\n",
" <th>distToGeneStart</th>\n",
" <th>peakType</th>\n",
" <th>distToTSS</th>\n",
" <th>nearestTSS</th>\n",
" <th>GC</th>\n",
" <th>idx</th>\n",
" <th>count</th>\n",
" <th>selected</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>chr2:66433317-66433817</th>\n",
" <td>chr2</td>\n",
" <td>66433317</td>\n",
" <td>66433817</td>\n",
" <td>501</td>\n",
" <td>*</td>\n",
" <td>31.32676</td>\n",
" <td>0.992</td>\n",
" <td>0.968</td>\n",
" <td>2</td>\n",
" <td>C12._.cd34_multiome_rep1</td>\n",
" <td>...</td>\n",
" <td>115</td>\n",
" <td>Promoter</td>\n",
" <td>96</td>\n",
" <td>uc057hgx.1</td>\n",
" <td>0.5868</td>\n",
" <td>6123</td>\n",
" <td>1997.0</td>\n",
" <td>True</td>\n",
" <td>-0.027297</td>\n",
" <td>0.135696</td>\n",
" </tr>\n",
" <tr>\n",
" <th>chr2:66434847-66435347</th>\n",
" <td>chr2</td>\n",
" <td>66434847</td>\n",
" <td>66435347</td>\n",
" <td>501</td>\n",
" <td>*</td>\n",
" <td>205.71082</td>\n",
" <td>0.989</td>\n",
" <td>0.979</td>\n",
" <td>2</td>\n",
" <td>C6._.cd34_multiome_rep1</td>\n",
" <td>...</td>\n",
" <td>1645</td>\n",
" <td>Promoter</td>\n",
" <td>35</td>\n",
" <td>uc057hgx.1</td>\n",
" <td>0.5449</td>\n",
" <td>6125</td>\n",
" <td>2707.0</td>\n",
" <td>True</td>\n",
" <td>-0.015174</td>\n",
" <td>0.114867</td>\n",
" </tr>\n",
" <tr>\n",
" <th>chr2:66481155-66481655</th>\n",
" <td>chr2</td>\n",
" <td>66481155</td>\n",
" <td>66481655</td>\n",
" <td>501</td>\n",
" <td>*</td>\n",
" <td>203.22998</td>\n",
" <td>0.988</td>\n",
" <td>0.978</td>\n",
" <td>2</td>\n",
" <td>C6._.cd34_multiome_rep1</td>\n",
" <td>...</td>\n",
" <td>47953</td>\n",
" <td>Intronic</td>\n",
" <td>14620</td>\n",
" <td>uc057hgx.1</td>\n",
" <td>0.4671</td>\n",
" <td>6142</td>\n",
" <td>2630.0</td>\n",
" <td>True</td>\n",
" <td>-0.018967</td>\n",
" <td>0.189116</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" seqnames start end width strand score \\\n",
"chr2:66433317-66433817 chr2 66433317 66433817 501 * 31.32676 \n",
"chr2:66434847-66435347 chr2 66434847 66435347 501 * 205.71082 \n",
"chr2:66481155-66481655 chr2 66481155 66481655 501 * 203.22998 \n",
"\n",
" replicateScoreQuantile groupScoreQuantile \\\n",
"chr2:66433317-66433817 0.992 0.968 \n",
"chr2:66434847-66435347 0.989 0.979 \n",
"chr2:66481155-66481655 0.988 0.978 \n",
"\n",
" Reproducibility GroupReplicate ... \\\n",
"chr2:66433317-66433817 2 C12._.cd34_multiome_rep1 ... \n",
"chr2:66434847-66435347 2 C6._.cd34_multiome_rep1 ... \n",
"chr2:66481155-66481655 2 C6._.cd34_multiome_rep1 ... \n",
"\n",
" distToGeneStart peakType distToTSS nearestTSS \\\n",
"chr2:66433317-66433817 115 Promoter 96 uc057hgx.1 \n",
"chr2:66434847-66435347 1645 Promoter 35 uc057hgx.1 \n",
"chr2:66481155-66481655 47953 Intronic 14620 uc057hgx.1 \n",
"\n",
" GC idx count selected mean std \n",
"chr2:66433317-66433817 0.5868 6123 1997.0 True -0.027297 0.135696 \n",
"chr2:66434847-66435347 0.5449 6125 2707.0 True -0.015174 0.114867 \n",
"chr2:66481155-66481655 0.4671 6142 2630.0 True -0.018967 0.189116 \n",
"\n",
"[3 rows x 21 columns]"
" distToGeneStart peakType\n",
"chr2:66433317-66433817 115 Promoter\n",
"chr2:66434847-66435347 1645 Promoter\n",
"chr2:66481155-66481655 47953 Intronic"
]
},
"execution_count": 102,
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ad_atac_hv.var[ad_atac_hv.var[\"nearestGene\"] == \"MEIS1\"]"
"ad_atac_hv.var[ad_atac_hv.var[\"nearestGene\"] == \"MEIS1\"][[\"distToGeneStart\", \"peakType\"]]"
]
},
{
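The notebook change narrows each printed table from 21 columns to two by following the boolean row filter on `nearestGene` with a column list. A minimal pandas sketch of the same selection, using `.loc` to filter rows and pick columns in a single indexing step; `var` is a small hypothetical stand-in for `ad_atac_hv.var`, filled with the rows visible in the outputs of this diff, and `peaks_near` is a made-up helper name.

```python
import pandas as pd

# Toy stand-in for ad_atac_hv.var: only 3 of the 21 columns are reproduced.
var = pd.DataFrame(
    {
        "nearestGene": ["MPO", "MEIS1", "MEIS1", "MEIS1"],
        "distToGeneStart": [12058, 115, 1645, 47953],
        "peakType": ["Promoter", "Promoter", "Promoter", "Intronic"],
    },
    index=[
        "chr17:58281664-58282164",
        "chr2:66433317-66433817",
        "chr2:66434847-66435347",
        "chr2:66481155-66481655",
    ],
)


def peaks_near(var: pd.DataFrame, gene: str) -> pd.DataFrame:
    """Distance to gene start and peak type for every peak whose nearest gene is `gene`."""
    # A boolean mask plus a column list in one .loc call avoids chained indexing.
    return var.loc[var["nearestGene"] == gene, ["distToGeneStart", "peakType"]]


print(peaks_near(var, "MEIS1"))
```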
