Commit

documentation in progress
Militeee committed Apr 10, 2024
1 parent 1625e69 commit 21f2e72
Showing 10 changed files with 663 additions and 77 deletions.
32 changes: 25 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# Deep Archetypal Analysis for Representation and Learning of Omics data (DAARIO)

DAARIO is a package for performing Deep Archetypal Analysis on multi-omics data, designed to help researchers, bioinformaticians, and data scientists uncover hidden archetypes in complex, high-dimensional multi-omics datasets. The documentation can be found here: [https://sottorivalab.github.io/daario/](https://sottorivalab.github.io/daario/)

<img src="https://github.com/sottorivalab/daario/logo.png" width="400px" align="left">

*The package is under active development, expect breaking changes and incomplete documentation for a bit.*
*I'll try my best to speed this up; if something is broken or you need help please open an issue, don't be shy!*
@@ -20,18 +22,19 @@ poetry install

DAARIO encodes your multi-modal data into a latent simplex:

$$
\mathbf{Z^*} = \mathbf{A} \mathbf{B} \mathbf{Z}
$$
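The factorization above can be sketched in a few lines of plain NumPy (an illustrative toy, not DAARIO's implementation): the rows of $\mathbf{A}$ and $\mathbf{B}$ are convex weights, so the reconstruction lives inside the simplex spanned by the archetypes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 100, 4, 10                    # data points, archetypes, latent dimensions

Z = rng.normal(size=(N, D))             # latent codes
B = rng.dirichlet(np.ones(N), size=K)   # (K, N): archetypes as convex mixes of points
A = rng.dirichlet(np.ones(K), size=N)   # (N, K): points as convex mixes of archetypes

archetypes = B @ Z                      # (K, D) vertices of the latent simplex
Z_star = A @ archetypes                 # (N, D) reconstruction A B Z

# rows of A and B are convex weights (non-negative, summing to 1)
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```

In DAARIO the matrices are not sampled like this but learned, as described below; the shapes and simplex constraints are the same.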


DAARIO learns the matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{Z}$ in an amortized fashion: we learn a function that takes as input the different data modalities $\mathbf{X_g}$, indexed by $g$, and outputs the 3 matrices. As you might have guessed from the name, we parametrize this function as a neural network.
The network is implemented in a Variational Autoencoder fashion, so we have an encoding and a decoding function, as well as a probabilistic definition of the matrix factorization problem above.
Both the encoder and the decoder have a shared portion where data fusion occurs and an independent piece where modality specific encoding and decoding takes place.
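To make "amortized" concrete: instead of optimizing the simplex weights $\mathbf{A}$ for each sample directly, a network maps each input to its weights. A minimal sketch, assuming a hypothetical single linear layer plus softmax (not DAARIO's actual encoder architecture):

```python
import numpy as np

def softmax(logits, axis=-1):
    # numerically stable softmax: rows become non-negative and sum to 1
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
N, G, K = 50, 20, 4                      # cells, features of one modality, archetypes
X_g = rng.poisson(5.0, size=(N, G)).astype(float)  # toy count data for modality g
W = rng.normal(scale=0.1, size=(G, K))   # stand-in for learned encoder weights

A = softmax(X_g @ W)                     # (N, K): every row lies on the simplex
assert np.allclose(A.sum(axis=1), 1.0)
```

Because the weights are produced by a function of $\mathbf{X_g}$, new samples get simplex coordinates with a single forward pass rather than a fresh optimization.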


If you are happy with that, we have some cool tutorials that will show you how to use DAARIO on real [single modality](https://sottorivalab.github.io/daario/scRNA_single_modality.html) and [multimodal](https://sottorivalab.github.io/daario/scMulti_multimodal.ipynb) data.

Otherwise, the best way to start is to read [this](https://sottorivalab.github.io/daario/daario_long_form.html) or the companion [paper](https://www.biorxiv.org/content/10.1101/2024.04.05.588238v1), to understand what DAARIO actually does in detail and which [parameters](https://sottorivalab.github.io/daario/implementation_and_parameters.ipynb) you can play with.


A minimal example to run the tool:
@@ -60,7 +63,7 @@ aa_result = daa.fit_deepAA(

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

## ToDos (slow but steady) 🔨

- [ ] Final API Documentation
- [ ] Tutorial math on AA
@@ -71,6 +74,21 @@ Interested in contributing? Check out the contributing guidelines. Please note t
- [ ] Provide some module builders
- [ ] Test batch/covariate correction in latent space

## Citation

If you have used DAARIO in your research, consider citing:
```bibtex
@article{milite2024,
author = {Salvatore Milite and Giulio Caravagna and Andrea Sottoriva},
title = {Interpretable Multi-Omics Data Integration with Deep Archetypal Analysis},
year = {2024},
doi = {10.1101/2024.04.05.588238},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2024/04/09/2024.04.05.588238},
journal = {bioRxiv}
}
```

## License

`daario` was created by Salvatore Milite. It is licensed under the terms of the MIT license.
12 changes: 10 additions & 2 deletions docs/conf.py
@@ -12,7 +12,6 @@

nb_execution_mode = "off"


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
@@ -23,7 +22,7 @@
"autoapi.extension",
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinx.ext.githubpages"
]
autoapi_dirs = ["../src"]

@@ -38,3 +37,12 @@
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"


myst_enable_extensions = [
"amsmath",
"colon_fence",
"deflist",
"dollarmath",
"html_image"
]
400 changes: 394 additions & 6 deletions docs/daario_long_form.ipynb

Large diffs are not rendered by default.

38 changes: 38 additions & 0 deletions docs/implementation_and_parameters.ipynb
@@ -0,0 +1,38 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Finding your way in DAARIO's interface"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea behind DAARIO is to give great flexibility in structuring the network and in tuning most of the hyperparameters, both at the level of architecture and of inference. The very definition of the training interface can look quite scary at first, but don't despair: this notebook explains exactly which knobs move what. As nice as it is to have easy-to-use tools with few parameters, I am convinced that knowing exactly what you are running, in great detail, lets you get better results and (maybe) learn something new.\n",
"\n",
"Let us begin with a brief idea of how the package is structured: \n",
"* First we have an interface function that allows us to do an entire training cycle at fixed parameters and takes care of almost everything.\n",
"* The probabilistic model is defined in Pyro and has a model function that describes the generative process and a guide function that describes the variational distributions used for inference.\n",
"* The two most important parts of the model, i.e., the encoder and the decoder, are implemented as PyTorch modules.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
227 changes: 167 additions & 60 deletions docs/scRNA_single_modality.ipynb


Binary file added logo.png
4 changes: 2 additions & 2 deletions notebooks/test_encoder_decoder.ipynb
@@ -895,9 +895,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "scdeepaa",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
19 changes: 19 additions & 0 deletions scMulti_multimodal.ipynb.ipynb
@@ -0,0 +1,19 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
1 change: 1 addition & 0 deletions src/daario/Interface.py
@@ -469,6 +469,7 @@ def handle_model_matrix(params_run, model_matrix, reconstruct_input_and_side, A,
return params_run

def calculate_loss(params_run, fix_Z, deepAA, input_matrix, model_matrix, normalization_factor, side_matrices, loss_weights_reconstruction, loss_weights_side):

if fix_Z:
input_loss, side_loss, weights_reconstruction, weights_side, input_loss_no_reg, side_loss_no_reg, total_loss, Z_loss_no_reg, Z_loss = deepAA.model(input_matrix,
model_matrix, normalization_factor, side_matrices, loss_weights_reconstruction, loss_weights_side)
7 changes: 7 additions & 0 deletions src/daario/Utils_net.py
@@ -3,6 +3,13 @@
import math


def build_multimodal_linear_encoder():
    """Placeholder for a multimodal linear encoder builder (not yet implemented)."""
    pass

def build_multimodal_linear_decoder():
    """Placeholder for a multimodal linear decoder builder (not yet implemented)."""
    pass


def build_conv_layer(in_channels, out_channels, flatten = False, kernel_size=3, stride=1, padding=1, pool_size=2, pool_stride=2):
if flatten:
layer = nn.Sequential(
