Commit

documentation in progress
Militeee committed Apr 10, 2024
1 parent 1625e69 commit 21f2e72
Showing 10 changed files with 663 additions and 77 deletions.
32 changes: 25 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# Deep Archetypal Analysis for Representation and Learning of Omics data (DAARIO)

DAARIO is a package for performing Deep Archetypal Analysis on multi-omics data, designed to help researchers, bioinformaticians, and data scientists uncover hidden archetypes in complex, high-dimensional multi-omics datasets. The documentation can be found here: [https://sottorivalab.github.io/daario/](https://sottorivalab.github.io/daario/)

<img src="https://github.com/sottorivalab/daario/logo.png" width="400px" align="left">

*The package is under active development, expect breaking changes and incomplete documentation for a bit.*
*I'll try my best to speed this up; if something is broken or you need help please open an issue, don't be shy!*
@@ -20,18 +22,19 @@ poetry install

DAARIO encodes your multi-modal data into a latent simplex:

$$
\mathbf{Z^*} = \mathbf{A} \mathbf{B} \mathbf{Z}
$$
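The factorization above can be sketched in a few lines of plain NumPy (an illustrative toy, not DAARIO's implementation): the rows of $\mathbf{A}$ and $\mathbf{B}$ are convex weights, so the reconstruction lives inside the simplex spanned by the archetypes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 100, 4, 10                    # data points, archetypes, latent dimensions

Z = rng.normal(size=(N, D))             # latent codes
B = rng.dirichlet(np.ones(N), size=K)   # (K, N): archetypes as convex mixes of points
A = rng.dirichlet(np.ones(K), size=N)   # (N, K): points as convex mixes of archetypes

archetypes = B @ Z                      # (K, D) vertices of the latent simplex
Z_star = A @ archetypes                 # (N, D) reconstruction A B Z

# rows of A and B are convex weights (non-negative, summing to 1)
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```

In DAARIO the matrices are not sampled like this but learned, as described below; the shapes and simplex constraints are the same.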


DAARIO learns the matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{Z}$ in an amortized fashion: we learn a function that takes as input the different data modalities $\mathbf{X_g}$, indexed by $g$, and outputs the 3 matrices. As you might have guessed from the name, we parametrize this function as a neural network.
The network is implemented in a Variational Autoencoder fashion, so we have an encoding and a decoding function, as well as a probabilistic definition of the matrix factorization problem above.
Both the encoder and the decoder have a shared portion where data fusion occurs and an independent piece where modality specific encoding and decoding takes place.
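To make "amortized" concrete: instead of optimizing the simplex weights $\mathbf{A}$ for each sample directly, a network maps each input to its weights. A minimal sketch, assuming a hypothetical single linear layer plus softmax (not DAARIO's actual encoder architecture):

```python
import numpy as np

def softmax(logits, axis=-1):
    # numerically stable softmax: rows become non-negative and sum to 1
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
N, G, K = 50, 20, 4                      # cells, features of one modality, archetypes
X_g = rng.poisson(5.0, size=(N, G)).astype(float)  # toy count data for modality g
W = rng.normal(scale=0.1, size=(G, K))   # stand-in for learned encoder weights

A = softmax(X_g @ W)                     # (N, K): every row lies on the simplex
assert np.allclose(A.sum(axis=1), 1.0)
```

Because the weights are produced by a function of $\mathbf{X_g}$, new samples get simplex coordinates with a single forward pass rather than a fresh optimization.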


If you are happy with that, we have some cool tutorials that will show you how to use DAARIO on real [single modality](https://sottorivalab.github.io/daario/scRNA_single_modality.html) and [multimodal](https://sottorivalab.github.io/daario/scMulti_multimodal.ipynb) data.

Otherwise, the best way to start is to read [this](https://sottorivalab.github.io/daario/daario_long_form.html) or the companion [paper](https://www.biorxiv.org/content/10.1101/2024.04.05.588238v1), to understand what DAARIO actually does in detail and which [parameters](https://sottorivalab.github.io/daario/implementation_and_parameters.ipynb) you can play with.


A minimal example to run the tool:
@@ -60,7 +63,7 @@ aa_result = daa.fit_deepAA(

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

## ToDos (slow but steady) 🔨

- [ ] Final API Documentation
- [ ] Tutorial math on AA
@@ -71,6 +74,21 @@ Interested in contributing? Check out the contributing guidelines. Please note t
- [ ] Provide some module builders
- [ ] Test batch/covariate correction in latent space

## Citation

If you have used DAARIO in your research, consider citing:
```bibtex
@article{milite2024,
author = {Salvatore Milite and Giulio Caravagna and Andrea Sottoriva},
title = {Interpretable Multi-Omics Data Integration with Deep Archetypal Analysis},
year = {2024},
doi = {10.1101/2024.04.05.588238},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2024/04/09/2024.04.05.588238},
journal = {bioRxiv}
}
```

## License

`daario` was created by Salvatore Milite. It is licensed under the terms of the MIT license.
12 changes: 10 additions & 2 deletions docs/conf.py
@@ -12,7 +12,6 @@

nb_execution_mode = "off"


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
@@ -23,7 +22,7 @@
"autoapi.extension",
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinx.ext.githubpages"
]
autoapi_dirs = ["../src"]

@@ -38,3 +37,12 @@
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"


myst_enable_extensions = [
"amsmath",
"colon_fence",
"deflist",
"dollarmath",
"html_image"
]
400 changes: 394 additions & 6 deletions docs/daario_long_form.ipynb

Large diffs are not rendered by default.

38 changes: 38 additions & 0 deletions docs/implementation_and_parameters.ipynb
@@ -0,0 +1,38 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Finding your way in DAARIO's interface"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea behind DAARIO is to give great flexibility in structuring the network and in tuning most of the hyperparameters, both at the level of architecture and of inference. The very definition of the training interface can look quite scary at first, but don't despair: this notebook explains exactly which knobs move what. As nice as it is to have easy-to-use tools with few parameters, I am convinced that knowing exactly what you are running, in great detail, lets you get better results and (maybe) learn something new.\n",
"\n",
"Let us begin with a brief idea of how the package is structured: \n",
"* First we have an interface function that allows us to do an entire training cycle at fixed parameters and takes care of almost everything.\n",
"* The probabilistic model is defined in Pyro and has a model function that describes the generative process and a guide function that describes the variational distributions used for inference.\n",
"* The two most important parts of the model, i.e., the encoder and the decoder, are implemented as PyTorch modules.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
227 changes: 167 additions & 60 deletions docs/scRNA_single_modality.ipynb


Binary file added logo.png
4 changes: 2 additions & 2 deletions notebooks/test_encoder_decoder.ipynb
@@ -895,9 +895,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "scdeepaa",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
19 changes: 19 additions & 0 deletions scMulti_multimodal.ipynb.ipynb
@@ -0,0 +1,19 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
1 change: 1 addition & 0 deletions src/daario/Interface.py
@@ -469,6 +469,7 @@ def handle_model_matrix(params_run, model_matrix, reconstruct_input_and_side, A,
return params_run

def calculate_loss(params_run, fix_Z, deepAA, input_matrix, model_matrix, normalization_factor, side_matrices, loss_weights_reconstruction, loss_weights_side):

if fix_Z:
input_loss, side_loss, weights_reconstruction, weights_side, input_loss_no_reg, side_loss_no_reg, total_loss, Z_loss_no_reg, Z_loss = deepAA.model(input_matrix,
model_matrix, normalization_factor, side_matrices, loss_weights_reconstruction, loss_weights_side)
7 changes: 7 additions & 0 deletions src/daario/Utils_net.py
@@ -3,6 +3,13 @@
import math


def build_multimodal_linear_encoder():
    """Placeholder for a multimodal linear encoder builder (not yet implemented)."""
    pass

def build_multimodal_linear_decoder():
    """Placeholder for a multimodal linear decoder builder (not yet implemented)."""
    pass


def build_conv_layer(in_channels, out_channels, flatten = False, kernel_size=3, stride=1, padding=1, pool_size=2, pool_stride=2):
if flatten:
layer = nn.Sequential(
