fixed broken links into readme

sottorivalab · Apr 12, 2024
commit 3748925 · 1 parent 6ebe35b

Showing 5 changed files with 31 additions and 179 deletions.
diff --git a/README.md b/README.md
@@ -2,10 +2,9 @@
 
 MIDAA is a package designed for performing Deep Archetypal Analysis on multiomics data. The documentation can be found here [https://sottorivalab.github.io/midaa/](https://sottorivalab.github.io/midaa/)
 
-####  *The package is under active development, expect breaking changes (we just changed the tool name ;) ) and incomplete documentat for a bit*
-#### *I'll try my best to speed this up, if something is broken or you need help please open an issue, dontt be shy!*
-<br />
-
+*The package is under active development, expect breaking changes (we just changed the tool name ;) ) and incomplete documentation for a bit*
+*I'll try my best to speed this up, if something is broken or you need help please open an issue, do not be shy!*
+<br/><br/>
 <img src="https://github.com/sottorivalab/daario/blob/69f8399cadfcb10ba1bc483cd4405b823efda64c/logo.png?raw=true" width="200px" align="left">
 
 
@@ -19,7 +18,7 @@ git clone https://github.com/sottorivalab/midaa.git
 # you need poetry installed
 poetry install 
 ```
-<br />
+<br/><br/>
 
 
 ## Quick Start
@@ -36,9 +35,9 @@ midaa learns the matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{Z}$ in an amort
 The network is implemented in a Variational Autoencoder fashion, so we have an encoding and a decoding function as well as a probabilistic definition of the matrix factorization problem above.
 Both the encoder and the decoder have a shared portion where data fusion occurs and an independent piece where modality specific encoding and decoding takes place.
 
-If you are happy with that we have some cool tutorials that will show you how to use MIDAA on real [multi-omics data](https://sottorivalab.github.io/midaa/scMulti_multimodal.ipynb).
+If you are happy with that we have some cool tutorials that will show you how to use MIDAA on real [multi-omics data](https://sottorivalab.github.io/daario/scMulti_multimodal.ipynb).
 
-Otherwise, the best way to start is to read [this](https://sottorivalab.github.io/midaa/midaa_long_form.html) or the companion [paper](https://www.biorxiv.org/content/10.1101/2024.04.05.588238v1) and understand what MIDAA actually does in details and what are the [parameters](https://sottorivalab.github.io/midaa/implementation_and_parameters.ipynb) you can play with.
+Otherwise, the best way to start is to read [this](https://sottorivalab.github.io/daario/midaa_long_form.html) or the companion [paper](https://www.biorxiv.org/content/10.1101/2024.04.05.588238v1) and understand what MIDAA actually does in details and what are the [parameters](https://sottorivalab.github.io/daario/implementation_and_parameters.ipynb) you can play with.
 
 
 A minimal example to run the tool:

diff --git a/docs/implementation_and_parameters.ipynb b/docs/implementation_and_parameters.ipynb
@@ -82,7 +82,7 @@
    "source": [
     "Let me also introduce the 4 main parameters of MIDAA:\n",
     "* The input data, it should be provided as a list of numpy arrays, one for each modality\n",
-    "* A normalization factor, this is especially useful when you work with raw counts. The normalization factors are modality specific and are applied before computing the likelihood. For instance if we call $\\beta$ the output of the last layer of the decoder and the normalization factors as $\\nu$$ and our likelihood of choice is Poisson then the rate of the Poisson is gonna be computed as $exp(\\beta) * \\nu$.  \n",
+    "* A normalization factor, this is especially useful when you work with raw counts. The normalization factors are modality specific and are applied before computing the likelihood. For instance if we call $\\beta$ the output of the last layer of the decoder and the normalization factors as $\\nu$ and our likelihood of choice is Poisson then the rate of the Poisson is gonna be computed as $exp(\\beta) * \\nu$.  \n",
     "*  The likelihood used to compute the reconstruction loss of the data. We currently support: Gaussian (G), Poisson (P), Negative Binomial (NB), Categorical (C), Bernoulli (B), Beta (Beta). Again, likelihoods are modality specific and are provided as a list of strings\n",
     "*  The number of archetypes to fit"
    ]
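
The Poisson case described in the hunk above can be illustrated with a small NumPy sketch. It is not MIDAA's code: the function names are hypothetical, and treating $\nu$ as one scaling factor per sample (for example a library size) is an assumption made only for this example. It shows how the rate $exp(\beta) * \nu$ would enter a Poisson log-likelihood.

```python
import numpy as np
from math import lgamma


def poisson_rate(beta, nu):
    """Rate of the Poisson likelihood as described in the text: exp(beta) * nu."""
    return np.exp(beta) * nu


def poisson_log_likelihood(counts, rate):
    """Element-wise Poisson log-pmf: k * log(rate) - rate - log(k!)."""
    log_factorial = np.vectorize(lgamma)(counts + 1.0)
    return counts * np.log(rate) - rate - log_factorial


# Hypothetical decoder output for 2 samples x 2 features (log scale).
beta = np.array([[0.0, np.log(2.0)],
                 [np.log(0.5), 0.0]])
# Assumed per-sample normalization factors, broadcast over features.
nu = np.array([[10.0], [100.0]])
# Toy raw counts with the same shape as beta.
counts = np.array([[10.0, 20.0],
                   [50.0, 100.0]])

rate = poisson_rate(beta, nu)
log_lik = poisson_log_likelihood(counts, rate)
```

Because the factor multiplies the exponentiated decoder output, the network only has to model relative abundances while $\nu$ absorbs the overall scale of the raw counts.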
@@ -104,7 +104,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Training Parameters\n",
+    "## Training Parameters\n",
     "\n",
     "The two main parameters you can change are the number of steps and the learning rate. In our simulations we find that learning rates between 1e-4 and 1e-3 work well. The number of steps really depends on the problem, but generally >500 is enough to get a decent model. We just want to highlight that for us a step means a complete epoch, so the number of actual gradient iterations will depend on the batch size and the number of samples."
    ]
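
Since a step here means a full epoch, the number of gradient updates can be estimated with a little arithmetic. The helper below is a hypothetical illustration, not part of MIDAA; whether the last partial batch is kept or dropped is an assumption exposed through the `drop_last` flag.

```python
import math


def gradient_iterations(n_steps, n_samples, batch_size, drop_last=False):
    """Total gradient updates when each step is one pass over the data."""
    if drop_last:
        # Assumption: an incomplete final batch is discarded.
        batches_per_epoch = n_samples // batch_size
    else:
        # Assumption: an incomplete final batch still triggers an update.
        batches_per_epoch = math.ceil(n_samples / batch_size)
    return n_steps * batches_per_epoch


# 500 steps over 10,000 samples with batches of 256 samples.
total_updates = gradient_iterations(500, 10_000, 256)
total_updates_drop_last = gradient_iterations(500, 10_000, 256, drop_last=True)
```

With these toy numbers an epoch has 40 batches (39 if the last one is dropped), so 500 steps correspond to roughly 20,000 gradient updates.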
@@ -383,7 +383,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Network parameters "
+    "## Network parameters "
    ]
   },
   {

diff --git a/docs/index.md b/docs/index.md
@@ -6,10 +6,8 @@
 :hidden:
 
 midaa_long_form.ipynb
-scRNA_single_modality.ipynb
-multiomics_10x_single_cell.ipynb
-multimodal_bulk_CLL.ipynb
-
+implementation_and_parameters.ipynb
+scMulti_multimodal.ipynb
 changelog.md
 contributing.md
 conduct.md

diff --git a/docs/daario_long_form.ipynb b/docs/midaa_long_form.ipynb
rename from docs/daario_long_form.ipynb
rename to docs/midaa_long_form.ipynb
diff --git a/docs/scMulti_multimodal.ipynb b/docs/scMulti_multimodal.ipynb
@@ -8,6 +8,13 @@
     "# MIDAA 101 (on 10X multiome)\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 10x Multiomics analysis\n"
+   ]
+  },
   {
    "attachments": {},
    "cell_type": "markdown",
@@ -799,7 +806,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Generating new data\n",
+    "## Generating new data\n",
     "\n",
     "What I would like to show in this last part of the tutorial, which is less straightforward, is how we can use our model in a generative fashion to simulate synthetic data. The idea is that we specify some coordinates in the simplex and then sample from the corresponding Dirichlet distribution. For instance, in this example we will sample a dataset mostly composed of 2 archetypes"
    ]
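
The sampling idea in the hunk above can be sketched with NumPy alone. This is only an illustration under stated assumptions: the concentration vector `alpha` and the choice of 3 archetypes are hypothetical, and decoding the sampled weights through the trained model is not shown because that API does not appear in this excerpt.

```python
import numpy as np

# Fixed seed so the toy example is reproducible.
rng = np.random.default_rng(0)

# Assumed concentration parameters for 3 archetypes: large values on the
# first two and a small value on the third put most of the mass on mixtures
# of archetypes 1 and 2.
alpha = np.array([5.0, 5.0, 0.1])

# Each row is a point in the simplex (non-negative weights summing to 1).
weights = rng.dirichlet(alpha, size=1000)

# Fraction of each sample explained by the first two archetypes.
mass_on_first_two = weights[:, :2].sum(axis=1)
```

Raising or lowering the entries of `alpha` moves the samples toward or away from the corresponding archetypes, which is what makes it possible to simulate a dataset dominated by a chosen subset.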
@@ -906,7 +913,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 101,
+   "execution_count": 111,
    "metadata": {},
    "outputs": [
     {
@@ -930,91 +937,38 @@
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
-       "      <th>seqnames</th>\n",
-       "      <th>start</th>\n",
-       "      <th>end</th>\n",
-       "      <th>width</th>\n",
-       "      <th>strand</th>\n",
-       "      <th>score</th>\n",
-       "      <th>replicateScoreQuantile</th>\n",
-       "      <th>groupScoreQuantile</th>\n",
-       "      <th>Reproducibility</th>\n",
-       "      <th>GroupReplicate</th>\n",
-       "      <th>...</th>\n",
        "      <th>distToGeneStart</th>\n",
        "      <th>peakType</th>\n",
-       "      <th>distToTSS</th>\n",
-       "      <th>nearestTSS</th>\n",
-       "      <th>GC</th>\n",
-       "      <th>idx</th>\n",
-       "      <th>count</th>\n",
-       "      <th>selected</th>\n",
-       "      <th>mean</th>\n",
-       "      <th>std</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>chr17:58281664-58282164</th>\n",
-       "      <td>chr17</td>\n",
-       "      <td>58281664</td>\n",
-       "      <td>58282164</td>\n",
-       "      <td>501</td>\n",
-       "      <td>*</td>\n",
-       "      <td>218.00801</td>\n",
-       "      <td>0.988</td>\n",
-       "      <td>0.979</td>\n",
-       "      <td>2</td>\n",
-       "      <td>C13._.cd34_multiome_rep1</td>\n",
-       "      <td>...</td>\n",
        "      <td>12058</td>\n",
        "      <td>Promoter</td>\n",
-       "      <td>978</td>\n",
-       "      <td>uc061tab.1</td>\n",
-       "      <td>0.5329</td>\n",
-       "      <td>8061</td>\n",
-       "      <td>1988.0</td>\n",
-       "      <td>True</td>\n",
-       "      <td>-0.009705</td>\n",
-       "      <td>0.389574</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
-       "<p>1 rows × 21 columns</p>\n",
        "</div>"
       ],
       "text/plain": [
-       "                        seqnames     start       end  width strand      score  \\\n",
-       "chr17:58281664-58282164    chr17  58281664  58282164    501      *  218.00801   \n",
-       "\n",
-       "                         replicateScoreQuantile  groupScoreQuantile  \\\n",
-       "chr17:58281664-58282164                   0.988               0.979   \n",
-       "\n",
-       "                         Reproducibility            GroupReplicate  ...  \\\n",
-       "chr17:58281664-58282164                2  C13._.cd34_multiome_rep1  ...   \n",
-       "\n",
-       "                        distToGeneStart  peakType distToTSS  nearestTSS  \\\n",
-       "chr17:58281664-58282164           12058  Promoter       978  uc061tab.1   \n",
-       "\n",
-       "                             GC   idx   count  selected      mean       std  \n",
-       "chr17:58281664-58282164  0.5329  8061  1988.0      True -0.009705  0.389574  \n",
-       "\n",
-       "[1 rows x 21 columns]"
+       "                         distToGeneStart  peakType\n",
+       "chr17:58281664-58282164            12058  Promoter"
       ]
      },
-     "execution_count": 101,
+     "execution_count": 111,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
     "# We look at the gene promoters \n",
-    "ad_atac_hv.var[ad_atac_hv.var[\"nearestGene\"] == \"MPO\"]"
+    "ad_atac_hv.var[ad_atac_hv.var[\"nearestGene\"] == \"MPO\"][[\"distToGeneStart\", \"peakType\"]]"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 102,
+   "execution_count": 109,
    "metadata": {},
    "outputs": [
     {
@@ -1038,143 +992,44 @@
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
-       "      <th>seqnames</th>\n",
-       "      <th>start</th>\n",
-       "      <th>end</th>\n",
-       "      <th>width</th>\n",
-       "      <th>strand</th>\n",
-       "      <th>score</th>\n",
-       "      <th>replicateScoreQuantile</th>\n",
-       "      <th>groupScoreQuantile</th>\n",
-       "      <th>Reproducibility</th>\n",
-       "      <th>GroupReplicate</th>\n",
-       "      <th>...</th>\n",
        "      <th>distToGeneStart</th>\n",
        "      <th>peakType</th>\n",
-       "      <th>distToTSS</th>\n",
-       "      <th>nearestTSS</th>\n",
-       "      <th>GC</th>\n",
-       "      <th>idx</th>\n",
-       "      <th>count</th>\n",
-       "      <th>selected</th>\n",
-       "      <th>mean</th>\n",
-       "      <th>std</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>chr2:66433317-66433817</th>\n",
-       "      <td>chr2</td>\n",
-       "      <td>66433317</td>\n",
-       "      <td>66433817</td>\n",
-       "      <td>501</td>\n",
-       "      <td>*</td>\n",
-       "      <td>31.32676</td>\n",
-       "      <td>0.992</td>\n",
-       "      <td>0.968</td>\n",
-       "      <td>2</td>\n",
-       "      <td>C12._.cd34_multiome_rep1</td>\n",
-       "      <td>...</td>\n",
        "      <td>115</td>\n",
        "      <td>Promoter</td>\n",
-       "      <td>96</td>\n",
-       "      <td>uc057hgx.1</td>\n",
-       "      <td>0.5868</td>\n",
-       "      <td>6123</td>\n",
-       "      <td>1997.0</td>\n",
-       "      <td>True</td>\n",
-       "      <td>-0.027297</td>\n",
-       "      <td>0.135696</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>chr2:66434847-66435347</th>\n",
-       "      <td>chr2</td>\n",
-       "      <td>66434847</td>\n",
-       "      <td>66435347</td>\n",
-       "      <td>501</td>\n",
-       "      <td>*</td>\n",
-       "      <td>205.71082</td>\n",
-       "      <td>0.989</td>\n",
-       "      <td>0.979</td>\n",
-       "      <td>2</td>\n",
-       "      <td>C6._.cd34_multiome_rep1</td>\n",
-       "      <td>...</td>\n",
        "      <td>1645</td>\n",
        "      <td>Promoter</td>\n",
-       "      <td>35</td>\n",
-       "      <td>uc057hgx.1</td>\n",
-       "      <td>0.5449</td>\n",
-       "      <td>6125</td>\n",
-       "      <td>2707.0</td>\n",
-       "      <td>True</td>\n",
-       "      <td>-0.015174</td>\n",
-       "      <td>0.114867</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>chr2:66481155-66481655</th>\n",
-       "      <td>chr2</td>\n",
-       "      <td>66481155</td>\n",
-       "      <td>66481655</td>\n",
-       "      <td>501</td>\n",
-       "      <td>*</td>\n",
-       "      <td>203.22998</td>\n",
-       "      <td>0.988</td>\n",
-       "      <td>0.978</td>\n",
-       "      <td>2</td>\n",
-       "      <td>C6._.cd34_multiome_rep1</td>\n",
-       "      <td>...</td>\n",
        "      <td>47953</td>\n",
        "      <td>Intronic</td>\n",
-       "      <td>14620</td>\n",
-       "      <td>uc057hgx.1</td>\n",
-       "      <td>0.4671</td>\n",
-       "      <td>6142</td>\n",
-       "      <td>2630.0</td>\n",
-       "      <td>True</td>\n",
-       "      <td>-0.018967</td>\n",
-       "      <td>0.189116</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
-       "<p>3 rows × 21 columns</p>\n",
        "</div>"
       ],
       "text/plain": [
-       "                       seqnames     start       end  width strand      score  \\\n",
-       "chr2:66433317-66433817     chr2  66433317  66433817    501      *   31.32676   \n",
-       "chr2:66434847-66435347     chr2  66434847  66435347    501      *  205.71082   \n",
-       "chr2:66481155-66481655     chr2  66481155  66481655    501      *  203.22998   \n",
-       "\n",
-       "                        replicateScoreQuantile  groupScoreQuantile  \\\n",
-       "chr2:66433317-66433817                   0.992               0.968   \n",
-       "chr2:66434847-66435347                   0.989               0.979   \n",
-       "chr2:66481155-66481655                   0.988               0.978   \n",
-       "\n",
-       "                        Reproducibility            GroupReplicate  ...  \\\n",
-       "chr2:66433317-66433817                2  C12._.cd34_multiome_rep1  ...   \n",
-       "chr2:66434847-66435347                2   C6._.cd34_multiome_rep1  ...   \n",
-       "chr2:66481155-66481655                2   C6._.cd34_multiome_rep1  ...   \n",
-       "\n",
-       "                       distToGeneStart  peakType distToTSS  nearestTSS  \\\n",
-       "chr2:66433317-66433817             115  Promoter        96  uc057hgx.1   \n",
-       "chr2:66434847-66435347            1645  Promoter        35  uc057hgx.1   \n",
-       "chr2:66481155-66481655           47953  Intronic     14620  uc057hgx.1   \n",
-       "\n",
-       "                            GC   idx   count  selected      mean       std  \n",
-       "chr2:66433317-66433817  0.5868  6123  1997.0      True -0.027297  0.135696  \n",
-       "chr2:66434847-66435347  0.5449  6125  2707.0      True -0.015174  0.114867  \n",
-       "chr2:66481155-66481655  0.4671  6142  2630.0      True -0.018967  0.189116  \n",
-       "\n",
-       "[3 rows x 21 columns]"
+       "                        distToGeneStart  peakType\n",
+       "chr2:66433317-66433817              115  Promoter\n",
+       "chr2:66434847-66435347             1645  Promoter\n",
+       "chr2:66481155-66481655            47953  Intronic"
       ]
      },
-     "execution_count": 102,
+     "execution_count": 109,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "ad_atac_hv.var[ad_atac_hv.var[\"nearestGene\"] == \"MEIS1\"]"
+    "ad_atac_hv.var[ad_atac_hv.var[\"nearestGene\"] == \"MEIS1\"][[\"distToGeneStart\", \"peakType\"]]"
    ]
   },
   {