docs: various updates (rebase) #90

Merged
merged 5 commits into from
Aug 23, 2024
28 changes: 19 additions & 9 deletions docs/README.md
@@ -18,8 +18,8 @@ supercharges it by providing the following features:
!!! tip "Manage _ZARP_ run data and resources in one central, configurable location"

Once _ZARP-cli_ is [installed](guides/installation.md) and
[configured](guides/initialization.md), you may be able to _ZARP_ an RNA-Seq library
with a command like this:
[configured](guides/initialization.md), you may be able to _ZARP_ an RNA-Seq
library with a command like this:

```bash
zarp SRA1234567
@@ -34,13 +34,19 @@ zarp SRA1234567

## How does it work?

Briefly, when a _ZARP-cli_ run is triggered, a [_ZARP-cli_ configuration
> _"Any sufficiently advanced technology is indistinguishable from magic."_
> — Arthur C. Clarke

At the risk of demystifying the magic, let's take a look at how _ZARP-cli_
works:

Briefly, when the program is triggered, a [_ZARP-cli_ configuration
object](docstring/config.models.md#Config) is constructed from parsing [default
configuration
settings](guides/initialization.md#modifying-configuration-settings) and
[command-line options](guides/usage.md#command-line-options). A user-specified
list of [sample references](guides/usage.md#sample-references) of various
supported types is then attached to the configuration object and dereferenced
supported types is then attached to the configuration object and de-referenced
to construct a (potentially) sparse data frame of sample metadata. If
necessary, this data frame of samples is then successively completed by
applying various sample processor plugins that are built on tools such as
@@ -60,15 +66,19 @@ information is available to start a _ZARP_ run, the sample will be analyzed.

## How to cite


If you use _ZARP_ in your work (with or without _ZARP-cli_), please kindly cite
the following article:

**ZARP: An automated workflow for processing of RNA-seq data**
**ZARP: A user-friendly and versatile RNA-seq analysis workflow**
_Maria Katsantoni, Foivos Gypas, Christina J. Herrmann, Dominik Burri, Maciej
Bak, Paula Iborra, Krish Agarwal, Meric Ataman, Anastasiya Börsch, Mihaela
Zavolan, Alexander Kanitz_
bioRxiv 2021.11.18.469017
<https://doi.org/10.1101/2021.11.18.469017>
Bak, Paula Iborra, Krish Agarwal, Meric Ataman, Máté Balajti, Noè Pozzan, Niels
Schlusser, Youngbin Moon, Aleksei Mironov, Anastasiya Börsch, Mihaela Zavolan,
Alexander Kanitz_
**F1000Research 2024, 13:533**
<https://doi.org/10.12688/f1000research.149237.1>

[Download BibTeX citation :download: ](https://f1000research.com/articles/exportTo?versionId=163676&bibliographyReaderFormat=BIBTEX){ .md-button }

## Training materials

36 changes: 33 additions & 3 deletions docs/guides/examples.md
@@ -1,6 +1,36 @@
# Examples

!!! warning "Under construction"
This section provides a growing collection of examples that demonstrate how to
use _ZARP-cli_ in various scenarios.

This section is under preparation and will list a number of real-world,
fully functional examples. Please stay tuned.
!!! info "Prerequisites"

The examples below assume that you have already [installed](./installation.md) and [initialized](./initialization.md) _ZARP-cli_.

## Process samples deposited to SRA

Let's have _ZARP-cli_ fetch two samples from SRA, infer all necessary metadata,
fetch the corresponding genome annotations and start a _ZARP_ workflow run on
them:

```sh
zarp SRR23590181 SRR23529108
```

??? tip "I want to verify the inferred metadata first!"

Set the `--execution-mode` parameter to `PREPARE_RUN` to run _ZARP-cli_ up
until the point of the actual _ZARP_ workflow execution:

```sh
zarp --execution-mode=PREPARE_RUN SRR23590181 SRR23529108
```

!!! info "More please!"

You will find a description of more elaborate use cases in the [ZARP
publication](https://doi.org/10.12688/f1000research.149237.1) and the
accompanying [supplementary materials][zarp-supplementary], published on
[Zenodo][zenodo]. The latter include detailed instructions, all necessary
input files and a selection of reference output files to validate your runs
against.
16 changes: 8 additions & 8 deletions docs/guides/initialization.md
@@ -11,8 +11,8 @@ The following simple command triggers the _ZARP-cli_ initialization mode:
zarp --init
```

An interactive screen will guide you through the process. Read
[on](#configuration-options) to find out more about what each of the available
An interactive screen will guide you through the process. [Read
on](#configuration-options) to find out more about what each of the available
options and suggested defaults mean.

??? question "Where is the configuration stored?"
@@ -39,14 +39,14 @@ The following configuration options are available.
| ------ | ----------- | ------- |
| `working_directory` | Root directory for _ZARP-cli_ runs; needs to be writable | `$HOME/.zarp` |
| `zarp_directory` | Path to the local copy of the [ZARP workflow repository][zarp] | `../zarp` relative to the location of the ZARP-cli repository |
| `execution_mode` | Trigger a full _ZARP-cli_ run (`RUN`), a dry run (`DRY_RUN`; external tools are not actually run, only logs what _would be_ run; useful for testing) or prepare a _ZARP_ run (`PREPARE_RUN`; _ZARP-cli_ is run normally, including all external tools, up until the point of the execution of the actual _ZARP_ workflow; use to manually check metadata table before execution) | `RUN` |
| `execution_mode` | Trigger a full _ZARP-cli_ run (`RUN`), a dry run (`DRY_RUN`; external tools are not actually run, only logs what _would be_ run; useful for testing) or prepare a _ZARP_ run (`PREPARE_RUN`; _ZARP-cli_ is run normally, including all external tools, up until the point of the execution of the actual _ZARP_ workflow; use to manually check metadata table before _ZARP_ execution) | `RUN` |
| `cores` | Number of CPU cores that Snakemake is run with when executing _ZARP_ and the auxiliary workflows (fetching libraries from [SRA][sra], inferring metadata) | `1` |
| `dependency_embedding` | Whether Snakemake should use `CONDA` or containers (`SINGULARITY`) to manage dependencies of each workflow step/rule (note that the auxiliary workflows currently have restrictions on which dependency embedding strategy can be used; if an unsupported scheme is suggested, a warning is emitted and the other one is enabled by default) | `CONDA` |
| `genome_assemblies_map` | A headerless 3-column semicolon-separated mapping table of organism/source trivial names (e.g., `homo_sapiens`), optional comma-separated aliases such as NCBI taxon IDs and/or organism/source short names (e.g., `7227,dmelanogaster`) and a corresponding genome assembly name (e.g., `GRCm39`); a table in the required format is shipped with _ZARP_cli_ in the location provided in the default location; which can be amended with additional aliases; note that for [`genomepy`][genomepy] to be able to pull genome annotations for organisms/sources that [HTSinfer][htsinfer] inferred, NCBI taxon ID aliases are _required_ | `./data/genome_assemblies.map` relative to the location of the ZARP-cli repository |
| `dependency_embedding` | Whether Snakemake should use `CONDA` or containers (`SINGULARITY`) to manage dependencies of each workflow step/rule | `CONDA` |
| `genome_assemblies_map` | A headerless 3-column semicolon-separated mapping table of organism/source trivial names (e.g., `homo_sapiens`), optional comma-separated aliases such as NCBI taxon IDs and/or organism/source short names (e.g., `7227,dmelanogaster`) and a corresponding genome assembly name (e.g., `GRCm39`); a table in the required format is shipped with _ZARP-cli_ in the default location; it can be amended with additional aliases; note that for [`genomepy`][genomepy] to be able to pull genome annotations for organisms/sources that [HTSinfer][htsinfer] inferred, NCBI taxon ID aliases are _required_ | `./data/genome_assemblies.map` relative to the location of the ZARP-cli repository |
| `resources_version` | Whether to always download the latest available version of genome annotations for a given organism/source from Ensembl (enter `None`; default) or whether to use a specific version of the corresponding Ensembl database (e.g., `100`); note that the different Ensembl databases (e.g., for fungi, plants) use a different versioning scheme, so pinning a particular database version may lead to unexpected outcomes | `None` |
| `rule_config` | A configuration file for the _ZARP_ workflow that sets specific parameters for each workflow step ("rule"); see [ZARP][zarp] documentation for details | `None` |
| `profile` | Path to [Snakemake profile][snakemake-profiles] to be used for the _ZARP_ workflow. Use this to optimize _ZARP_ for your specific compute environment |
| `fragment_length_distribution_mean` | HTSinfer currently is unable to infer the mean of the fragment length distribution of RNA-seq libraries; however, this value is required for tools [`kallisto`][kallisto] and [`salmon`][salmon] -which are executed as part of _ZARP_- when run on single-ended libraries only (for paired-ended libraries, the tools are able to infer this parameter from the data); the value provided here is used as a fallback if the value was not determined experimentally (e.g., with [Bioanalyzer][bioanalyzer] instruments) and provided via a sample table | `300` |
| `profile` | Path to [Snakemake profile][snakemake-profiles] to be used for the _ZARP_ workflow; use this to optimize _ZARP_ for your specific compute environment |
| `fragment_length_distribution_mean` | HTSinfer currently is unable to infer the mean of the fragment length distribution of RNA-seq libraries; however, this value is required for tools [`kallisto`][kallisto] and [`salmon`][salmon] - which are executed as part of _ZARP_ - when run on single-ended libraries only (for paired-ended libraries, the tools are able to infer this parameter from the data); the value provided here is used as a fallback if the value was not determined experimentally (e.g., with [Bioanalyzer][bioanalyzer] instruments) and provided via a sample table | `300` |
| `fragment_length_distribution_sd` | Analogous to `fragment_length_distribution_mean` above, but this parameter is for the _standard deviation_ of the fragment length distribution | `100` |
| `author` | Name of the person or organization executing the _ZARP-cli_ runs; will be added to the _ZARP_ report | `None` |
| `email` | Email of the person or organization executing the _ZARP-cli_ runs; will be added to the _ZARP_ report | `None` |
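
To make the `genome_assemblies_map` format described in the table more concrete, here is a minimal sketch of such a file together with a simple well-formedness check. The rows are illustrative assumptions and are not copied from the table shipped with _ZARP-cli_; the helper `check_map` and the temporary file are hypothetical and not part of _ZARP-cli_.

```sh
# Illustrative map file: trivial name;comma-separated aliases;assembly name.
# These rows are assumptions for demonstration, not the shipped table.
map_file=$(mktemp)
cat > "$map_file" <<'EOF'
homo_sapiens;9606,hsapiens;GRCh38
mus_musculus;10090,mmusculus;GRCm39
drosophila_melanogaster;7227,dmelanogaster;BDGP6
EOF

# Hypothetical helper: report every line that does not have exactly
# three semicolon-separated fields and return non-zero if any is found.
check_map() {
    awk -F';' '
        NF != 3 {
            printf "line %d: expected 3 fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

check_map "$map_file" && echo "map looks well-formed"
```

Because the aliases column is optional, a row with an empty middle field still passes this check; only the number of separators is verified, not the content of each field.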
@@ -98,4 +98,4 @@ dynamically**:
- [CLI arguments](./usage.md) for individual run- and sample-specific
parameters, if provided
- Sample-specific parameters specified in sample tables **(highest
precendence!)**
precedence!)**
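
The precedence order listed above can be sketched as a "first non-empty value wins" lookup. This is only an illustration of the idea under assumed names, not how _ZARP-cli_ implements it: the helper `resolve` and the variable names are hypothetical, `300` is the documented default for `fragment_length_distribution_mean`, and `250` is an invented sample-table value.

```sh
# Hypothetical helper: print the first non-empty argument.
# Arguments are passed from highest to lowest precedence.
resolve() {
    for value in "$@"; do
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

default_mean=300        # default from the configuration table
cli_mean=""             # nothing passed on the command line in this example
sample_table_mean=250   # illustrative value provided via a sample table

# Sample table beats CLI arguments, which beat the configured default.
resolve "$sample_table_mean" "$cli_mean" "$default_mean"   # prints 250
```

If the sample table left the value empty as well, the same call would fall through to the CLI value and finally to the default.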
10 changes: 5 additions & 5 deletions docs/guides/installation.md
@@ -11,8 +11,8 @@ Installation requires the following:
- [Mamba][mamba] (tested with `mamba 1.3.0`)
- [Singularity][singularity] (tested with `singularity 3.8.6`; not required
if you have root permissions on the machine you would like to install
_ZARP-cli_ on; in that case choose one of the `.root.` environment file
flavors [below](#3-install-app-dependencies))
_ZARP-cli_ on; in that case, choose one of the `.root.` environment file
flavors [below](#3-set-up-environment))

> Other versions, especially older ones, are not guaranteed to work.

@@ -37,7 +37,7 @@ git clone git@github.com:zavolanlab/zarp-cli.git
cd zarp-cli
```

### 3. Install app & dependencies
### 3. Set up environment

In the next step, you need to install the app with its dependencies. For that
purpose, there exist four different environment files. Use this decision matrix
@@ -50,7 +50,7 @@ to pick the most suitable one for you:
| | :check_mark: | `install/environment.dev.yml` |
| :check_mark: | :check_mark: | `install/environment.dev.root.yml` |

To set up the environment execute the call below, but do not forget to replace
To set up the environment, execute the call below, but do not forget to replace
the placeholder `ENVIRONMENT` with the appropriate file from the table above:

```sh
@@ -66,4 +66,4 @@ conda activate zarp-cli
```

You should now be ready to proceed with
[initiliaztion](./initialization.md).
[initialization](./initialization.md).
4 changes: 3 additions & 1 deletion docs/includes/references.md
@@ -19,4 +19,6 @@
[zarp-cli-issue-tracker]: <https://github.com/zavolanlab/zarp-cli/issues>
[zarp-issue-tracker]: <https://github.com/zavolanlab/zarp/issues>
[zarp-qa]: <https://github.com/zavolanlab/zarp/discussions>
[zavolab-gh]: <https://github.com/zavolanlab>
[zarp-supplementary]: <https://zenodo.org/records/10797372>
[zavolab-gh]: <https://github.com/zavolanlab>
[zenodo]: <https://zenodo.org/>
1 change: 1 addition & 0 deletions docs/overrides/.icons/download.svg
62 changes: 0 additions & 62 deletions docs/overrides/.icons/twitter.svg

This file was deleted.

1 change: 1 addition & 0 deletions docs/overrides/.icons/x.svg
10 changes: 5 additions & 5 deletions mkdocs.yml
@@ -31,8 +31,8 @@ markdown_extensions:
- md_in_html
- pymdownx.details
- pymdownx.emoji:
emoji_index: !!python/name:materialx.emoji.twemoji
emoji_generator: !!python/name:materialx.emoji.to_svg
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
options:
custom_icons:
- docs/overrides/.icons
@@ -53,9 +53,9 @@ extra:
- icon: github
link: https://github.com/zavolanlab
name: Zavolab GitHub organization
- icon: twitter
link: https://twitter.com/ZavolanLab
name: Zavolab Twitter profile
- icon: x
link: https://x.com/ZavolanLab
name: Zavolab X profile
- icon: forum
link: https://github.com/zavolanlab/zarp/discussions
name: ZARP Q&A forum