diff --git a/docs/README.md b/docs/README.md index 666efe5..13c2261 100644 --- a/docs/README.md +++ b/docs/README.md @@ -18,8 +18,8 @@ supercharges it by providing the following features: !!! tip "Manage _ZARP_ run data and resources in one central, configurable location" Once _ZARP-cli_ is [installed](guides/installation.md) and -[configured](guides/initialization.md), you may be able to _ZARP_ an RNA-Seq library -with a command like this: +[configured](guides/initialization.md), you may be able to _ZARP_ an RNA-Seq +library with a command like this: ```bash zarp SRA1234567 @@ -34,13 +34,19 @@ zarp SRA1234567 ## How does it work? -Briefly, when a _ZARP-cli_ run is triggered, a [_ZARP-cli_ configuration +> _"Any sufficiently advanced technology is indistinguishable from magic."_ +> — Arthur C. Clarke + +At the risk of demystifying the magic, let's take a look at how _ZARP-cli_ +works: + +Briefly, when the program is triggered, a [_ZARP-cli_ configuration object](docstring/config.models.md#Config) is constructed from parsing [default configuration settings](guides/initialization.md#modifying-configuration-settings) and [command-line options](guides/usage.md#command-line-options). A user-specified list of [sample references](guides/usage.md#sample-references) of various -supported types is then attached to the configuration object and dereferenced +supported types is then attached to the configuration object and de-referenced to construct a (potentially) sparse data frame of sample metadata. If necessary, this data frame of samples is then successively completed by applying various sample processor plugins that are built on tools such as @@ -60,15 +66,19 @@ information is available to start a _ZARP_ run, the sample will be analyzed. 
## How to cite + If you use _ZARP_ in your work (with or without _ZARP-cli_), please kindly cite the following article: -**ZARP: An automated workflow for processing of RNA-seq data** +**ZARP: A user-friendly and versatile RNA-seq analysis workflow** _Maria Katsantoni, Foivos Gypas, Christina J. Herrmann, Dominik Burri, Maciej -Bak, Paula Iborra, Krish Agarwal, Meric Ataman, Anastasiya Börsch, Mihaela -Zavolan, Alexander Kanitz_ -bioRxiv 2021.11.18.469017 - +Bak, Paula Iborra, Krish Agarwal, Meric Ataman, Máté Balajti, Noè Pozzan, Niels +Schlusser, Youngbin Moon, Aleksei Mironov, Anastasiya Börsch, Mihaela Zavolan, +Alexander Kanitz_ +**F1000Research 2024, 13:533** + + +[Download BibTeX citation :download: ](https://f1000research.com/articles/exportTo?versionId=163676&bibliographyReaderFormat=BIBTEX){ .md-button } ## Training materials diff --git a/docs/guides/examples.md b/docs/guides/examples.md index 7faa769..a376961 100644 --- a/docs/guides/examples.md +++ b/docs/guides/examples.md @@ -1,6 +1,36 @@ # Examples -!!! warning "Under construction" +This section provides a growing collection of examples that demonstrate how to +use _ZARP-cli_ in various scenarios. - This section is under preparation and will list a number of real-world, - fully functional examples. Please stay tuned. \ No newline at end of file +!!! info "Prerequisites" + + The examples below assume that you have already [installed](./installation.md) and [initialized](./initialization.md) _ZARP-cli_. + +## Process samples deposited to SRA + +Let's have _ZARP-cli_ fetch two samples from SRA, infer all necessary metadata, +fetch the corresponding genome annotations and start a _ZARP_ workflow run on +them: + +```sh +zarp SRR23590181 SRR23529108 +``` + +??? tip "I want to verify the inferred metadata first!" 
+ + Set the `--execution-mode` parameter to `PREPARE_RUN` to run _ZARP-cli_ up + until the point of the actual _ZARP_ workflow execution: + + ```sh + zarp --execution-mode=PREPARE_RUN SRR23590181 SRR23529108 + ``` + +!!! info "More please!" + + You will find a description of more elaborate use cases in the [ZARP + publication](https://doi.org/10.12688/f1000research.149237.1) and the + accompanying [supplementary materials][zarp-supplementary], published on + [Zenodo][zenodo]. The latter include detailed instructions, all necessary + input files and a selection of reference output files to validate your runs + against. \ No newline at end of file diff --git a/docs/guides/initialization.md b/docs/guides/initialization.md index 5832056..19c74e1 100644 --- a/docs/guides/initialization.md +++ b/docs/guides/initialization.md @@ -11,8 +11,8 @@ The following simple command triggers the _ZARP-cli_ initialization mode: zarp --init ``` -An interactive screen will guide you through the process. Read -[on](#configuration-options) to find out more about what each of the available +An interactive screen will guide you through the process. [Read +on](#configuration-options) to find out more about what each of the available options and suggested defaults mean. ??? question "Where is the configuration stored?" @@ -39,14 +39,14 @@ The following configuration options are available. 
| ------ | ----------- | ------- | | `working_directory` | Root directory for _ZARP-cli_ runs; needs to be writable | `$HOME/.zarp` | | `zarp_directory` | Path to the local copy of the [ZARP workflow repository][zarp] | `../zarp` relative to the location of the ZARP-cli repository | -| `execution_mode` | Trigger a full _ZARP-cli_ run (`RUN`), a dry run (`DRY_RUN`; external tools are not actually run, only logs what _would be_ run; useful for testing) or prepare a _ZARP_ run (`PREPARE_RUN`; _ZARP-cli_ is run normally, including all external tools, up until the point of the execution of the actual _ZARP_ workflow; use to manually check metadata table before execution) | `RUN` | +| `execution_mode` | Trigger a full _ZARP-cli_ run (`RUN`), a dry run (`DRY_RUN`; external tools are not actually run, only logs what _would be_ run; useful for testing) or prepare a _ZARP_ run (`PREPARE_RUN`; _ZARP-cli_ is run normally, including all external tools, up until the point of the execution of the actual _ZARP_ workflow; use to manually check metadata table before _ZARP_ execution) | `RUN` | | `cores` | Number of CPU cores that Snakemake is run with when executing _ZARP_ and the auxiliary workflows (fetching libraries from [SRA][sra], inferring metadata) | `1` | -| `dependency_embedding` | Whether Snakemake should use `CONDA` or containers (`SINGULARITY`) to manage dependencies of each workflow step/rule (note that the auxiliary workflows currently have restrictions on which dependency embedding strategy can be used; if an unsupported scheme is suggested, a warning is emitted and the other one is enabled by default) | `CONDA` | -| `genome_assemblies_map` | A headerless 3-column semicolon-separated mapping table of organism/source trivial names (e.g., `homo_sapiens`), optional comma-separated aliases such as NCBI taxon IDs and/or organism/source short names (e.g., `7227,dmelanogaster`) and a corresponding genome assembly name (e.g., `GRCm39`); a table in the required format is 
shipped with _ZARP_cli_ in the location provided in the default location; which can be amended with additional aliases; note that for [`genomepy`][genomepy] to be able to pull genome annotations for organisms/sources that [HTSinfer][htsinfer] inferred, NCBI taxon ID aliases are _required_ | `./data/genome_assemblies.map` relative to the location of the ZARP-cli repository | +| `dependency_embedding` | Whether Snakemake should use `CONDA` or containers (`SINGULARITY`) to manage dependencies of each workflow step/rule | `CONDA` | +| `genome_assemblies_map` | A headerless 3-column semicolon-separated mapping table of organism/source trivial names (e.g., `homo_sapiens`), optional comma-separated aliases such as NCBI taxon IDs and/or organism/source short names (e.g., `7227,dmelanogaster`) and a corresponding genome assembly name (e.g., `GRCm39`); a table in the required format is shipped with _ZARP-cli_ in the default location; it can be amended with additional aliases; note that for [`genomepy`][genomepy] to be able to pull genome annotations for organisms/sources that [HTSinfer][htsinfer] inferred, NCBI taxon ID aliases are _required_ | `./data/genome_assemblies.map` relative to the location of the ZARP-cli repository | | `resources_version` | Whether to always download the latest available version of genome annotations for a given organism/source from Ensembl (enter `None`; default) or whether to use a specific version of the corresponding Ensembl database (e.g., `100`); note that the different Ensembl databases (e.g., for fungi, plants) use a different versioning scheme, so pinning a particular database version may lead to unexpected outcomes | `None` | | `rule_config` | A configuration file for the _ZARP_ workflow that sets specific parameters for each workflow step ("rule"); see [ZARP][zarp] documentation for details | `None` | -| `profile` | Path to [Snakemake profile][snakemake-profiles] to be used for the _ZARP_ workflow. 
Use this to optimize _ZARP_ for your specific compute environment | -| `fragment_length_distribution_mean` | HTSinfer currently is unable to infer the mean of the fragment length distribution of RNA-seq libraries; however, this value is required for tools [`kallisto`][kallisto] and [`salmon`][salmon] -which are executed as part of _ZARP_- when run on single-ended libraries only (for paired-ended libraries, the tools are able to infer this parameter from the data); the value provided here is used as a fallback if the value was not determined experimentally (e.g., with [Bioanalyzer][bioanalyzer] instruments) and provided via a sample table | `300` | +| `profile` | Path to [Snakemake profile][snakemake-profiles] to be used for the _ZARP_ workflow; use this to optimize _ZARP_ for your specific compute environment | +| `fragment_length_distribution_mean` | HTSinfer currently is unable to infer the mean of the fragment length distribution of RNA-seq libraries; however, this value is required for tools [`kallisto`][kallisto] and [`salmon`][salmon] - which are executed as part of _ZARP_ - when run on single-ended libraries only (for paired-ended libraries, the tools are able to infer this parameter from the data); the value provided here is used as a fallback if the value was not determined experimentally (e.g., with [Bioanalyzer][bioanalyzer] instruments) and provided via a sample table | `300` | | `fragment_length_distribution_sd` | Analogous to `fragment_length_distribution_mean` above, but this parameter is for the _standard deviation_ of the fragment length distribution | `100` | | `author` | Name of the person or organization executing the _ZARP-cli_ runs; will be added to the _ZARP_ report | `None` | | `email` | Email of the person or organization executing the _ZARP-cli_ runs; will be added to the _ZARP_ report | `None` | @@ -98,4 +98,4 @@ dynamically**: - [CLI arguments](./usage.md) for individual run- and sample-specific parameters, if provided - Sample-specific 
parameters specified in sample tables **(highest - precendence!)** + precedence!)** diff --git a/docs/guides/installation.md b/docs/guides/installation.md index acfe524..f244a13 100644 --- a/docs/guides/installation.md +++ b/docs/guides/installation.md @@ -11,8 +11,8 @@ Installation requires the following: - [Mamba][mamba] (tested with `mamba 1.3.0`) - [Singularity][singularity] (tested with `singularity 3.8.6`; not required if you have root permissions on the machine you would like to install - _ZARP-cli_ on; in that case choose one of the `.root.` environment file - flavors [below](#3-install-app-dependencies)) + _ZARP-cli_ on; in that case, choose one of the `.root.` environment file + flavors [below](#3-set-up-environment)) > Other versions, especially older ones, are not guaranteed to work. @@ -37,7 +37,7 @@ git clone git@github.com:zavolanlab/zarp-cli.git cd zarp-cli ``` -### 3. Install app & dependencies +### 3. Set up environment In the next step, you need to install the app with its dependencies. For that purpose, there exist four different environment files. Use this decision matrix @@ -50,7 +50,7 @@ to pick the most suitable one for you: | | :check_mark: | `install/environment.dev.yml` | | :check_mark: | :check_mark: | `install/environment.dev.root.yml` | -To set up the environment execute the call below, but do not forget to replace +To set up the environment, execute the call below, but do not forget to replace the placeholder `ENVIRONMENT` with the appropriate file from the table above: ```sh @@ -66,4 +66,4 @@ conda activate zarp-cli ``` You should now be good to go to proceed with -[initiliaztion](./initialization.md). +[initialization](./initialization.md). 
diff --git a/docs/includes/references.md b/docs/includes/references.md index e861c4d..8b36df5 100644 --- a/docs/includes/references.md +++ b/docs/includes/references.md @@ -19,4 +19,6 @@ [zarp-cli-issue-tracker]: [zarp-issue-tracker]: [zarp-qa]: -[zavolab-gh]: \ No newline at end of file +[zarp-supplementary]: +[zenodo]: \ No newline at end of file diff --git a/docs/overrides/.icons/download.svg b/docs/overrides/.icons/download.svg new file mode 100644 index 0000000..34fdf86 --- /dev/null +++ b/docs/overrides/.icons/download.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/overrides/.icons/twitter.svg b/docs/overrides/.icons/twitter.svg deleted file mode 100644 index 04e0462..0000000 --- a/docs/overrides/.icons/twitter.svg +++ /dev/null @@ -1,62 +0,0 @@ - - - - \ No newline at end of file diff --git a/docs/overrides/.icons/x.svg b/docs/overrides/.icons/x.svg new file mode 100644 index 0000000..0bd9e7f --- /dev/null +++ b/docs/overrides/.icons/x.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 1fd66a6..d197486 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -31,8 +31,8 @@ markdown_extensions: - md_in_html - pymdownx.details - pymdownx.emoji: - emoji_index: !!python/name:materialx.emoji.twemoji - emoji_generator: !!python/name:materialx.emoji.to_svg + emoji_index: !!python/name:material.extensions.emoji.twemoji + emoji_generator: !!python/name:material.extensions.emoji.to_svg options: custom_icons: - docs/overrides/.icons @@ -53,9 +53,9 @@ extra: - icon: github link: https://github.com/zavolanlab name: Zavolab GitHub organization - - icon: twitter - link: https://twitter.com/ZavolanLab - name: Zavolab Twitter profile + - icon: x + link: https://x.com/ZavolanLab + name: Zavolab X profile - icon: forum link: https://github.com/zavolanlab/zarp/discussions name: ZARP Q&A forum
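
Review note on the `genome_assemblies_map` description in `docs/guides/initialization.md`: the table format (headerless, three semicolon-separated columns, with comma-separated aliases in the second column) may be easier to follow with a small illustration. The sketch below is not taken from _ZARP-cli_ and does not claim to mirror how it reads the file. The function name `parse_assemblies_map` and the sample rows are hypothetical and were written only to match the format as described in the table.

```python
import csv
import io


def parse_assemblies_map(text):
    """Map every trivial name and alias to its genome assembly name.

    Hypothetical helper, NOT part of ZARP-cli. Assumes the documented
    layout: headerless rows of `name;alias1,alias2,...;assembly`.
    """
    lookup = {}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        # Skip blank lines
        if not row or all(not cell.strip() for cell in row):
            continue
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
        name, aliases, assembly = (cell.strip() for cell in row)
        # The alias column is optional and may be empty
        keys = [name] + [a.strip() for a in aliases.split(",") if a.strip()]
        for key in keys:
            lookup[key] = assembly
    return lookup


# Illustrative rows only; not copied from the map shipped with ZARP-cli
EXAMPLE = """\
homo_sapiens;9606,hsapiens;GRCh38
mus_musculus;10090,mmusculus;GRCm39
drosophila_melanogaster;7227,dmelanogaster;BDGP6
saccharomyces_cerevisiae;;R64-1-1
"""

lookup = parse_assemblies_map(EXAMPLE)
print(lookup["10090"])  # prints GRCm39
```

Keying the lookup by alias as well as by trivial name is one way to picture why the documentation calls NCBI taxon ID aliases _required_ for organisms inferred by HTSinfer: without such an alias in the second column, a taxon ID would have no entry to resolve to an assembly name.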