Merge pull request #448 from toriapetrova/main
Working on issues #436, #440, #441, #442, #443, #447
Brilator authored Oct 7, 2024
2 parents e58f808 + 4c0c241 commit 823d27f
Showing 22 changed files with 374 additions and 16 deletions.
2 changes: 2 additions & 0 deletions src/docs/_sidebars/mainSidebar.md
@@ -122,6 +122,7 @@
### [DataPLANT account](/docs/guides/datahub_account.html)
### [Invite collaborators to your ARC](/docs/guides/datahub_InviteCollaborators.html)
### [Sharing ARCs via the DataHUB](/docs/guides/datahub-arc-sharing.html)
### [Adding a LICENSE to your ARC](/docs/guides/datahub-license.html)

## [Work with your ARC](/docs/guides/index-WorkWithYourARC.html)
### [Using ARCs with Galaxy](/docs/guides/ARCs-galaxy.html)
@@ -139,6 +140,7 @@
### [Adding external data to the ARC](/docs/guides/arc_AddingExternalData.html)
### [ARCs in Enabling Platforms](/docs/guides/ARC-enablingPlatforms.html)
### [Publication to ARC](/docs/guides/publicationToARC.html)
### [Working with branches](/docs/guides/arc_WorkingWithBranches.html)

## [Troubleshooting](/docs/guides/index-Troubleshooting.html)
### [Git Troubleshooting & Tips](/docs/guides/git-troubleshooting.html)
8 changes: 8 additions & 0 deletions src/docs/fundamentals/VersionControlGit.md
@@ -26,6 +26,14 @@ By taking chronological snapshots of a complete project (termed "git repository"

![Git and Git Platforms](./../img/git_github_gitlab.png)

## Git branches

Branches in Git allow users to work on and develop new features of their projects without affecting or changing other branches in the repository.

Imagine you're writing a book. The main path you're working on is the main storyline, which we'll call the main branch. But at some point, you get a new idea for an alternative storyline. Instead of changing the main storyline right away, you create a new path (or branch). Now, you can write and develop this new idea without affecting the main storyline. If you like how the alternative idea turns out, you can merge it back into the main storyline. If not, you can simply discard it, and the main storyline remains untouched.

In Git, branches help you experiment and develop different features or ideas in isolation, without messing up the main codebase. In addition, multiple people can work on different tasks within the same project without interfering with each other. You can switch between branches, merge them together, or delete them when they're no longer needed.
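
The book analogy maps onto a handful of plain Git commands. The following is a minimal sketch rather than part of any ARC tooling: it runs in a throwaway directory, and the branch name `alternative-storyline`, the file `story.txt`, and the demo user identity are made up for illustration.

```bash
# Sketch only: work in a temporary directory so no real repository is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Demo User"             # hypothetical identity for the demo commits
git config user.email "demo@example.org"

# The main storyline.
echo "Chapter 1" > story.txt
git add story.txt
git commit -q -m "Start the main storyline"

# Create a new branch and develop the alternative idea in isolation.
git checkout -q -b alternative-storyline
echo "Plot twist" >> story.txt
git commit -q -am "Try an alternative storyline"

# Switch back: main is still untouched at this point.
git checkout -q main

# You like the idea: merge it into main, then delete the branch.
git merge -q alternative-storyline
git branch -q -d alternative-storyline
```

To discard the idea instead, you would skip the merge and remove the unmerged branch with `git branch -D alternative-storyline`.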

## Git platforms: GitHub and GitLab

Although Git can be used locally as a standalone tool, it reaches its full potential through Git platforms such as [GitHub] and [GitLab]. Similar to the typical cloud services for file sharing and collaboration, these platforms function as remote share-points for Git repositories. They allow data access management (permission control) to share data privately with selected collaborators or with the public. Individual contributions and changes by multiple collaborators can be tracked. On top of versioned data sharing, additional features, such as discussing and tracking project tasks and contributions and wiki-based documentation, make these Git platforms very valuable for project and research (data) management. Consequently, they are now also very popular outside of software development.
29 changes: 16 additions & 13 deletions src/docs/guides/BestPracticesForDataAnnotation.md
@@ -23,29 +23,32 @@ In this guide, we will take a closer look at some experimental scenarios that ev
</a>
</div>

## Annotation of time series experiments

As an example, we will walk you through a simple ARC and discuss good practices for annotation along the way. In our first scenario we take a look at the annotation of time course patterns. Let's imagine a study in which our plant of interest (plant1), Arabidopsis thaliana (Characteristic [Organism]), was exposed to salt stress (Component [Ingredient]) for a given time. To investigate the cellular response, you harvested samples at various time points after exposure to the stressor: S1 is harvested right away, S2 after 10 minutes, and so on. This information was stored within the isa.study.xlsx file.

## Annotation of biological and technical replicates
<img src="./../img/ISA_AnnotationPattern_TimeSeries.drawio.svg" style="width:100%;display: block;margin: auto; padding: 30px 0px;">

In our first scenario we focus on annotating the origin and relationship between biological and technical replicates within a fictional study. We started with three biological replicates (Plant A, Plant B, and Plant C) of the model organism *Arabidopsis thaliana* (Characteristic [Organism]), which were grown under particular conditions (Characteristic [growth day length]). Harvesting of the plants or particular parts resulted in three samples: S1, S2, and S3. This information was stored within the isa.study.xlsx file.
In such a case, use the Factor building block to annotate the time after exposure, and thereby the sampling point, in the isa.study.xlsx file: when all other parameters for treatment and analysis are identical, this time period is what ultimately determines the given output.

Subsequent processing steps, mostly omitted here for better clarity, are stored within one or multiple isa.assay.xlsx files. In our scenario, three technical replicates of each sample were analyzed via LC/MS (Parameter [instrument model]), generating nine raw data files.
## Annotation of mixed samples

![replicates](./../img/ISA_AnnotationPattern_Replicates.svg)
This example can be of relevance when you are carrying out labeling experiments or when you are spiking your samples with an internal standard for absolute quantification. The isa.assay.xlsx file below displays the best practice for annotating the mixing of experimental samples with a reference prior to LC/MS analysis (Component [Instrumentation]). By listing every data file twice, it becomes clear that the analyzed files originated from the combination of an experimental sample and a reference, e.g. spiking of S1 with the reference resulted in the output file name S1R.wiff.

**It is very important to group these technical replicates and thus annotate their common origin.** If you mistakenly named the individual technical replicates A, B, and C, you could run into trouble during your computational analysis.
<img src="./../img/ISA_AnnotationPattern_MixingSamples.drawio.svg" style="width:100%;display: block;margin: auto; padding: 30px 0px;">

## Annotation of time series experiments
In this rather simple scenario we take a look at the annotation of time course patterns. Let's imagine a study in which our plant (Sample A) was exposed to stress (high light, salt, ...) for a given time. To investigate the cellular response, you harvested samples at various time points after exposure to the stressor: S1 is harvested after 5 minutes, S2 after 10 minutes, and so on.
## Annotation of biological/technical replicates and subprocessing

![TimeSeries](./../img/ISA_AnnotationPattern_TimeSeries.svg)
In the following scenario we focus on annotating the origin and relationship between biological/technical replicates and on managing subprocesses within an assay. We start with the five samples (S1, S2, ..., S5) originating from the isa.study.xlsx file. We want to perform a transcriptomics analysis, but first we have to extract RNA from our samples. Some processes and lab procedures cannot be described coherently in one isa table, even though they are part of the same assay. In such a case, you can split subprocesses into different sheets of the same isa table. For example, here the extraction of RNA is described in the first sheet, whose outputs are the RNA samples (rna_sample1, ...). The next sheet contains the table representing the sequencing itself. As you can see, the outputs of the previous sheet become the inputs of this one, indicating that the processes are sequential. In the same manner, samples and processes are connected across studies and assays within an ARC. During RNA sequencing, three technical replicates per sample are generated (Characteristic [technical replicate]), and each replicate results in its own data file.

In such a case, use the Factor building block to annotate the time after exposure, and thereby the sampling point, in the isa.study.xlsx file: when all other parameters for treatment and analysis are identical, this time period is what ultimately determines the given output.
<img src="./../img/ISA_AnnotationPattern_Replicates.drawio.svg" style="width:100%;display: block;margin: auto; padding: 30px 0px;">

## Annotation of mixed samples
This example can be of relevance when you are carrying out labeling experiments or when you are spiking your samples with an internal standard for absolute quantification. The isa.assay.xlsx file below displays the best practice for annotating the mixing of experimental samples with a reference prior to LC/MS analysis.
:bulb: We recommend renaming the sheets according to the subprocess they describe.

## Connecting inputs and outputs

![Spiking](./../img/ISA_AnnotationPattern_MixingSamples.svg)
A key objective of the ARC is to trace each finding or result back to its specific biological experiment. Achieving this requires linking dataset files to their corresponding individual samples. To accomplish this, we follow a sequence of processes with defined inputs and outputs. Certain inputs and outputs may need to be reused or reproduced, while some processes may need to be applied to different inputs.

By listing every raw data file twice, it becomes clear that the analyzed samples originated from the combination of an experimental sample and a reference, e.g. spiking of S1 with the reference resulted in the data file S1R.wiff.
The graph below represents the structure of the ARC created for these examples. As you can see, samples flow seamlessly through studies and assays, ending with data files as outputs. You should always aim for such a connection of inputs and outputs throughout all isa tables.

<img src="./../img/mermaid-graph.drawio.svg" style="width:100%;display: block;margin: auto; padding: 30px 0px;">
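
One way to check this connection is to compare the output names of one sheet with the input names of the next. The sketch below assumes that you have exported both columns to plain text files, one name per line. The file names and all sample names other than `rna_sample1` are hypothetical, and `rna_sample4` is left unmatched on purpose.

```bash
# Sketch only: hypothetical name lists exported from two sheets of an assay.
work_dir="$(mktemp -d)"

# Outputs of the first sheet (RNA extraction), one name per line.
printf '%s\n' rna_sample1 rna_sample2 rna_sample3 > "$work_dir/extraction_outputs.txt"

# Inputs of the second sheet (sequencing); rna_sample4 has no matching output.
printf '%s\n' rna_sample1 rna_sample2 rna_sample4 > "$work_dir/sequencing_inputs.txt"

# -F: fixed strings, -x: whole-line match, -v: invert the match.
# The result lists every input that is not an output of the previous sheet.
unconnected="$(grep -Fxv -f "$work_dir/extraction_outputs.txt" "$work_dir/sequencing_inputs.txt" || true)"

if [ -n "$unconnected" ]; then
  echo "Inputs without a matching output in the previous sheet:"
  echo "$unconnected"
else
  echo "All inputs are connected."
fi
```

An empty result means every input of the second sheet can be traced back to an output of the first one.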
39 changes: 39 additions & 0 deletions src/docs/guides/arc_WorkingWithBranches.md
@@ -0,0 +1,39 @@
---
layout: docs
title: Working with branches
date: 2024-08-17
author:
- name: Viktoria Petrova
github: toriapetrova
add toc: true
add support: true
add sidebar: _sidebars/mainSidebar.md
---


<!-- TODO article about licensing -->

## ARC branches

Branches in Git allow users to work on and develop new features of their projects without affecting or changing other branches in the repository. If you want to know more about branches, check out the [Version control & Git](./../fundamentals/VersionControlGit.html) article.

In the example below, you can see how branches have been used to develop an ARC in parallel. The ARC is created within the "main" branch, and some metadata and microscopy images are uploaded. A new branch called "plant material" is created to describe, in a study, how the plants later used in the experimental assays were grown. Another branch named "RNA-seq" deals with the description of the actual sequencing assay and the data generated from it. After completion, the branches are merged into "main".

:warning: Don't forget to sync your branch with the "parent" branch to avoid merge conflicts.

<img src="./../img/git-branches.drawio.svg" style="width:100%;display: block;margin: auto; padding: 30px 0px;">
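
In plain Git, syncing a branch with its parent means merging the parent into it before merging back. The following is a minimal sketch in a throwaway repository, not a description of what ARCitect or the ARC Commander do internally. The branch name `plant-material` (Git branch names cannot contain spaces), the file names, and the demo user identity are assumptions made for illustration.

```bash
# Sketch only: a throwaway repository stands in for a real ARC.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Demo User"             # hypothetical identity for the demo commits
git config user.email "demo@example.org"

echo "investigation metadata" > metadata.txt
git add metadata.txt
git commit -q -m "Create ARC on main"

# Branch off to describe the plant material.
git checkout -q -b plant-material
echo "growth conditions" > study_notes.txt
git add study_notes.txt
git commit -q -m "Describe plant growth"

# Meanwhile, main moves on.
git checkout -q main
echo "microscopy image list" > images.txt
git add images.txt
git commit -q -m "Add microscopy images"

# Sync the branch with its parent before merging it back.
git checkout -q plant-material
git merge -q -m "Sync plant-material with main" main
```

After the sync, the branch contains both its own work and the latest state of "main", so the later merge into "main" is less likely to conflict.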

## ARCitect

In ARCitect you can create a new branch or switch to an existing one by navigating to "Commit" in the left sidebar (1), then clicking on the dropdown menu (2) and selecting "Add Branch" (3) or the name of an existing branch (4), respectively.

<img src="./../img/ARCitect_branches.drawio.svg" style="width:100%;display: block;margin: auto; padding: 30px 0px;">

## ARC Commander

You can work on your ARC locally and, once you are done, commit your changes to a different branch.

```bash
arc sync -b SecondBranchName
```
This will create a commit with your newest changes and push the commit to a new branch with the given name. When you have finished editing your ARC, you can merge your progress into the main branch.
17 changes: 17 additions & 0 deletions src/docs/guides/arc_WorkingWithLargeDataFiles.md
@@ -49,10 +49,27 @@ In addition you can set a threshold (2) in megabytes (MB) for what you consider

<img src="./../img/ARCitect-lfs-threshold.drawio.svg" style="width:100%;display: block;margin: auto;padding: 30px 0px;">

You can also easily check which files in your ARC are flagged as LFS, by looking in the ARCitect tree panel (1).

<img src="./../img/ARCitect-lfs-flag.drawio.svg" style="width:100%;display: block;margin: auto;padding: 30px 0px;">

If you haven't downloaded the LFS file, you can only open its pointer file. Unfortunately, this pointer file cannot be displayed in ARCitect, but if you open it with a text editor (e.g. Notepad), it looks something like this:

```bash
version https://git-lfs.github.com/spec/v1
oid sha256:dfc4d259bb70ab93915fe6fd91df33017b09f9208d94b48d7c9a789dd35d65bc
size 22973898
```
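
Because a pointer file starts with the `version` line shown above, you can tell pointers and downloaded files apart on the command line. The following sketch assumes a POSIX shell. The helper name `is_lfs_pointer` and the two demo files are hypothetical.

```bash
# Succeeds (exit code 0) if the given file looks like a Git LFS pointer.
is_lfs_pointer() {
  # Pointer files are small text files whose first line names the LFS spec version.
  head -n 1 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Demo with two hypothetical files in a temporary directory.
tmp_dir="$(mktemp -d)"
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:dfc4d259bb70ab93915fe6fd91df33017b09f9208d94b48d7c9a789dd35d65bc\nsize 22973898\n' > "$tmp_dir/pointer.txt"
printf 'actual measurement data\n' > "$tmp_dir/data.txt"

for file in "$tmp_dir"/*; do
  if is_lfs_pointer "$file"; then
    echo "$(basename "$file"): LFS pointer (content not downloaded)"
  else
    echo "$(basename "$file"): regular file"
  fi
done
```

Outside ARCitect, and assuming the Git LFS command line client is installed, `git lfs pull --include "<path>"` fetches the real content for the given path.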

Finally, you can individually download large files via right-click -> "Download LFS File" (1)

<img src="./../img/ARCitect-download-lfs-file-right-click.drawio.svg" style="width:100%;display: block;margin: auto;padding: 30px 0px;">

or you can also choose to download all large files from a directory by right clicking on the folder in the panel tree (1) and then "Download LFS Files" (2).

<img src="./../img/ARCitect-download-lfs-from-directory.drawio.svg" style="width:100%;display: block;margin: auto;padding: 30px 0px;">


### ARC Commander

By default, the ARC Commander tracks the following files via LFS:
30 changes: 30 additions & 0 deletions src/docs/guides/datahub-license.md
@@ -0,0 +1,30 @@
---
layout: docs
title: Adding a LICENSE to your ARC
date: 2024-08-16
author:
- name: Viktoria Petrova
github: toriapetrova
add toc: true
add support: true
add sidebar: _sidebars/mainSidebar.md
---

## Why is a LICENSE important?

Licenses are essential for defining how others can use, modify, and distribute the code or data within a project. When you create and share an ARC, a license provides the formal framework that protects the data creators’ rights while clarifying the terms of use for the content. For example, some licenses allow free use and modification with few restrictions (like MIT or Apache 2.0), while others may require derivative works to also be open source (such as the GPL). Therefore, a license is crucial for fostering collaboration while ensuring legal protection and clarity for both creators and users.

## Adding a LICENSE to your ARC

In the DataHUB, a license is essentially just a standardized text file.
To add a `LICENSE` to your ARC:

1. navigate to your ARC in the DataHUB,
2. click on "Add LICENSE" on the right sidebar menu (1),
3. use a provided license template or enter the license text.

<img src="./../img/datahub-add-license.drawio.svg" style="width:100%;display: block;margin: auto; padding: 30px 0px;">

:bulb: We recommend using a CC-BY license, which is not offered as a template by the DataHUB. For a CC-BY 4.0 license, you can copy the legal code from https://creativecommons.org/licenses/by/4.0/legalcode.txt.

:bulb: Remember to sync your local ARC (via ARC Commander or ARCitect) after creating a `LICENSE` file in the DataHUB.
6 changes: 6 additions & 0 deletions src/docs/guides/publicationToARC.md
@@ -83,8 +83,14 @@ This guide assumes you know
1. this must be according to the publisher's license usually found on the publisher / journal website of the publication
2. See e.g. https://git.nfdi4plants.org/brilator/Facultative-CAM-in-Talinum/-/blob/main/LICENSE for a CC-BY 4.0 license

- To add a `LICENSE`, navigate to your ARC in the DataHUB and click on "Add LICENSE" on the right sidebar menu (1).

<img src="./../img/datahub-add-license.drawio.svg" style="width:100%;display: block;margin: auto; padding: 30px 0px;">

:bulb: We recommend focusing on open access / CC-BY publications and datasets, unless you know explicitly whether and how the data published elsewhere may be re-used.

:warning: Don't forget to sync your local ARC (via ARC Commander or ARCitect) after creating a `LICENSE` file in DataHUB.

## ISA - investigation / isa.investigation.xlsx

- Add Title: publication title
57 changes: 57 additions & 0 deletions src/docs/img/ARCitect-download-lfs-from-directory.drawio.svg
35 changes: 35 additions & 0 deletions src/docs/img/ARCitect-lfs-flag.drawio.svg
86 changes: 86 additions & 0 deletions src/docs/img/ARCitect_branches.drawio.svg
Binary file added src/docs/img/ARCitect_branches.png
8 changes: 8 additions & 0 deletions src/docs/img/ISA_AnnotationPattern_MixingSamples.drawio.svg
1 change: 0 additions & 1 deletion src/docs/img/ISA_AnnotationPattern_MixingSamples.svg

This file was deleted.

8 changes: 8 additions & 0 deletions src/docs/img/ISA_AnnotationPattern_Replicates.drawio.svg
1 change: 0 additions & 1 deletion src/docs/img/ISA_AnnotationPattern_Replicates.svg

This file was deleted.

7 changes: 7 additions & 0 deletions src/docs/img/ISA_AnnotationPattern_TimeSeries.drawio.svg
1 change: 0 additions & 1 deletion src/docs/img/ISA_AnnotationPattern_TimeSeries.svg

This file was deleted.

Binary file modified src/docs/img/ISA_AnnotationPatterns.pptx
Binary file not shown.
39 changes: 39 additions & 0 deletions src/docs/img/datahub-add-license.drawio.svg
10 changes: 10 additions & 0 deletions src/docs/img/git-branches.drawio.svg
Binary file added src/docs/img/git-branches.png
6 changes: 6 additions & 0 deletions src/docs/img/mermaid-graph.drawio.svg
