data_pipeline

Project Overview

This project provides a data processing pipeline that uses a set of scripts and tools to download datasets, process them, and import the results into LaminDB. The pipeline covers every step from data download through quality control to final data storage.

Directory Structure

.
├── R
│   ├── convertAnn.R
│   └── scfetch
├── bash
│   ├── download_xlsx.sh
│   └── run_data_pipeline.sh
├── python
│   ├── 1-qc.py
│   └── 2-lamindb-aws.py
└── run_data_pipeline.sh

Dependencies

  • Conda
  • Docker
  • scfetch
  • LaminDB
  • R
  • Python
  • ...

Installation Steps

  1. Clone the repository:

    git clone https://github.com/Kang-chen/data_pipeline
    cd data_pipeline
  2. Install necessary dependencies:

    TODO (see the hypothetical sketch after this list)

  3. Ensure Docker and Conda environments are properly configured:

    • Docker
    • Conda
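
Since step 2 is still a TODO, here is a minimal, hypothetical sketch of what a conda-based setup could look like. The environment name, the Python version, and the scfetch GitHub location are assumptions, not the project's documented instructions; Docker itself is installed separately at the system level.

    # Hypothetical setup sketch; environment name and versions are assumptions.
    conda create -n data_pipeline python=3.10 r-base -y
    conda activate data_pipeline

    # LaminDB ships on PyPI as `lamindb`.
    pip install lamindb

    # scfetch is an R package distributed via GitHub (location assumed).
    Rscript -e 'install.packages("remotes", repos = "https://cloud.r-project.org")'
    Rscript -e 'remotes::install_github("showteeth/scfetch")'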

Usage Instructions

To execute the data processing pipeline, run one of the following commands:

    bash -i ../run_data_pipeline.sh GSE161382
    bash -i ../run_data_pipeline.sh GSE161382 3

Option Explanation

GSE161382 is the source_id argument, identifying the dataset (a GEO series accession) to be processed. 3 is the optional start_step argument, telling the pipeline to resume from step 3; when it is omitted, the pipeline starts from step 1 by default.
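
For orientation, two positional arguments with these semantics would typically be read in bash along the following lines. This is a sketch of the assumed interface, not the actual contents of run_data_pipeline.sh; the variable names are taken from the parameter names above.

    #!/usr/bin/env bash
    # Sketch of the assumed argument handling; not the script's actual code.
    source_id="${1:?usage: run_data_pipeline.sh <source_id> [start_step]}"
    start_step="${2:-1}"   # default to step 1 when the second argument is omitted

    echo "Processing ${source_id} starting at step ${start_step}"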

Error Handling

If an error occurs during execution, the script will terminate and display an error message in the terminal. You can check the log files or the error output for more detailed information.
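
The fail-fast behavior described above is commonly implemented in bash with set -e plus an ERR trap. The following is a minimal sketch of that pattern, not the actual contents of run_data_pipeline.sh.

    # Minimal fail-fast sketch (assumed pattern, not the project's code).
    set -euo pipefail

    # Report the failing line and command to stderr before the script exits.
    trap 'echo "error on line ${LINENO}: ${BASH_COMMAND}" >&2' ERR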

Contributing

Contributions are welcome! Feel free to submit pull requests or report issues to help improve the project.
