scLEAF: Large Language Models Enhance Single-cell Multi-omics Biology

Introduction

scLEAF is a framework for single-cell multi-omics data analysis that transfers cell representations into the text embedding space of a large language model (LLM).
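To make the idea of moving cell representations into a text space concrete, here is a minimal NumPy sketch. It assumes a learned linear projection and cosine-similarity matching against one text embedding per cell type; this README does not describe the actual scLEAF architecture or loss, so every name here (`annotate_cells`, `projection`, the toy dimensions) is hypothetical and the data is random.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def annotate_cells(cell_repr, projection, text_emb):
    """Project cells into the text space and pick the most similar cell type.

    cell_repr:  (n_cells, d_cell) cell representations (hypothetical encoder output)
    projection: (d_cell, d_text) linear map into the text space (assumed learned)
    text_emb:   (n_types, d_text) one text embedding per cell type
    """
    projected = cell_repr @ projection                          # (n_cells, d_text)
    sims = l2_normalize(projected) @ l2_normalize(text_emb).T   # (n_cells, n_types)
    return sims.argmax(axis=1), sims


# Toy data only: random values stand in for real embeddings.
rng = np.random.default_rng(0)
n_cells, n_types, d_cell, d_text = 5, 3, 8, 16
cell_repr = rng.normal(size=(n_cells, d_cell))
projection = rng.normal(size=(d_cell, d_text))
text_emb = rng.normal(size=(n_types, d_text))

pred, sims = annotate_cells(cell_repr, projection, text_emb)
print(pred.shape, sims.shape)
```

In practice the projection would be trained so that each cell lands near the text embedding of its own type; the sketch only covers the matching step.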

Getting Started

Requirements

Python 3.10, PyTorch>=1.21.0, and numpy>=1.24.0 are required for the current codebase.
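As a convenience, the requirements can be checked before running any script. The sketch below is not part of the repository: the helper names are hypothetical, and the minimum versions are copied verbatim from the requirement line above.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the Requirements line; distribution names are
# the usual PyPI names ("torch" for PyTorch).
MINIMUMS = {"torch": (1, 21, 0), "numpy": (1, 24, 0)}


def parse_version(text):
    """Turn '2.1.0+cu118' or '1.24' into a comparable (major, minor, patch) tuple."""
    nums = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def check_environment():
    """Return {name: (installed_version_or_None, meets_requirement)}."""
    report = {
        "python": (
            ".".join(str(n) for n in sys.version_info[:3]),
            sys.version_info[:2] == (3, 10),
        )
    }
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, parse_version(installed) >= minimum)
    return report


for name, (installed, ok) in check_environment().items():
    print(f"{name}: {installed} ({'ok' if ok else 'check requirement'})")
```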

LLM Embeddings

1. Cell-level Text Embeddings

We use the Vicuna-7B model to extract cell-level text embeddings. Download the embeddings from https://drive.google.com/drive/folders/1aArcZjDckc7my9gPvVqN0h8X-7a0brLV.

2. Feature-level Text Embeddings

The original embeddings can be downloaded from https://sites.google.com/yale.edu/scelmolib. We also provide a preprocessed version at https://drive.google.com/drive/folders/1aArcZjDckc7my9gPvVqN0h8X-7a0brLV.

Datasets

CITE-seq and ASAP-seq Data

Download the dataset from https://github.com/SydneyBioX/scJoint/blob/main/data.zip.
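The download can also be scripted. The following sketch uses only the standard library and is an illustration, not part of the repository: it assumes that the archive is served directly when the `/blob/` segment of the GitHub URL is replaced with `/raw/`, and the `data` destination folder is a hypothetical choice. Check the training scripts for the path they actually expect.

```python
import urllib.request
import zipfile
from pathlib import Path

DATA_URL = "https://github.com/SydneyBioX/scJoint/blob/main/data.zip"


def blob_to_raw_url(url):
    """Convert a GitHub 'blob' page URL into the matching direct-download URL."""
    return url.replace("/blob/", "/raw/", 1)


def download_and_extract(url=DATA_URL, dest="data"):
    """Download the archive (if not already present) and unpack it into dest."""
    dest_dir = Path(dest)  # hypothetical location, not prescribed by the README
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "data.zip"
    if not archive.exists():
        urllib.request.urlretrieve(blob_to_raw_url(url), archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    return sorted(p.name for p in dest_dir.iterdir())


# Usage (performs a network download, so it is left commented out):
# print(download_and_extract())
```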

Cell Type Annotation

Pre-training on CITE-seq Data

sh pretrain_cite.sh

Fine-tuning on CITE-seq Data

sh finetune_cite.sh

Pre-training on ASAP-seq Data

sh pretrain_asap.sh

Fine-tuning on ASAP-seq Data

sh finetune_asap.sh
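The four commands above can be chained per dataset. The wrapper below is a hypothetical convenience, not a script shipped with the repository. It assumes that fine-tuning is meant to run after pre-training for the same dataset and that it is launched from the repository root, where the `.sh` files live.

```python
import subprocess

# Script names are taken from the sections above; the ordering
# (pre-training first, then fine-tuning) is an assumption.
STAGES = {
    "cite": ["pretrain_cite.sh", "finetune_cite.sh"],
    "asap": ["pretrain_asap.sh", "finetune_asap.sh"],
}


def build_commands(dataset):
    """Return the shell commands for one dataset, in execution order."""
    return [["sh", script] for script in STAGES[dataset]]


def run_pipeline(dataset, dry_run=False):
    """Run pre-training then fine-tuning; stop at the first failing stage."""
    commands = build_commands(dataset)
    for cmd in commands:
        print("running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


# Usage: run_pipeline("cite") or run_pipeline("asap");
# pass dry_run=True to print the commands without executing them.
```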

Acknowledgement

Our codebase builds on scCLIP, timm, transformers, and PyTorch Lightning. We thank the authors for their nicely organized code!
