A PyTorch implementation of the paper "scLEAF: Large Language Models Enhance Single-cell Multi-omics Biology"

scLEAF: Large Language Models Enhance Single-cell Multi-omics Biology

Introduction

scLEAF is a versatile framework for single-cell multi-omics data analysis, which transfers cell representations to the LLM text space.

Getting Started

Requirements

  • Python 3.10, PyTorch>=1.21.0, and numpy>=1.24.0 are required for the current codebase.
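
As a quick sanity check before running the scripts, the sketch below compares the installed package versions against the minimums listed above. The helper names `parse_version` and `check_requirements` are illustrative and not part of the scLEAF codebase, and the comparison only looks at the leading numeric components of each version string.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements line above.
MINIMUMS = {"torch": "1.21.0", "numpy": "1.24.0"}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    # "2.1.0+cu118" -> (2, 1, 0); local build tags and suffixes are ignored.
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad to three components so that "1.24" compares equal to "1.24.0".
    return parts + (0,) * (3 - len(parts))


def check_requirements(minimums=MINIMUMS):
    """Map each package to True (ok), False (too old), or None (not installed)."""
    status = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = None
            continue
        status[name] = parse_version(installed) >= parse_version(minimum)
    return status


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    labels = {True: "ok", False: "too old", None: "not installed"}
    for package, ok in check_requirements().items():
        print(package, labels[ok])
```

Reading versions through `importlib.metadata` avoids importing the packages themselves, so the check stays fast and still reports something useful when a package is missing.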

LLM Embeddings

1. Cell-level Text Embeddings

We use the Vicuna-7B model to extract the cell-level text embeddings. Download the embeddings from https://drive.google.com/drive/folders/1aArcZjDckc7my9gPvVqN0h8X-7a0brLV.

2. Feature-level Text Embeddings

The original embeddings can be downloaded from https://sites.google.com/yale.edu/scelmolib. We also provide a preprocessed version at https://drive.google.com/drive/folders/1aArcZjDckc7my9gPvVqN0h8X-7a0brLV.

Datasets

CITE-seq and ASAP-seq Data

Download the dataset from https://github.com/SydneyBioX/scJoint/blob/main/data.zip.
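
Once `data.zip` has been downloaded, it needs to be unpacked before the scripts can read it. The sketch below is one way to do that, assuming the archive sits in the working directory. The `extract_dataset` helper and the `data` target folder are hypothetical names; check the data paths used in the training scripts for the location they actually expect.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive="data.zip", target="data"):
    """Unpack the archive into target and return the extracted member names."""
    archive = Path(archive)
    # Saving the GitHub web page instead of the file itself is a common mistake;
    # is_zipfile catches that case (and a missing file) before extraction.
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a valid zip archive")
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as bundle:
        bundle.extractall(target)
        return bundle.namelist()
```

Calling `extract_dataset()` with the defaults unpacks `data.zip` into `./data` and returns the list of extracted paths, which is a convenient way to see what the archive contains.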

Cell Type Annotation

Pre-training on CITE-seq Data

sh pretrain_cite.sh

Fine-tuning on CITE-seq Data

sh finetune_cite.sh

Pre-training on ASAP-seq Data

sh pretrain_asap.sh

Fine-tuning on ASAP-seq Data

sh finetune_asap.sh
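
The scripts listed above can also be chained from Python. The sketch below assumes that the scripts live in the current working directory and that fine-tuning should start only after pre-training has finished successfully; that ordering is inferred from the stage names rather than stated by the repository. `stage_commands` and `run_stages` are hypothetical helper names.

```python
import subprocess

# Dataset suffixes taken from the script filenames listed above.
DATASETS = ("cite", "asap")


def stage_commands(dataset):
    """Return the pre-training and fine-tuning commands for one dataset, in order."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return [["sh", f"pretrain_{dataset}.sh"], ["sh", f"finetune_{dataset}.sh"]]


def run_stages(dataset, cwd="."):
    """Run both stages for a dataset, stopping at the first one that fails."""
    for command in stage_commands(dataset):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so fine-tuning is never started after a failed pre-training run.
        subprocess.run(command, cwd=cwd, check=True)
```

For example, `run_stages("cite")` runs `pretrain_cite.sh` followed by `finetune_cite.sh`, and `run_stages("asap")` does the same for the ASAP-seq scripts.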

Acknowledgement

Our codebase builds on scCLIP, timm, transformers, and PyTorch Lightning. We thank the authors for the nicely organized code!
