Homolig is a Python algorithm designed to compare the physiotypic properties of TCRs and BCRs. Created by Alexander Girgis, Emily Davis-Marcisak, and Theron Palmer.

What is Homolig?

Homolig is a Python package for physiochemical comparison of immune receptor and epitope sequences. It has two modules:

  1. Compute pairwise distances between sequences, assign clusters, and write a UMAP.
  2. Generate descriptive statistics on the physiochemical properties of selected sequences.

Getting Started

Installation

To get a local copy of Homolig up and running, follow the steps below. (Once publicly available, Homolig will be installable directly from pip without the need for cloning.)

  1. Clone the repository
git clone https://github.com/FertigLab/Homolig
  2. Activate a pre-made virtual environment with all Homolig dependencies pre-installed (optional, but recommended):
cd Homolig
source ./homolig-venv/bin/activate
  3. Pip install (run cd Homolig first only if you skipped step 2):
cd Homolig
python3 -m pip install .

You should now be able to import homolig within Python. Alternatively, you may call a wrapper script from the terminal as shown below (recommended).
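
A quick way to confirm the install worked is to ask Python whether it can locate the package without importing it. The sketch below is illustrative only; the helper name `is_importable` is hypothetical and not part of Homolig.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# "homolig" is the package name used in the import statement above.
if is_importable("homolig"):
    print("homolig is available in this environment")
else:
    print("homolig not found; re-run the pip install step in the active environment")
```

If the check fails after activating the virtual environment, the most likely cause is that pip installed into a different interpreter than the one running the check.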

Documentation

Homolig uses the IMGT/V-QUEST reference directory release 202214-2 (05 April 2022).

File format

The input file is a comma-separated file containing the TCR or BCR CDR3 amino acid sequence and variable gene name in IMGT format.

For cases where paired alpha and beta chain information is available:

CDR3.beta.aa     TRBV        CDR3.alpha.aa  TRAV
CASSAGTSPTDTQYF  TRBV6-4*01  CAVMDSSYKLIF   TRAV1-2*01
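
As a minimal sketch of producing such a file, the snippet below writes the paired example row as comma-separated text with Python's standard `csv` module. It assumes the header names are exactly those shown in the table above; the helper name `write_paired_input` is hypothetical.

```python
import csv
import io

# Column names taken from the paired alpha/beta example above (assumed exact).
COLUMNS = ["CDR3.beta.aa", "TRBV", "CDR3.alpha.aa", "TRAV"]


def write_paired_input(rows, handle):
    """Write paired-chain records (a list of dicts) as comma-separated text."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


example_rows = [
    {
        "CDR3.beta.aa": "CASSAGTSPTDTQYF",
        "TRBV": "TRBV6-4*01",
        "CDR3.alpha.aa": "CAVMDSSYKLIF",
        "TRAV": "TRAV1-2*01",
    }
]

# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_paired_input(example_rows, buffer)
print(buffer.getvalue())
```

To write to disk, open the target with `open(path, "w", newline="")` and pass the handle in place of the buffer.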

Basic Usage: Pairwise Distances

The recommended way to use Module 1 (pairwise distances) is through the wrapper homolig_wrapper.py, located in ./homolig/ (referred to below as $WRAPPERDIR).

python3 $WRAPPERDIR/homolig_wrapper.py  -h
usage: homolig_wrapper.py [-h] [-i INPUT] [-s SEQ] [-c CHAINS] [-m METRIC] [-sp SPECIES] [-mode MODE] [-i2 INPUT2] [-o OUTPUT] [-v VERBOSE]
                          [-g SAVE_GERMLINE] [-si SAVE_REFORMATTED_INPUT]

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input .csv file
  -s SEQ, --seq SEQ     Sequence type. May be one of [tcr, bcr, seq].
  -c CHAINS, --chains CHAINS
                        Chain locus. May be one of [alpha, beta, heavy, light]. Can be omitted if --seq == 'seq'.
  -m METRIC, --metric METRIC
                        Distance matrix used in comparisons. Default is aadist.
  -sp SPECIES, --species SPECIES
                        Species for which to query V gene sequences. Default is human.
  -mode MODE, --mode MODE
                        Either 'pairwise' sequence comparison or 'axb' between two sequence groups.
  -i2 INPUT2, --input2 INPUT2
                        Second sequence group with which to compare first file.
  -o OUTPUT, --output OUTPUT
                        Desired output file path/filename. Defaults to input file directory.
  -v VERBOSE, --verbose VERBOSE
                        Specify verbosity during execution.
  -g SAVE_GERMLINE, --save_germline SAVE_GERMLINE
                        Save CDR alignments separately during execution.
  -si SAVE_REFORMATTED_INPUT, --save_reformatted_input SAVE_REFORMATTED_INPUT
                        Save input file after renaming V genes (may be useful in post-analysis).
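
For scripted runs, the options above can be assembled into an argument list. The following is a sketch under stated assumptions: it uses only flags listed in the help text, the helper `build_homolig_command` is hypothetical, the paths are placeholders, and `beta` is an arbitrary choice among the listed chain values. It builds and prints the command but does not execute it.

```python
import shlex
from pathlib import PurePosixPath


def build_homolig_command(wrapper_dir, input_csv, seq="tcr", chains="beta",
                          metric="aadist", species="human", mode="pairwise",
                          input2=None, output=None):
    """Assemble an argument list for homolig_wrapper.py from the documented flags."""
    script = str(PurePosixPath(wrapper_dir) / "homolig_wrapper.py")
    cmd = ["python3", script, "-i", str(input_csv), "-s", seq]
    # Per the help text, the chain locus can be omitted when --seq is 'seq'.
    if seq != "seq":
        cmd += ["-c", chains]
    cmd += ["-m", metric, "-sp", species, "-mode", mode]
    # 'axb' compares two sequence groups, so it needs a second input file.
    if mode == "axb":
        if input2 is None:
            raise ValueError("mode 'axb' requires input2")
        cmd += ["-i2", str(input2)]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


# Placeholder paths; substitute your own wrapper directory and input file.
command = build_homolig_command("/path/to/Homolig/homolig", "paired_tcrs.csv")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the paths point at a real installation.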

To cluster the results of a Homolig run, you may use the wrapper clusterHomolig.py:

python3 $WRAPPERDIR/clusterHomolig.py  -h
usage: clusterHomolig.py [-h] [-i INPUT] [-c NUM_CLUSTERS] [-o OUTPUT]

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input .h5ad file
  -c NUM_CLUSTERS, --num_clusters NUM_CLUSTERS
                        Expected number of clusters. May be any integer.
  -o OUTPUT, --output OUTPUT
                        Desired output file path/filename. Defaults to input file directory.
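
The clustering step can be scripted the same way. This sketch assumes only the three flags shown in the help text above; the helper `build_cluster_command`, the input filename, and the cluster count are hypothetical placeholders. It builds the command without running it.

```python
import shlex
from pathlib import PurePosixPath


def build_cluster_command(wrapper_dir, input_h5ad, num_clusters, output=None):
    """Assemble an argument list for clusterHomolig.py from the documented flags."""
    # The help text asks for an integer; reject bools and non-integers early.
    if isinstance(num_clusters, bool) or not isinstance(num_clusters, int):
        raise TypeError("num_clusters must be an integer")
    script = str(PurePosixPath(wrapper_dir) / "clusterHomolig.py")
    cmd = ["python3", script, "-i", str(input_h5ad), "-c", str(num_clusters)]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


# Placeholder values; the .h5ad file is assumed to come from a previous Homolig run.
cluster_command = build_cluster_command("/path/to/Homolig/homolig", "results.h5ad", 10)
print(shlex.join(cluster_command))
```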

To generate a UMAP reduction based on the NxN distance matrix, you may use the wrapper homolig_write_umap.py:

python3 $WRAPPERDIR/homolig_write_umap.py  -h

Basic Usage: Descriptive Module

Functions to describe the physiochemical properties of arbitrary sequence groups are written in R. Please see functions in ./homolig/_rcode/score-sequences.r. For a walkthrough of basic usage see testcode-r-characterization-module.rmd.

Citation

If you use this software, please cite our manuscript:

Contact

Please send feedback to Alex Girgis - [email protected]

License

Distributed under the MIT License. See LICENSE for more information.
