A library to analyze and explore protein sequences using BERT models

tijeco/berteome

berteome

Install

pip install berteome

Getting started

Berteome makes use of the masked language model of BERT to determine predictions for all residues in a protein sequence.

The main berteome library can be imported as follows:

from berteome import berteome

The modelLoader class can be used to show what models are supported by berteome.

berteome_models = berteome.modelLoader()
berteome_models.supported_models
['Rostlab/prot_bert',
 'facebook/esm2_t33_650M_UR50D',
 'facebook/esm1b_t33_650M_UR50S']

All of these models are distributed through Hugging Face, and berteome makes extensive use of its API.

Load model

To load the prot_bert model, run the following:

bert_tokenizer, bert_model = berteome_models.load_model("Rostlab/prot_bert")

Some weights of the model checkpoint at Rostlab/prot_bert were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

The language models used by berteome were trained with a masked-token approach: a random amino acid in a protein is masked, and the model is trained to predict what that amino acid should be. The models are trained on such a large number of protein sequences that they begin to learn the language of protein sequence space as we currently know it. For instance, a model can learn which residues are unlikely to occur at a given position in a protein. Using these models, you can place a mask at any residue in a protein, and the model will generate a probability score for every amino acid that could go there.
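
As a rough illustration of the masking step described above, the sketch below builds one masked copy of a sequence per residue. It is plain Python and does not call berteome; the function name mask_each_position is hypothetical. The space-separated residues and the [MASK] token follow the input convention of Rostlab/prot_bert; the ESM models use a different mask token (<mask>) and do not need the spaces.

```python
def mask_each_position(seq, mask_token="[MASK]"):
    """Return one masked copy of `seq` per residue, residues separated by spaces."""
    residues = list(seq)
    masked = []
    for i in range(len(residues)):
        tokens = residues.copy()
        tokens[i] = mask_token  # hide residue i from the model
        masked.append(" ".join(tokens))
    return masked


# One masked input per residue of the example peptide
for line in mask_each_position("MENDEL"):
    print(line)
```

Each of these strings would be passed to the tokenizer and model, and the prediction at the masked position would be read out as a probability distribution over amino acids.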

berteome lets you investigate these predictions for a given protein sequence by masking every residue in turn and predicting the probabilities of all possible amino acids at that position. The result is an easy-to-use pandas dataframe. To make this dataframe for a very simple peptide sequence (MENDEL), do the following:

mendel_berteome = berteome.modelPredDF("MENDEL",bert_tokenizer, bert_model)
mendel_berteome.df
wt wtIndex wtScore n_effective topAA topAAscore A C D E ... M N P Q R S T V W Y
0 M 1 0.076602 16.680519 E 0.118906 0.036697 0.011504 0.048245 0.118906 ... 0.076602 0.072661 0.024722 0.038672 0.043105 0.070280 0.056544 0.049927 0.007781 0.021699
1 E 2 0.074830 17.599154 L 0.106501 0.045721 0.015662 0.041921 0.074830 ... 0.043581 0.062667 0.025277 0.036911 0.055543 0.064425 0.049955 0.056789 0.012691 0.029893
2 N 3 0.041990 14.518531 E 0.184364 0.043564 0.009685 0.162590 0.184364 ... 0.041484 0.041990 0.019992 0.025515 0.029433 0.048106 0.030303 0.054742 0.007430 0.024924
3 D 4 0.049748 17.561047 L 0.109088 0.042083 0.013244 0.049748 0.086194 ... 0.040080 0.060822 0.032024 0.039689 0.046228 0.062323 0.044901 0.058937 0.010875 0.026596
4 E 5 0.086915 17.921406 L 0.090807 0.046641 0.018770 0.079822 0.086915 ... 0.028962 0.062234 0.023879 0.030534 0.040489 0.065195 0.044938 0.068038 0.012156 0.038034
5 L 6 0.060736 16.068075 E 0.152547 0.038191 0.009217 0.065189 0.152547 ... 0.040042 0.096484 0.020712 0.035022 0.046888 0.049071 0.046247 0.048276 0.010486 0.022727

6 rows × 26 columns

This dataframe is where the true berteomic magic begins. Each row corresponds to one residue in the input protein sequence.

Here is a breakdown of some of the columns in the dataframe.

  • wt represents the actual (wild-type) amino acid at the given position
  • wtIndex is a one-based index of the residue, which makes plotting easier; it may not stick around forever, though
  • wtScore is a very interesting and important value. For a given protein, one would hope that the model would predict that the masked residue would be the same as the wild-type in the sequence. This column gives us the actual probability that the model provided for the wild type residue at that position.
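  • n_effective is a measure of site-specific variability that gives a proxy for how many amino acids could occupy that site. It is defined as $N_{\mathrm{eff}}(i) = \exp\left(-\sum_{j} p_{ji} \ln p_{ji}\right)$, where $p_{ji}$ is the predicted probability of amino acid $j$ at position $i$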
  • topAA is the top scoring amino acid at a given position in the protein
  • topAAscore is the score of the top scoring amino acid at a given position in the protein

The remaining columns are simply the probabilities the model assigns to each possible amino acid when a mask is placed at that residue of the input protein.
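
To make the column definitions above concrete, here is a minimal sketch that derives wtScore, n_effective, topAA and topAAscore from the probabilities at a single position. It is plain Python, not berteome's own code; summarize_position is a hypothetical helper, and the natural logarithm is used because the formula above is written with ln.

```python
import math


def summarize_position(wt, probs):
    """Summarize one masked position given {amino_acid: probability}."""
    top_aa = max(probs, key=probs.get)
    # Shannon entropy in nats; zero-probability terms contribute nothing
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return {
        "wt": wt,
        "wtScore": probs[wt],
        "n_effective": math.exp(entropy),
        "topAA": top_aa,
        "topAAscore": probs[top_aa],
    }


amino_acids = "ACDEFGHIKLMNPQRSTVWY"

# Every amino acid equally likely: n_effective is close to 20
uniform = {aa: 1 / 20 for aa in amino_acids}
print(summarize_position("M", uniform)["n_effective"])

# Only one amino acid possible: n_effective is 1
certain = {aa: (1.0 if aa == "M" else 0.0) for aa in amino_acids}
print(summarize_position("M", certain)["n_effective"])
```

The two extremes bracket the values in the table: an n_effective near 17 for the MENDEL residues means the model spreads its probability over most of the 20 amino acids.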

Score sequence

The average scores of the wild-type sequence and of the top-scoring sequence are computed with the scoreSeq() function and stored as follows:

print(mendel_berteome.wtSeq, mendel_berteome.wtSeqScore)
MENDEL 0.06513695385878104
print(mendel_berteome.topAASeq, mendel_berteome.topAASeqScore)
ELELLE 0.127035315825644

To score another protein of the same length as the input, pass it to scoreSeq():

mendel_berteome.scoreSeq("LEDNEM")
0.08294879426692443
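
The printed scores are consistent with a plain average of the per-residue probabilities: averaging the rounded wtScore and topAAscore columns of the MENDEL table reproduces wtSeqScore and topAASeqScore to about six decimal places. The sketch below checks this in plain Python. average_score is a hypothetical helper, not part of berteome, and treating scoreSeq() as a mean for arbitrary sequences is an assumption.

```python
from statistics import mean

# wtScore and topAAscore columns copied from the MENDEL table above
wt_scores = [0.076602, 0.074830, 0.041990, 0.049748, 0.086915, 0.060736]
top_scores = [0.118906, 0.106501, 0.184364, 0.109088, 0.090807, 0.152547]


def average_score(per_residue_probs):
    """Mean of the per-residue probabilities of a sequence."""
    return mean(per_residue_probs)


print(average_score(wt_scores))   # compare with wtSeqScore
print(average_score(top_scores))  # compare with topAASeqScore
```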

Amino acid correlation

To investigate how correlated the predictions for the different amino acids are with each other in a given berteome dataframe, use aa_correlation() to generate a correlation dataframe:

mendel_berteome.aa_correlation()
A C D E F G H I K L M N P Q R S T V W Y
A 1.000000 0.728715 0.235810 -0.389880 0.879478 0.295939 0.745629 0.281994 -0.521591 0.733512 -0.720194 -0.611639 0.079973 -0.433475 -0.010752 0.051076 -0.411044 0.833235 0.585926 0.854028
C 0.728715 1.000000 -0.335086 -0.816555 0.854112 0.231240 0.948531 0.774243 -0.042334 0.466360 -0.382031 -0.235096 0.369489 0.063834 0.313217 0.638680 0.247711 0.876376 0.736407 0.923179
D 0.235810 -0.335086 1.000000 0.765980 0.084237 -0.105943 -0.311785 -0.822663 -0.909457 0.087421 -0.275042 -0.581996 -0.599214 -0.924922 -0.890910 -0.671449 -0.903984 0.053589 -0.545103 -0.021774
E -0.389880 -0.816555 0.765980 1.000000 -0.555584 -0.275365 -0.756599 -0.960437 -0.445062 -0.449607 0.096590 -0.027763 -0.732526 -0.612387 -0.710517 -0.797275 -0.600745 -0.555534 -0.767346 -0.570185
F 0.879478 0.854112 0.084237 -0.555584 1.000000 0.456554 0.850721 0.485917 -0.477467 0.699526 -0.622552 -0.579098 0.359107 -0.254099 -0.072739 0.316781 -0.244826 0.988906 0.546931 0.916871
G 0.295939 0.231240 -0.105943 -0.275365 0.456554 1.000000 0.469717 0.397913 -0.077729 0.311335 -0.730916 0.058536 0.495873 0.101611 0.103227 -0.197846 -0.268709 0.464575 0.501189 0.351613
H 0.745629 0.948531 -0.311785 -0.756599 0.850721 0.469717 1.000000 0.780563 -0.042422 0.403466 -0.613977 -0.096189 0.331730 0.020781 0.334186 0.428619 0.133945 0.884543 0.852824 0.949147
I 0.281994 0.774243 -0.822663 -0.960437 0.485917 0.397913 0.780563 1.000000 0.529266 0.250584 -0.168636 0.251695 0.680964 0.638904 0.732240 0.718683 0.641502 0.519188 0.816000 0.560873
K -0.521591 -0.042334 -0.909457 -0.445062 -0.477467 -0.077729 -0.042422 0.529266 1.000000 -0.363205 0.430718 0.773594 0.335643 0.889435 0.850884 0.411444 0.872260 -0.447166 0.317516 -0.325412
L 0.733512 0.466360 0.087421 -0.449607 0.699526 0.311335 0.403466 0.250584 -0.363205 1.000000 -0.360750 -0.779562 0.554163 -0.037801 0.062683 0.196178 -0.320043 0.588138 0.326964 0.436263
M -0.720194 -0.382031 -0.275042 0.096590 -0.622552 -0.730916 -0.613977 -0.168636 0.430718 -0.360750 1.000000 0.161152 0.038821 0.444465 0.054119 0.430616 0.575785 -0.620563 -0.596729 -0.652699
N -0.611639 -0.235096 -0.581996 -0.027763 -0.579098 0.058536 -0.096189 0.251695 0.773594 -0.779562 0.161152 1.000000 -0.116807 0.493941 0.512855 -0.030990 0.583399 -0.486083 0.203204 -0.307123
P 0.079973 0.369489 -0.599214 -0.732526 0.359107 0.495873 0.331730 0.680964 0.335643 0.554163 0.038821 -0.116807 1.000000 0.711244 0.444691 0.584905 0.362911 0.320277 0.353894 0.130555
Q -0.433475 0.063834 -0.924922 -0.612387 -0.254099 0.101611 0.020781 0.638904 0.889435 -0.037801 0.444465 0.493941 0.711244 1.000000 0.778685 0.589468 0.823070 -0.252472 0.281362 -0.272660
R -0.010752 0.313217 -0.890910 -0.710517 -0.072739 0.103227 0.334186 0.732240 0.850884 0.062683 0.054119 0.512855 0.444691 0.778685 1.000000 0.432145 0.713224 -0.081317 0.706977 0.066199
S 0.051076 0.638680 -0.671449 -0.797275 0.316781 -0.197846 0.428619 0.718683 0.411444 0.196178 0.430616 -0.030990 0.584905 0.589468 0.432145 1.000000 0.762126 0.338214 0.276284 0.313994
T -0.411044 0.247711 -0.903984 -0.600745 -0.244826 -0.268709 0.133945 0.641502 0.872260 -0.320043 0.575785 0.583399 0.362911 0.823070 0.713224 0.762126 1.000000 -0.191172 0.265531 -0.089424
V 0.833235 0.876376 0.053589 -0.555534 0.988906 0.464575 0.884543 0.519188 -0.447166 0.588138 -0.620563 -0.486083 0.320277 -0.252472 -0.081317 0.338214 -0.191172 1.000000 0.557270 0.944823
W 0.585926 0.736407 -0.545103 -0.767346 0.546931 0.501189 0.852824 0.816000 0.317516 0.326964 -0.596729 0.203204 0.353894 0.281362 0.706977 0.276284 0.265531 0.557270 1.000000 0.694812
Y 0.854028 0.923179 -0.021774 -0.570185 0.916871 0.351613 0.949147 0.560873 -0.325412 0.436263 -0.652699 -0.307123 0.130555 -0.272660 0.066199 0.313994 -0.089424 0.944823 0.694812 1.000000
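
The entries in this table behave like Pearson correlations computed across residue positions. As a hedged check, the sketch below recomputes the A/C entry from the A and C columns of the MENDEL table in plain Python. pearson is a hypothetical helper written for this example, and the inputs are the rounded values printed above, so the result only agrees with the table to a few decimal places.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# A and C columns of the MENDEL table, one value per residue position
col_A = [0.036697, 0.045721, 0.043564, 0.042083, 0.046641, 0.038191]
col_C = [0.011504, 0.015662, 0.009685, 0.013244, 0.018770, 0.009217]

print(pearson(col_A, col_C))  # compare with the A/C entry, 0.728715
```

With only six residues, each correlation rests on six data points, so the values for MENDEL are illustrative rather than meaningful.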

Most probable variants

berteome can also generate single-residue substitution variants from the top k amino acids at each residue of a protein. To generate the top 3 mutational variants for MENDEL, load the generate submodule and use it as follows:

from berteome import generate
generate.top_k_variants(mendel_berteome, 3)
sub seq
0 0subE EENDEL
1 0subK KENDEL
2 0subN NENDEL
3 1subL MLNDEL
4 1subK MKNDEL
5 1subI MINDEL
6 2subE MEEDEL
7 2subD MEDDEL
8 2subL MELDEL
9 3subL MENLEL
10 3subK MENKEL
11 3subE MENEEL
12 4subL MENDLL
13 4subD MENDDL
14 4subI MENDIL
15 5subE MENDEE
16 5subK MENDEK
17 5subN MENDEN

This returns a dataframe with L x k possible single amino acid variants, where L is the length of the sequence.

  • sub is the substitution id, which indicates which residue was substituted with which amino acid, following the pattern {residue_number}sub{substituted_amino_acid}; in the output above, residue_number is zero-based
  • seq is the new variant sequence
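
The idea can be sketched in plain Python as follows. This is not berteome's implementation: sketch_top_k_variants and the toy probabilities are made up for illustration. Skipping the wild-type residue is an assumption inferred from the output above, where M is absent from the position 0 variants even though its score (0.076602) is higher than that of N (0.072661).

```python
def sketch_top_k_variants(seq, position_probs, k):
    """Return (sub, seq) pairs for the k most probable non-wild-type residues per site.

    position_probs holds one {amino_acid: probability} dict per residue of seq.
    """
    variants = []
    for i, (wt, probs) in enumerate(zip(seq, position_probs)):
        # Rank candidate substitutions, leaving out the wild-type residue
        candidates = sorted(
            (aa for aa in probs if aa != wt), key=probs.get, reverse=True
        )
        for aa in candidates[:k]:
            variants.append((f"{i}sub{aa}", seq[:i] + aa + seq[i + 1:]))
    return variants


# Toy probabilities for a two-residue peptide (made-up numbers)
toy_probs = [
    {"M": 0.5, "E": 0.3, "K": 0.2},
    {"E": 0.1, "L": 0.6, "K": 0.3},
]
for sub, variant in sketch_top_k_variants("ME", toy_probs, 2):
    print(sub, variant)
```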

Random sequences

To randomly generate proteins from the per-position amino acid probabilities in the berteome dataframe, use n_random_seqs:

generate.n_random_seqs(mendel_berteome, 10)
seq score
0 WYPDRI 0.035417
1 AIWDFM 0.042938
2 VIATPE 0.064649
3 APVMHK 0.048056
4 WRTYTY 0.031315
5 YESFPH 0.037034
6 YDEGGA 0.065425
7 PHTVQL 0.037249
8 FHNHWM 0.025564
9 ERAPYK 0.066202

  • seq is the randomly generated sequence
  • score is the average score of the amino acids chosen in the randomly generated sequence
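
A minimal sketch of probability-weighted sampling, which is one reading of the description above, is shown below. It uses only the standard library; sample_sequences and the toy probabilities are hypothetical, and whether berteome weights its draws in exactly this way is an assumption.

```python
import random


def sample_sequences(position_probs, n, seed=None):
    """Draw n sequences, sampling each residue from its per-position distribution."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        residues, chosen = [], []
        for probs in position_probs:
            aas = list(probs)
            aa = rng.choices(aas, weights=[probs[a] for a in aas], k=1)[0]
            residues.append(aa)
            chosen.append(probs[aa])
        # score = average probability of the sampled residues
        results.append(("".join(residues), sum(chosen) / len(chosen)))
    return results


# Toy probabilities for a two-residue peptide (made-up numbers)
toy_probs = [
    {"M": 0.5, "E": 0.3, "K": 0.2},
    {"E": 0.1, "L": 0.6, "K": 0.3},
]
for seq, score in sample_sequences(toy_probs, 3, seed=0):
    print(seq, score)
```

Passing a seed makes the draws reproducible, which is convenient when comparing generated sequences between runs.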

Plots

from berteome import berteome_plot

If you would like to visualize how wtScore varies across the sequence, do the following:

berteome_plot.wtScore_plot(mendel_berteome)
(<Figure size 432x288 with 1 Axes>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7fc806ab2460>)

Additionally, you can plot n_effective to visualize sites where the model infers that fewer substitutions are likely.

berteome_plot.n_effective_plot(mendel_berteome)
(<Figure size 432x288 with 1 Axes>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7fc80699d070>)

berteome also provides a method for visually inspecting the correlations of the amino acid predictions:

berteome_plot.aa_correlation_plot(mendel_berteome)
<seaborn.matrix.ClusterGrid at 0x7fc80640bb80>

If you would like to see the berteome predictions in the form of a sequence logo (seqlogo), that can also be accomplished! Doing so may require a few additional dependencies, installed with something along the lines of:

!apt install ghostscript
!apt-get install -y pdf2svg
berteome_plot.seqlogo_plot(mendel_berteome)

Development

To build the library, run the following:

nbdev_export

Then pip install it in a development environment:

pip install -e '.[dev]'

I do quite a bit of work on a Chromebook, which means working on GitHub through Codespaces and also on Google Colab. To install a particular commit hash of berteome, you can do the following:

!pip uninstall berteome
Found existing installation: berteome 0.1.5
Uninstalling berteome-0.1.5:
  Would remove:
    /usr/local/lib/python3.8/dist-packages/berteome-0.1.5.dist-info/*
    /usr/local/lib/python3.8/dist-packages/berteome/*
Proceed (y/n)? y
  Successfully uninstalled berteome-0.1.5
!pip install "berteome @ git+https://github.com/tijeco/berteome@1e104ce687ed38a21ff72e6b58960aff6e0be6a2"
Successfully installed berteome-0.1.5 ghostscript-0.7 huggingface-hub-0.11.1 seqlogo-5.29.8 tokenizers-0.13.2 transformers-4.25.1 weblogo-3.7.12
