ValueError: Unknown residues in the input sequence. #1

sarah872 · 2020-12-03T11:39:51Z

Hi,
I am running razor on my proteins as:

python3 razor.py -f proteins.fasta -o test

They come from an assembled transcriptome/ORFs called by transdecoder.

I am getting the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 64, in global_worker
    return _func(x)
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 116, in wrapper
    **kwargs
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/data_types/series.py", line 20, in worker
    return series.apply(func, *args, **kwargs)
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandas/core/series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2327, in pandas._libs.lib.map_infer
  File "razor.py", line 139, in <lambda>
    df['Analysis_'] = df['Sequence'].parallel_apply(lambda x: razor_predict(x, m))
  File "razor.py", line 71, in razor_predict
    newObj = detector.RAZOR(seq=seq, max_scan=max_scan)
  File "/scratch/user/razor/Razor/libs/detector.py", line 31, in __init__
    self.seq = functions.validate(seq, self.max_scan)
  File "/scratch/user/razor/Razor/libs/functions.py", line 77, in validate
    "Unknown residues in the input "
ValueError: Unknown residues in the input sequence.
 Only standard amino acid codes are allowed.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "razor.py", line 159, in <module>
    main()
  File "razor.py", line 139, in main
    df['Analysis_'] = df['Sequence'].parallel_apply(lambda x: razor_predict(x, m))
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 462, in closure
    map_result,
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 396, in get_workers_result
    results = map_result.get()
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: Unknown residues in the input sequence.
 Only standard amino acid codes are allowed.

I tried to run a check using seqkit seq -v -V proteins.fasta, but that doesn't find the culprit residue. Do you have any other idea what I could try?

bkb3 · 2020-12-03T23:05:05Z

Hi @sarah872 ,

The error is raised because the input contains unknown (non-standard) residues. We've pushed a fix (aa63c3a).
By default, we check for non-standard residues within the first 95 residues. If your sequences have unknown residues in that region, you can use the -m parameter to reduce the maximum length that is checked/scanned.
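
To locate the offending records before rerunning, something like the following may help. It is only a rough sketch, not Razor's own validation code: it assumes "standard" means the 20 one-letter amino acid codes, that case does not matter, and that only the first 95 residues are relevant. The names `STANDARD_AA`, `read_fasta` and `find_unknown_residues` are made up for this example. Characters such as `*`, `X`, `B` or `Z` would be reported.

```python
# Assumption: the 20 standard one-letter amino acid codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_unknown_residues(lines, max_scan=95):
    """Return (header, [(position, residue), ...]) for each record whose
    first `max_scan` residues contain a non-standard character.
    Positions are 1-based. Uppercasing is an assumption of this sketch."""
    hits = []
    for header, seq in read_fasta(lines):
        bad = [
            (pos, aa)
            for pos, aa in enumerate(seq[:max_scan].upper(), start=1)
            if aa not in STANDARD_AA
        ]
        if bad:
            hits.append((header, bad))
    return hits
```

For a file on disk, `find_unknown_residues(open("proteins.fasta"))` returns the headers together with the positions of the unexpected characters; passing a smaller `max_scan` mirrors what lowering -m would skip.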

Let me know if it works.
