
ValueError: Unknown residues in the input sequence. #1

Open
sarah872 opened this issue Dec 3, 2020 · 1 comment

Comments

sarah872 commented Dec 3, 2020

Hi,
I am running razor on my proteins as:

python3 razor.py -f proteins.fasta -o test

They are ORFs called by TransDecoder on an assembled transcriptome.

I am getting the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 64, in global_worker
    return _func(x)
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 116, in wrapper
    **kwargs
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/data_types/series.py", line 20, in worker
    return series.apply(func, *args, **kwargs)
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandas/core/series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2327, in pandas._libs.lib.map_infer
  File "razor.py", line 139, in <lambda>
    df['Analysis_'] = df['Sequence'].parallel_apply(lambda x: razor_predict(x, m))
  File "razor.py", line 71, in razor_predict
    newObj = detector.RAZOR(seq=seq, max_scan=max_scan)
  File "/scratch/user/razor/Razor/libs/detector.py", line 31, in __init__
    self.seq = functions.validate(seq, self.max_scan)
  File "/scratch/user/razor/Razor/libs/functions.py", line 77, in validate
    "Unknown residues in the input "
ValueError: Unknown residues in the input sequence.
 Only standard amino acid codes are allowed.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "razor.py", line 159, in <module>
    main()
  File "razor.py", line 139, in main
    df['Analysis_'] = df['Sequence'].parallel_apply(lambda x: razor_predict(x, m))
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 462, in closure
    map_result,
  File "/home/00_scripts/py3-venv/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 396, in get_workers_result
    results = map_result.get()
  File "/apps/python3/3.7.0/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: Unknown residues in the input sequence.
 Only standard amino acid codes are allowed.

I tried to check the file with seqkit seq -v -V proteins.fasta, but that doesn't find the offending residue. Do you have any other ideas about what I could try?
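The seqkit check probably passes because its protein alphabet is more permissive than the check RAZOR runs. It likely accepts ambiguity codes such as X and the stop symbol *, and TransDecoder peptide files typically end complete ORFs with *. A minimal sketch for locating the culprit is to scan the FASTA directly and report every residue outside the 20 standard one-letter codes. This assumes RAZOR accepts exactly those 20 uppercase letters, which is not confirmed in this thread. `read_fasta` and `find_nonstandard` are hypothetical helpers, not part of RAZOR.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption: RAZOR accepts only the 20 standard uppercase one-letter codes.
# The check is deliberately strict (case-sensitive), so lowercase letters,
# '*', 'X', 'U', etc. are all reported.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA lines (e.g. an open file)."""
    record_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            header = line[1:].split()
            # Use the first whitespace-delimited token of the header as the ID.
            record_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def find_nonstandard(
    lines: Iterable[str], max_scan: Optional[int] = None
) -> List[Tuple[str, int, str]]:
    """Return (record_id, 1-based position, residue) for each non-standard residue.

    If max_scan is given, only the first max_scan residues of each sequence
    are checked; otherwise the whole sequence is scanned.
    """
    hits: List[Tuple[str, int, str]] = []
    for record_id, seq in read_fasta(lines):
        region = seq if max_scan is None else seq[:max_scan]
        for pos, residue in enumerate(region, start=1):
            if residue not in STANDARD_AA:
                hits.append((record_id, pos, residue))
    return hits
```

For example, calling `find_nonstandard(fh, max_scan=95)` on an open `proteins.fasta` handle restricts the report to the window RAZOR is said to scan by default. Omitting `max_scan` lists every offending residue in every record.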

bkb3 (Contributor) commented Dec 3, 2020

Hi @sarah872 ,

The error is caused by unknown (non-standard) residues in the input sequence. We've pushed a fix (aa63c3a).
By default, we check for non-standard residues within the first 95 residues. If your sequence has unknown residues within its first 95 residues, you can use the -m parameter to reduce the maximum length that is checked/scanned.
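The traceback shows the check happening in `functions.validate(seq, self.max_scan)`, and `-m` presumably reaches it through `razor_predict(x, m)`. The sketch below is a hypothetical reimplementation of a check that only scans the first `max_scan` residues, assuming the 20 standard one-letter codes. It is not RAZOR's actual code. `first_nonstandard` is a hypothetical helper: any `max_scan` at or below the index it returns lets the check pass.

```python
from typing import Optional

# Hypothetical stand-in for RAZOR's libs/functions.py validate(); the real
# implementation is not shown in this thread. Assumes the 20 standard codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate(seq: str, max_scan: int = 95) -> str:
    """Raise ValueError if a non-standard residue occurs in seq[:max_scan]."""
    seq = seq.strip()
    # Only the first max_scan residues are inspected; later ones are ignored.
    if any(residue not in STANDARD_AA for residue in seq[:max_scan]):
        raise ValueError(
            "Unknown residues in the input sequence.\n"
            " Only standard amino acid codes are allowed."
        )
    return seq


def first_nonstandard(seq: str) -> Optional[int]:
    """Return the 0-based index of the first non-standard residue, or None.

    Because seq[:max_scan] stops just before index max_scan, any
    max_scan <= this index makes validate() pass.
    """
    for index, residue in enumerate(seq):
        if residue not in STANDARD_AA:
            return index
    return None
```

With an `X` at index 60, the default window of 95 raises the error, while `validate(seq, max_scan=60)` accepts the sequence.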

Let me know if it works.
