Build fail: KeyError 'MIM Number' #106

Merged
merged 1 commit into from
Oct 15, 2023
20 changes: 17 additions & 3 deletions omim2obo/parsers/omim_txt_parser.py
@@ -224,8 +224,22 @@ def get_hgnc_map(filename, symbol_col, mim_col='MIM Number') -> Dict:
     """Get HGNC Map"""
     map = {}
     input_path = os.path.join(DATA_DIR, filename)
-    df = pd.read_csv(input_path, delimiter='\t', comment='#').fillna('')
-    df[mim_col] = df[mim_col].astype(int)  # these were being read as floats
+    try:
+        df = pd.read_csv(input_path, delimiter='\t', comment='#').fillna('')
+        df[mim_col] = df[mim_col].astype(int)  # these were being read as floats
+        # TODO: Need a better solution than this. Which should be: When these files are downloaded, should uncomment header
Contributor Author

The issue

I'm not sure why the header suddenly started going missing from these files. The missing header is what raises the KeyError and causes the action to fail.
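
To illustrate the failure mode, here is a minimal sketch that uses only the standard library rather than pandas. It assumes that a reader which skips every line starting with `#` takes its column names from the first remaining line. The sample rows and the `Symbol` column are made up for illustration, and `column_names` is a hypothetical helper, not part of the project.

```python
import csv
import io

# Hypothetical miniature of a downloaded TSV whose header is still commented out.
SAMPLE = (
    "# Some leading comment line\n"
    "# Chromosome\tMIM Number\tSymbol\n"
    "chr1\t100001\tGENE1\n"
    "chr2\t100002\tGENE2\n"
)


def column_names(text: str) -> list:
    """Return the column names a comment-skipping TSV reader would infer."""
    # Drop every line that starts with '#', including the commented-out header.
    data_lines = [
        line for line in text.splitlines(keepends=True) if not line.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("".join(data_lines)), delimiter="\t")
    return list(reader.fieldnames or [])


names = column_names(SAMPLE)
print(names)                  # the first data row has been taken as the header
print("MIM Number" in names)  # False, so a lookup by that column name fails
```

With the header skipped as a comment, the first data row becomes the header, so any lookup of `MIM Number` fails.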

The solution

The best fix would be to handle this at download time for all the files: when the files are downloaded, the header should be uncommented. In the interest of time, I fixed it at the point of failure instead.

Member

What issue is this solving? Sorry I was in a big hole, but getting back to life now!

Contributor Author

That's all right. Thanks for checking in on this. I wanted more time to look into the root cause and fix the issue upstream in a better way, but for now I did a quick fix.

What's happening is that the downloaded TSVs start with a few comment lines, and the header itself is commented out, so we do a special operation to uncomment that header. For some reason, it suddenly wasn't getting uncommented.
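
As a rough sketch of what uncommenting the header at download time could look like, the helper below searches for the commented-out header instead of assuming it sits at a fixed line index. The name `uncomment_header` and its parameters are hypothetical, not taken from the repository, and the `# Chromosome` prefix is assumed from the check used in this PR.

```python
from typing import List

COMMENT_MARKER = "# "


def uncomment_header(lines: List[str], header_prefix: str = "# Chromosome") -> List[str]:
    """Return a copy of `lines` with the commented-out header line uncommented.

    Assumes `header_prefix` starts with "# ". If no line starts with the
    prefix (for example, the header is already uncommented), the lines are
    returned unchanged, so the function is safe to call repeatedly.
    """
    fixed = list(lines)
    for i, line in enumerate(fixed):
        if line.startswith(header_prefix):
            fixed[i] = line[len(COMMENT_MARKER):]  # drop the leading "# "
            break
    return fixed


# Made-up sample content for illustration only.
raw = ["# Some comment\n", "# Chromosome\tMIM Number\n", "chr1\t100001\n"]
print(uncomment_header(raw))
```

Because the function does nothing when the header is already uncommented, it could run on every downloaded file without a separate check.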

+    except KeyError:
+        with open(input_path, 'r') as f:
+            lines = f.readlines()
+        header = lines[3]
+        if not header.startswith('# Chromosome'):
+            raise RuntimeError(f'Error parsing header for: {input_path}')
+        lines[3] = header[2:]
+        with open(input_path, 'w') as f:
+            f.writelines(lines)
+    finally:
+        df = pd.read_csv(input_path, delimiter='\t', comment='#').fillna('')
+        df[mim_col] = df[mim_col].astype(int)  # these were being read as floats

     for index, row in df.iterrows():
         symbol = row[symbol_col]
@@ -237,7 +251,7 @@ def get_hgnc_map(filename, symbol_col, mim_col='MIM Number') -> Dict:
     return map


-def parse_mim2gene(lines, filename='mim2gene.tsv', filename2='genemap2.tsv') -> Tuple[Dict, Dict, Dict]:
+def parse_mim2gene(lines: List[str], filename='mim2gene.tsv', filename2='genemap2.tsv') -> Tuple[Dict, Dict, Dict]:
     """Parse OMIM # 2 gene file
     todo: ideally replace this whole thing with pandas
     todo: How to reconcile inconsistent mim#::hgnc_symbol mappings?