This document describes how to collect annotator feedback on SemEHR annotations. The feedback is used to train machine learning models that improve on the baseline results.
- Convert SemEHR annotation results to eHOST format. This step uses `ann_converter.py` at the root folder of this repository.

  `python ann_converter.py text_file semehr_ann_file cui2label_mapping_file output_xml_file`

  - `text_file` is the full text file
  - `semehr_ann_file` is the SemEHR result json file
  - `cui2label_mapping_file` is a json file that maps UMLS CUIs to labels. The example below maps two CUIs to `Ischemic stroke`.

{ "C0948008": "Ischemic stroke", "C3178801": "Ischemic stroke", "C0859253": "Haemorrhage stroke" }
- Annotation process: use eHOST to load the outputs and ask the annotators to do three things:
  - `delete` not relevant annotations
  - `add` missed annotations using relevant labels from the above mapping file
  - `correct` mislabelled annotations by changing the class to a correct label
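
The three annotator actions above can be read as a diff between the SemEHR output and the reviewed annotations. The sketch below is only an illustration of that idea, not part of this repository: `summarise_feedback` is a hypothetical helper, and representing each annotation as a `(start, end)` character span mapped to a label is an assumed simplification rather than the eHOST XML schema. The mapping is the example from the first step.

```python
import json

# Example mapping from the first step: UMLS CUI -> label.
CUI2LABEL = json.loads("""
{ "C0948008": "Ischemic stroke", "C3178801": "Ischemic stroke", "C0859253": "Haemorrhage stroke" }
""")


def summarise_feedback(original, reviewed, allowed_labels):
    """Group annotator changes into deleted, added and corrected annotations.

    `original` and `reviewed` map a (start, end) span to a label.
    This layout is assumed for illustration; it is not the eHOST format.
    """
    # Spans present before review but removed by the annotator.
    deleted = {s: l for s, l in original.items() if s not in reviewed}
    # Spans the annotator created that SemEHR missed.
    added = {s: l for s, l in reviewed.items() if s not in original}
    # Spans kept by the annotator but given a different label.
    corrected = {
        s: (original[s], reviewed[s])
        for s in original.keys() & reviewed.keys()
        if original[s] != reviewed[s]
    }
    # Labels used in review that do not come from the mapping file.
    unknown = sorted({l for l in reviewed.values() if l not in allowed_labels})
    return {
        "deleted": deleted,
        "added": added,
        "corrected": corrected,
        "unknown_labels": unknown,
    }


allowed = set(CUI2LABEL.values())

# Made-up annotations, purely to exercise the helper.
original = {
    (0, 6): "Ischemic stroke",
    (20, 30): "Ischemic stroke",
    (40, 52): "Haemorrhage stroke",
}
reviewed = {
    (20, 30): "Haemorrhage stroke",
    (40, 52): "Haemorrhage stroke",
    (60, 75): "Ischemic stroke",
}

feedback = summarise_feedback(original, reviewed, allowed)
for kind, items in feedback.items():
    print(kind, items)
```

Keeping the three kinds of change separate lets each be used differently later, for example treating deleted annotations as negative examples and added or corrected ones as positive examples for the relevant label.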