What is best practice for running both deid_ner/mask and another clinical NER in the same pipeline with Spark NLP for Healthcare? #324
What is best practice for running both deid_ner/mask and another clinical NER in the same pipeline? The use case is displaying the original document with clinical NER highlighting via the sparknlp_display.NerVisualizer, but with PHI masked. Ideally, I'd like to mask after running clinical NER so that any deid errors don't affect the NER. But then the entity begin and end offsets won't line up with the masked text, which would throw off the visualization. Is there another solution besides masking before running clinical NER?
Replies: 1 comment
Instead of masking, you could use obfuscation. Because obfuscation replaces real data with fake data, the context stays mostly the same, so it should have little effect on the NER results.
Another solution could be:
Run clinical NER
Run Deid > mask
Get the results of the pipeline and re-assemble the masked text. As you said, masking changes the text length, so add or remove padding in each mask to keep the indexes aligned. Then pass this text directly to the NerVisualizer through the raw_text parameter. When you use raw_text="TEXT", you don't need to supply document_col in the visualizer, since it uses the provided text directly.
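To make the padding idea concrete, here is a minimal sketch in plain Python (no Spark needed). It assumes you have already collected the PHI chunks from the deid NER as (begin, end, label) tuples with an inclusive end, which is how Spark NLP annotations report offsets. The function name `mask_fixed_width` and the `<LABEL>` plus `*` padding format are made up for illustration and are not part of the library.

```python
from typing import List, Tuple


def mask_fixed_width(text: str,
                     phi_spans: List[Tuple[int, int, str]],
                     fill: str = "*") -> str:
    """Replace each PHI span with a mask of exactly the same length.

    phi_spans holds (begin, end, label) tuples, end inclusive (assumed
    to match the begin/end of the deid NER chunks). Because every mask
    has the same width as the text it replaces, offsets produced by the
    clinical NER on the original text stay valid.
    """
    chars = list(text)
    for begin, end, label in phi_spans:
        width = end - begin + 1
        tag = f"<{label}>"
        if len(tag) <= width:
            # Tag fits: pad it out to the original span width.
            replacement = tag + fill * (width - len(tag))
        else:
            # Tag is longer than the span: fall back to fill characters only.
            replacement = fill * width
        chars[begin:end + 1] = replacement
    return "".join(chars)


text = "John Smith was given aspirin on 2020-01-05."

# Hypothetical PHI chunks, as they might come out of the deid NER.
name_begin = text.index("John Smith")
date_begin = text.index("2020-01-05")
phi_spans = [
    (name_begin, name_begin + len("John Smith") - 1, "NAME"),
    (date_begin, date_begin + len("2020-01-05") - 1, "DATE"),
]

masked = mask_fixed_width(text, phi_spans)
print(masked)
```

The masked string can then be passed as raw_text to the visualizer, along the lines of `NerVisualizer().display(result, label_col="ner_chunk", raw_text=masked)`, where `result` and the `ner_chunk` column name are assumptions about your pipeline output. The fallback to plain fill characters for short spans loses the label but keeps the offsets intact, which is what the visualization needs.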