What is best practice for running both deid_ner/mask and another clinical NER in the same pipeline with Spark NLP for Healthcare? #324

JustHeroo · 2021-08-25T18:09:14Z

JustHeroo
Aug 25, 2021
Collaborator

What is the best practice for running both deid_ner/mask and another clinical NER in the same pipeline? The use case is displaying the original document with clinical NER highlighting via sparknlp_display.NerVisualizer, but with PHI masked. Ideally, I'd like to mask after running clinical NER so that any deid errors don't affect the NER results. But then the entity begin and end offsets would no longer line up with the masked text, which would throw off the visualization. Is there another solution besides masking before running clinical NER?

Answered by JustHeroo


JustHeroo · 2021-08-25T18:12:36Z

JustHeroo
Aug 25, 2021
Collaborator Author

Instead of masking, you could use obfuscation. Since obfuscation replaces real data with fake data, the context remains mostly the same, so it should have little effect on the NER results.

Another solution could be:
1. Run clinical NER.
2. Run Deid > mask.
3. Get the results of the pipeline and re-assemble the masked text. As you said, masking changes the text length, so add or remove padding to keep the indexes aligned. Then pass this text directly to NerVisualizer through the raw_text parameter. When you use raw_text="TEXT", you don't need to supply document_col to the visualizer, as it uses the provided text directly.
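The padding step can be sketched in plain Python. This is only an illustration, not part of Spark NLP: the helper name `mask_preserving_offsets`, the `(begin, end, label)` tuple layout, and the `*` pad character are all assumptions made for the example. It assumes PHI spans use inclusive end indexes, and it replaces each span with a placeholder of exactly the same length so that the clinical NER offsets computed on the original text still point at the right characters.

```python
from typing import List, Tuple

# Hypothetical span layout for this sketch: (begin, inclusive end, PHI label).
Span = Tuple[int, int, str]


def mask_preserving_offsets(text: str, phi_spans: List[Span], pad_char: str = "*") -> str:
    """Replace each PHI span with a same-length placeholder.

    Because every replacement has exactly the width of the span it covers,
    character offsets of all other entities in the text stay valid.
    """
    chars = list(text)
    for begin, end, label in phi_spans:
        width = end - begin + 1
        tag = f"<{label}>"
        # Truncate the tag if the span is shorter, pad it if the span is longer.
        placeholder = tag[:width].ljust(width, pad_char)
        chars[begin:end + 1] = placeholder
    return "".join(chars)


text = "John Smith was given aspirin on 2021-08-25."
phi_spans = [(0, 9, "NAME"), (32, 41, "DATE")]  # made-up deid output for the example
masked = mask_preserving_offsets(text, phi_spans)

print(masked)
# The clinical entity found at characters 21-27 of the original is still there.
print(masked[21:28])
```

The resulting `masked` string is what you would hand to NerVisualizer as raw_text, together with the clinical NER results from the unmasked run.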
