What is best practice for running both deid_ner/mask and another clinical NER in the same pipeline with Spark NLP for Healthcare? #324

Instead of masking, you could use obfuscation. Because obfuscation replaces real data with fake data, the context stays mostly the same, so it should have little effect on the NER results.

Another solution could be:
1. Run the clinical NER.
2. Run Deid with masking.
3. Collect the pipeline results and re-assemble the masked text. As you said, masking changes the text length, so you can add or remove padding to keep the indexes aligned. You can then pass this text directly to the NerVisualizer through its raw_text parameter. When you use raw_text="TEXT", you don't need to supply document_col to the visualizer, because it uses the provided text directly.
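The re-assembly step can be sketched in plain Python. This is only an illustration, not Spark NLP API: `mask_and_realign` is a hypothetical helper, the `<LABEL>` mask format is an assumption, and spans are assumed to be `(begin, end, label)` tuples with an inclusive `end` that do not overlap between the two models. Instead of padding the masks, this variant shifts the clinical NER indexes by the length change each mask introduces, which reaches the same goal of keeping entities aligned with the masked text.

```python
def mask_and_realign(text, deid_spans, ner_spans):
    """Mask PHI spans and shift clinical NER spans to match the masked text.

    Hypothetical helper for illustration. Spans are (begin, end, label)
    tuples with an inclusive end index; deid and NER spans must not overlap.
    """
    pieces = []
    shifts = []   # (original end of a masked span, cumulative length change)
    cursor = 0
    delta = 0
    for begin, end, label in sorted(deid_spans):
        mask = f"<{label}>"               # assumed mask format
        pieces.append(text[cursor:begin])
        pieces.append(mask)
        delta += len(mask) - (end - begin + 1)
        shifts.append((end, delta))
        cursor = end + 1
    pieces.append(text[cursor:])
    masked_text = "".join(pieces)

    realigned = []
    for begin, end, label in ner_spans:
        offset = 0
        # Use the cumulative shift of the last mask that ends before this entity.
        for masked_end, cumulative in shifts:
            if masked_end < begin:
                offset = cumulative
        realigned.append((begin + offset, end + offset, label))
    return masked_text, realigned


text = "John Smith was given aspirin."
masked_text, spans = mask_and_realign(
    text,
    deid_spans=[(0, 9, "NAME")],      # "John Smith"
    ner_spans=[(21, 27, "DRUG")],     # "aspirin"
)
print(masked_text)
print(spans)
```

The resulting `masked_text` is what you would pass as raw_text to the NerVisualizer, with the clinical NER annotations updated to the shifted indexes.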

JustHeroo
Aug 25, 2021
Collaborator Author

Answer selected by JustHeroo