Automatically pseudo-anonymise the names of people in the Cour des Comptes' jurisprudence

armgilles/parisfellows_anonymize

parisfellows_anonyme

Automatically pseudo-anonymise the names of people in the Cour des Comptes' jurisprudence

  • The corpus contains 138 documents.
  • It contains more than 12 k distinct words.
  • It contains more than 420 k words in total (3,147 labelled positive, all others negative).
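The counts above imply a heavily imbalanced labelling problem. A quick back-of-the-envelope check, taking 420,000 as an approximate total (the list only says "more than 420 k", so the real share of positives is slightly lower):

```python
# Approximate figures from the list above; total_words is a lower bound.
total_words = 420_000
positive_words = 3_147

# Share of words labelled positive, and how many negatives exist per positive.
positive_share = positive_words / total_words
negatives_per_positive = (total_words - positive_words) / positive_words

print(f"positive share: {positive_share:.2%}")
print(f"negatives per positive: {negatives_per_positive:.0f}")
```

With well under 1% of words labelled positive, plain accuracy would say little about the model; precision and recall on the positive class are likely more informative.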

How to:

Download the data from this link, then unzip it. You should see a data directory at the repository root.
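If you prefer to unzip from Python, a minimal sketch could look like this. The archive name `data.zip` and the helper `extract_data` are assumptions made for illustration; only the resulting data directory comes from the instructions above.

```python
import zipfile
from pathlib import Path


def extract_data(archive_path, repo_root="."):
    """Unzip the archive into repo_root and return the expected data directory."""
    root = Path(repo_root)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(root)
    data_dir = root / "data"
    # Fail early if the archive did not contain a top-level data directory.
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected a 'data' directory in {root.resolve()}")
    return data_dir


# Usage (hypothetical archive name):
# extract_data("data.zip")
```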

**Run the scripts:**

  • python reading_doc_files.py --> Creates the data.csv file with all features and structure
  • python trainning.py --> Trains the model and reports some metrics
  • python get_prediction.py --> Reads and processes a .docx file (set at line 220 of the script) and writes the anonymised version to the output directory.
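The three steps above presumably have to run in that order, since training needs data.csv and prediction needs a trained model. A small sketch that chains them, assuming it is launched from the repository root (the helpers `build_commands` and `run_pipeline` are hypothetical and not part of the repository):

```python
import subprocess
import sys

# Script names exactly as listed above, in the order they are run.
PIPELINE = ["reading_doc_files.py", "trainning.py", "get_prediction.py"]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline step."""
    return [[python, script] for script in PIPELINE]


def run_pipeline():
    for command in build_commands():
        # check=True raises CalledProcessError and stops at the first failing step.
        subprocess.run(command, check=True)


# Usage: call run_pipeline() from the repository root.
```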

Output files created:

  • [name_of_file]_log.csv : Log for this file (warning is a boolean)
  • [name_of_file].txt : The text with the anonymisation applied.
  • [name_of_file].html : The text wrapped in HTML tags with colour coding (green: looks OK; red: warning, this could be an error).
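To illustrate the colour coding described above, here is a minimal sketch of how such an HTML rendering might be produced. It is not the repository's code: the `(text, is_name, warning)` token structure and the function `render_html` are assumptions made for this example.

```python
import html


def render_html(tokens):
    """Render tokens as HTML.

    tokens: iterable of (text, is_name, warning) tuples (hypothetical structure).
    Tokens flagged as names are coloured green, or red when warning is True.
    """
    parts = []
    for text, is_name, warning in tokens:
        safe = html.escape(text)  # avoid breaking the markup with <, > or &
        if is_name:
            color = "red" if warning else "green"
            parts.append(f'<span style="color:{color}">{safe}</span>')
        else:
            parts.append(safe)
    return " ".join(parts)


example = [("M.", False, False), ("X", True, False), ("Y", True, True)]
print(render_html(example))
```

Rows of the log file whose warning value is true would correspond to the red spans, which are the ones worth reviewing by hand.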

Example of the HTML output:

image
