25 detect sensitive fields #134

Merged: 3 commits, Jun 19, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -11,3 +11,4 @@ dev-testing/.DS_Store
.env
.venv
venv
formfyxer/keys/**
16 changes: 15 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,19 @@
# CHANGELOG


## Version v0.3.0

### Added
* Add warning when sensitive fields are detected by @codestronger in https://github.com/SuffolkLITLab/RateMyPDF/issues/25

### Changed
N/A

### Fixed
N/A

**Full Changelog**: https://github.com/SuffolkLITLab/FormFyxer/compare/v0.2.0...v0.3.0

## Version v0.2.0

### Added
@@ -22,7 +36,7 @@

### Fixed

* If GPT-3 says the readability is too high (i.e. high likelyhood we have garabage), we will use ocrmypydf to re-evaluate the text in a PDF (https://github.com/SuffolkLITLab/FormFyxer/commit/a6dcd9872d2d0a6542f687aa46b1b9b00f16d3e5)
* If GPT-3 says the readability is too high (i.e. high likelihood we have garbage), we will use ocrmypdf to re-evaluate the text in a PDF (https://github.com/SuffolkLITLab/FormFyxer/commit/a6dcd9872d2d0a6542f687aa46b1b9b00f16d3e5)
* Adds more actionable information to the stats returned from `parse_form` (https://github.com/SuffolkLITLab/FormFyxer/pull/83):
* Gives more context for citations found in the text: https://github.com/SuffolkLITLab/FormFyxer/pull/83/commits/b62bd41958fc1bd0373b7698adde1a234779f77a

34 changes: 29 additions & 5 deletions README.md
@@ -80,9 +80,12 @@ Functions from `pdf_wrangling` are found on [our documentation site](https://suf
- [Parameters:](#parameters-10)
- [Returns:](#returns-10)
- [Example:](#example-10)
- [formfyxer.get\_sensitive\_data\_types(fields, fields\_old)](#formfyxerget_sensitive_data_typesfields-fields_old)
- [Parameters:](#parameters-11)
- [Returns:](#returns-11)
- [Example:](#example-11)
- [License](#license)


### formfyxer.re_case(text)
Reformats snake_case, camelCase, and similarly-formatted text into individual words.
#### Parameters:
@@ -99,9 +102,9 @@ A string where words combined by cases like snake_case are split back into indiv


### formfyxer.regex_norm_field(text)
Given an auto-generated field name (e.g., those applied by a PDF editor's find form feilds function), this function uses regular expressions to replace common auto-generated field names for those found in our [standard field names](https://suffolklitlab.org/docassemble-AssemblyLine-documentation/docs/label_variables/).
Given an auto-generated field name (e.g., those applied by a PDF editor's find form fields function), this function uses regular expressions to replace common auto-generated field names for those found in our [standard field names](https://suffolklitlab.org/docassemble-AssemblyLine-documentation/docs/label_variables/).
#### Parameters:
* **text : str** A string of words, such as that found in an auto-generated field name (e.g., those applied by a PDF editor's find form feilds function).
* **text : str** A string of words, such as that found in an auto-generated field name (e.g., those applied by a PDF editor's find form fields function).
#### Returns:
Either the original string/field name, or if a standard field name is found, the standard field name.
#### Example:
@@ -124,7 +127,7 @@ A snake_case string summarizing the input sentence.
#### Example:
```python
>>> import formfyxer
>>> reformat_field("this is a variable where you fill out your name")
>>> formfyxer.reformat_field("this is a variable where you fill out your name")
'variable_fill_name'
```
[back to top](#formfyxer)
@@ -345,7 +348,7 @@ A string with a proposed plain language rewrite.
### formfyxer.describe_form(text)
An OpenAI-enabled tool that will write a draft plain language description for a form. In order to use this feature **you must edit the `openai_org.txt` and `openai_key.txt` files found in this package to contain your OpenAI credentials**. You can sign up for an account and get your token on the [OpenAI signup](https://beta.openai.com/signup).

Given a string conataining the full text of a court form, this function will return its a draft description of the form written in plain language.
Given a string containing the full text of a court form, this function will return a draft description of the form written in plain language.

#### Parameters:
* **text : str** text.
@@ -444,6 +447,27 @@ An object grouping together similar field names.
[back to top](#formfyxer)



### formfyxer.get_sensitive_data_types(fields, fields_old)
Given a list of field names, identifies those related to sensitive information and returns a dictionary with the sensitive fields grouped by type. A list of the old field names can also be provided, in the same order as `fields`; passing the old names lets the sensitive field algorithm match more accurately. The return value never contains the old field names, only the corresponding names from the first parameter.

The sensitive field types are: Bank Account Number, Credit Card Number, Driver's License Number, and Social Security Number.
#### Parameters:
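* **fields : List[str]** List of field names.
* **fields_old : List[str]** Optional list of the old field names, in the same order as `fields`.
#### Returns:
A dictionary mapping each sensitive data type found to the list of matching field names from `fields`.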
#### Example:
```python
>>> import formfyxer
>>> formfyxer.get_sensitive_data_types(["users1_name", "users1_address", "users1_ssn", "users1_routing_number"])
{'Social Security Number': ['users1_ssn'], 'Bank Account Number': ['users1_routing_number']}
>>> formfyxer.get_sensitive_data_types(["user_ban1", "user_credit_card_number", "user_cvc", "user_cdl", "user_social_security"], ["old_bank_account_number", "old_credit_card_number", "old_cvc", "old_drivers_license", "old_ssn"])
{'Bank Account Number': ['user_ban1'], 'Credit Card Number': ['user_credit_card_number', 'user_cvc'], "Driver's License Number": ['user_cdl'], 'Social Security Number': ['user_social_security']}
```
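
The grouping behavior described above can be illustrated with a small standalone sketch. This is not FormFyxer's implementation: the function name `group_sensitive_fields` and the keyword lists in `SENSITIVE_KEYWORDS` are hypothetical, and plain substring matching is an assumption chosen only to reproduce the two documented examples.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists (an assumption for illustration only).
# FormFyxer's real matching rules are not shown in this diff.
SENSITIVE_KEYWORDS: Dict[str, List[str]] = {
    "Bank Account Number": ["bank_account", "routing_number"],
    "Credit Card Number": ["credit_card", "cvc"],
    "Driver's License Number": ["drivers_license"],
    "Social Security Number": ["ssn", "social_security"],
}


def group_sensitive_fields(
    fields: List[str], fields_old: Optional[List[str]] = None
) -> Dict[str, List[str]]:
    """Group field names by sensitive data type using naive substring matching."""
    # Without old names, pair every field with an empty string so zip() still works.
    old_names = fields_old if fields_old is not None else [""] * len(fields)
    if len(old_names) != len(fields):
        raise ValueError("fields and fields_old must have the same length")

    found: Dict[str, List[str]] = {}
    for name, old_name in zip(fields, old_names):
        candidates = (name.lower(), old_name.lower())
        for data_type, keywords in SENSITIVE_KEYWORDS.items():
            # A hit on either the new or the old name counts as a match,
            # but only the new name is ever reported.
            if any(keyword in candidate for keyword in keywords for candidate in candidates):
                found.setdefault(data_type, []).append(name)
                break  # assign each field to at most one type
    return found


if __name__ == "__main__":
    grouped = group_sensitive_fields(
        ["users1_name", "users1_address", "users1_ssn", "users1_routing_number"]
    )
    for data_type, names in grouped.items():
        print(f"Warning: possible {data_type} field(s): {', '.join(names)}")
```

In this sketch, `user_ban1` and `user_cdl` are only recognized when the old names (`old_bank_account_number`, `old_drivers_license`) are passed alongside them, which shows why supplying `fields_old` can improve matching.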
[back to top](#formfyxer)



## License
[MIT](https://github.com/SuffolkLITLab/FormFyxer/blob/main/LICENSE)
