Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

25 detect sensitive fields #134

Merged
merged 3 commits into from
Jun 19, 2024
Merged

25 detect sensitive fields #134

merged 3 commits into from
Jun 19, 2024

Conversation

codestronger
Copy link
Contributor

Updates for detecting sensitive fields: SuffolkLITLab/RateMyPDF#25

@codestronger
Copy link
Contributor Author

Oh, didn't expect it would publish the package on a PR branch, especially w/ failing tests. I used 0.3.0a2 as the version, so doubt anyone will grab this. Not sure what the status of 0.3.0 is... Seems like we're in some sort of alpha still?

Code in docx_wrangling & pdf_wrangling is causing the mypy failures. Didn't make any changes to those files. Seems like the changes are from the pdf_context_extract merge... Any ideas if those are crucial and need fixing? I'm not familiar w/ mypy but I can dig into it if it's a priority.

@BryceStevenWilley
Copy link
Contributor

That GH action is a bit misleading, nothing is published until you make a GH tag. The action still runs to make sure it can package correctly, but it doesn't publish the new version (you can see that here). You can check https://pypi.org/project/formfyxer/#history to see that there's not a new version there. IMO 0.3.0a2 is fine, it still need more testing to make sure it can be used as a dependency without issue (i.e. the problems we had when redeploying RateMyPDF were all issues with FormFyxer).

It'd be nice if you could look into the mypy issues, but if you can't, I can fix them this weekend. Those issues pop up because our version of mypy increments, and it gets better at finding mismatching types that could cause issues.

@BryceStevenWilley
Copy link
Contributor

Realizing the mypy issues might be because it looks like you merged https://github.com/SuffolkLITLab/FormFyxer/tree/pdf_context_extract into your branch, which wasn't in main before. Idk how well reviewed that code is (even though I wrote it, lol). Might be worth getting it, but I'd want to make a separate PR and review for that, which should also fix up some of the mypy issues. I can also do that this weekend.

Copy link
Contributor

@BryceStevenWilley BryceStevenWilley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things I think you should change, unless @nonprofittechy has a better vision for the feature that I don't know about yet.

CHANGELOG.md Outdated Show resolved Hide resolved
formfyxer/lit_explorer.py Outdated Show resolved Hide resolved
formfyxer/lit_explorer.py Outdated Show resolved Hide resolved
formfyxer/lit_explorer.py Outdated Show resolved Hide resolved
formfyxer/lit_explorer.py Outdated Show resolved Hide resolved
formfyxer/lit_explorer.py Outdated Show resolved Hide resolved
formfyxer/lit_explorer.py Outdated Show resolved Hide resolved
formfyxer/lit_explorer.py Outdated Show resolved Hide resolved
Copy link
Member

@nonprofittechy nonprofittechy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I agree with Bryce's feedback within the limits of what we have time to do. Also we should take him up on the offer to finish getting tests passing this weekend. You can ping me again if useful or @BryceStevenWilley, you should also be able to add the approving review when you've finished the tests.

@BryceStevenWilley
Copy link
Contributor

So mypy on main and #137 is fixed. Looking a bit closer, I don't think this PR needs #137 though? @codestronger, can you confirm that and if so, drop or revert 2c01ae0 and 9bfea33 from this PR, i.e. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History?

@codestronger
Copy link
Contributor Author

So mypy on main and #137 is fixed. Looking a bit closer, I don't think this PR needs #137 though? @codestronger, can you confirm that and if so, drop or revert 2c01ae0 and 9bfea33 from this PR, i.e. https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History?

Sounds good. I can rebase this PR on the current main. @nonprofittechy merged the pdf_context_extract branch back in Feb and it's been live as far as I'm aware. Others might have pulled it down too. Was it safe to rewrite it out of the history? At minimum, we should merge the new version of the pdf_context_extract before deploying my changes, so that we don't lose any functionality from a deploy.

@nonprofittechy
Copy link
Member

I don't remember why I merged this branch in February, unfortunately.

@codestronger
Copy link
Contributor Author

I don't remember why I merged this branch in February, unfortunately.

@BryceStevenWilley I'll proceed with the plan that assumes pdf_context_extract was an accidental merge. We discussed the various downstream users of FormFyxer and as far as we can tell, no one is depending on the features in that branch.

I also plan to officially take the 0.3.0 version number for this branch. Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed. Now that we've separated out pdf_context_extract, maybe we don't need all of those yet.

Copy link
Contributor

@BryceStevenWilley BryceStevenWilley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff looks much better to me! +1 for changing to 0.3.01, and +1 for merging.

Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed

I think those are still necessary, those dependencies started being used in an earlier commit I think.

Footnotes

  1. We were using alphas because this package has been notoriously difficult to use as a dependency, and without proper testing it can cause issues. I think these changes will be tested enough at this point.

… dictionary of sensitive data types, with a list of the matching field names.
@codestronger
Copy link
Contributor Author

Diff looks much better to me! +1 for changing to 0.3.01, and +1 for merging.

Will also review the dependency updates I made since that was done to bring in everything being used by the code that was deployed

I think those are still necessary, those dependencies started being used in an earlier commit I think.

Footnotes

  1. We were using alphas because this package has been notoriously difficult to use as a dependency, and without proper testing it can cause issues. I think these changes will be tested enough at this point.

Ah, okay! I'll leave the dependencies alone then. In any case, we'll need them in the future.

@codestronger codestronger merged commit c50a8a0 into main Jun 19, 2024
3 checks passed
@codestronger codestronger deleted the 25-detect-sensitive-fields branch June 19, 2024 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants