Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uploading Code for Webscraping, EDA, and Viz #33

Merged
merged 28 commits into from
Nov 16, 2023
Merged

Conversation

alankagiri
Copy link
Contributor

No description provided.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@averyschoen averyschoen self-requested a review October 23, 2023 18:16
Copy link
Collaborator

@averyschoen averyschoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Alan, in addition to the in-line feedback that I gave please make the following adjustments:

  • Work within the file organization of the repository, get rid of the folder DATA_271_Data_Clinic_I, and organize your files into the correct folders
  • Designate 1 clean notebook that methodically answers each question that Trevor asked
  • Look at all of the data, not just 2003 -- work to combine as much of your data as possible
  • The web scraping code should be a .py script that will download all of the data into a digestible format for the notebook
  • Update the notebook readme w/ your file name and 1-2 sentence description of what the notebook is

DATA_271_Data_Clinic_I/Pennsylvania_filer.ipynb Outdated Show resolved Hide resolved
utils/PA_Data_Web_Scraper.py Outdated Show resolved Hide resolved
utils/PA_EDA_Functions.py Outdated Show resolved Hide resolved
utils/PA_constants.py Show resolved Hide resolved
Copy link

review-notebook-app bot commented Oct 31, 2023

View / edit / reply to this conversation on ReviewNB

averyschoen commented on 2023-10-31T14:56:35Z
----------------------------------------------------------------

Adjust to use markdown headings to organize


Copy link

review-notebook-app bot commented Oct 31, 2023

View / edit / reply to this conversation on ReviewNB

averyschoen commented on 2023-10-31T14:56:36Z
----------------------------------------------------------------

use a for-loop to clean this up


Copy link

review-notebook-app bot commented Oct 31, 2023

View / edit / reply to this conversation on ReviewNB

averyschoen commented on 2023-10-31T14:56:37Z
----------------------------------------------------------------

allyears is not a descriptive name for a df, perhaps PA_contributions would be more appropriate


Copy link

review-notebook-app bot commented Oct 31, 2023

View / edit / reply to this conversation on ReviewNB

averyschoen commented on 2023-10-31T14:56:38Z
----------------------------------------------------------------

why are the _x and _y columns? If they are supposed to be the same join on them or drop them (do exploration before deciding a path forward. If they are different change _x and _y to be the file source (i.e. filer or contributor)


averyschoen commented on 2023-11-07T18:22:39Z
----------------------------------------------------------------

Clean up the column names to be some consistent format

Copy link

review-notebook-app bot commented Oct 31, 2023

View / edit / reply to this conversation on ReviewNB

averyschoen commented on 2023-10-31T14:56:39Z
----------------------------------------------------------------

adjust the axis label to year from year, no need to print out table if the barplot shows that information, use full names for all labels (put a space in FilerType)


Copy link

review-notebook-app bot commented Oct 31, 2023

View / edit / reply to this conversation on ReviewNB

averyschoen commented on 2023-10-31T14:56:40Z
----------------------------------------------------------------

Similar comment as above, use proper label names, standardize capitalisation on the title, remove the table print out, whty are there a bunch of different unknowns (there should only be one and it should all be grouped together).


Copy link

review-notebook-app bot commented Oct 31, 2023

View / edit / reply to this conversation on ReviewNB

averyschoen commented on 2023-10-31T14:56:41Z
----------------------------------------------------------------

these could be deprecated abbreviations, don't group into unknown, use chatgpt as inspiration for what they might have been


Copy link
Collaborator

@averyschoen averyschoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work Alan. Let me know if you have questions about the in-line feedback I have left here.

notebooks/PA_EDA.ipynb Outdated Show resolved Hide resolved
utils/PA_EDA_Functions.py Show resolved Hide resolved
utils/PA_EDA_Functions.py Outdated Show resolved Hide resolved
utils/PA_EDA_Functions.py Outdated Show resolved Hide resolved
utils/PA_EDA_Functions.py Outdated Show resolved Hide resolved
utils/PA_constants.py Show resolved Hide resolved
Copy link
Collaborator

@averyschoen averyschoen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you're making these revisions, keep in mind that all of the functions that you are using will be helping you get into the schema we outlined in google docs. I sent you a slack message regarding the expectations for function annotations. Please address the merge conflicts and finish the EDA you began in step 5

utils/PA_Data_Web_Scraper.py Show resolved Hide resolved
utils/PA_EDA_Functions.py Show resolved Hide resolved
utils/PA_EDA_Functions.py Outdated Show resolved Hide resolved
utils/PA_EDA_Functions.py Outdated Show resolved Hide resolved
utils/PA_EDA_Functions.py Outdated Show resolved Hide resolved
Copy link
Collaborator

Clean up the column names to be some consistent format


View entire conversation on ReviewNB

Copy link

review-notebook-app bot commented Nov 7, 2023

View / edit / reply to this conversation on ReviewNB

averyschoen commented on 2023-11-07T18:23:54Z
----------------------------------------------------------------

This is incomplete, please wrap this up asap


@averyschoen averyschoen self-requested a review November 15, 2023 23:48
@averyschoen averyschoen merged commit 01da03a into dev Nov 16, 2023
1 check passed
@averyschoen averyschoen deleted the PA_EDA_and_Schema branch November 16, 2023 22:25
trevorspreadbury pushed a commit that referenced this pull request Mar 22, 2024
basics of makefile and added classify fns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants