F23 Final Merge #55
Merged
Changes from 250 commits
333 commits
3e07fb3
Merge branch 'dev' into michigan-campaign-eda
averyschoen fb2eb2c
Merge pull request #31 from dsi-clinic/michigan-campaign-eda
averyschoen 0ca6182
added skeleton for state cleaning base class
trevorspreadbury 1d354f7
Update README.md
nrposner 1b62f09
added proof of concept for requests based AZ scraper
trevorspreadbury 8c2d443
Merge pull request #39 from dsi-clinic/nrposner-patch-1
averyschoen 73d422b
Delete data/Contributions/Test/1998_mi_cfr_contributions.txt
averyschoen 83fe536
Delete notebooks/mi_campaign_eda.ipynb
necabotheking 257cf62
Delete notebooks/__init__.py
necabotheking b55a380
Delete data/MI_campaign_data.ipynb
necabotheking a6d3c84
update max line lengths
trevorspreadbury bab4e54
updated util and notebook README, revised EDA code based on TA input
yuzhouw313 16f8789
set isort and flake8 line lengths to 88 to deal with black 10% rule
trevorspreadbury c11ecfd
add scaffold of state cleaner class
trevorspreadbury 6a5df86
remove erroneous file
trevorspreadbury 3719523
Merge pull request #40 from dsi-clinic/cleaning/base_class
trevorspreadbury 6b2acfa
Merge remote-tracking branch 'origin/dev' into michigan-expenditure-eda
necabotheking 24413d5
updated docstrings for individual funcitons
nrposner 62ac675
added typing for args and kwargs in individual scrapers
nrposner 11b4db2
Create expenditure Constants
necabotheking 66b3c11
update
necabotheking 331c66b
Update README.md
necabotheking e7dd476
update MI expenditure EDA & constants
necabotheking 1405d9a
Update constants.py
necabotheking 4df1a92
Update mi_campaign_expenditure.ipynb
necabotheking 348fdd2
modifications to files after Avery's feedback
alankagiri a9cff26
resolving pull issue
alankagiri 1027778
Merge branch 'dev' into PA_EDA_and_Schema
alankagiri 354f549
update webscraper to include expenditure data and contribution data …
necabotheking 74f8f12
Update based on prior comments
necabotheking 207a58f
Update mi_campaign_webscraper.py
necabotheking 2a2185b
Update mi_campaign_webscraper.py
necabotheking aa03b0e
Update mi_campaign_webscraper.py
necabotheking caeb051
Update mi_campaign_webscraper.py
necabotheking 2c7e963
PA util functions
alankagiri 0e4d73f
created new crawler based on curl, base functionality established
nrposner e9ca764
some changes based on Trevor's input on util file
yuzhouw313 3ee5a0d
making progress on EDA since connected to cluster?
8974e71
Delete utils/az_web_crawler.py
nrposner 71dd5c3
notebook used to experiment with curl crawling
nrposner 657efed
Merge branch 'az_webcrawler_2' of https://github.com/dsi-clinic/2023-…
nrposner 6f9a72e
updated notebook readme
nrposner 3855804
adding state cleaner draft, utils file with cleaner functions, and up…
nrposner 522e0fe
Merge branch 'dev' into az_webcrawler_2
nrposner 725bebb
Seeing if linter test fails/passes
6e36229
one more check on linter test
eea6ae1
all linter tests passed
3d7790b
Delete DATA_271_Data_Clinic_I/Pennsylvania_Contributions.ipynb
alankagiri 5fa4ee2
Delete DATA_271_Data_Clinic_I directory
alankagiri d01ab3d
part 1 of EDA (not including expenditures) is done
a2b6556
Merge branch 'dev' into w4_MN_CompleteData_EDA
averyschoen 08378c5
Delete notebooks/az_webcrawler_3.ipynb
averyschoen 96a0fb1
Update clean.py
necabotheking 7ddc9cc
fixing in response to PR comments
nrposner 23336a9
Update constants.py
necabotheking 9586908
Removed VALUES_TO_CHECK
necabotheking f28a0d7
updated contribution notebook and util
yuzhouw313 afc3acf
previous commit has outdated EDA notebook
yuzhouw313 54bd8bc
Merge expenditure and contribution EDA
necabotheking e920267
fix linter
necabotheking 3a2abb5
small changes to constants
nrposner b5ed824
tried to add iteration
nrposner 9670d3d
experimented in notebook
nrposner 4ac13aa
adding notebook back in to fix merge
nrposner bbc6fab
uploading notebook readme
943fb35
Merge pull request #35 from dsi-clinic/michigan-web-scraper
averyschoen f5050f2
Merge branch 'dev' into michigan-expenditure-eda
necabotheking 70c87a8
Added pipeline implementation and edits to preprocess from team meeting
trevorspreadbury 8a9c726
Merge pull request #43 from dsi-clinic/create_statecleaner
trevorspreadbury 3f7514e
Implementing Wk6 feedback from Avery
c43d601
forgot to check linter tests. should work now
caef882
addressed Mon Avery's feedback and combined con and exp
yuzhouw313 27a69b9
updated basic curl crawler
nrposner 11561f1
Merge branch 'dev' into PA_EDA_and_Schema
alankagiri 2ab79e6
streamlined info table
nrposner 9b8c740
resolved all comments from Avery apart from EDA on expenditures
15f381b
Merge branch 'PA_EDA_and_Schema' of github.com:dsi-clinic/2023-fall-c…
6b01e8a
git push after resolving merge conflicts and linter test
c7aeb13
passed black test
yuzhouw313 034d462
need to access dev file
yuzhouw313 b7b5dbc
minor changes for the sake of merging
f145750
Delete notebooks/arizona_scraper_proof_of_concept.ipynb
averyschoen 56bb652
Delete notebooks/az_webcrawler_3.ipynb
averyschoen 789712e
saving work, no need for review
7e113e9
fixing averys requests
nrposner 2dc6fe5
bring branches up to date
nrposner 65c27d2
bring branches up to date
nrposner cf9b993
fixed rest of issues
nrposner ccf0d23
added dtype to cleaner
nrposner d01b863
Delete notebooks/az_webcrawler_3.ipynb
averyschoen 4b0d8d2
Delete utils/state_cleaner_draft.py
averyschoen 4c1cc16
Merge branch 'dev' into az_basic_crawler
averyschoen 00fec2d
Update clean.py
averyschoen 60a7498
Update clean.py
averyschoen ebb59a7
Merge pull request #44 from dsi-clinic/az_basic_crawler
averyschoen e2f0a57
Remove dropdown and simplify graphs
necabotheking 7f2277f
Merge branch 'dev' into michigan-expenditure-eda
necabotheking fac3335
Update constants.py
necabotheking e41525d
Merge pull request #41 from dsi-clinic/michigan-expenditure-eda
averyschoen 276a7e0
Merge branch 'dev' into PA_EDA_and_Schema
alankagiri 5e32d7e
major revelations about EDA... no need to look through yet
1fba923
major EDA revelations...no need to check yet
4565890
EDA with expenditure data done
ce30f96
first steps to cleaning
nrposner 5c37e58
adding state cleaner functionality and updating curl crawler
nrposner 87413d2
added state extraction functionality
nrposner 115ddc9
added state validation
nrposner a69b8bb
fixed case sensitivity for states
nrposner 404025a
first draft of MN abstract class, entity map not done
yuzhouw313 d45ab06
removed unnecessary crawler element, made more efficient
nrposner 3c88de0
just saving my work, no need for review
daeaee3
update method descriptions
521636a
Merge pull request #47 from dsi-clinic/abstractclassdescriptions
averyschoen 362c222
Merge branch 'dev' into MN_abstract_class
yuzhouw313 74b6e70
no need to check this commit, doing this before merging with dev
1f8db78
revised notebook changes
4e8c853
revised notebook changes
c9a0c4a
revised EDA after Avery's feedback
e7c38b3
linter tests passed after Avery's feedback
02e3a7b
final Eda
01da03a
Merge pull request #33 from dsi-clinic/PA_EDA_and_Schema
averyschoen 27cc164
minor update, commit to merge dev
yuzhouw313 b8a7756
second draft of MN abstract implement, added entity map
yuzhouw313 38503ee
restoring older commit to solve commit problems
be53e64
gitignore
5f30046
updated crawler, clean_utils, and clean to run smoothly from end to end
nrposner 70383a4
Update
necabotheking 40785c7
updated crawler and cleaner, almost complete end to end
nrposner 6d18723
Merge branch 'dev' into az_state_cleaner
nrposner 013c20f
changed class name to ArizonaCleaner
nrposner be9921c
Merge branch 'az_state_cleaner' of https://github.com/dsi-clinic/2023…
nrposner 828d76b
Merge branch 'dev' into w4_MN_CompleteData_EDA
yuzhouw313 0e916b4
Merge branch 'dev' into MN_abstract_class
yuzhouw313 09fe283
fixed linter issues and merging conflict
yuzhouw313 096d1ec
Merge remote-tracking branch 'refs/remotes/origin/MN_abstract_class' …
yuzhouw313 a5f23f6
Update constants.py
necabotheking fd2721f
fixed constant.py linter test
yuzhouw313 c5d375d
update raw data google drive link
yuzhouw313 9e4d021
updated some docstrings and info, addressing comments still in progress
nrposner ff4c83f
Delete utils/PA_constants.py
averyschoen 704e468
Merge pull request #36 from dsi-clinic/w4_MN_CompleteData_EDA
averyschoen 526b3a2
Implemented UUID mapping
necabotheking ca486e4
Delete notebooks/PennsylvaniaCleaner.py
averyschoen 83f29e1
commiting changes before merging
ccd06f8
Finished create_organizations and create_individuals()
necabotheking fdcbc47
finish MichiganCleaner() and rename EDA notebook
necabotheking d627c23
finished minnesota.py and tested in jupyter notebook, updated dev, ut…
yuzhouw313 a1d1e66
updated notebook descriptions
nrposner 1cf748f
updated AZ_EDA notebook to access needed data
nrposner 2d2cbc1
update on PACleaner thus far. Still working on create_Tables
b2c473e
Delete utils/pennsylvania_helper_functions.py
alankagiri 88d5ea8
made many changes for functionality and according to comments, employ…
nrposner a64d2a8
Delete utils/mn_state_cleaner.py
averyschoen 17633f4
Merge branch 'dev' of github.com:dsi-clinic/2023-fall-clinic-climate-…
fc74fe2
Merge branch 'Pennsylvania_State_Cleaner' of github.com:dsi-clinic/20…
73e7578
rework michigan cleaner and add ID_MAP output
necabotheking fc4de31
fix transactions bug and linter error
necabotheking 819fddc
preprocess done, create_tables almost done
54e5413
-linter check passed for pennsylvania.py
2ad79f5
-had to git rm PennsylvaniaCleaner.py to pass linter tests
6c09a75
moved the cleaner to its own file, updated crawler, cleaner, and add-ons
nrposner c28f285
updated filepaths and cleaner to run demo files
nrposner 9317cea
updated some docstrings, fixed some bugs, moved towards schema
nrposner 7c86a8c
changed name from arizona_cleaner to arizona
nrposner 475b096
added note about readme
nrposner af8605e
added utils readme
nrposner 1019f1e
updated readme
nrposner af60a51
remove functions and uncomment commented filepaths
necabotheking 6a1438c
improved code quality based on Nico's input and updated dev README
yuzhouw313 7b19a74
fixed minor issue in creating mapping table csv
yuzhouw313 aaad2c6
Merge MN_abstract_class into dev-f23
trevorspreadbury 01c104e
Merge remote-tracking branch 'origin/michigan-statecleaner' into dev-f23
trevorspreadbury 500e67a
updated filepaths and setup
nrposner 602862c
updated readme
nrposner b02ae29
Merge remote-tracking branch 'origin/Pennsylvania_State_Cleaner' into…
trevorspreadbury ecf212f
Merge remote-tracking branch 'origin/az_state_cleaner' into dev-f23
trevorspreadbury ac48cb2
ran minnesota on ipython with the whole dataset and produced right ou…
yuzhouw313 33a677a
updated readme
nrposner 9ee45a7
progress on pennsylvania_Cleaner
7b2f04f
progress on pennsylvania_Cleaner
e909b85
Delete utils/arizona_cleaner.py
averyschoen 28af64a
Delete utils/README.md
averyschoen 5c79dd6
Update README.md
averyschoen 51b490d
uncommented arizonacleaner in pipeline.py and imported
nrposner a832404
Update pipeline.py
averyschoen dcbf6ea
Update description in create_tables()
averyschoen 930b2b8
Update clean.py
averyschoen 261fa60
update for linter tests
40f92a1
Merge branch 'dev-f23' into MN_abstract_class
averyschoen 57390b3
Merge pull request #51 from dsi-clinic/MN_abstract_class
averyschoen 330ae38
Merge branch 'dev-f23' into az_state_cleaner
averyschoen 74bc643
Merge pull request #52 from dsi-clinic/az_state_cleaner
averyschoen 5f1a6fa
Delete notebooks/PA_EDA.ipynb
averyschoen f73bc10
update for linter
77b0d9f
addressed Avery's input and create 4 transaction table
yuzhouw313 05d41d1
fixed linter issue
yuzhouw313 70e241b
removed unused import
trevorspreadbury 0b64184
uploading statecleaner to dev-f23
257314c
uploading statecleaner to dev-f23
96ac8c8
uploading statecleaner to dev-f23
c862708
making sure latest updates show in dev-f23
d9299f8
Add 12/4 minnesota cleaner
yuzhouw313 4bb53ed
Merge pull request #50 from dsi-clinic/dev-f23
nrposner f6a6717
Merge branch 'dev' into MN_abstract_class
averyschoen 530a7d5
Merge pull request #46 from dsi-clinic/MN_abstract_class
averyschoen 173c16d
pushing modified clean.py before git checkout dev-f23
b4b25a1
completing merge to dev-f23
2563e27
uncommented other state cleaners
trevorspreadbury d970c92
Merge pull request #54 from dsi-clinic/dev-f23
trevorspreadbury 5ef6778
updated filepaths
nrposner f8e3dda
removed if main
nrposner e734cca
should pass linter now
nrposner a5b9e3a
Merge pull request #56 from dsi-clinic/arizona_corrections
averyschoen 3b250e6
Saving changes to old EDA before switching branches
2e54a77
forgot to check linter tests...these should be ok now
1ebe37f
Delete the duplicated minnesota.py
yuzhouw313 33ef2c6
Merge pull request #58 from dsi-clinic/yuzhouw313-patch-1
averyschoen 2835a15
remove pandas requirement and complete pipeline.py
necabotheking 7773d8c
Update requirements.txt with pipreqs
necabotheking 84bb956
Update README
necabotheking a35fba9
Update README.md
necabotheking 8650b2e
fix linter error
necabotheking 9c0df10
final revisions to pennsylvania_state_cleaner
75071e4
Merge branch 'dev' of github.com:dsi-clinic/2023-fall-clinic-climate-…
b5e6622
trying to solve linter test failure
dd44abe
updated docstrings and added transactions splitter
nrposner 1e1516f
Merge pull request #59 from dsi-clinic/README-edits
trevorspreadbury f806264
Merge pull request #61 from dsi-clinic/arizona_corrections
trevorspreadbury 21bb3f2
Merge branch 'dev' into new_pennsylvania_state_cleaner
trevorspreadbury 715887a
Merge pull request #60 from dsi-clinic/new_pennsylvania_state_cleaner
trevorspreadbury 897c21c
fix minnesota bugs that prevented pipeline from running
trevorspreadbury 0b08647
Update PA webscraper to save each year to separate directories
trevorspreadbury b90a96e
update PA data readme
trevorspreadbury 0860346
clean PA and eliminate bugs preventing pipeline from running
trevorspreadbury 9bc8d11
fix docstrings in StateCleaner
trevorspreadbury 4c25054
clean pennsylvania, re-order methods
trevorspreadbury 71a3e39
updated michigan code
trevorspreadbury fc28b3c
Merge remote-tracking branch 'origin/dev' into PA_EDA_and_Schema
trevorspreadbury 0780643
Merge pull request #57 from uchicago-dsi/PA_EDA_and_Schema
trevorspreadbury 035ac6b
move deprecated helper function for az into notebook
trevorspreadbury 2018dc6
moved scrapers into new scraper module
trevorspreadbury 414e32b
update scrapers, AZ is WIP
trevorspreadbury b1e483a
move az helper functions to arizoner cleaner
trevorspreadbury 34e439e
refactor functions to be more general for 'detailed' and other endpoints
trevorspreadbury c1867c4
fixed az scraper headers
trevorspreadbury 068e1d5
working arizona scraper
trevorspreadbury fc21f1b
update arizona cleaner to work on subset
trevorspreadbury aab3204
Merge branch 'dev' of github.com:dsi-clinic/2023-fall-clinic-climate-…
trevorspreadbury 6608c25
update states to return a single transactions table
trevorspreadbury a49c00b
fix pre-commit errors
trevorspreadbury
@@ -138,3 +138,4 @@ venv.bak/

# data files
*.avro
data/*.txt
Submodule 2023-fall-clinic-climate-cabinet deleted from 9b0d34
@@ -1,12 +1,15 @@
# 2023-fall-clinic-climate-cabinet

## Project Background
## Data Science Clinic Project Goals

[Please add project background]
1. Collect each state's political campaign finance report data, which should include
recipient information, donor information, and transaction information.
2. Preprocess, clean, and standardize the collected raw data across 4 states
by implementing a state cleaner abstract class.
3. Conduct exploratory data analysis to compare the contributions made by
green energy companies with those made by fossil fuel companies in each
state's political campaign activity.
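The "state cleaner abstract class" in goal 2 could look roughly like the sketch below. This is an illustration only, not the code in this PR: the method names `preprocess`, `clean`, `standardize`, and `create_tables` follow the steps named in the README, while `clean_state`, `DemoCleaner`, and the row-dict data shape are hypothetical stand-ins (the real cleaners work on DataFrames).

```python
from abc import ABC, abstractmethod


class StateCleaner(ABC):
    """Hypothetical interface: one subclass per state."""

    @abstractmethod
    def preprocess(self, filepaths):
        """Load raw files into a list of row dicts."""

    @abstractmethod
    def clean(self, rows):
        """Fix types and strip stray whitespace."""

    @abstractmethod
    def standardize(self, rows):
        """Rename fields to the shared schema."""

    @abstractmethod
    def create_tables(self, rows):
        """Split standardized rows into output tables."""

    def clean_state(self, filepaths):
        # Run the four steps in order (name is an assumption).
        rows = self.preprocess(filepaths)
        rows = self.clean(rows)
        rows = self.standardize(rows)
        return self.create_tables(rows)


class DemoCleaner(StateCleaner):
    """Toy subclass that fabricates one record per input path."""

    def preprocess(self, filepaths):
        return [{"donor": f" donor from {p} ", "amount": "10.50"} for p in filepaths]

    def clean(self, rows):
        return [
            {**r, "donor": r["donor"].strip(), "amount": float(r["amount"])}
            for r in rows
        ]

    def standardize(self, rows):
        return [{"donor_name": r["donor"].upper(), "amount": r["amount"]} for r in rows]

    def create_tables(self, rows):
        individuals = sorted({r["donor_name"] for r in rows})
        return individuals, rows
```

Keeping the step order in the base class means each state only has to supply its own parsing and field mapping.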

## Project Goals

[Please add project background]

## Usage

@@ -31,29 +34,17 @@ If you prefer to develop inside a container with VS Code then do the following s
3. Click the blue or green rectangle in the bottom left of VS code (should say something like `><` or `>< WSL`). Options should appear in the top center of your screen. Select `Reopen in Container`.


### Project Pipeline

1. Collect each state's campaign finance data either by web scraping (AZ, MI, PA) or by direct download (MN)
2. Users can go to [this shared Google Drive]('https://drive.google.com/drive/u/2/folders/1HUbOU0KRZy85mep2SHMU48qUQ1ZOSNce') to download each state's data to their local repo following this format: repo_root / "data" / "raw" / <State Initial> / "file"
3. Open in development container which installs all necessary packages.
4. Use utils/pipeline.py to preprocess, clean, standardize, and create tables for each state and ultimately concatenate the tables across the 4 states into a comprehensive database

"use utils/pipeline.py" -- how? as a module? as a script? make it clear for new users (ie 'run

5. The final result should be an individual DataFrame, an organization DataFrame, and a list of transaction DataFrames. The tables combine all data in the AZ, MI, MN, and PA datasets
6. For future reference, the above pipeline also stores the mapping from each given id to our database id (generated via uuid) in a csv file named (state)IDMap.csv in the output folder
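The directory layout in step 2 and the ID map in step 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `utils/pipeline.py`: the helper names `raw_data_dir`, `build_id_map`, and `write_id_map` and the CSV column names are hypothetical; only the `repo_root / "data" / "raw" / <State Initial>` layout, the use of uuid, and the `(state)IDMap.csv` file name come from the README.

```python
import csv
import uuid
from pathlib import Path


def raw_data_dir(repo_root, state):
    """Return repo_root / "data" / "raw" / <State Initial>."""
    return Path(repo_root) / "data" / "raw" / state


def build_id_map(original_ids):
    """Assign one fresh UUID string per distinct original id, keeping first-seen order."""
    return {orig: str(uuid.uuid4()) for orig in dict.fromkeys(original_ids)}


def write_id_map(id_map, output_dir, state):
    """Write the mapping to <output_dir>/<state>IDMap.csv and return the path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{state}IDMap.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        # Column names are an assumption for this sketch.
        writer.writerow(["original_id", "database_id"])
        writer.writerows(id_map.items())
    return path
```

For example, `write_id_map(build_id_map(["C001", "C002"]), "output", "MN")` would create `output/MNIDMap.csv` with one row per original id.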

## Repository Structure

### utils
Project python code

### notebooks
Contains short, clean notebooks to demonstrate analysis.

### data

Contains details of acquiring all raw data used in repository. If data is small (<50MB) then it is okay to save it to the repo, making sure to clearly document how the data is obtained.

If the data is larger than 50MB then you should not add it to the repo and instead document how to get the data in the README.md file in the data directory.

This [README.md file](/data/README.md) should be kept up to date.

### output
Should contain work product generated by the analysis. Keep in mind that results should (generally) be excluded from the git repository.

## Team Members

## Team Member
Student Name: April Wang
Student Email: [email protected]

@@ -64,4 +55,4 @@ Student Name: Aïcha Camara
Student Email: [email protected]

Student Name: Alan Kagiri
Student Email: [email protected].
Student Email: [email protected].
this is an either/or not sequential steps right? Make that clear