Split python script (#57)
* split-python-script

* split-python-script
friendlymatthew authored Jan 22, 2024
1 parent 800b5b3 commit ebc3633
Showing 4 changed files with 14 additions and 5 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/example-client.yml
@@ -29,7 +29,7 @@ jobs:
# Fetch the data in workspace
cd examples/workspace
python3 -m pip install -r requirements.txt
- python3 fetch_data.py
+ python3 fetch_jsonl.py
cd -
# Build the index
3 changes: 2 additions & 1 deletion examples/README.md
@@ -18,7 +18,8 @@ cd workspace
# green tripdata
python3 -m pip install -r requirements.txt

- python3 fetch_data.py
+ # fetch data with .jsonl format
+ python3 fetch_jsonl.py
```

Then run the indexing process:
11 changes: 11 additions & 0 deletions examples/workspace/fetch_csv.py
@@ -0,0 +1,11 @@
# Data taken from https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

import io

import pandas as pd
import requests

response = requests.get('https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet')

df = pd.read_parquet(io.BytesIO(response.content))
df.to_csv('green_tripdata_2023-01.csv', index=False)
@@ -8,6 +8,3 @@
response = requests.get('https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet')

pd.read_parquet(io.BytesIO(response.content)).to_json('green_tripdata_2023-01.jsonl', orient='records', lines=True)

- df = pd.read_parquet(io.BytesIO(response.content))
- df.to_csv('green_tripdata_2023-01.csv', index=False)
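
After this split, both scripts repeat the same download step and neither checks the HTTP status before parsing the response. A minimal sketch of one way to factor out the shared part is shown here. It is not part of this commit: the function names `fetch_parquet`, `write_csv`, and `write_jsonl`, the `timeout` value, and the single-module layout are all assumptions made for illustration. The URL and output file names are the ones used in the diff.

```python
# Hypothetical shared helper; not part of commit ebc3633.
# Data taken from https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

import io

import pandas as pd
import requests

URL = 'https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet'


def fetch_parquet(url: str = URL, timeout: float = 60.0) -> pd.DataFrame:
    """Download a parquet file and load it into a DataFrame."""
    response = requests.get(url, timeout=timeout)
    # Fail early on 4xx/5xx instead of handing an error page to the parquet reader.
    response.raise_for_status()
    return pd.read_parquet(io.BytesIO(response.content))


def write_csv(df: pd.DataFrame, path: str) -> None:
    """Write the frame as CSV without the index column, as fetch_csv.py does."""
    df.to_csv(path, index=False)


def write_jsonl(df: pd.DataFrame, path: str) -> None:
    """Write one JSON object per line, as fetch_jsonl.py does."""
    df.to_json(path, orient='records', lines=True)


if __name__ == '__main__':
    frame = fetch_parquet()
    write_csv(frame, 'green_tripdata_2023-01.csv')
    write_jsonl(frame, 'green_tripdata_2023-01.jsonl')
```

With a helper like this, `fetch_csv.py` and `fetch_jsonl.py` could each shrink to an import plus one call, while the workflow and README commands in this commit would stay unchanged.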
