Feat: transitFeedSyncProcessor implementation #760

Draft: wants to merge 29 commits into `main`

Commits (29):
- `2a452a0` add feed sync base code (davidgamez, Sep 23, 2024)
- `9b77034` code clean up (davidgamez, Sep 24, 2024)
- `9a1f8c6` enhance documentation and add pubsub print script (davidgamez, Sep 24, 2024)
- `5ab1120` enhance debug file documentation (davidgamez, Sep 24, 2024)
- `1f99509` fix lint (davidgamez, Sep 24, 2024)
- `d6f0832` feat: Implement Transit Feed Sync (AlfredNwolisa, Oct 8, 2024)
- `da98956` feat: Implement Transit Feed Sync with feed status tracking and exter… (AlfredNwolisa, Oct 8, 2024)
- `46be537` feat: Implement Transit Feed Sync with feed status tracking and exter… (AlfredNwolisa, Oct 8, 2024)
- `acdd782` feat: Implemented process_sync(),get_data(),extract_feeds_data() (AlfredNwolisa, Oct 8, 2024)
- `d8433a5` updated dotenv_path (AlfredNwolisa, Oct 8, 2024)
- `2486dfa` Merge branch 'main' into feat/transitland-feed-sync (AlfredNwolisa, Oct 10, 2024)
- `cab14e8` 1. updated external_id to refrence: feed_onestop_id from operator_one… (AlfredNwolisa, Oct 10, 2024)
- `f164c9d` Add URL check and pandas integration to feed sync (AlfredNwolisa, Oct 16, 2024)
- `86478e9` Merge remote-tracking branch 'origin/feat/transitland-feed-sync' into… (AlfredNwolisa, Oct 16, 2024)
- `d2c6d47` Added unit tests for TransitFeedSyncProcessor methods (AlfredNwolisa, Oct 16, 2024)
- `b0ca74f` Added unit tests for TransitFeedSyncProcessor methods (AlfredNwolisa, Oct 16, 2024)
- `7f34d23` Refactor code formatting (AlfredNwolisa, Oct 16, 2024)
- `a3855a2` Add missing newline at end of files (AlfredNwolisa, Oct 17, 2024)
- `c4060d2` Renamed variables and improved code formatting (AlfredNwolisa, Oct 17, 2024)
- `2a769d8` Reformatted the `get_data` method call for `operators_data` to improv… (AlfredNwolisa, Oct 17, 2024)
- `de385d6` Update feed sync logic and refactor tests (AlfredNwolisa, Oct 18, 2024)
- `f512066` Increase URL check timeout to 25 seconds. (AlfredNwolisa, Oct 18, 2024)
- `5b0d28f` Refactor and format feed_sync_dispatcher_transitland code (AlfredNwolisa, Oct 21, 2024)
- `59bc0f9` Refactor and format feed_sync_dispatcher_transitland code (AlfredNwolisa, Oct 21, 2024)
- `7c13963` Add source field and fix whitespace in feed_sync_dispatcher (AlfredNwolisa, Oct 21, 2024)
- `584df25` Update functions-python/feed_sync_dispatcher_transitland/README.md (AlfredNwolisa, Oct 22, 2024)
- `3416fa1` Update README.md (AlfredNwolisa, Oct 22, 2024)
- `76e6586` Merge branch 'main' into feat/transitland-feed-sync (davidgamez, Oct 22, 2024)
- `f929048` restore main.py (davidgamez, Oct 22, 2024)
18 changes: 18 additions & 0 deletions functions-python/README.md
@@ -32,6 +32,24 @@ The function configuration file contains the following properties:
- `min_instance_count`: The minimum number of function instances that can be created in response to a load.
- `available_cpu_count`: The number of CPU cores that are available to the function.

# Local Setup

## Requirements
The requirements for running the functions locally may differ depending on each function's Google Cloud dependencies. Refer to each function's documentation to make sure all of its requirements are met.

## Install the Google Cloud SDK
To run the functions locally, the Google Cloud SDK must be installed. Refer to the [Google Cloud SDK documentation](https://cloud.google.com/sdk/docs/install) for more information.

## Install the Google Cloud Emulators

- Install the Datastore emulator
```bash
gcloud components install cloud-datastore-emulator
```

- Install the Pub/Sub emulator
```bash
gcloud components install pubsub-emulator
```

# Useful scripts
- To execute a function locally, use the following command:
10 changes: 10 additions & 0 deletions functions-python/feed_sync_dispatcher_transitland/.coveragerc
@@ -0,0 +1,10 @@
[run]
omit =
    */test*/*
    */database_gen/*
    */dataset_service/*
    */helpers/*

[report]
exclude_lines =
    if __name__ == .__main__.:
@@ -0,0 +1,5 @@
# Environment variables for running this function locally. Delete this line after renaming the file.
FEEDS_DATABASE_URL=postgresql://postgres:postgres@localhost:5432/MobilityDatabase
PROJECT_ID=my-project-id
PUBSUB_TOPIC_NAME=my-topic
TRANSITLAND_API_KEY=your-api-key
79 changes: 79 additions & 0 deletions functions-python/feed_sync_dispatcher_transitland/README.md
@@ -0,0 +1,79 @@
# Feed Sync Dispatcher for Transitland
This directory contains the GCP serverless function that triggers the sync of Transitland feeds.
The function publishes one Pub/Sub message per Transitland feed to be synced, with the following structure:
```json
{
  "message": {
    "data": {
      "external_id": "feeds_onestop_id",
      "feed_id": "feed_id",
      "execution_id": "execution_id",
      "feed_url": "feed_url",
      "spec": "spec",
      "auth_info_url": "auth_info_url",
      "auth_param_name": "auth_param_name",
      "type": "type",
      "operator_name": "operator_name",
      "country": "country",
      "state_province": "state_province",
      "city_name": "city_name",
      "source": "TLD",
      "payload_type": "payload_type"
    }
  }
}
```
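As a rough illustration of how one such message could be assembled, the sketch below serializes a single feed into the JSON bytes that a Pub/Sub publisher sends. It is hypothetical: `build_feed_sync_message` and `FEED_FIELDS` are not part of this PR, the field names are copied from the example above, and the actual function may build the payload differently.

```python
import json

# Hypothetical list of per-feed keys, copied from the README message example.
FEED_FIELDS = (
    "external_id",
    "feed_id",
    "feed_url",
    "spec",
    "auth_info_url",
    "auth_param_name",
    "type",
    "operator_name",
    "country",
    "state_province",
    "city_name",
)


def build_feed_sync_message(feed: dict, execution_id: str, payload_type: str) -> bytes:
    """Hypothetical helper: turn one Transitland feed into Pub/Sub message data."""
    missing = [key for key in FEED_FIELDS if key not in feed]
    if missing:
        # Fail before publishing anything if the feed record is incomplete.
        raise ValueError(f"feed is missing fields: {missing}")
    payload = {key: feed[key] for key in FEED_FIELDS}
    payload["execution_id"] = execution_id
    payload["source"] = "TLD"  # Constant source marker, as in the README example.
    payload["payload_type"] = payload_type
    # Pub/Sub message data must be bytes, so encode the JSON document as UTF-8.
    return json.dumps(payload).encode("utf-8")
```

With the real client, the resulting bytes would be passed as `publisher.publish(topic_path, data=...)`, where `publisher` is a `google.cloud.pubsub_v1.PublisherClient`. Validating keys up front means a malformed feed fails before any message is sent.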

# Function configuration
The function is configured using the following environment variables:
- `PUBSUB_TOPIC`: The Pub/Sub topic to publish the messages to.
- `PROJECT_ID`: The GCP project ID.
- `TRANSITLAND_API_KEY`: The Transitland API key (secret).
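A minimal sketch of reading and validating these variables at startup is shown below. It follows the names listed above; note that the sample environment file in this PR uses `PUBSUB_TOPIC_NAME` instead of `PUBSUB_TOPIC`, so the name the function actually reads should be checked against its code. `DispatcherConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names as documented in this README (assumed to match the code).
REQUIRED_VARIABLES = ("PROJECT_ID", "PUBSUB_TOPIC", "TRANSITLAND_API_KEY")


@dataclass(frozen=True)
class DispatcherConfig:
    project_id: str
    pubsub_topic: str
    transitland_api_key: str

    @property
    def topic_path(self) -> str:
        # Fully qualified topic name in the form Pub/Sub uses for topics.
        return f"projects/{self.project_id}/topics/{self.pubsub_topic}"


def load_config(env: Optional[Mapping[str, str]] = None) -> DispatcherConfig:
    """Hypothetical loader: read settings from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        # Report every missing variable at once instead of failing one by one.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return DispatcherConfig(
        project_id=env["PROJECT_ID"],
        pubsub_topic=env["PUBSUB_TOPIC"],
        transitland_api_key=env["TRANSITLAND_API_KEY"],
    )
```

Accepting an optional mapping keeps the loader testable without touching the real process environment.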

# Local development
The local development of this function follows the same steps as the other functions.

Install the Google Pub/Sub emulator; refer to the [README.md](../README.md) file for more information.

## Python requirements

- Install the requirements
```bash
pip install -r ./functions-python/feed_sync_dispatcher_transitland/requirements.txt
```

## Test locally with Google Cloud Emulators

- Execute the following command to start the Pub/Sub emulator:
```bash
gcloud beta emulators pubsub start --project=test-project --host-port='localhost:8043'
```

- Create a Pub/Sub topic in the emulator:
```bash
curl -X PUT "http://localhost:8043/v1/projects/test-project/topics/feed-sync-transitland"
```

- Start the function:
```bash
export PUBSUB_EMULATOR_HOST=localhost:8043 && ./scripts/function-python-run.sh --function_name feed_sync_dispatcher_transitland
```

- [Optional]: Create a local subscription to print published messages:
```bash
./scripts/pubsub_message_print.sh feed-sync-transitland
```

- Invoke the function:
```bash
curl http://localhost:8080
```

- To run or debug the function from your IDE, use the file `main_local_debug.py`.
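The contents of `scripts/pubsub_message_print.sh` are not shown in this diff. As an alternative way to inspect what the function publishes, the sketch below decodes the JSON body of a Pub/Sub REST `pull` response, for example one returned by `POST http://localhost:8043/v1/projects/test-project/subscriptions/<subscription>:pull` against the emulator. The subscription name and the `decode_pulled_messages` helper are assumptions; the REST API returns each message's `data` field base64-encoded, which is why it is decoded before parsing.

```python
import base64
import json
from typing import Any, Dict, List


def decode_pulled_messages(pull_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: extract the JSON payloads from a REST pull response."""
    decoded = []
    # An empty pull response may omit "receivedMessages" entirely.
    for received in pull_response.get("receivedMessages", []):
        raw = received["message"]["data"]
        # REST clients receive message data base64-encoded; decode, then parse JSON.
        decoded.append(json.loads(base64.b64decode(raw)))
    return decoded
```

Messages pulled this way still need to be acknowledged (via the `ackId` of each entry) if they should not be redelivered.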

# Test
- Run the tests
```bash
./scripts/api-tests.sh --folder functions-python/feed_sync_dispatcher_transitland
```
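The commits mention unit tests for the `TransitFeedSyncProcessor` methods, which are not part of this diff excerpt. As a hedged sketch of one way to test a publishing step without the emulator, the test below replaces the publisher with `unittest.mock.MagicMock` and checks that exactly one message is published per feed. `publish_feeds` is a hypothetical stand-in, not the dispatcher's real code.

```python
import json
from typing import Iterable, List
from unittest import mock


def publish_feeds(publisher, topic_path: str, feeds: Iterable[dict]) -> List[str]:
    """Hypothetical dispatcher step: publish one message per feed, return message IDs."""
    futures = [
        publisher.publish(topic_path, data=json.dumps(feed).encode("utf-8"))
        for feed in feeds
    ]
    # publish() returns a future; result() blocks until a message ID is available.
    return [future.result() for future in futures]


def test_publish_feeds_sends_one_message_per_feed():
    publisher = mock.MagicMock()
    # Every publish() call returns the same mock future, whose result() is fixed.
    publisher.publish.return_value.result.return_value = "message-id"
    feeds = [{"feed_id": "f1"}, {"feed_id": "f2"}]

    ids = publish_feeds(publisher, "projects/p/topics/t", feeds)

    assert ids == ["message-id", "message-id"]
    assert publisher.publish.call_count == len(feeds)
    first_call = publisher.publish.call_args_list[0]
    assert first_call.args[0] == "projects/p/topics/t"
    assert json.loads(first_call.kwargs["data"]) == {"feed_id": "f1"}
```

Passing the publisher in as a parameter, rather than creating it inside the function, is what makes this kind of mock-based test straightforward.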
@@ -0,0 +1,19 @@
{
"name": "feed-sync-dispatcher-transitland",
"description": "Feed Sync Dispatcher for Transitland",
"entry_point": "feed_sync_dispatcher_transitland",
"timeout": 540,
"memory": "512Mi",
"trigger_http": true,
"include_folders": ["database_gen", "helpers"],
"secret_environment_variables": [
{
"key": "FEEDS_DATABASE_URL"
}
],
"ingress_settings": "ALLOW_INTERNAL_AND_GCLB",
"max_instance_request_concurrency": 20,
"max_instance_count": 10,
"min_instance_count": 0,
"available_cpu": 1
}
@@ -0,0 +1,26 @@
# Allows debugging locally without affecting the deployed cloud function


# Requirements:
# - Google Cloud SDK installed
# - The required environment variables set in your .env.local_test file
# - Local database running
# - Follow the instructions in the README.md file
#
# Usage:
# - python feed_sync_dispatcher_transitland/main_local_debug.py

from src.main import feed_sync_dispatcher_transitland
from dotenv import load_dotenv

# Load environment variables from .env.local_test
load_dotenv(dotenv_path=".env.local_test")

if __name__ == "__main__":

    class RequestObject:
        # Minimal stand-in for the HTTP request; only `headers` is provided.
        def __init__(self, headers):
            self.headers = headers

    request = RequestObject({"X-Cloud-Trace-Context": "1234567890abcdef"})
    feed_sync_dispatcher_transitland(request)
20 changes: 20 additions & 0 deletions functions-python/feed_sync_dispatcher_transitland/requirements.txt
@@ -0,0 +1,20 @@
# Common packages
functions-framework==3.*
google-cloud-logging
psycopg2-binary==2.9.6
aiohttp~=3.10.5
asyncio~=3.4.3
urllib3~=2.2.2
requests~=2.32.3
attrs~=23.1.0
pluggy~=1.3.0
certifi~=2024.8.30
pandas

# SQL Alchemy and Geo Alchemy
SQLAlchemy==2.0.23
geoalchemy2==0.14.7

# Google specific packages for this function
google-cloud-pubsub
cloudevents~=1.10.1
@@ -0,0 +1,2 @@
Faker
pytest~=7.4.3