Feat: transitFeedSyncProcessor implementation #760

Draft
wants to merge 29 commits into
base: main

@AlfredNwolisa AlfredNwolisa commented Oct 10, 2024

Summary:

Summarize the changes in the pull request including how it relates to any issues (include the #number, or link them).

Expected behavior:

Explain and/or show screenshots for how you expect the pull request to work in your testing (in case other devices exhibit different behavior).

Testing tips:

Provide tips, procedures and sample files on how to test the feature.
Testers are invited to follow the tips AND to try anything they deem relevant outside the bounds of the testing tips.

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with ./scripts/api-tests.sh to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

davidgamez and others added 10 commits September 23, 2024 16:30
- Added logic to determine if a feed is "new" or requires an "update" based on database checks.
- Integrated new `payload_type` field in `TransitFeedSyncPayload` to track the status of each feed ("new" or "update").
- Implemented checks for `external_id` in `public.externalid` table and corresponding `feed_url` in `public.feed` table.
- Filtered out feeds with HTTP status codes 404 and 500, as well as those from Japan and France.
- Added methods for fetching and extracting data from the TransitLand API, combining operator and feed data to prepare payloads.
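
The filtering and the "new" versus "update" decision described in the commit message above could look roughly like the following. This is a minimal sketch and not the PR's code: the record keys (`status_code`, `country`, `external_id`, `feed_url`), both helper names, and the in-memory dictionary standing in for the `public.externalid` and `public.feed` lookups are assumptions made for illustration. The rule that a known `external_id` with a changed `feed_url` counts as an "update" is also an assumption.

```python
from typing import Optional

# Status codes and countries the commit message says are filtered out.
EXCLUDED_STATUS_CODES = {404, 500}
EXCLUDED_COUNTRIES = {"Japan", "France"}


def keep_feed(feed: dict) -> bool:
    """Return True unless the feed's URL status or country is excluded."""
    return (
        feed["status_code"] not in EXCLUDED_STATUS_CODES
        and feed["country"] not in EXCLUDED_COUNTRIES
    )


def classify_payload(feed: dict, known_urls: dict) -> Optional[str]:
    """Classify a feed as "new", "update", or None when nothing changed.

    known_urls maps external_id -> stored feed_url (hypothetical stand-in
    for the database checks).
    """
    if feed["external_id"] not in known_urls:
        return "new"
    if known_urls[feed["external_id"]] != feed["feed_url"]:
        return "update"
    return None
```

Keeping the two checks as separate pure functions makes each rule easy to unit test without a database or network access.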
…nal ID checks

@CLAassistant
CLAassistant commented Oct 10, 2024

CLA assistant check
All committers have signed the CLA.

AlfredNwolisa and others added 6 commits October 10, 2024 10:37
…stop_id

2. adjusted payload to reflect external_id update
Incorporated URL status check and merged feed sync data using pandas DataFrames for enhanced processing. Refactored process_sync() function to include filtering, grouping, and data extraction improvements, leading to more efficient and accurate feed sync operations.
 Unit tests to verify the functionality of TransitFeedSyncProcessor, covering data retrieval, rate limit handling, synchronization processing, URL status checks, data extraction, and merging/filtering of data.
@AlfredNwolisa AlfredNwolisa changed the title from "Feat/transitland feed sync" to "Feat: transitFeedSyncProcessor implementation" Oct 16, 2024
@AlfredNwolisa AlfredNwolisa requested review from davidgamez and cka-y and removed request for davidgamez October 16, 2024 09:55
Comment on lines 8 to 12
{
  "execution_id": "execution_id",
  "feed_stable_id": "feed_stable_id",
  "feed_id": "feed_id",
  "feed_onestop_id": "feed_onestop_id"
Member

If this function sends the TransitFeedSyncPayload as its payload, please list all of its fields here.

Author

Done
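
One way to keep a README example in step with the payload class is to derive the example from the class itself. The sketch below is illustrative only: the field list combines the four fields quoted in the review with `payload_type` and `source` from the commit messages, the `str` types are assumed, and `readme_example` is a hypothetical helper. The real `TransitFeedSyncPayload` may differ.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TransitFeedSyncPayload:
    # Field names taken from this PR's discussion; types are assumptions.
    execution_id: str
    feed_stable_id: str
    feed_id: str
    feed_onestop_id: str
    payload_type: str
    source: str


def readme_example(payload_cls) -> str:
    """Build a placeholder JSON example where each value repeats its key."""
    placeholder = {f.name: f.name for f in fields(payload_cls)}
    return json.dumps(placeholder, indent=2)
```

Calling `print(readme_example(TransitFeedSyncPayload))` emits a JSON object in the same placeholder style as the snippet under review, with one entry per dataclass field.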

# payloads for publishing to queue
payloads = []
for data in combined_data:
    associated_id = self.get_associated_id(db_session, data['feeds_onestop_id'])
Member

We need to filter the associated ID by both the ID and the source. This ensures that the ID cannot conflict with an ID from any other source. You can use "TLD" as the TransitLand source.
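
A lookup that filters on both the ID and the source, as requested above, might be sketched like this. It uses the standard-library `sqlite3` module purely so the example is self-contained. The table layout (`feed_id`, `associated_id`, `source` columns) and the function name `find_feed_id` are assumptions and are not taken from the PR or the project schema.

```python
import sqlite3
from typing import Optional

TRANSITLAND_SOURCE = "TLD"  # source value suggested in the review comment


def find_feed_id(
    conn: sqlite3.Connection,
    external_id: str,
    source: str = TRANSITLAND_SOURCE,
) -> Optional[str]:
    """Return the feed_id for (external_id, source), or None if absent.

    Matching on source as well as the ID keeps identical IDs issued by
    different sources from being confused with each other.
    """
    row = conn.execute(
        "SELECT feed_id FROM externalid "
        "WHERE associated_id = ? AND source = ?",
        (external_id, source),
    ).fetchone()
    return row[0] if row else None
```

With two rows that share an `associated_id` but differ in `source`, the default call returns only the row whose source is "TLD".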

Standardized comments and docstrings for better clarity and consistency. Improved code readability by cleaning up unnecessary lines and aligning comment styles.
Ensured that all modified files include a newline at the end
Added 'source' attribute to payload and implemented new logic for determining feed updates or new entries based on 'external_id' and 'source'. Refactored test cases to align with these changes.
Extended the timeout for the `requests.head` call from 10 to 25 seconds to accommodate slower server responses.
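
For illustration, a HEAD-based status check with the 25-second timeout could be sketched as follows. The PR itself uses `requests.head`; this sketch swaps in the standard-library `urllib.request` so that it runs without third-party packages. The function name `head_status`, the injectable `opener` parameter, and the choice to return `None` on connection failures are assumptions.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

HEAD_TIMEOUT_SECONDS = 25  # raised from 10 in this commit


def head_status(
    url: str,
    timeout: float = HEAD_TIMEOUT_SECONDS,
    opener: Callable = urllib.request.urlopen,
) -> Optional[int]:
    """Return the HTTP status of a HEAD request, or None if none arrived."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still useful.
        return err.code
    except (OSError, ValueError):
        # Timeouts, DNS failures, refused connections, malformed URLs.
        return None
```

With the `requests` library the equivalent call would be `requests.head(url, timeout=25)`. Passing `opener` explicitly lets tests substitute a fake and avoid real network traffic.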
Added 'source' parameter in README.md and reformat function arguments in main.py and test_feed_sync.py for better readability. Ensure consistent indentation and correct minor formatting issues.
Added the 'source' field to the payload dictionary in the README.md example and made minor whitespace adjustments in the main source and test files.
@davidgamez davidgamez mentioned this pull request Oct 21, 2024
5 tasks
Labels
None yet
Projects
None yet
3 participants