DM-39636: Migrate Ook into a FastAPI application #140
Merged
Conversation
- pyproject.toml packaging
- nox for task running
- scriv change log
- markdown readme
- ruff for linting
- drop kustomize

Note that this should break CI; the next step is to start switching the app itself over.
These won't work yet; this is just beginning to move the aiohttp pieces in the right place.
This likely won't work yet, but starts to replace the aiohttp app.py factory module.
This establishes our pattern of organizing the internal data model into the domain. Now, the DocumentRecord is technically a persistence model.
This service works with DocumentRecord domain objects and takes actions on the Algolia index.
This moves the classification module in ook.services and repackages it as a service class. More work is likely needed to modernize this service.
This implements the concept of a Kafka consumer that routes messages based on the topic name and the types of the Pydantic models, as described in https://sqr-076.lsst.io. This code should eventually be moved into Kafkit.
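The routing idea above can be sketched in a heavily simplified, stdlib-only form. MessageRouter, route, and dispatch are illustrative names, not the actual Kafkit API, and a plain dataclass stands in for a deserialized Pydantic model:

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

@dataclass
class ConsumedMessage:
    """Stand-in for a deserialized Kafka message."""
    topic: str
    value: str

Handler = Callable[[ConsumedMessage], Awaitable[None]]

class MessageRouter:
    """Route consumed messages to handlers registered per topic."""

    def __init__(self) -> None:
        self._routes: dict[str, Handler] = {}

    def route(self, topic: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for a topic name."""
        def decorator(handler: Handler) -> Handler:
            self._routes[topic] = handler
            return handler
        return decorator

    async def dispatch(self, message: ConsumedMessage) -> bool:
        """Invoke the handler for the message's topic, if one is registered."""
        handler = self._routes.get(message.topic)
        if handler is None:
            return False
        await handler(message)
        return True

router = MessageRouter()
handled: list[str] = []

@router.route("lsst.square-events.ook.ingest")
async def on_ingest(message: ConsumedMessage) -> None:
    handled.append(message.value)

matched = asyncio.run(
    router.dispatch(ConsumedMessage("lsst.square-events.ook.ingest", "doc"))
)
```

In the real consumer, dispatch would additionally deserialize the Avro payload into the Pydantic model associated with the route before calling the handler.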
- Drop run command because it's not needed for our FastAPI pattern
- Migrate upload_doc_stub command to use the new AlgoliaDocIndexService
Include Kafkit to provide Pydantic-based schema management and a Kafka consumer/producer.
Also fix ruff linting issues.
We'll see if this actually renders and parses correctly with current versions of Pydantic...
jonathansick force-pushed the tickets/DM-39636 branch from f90cda0 to eef7093 on July 15, 2023 at 18:37
FastAPI now recommends using a lifecycle context manager rather than the on_event handlers.
This will need to be entirely re-implemented, so let's not keep it around for now.
consumer_kafka_messages is run in a separate asyncio task that is started when the app starts up. This coroutine sets up the PydanticAIOKafkaConsumer. The next step is to start hooking up the actual message routes.
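The startup/shutdown pattern described above can be sketched with the standard library alone. In the real app this is FastAPI's lifespan context manager and the task runs the Kafka consumer; here an asyncio.Queue stands in for Kafka, and all names are illustrative:

```python
import asyncio
from contextlib import asynccontextmanager

consumed: list[str] = []

async def consume_messages(queue: asyncio.Queue) -> None:
    """Long-running loop standing in for the Kafka consumer coroutine."""
    while True:
        message = await queue.get()
        consumed.append(message)

@asynccontextmanager
async def lifespan(queue: asyncio.Queue):
    # Startup phase: launch the consumer as a background task.
    task = asyncio.create_task(consume_messages(queue))
    try:
        yield
    finally:
        # Shutdown phase: cancel the task and wait for it to exit.
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    async with lifespan(queue):
        await queue.put("ingest-request")
        await asyncio.sleep(0)  # give the consumer task a turn to run

asyncio.run(main())
```

Cancelling and awaiting the task in the shutdown phase ensures the consumer loop is torn down before the app exits, mirroring what FastAPI does when the lifespan context closes.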
The Kafka message schemas that Ook manages are now Pydantic models (AvroBaseModel via dataclasses_avroschema).
This will be re-built later when the next-gen LTD is producing events again.
The purpose of this service is to get metadata about a GitHub repository as needed, often as a sub-service for document parsers. It replaces the old github and utils modules, as well as the githubcreationdate workflow (which is now the seed for this service). We've also optimized the algorithm for getting the date of the first commit: we no longer ignore additional pages of content.
This service handles ingesting Lander v1 documents that use JSON-LD based metadata.jsonld documents. It replaces the former ltdlander workflow module.
This service replaces the ingest_ltd_sphinx_technote workflow. The service is responsible for ingesting the original format of Sphinx technotes.
This makes it easier to spin up a Kafka producer from either HTTP path operations or the handlers for the Kafka consumer. The Kafka consumer now gets its PydanticSchemaManager from the factory rather than creating an entirely new one. The mocking and Kafka producer modules are based on Squarebot's; these need to be moved up into Kafkit.
Since Ook won't be a webhook receiver, it doesn't need the GitHub webhook secret. Instead, we'll get GitHub events through Squarebot.
This version includes FastAPI dependencies for Kafka apps.
jonathansick force-pushed the tickets/DM-39636 branch from aee98d6 to 7662689 on July 19, 2023 at 14:01
The hyphens are illegal in schema names.
jonathansick force-pushed the tickets/DM-39636 branch from d9d7fb9 to 19b7a7f on July 19, 2023 at 15:10
This integrates the classification service with the POST /ingest/ltd endpoint. The classification service can now send resolved ingest requests to the Kafka ingest topic. The new LtdMetadataService provides basic access to the LTD v1 API (read only).
jonathansick force-pushed the tickets/DM-39636 branch from 6ae94ee to 4988f01 on July 19, 2023 at 16:50
jonathansick force-pushed the tickets/DM-39636 branch 2 times, most recently from a093042 to f987c80 on July 19, 2023 at 21:03
This makes it easier in dev environments to understand when messages are being consumed, and whether they're matched to a router and handled successfully.
This isn't parsing correctly; I need to check what units it's in.
The Algolia SearchClient manages an aiohttp client session, so it needs to be shut down. It's a context manager, which is tricky to use as a dependency, especially with our Factory pattern, since it can run outside a FastAPI response handler context. The workaround is to manually call the __aenter__ and __aexit__ methods in the initialization and close phases of the dependency.
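The workaround reads roughly like this stdlib-only sketch, where DummyClient is a hypothetical stand-in for Algolia's SearchClient and ClientDependency for the FastAPI dependency:

```python
import asyncio

class DummyClient:
    """Stand-in async context manager for Algolia's SearchClient."""

    def __init__(self) -> None:
        self.open = False

    async def __aenter__(self) -> "DummyClient":
        self.open = True  # in the real client: create the aiohttp session
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.open = False  # in the real client: close the aiohttp session

class ClientDependency:
    """Dependency that owns the client's lifecycle manually."""

    def __init__(self) -> None:
        self._client: DummyClient | None = None

    async def initialize(self) -> None:
        self._client = DummyClient()
        await self._client.__aenter__()  # manual "enter" at app startup

    async def close(self) -> None:
        if self._client is not None:
            await self._client.__aexit__(None, None, None)  # manual "exit"

    def __call__(self) -> DummyClient:
        assert self._client is not None, "initialize() has not been called"
        return self._client

async def demo() -> tuple[bool, bool]:
    dep = ClientDependency()
    await dep.initialize()
    client = dep()
    was_open = client.open
    await dep.close()
    return was_open, client.open

was_open, now_open = asyncio.run(demo())
```

Calling the dunder methods directly trades the safety of `async with` for the flexibility of tying the client's lifetime to the app's startup and shutdown hooks instead of a single lexical scope.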
When running the async version, browse_objects_async, I get this error: TypeError: 'async for' received an object from __aiter__ that does not implement __anext__: coroutine (logged while consuming from the lsst.square-events.ook.ingest topic). See algolia/algoliasearch-client-python#559. The workaround is to use the sync API, browse_objects, until this is resolved.
This will hopefully cause Pydantic to render the enum as a string with the enum's value.
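The usual trick, which Pydantic models also benefit from, is to mix `str` into the Enum so that JSON serializers emit the member's value rather than its name. ContentType and its member here are hypothetical stand-ins:

```python
import json
from enum import Enum

class ContentType(str, Enum):
    """Hypothetical enum; str mixin makes members serialize as their value."""
    LTD_SPHINX_TECHNOTE = "ltd_sphinx_technote"

# json.dumps sees a str instance, so it writes the value, not the name.
rendered = json.dumps({"content_type": ContentType.LTD_SPHINX_TECHNOTE})
```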
This adds support for the product_slugs field in the POST /ingest/ltd endpoint; with a TaskGroup we can easily run multiple classification/queuing tasks at once.
jonathansick force-pushed the tickets/DM-39636 branch from 133cb8a to 39bec72 on July 20, 2023 at 15:41
Backwards-incompatible changes
- POST /ook/ingest/ltd endpoint.
New features
Other changes