
Automate Geoglam & NO2 dataset ingestion #155

Open
2 tasks
smohiudd opened this issue Jul 22, 2024 · 2 comments

Comments

@smohiudd
Contributor

smohiudd commented Jul 22, 2024

Description

NO2 (#89) and Geoglam (#167, #173) datasets require monthly ingestion as new assets are created. This is currently a manual process but should be automated. veda-data-airflow has a feature that enables scheduled ingestion by creating dataset-specific DAGs. The asset files must still be transferred to the collection's S3 bucket, and a JSON config file must be uploaded to the Airflow event bucket. Here is an example JSON config:

{
    "collection": "emit-ch4plume-v1",
    "bucket": "lp-prod-protected",
    "prefix": "EMITL2BCH4PLM.001/",
    "filename_regex": ".*.tif$",
    "schedule": "00 05 * * *",
    "assets": {
        "ch4-plume-emissions": {
            "title": "EMIT Methane Point Source Plume Complexes",
            "description": "Methane plume complexes from point source emitters.",
            "regex": ".*.tif$"
        }
    }
}
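As a rough pre-upload check, the Python sketch below takes the example config above and verifies that the keys it uses are present, that `schedule` has the five fields of a standard cron expression (`00 05 * * *` means 05:00 every day), and that the regexes compile. It then filters a few made-up object keys the way `prefix` plus `filename_regex` appear to be intended. The required-key list, the match-against-the-full-key behaviour, and the sample keys are assumptions for illustration, not veda-data-airflow's actual validation.

```python
import re

# Example scheduled-ingestion config, copied from the issue description.
CONFIG = {
    "collection": "emit-ch4plume-v1",
    "bucket": "lp-prod-protected",
    "prefix": "EMITL2BCH4PLM.001/",
    "filename_regex": ".*.tif$",
    "schedule": "00 05 * * *",
    "assets": {
        "ch4-plume-emissions": {
            "title": "EMIT Methane Point Source Plume Complexes",
            "description": "Methane plume complexes from point source emitters.",
            "regex": ".*.tif$",
        }
    },
}

# Assumed set of required keys, taken from the fields the example uses.
REQUIRED_KEYS = {"collection", "bucket", "prefix", "filename_regex", "schedule", "assets"}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a config (empty if none).

    Illustrative checks only; not the DAG's own validation logic.
    """
    problems = []
    missing = REQUIRED_KEYS - set(config)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # A standard cron expression has five space-separated fields.
    schedule = config.get("schedule", "")
    if len(schedule.split()) != 5:
        problems.append(f"schedule is not a 5-field cron expression: {schedule!r}")
    # The collection-level regex and every asset regex must compile.
    patterns = [config.get("filename_regex", "")]
    patterns += [asset.get("regex", "") for asset in config.get("assets", {}).values()]
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"invalid regex {pattern!r}: {exc}")
    return problems


def matching_keys(config: dict, keys: list[str]) -> list[str]:
    """Keep keys under `prefix` whose full key matches `filename_regex` (assumed semantics)."""
    pattern = re.compile(config["filename_regex"])
    return [k for k in keys if k.startswith(config["prefix"]) and pattern.match(k)]


if __name__ == "__main__":
    print(validate_config(CONFIG))
    sample = ["EMITL2BCH4PLM.001/a.tif", "EMITL2BCH4PLM.001/a.json", "other/b.tif"]
    print(matching_keys(CONFIG, sample))
```

One thing such a check makes visible: in `.*.tif$` the unescaped `.` matches any character, so a key ending in, say, `footif` would also match. `.*\.tif$` is the stricter pattern if only `.tif` files are intended.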

Acceptance Criteria

  • Scheduled ingestion JSON files are created and uploaded for the NO2 and Geoglam datasets
  • All new items in staging are also in production
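The second criterion can be checked mechanically. The sketch below assumes both staging and production expose a STAC API with the standard `/collections/{collectionId}/items` endpoint and paginate with `rel="next"` GET links; the API roots and collection IDs you pass in are placeholders, not values from this issue. Only the set comparison is essential; the fetcher is one possible way to gather the IDs.

```python
import json
import urllib.request


def fetch_item_ids(api_root: str, collection_id: str, limit: int = 100) -> set[str]:
    """Collect every item ID in a collection from a STAC API, following "next" links.

    Assumes the server's next links are plain GET URLs (typical for /items).
    """
    url = f"{api_root.rstrip('/')}/collections/{collection_id}/items?limit={limit}"
    ids: set[str] = set()
    while url:
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        ids.update(feature["id"] for feature in page.get("features", []))
        # Follow the rel="next" link if present; stop when there is none.
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
    return ids


def missing_in_production(staging_ids: set[str], production_ids: set[str]) -> set[str]:
    """Items present in staging but absent from production; empty means the criterion holds."""
    return staging_ids - production_ids
```

Usage would be `fetch_item_ids(...)` against each API root for the same collection ID, then `missing_in_production(staging, production)` and an assertion that the result is empty.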

@slesaad
Member

slesaad commented Jul 24, 2024

Placing the discovery-items config in s3://<EVENT_BUCKET>/collections/, in the format shown at https://github.com/US-GHG-Center/ghgc-data/blob/add/lpdaac-dataset-scheduled-config/ingestion-data/discovery-items/scheduled/emit-ch4plume-v1-items.json, will trigger discovery and subsequent ingestion of the collection items according to the schedule attribute.
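A minimal upload sketch using boto3's `put_object`. The `collections/<collection>-items.json` key name is an assumption that mirrors the linked example file name; the bucket name passed as `event_bucket` stands in for `<EVENT_BUCKET>` and is not known here.

```python
import json


def discovery_config_key(collection: str) -> str:
    """S3 key for a collection's discovery-items config under collections/.

    The "<collection>-items.json" naming is assumed from the linked example.
    """
    return f"collections/{collection}-items.json"


def upload_discovery_config(config: dict, event_bucket: str) -> str:
    """Upload a config to s3://<event_bucket>/collections/ and return its S3 URI."""
    import boto3  # imported lazily so the key helper works without boto3 installed

    key = discovery_config_key(config["collection"])
    boto3.client("s3").put_object(
        Bucket=event_bucket,
        Key=key,
        Body=json.dumps(config, indent=4).encode("utf-8"),
        ContentType="application/json",
    )
    return f"s3://{event_bucket}/{key}"
```

For a one-off upload, `aws s3 cp emit-ch4plume-v1-items.json s3://<EVENT_BUCKET>/collections/` achieves the same thing from the CLI.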

@smohiudd
Contributor Author

smohiudd commented Aug 1, 2024

mcp-prod will need a new Airflow release to include automated ingestion.
