From a1d76302d4a1294c9a862f2e532a118ff50e313d Mon Sep 17 00:00:00 2001 From: owenlittlejohns Date: Fri, 15 Mar 2024 14:35:00 -0600 Subject: [PATCH] Update UMM-Var documentation following OB DAAC feedback. --- ...ublish_to_cmr_with_earthdata_varinfo.ipynb | 246 +++++++++--------- 1 file changed, 130 insertions(+), 116 deletions(-) diff --git a/docs/how_to_publish_to_cmr_with_earthdata_varinfo.ipynb b/docs/how_to_publish_to_cmr_with_earthdata_varinfo.ipynb index 4876b59..a7a807c 100644 --- a/docs/how_to_publish_to_cmr_with_earthdata_varinfo.ipynb +++ b/docs/how_to_publish_to_cmr_with_earthdata_varinfo.ipynb @@ -1,23 +1,38 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "## How to use `earthdata-varinfo` to publish UMM-Var records to CMR\n", - "### Overview:\n", + "# How to use `earthdata-varinfo` to publish UMM-Var records to CMR\n", "\n", - "This notebook demonstrates how to create and publish, Unified Metadata Model-Variable (UMM-Var) records to NASA's Common Metadata Repository (CMR) with, `earthdata-varinfo` >= 2.0.0. `earthdata-varinfo` utilizes [`python-cmr`](https://github.com/nasa/python_cmr) to query CMR for collection granules to download locally. The `VarInfoFromNetCDF4` class in `earhdata-varinfo` is used to create CMR compliant UMM-Var entries. 
Lastly, the `requests` library is used to publish UMM-Var records to a given CMR environment (`OPS`, `UAT`, and `SIT`).\n", + "This notebook demonstrates how to create and publish Unified Metadata Model-Variable (UMM-Var) records to NASA's Common Metadata Repository (CMR) with `earthdata-varinfo` >= 2.0.0.\n", + "\n", + "There are three main workflows described in this notebook:\n", + "\n", + "* The use of a single overarching function `generate_collection_umm_var`, which:\n", + " * Uses [python-cmr](https://github.com/nasa/python_cmr) to query CMR for collection granules.\n", + " * Downloads one of these granules to the local machine.\n", + " * Uses `VarInfoFromNetCDF4` to parse in-file metadata from the granule.\n", + " * Creates UMM-Var JSON objects for each of the variables found in the downloaded granule.\n", + " * (Optionally) Publishes these UMM-Var objects to CMR (in a specified environment: `OPS`, `UAT` or `SIT`).\n", + "* Performing the same workflow as above, but using individual functions and classes to perform each step in isolation.\n", + "* Publication of a single UMM-Var record.\n", + "\n", + "It is recommended to use `generate_collection_umm_var` for most use-cases. 
However, if a local file already exists on the machine running this notebook, or a collection doesn't yet have granule metadata, then the second workflow can be used to skip the initial steps of identifying and downloading a granule.\n", "\n", "### Setting up your environment to run this notebook\n", "\n", + "**Recommended option:**\n", + "\n", "Create and activate your `pyenv` or conda environment, then:\n", "\n", "```\n", "pip install earthdata-varinfo\n", "```\n", "\n", + "**Alternative:**\n", + "\n", "If this doesn't work, alternatively you can clone the git repository, and install the package in editable mode:\n", "\n", "```\n", @@ -25,109 +40,81 @@ "cd earthdata-varinfo\n", "pip install -e .\n", "```\n", + "\n", "### Other notebook requirements:\n", "\n", - "When installing `earthdata-varinfo` via PyPI required packages should automatically be installed as dependencies. \n", + "When installing `earthdata-varinfo` via PyPI, required packages should automatically be installed as dependencies. 
\n", "For local development, without a standard pip installation, third party requirements can be installed from the following files:\n", "\n", "```\n", "pip install -r requirements.txt -r dev-requirements.txt\n", "pip install notebook\n", "```\n", - "### Example usage:\n", - "\n", - "* [GLDAS_NOAH10_3H](https://cmr.uat.earthdata.nasa.gov/search/collections.umm_json?concept-id=C1256543837-EEDTEST)\n", - "* [M2I1NXASM](https://cmr.uat.earthdata.nasa.gov/search/collections.umm_json?concept-id=C1256535511-EEDTEST)\n", "\n", "### Authorization:\n", "\n", - "* Launchpad or EDL tokens must be used in order query and publish to CMR.\n", - "* Authorization headers for EDL tokens contain the header prefix `Bearer` before the token\n", - " * For example: `Bearer `\n", - "* Authorization headers for Launchpad tokens do **NOT** contain any prefixes in the header\n", - " * For example: ``\n", + "This notebook uses two types of tokens for authentication with external resources.\n", + "\n", + "* Launchpad tokens are required to query for and publish metadata records to CMR. The `Authorization` header for these tokens does not include an HTTP authentication scheme, so the value for the `Authorization` header looks as follows:\n", + " * ``\n", + "* Earthdata Login (EDL) tokens are used to download granule files. 
EDL tokens use the `Bearer` authentication scheme, meaning the `Authorization` header is as follows:\n", + " * `Bearer `\n", "\n", "To request a Launchpad Token visit:\n", - "* [Launchpad Authentication User's Guide](https://wiki.earthdata.nasa.gov/display/CMR/Launchpad+Authentication+User%27s+Guide)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Publish UMM-Var records for **GLDAS_NOAH10_3H** with `generate_collections_umm_var`\n", + "* [Launchpad Authentication User's Guide](https://wiki.earthdata.nasa.gov/display/CMR/Launchpad+Authentication+User%27s+Guide)\n", "\n", - "`generate_collections_umm_var` is a wrapper function that combines the functionalities in `varinfo.cmr_search`, the `VarInfoFromNetCDF4` class and `varinfo.umm_var` to create and publish UMM-Var entries to CMR." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Update `auth_header` to include your EDL token (e.g. `Bearer `) or Launchpad token (e.g. 
``)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "auth_header = 'Bearer or '" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Update the `collection_concept_id` to the **GLDAS_NOAH10_3H** concept-id for the EEDTEST provider.\n", - "* This can be updated to any concept-id for any provider" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "collection_concept_id_gldas = 'C1256543837-EEDTEST'" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Import `generate_collection_umm_var` from `varinfo.generate_umm_var`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from varinfo.generate_umm_var import generate_collection_umm_var" + "### UMM-Var native IDs:\n", + "\n", + "UMM records have a native ID that is required for publication of any record. 
`earthdata-varinfo` implements the following scheme for native IDs:\n", + "\n", + "```\n", + "-\n", + "\n", + "# e.g.:\n", + "C1234567890-PROV-variable_name\n", + "\n", + "# Or for a nested variable:\n", + "C1234567890-PROV-variable_group_variable_path\n", + "```\n", + "\n", + "Using `earthdata-varinfo` multiple times to generate UMM-Var records for the same collection will result in updating existing records, rather than creating duplicate UMM-Var records for the same variables.\n", + "\n", + "### Examples in this notebook:\n", + "\n", + "* Using `generate_collection_umm_var`: [GLDAS_NOAH10_3H](https://cmr.uat.earthdata.nasa.gov/search/collections.umm_json?concept-id=C1256543837-EEDTEST)\n", + "* Using individual functions: [M2I1NXASM](https://cmr.uat.earthdata.nasa.gov/search/collections.umm_json?concept-id=C1256535511-EEDTEST)" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ + "## Workflow 1: Using the single `generate_collection_umm_var` function (recommended):\n", + "\n", + "**This option is recommended if you have a collection in CMR with granules, and want the simplest workflow.**\n", + "\n", + "This example shows how to publish UMM-Var records for **GLDAS_NOAH10_3H** with `generate_collection_umm_var`. `generate_collection_umm_var` is a wrapper function that combines the functionality of individual classes and functions of `earthdata-varinfo`, including: `varinfo.cmr_search`, `VarInfoFromNetCDF4` and `varinfo.umm_var`.\n", + "\n", "`generate_collection_umm_var` will:\n", "\n", - "* Download the most recent granule for **GLDAS_NOAH10_3H**\n", - "* Generate the UMM-Var records for this granule\n", - "* Publish these records to CMR if `publish=True`. 
\n", + "* Query CMR to find the collection specified and links to granules in that collection.\n", + "* Download the most recent granule for **GLDAS_NOAH10_3H**.\n", + "* Parse the in-file metadata for the downloaded granule.\n", + "* Generate the UMM-Var records from the parsed file information.\n", + "* Publish these records to CMR if `publish=True`.\n", "* If `publish=True`, a list of ingested variable concept-ids or the error(s) from an unsucessful ingest is returned\n", " * `['V1259971755-EEDTEST', 'V1259971757-EEDTEST', ...]` \n", " * `['V1259971755-EEDTEST', '#: CMR error 1\\n #: CMR error 2', ...]`\n", "* If `publish=False` (default) a list of UMM-Var entries is returned:\n", - " * `[...{'Name': 'lat', 'LongName': 'lat', ...}, {'Name': 'time', 'LongName': 'time', ...}...]`" + " * `[...{'Name': 'lat', 'LongName': 'lat', ...}, {'Name': 'time', 'LongName': 'time', ...}...]`\n", + "\n", + "\n", + "**Customising the cell below for a different collection:**\n", + "\n", + "The following cell specifies the collection concept ID of **GLDAS_NOAH10_3H** (from the `EEDTEST` CMR provider). \n", + "This can be updated to any concept-id for any provider.\n", + "\n", + "Update `auth_header` in the cell below to include your Launchpad token." ] }, { @@ -136,21 +123,32 @@ "metadata": {}, "outputs": [], "source": [ + "from varinfo.generate_umm_var import generate_collection_umm_var\n", + "\n", + "\n", + "auth_header = ''\n", + "collection_concept_id_gldas = 'C1256543837-EEDTEST'\n", + "\n", "generate_collection_umm_var(collection_concept_id=collection_concept_id_gldas,\n", " auth_header=auth_header, publish=True)" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "### Publishing and creating UMM-Var entries for **M2I1NXASM**:\n", - "This example is an alternative to using `generate_collection_umm_var`. 
It demonstrates the individual components of `generate_collection_umm_var` with:\n", + "## Workflow 2: Publishing CMR records with lower-level functions:\n", + "\n", + "**This workflow is primarily for collections without granules or when a file already exists on a local machine.**\n", + "\n", + "The following example will publish and create UMM-Var entries for **M2I1NXASM**. It does so by using the individual pieces of functionality wrapped by `generate_collection_umm_var`.\n", + "\n", "* `varinfo.cmr_search`: queries CMR for a granule download link and downloads granules locally\n", "* `VarInfoFromNetCDF4`: varinfo parent class that represents the contents of a granule\n", "* `varinfo.umm_var`: contains functions for creating and publishing UMM-Var records to CMR\n", - "* `CMR_UAT` is a string constant (e.g. https://cmr.uat.earthdata.nasa.gov/search/) of a CMR environment" + "* `CMR_UAT` is a string constant (e.g. https://cmr.uat.earthdata.nasa.gov/search/) of a CMR environment\n", + "\n", + "First import the individual functions and classes required:" ] }, { @@ -161,22 +159,16 @@ "source": [ "from cmr import CMR_UAT\n", "\n", - "from varinfo.cmr_search import (get_granules, get_granule_link, \n", - " download_granule)\n", - "\n", "from varinfo import VarInfoFromNetCDF4\n", - "\n", - "from varinfo.umm_var import (get_all_umm_var, get_umm_var, publish_all_umm_var,\n", - " publish_umm_var)" + "from varinfo.cmr_search import download_granule, get_granules, get_granule_link\n", + "from varinfo.umm_var import get_all_umm_var, publish_all_umm_var, publish_umm_var" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "Update the `collection_concept_id` to the **M2I1NXASM** concept-id for the EEDTEST provider\n", - "* This can be updated to any concept-id for any provider" + "Next define the CMR concept ID of the collection that will have UMM-Var records generated. 
In this example, the `collection_concept_id` used is for the **M2I1NXASM** collection in the EEDTEST provider, but this can be updated to a collection concept ID from any provider." ] }, { @@ -189,7 +181,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -197,7 +188,9 @@ "\n", "* `get_granules`: queries `CMR_UAT` (default is `CMR_OPS`) for a UMM-G record (granule record) given a collection or granule concept-id\n", " * you can query any CMR environment by adding `cmr_env=CMR_UAT` or `cmr_env=CMR_SIT`\n", - "* `get_granule_link`: parses the UMM-G record from `get_granules` for a data download URL" + "* `get_granule_link`: parses the UMM-G record from `get_granules` for a data download URL\n", + "\n", + "**This step can be skipped if a granule file is already present on your machine.**" ] }, { @@ -209,19 +202,21 @@ "granule_response = get_granules(concept_id=collection_concept_id_merra,\n", " cmr_env=CMR_UAT,\n", " auth_header=auth_header)\n", + "\n", "url = get_granule_link(granule_response)\n", - "url" + "print(url)" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Download the granule locally with `download_granule`\n", "* Defaults to current directory\n", "* Add optional argument `out_directory=/path/to/save/granule` to save to specified path\n", - "* Returns the path the granule was downloaded to (e.g. `/path/granule/was/saved/to`)" + "* Returns the path the granule was downloaded to (e.g. `/path/granule/was/saved/to`)\n", + "\n", + "**This step can be skipped if a granule file is already present on your machine.**" ] }, { @@ -234,11 +229,12 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "Instantiate a ```VarInfoFromNetCDF4``` object for a local NetCDF-4 file. " + "**Start here if you have a local granule file already.**\n", + "\n", + "Instantiate a ```VarInfoFromNetCDF4``` object for a local NetCDF-4 file. 
This will parse the in-file metadata for the specified NetCDF-4 file, including relationships between variables (such as coordinates, bounds, and dimensions)." ] }, { @@ -252,7 +248,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -268,11 +263,10 @@ "outputs": [], "source": [ "umm_var_dict = get_all_umm_var(var_info)\n", - "umm_var_dict" + "print(umm_var_dict)" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -294,12 +288,14 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "### Publish one UMM-Var record with the `var_info.get_variable()` object\n", - "This example is another alternative to using `generate_collection_umm_var`. In this example we use the granule we have already download locally (**M2I1NXASM**) to create and ingest a single UMM-Var record.\n", + "## Workflow 3: Publishing a single UMM-Var record:\n", + "\n", + "**This workflow is for updating or creating a single UMM-Var record.**\n", + "\n", + "This example is another alternative to using `generate_collection_umm_var`. In this example we use a locally downloaded granule (**M2I1NXASM**) to create and ingest a single UMM-Var record for a variable of interest.\n", "* Use `var_info.get_variable()` to retrieve the variable object from `var_info`\n", "* Keys are the full variable paths (e.g. 
`'/TROPPV'`)" ] @@ -310,15 +306,36 @@ "metadata": {}, "outputs": [], "source": [ + "from cmr import CMR_UAT\n", + "\n", + "from varinfo import VarInfoFromNetCDF4\n", + "from varinfo.umm_var import get_umm_var, publish_umm_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First parse the local file, and from it identify the variable of interest:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var_info = VarInfoFromNetCDF4('MERRA2_400.inst1_2d_asm_Nx.20220130.nc4',\n", + " short_name='M2I1NXASM')\n", + "\n", "variable = var_info.get_variable('/TROPPV')" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "Check if the variable exists and get a dictionary of the variable's UMM-Var JSON record" + "Check if the variable exists and, if so, get a dictionary of the variable's UMM-Var JSON record" ] }, { @@ -336,12 +353,10 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "Publish the UMM-Var record for `TROPPV` (from **M2I1NXASM**) to CMR_UAT with `publish_umm_var`\n", - "* This will return a variable concept-id (e.g. `'V1259972421-EEDTEST'`)" + "Publish the UMM-Var record for `TROPPV` (from **M2I1NXASM**) to CMR_UAT with `publish_umm_var`. This will return a variable concept-id (e.g. `'V1259972421-EEDTEST'`)." ] }, { @@ -359,7 +374,7 @@ ], "metadata": { "kernelspec": { - "display_name": "earthdata-varinfo", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -373,9 +388,8 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.4" - }, - "orig_nbformat": 4 + "version": "3.10.12" + } }, "nbformat": 4, "nbformat_minor": 2
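The updated Authorization section of the notebook distinguishes two `Authorization` header formats. Launchpad tokens (used to query and publish to CMR) are sent with no authentication scheme, and EDL tokens (used to download granules) use the `Bearer` scheme. Below is a minimal sketch of building either value. The helper `build_auth_header` and its `token_type` parameter are hypothetical illustrations and are not part of `earthdata-varinfo`. The tokens shown are placeholders.

```python
def build_auth_header(token: str, token_type: str) -> str:
    """Return an Authorization header value for the given token.

    Hypothetical helper, not part of earthdata-varinfo:
    * 'launchpad': the bare token, with no HTTP authentication scheme.
    * 'edl': the token prefixed with the 'Bearer' scheme.
    """
    token = token.strip()  # guard against stray whitespace from copy/paste
    if token_type == 'launchpad':
        return token
    if token_type == 'edl':
        return f'Bearer {token}'
    raise ValueError(f'Unknown token type: {token_type!r}')


# Placeholder tokens, for illustration only.
launchpad_header = build_auth_header('example-launchpad-token', 'launchpad')
edl_header = build_auth_header('example-edl-token', 'edl')
print(launchpad_header)  # example-launchpad-token
print(edl_header)        # Bearer example-edl-token
```

In the notebook's cells, `auth_header` is passed to functions that query or publish to CMR. Under the rules above, that means it should hold the Launchpad form.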
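The native ID section of the notebook gives examples such as `C1234567890-PROV-variable_name` and, for a nested variable, `C1234567890-PROV-variable_group_variable_path`. Below is a small sketch that reproduces those examples. It assumes the leading `/` of a variable path is dropped and the remaining `/` separators become `_`. The function `sketch_native_id` is hypothetical and is inferred only from the documented examples. It is not the library's own implementation.

```python
def sketch_native_id(collection_concept_id: str, variable_path: str) -> str:
    """Build a UMM-Var native ID following the documented pattern.

    Illustrative only: assumes the leading '/' is stripped and nested
    path separators are replaced with underscores.
    """
    variable_part = variable_path.lstrip('/').replace('/', '_')
    return f'{collection_concept_id}-{variable_part}'


print(sketch_native_id('C1234567890-PROV', '/variable_name'))
# C1234567890-PROV-variable_name
print(sketch_native_id('C1234567890-PROV', '/variable_group/variable_path'))
# C1234567890-PROV-variable_group_variable_path
```

The ID depends only on the collection concept ID and the variable path, so rerunning the generation for the same collection produces the same native IDs. This is consistent with the notebook's note that repeated runs update existing UMM-Var records rather than duplicating them.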
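Workflow 1 states that, with `publish=True`, `generate_collection_umm_var` returns a list that mixes ingested variable concept IDs (e.g. `'V1259971755-EEDTEST'`) with CMR error strings from unsuccessful ingests. Below is a minimal sketch for separating the two. It assumes that a successful entry always matches the variable concept-ID shape `V<digits>-<PROVIDER>`, and it treats any other entry as an error. The helper `split_publish_results` is hypothetical.

```python
import re

# Assumption: successful ingests are reported as variable concept IDs such as
# 'V1259971755-EEDTEST'; any other entry is treated as a CMR error message.
VARIABLE_CONCEPT_ID = re.compile(r'^V\d+-[A-Z0-9_]+$')


def split_publish_results(results: list[str]) -> tuple[list[str], list[str]]:
    """Separate variable concept IDs from error messages in a publish result."""
    concept_ids = [item for item in results if VARIABLE_CONCEPT_ID.match(item)]
    errors = [item for item in results if not VARIABLE_CONCEPT_ID.match(item)]
    return concept_ids, errors


example_results = ['V1259971755-EEDTEST', '#: CMR error 1\n #: CMR error 2']
ids, errors = split_publish_results(example_results)
print(ids)     # ['V1259971755-EEDTEST']
print(errors)  # ['#: CMR error 1\n #: CMR error 2']
```

In practice, `example_results` would be replaced by the return value of the `generate_collection_umm_var(..., publish=True)` call in Workflow 1.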