Skip to content

Commit

Permalink
DAS-1966 - Include documentation notebook for CMR GraphQL.
Browse files Browse the repository at this point in the history
  • Loading branch information
owenlittlejohns committed Sep 23, 2023
1 parent 2203968 commit a1b88e9
Showing 1 changed file with 296 additions and 0 deletions.
296 changes: 296 additions & 0 deletions docs/how_to_generate_umm_var_via_cmr_graphql.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,296 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4e8fecd4-d367-4c8f-9155-428b2dda126a",
"metadata": {},
"source": [
"# How to generate UMM-Var records via the CMR GraphQL API\n",
"\n",
"As of v2.0.0, `earthdata-varinfo` can generate and publish UMM-Var records to CMR. Functionality to generate and return UMM-Var record JSON has been embedded within the [CMR GraphQL specification](https://graphql.earthdata.nasa.gov/api). This notebook will provide an example of how to programmatically request UMM-Var records in JSON format from the CMR GraphQL API.\n",
"\n",
"*Note: This feature of the CMR GraphQL API is currently under development and not deployed as of 2023-09-23.*\n",
"\n",
"## Environment set up:\n",
"\n",
"First create and activate a Python environment using either `pyenv` or conda. Then pip install the following requirements:\n",
"\n",
"* [requests](https://pypi.org/project/requests/) - A package to make HTTP requests.\n",
"* [notebook](https://pypi.org/project/notebook/) - A package to run the web-based Jupyter notebook environment.\n",
"\n",
"## Import necessary packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f118bc0b-1120-4254-aec4-f2f90bbb5ed3",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import requests"
]
},
{
"cell_type": "markdown",
"id": "b52955a2-51a9-4269-9b55-f116e1268a43",
"metadata": {},
"source": [
"## Select the correct CMR GraphQL environment:\n",
"\n",
"The CMR GraphQL endpoint is available in SIT, UAT and production environments. Select the appropriate environment by updating the last line in the next cell to use the correct environment key ('local', 'sit', 'uat' or 'production'). The notebook is configured to access UAT by default."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0634b4ce-a9bf-4db9-9a7c-ca2cdf42a37f",
"metadata": {},
"outputs": [],
"source": [
"graphql_environments = {\n",
" 'local': 'http://localhost:3013/dev/api',\n",
" 'sit': 'https://graphql.sit.earthdata.nasa.gov/api',\n",
" 'uat': 'https://graphql.uat.earthdata.nasa.gov/api',\n",
" 'production': 'https://graphql.earthdata.nasa.gov/api'\n",
"}\n",
"\n",
"graphql_url = graphql_environments['uat']"
]
},
{
"cell_type": "markdown",
"id": "c69f6cf7-1b65-47c6-884f-f89c353ee225",
"metadata": {},
"source": [
"## Define the GraphQL query:\n",
"\n",
"Requests to a GraphQL API require a [query or mutation](https://graphql.org/learn/queries/) to be defined. In this case, the `Collection` query is used. This query requires a client to specify the fields of the object to be returned. The request below specifies a single field `generateVariableDrafts`, which will in turn trigger an AWS Lambda function to use `earthdata-varinfo` to generate UMM-Var JSON (schema version 1.8.2) for all identified variables within a sample granule for that collection.\n",
"\n",
"The query defines the fields of the generated UMM-Var records that the response will contain:\n",
"\n",
"* `dataType`\n",
"* `definition`\n",
"* `dimensions`\n",
"* `longName`\n",
"* `name`\n",
"* `standardName`\n",
"* `units`\n",
"* `metadataSpecification`\n",
"\n",
"For more information on each of these fields, please see the [UMM-Var JSON schema](https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model/browse/variable/v1.8.2/umm-var-json-schema.json)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a584730-66e1-43c4-90aa-2ebdc2dd02fb",
"metadata": {},
"outputs": [],
"source": [
"graphql_query = '''\n",
" query Collection($params: CollectionInput) {\n",
" collection(params: $params) {\n",
" generateVariableDrafts {\n",
" count\n",
" items {\n",
" dataType\n",
" definition\n",
" dimensions\n",
" longName\n",
" name\n",
" standardName\n",
" units\n",
" metadataSpecification\n",
" }\n",
" }\n",
" }\n",
" }\n",
"'''"
]
},
{
"cell_type": "markdown",
"id": "3f68f0fc-41d2-4bc2-bcb5-91d3f091f66f",
"metadata": {},
"source": [
"## Set GraphQL query parameters:\n",
"\n",
"The `Collection` query in CMR-GraphQL has a number of parameters that can be specified to identify matching collections. In this example, the collection concept ID is specified. The example collection used [a testing copy of a GPM IMERG precipitation product](https://cmr.uat.earthdata.nasa.gov/search/concepts/C1245618475-EEDTEST.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8696a4de-56cb-4524-8b04-f7b1e04f1d29",
"metadata": {},
"outputs": [],
"source": [
"variables = {\n",
" 'params': {\n",
" 'conceptId': 'C1245618475-EEDTEST'\n",
" }\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "f164de17-39f1-4db3-b077-fe0e72b21c5e",
"metadata": {},
"source": [
"## Create full JSON payload:\n",
"\n",
"With the query and the variables defined, the JSON payload for the HTTP request to the CMR-GraphQL interface can be formed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b35c3cb-43bb-4d5b-ba69-c52e8f1ff0f4",
"metadata": {},
"outputs": [],
"source": [
"payload = {\n",
" 'query': graphql_query,\n",
" 'variables': variables\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "62c1b142-8b21-4d74-b9ff-761b94483dc2",
"metadata": {},
"source": [
"## Configure HTTP request headers:\n",
"\n",
"The HTTP request made in this notebook requires two headers:\n",
"\n",
"* `Authorization` - this will include an [Earthdata Login](https://urs.earthdata.nasa.gov/) (EDL) Bearer token.\n",
"* `Content-Type` - this header tells the HTTP request the media type of the body of the request. In this case the request contains JSON, as defined in the payload above.\n",
"\n",
"This notebook assumes that there is a file in the same directory as the notebook called `edl_token.txt`, and that it contains a valid EDL Bearer token for the environment being accessed (as defined by `graphql_url`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb4daeed-4ff8-42bb-828b-0d37b17816b8",
"metadata": {},
"outputs": [],
"source": [
"# Read the authorization token from a file\n",
"with open('edl_token.txt', 'r') as file_handler:\n",
" edl_token = file_handler.read().strip()\n",
"\n",
"# Create headers, including the EDL Bearer token\n",
"headers = {\n",
" 'Authorization': f'Bearer {edl_token}',\n",
" 'Content-Type': 'application/json',\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "6b77a6e0-0036-4a4d-9aaf-b856af390425",
"metadata": {},
"source": [
"## Make a request to CMR GraphQL:\n",
"\n",
"The following cell submits the HTTP request to the CMR GraphQL API, using the URL of the environment chosen at the start of this notebook. The request combines the payload (`Collection` query and parameters) with the necessary headers to retrieve an HTTP response."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15a0cb4e-f68b-47f0-a4f0-1e463ea98d3c",
"metadata": {},
"outputs": [],
"source": [
"cmr_graphql_response = requests.post(graphql_url, json=payload, headers=headers)"
]
},
{
"cell_type": "markdown",
"id": "c7dccbab-daba-42f5-9a38-6efbe476ae3a",
"metadata": {},
"source": [
"## CMR GraphQL response:\n",
"\n",
"The resulting response from the CMR GraphQL API is in HTTP format. A successful response will return the requested information within the `data` attribute of the response body, while any errors will be reported under the `errors` attribute ([see the GraphQL documentation for a full description of the response body](https://graphql.org/learn/serving-over-http/#response)). The code below will print the full response, rendered as JSON.\n",
"\n",
"The expected output for a successful request will look as follows, denoting the total count generated UMM-Var records and their JSON each variable detected in a sample granule for the requested collection. These are contained under the `generateVariableDrafts` field:\n",
"\n",
"```\n",
"{\n",
" \"data\": {\n",
" \"collection\": {\n",
" \"generateVariableDrafts\": {\n",
" \"count\": 16,\n",
" \"items\": [\n",
" {\n",
" \"dataType\": \"int32\",\n",
" \"definition\": \"Grid/time\",\n",
" \"dimensions\": [\n",
" {\n",
" \"Name\": \"Grid/time\",\n",
" \"Size\": 1,\n",
" \"Type\": \"TIME_DIMENSION\"\n",
" }\n",
" ],\n",
" \"longName\": \"Grid/time\",\n",
" \"name\": \"Grid/time\",\n",
" \"standardName\": \"time\",\n",
" \"units\": \"seconds since 1970-01-01 00:00:00 UTC\",\n",
" \"metadataSpecification\": {\n",
" \"URL\": \"https://cdn.earthdata.nasa.gov/umm/variable/v1.8.2\",\n",
" \"Name\": \"UMM-Var\",\n",
" \"Version\": \"1.8.2\"\n",
" }\n",
" },\n",
" ...\n",
" ]\n",
" }\n",
" }\n",
" }\n",
"}\n",
"\n",
"}\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a312848-d851-4b38-af1b-2b944cbe05a4",
"metadata": {},
"outputs": [],
"source": [
"data = cmr_graphql_response.json()\n",
"print(json.dumps(data, indent=2))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

0 comments on commit a1b88e9

Please sign in to comment.