From f0cedd526df4be1044cbc3d2bf215c79ea1a0a88 Mon Sep 17 00:00:00 2001
From: "Laura, Jason R"
Date: Mon, 15 Apr 2024 13:59:33 -0700
Subject: [PATCH] Addresses comments re: data linkages

---
 content/moon/standards/metadata_standards.md | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/content/moon/standards/metadata_standards.md b/content/moon/standards/metadata_standards.md
index ec4fe41..e836c1b 100644
--- a/content/moon/standards/metadata_standards.md
+++ b/content/moon/standards/metadata_standards.md
@@ -23,13 +23,15 @@ The standards described here are general, because the ability to create metadata
 ### Qualitative Fitness of Use
 We are not aware of existing qualitative, fitness-for-use standards. Therefore, these statements can be varied in form. Please consider describing what types of qualitative issues may be present in a given data sets, what use cases are most likely given the data product, and what use cases require caution.
 
-### Provenance and Traceabilty
-The goal of provenance is to allow a user to reproduce the creation of an analysis ready data product from the archived form. Why is this desirable? As an example, for some applications, the traceability of individual pixel values can be of extreme importance. Are their values anomalous because of processing that occurred, or are they observations warranting additional study? To allow users to answer those questions, they must be able to re-create a data product, stopping at any step along the creation process to perform their own analysis.
+### Provenance and Traceability
+The goal of provenance is to allow a user to reproduce the creation of an analysis-ready data product from the archived form. Why is this desirable? As an example, for some applications, the traceability of individual pixel values can be of extreme importance. Are their values anomalous because of processing that occurred, or are they observations warranting additional study? To allow users to answer those questions, they must be able to re-create a data product, stopping at any step along the creation process to perform their own analysis.
 
 To support this, a data provider should provide some reasonable mechanism for recreation. Some potential options include a text file or script with the commands and information on the software and software version used to create the file or a step-by-step plain language description of the processing steps including citations for the tools used.
 
-### Articial Intellegence / Machine Learning Derived Data
-Data derived in whole or in part using AI/ML techniques must be clearly labelled as such. When possible, the inpout data sets and model(s) used should be described. As with all data, qualitative and quantitative metadata are essential. When feasible, models and training data should be made available and linked in the data documentation.
+As described in the Data Linkages section below, an explicit link back to the archived input source data (with DOI) should be provided when possible.[^1]
+
+### Artificial Intelligence / Machine Learning Derived Data
+Data derived in whole or in part using AI/ML techniques must be clearly labelled as such. When possible, the input data sets and model(s) used should be described. As with all data, qualitative and quantitative metadata are essential. When feasible, models and training data should be made available and linked in the data documentation.
 
 ### Data Interpolation
 For derived data (e.g., DTMs, image mosaics, etc.), decisions made by the data producer to interpolate the data should document the method selected and the rationale. This supports user interpretation of the data and helps with traceability of data values within the product. Further, when possible data should be provided in gridded and ungridded (e.g., point cloud for DTMs) forms.
@@ -42,11 +44,13 @@ For derived data (e.g., DTMs, image mosaics, etc.), decisions made by the data p
 - Data can have an [ISO compliant XML metadata file](https://wiki.icaci.org/index.php?title=ISO_19157:2013_Geographic_information_-_Data_quality). This file is to be as compliant as possible and should use the planetary domain extension.
 
 ### Data Linkages and Unique Identifiers
-- For data derived from lower level data, stored in a long term archive, linkages back to the source data should be provided[^1].
-- Data not easily derived from lower level data should include a unique identifier (e.g., a DOI).
+- For data derived from archived sources (e.g., the PDS, JAXA DARTS, ESA PSA), linkages back to the source data should be provided. When combined with the provenance files, this allows users to recreate products from source files, reproducing or modifying the processing pipeline as desired.
+- Data not easily derived from lower-level data should include a unique identifier (e.g., a DOI).
+- Linkages to a primary, peer-reviewed science publication should be provided when possible.
+- Linkages to superseded data sets should be provided when the data being released would supersede a previously released data set. Ideally, this linkage is bidirectional.
 
 ### Discussion
 {{< comments >}}
 
-[^1]: Combined with the provenance files, this allows users to recreate products from source files, reproducing or modifying the processing pipeline as desired.
\ No newline at end of file
+[^1]: If a DOI is not available, consider asking the archive to mint a DOI for you to best track data provenance.