Commit

Merge pull request #21 from DOI-USGS/linkages
Addresses comments re: data linkages
jlaura authored Apr 26, 2024
2 parents d726f89 + f0cedd5 commit 3ad32f5
Showing 1 changed file with 11 additions and 7 deletions.
18 changes: 11 additions & 7 deletions content/moon/standards/metadata_standards.md
@@ -24,13 +24,15 @@ Accuracy statements should consider using a standard reporting mechanism such as
### Qualitative Fitness of Use
We are not aware of existing qualitative, fitness-for-use standards. Therefore, these statements can vary in form. Please consider describing what types of qualitative issues may be present in a given data set, what use cases are most likely given the data product, and what use cases require caution.

### Provenance and Traceabilty
The goal of provenance is to allow a user to reproduce the creation of an analysis ready data product from the archived form. Why is this desirable? As an example, for some applications, the traceability of individual pixel values can be of extreme importance. Are their values anomalous because of processing that occurred, or are they observations warranting additional study? To allow users to answer those questions, they must be able to re-create a data product, stopping at any step along the creation process to perform their own analysis.
### Provenance and Traceability
The goal of provenance is to allow a user to reproduce the creation of an analysis-ready data product from the archived form. Why is this desirable? As an example, for some applications, the traceability of individual pixel values can be of extreme importance. Are their values anomalous because of processing that occurred, or are they observations warranting additional study? To allow users to answer those questions, they must be able to re-create a data product, stopping at any step along the creation process to perform their own analysis.

To support this, a data provider should provide some reasonable mechanism for recreation. Potential options include a text file or script listing the commands, software, and software versions used to create the file, or a step-by-step plain-language description of the processing steps, including citations for the tools used.
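
As an illustration only, the "text file or script" option could be as simple as a small machine-readable record written alongside the product. The sketch below is not part of the standard: the field names, the tool name `exampletool`, the file names, and the placeholder DOI are all hypothetical, chosen just to show the kind of information (sources, software versions, ordered commands) such a record might carry.

```python
import json
import platform
import sys


def build_provenance(product, source_ids, steps):
    """Return a provenance record as a plain dict.

    product:    name of the analysis-ready file (hypothetical)
    source_ids: identifiers (ideally DOIs) of the archived inputs
    steps:      ordered list of (tool, version, command) tuples
    """
    return {
        "product": product,
        "sources": list(source_ids),
        # Record the environment the script itself ran in.
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
        # Number the steps so a user can stop at any point in the chain.
        "steps": [
            {"order": i, "tool": tool, "version": version, "command": command}
            for i, (tool, version, command) in enumerate(steps, start=1)
        ],
    }


# All names below are placeholders, not real tools or identifiers.
record = build_provenance(
    product="example_mosaic.tif",
    source_ids=["doi:10.0000/example-source"],
    steps=[
        ("exampletool", "1.2.3", "exampletool ingest --from raw.img --to step1.cub"),
        ("exampletool", "1.2.3", "exampletool project --from step1.cub --to example_mosaic.tif"),
    ],
)

provenance_json = json.dumps(record, indent=2)
print(provenance_json)
```

Numbering the steps is a design choice that supports the goal stated above: a user can re-run the commands in order and stop after any step to inspect intermediate results.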

### Articial Intellegence / Machine Learning Derived Data
Data derived in whole or in part using AI/ML techniques must be clearly labelled as such. When possible, the inpout data sets and model(s) used should be described. As with all data, qualitative and quantitative metadata are essential. When feasible, models and training data should be made available and linked in the data documentation.
As described in the Data Linkages section, below, an explicit link back to the input, archived source data (with DOI), should be provided when possible.[^1]

### Artificial Intelligence / Machine Learning Derived Data
Data derived in whole or in part using AI/ML techniques must be clearly labelled as such. When possible, the input data sets and model(s) used should be described. As with all data, qualitative and quantitative metadata are essential. When feasible, models and training data should be made available and linked in the data documentation.

### Data Interpolation
For derived data (e.g., DTMs, image mosaics, etc.), the data producer should document any decision to interpolate the data, including the method selected and the rationale. This supports user interpretation of the data and helps with traceability of data values within the product.
@@ -43,11 +45,13 @@ Data licensing can be mandated by the funders of the data creation. For all data
- Data can have an [ISO compliant XML metadata file](https://wiki.icaci.org/index.php?title=ISO_19157:2013_Geographic_information_-_Data_quality). This file is to be as compliant as possible and should use the planetary domain extension.

### Data Linkages and Unique Identifiers
- For data derived from lower level data, stored in a long term archive, linkages back to the source data should be provided[^1].
- Data not easily derived from lower level data should include a unique identifier (e.g., a DOI).
- For data derived from archived sources (e.g., the PDS, JAXA DARTS, ESA PSA), linkages back to the source data should be provided. When combined with the provenance files, this allows users to recreate products from source files, reproducing or modifying the processing pipeline as desired.
- Data not easily derived from lower-level data should include a unique identifier (e.g., a DOI).
- Linkages to a primary, peer-reviewed, science publication should be provided when possible.
- Linkages to superseded data sets should be provided when the data being released supersede a previously released data set. Ideally, this linkage is bidirectional.
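
One possible way to capture the linkages listed above in a machine-checkable form is sketched below. This is an assumption-laden illustration, not a format defined by this standard: the `DataLinkages` class, its field names, and the placeholder DOI are hypothetical, and the regular expression is only a loose syntactic check on DOI-like strings, not proof that a DOI resolves.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Loose syntactic check only: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(value: str) -> bool:
    """Return True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(value))


@dataclass
class DataLinkages:
    """Hypothetical container for the linkages described in the list above."""
    source_dois: List[str] = field(default_factory=list)  # archived source data
    product_doi: Optional[str] = None      # identifier for the product itself
    publication_doi: Optional[str] = None  # primary peer-reviewed publication
    supersedes: Optional[str] = None       # previously released data set

    def problems(self) -> List[str]:
        """List human-readable issues; an empty list means none were found."""
        issues = []
        if not self.source_dois and self.product_doi is None:
            issues.append("no source linkage and no product identifier")
        optional = (self.product_doi, self.publication_doi, self.supersedes)
        candidates = list(self.source_dois) + [d for d in optional if d is not None]
        for doi in candidates:
            if not looks_like_doi(doi):
                issues.append(f"not a DOI-like string: {doi}")
        return issues


# Placeholder identifiers, for illustration only.
links = DataLinkages(
    source_dois=["10.0000/example-source"],
    publication_doi="not-a-doi",
)
print(links.problems())
```

A bidirectional supersession link would additionally require updating the record of the older data set to point forward to its replacement, which this sketch does not attempt.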

### Discussion

{{< comments >}}

[^1]: Combined with the provenance files, this allows users to recreate products from source files, reproducing or modifying the processing pipeline as desired.
[^1]: If a DOI is not available, consider asking the archive to mint a DOI for you to best track data provenance.
