manage_03_updating_prop_sets

cspayne edited this page Sep 22, 2023 · 20 revisions

Updating prop_sets

prop_sets are updated with update_prop_sets.py.

This script:

  • Uses XSLT to retrieve RDA property updates
  • Updates local prop_sets in map_storage while maintaining the sinopia elements containing guidance and implementation sets

Dependencies

On some systems, installing Python packages requires pip in place of pip3.

  • Python for Windows or WSL (recommend v3.8 or above)
  • lxml.etree
    • $ pip3 install lxml
  • Java
    • Windows: download Java and ensure PATH is set up correctly
    • WSL: $ sudo apt install openjdk-11-jre-headless
  • Saxon processor from Saxonica
    • For further documentation: Getting started with SaxonJ
    • The free Saxon-HE (Home Edition) is available for download from GitHub - see Saxonica/Saxon-HE > releases
    • The Saxon-HE repository README file provides information about current Saxon releases; we run sinopia_maps and map_storage stylesheets using version 11.5 or higher.
  • Create a saxon11 folder in Windows or WSL and extract SaxonHE11.5J.zip into it
    • Take note of the full directory path to this saxon11 folder
  • Test: run $ java -cp {path_to_directory}/saxon-he-{saxon_version}.jar net.sf.saxon.Query -t -qs:"current-date()" and confirm that the output is similar to the following:
SaxonJ-HE 11.5 from Saxonica
Java version 11.0.20.1
Analyzing query from {current-date()}
Analysis time: 378.7204 milliseconds
<?xml version="1.0" encoding="UTF-8"?>2023-09-20-07:00Execution time: 64.2685ms
Memory used: 11Mb

Running the script

On some systems, depending on the installed Python version, running scripts requires python in place of python3.
Run the script from the map_storage top-level folder:

$ python3 py/update_prop_sets.py

Storing current local properties in memory

  • update_prop_sets.py first retrieves the current prop_set XML files from the local map_storage repository.
  • These file names are stored as keys in a dictionary with empty arrays as values.
  • Each property is stored in memory as an object and placed in the corresponding file's array within the dictionary.
  • Prop is a Python class created to store properties in memory while update_prop_sets.py is running.
  • Each Prop object stores the entire property as a string in the prop_string variable, as well as the property id, iri, label, domain, and sinopia element as strings in separate variables.
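The data structure described above can be pictured roughly as follows. This is a minimal sketch, not the class from update_prop_sets.py: only prop_string is named in the text, so the other attribute names, the has_sinopia() helper, the file name, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Prop:
    """Illustrative in-memory copy of one property from a prop_set file."""
    prop_string: str               # the entire property serialized as a string
    prop_id: str                   # assumed attribute name
    prop_iri: str                  # assumed attribute name
    prop_label: str                # assumed attribute name
    prop_domain: str               # assumed attribute name
    sinopia: Optional[str] = None  # serialized sinopia element, if the property has one

    def has_sinopia(self) -> bool:
        # Hypothetical helper: True when a sinopia element was stored
        return self.sinopia is not None


# File names are keys; each value starts as an empty list and is filled with Prop objects.
prop_sets: Dict[str, List[Prop]] = {"example_prop_set.xml": []}  # hypothetical file name
prop_sets["example_prop_set.xml"].append(
    Prop(
        prop_string="<prop>placeholder</prop>",
        prop_id="P1",
        prop_iri="http://example.org/P1",
        prop_label="has example",
        prop_domain="Work",
    )
)
```

Keeping the whole property as a string alongside the separate fields means a property can be compared field by field and still be written back out unchanged.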

This module:

  • Contains the function store_props() which will create Prop objects for each prop_set file when passed a dictionary of prop_set files
  • Uses the lxml.etree library to traverse through a prop_set file
  • Stores the Prop object in the prop_set file's array
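The traversal described in this list might look something like the sketch below. It is not the actual store_props() function. The element names in SAMPLE, the Prop attribute names other than prop_string, and the function name store_props_sketch() are all hypothetical, and the standard-library xml.etree.ElementTree is used as a stand-in for lxml.etree so the example runs without extra packages.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in; the script itself uses lxml.etree
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical, simplified prop_set content; the real files follow their own schema.
SAMPLE = """<prop_set>
  <prop id="P1">
    <prop_iri>http://example.org/P1</prop_iri>
    <prop_label>has example</prop_label>
    <prop_domain>Work</prop_domain>
    <sinopia><implementation_set/></sinopia>
  </prop>
  <prop id="P2">
    <prop_iri>http://example.org/P2</prop_iri>
    <prop_label>has other example</prop_label>
    <prop_domain>Work</prop_domain>
  </prop>
</prop_set>"""


@dataclass
class Prop:
    prop_string: str
    prop_id: str
    prop_iri: str
    prop_label: str
    prop_domain: str
    sinopia: Optional[str] = None


def store_props_sketch(xml_by_file: Dict[str, str]) -> Dict[str, List[Prop]]:
    """Build {file name: [Prop, ...]} from {file name: XML text}."""
    prop_sets: Dict[str, List[Prop]] = {name: [] for name in xml_by_file}
    for name, xml_text in xml_by_file.items():
        root = ET.fromstring(xml_text)
        for prop in root.iter("prop"):
            sinopia = prop.find("sinopia")
            prop_sets[name].append(
                Prop(
                    prop_string=ET.tostring(prop, encoding="unicode"),
                    prop_id=prop.get("id", ""),
                    prop_iri=prop.findtext("prop_iri", default=""),
                    prop_label=prop.findtext("prop_label", default=""),
                    prop_domain=prop.findtext("prop_domain", default=""),
                    sinopia=None if sinopia is None
                    else ET.tostring(sinopia, encoding="unicode"),
                )
            )
    return prop_sets


stored = store_props_sketch({"example_prop_set.xml": SAMPLE})
```

The sketch takes XML text rather than file paths only so that it is self-contained; reading each file from the map_storage repository would replace ET.fromstring() with a parse of the file.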

Running XSLT transformation

Once the local properties have all been stored in memory, 001_01_build_update.xsl is run using the Saxon XSLT processor to create updated prop_set instances.

  • 001_01_build_update.xsl and 001_02_source_templates.xsl retrieve information from descriptions of RDA canonical properties published as RDF/XML on GitHub (see RDARegistry/RDA-Vocabularies) and from the University of Washington Libraries RDA Application Profile Extension to assemble prop_set instances.
  • prop_set instances output by the stylesheets include minimal property description reproduced from each data source; they do not include any information for implementing properties in Sinopia resource templates.
  • The organization of these files follows the organization of RDF/XML files describing canonical RDA/RDF properties at RDA-Vocabularies (properties with domain Agent, with domain Expression, and so on for each RDA Entity), with one additional prop_set instance for the UW Extension properties.
  • The stylesheets do not retrieve descriptions of deprecated properties.
  • A second dictionary stores property information from these new prop_set instances.
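For orientation, a Saxon transformation can be launched from Python along the lines sketched here. This page does not show how update_prop_sets.py invokes Saxon, so the function names, the jar and stylesheet paths, and the choice between a source document (-s:) and the default initial template (-it) are assumptions. Only the net.sf.saxon.Transform entry point and the -xsl:, -s:, and -it options are standard Saxon command-line features.

```python
import subprocess
from typing import List, Optional


def build_saxon_command(
    saxon_jar: str, stylesheet: str, source: Optional[str] = None
) -> List[str]:
    """Assemble a java command line that runs Saxon's XSLT entry point."""
    command = [
        "java", "-cp", saxon_jar,
        "net.sf.saxon.Transform",
        f"-xsl:{stylesheet}",
    ]
    if source is None:
        # Assumption: the stylesheet starts from its initial template
        command.append("-it")
    else:
        command.append(f"-s:{source}")
    return command


def run_saxon(command: List[str]) -> subprocess.CompletedProcess:
    # check=True raises CalledProcessError if Java or Saxon exits with an error
    return subprocess.run(command, check=True, capture_output=True, text=True)


# Hypothetical paths; substitute the saxon11 folder and stylesheet location in use.
command = build_saxon_command("saxon11/saxon-he-11.5.jar", "001_01_build_update.xsl")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when the Saxon path contains spaces, which is common on Windows.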

*For more information on prop_set structure and use, see the sinopia_maps wiki

Adding implementation sets back into the updated prop_sets

This module contains the add_implementation_set() function and the add_props() function which are used to add all implementation sets back into updated prop_set XML files.

Adding implementation_sets for existing properties

For each prop_set file, if a property exists in both the original prop_set and the updated prop_set and had an implementation_set before the update, that implementation_set is added back into the updated prop_set under the correct property. Note: add_implementation_set() inserts the entire sinopia element back into the prop, including any guidance elements.

  • The match_props() function matches properties in the two dictionaries by prop_iri; if the original property included an implementation_set, add_implementation_set() is run.
  • add_implementation_set() uses the lxml.etree library to insert the implementation set as a child element of the correct property and update the prop_set XML file.
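The matching and re-insertion described above can be sketched as follows. This is an illustration under stated assumptions, not the code of match_props() or add_implementation_set(): the element names in the sample XML and the function names index_by_iri() and restore_sinopia() are hypothetical, and xml.etree.ElementTree stands in for lxml.etree.

```python
import copy
import xml.etree.ElementTree as ET  # stdlib stand-in; the script itself uses lxml.etree

# Hypothetical, simplified prop_set content before and after an update.
ORIGINAL = """<prop_set>
  <prop><prop_iri>http://example.org/P1</prop_iri>
    <sinopia><guidance>note</guidance><implementation_set/></sinopia></prop>
  <prop><prop_iri>http://example.org/P2</prop_iri></prop>
</prop_set>"""

UPDATED = """<prop_set>
  <prop><prop_iri>http://example.org/P1</prop_iri><prop_label>new label</prop_label></prop>
  <prop><prop_iri>http://example.org/P2</prop_iri></prop>
</prop_set>"""


def index_by_iri(root):
    """Map each property's IRI to its element."""
    return {prop.findtext("prop_iri"): prop for prop in root.iter("prop")}


def restore_sinopia(original_root, updated_root):
    """Copy sinopia elements holding an implementation_set into matching updated props."""
    original = index_by_iri(original_root)
    restored = []
    for iri, new_prop in index_by_iri(updated_root).items():
        old_prop = original.get(iri)
        if old_prop is None:
            continue  # property is new in the update; nothing to restore
        sinopia = old_prop.find("sinopia")
        if sinopia is not None and sinopia.find("implementation_set") is not None:
            # Copy the whole sinopia element so guidance travels with it
            new_prop.append(copy.deepcopy(sinopia))
            restored.append(iri)
    return restored


updated_root = ET.fromstring(UPDATED)
restored = restore_sinopia(ET.fromstring(ORIGINAL), updated_root)
```

Copying the whole sinopia element, rather than only the implementation_set, mirrors the note above that guidance elements are carried over as well.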

Adding implementation_sets for deprecated properties

If a property from the previous prop_set does not exist in the updated file, this property has been deprecated. If the deprecated property contained an implementation set, then it is currently being used in a resource template and needs to be added to the current prop_set XML file.

For each file:

  • The compare_props() function locates all of the deprecated properties from the file that contain implementation sets and returns these properties in an array.
  • The add_props() function uses the lxml.etree library to insert each deprecated property into the XML file and marks the property as deprecated.

NOTE: Currently, any comments that exist in prop_set files are not preserved during updates.
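The two steps for deprecated properties can be sketched as below. This is not the code of compare_props() or add_props(): the element names, the function names, and in particular the deprecated="true" attribute are hypothetical, since this page does not say how a property is marked as deprecated. xml.etree.ElementTree stands in for lxml.etree.

```python
import copy
import xml.etree.ElementTree as ET  # stdlib stand-in; the script itself uses lxml.etree

# Hypothetical, simplified prop_set content before and after an update.
ORIGINAL = """<prop_set>
  <prop><prop_iri>http://example.org/P1</prop_iri>
    <sinopia><implementation_set/></sinopia></prop>
  <prop><prop_iri>http://example.org/P3</prop_iri>
    <sinopia><implementation_set/></sinopia></prop>
  <prop><prop_iri>http://example.org/P4</prop_iri></prop>
</prop_set>"""

UPDATED = """<prop_set>
  <prop><prop_iri>http://example.org/P1</prop_iri></prop>
</prop_set>"""


def find_deprecated_with_implementation(original_root, updated_root):
    """Return original props missing from the update that carry an implementation_set."""
    updated_iris = {prop.findtext("prop_iri") for prop in updated_root.iter("prop")}
    return [
        prop
        for prop in original_root.iter("prop")
        if prop.findtext("prop_iri") not in updated_iris
        and prop.find("sinopia/implementation_set") is not None
    ]


def append_deprecated(updated_root, deprecated_props):
    """Add copies of the deprecated props to the updated prop_set, flagged as deprecated."""
    for prop in deprecated_props:
        kept = copy.deepcopy(prop)
        kept.set("deprecated", "true")  # hypothetical marker
        updated_root.append(kept)


original_root = ET.fromstring(ORIGINAL)
updated_root = ET.fromstring(UPDATED)
deprecated = find_deprecated_with_implementation(original_root, updated_root)
append_deprecated(updated_root, deprecated)
```

In this sample, P3 is kept because it is still implemented, while P4 is dropped because it has no implementation set.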