ucd-library/dams-amerine-wine-labels-metadata

Amerine Wine Labels

2024-10 Add Titles

I need a method to add the wine label information from Eric into the label data.

Eric has a spreadsheet that includes producer, brand, wine type, date, city/region, ABV, proof, and marginalia. This was saved as titles.csv.

I need to update the jq files first. This tarql query (titles.rq) creates a titles.json file from the CSV file. It includes titles and other information.

PREFIX wine_label: <ark:/87287/d7794w/schema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ucdlib: <http://schema.library.ucdavis.edu/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX schema: <http://schema.org/>

CONSTRUCT {
  ?ark a schema:label;
    schema:name ?name;
    rdfs:label ?labelValue;
    wine_label:folder ?old_title;
    wine_label:digitization_number ?label_number;
    wine_label:producer ?producer;
    wine_label:brand ?brand;
    wine_label:wine_type ?wine_type;
    wine_label:date ?date;
    wine_label:city_region ?city_region;
    wine_label:ABV ?abv;
    wine_label:proof ?proof;
    wine_label:marginalia ?marginalia;
    wine_label:description ?description;
    wine_label:questions ?questions;
    .
  }
WHERE {
  BIND(replace(?url,"https://digital.ucdavis.edu/collection/amerine-wine-labels/labels/label_","") as ?label_number)
  BIND('ark:/87287/d7794w' AS ?collection)
  BIND(uri(concat("ark:/85140/", replace(?filename,".jpg",""))) AS ?ark)
  BIND(coalesce(?producer, "") AS ?p)
  BIND(coalesce(concat(" § ",?brand), "") AS ?b)
  BIND(coalesce(concat(" § ",?wine_type), "") AS ?w)
  BIND(coalesce(concat(" § ",?city_region), "") AS ?c)

  BIND(concat(?p,?b,?w,?c) AS ?name)
}
tarql titles.rq  titles.csv | \
    riot --syntax=turtle --formatted=jsonld | \
    jsonld compact -c $(pwd)/context.json > titles.json
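The schema:name built in the WHERE clause is the producer followed by the brand, wine type, and city/region, each prefixed with " § " and dropped when the cell is empty. This assumes tarql leaves empty cells unbound, so `concat` fails and `coalesce` falls back to "". A pure-bash sketch of the same rule, handy for spot-checking a row; `label_name` is a hypothetical helper, not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): rebuilds the schema:name that
# titles.rq computes, for spot-checking a row of titles.csv.
# Arguments: producer brand wine_type city_region ("" = empty cell).
label_name() {
    local name=$1      # ?p: the producer, or "" when the cell is empty
    local field
    for field in "$2" "$3" "$4"; do
        # ?b, ?w, ?c: " § <value>", or nothing when the cell is empty
        if [[ -n $field ]]; then
            name+=" § $field"
        fi
    done
    printf '%s\n' "$name"
}
```

Like the query, it keeps a leading " § " when only the producer is missing, so `label_name '' Brand '' ''` prints ` § Brand`.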

Then you can split it into individual title.json files, one per item directory, with something like:

for i in $(cd items; echo ark:/85140/d4????); do \
    echo $i;
    jq --arg ark "$i"  '.["@graph"][] | select(.["@id"]==$ark) | del(.["@id"]) | del(.["@type"])' titles.json > items/$i/title.json;
done
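If an ARK has no row in titles.csv, the `select` in that loop matches nothing, but the redirection still creates an empty title.json. A small check for those items, sketched with a hypothetical helper `missing_titles` (the default `items` root matches the loop above):

```bash
#!/usr/bin/env bash
# Hypothetical check: print item directories under ROOT (default: items)
# whose title.json is missing or empty, e.g. because no titles.csv row
# matched that ARK.
missing_titles() {
    local root=${1:-items} d
    for d in "$root"/ark:/85140/d4????; do
        [[ -d $d ]] || continue              # also skips an unmatched glob
        [[ -s $d/title.json ]] || printf '%s\n' "$d"
    done
}
```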

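And you can join these together with the following jq conversion (add_titles.jq), which merges the context, the existing item metadata, and the item's title.json. I'm also removing schema:isPartOf and any schema:identifier that is not an ARK, and adding a better publisher: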

.[0]
 +
(.[1] |
  del(.["schema:isPartOf"]) |
  del(.["schema:identifier"][] | select(contains("ark:") | not )))
+ .[2]
+ {
  "schema:publisher":{
      "@id":"http://id.loc.gov/authorities/names/no2008108707",
      "schema:name":"University of California, Davis. General Library. Dept. of Special Collections"
    }
}
c=context.json
for d in items/ark:/85140/d4????; do
    echo -n -e "$d\r"
    cp "${d}.jsonld.json" "${d}.jsonld.json-"
    # read the backup, write the merged record back to the item's metadata file
    jq -s -f add_titles.jq "${c}" "${d}.jsonld.json-" "${d}/title.json" > "${d}.jsonld.json"
done
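Because the loop writes straight over each .jsonld.json file, a failed jq run leaves it empty and only the `-` backup saves you. One alternative, sketched with a hypothetical helper `rewrite_with`, writes to a temporary file and only replaces the original (keeping the same `-` backup) when the command succeeds and produces output:

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite_with FILE CMD [ARGS...]
# Runs CMD ARGS (which may read FILE) into FILE.tmp.<pid>; on success with
# non-empty output, saves FILE as FILE- and moves the new version into place.
rewrite_with() {
    local file=$1 tmp
    shift
    tmp="${file}.tmp.$$"
    if "$@" > "$tmp" && [[ -s $tmp ]]; then
        cp -p -- "$file" "${file}-"     # same "-" backup name the loops use
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"                 # leave FILE untouched on failure
        return 1
    fi
}
```

Inside the loop this would look like `rewrite_with "${d}.jsonld.json" jq -s -f add_titles.jq "$c" "${d}.jsonld.json" "${d}/title.json"`; jq still reads the unchanged original, because it is only replaced after jq finishes.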

2024-10 remove @graph

In the current metadata format, we tried to record the wine that each label describes, via ucdlib:describesWine. Here's an example:

{
  "@graph": [
    {
      "ucdlib:describesWine": {
        "@id": "@base:#wine"
      },
      "schema:identifier": [
        "label_0041",
        "ark:/85140/d4001n"
      ]
    },
    {
      "@id": "@base:#wine",
      "@type": "ucdlib:Wine",
      "ucdlib:WineType": {
        "@id": "ucdlib:Still"
      },
      "http://www.wikidata.org/prop/direct/P297": "ES"
    }
  ]
}

We are removing this, and as a result removing the @graph wrapper, so the label's properties sit at the root of the record. We only do this to the records that have an @graph node.

I’ll use this opportunity to add in a new context file.

{
  "@context": {
    "wine_label": "ark:/87287/d7794w/schema#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ucdlib": "http://schema.library.ucdavis.edu/schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "schema": "http://schema.org/"
  }
}

The following jq conversion (rm_graph.jq) then merges the new context with the first @graph node, dropping ucdlib:describesWine (and the separate wine node):

.[0]
 +
(.[1]["@graph"][0] |
  del(.["ucdlib:describesWine"])
)
c=context.json
for i in $(grep -l '@graph' items/ark:/85140/d4????.jsonld.json); do \
    echo $i;
    cp ${i} ${i}-;\
    jq -s -f rm_graph.jq ${c} ${i}- > ${i};\
done
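A quick post-check that the conversion caught every record, sketched with a hypothetical helper `check_no_graph` that lists any metadata file still mentioning "@graph" and fails if there are any:

```bash
#!/usr/bin/env bash
# Hypothetical post-check: list any d4????.jsonld.json file under ROOT
# (default: items/ark:/85140) that still mentions "@graph".
# Returns 1 if any were found, 0 otherwise.
check_no_graph() {
    local root=${1:-items/ark:/85140} f found=0
    for f in "$root"/d4????.jsonld.json; do
        [[ -f $f ]] || continue
        if grep -q -e '@graph' "$f"; then
            printf '%s\n' "$f"
            found=1
        fi
    done
    return "$found"
}
```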

2024-09 Initial Metadata Commit

Originally, there was no identifier for the collection, so I minted ark:/87287/d7794w for it.

Get from old version

Somehow I managed to mess up all the files that I had been adding to the amerine-wine-labels metadata. I got the old data back from the sandbox export, and I'm trying to upload them again.

In the process I noticed that the form of the metadata changes midway through the data. Labels matching:

for i in label_[0123]??? label_4[0-8]?? label_490? label_491[0123]; do echo $i; done

have two items within a ["@graph"], while labels matching:

for i in label_491[456789] label_49[2-9]? label_[5-9]???; do echo $i; done

don't have a graph and keep the information at the root. So I need to handle the two groups separately when replacing what I had before.

for i in label_[0123]??? label_4[0-8]?? label_490? label_491[0123]; do ark=$(jq -r '.["@graph"][0]["schema:identifier"][] | select(test("^ark:"))' "$i.jsonld.json"); mkdir -p "$(dirname "$ark")"; mv "$i" "$ark"; mv "$i.jsonld.json" "$ark.jsonld.json"; done
for i in label_491[456789] label_49[2-9]? label_[5-9]???; do ark=$(jq -r '.["schema:identifier"][] | select(test("^ark:"))' "$i.jsonld.json"); mkdir -p "$(dirname "$ark")"; mv "$i" "$ark"; mv "$i.jsonld.json" "$ark.jsonld.json"; done
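The two glob lists encode a single cut-off: up to label_4913 the fields sit inside "@graph", and from label_4914 on they sit at the root. A sketch of that rule as a hypothetical helper `label_layout`, which avoids having to keep two glob lists in sync:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print which metadata layout a label uses, following
# the ranges above: "graph" up to label_4913, "root" from label_4914 on.
# Accepts label_NNNN, label_NNNN.jsonld.json, or a path ending in either.
label_layout() {
    local n=${1##*/}                 # drop any leading directories
    n=${n#label_}
    n=${n%.jsonld.json}
    if [[ ! $n =~ ^[0-9]+$ ]]; then
        echo "not a label name: $1" >&2
        return 1
    fi
    if (( 10#$n <= 4913 )); then     # 10# so 0041 is not read as octal
        echo graph
    else
        echo root
    fi
}
```

In a single loop over `label_????`, one could then pick the jq path with something like `case $(label_layout "$i") in graph) q='.["@graph"][0]["schema:identifier"][]' ;; root) q='.["schema:identifier"][]' ;; esac`.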

This script just renames each label directory and its metadata file to its ARK. I could also re-export the data, since that's where this original data came from, but looking at these data files, they are the same, so I'll just rsync the ones I messed up.

for i in d4*.json; do echo $i; diff  ../../../v1/items/ark\:/85140/$i $i; done | less
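Paging through the full diff works; to get just the names of the files that differ (the candidates to rsync back), a sketch using `cmp -s`, with a hypothetical helper `changed_files` taking the old and new directories:

```bash
#!/usr/bin/env bash
# Hypothetical helper: changed_files OLD_DIR NEW_DIR
# Prints the names of d4*.json files in NEW_DIR that differ from, or are
# missing in, OLD_DIR.
changed_files() {
    local old=$1 new=$2 f
    for f in "$new"/d4*.json; do
        [[ -f $f ]] || continue
        cmp -s "$old/${f##*/}" "$f" || printf '%s\n' "${f##*/}"
    done
}
```

For the comparison above that would be `changed_files ../../../v1/items/ark:/85140 .`.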

And now, I think I need to check in a version of the metadata, before I try this again.

Original Processing

I seem to have been doing two things. First, I went through and identified every image that is simply an index card label, and not a wine label. I cleverly called the metadata for these label.json, which is pretty dumb. I will rename these as index-card-label.json, which is a bit more understandable.
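The rename itself could be sketched as below, assuming the item directories live under `data/a*` as in the folder script further down; `rename_index_cards` is a hypothetical helper, and it refuses to overwrite an existing index-card-label.json:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: rename every ROOT/a*/label.json (ROOT defaults to
# data) to index-card-label.json, skipping directories that already have one.
rename_index_cards() {
    local root=${1:-data} f dir
    for f in "$root"/a*/label.json; do
        [[ -e $f ]] || continue                 # unmatched glob
        dir=${f%/*}
        if [[ -e $dir/index-card-label.json ]]; then
            echo "skipping $dir: index-card-label.json exists" >&2
            continue
        fi
        mv -- "$f" "$dir/index-card-label.json"
    done
}
```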

The way I did this was by going back to the directory with the JPEGs, renaming the metadata.json file to label.json, and removing the metadata.ttl file. Then I would often copy the metadata from the next label and rewrite the label info, probably to get the metadata for the upcoming labels.

l=3629; cd ../a$l; mv metadata.json label.json; rm metadata.ttl; cat label.json
cp ../a3630/metadata.json label.json; cat label.json

Also, for a few index cards, we only have the thumbnail, not the full image. These are cards a1044, a1070, and a1091: a1044 looks like it says K, L, M; a1070 says N, O, P; and a1091 says Q, R, S, T. These all have an index-card-label assigned to them.

The last index-card-label in the data is item a3659. After that, either there are no more labels, or else they are no longer cataloged.

There are no full images without a thumbnail. Note that no item numbers are missing from the sequence.
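The "no missing numbers" check can be repeated with a short sketch, assuming items are the `data/a<N>` directories used in the folder script; `missing_items` is a hypothetical helper that prints every number in a gap between the lowest and highest item:

```bash
#!/usr/bin/env bash
# Hypothetical check: print any item numbers missing between the lowest and
# highest ROOT/a<N> directory (ROOT defaults to data). Prints nothing if the
# sequence is complete.
missing_items() {
    local root=${1:-data} d b prev='' n
    for d in "$root"/a*/; do
        b=${d%/}
        b=${b##*/}                          # e.g. a3629
        [[ $b =~ ^a[0-9]+$ ]] || continue
        echo "$((10#${b#a}))"               # 10# so a0041 is not octal
    done | sort -n | while read -r n; do
        if [[ -n $prev ]] && (( n > prev + 1 )); then
            seq "$((prev + 1))" "$((n - 1))"   # every number in the gap
        fi
        prev=$n
    done
}
```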

Other designations

However, the index card labels alone do not seem to mark every break between folders. Instead, we can walk through the data and start a new folder wherever the metadata changes.

	last_metadata=''
	cur_folder='folder/'
	for a in data/a*; do
		b=$(basename "$a")
		f=${b#a}                       # item number, e.g. a3629 -> 3629
		if [[ -f $a/metadata.json ]]; then
			# Compare metadata with all whitespace removed; a change starts a new folder
			this_metadata=$(tr -d "\n" < "$a/metadata.json" | sed -e 's/\s//g')
			if [[ "$this_metadata" != "$last_metadata" ]]; then
				cur_folder=folder/$f
				cur_dir=$cur_folder
				mkdir -p "$cur_folder"
				jq . < "$a/metadata.json" > "$cur_folder/metadata.json"
				last_metadata=$this_metadata
			fi
			# Wine label image: goes in the current folder, or the index-card subfolder
			if [[ -f $a/full.jpg ]]; then
				cp "$a/full.jpg" "$cur_dir/label_$f.jpg"
			fi
		elif [[ -f $a/index-card-label.json ]]; then
			# Index card: labels that follow (until the metadata changes) go in its subfolder
			cur_dir=$cur_folder/index_card_$f
			mkdir -p "$cur_dir"
			echo "$cur_dir/metadata.json"
			jq . < "$a/index-card-label.json" > "$cur_dir/metadata.json"
			if [[ -f $a/full.jpg ]]; then
				cp "$a/full.jpg" "$cur_dir/index_card_$f.jpg"
			fi
		fi
	done

Processing Examples

Once I had the JSON files, I sometimes needed to go back and create new versions, since I changed things. For example, when switching to schema.org, I needed to change the language designation, since schema.org's inLanguage uses an IETF BCP 47 language tag (such as en-US). `jq` is your friend in this case; here's that change.

for i in $(find folder -name metadata.json | xargs grep -l language_id ) ; do
 mv $i $i.bak;
 jq '. |= . + {inLanguage: (.language_id+(if has("country_id") then "-"+.country_id else "" end)),country:.country_id} | del(.language_id, .country_id) ' $i.bak > $i;
done
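The jq expression builds the tag as language_id, plus "-country_id" when a country is present. The same rule as a pure-bash sketch, with a hypothetical helper `to_bcp47`, can be handy for checking a few values by hand:

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the jq expression above:
# to_bcp47 LANGUAGE [COUNTRY] -> "LANGUAGE-COUNTRY", or just "LANGUAGE".
to_bcp47() {
    local lang=$1 country=${2:-}
    if [[ -n $country ]]; then
        printf '%s-%s\n' "$lang" "$country"
    else
        printf '%s\n' "$lang"
    fi
}
```

One difference: jq's `has("country_id")` is true even when the value is empty, which would yield a trailing "-", while this sketch treats an empty country as absent.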

Updating ARKs

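These ARKs originally pointed to the labelthis project. They have been updated with the following command, which runs on the metadata.ttl files in the database: for each file it pulls out the ark: identifier with a SPARQL query and POSTs a new _target to EZID pointing at digital.ucdavis.edu.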

for i in $(find . -name metadata.ttl); do
 id=$(sparql -q --data=$i --results=CSV --query=- <<<"prefix : <http://schema.org/>  select ?n WHERE { ?s :identifier ?n filter regex(?n,'^ark:') .}" | sed -e 's/\r//g' | tail -1);
 http --session=ucd-library POST https://ezid.cdlib.org/id/$id Content-Type:text/plain <<<"_target:https://digital.ucdavis.edu/$id";
done
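Before re-pointing every ARK, it may help to dry-run the loop. If the query finds no ark: identifier, `tail -1` probably returns the CSV header line instead of an ARK, which would then be POSTed as an identifier. A sketch with a hypothetical helper `ezid_dry_run` that rejects non-ARKs and only prints the endpoint and body the `http` call would send:

```bash
#!/usr/bin/env bash
# Hypothetical dry run: print the EZID endpoint and the _target body that
# the loop above would POST for one identifier, without contacting EZID.
ezid_dry_run() {
    local id=$1
    if [[ $id != ark:/* ]]; then
        echo "not an ARK: '$id'" >&2        # e.g. the query found nothing
        return 1
    fi
    printf 'POST https://ezid.cdlib.org/id/%s\n' "$id"
    printf '_target:https://digital.ucdavis.edu/%s\n' "$id"
}
```

Swapping the `http ...` line in the loop for `ezid_dry_run "$id"` previews every update first.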

About

Amerine (Maynard) Wine Label Collection (ark:/87287/d7794w)
