
Indexing EAD in ArcLight

Mx A. Matienzo edited this page May 9, 2017 · 26 revisions

Now that you have your ArcLight application up and running, we need to index data into it.

Download sample EAD

First we need to download or otherwise obtain our EAD files. Let's create a directory within our application where we can store them.

$ mkdir eads

Now let's add some data there.

# This command will save one of our test datasets to the directory you just created
$ wget -P eads/ https://raw.githubusercontent.com/sul-dlss/arclight/master/spec/fixtures/ead/nlm/alphaomegaalpha.xml

Repository configuration

Next we need to run the indexing task and tell it which repository the EAD file belongs to. By default, your ArcLight application should have a generated file, config/repositories.yml, which describes the repositories that should be displayed. We want to link our example EAD, alphaomegaalpha.xml, to the first repository in that file, nlm:

nlm:
  name: 'National Library of Medicine. History of Medicine Division'
  description: 'NLM’s History of Medicine Division collects, preserves, makes available, and interprets for diverse audiences one of the world’s richest collections of historical material related to human health and disease.'
  building: 'Building 38, Room 1E-21'
  address1: '8600 Rockville Pike'
  address2: ''
  city: 'Bethesda'
  state: 'MD'
  zip: '20894'
  country: 'USA'
  phone: ''
  contact_info: '[email protected]'
  thumbnail_url: "https://collections.nlm.nih.gov/pageturnerserver/ajaxp?theurl=http://localhost:8080/fedora/get/nlm:nlmuid-101421040-img/THUMB"

Indexing a single file

We can now use the index task in ArcLight to index our EAD.

$ FILE=./eads/alphaomegaalpha.xml REPOSITORY_ID=nlm bundle exec rake arclight:index
Loading ./eads/alphaomegaalpha.xml into index...
Indexed ./eads/alphaomegaalpha.xml (in 0.837 secs).
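
Both FILE and REPOSITORY_ID are typed by hand, so a typo in either is easy to make. The following is a minimal sketch of a guard around the command above. The function name index_ead is hypothetical and not part of ArcLight, and the check assumes repository keys start in column one of config/repositories.yml and contain no regular-expression metacharacters.

```shell
# Hypothetical wrapper (not part of ArcLight): refuse to index when the EAD
# file is unreadable or the repository ID is not a top-level key in the
# repositories file (third argument, default: config/repositories.yml).
index_ead() {
  local file=$1 repo=$2 config=${3:-config/repositories.yml}
  [ -r "$file" ] || { echo "EAD file not readable: $file" >&2; return 1; }
  [ -r "$config" ] || { echo "Repositories file not readable: $config" >&2; return 1; }
  # Assumption: a repository entry begins with "<id>:" in column one.
  grep -q "^${repo}:" "$config" || { echo "Unknown repository ID: $repo" >&2; return 1; }
  FILE="$file" REPOSITORY_ID="$repo" bundle exec rake arclight:index
}

# Usage: index_ead ./eads/alphaomegaalpha.xml nlm
```

The wrapper only adds the two checks; the indexing itself is still done by the same rake task shown above.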

Adding more finding aids and repositories

You can add new repositories to the repositories.yml file. The key that begins a repository is the same value you will use as the REPOSITORY_ID in the indexing rake task.
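
To see which values are currently valid as REPOSITORY_ID, you can list the top-level keys of the file. This is a rough sketch rather than a YAML parser: it assumes every repository key starts in column one and uses only letters, digits, underscores, or hyphens. The helper name list_repository_ids is hypothetical.

```shell
# Hypothetical helper (not part of ArcLight): print the top-level keys of a
# repositories file, one per line. Indented keys (name, city, ...) are skipped
# because the pattern is anchored to the start of the line.
list_repository_ids() {
  grep -oE '^[A-Za-z0-9_-]+:' "$1" | sed 's/:$//'
}

# Usage: list_repository_ids config/repositories.yml
```

Run against the generated config/repositories.yml, the output should include nlm along with any other repositories defined there.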

For convenience, you can separate EADs by directory and run the arclight:index_dir rake task, using the DIR and REPOSITORY_ID environment variables, to index all files in a directory into the same repository:

# this assumes there's a directory with EAD files called /tmp/sul-spec, and a repository configured with the ID "SPEC"
$ DIR=/tmp/sul-spec REPOSITORY_ID=SPEC bundle exec rake arclight:index_dir
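
If you keep one directory per repository, the same task can be run once per directory in a loop. The sketch below assumes each argument has the form directory=repository_id and that directory paths contain no "=" character. The function name index_dirs and the DRY_RUN variable are hypothetical conveniences, not ArcLight features.

```shell
# Hypothetical wrapper (not part of ArcLight): run arclight:index_dir once per
# "directory=repository_id" argument. Set DRY_RUN to any non-empty value to
# print the commands instead of running them.
index_dirs() {
  local pair dir repo
  for pair in "$@"; do
    dir=${pair%%=*}   # text before the first "="
    repo=${pair#*=}   # text after the first "="
    if [ -n "${DRY_RUN:-}" ]; then
      echo "DIR=$dir REPOSITORY_ID=$repo bundle exec rake arclight:index_dir"
    else
      # Stop at the first directory that fails to index.
      DIR="$dir" REPOSITORY_ID="$repo" bundle exec rake arclight:index_dir || return 1
    fi
  done
}

# Usage: index_dirs /tmp/sul-spec=SPEC ./eads=nlm
```

Running it with DRY_RUN=1 first is a cheap way to confirm that each directory is paired with the repository you intended.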