librariam/dspace-exams-ingest-scripts

 
 

Exam metadata generation and ingest for DSpace

The Old Exams Repository is maintained by the University of Toronto Libraries. It contains the 3 most recent years of exams.

System Requirements


Installation

Clone or download the scripts to your local machine. Ensure the prerequisite software is installed before running the scripts.

You must run step1.py before running step2.py. The Usage and Workflow sections below give more details.


Usage

  1. python step1.py /directory_path_to_pdf_exams/ campus[A, B or C]

  2. python step2.py /directory_path_to_pdf_exams/


Workflow

1. Scanning & Filenaming

  • Exams are scanned to PDF and named according to the file naming convention.
  • Each PDF file name must contain the course code, month and year.
  • DSpace Dublin Core metadata is generated from each PDF's file name.

Example: for Campus C, use "au" for August and "ap" for April so that the two months can be told apart.

detailed exam file naming convention found here
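The idea of deriving metadata from a file name can be sketched as follows. This is a minimal illustration and not the code in step1.py: the pattern `<course code>_<month code>_<year>.pdf`, the function name `parse_exam_filename` and the `MONTH_CODES` table are assumptions made for the example. Only the "ap" and "au" month codes come from this README; the real convention is the one in the linked naming document.

```python
import re
from pathlib import Path

# Only "ap" and "au" are taken from the README; extend this table
# to match the real naming convention.
MONTH_CODES = {"ap": "April", "au": "August"}

# Hypothetical pattern: <course code>_<month code>_<4-digit year>.pdf
FILENAME_RE = re.compile(
    r"^(?P<course>[A-Za-z0-9]+)_(?P<month>[a-z]{2})_(?P<year>\d{4})\.pdf$"
)


def parse_exam_filename(path):
    """Return the course code, month name and year found in an exam PDF name."""
    name = Path(path).name
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised exam file name: {name}")
    month_code = match.group("month")
    if month_code not in MONTH_CODES:
        raise ValueError(f"unknown month code {month_code!r} in {name}")
    return {
        "course": match.group("course").upper(),
        "month": MONTH_CODES[month_code],
        "year": int(match.group("year")),
    }
```

Rejecting names that do not match, rather than guessing, keeps ambiguous months such as April and August from being silently mislabelled.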

2. Generate metadata

  • Once exams are received in PDF format from campus A, B or C, metadata is generated for each file
  • Dublin Core metadata is generated from the file names using Beautiful Soup
  • The script also uses a CSV file of departmental codes per campus for mapping

sample generated metadata file found here
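As a rough sketch of this step, the example below builds a `dublin_core.xml` document in the `dcvalue` layout used by the DSpace Simple Archive Format and looks up the department in a code table loaded from CSV. It uses the standard library's `xml.etree.ElementTree` instead of Beautiful Soup to stay dependency-free. The CSV column names (`code`, `department`), the three-letter course prefix, the chosen Dublin Core fields and both function names are assumptions for illustration, not the actual behaviour of step1.py.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_department_codes(csv_text):
    """Map a department code to its name from 'code,department' rows (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["code"].upper(): row["department"] for row in reader}


def build_dublin_core(course, month, year, departments):
    """Return a dublin_core.xml string for one exam."""
    root = ET.Element("dublin_core")

    def add(element, qualifier, value):
        node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        node.text = value

    add("title", "none", f"{course} {month} {year} exam")
    add("date", "issued", str(year))
    # Assumption: the first three characters of the course code identify the department.
    department = departments.get(course[:3].upper())
    if department is not None:
        add("subject", "none", department)
    return ET.tostring(root, encoding="unicode")
```

A call such as `build_dublin_core("CSC108", "April", 2019, departments)` yields one metadata document per exam; unknown department codes are skipped rather than guessed.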

3. DSpace Simple Archive

  • The step2.py script packages the PDFs and their metadata into DSpace simple archives for ingest
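For orientation, a DSpace simple archive is a directory of item folders, each holding a `dublin_core.xml` file, the bitstreams, and a `contents` file that lists the bitstream names one per line. The sketch below packages a single PDF in that layout. The function name `package_item` and the `item_NNN` folder naming are assumptions for this example; step2.py may organise its output differently.

```python
import shutil
from pathlib import Path


def package_item(pdf_path, dublin_core_xml, archive_root, index):
    """Create one Simple Archive Format item directory and return its path."""
    pdf_path = Path(pdf_path)
    # Hypothetical folder naming: item_001, item_002, ...
    item_dir = Path(archive_root) / f"item_{index:03d}"
    item_dir.mkdir(parents=True, exist_ok=True)

    # Metadata for the item.
    (item_dir / "dublin_core.xml").write_text(dublin_core_xml, encoding="utf-8")
    # The bitstream itself.
    shutil.copy2(pdf_path, item_dir / pdf_path.name)
    # The contents file names each bitstream in the item, one per line.
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    return item_dir
```

Copying the PDF instead of moving it leaves the scanned originals untouched if packaging has to be repeated.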

4. Batch Import Into TSpace

  • DSpace simple archives are imported into their respective collections via batch import
  • Collections more than 3 years old are removed
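The batch import is normally run with the DSpace command-line launcher. As a sketch, the helper below only assembles the argument list for `dspace import --add` and does not execute it. The helper name `build_import_command` and every value passed to it (launcher path, e-person address, collection handle, directories) are placeholders to replace with the values for your own installation.

```python
def build_import_command(dspace_bin, eperson, collection, source_dir, mapfile):
    """Return the argument list for a DSpace batch import of one simple archive."""
    return [
        dspace_bin,
        "import",
        "--add",
        f"--eperson={eperson}",        # account that will own the import
        f"--collection={collection}",  # target collection handle or ID
        f"--source={source_dir}",      # directory holding the item folders
        f"--mapfile={mapfile}",        # written by DSpace; records imported items
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` once the values have been verified. Keeping the map file makes it possible to undo or replace a batch later.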

License

DSpace Simple Archives Importer is licensed under the Apache License 2.0.
