Installation

Software dependencies

We recommend pip and virtualenv for environment and dependency management in this and other Python projects. If you don't have them installed, run sudo easy_install pip and then sudo pip install virtualenv.

Bootstrapping a development environment

Copy openemory/localsettings.py.dist to openemory/localsettings.py and configure any local settings: DATABASES, SECRET_KEY, SOLR_, FEDORA_, customize LOGGING, etc.
Create a new virtualenv and activate it.
Install fabric: pip install fabric
Use fabric to run a local build, which will install python dependencies in your virtualenv, run unit tests, and build sphinx documentation: fab build

Deployment to QA and Production should be done using fab deploy.

After configuring your database, run syncdb:

python manage.py syncdb

Use eulindexer to index repository content into your configured Solr instance.

Configure the environment

When first installing this project, you'll need to create a virtual environment for it. The environment is just a directory. You can store it anywhere you like; in this documentation it'll live right next to the source. For instance, if the source is in /home/httpd/openemory/src, consider creating an environment in /home/httpd/openemory/env. To create such an environment, su into apache's user and:

$ virtualenv --no-site-packages /home/httpd/openemory/env

This creates a new virtual environment in that directory. Source the activation file to invoke the virtual environment (requires that you use the bash shell):

$ . /home/httpd/openemory/env/bin/activate

Once the environment has been activated inside a shell, Python programs spawned from that shell will read their environment only from this directory, not from the system-wide site packages. Packages you install will correspondingly go into this environment.

Note

Installation instructions and upgrade notes below assume that you are already in an activated shell.

Install System Dependencies

Beginning with Release 0.5 - Faculty Profiles, OpenEmory uses the Python Imaging Library (Pillow) to support faculty profile photo uploads. Pillow can be installed via pip, but support for JPEG and PNG formats depends on certain system libraries. For JPEG, libjpeg is required; for PNG, libz is required. On recent versions of Ubuntu, the libjpeg8-dev and zlib1g-dev packages should be installed (libjpeg62-dev probably works with the path adjustment noted below).

Note

By default on Ubuntu, libz.so is not installed directly in /usr/lib, but in an architecture-specific directory such as /usr/lib/i386-linux-gnu/ or /usr/lib/x86_64-linux-gnu/. As a workaround, add a symlink either in /usr/lib or in the virtualenv lib directory, e.g.:

$ sudo ln -s /usr/lib/i386-linux-gnu/libz.so /usr/lib
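
Before creating the symlink, it can help to confirm where libz.so actually lives on the machine. The following is a minimal sketch, not part of OpenEmory: the helper name find_libz and the "*-linux-gnu*" directory pattern are assumptions based on the two example directories named above.

```python
import glob
import os


def find_libz(lib_root="/usr/lib"):
    """Return existing libz.so paths in lib_root or its architecture-specific subdirectories."""
    # Check lib_root itself first, then directories such as
    # i386-linux-gnu or x86_64-linux-gnu one level below it.
    candidates = [os.path.join(lib_root, "libz.so")]
    candidates += sorted(glob.glob(os.path.join(lib_root, "*-linux-gnu*", "libz.so")))
    return [path for path in candidates if os.path.exists(path)]


if __name__ == "__main__":
    for path in find_libz():
        print(path)
```

If the only path printed is inside an architecture-specific directory, that path is the one to use as the source of the symlink.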

To test that the required libraries are installed correctly, pip install PIL (or pip install --upgrade PIL if already installed). At the end of the installation, PIL setup provides a summary of the configuration. Check to see that JPEG and PNG are listed as available:

--- JPEG support available
--- ZLIB (PNG/ZIP) support available

python-ldap requires the following packages: python-dev libldap2-dev libsasl2-dev libssl-dev

Install python dependencies

OpenEmory depends on several python libraries. The installation is mostly automated, and will print status messages as packages are installed. If there are any errors, pip should announce them very loudly.

To install python dependencies, cd into the repository checkout and:

$ pip install -r pip-install-req.txt

Note: installation of some dependencies (e.g. django-tracking) requires that Django settings are available. To install these manually, run:

$ env DJANGO_SETTINGS_MODULE=openemory.settings pip install -r pip-install-after-config.txt

If you are a developer or are installing to a continuous integration server where you plan to run unit tests, code coverage reports, or build sphinx documentation, you probably will also want to:

$ pip install -r pip-dev-req.txt

After this step, your virtual environment should contain all of the needed dependencies.

Solr/EULindexer

OpenEmory uses Solr and :mod:`eulindexer` for searching and indexing Fedora content. The Solr schema included with the source code at solr/schema.xml should be used as the Solr schema configuration. For convenience, this directory also contains a sample solrconfig.xml and minimal versions of all other solr configuration files used by the index.

The url for accessing the configured Solr instance should be set in localsettings.py as SOLR_SERVER_URL.

Repository content accessible via OpenEmory should be indexed using EULindexer. To add OpenEmory to an installed and configured instance of EULindexer, add the deployed indexdata url to the eulindexer localsettings.py, e.g.:

INDEXER_SITE_URLS = {
    'openemory': 'http://openemory.library.emory.edu/indexdata/',
}

To populate the index initially, or to reindex all content, run the reindex script that is available in EULindexer:

$ python manage.py reindex -s openemory

Install the application

Apache

After installing dependencies, copy and edit the wsgi and apache configuration files in src/apache inside the source code checkout. Both may require some tweaking for paths and other system details.

Configuration

Configure application settings by copying localsettings.py.dist to localsettings.py and editing for local settings (database, Fedora repository, Pid Manager, etc.).
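
As a rough illustration of what an edited localsettings.py might contain, here is a minimal fragment. Every value is a placeholder, and the sqlite3 backend, database path, and Solr URL are assumptions chosen only to keep the sketch short. The setting names are the ones mentioned in these notes; localsettings.py.dist remains the authoritative list, including the PID manager settings that are not shown here.

```python
# openemory/localsettings.py (illustrative fragment; all values are placeholders)

DATABASES = {
    "default": {
        # Assumption: sqlite3 is used here only to keep the example short;
        # configure the backend your deployment really uses.
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "/home/httpd/openemory/openemory.db",  # hypothetical path
    }
}

# Must be unique and secret for each deployment.
SECRET_KEY = "replace-with-a-long-random-string"

# URL of the configured Solr instance (hypothetical host and core name).
SOLR_SERVER_URL = "http://localhost:8983/solr/openemory/"

# Fedora management account (placeholder credentials).
FEDORA_MANAGEMENT_USER = "fedoraAdmin"
FEDORA_MANAGEMENT_PASSWORD = "change-me"
```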

After configuring all settings, initialize the db with all needed tables and initial data using:

$ python manage.py syncdb
$ python manage.py migrate

Load Fedora fixtures and control objects to the configured repository using:

$ python manage.py syncrepo

This application makes use of the :mod:`django.contrib.sites` module to generate ARKs. After running syncdb and starting the web app, use the Django DB Admin site to configure the default site by replacing the example.com domain with the domain for the deployed web application.
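
If the Django DB Admin site is not convenient, the same change could be made from the Django shell. This is a sketch under assumptions, not a documented OpenEmory procedure: the helper set_default_site is hypothetical, and it assumes the default site has id 1 and that the deployed domain is the one used in the indexer example above.

```python
def set_default_site(site_model, domain, name=None, site_id=1):
    """Point the django.contrib.sites record with id `site_id` at `domain`."""
    site = site_model.objects.get(pk=site_id)
    site.domain = domain
    # Use the domain as the display name unless one is given.
    site.name = name or domain
    site.save()
    return site


# Usage from `python manage.py shell`:
#   from django.contrib.sites.models import Site
#   set_default_site(Site, "openemory.library.emory.edu")
```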

Cron jobs

Session cleanup

The application uses database-backed sessions. Django recommends periodically clearing the session table in this configuration. To do this, set up a cron job to run the following command periodically from within the application's virtual environment:

$ python manage.py cleanup

This script removes any expired sessions from the database. We recommend doing this about every week, though exact timing depends on usage patterns and administrative discretion.

Index faculty

The application relies on current directory information about faculty. This information is provided by Emory Shared Data, but we also index it in solr for improved searching capabilities. Set up a nightly cron job to re-scan the ESD data and update the index:

$ python manage.py index_faculty

Statistics email

The application collects usage statistics and sends quarterly reports to article authors. Set up a cron job to create and send these reports by running the following command from within the application's virtual environment. The script should run at the beginning of January, April, July, and October:

$ python manage.py quarterly_stats_by_author
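
Because the job runs at the start of a quarter, each report presumably covers the quarter that has just ended. The date arithmetic can be sketched as follows. This is an illustration only, not the logic used by quarterly_stats_by_author, and the function name previous_quarter is hypothetical.

```python
import datetime


def previous_quarter(today):
    """Return (first_day, last_day) of the calendar quarter before `today`."""
    # First month of the quarter that contains `today`: 1, 4, 7 or 10.
    current_first_month = 3 * ((today.month - 1) // 3) + 1
    current_start = datetime.date(today.year, current_first_month, 1)
    # The previous quarter ends the day before the current one starts.
    last_day = current_start - datetime.timedelta(days=1)
    previous_first_month = 3 * ((last_day.month - 1) // 3) + 1
    first_day = datetime.date(last_day.year, previous_first_month, 1)
    return first_day, last_day


if __name__ == "__main__":
    # A run on 1 January 2014 covers October through December 2013.
    print(previous_quarter(datetime.date(2014, 1, 1)))
```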

Harvest PMC Data

The application harvests article metadata from PubMed Central nightly and stores it in the OpenEmory SQL database to be ingested later. Run the following command to keep the harvest queue up to date. In this mode, article metadata is harvested from the last harvest date to the present:

$ python manage.py fetch_pmc_metadata --auto-date

Additionally, a second job should run once a month to do a full harvest, catching any records that may have been missed for any reason:

$ python manage.py fetch_pmc_metadata
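
The schedules described in this section so far could be collected into crontab entries along the following lines. This is a sketch under several assumptions: the times of day and the log directory are invented for illustration, manage.py is assumed to sit at the top of the example source directory used earlier, and the virtualenv's python binary is called directly instead of sourcing the activate script.

```python
# Hypothetical locations, following the example layout used earlier in these notes.
ENV_DIR = "/home/httpd/openemory/env"
SRC_DIR = "/home/httpd/openemory/src"
LOG_DIR = "/var/log/openemory"  # assumption: any writable log directory will do

# (cron schedule, manage.py arguments, log file name)
JOBS = [
    ("0 3 * * 0", "cleanup", "cleanup.log"),                       # weekly
    ("0 1 * * *", "index_faculty", "index_faculty.log"),           # nightly
    ("0 4 1 1,4,7,10 *", "quarterly_stats_by_author", "quarterly_stats.log"),
    ("0 2 * * *", "fetch_pmc_metadata --auto-date", "pmc_harvest.log"),  # nightly
    ("0 5 1 * *", "fetch_pmc_metadata", "pmc_full_harvest.log"),   # monthly
]


def crontab_line(schedule, args, log_name):
    """Build one crontab entry that runs a manage.py command with the virtualenv's python."""
    return "{0} cd {1} && {2}/bin/python manage.py {3} >> {4}/{5} 2>&1".format(
        schedule, SRC_DIR, ENV_DIR, args, LOG_DIR, log_name)


def render_crontab(jobs=JOBS):
    """Return all entries as text suitable for pasting into `crontab -e`."""
    return "\n".join(crontab_line(*job) for job in jobs)


if __name__ == "__main__":
    print(render_crontab())
```

Redirecting each job to its own log file keeps output from one command from being mixed with another's.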

Email Reports of Duplicates

Set up iWatch to trigger notifications on the folder where reports are created.

Upgrade Notes

Release 2.2.5 - OpenEmory Relaunch Interface Changes

Use the Django Admin to edit the following flatpages in the database so that the site navigation can be updated:
- Change the URL "/about/authors-rights/" to "/about/author-rights/" and the title "Authors' Rights" to "Author Rights".
- Change the URL "/data-archiving/" to "/publishing-your-data/" and the title "Data Archiving" to "Publishing Your Data".
- Add "/about/depositadvice/", update "/how-to/submit/", and update the title of "/about/staff/".
Check the "django_flatpage_sites" table in the database and make sure that every row's "site_id" is "1" or whichever "site_id" this app uses.
The Admin page may currently not be viewable due to a problem in eullocal; until it is fixed permanently, delete SiteProfileNotAvailable from /home/httpd/openemory/env/lib/python2.7/site-packages/eullocal/django/ldap/backends.py.
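
The "django_flatpage_sites" check can also be scripted. The sketch below uses Python's sqlite3 module against a small in-memory table so that it is self-contained; the deployed database may well use a different engine, in which case only the SQL query carries over. The helper name flatpages_on_wrong_site is hypothetical.

```python
import sqlite3


def flatpages_on_wrong_site(connection, expected_site_id=1):
    """Return (flatpage_id, site_id) rows whose site_id differs from the expected one."""
    cursor = connection.execute(
        "SELECT flatpage_id, site_id FROM django_flatpage_sites "
        "WHERE site_id <> ? ORDER BY flatpage_id",
        (expected_site_id,),
    )
    return cursor.fetchall()


# Demonstration on an in-memory table with the same columns Django creates.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE django_flatpage_sites "
    "(id INTEGER PRIMARY KEY, flatpage_id INTEGER, site_id INTEGER)"
)
demo.executemany(
    "INSERT INTO django_flatpage_sites (flatpage_id, site_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2)],
)
print(flatpages_on_wrong_site(demo))  # prints [(3, 2)]
```

An empty list means every flatpage is attached to the expected site.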

Release 2.1.2 - Merging Old Preconnector

fixing embargo duration
pdf file download bug
pubsid report
download pmc subset

Release 2.1.1 - Author Enhancements

fixing styles for publication page
adjusting mods to save non emory faculty authors

Release 2.1.0 - Content Type Harmonization

mime type debugging
fixing styles

Release 2.0.0 - New Content Type (Presentation)

adding new content

Release 1.9.0 - New Content Type (Poster)

adding new content

Release 1.8.0 - New Content Type (Report)

debugging conflicting policies in XACML

Release 1.7.0 - New Content Type (Conference)

Release 1.6.0 - New Content Type (Chapter)

run this script to clean up journal articles (updated)

$ python manage.py journal_title

Release 1.5.0 - New Content Type (Book)

run this script to match all content models for articles and books

$ python manage.py cmodel_cleanup

Release 1.4.0 - Author Enhancements

run this script to match all current journal titles with Sherpa Romeo

$ python manage.py journal_title

Release 1.3 - Pre Fedora Migration

run migrations for downtime

$ python ./manage.py migrate downtime
$ python ./manage.py migrate mx

Release 1.2.16 - Connector

run migrations for publication

$ python ./manage.py migrate publication

create a LastRun object from the Django shell (python manage.py shell):

from openemory.publication.models import LastRun
LastRun(name='Convert Symp to OE', start_time='2014-01-01 00:00:00').save()

Set up iWatch to trigger notifications on the folder where reports are created
Set up a cron job to run the import command
Configure REPORTS_DIR in localsettings.py

Release 1.2.10 - Symplectic Elements

run migrations for accounts to add add_articlerecord to Site Admin group permissions:
```
$ python manage.py migrate accounts
```

Add the following variables to localsettings.py:

# SYMPLECTIC-Elements
SYMPLECTIC_BASE_URL = <URL>
SYMPLECTIC_USER = <USER>
SYMPLECTIC_PASSWORD = <PASS>

Release 1.2.9 - Odds and Ends

Run migrations:
```
$ python ./manage.py migrate accounts
```

Release 1.2.7 - OAI modifications

Run add_dc_ident to modify dc data:

$ python ./manage.py add_dc_ident --username=<USERNAME>

Run add_to_oai to update OAI info:

$ python ./manage.py add_to_oai --username=<USERNAME>

Release 1.2.5 - Bug Fix

The system pip and virtualenv packages need to be updated before the fab file is run:
```
$ sudo pip install --upgrade pip
$ sudo pip install --upgrade virtualenv
```
Run add_dc_ident to restore dc identifiers:
```
$ python ./manage.py add_dc_ident
```

Release 1.2.4 - Captcha / Bug Fixes

Add the following to localsettings.py BEFORE fab is run. Values will be provided at deploy time:

# reCAPTCHA keys for your server or domain from https://www.google.com/recaptcha/
RECAPTCHA_PUBLIC_KEY = ''
RECAPTCHA_PRIVATE_KEY = ''
RECAPTCHA_OPTIONS = {}

Release 1.2.3 - OAI

Run syncrepo to load collection object:
```
$ python ./manage.py syncrepo
```
A manage command needs to be run to prepare the articles to be harvested by OAI:
```
$ python manage.py add_to_oai --username=<USERNAME> > oai.log
```

Release 1.2.2 - License and Rights Enhancements

Run migrations to add License model:
```
$ python ./manage.py migrate
```
Run the following command to load the initial license info:
```
$ python ./manage.py loaddata init_license
```
A manage command needs to be run to remove empty contentMetadata datastreams, copy license info into the MODS, and add OAI info. The script should be run with the fedoraAdmin user:
```
$ python manage.py cleanup_articles --username=<USERNAME> > cleanup.log
```

Release 1.2 - Search Engine Optimization and bug fixes

New configurations have been added to localsettings.py:
- GOOGLE_ANALYTICS_ENABLED - set True/False to enable/disable Google Analytics on the site (analytics should generally only be enabled in production)
- GOOGLE_SITE_VERIFICATION - set to the value provided by Google Webmaster Tools to allow site verification
See localsettings.py.dist for examples.

Release 1.0 - Design Integration, Rights and Technical Metadata

Now using :mod:`django.contrib.flatpages` for pages with static site content (about, how-tos, etc). Run syncdb and migrate to update the database:
```
$ python manage.py syncdb
$ python manage.py migrate
```

Note

For an existing installation with a database you want to preserve, you will have to fake the 0012_add_model_announcement migration if you receive the error message Table accounts_announcement already exists:

$ python manage.py migrate accounts 0012 --fake --delete-ghost-migrations

You can then run the migrate command above to finish the migrations.

A nightly cron job is needed to run the following command to check for embargoes that have expired and reindex them so that the full text can be searched:
```
$ python manage.py expire_embargo
```
The output of this script should be redirected to a log. The log should be rotated on a regular basis.
A nightly cron job is needed to sync indexed faculty data with ESD:
```
$ python manage.py index_faculty
```
A cron job is needed to run at the beginning of each quarter to send out stats for the previous quarter:
```
$ python manage.py quarterly_stats_by_author
```
The output of this script should be redirected to a log. The log should be rotated on a regular basis.

Release 0.7 - Polish & Prep

ESD faculty information is now indexed in Solr for search functionality. In order to accommodate indexing disparate types of data, the unique key for Solr has been changed. Solr should be configured with the new schema, and then all data must be cleared and reindexed.
Restart eulindexer after this and any other solr schema changes.
After updating Solr with the new schema, index Faculty data from Emory Shared Data into Solr:
```
$ python manage.py index_faculty
```
This release adds models and migrations. Sync and migrate the database:
```
$ python manage.py syncdb
$ python manage.py migrate
```

Release 0.6 - Faculty Demo

Now makes use of the PID manager and the :mod:`django.contrib.sites` module to generate ARKs for repository content. To configure:
- After running syncdb and starting the web app, use the Django DB Admin site to configure the default site by replacing the example.com domain with the domain for the deployed web application.
- Create a domain and user for OpenEmory ARKs on the PID manager (the user should have permissions to create pids and targets), and configure all of the PIDMAN_ settings in localsettings.py based on the examples in localsettings.py.dist

Release 0.5 - Faculty Profiles

Now includes :mod:`south` for database migrations. For a new installation, you should run syncdb to add the required database tables for south and any of the other tables not managed by South:
```
$ python manage.py syncdb
```
Note

By default, Django will prompt you to create a superuser when you run syncdb on a new database; since the user profile model is managed by :mod:`south`, you should not attempt to create any accounts until after you have completed the migrations. To skip this prompt, you may run syncdb with the --noinput option. After migrations are complete, use the createsuperuser manage.py command to create a new superuser.

Then run the south migrate command to update the database tables that are now managed by :mod:`south`:
```
$ python manage.py migrate
```
For an existing installation with a database you want to preserve, run the syncdb step above to add the required database tables for south, and then fake the initial migrations:
```
$ python manage.py migrate accounts 0001 --fake
$ python manage.py migrate harvest 0001 --fake
$ python manage.py migrate publication 0001 --fake
```
After this step, you should be able to use South migrations normally.
Python dependencies now include Python Imaging Library (PIL). See Install System Dependencies for instructions on the libraries required for JPEG and PNG support.
Profile editing provides an option for users to upload images; this user uploaded content will be stored in the configured MEDIA_ROOT directory. System administrators may wish to revisit the configuration for this Django setting (previously set in settings.py but now included in localsettings.py; see localsettings.py.dist for example configuration).

Release 0.4.x - Article Metadata

Run syncdb to add new article review permissions and update the Site Admin group permissions:
```
$ python manage.py syncdb
```

Added new logic for generating Article MODS from NLM records harvested from PubMed Central. Any existing test records should either be removed and reharvested, or updated as follows. Activate the virtualenv and start the Django console:

$ python manage.py shell

Then run the following to update Articles in the configured repository with NLM xml:

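from django.conf import settings
from eulfedora.server import Repository
from openemory.publication.models import Article

# Connect with the Fedora management account configured in localsettings.py
repo = Repository(username=settings.FEDORA_MANAGEMENT_USER,
                  password=settings.FEDORA_MANAGEMENT_PASSWORD)
for a in repo.get_objects_with_cmodel(Article.ARTICLE_CONTENT_MODEL, type=Article):
    # Only articles that have harvested NLM xml can be converted
    if a.contentMetadata.exists:
        try:
            if str(a.contentMetadata.content):
                a.descMetadata.content = a.contentMetadata.content.as_article_mods()
                a.save('populating MODs from NLM xml')
        except Exception as err:
            # Report the failure and continue with the remaining articles
            print('Skipping %s: %s' % (a.pid, err))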

This release includes new solr fields. Configure a new core and reindex project content into it.
This release includes support for editing inactive Fedora items. This support requires updated Fedora policies. Update Fedora policies while upgrading this package.
Updated Fedora policies provide read access to all OpenEmory content (not published content only) to logged-in users with the "indexer" role. It is recommended to create a Fedora user with an indexer role and configure :mod:`eulindexer` to use this account. For example:
```
<user name="eulindexer" password="...">
  <attribute name="fedoraRole">
    <value>indexer</value>
  </attribute>
</user>
```

Release 0.3.x - Searching & Social

This release includes new relational Python modules and database tables. To upgrade, install new python dependencies in your virtualenv:
```
$ pip install -r pip-install-req.txt
```
And then update the database with new tables via syncdb:
```
$ python manage.py syncdb
```
Note

As part of this release, the user profile model has been customized, which entails a database change. If you wish to create profiles for existing Emory LDAP users, run the inituser script with the usernames. You may also want to drop the former ldap profile table, emory_ldap_emoryldapuserprofile, as it is no longer in use. Any users created or updated after this upgrade will get the new profiles automatically.

Release 0.2.x - Harvesting

This release includes new relational database tables and fixtures. Upgrade requires a syncdb:
```
$ python manage.py syncdb
```
This release changes the project solr schema. Before installing the software, set up a new solr core for the project. The solr configuration files will be produced as part of the release. If the URL of this solr core is different from the old one then update it in localsettings.py. After the updated OpenEmory website is live, reindex the site. As eulindexer:
```
$ python manage.py reindex -s openemory
```
