NIH RePORTER Query Script

This repository contains an input file, a Python script, and a bash script. When executed, they read the project IDs found in the input file, query the NIH RePORTER API using these IDs, and output four files:

  • heal_awards.csv - a table of the HEAL awards and associated information from the NIH RePORTER API
  • heal_awards_pubs.csv - a table of publications associated with HEAL awards
  • projects_not_in_reporter.txt - a list of project numbers that do not return information from the NIH RePORTER API
  • projects_with_missing_nums.txt - a list of titles of projects that have no project number in the input file and therefore cannot be queried with the NIH RePORTER API
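
As a rough illustration of the query step, the sketch below builds a request body for a project search and shows one way to send it. It is not taken from this repository's script: the endpoint URL, the criteria field names, and the helper names build_payload and search_projects are assumptions based on the public NIH RePORTER v2 API and should be checked against the API documentation.

```python
import json
import urllib.request

# Assumed endpoint of the public NIH RePORTER v2 API; verify against the API docs.
PROJECTS_URL = "https://api.reporter.nih.gov/v2/projects/search"


def build_payload(ids, id_type="appl_ids", limit=500, offset=0):
    """Build the JSON body for a project search by appl_ids or project_nums."""
    if id_type not in ("appl_ids", "project_nums"):
        raise ValueError(f"unsupported id_type: {id_type}")
    return {"criteria": {id_type: list(ids)}, "limit": limit, "offset": offset}


def search_projects(payload, url=PROJECTS_URL):
    """POST the payload and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Only the payload is built here; call search_projects(payload) to hit the API.
payload = build_payload(["1U2CDA050098-01"], id_type="project_nums")
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request before anything is sent.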

Updates

15 APR 2022

The code can now optionally pull all project_nums that are associated with appl_ids. Note: appl_ids uniquely identify records, whereas project_nums do not. For a center grant such as the MAARC grant, the project_num (1U2CDA050098-01) is identical for the Survey, Data, Methods, and Administrative Core grants.
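
The ambiguity can be pictured with a small sketch. The records below are hypothetical: the appl_id values and titles are invented for illustration, and only the project_num comes from the note above.

```python
from collections import defaultdict

# Hypothetical records: appl_id values and titles are invented for illustration;
# only the project_num is taken from the note above.
records = [
    {"appl_id": 1001, "project_num": "1U2CDA050098-01", "title": "Survey Core"},
    {"appl_id": 1002, "project_num": "1U2CDA050098-01", "title": "Data Core"},
    {"appl_id": 1003, "project_num": "1U2CDA050098-01", "title": "Administrative Core"},
]


def group_by_project_num(rows):
    """Map each project_num to the list of appl_ids that share it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["project_num"]].append(row["appl_id"])
    return dict(grouped)


grouped = group_by_project_num(records)
# A project_num is ambiguous when more than one appl_id maps to it.
ambiguous = {num: ids for num, ids in grouped.items() if len(ids) > 1}
```

Here one project_num maps to three distinct appl_ids, so a lookup by project_num alone cannot single out one record.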

03 FEB 2022

This repository has been updated to allow for input IDs to be either appl_ids or project_nums. It is highly recommended to use appl_ids where possible due to the ambiguity with naming conventions of project_nums.

17 DEC 2021

This repository now exposes the following as parameters: the id_type, the input file (and the corresponding project_id and project_title column names), the output path, and the output prefix for the files. There is also the option to remove non-UTF-8 characters with the --replace-non-utf flag.

To run the script, edit the parameters in the query_nih_reporter.sh bash script and then execute that script.
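
How the --replace-non-utf flag is implemented is not described here. As a minimal sketch of one possible approach, the hypothetical helper below decodes raw bytes as UTF-8 and substitutes a chosen string for anything that cannot be decoded.

```python
def replace_non_utf8(raw: bytes, replacement: str = "") -> str:
    """Decode bytes as UTF-8, substituting `replacement` for undecodable bytes."""
    # errors="replace" turns each invalid byte sequence into U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Swap the U+FFFD markers for the caller's replacement (default: drop them).
    return text.replace("\ufffd", replacement)


# \xe9 is Latin-1 for an accented e and is not valid UTF-8 on its own.
cleaned = replace_non_utf8(b"Opioid use disorder \xe9tude")
```

One caveat of this approach: a genuine U+FFFD character already present in the input would be replaced as well.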

Inputs

This script can take a list of appl_ids or project_nums. The list of appl_ids was generated by NIH.

A list of HEAL projects (along with their project_nums) can be downloaded from the funding awarded website. Information about the API used in this project can be found at the NIH RePORTER API website.

Considerations

  • When possible, it is better to use appl_ids instead of project_nums.
  • The project_num input data we work with here was last updated in January 2022.
  • Some of the project numbers from the input data CSV are blank, meaning we can't get information from the NIH RePORTER API.
  • Some project numbers are valid, but do not appear to be in NIH RePORTER.
  • NIH RePORTER does not appear to offer a way to look up HEAL studies directly, so we rely on the aforementioned input list.
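
The blank-project-number case can be sketched as a simple split of the input rows. This is an illustration under assumptions, not the repository's code: the helper name split_rows is hypothetical, the sample rows are invented, and the default column names follow the project_id and project_title parameters mentioned in the updates.

```python
import csv
import io


def split_rows(csv_text, id_col="project_id", title_col="project_title"):
    """Return (queryable IDs, titles of rows whose ID column is blank)."""
    ids, missing_titles = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        project_id = (row.get(id_col) or "").strip()
        if project_id:
            ids.append(project_id)
        else:
            # No ID to query with, so keep the title for the missing-numbers list.
            missing_titles.append((row.get(title_col) or "").strip())
    return ids, missing_titles


# Invented sample input; the second row has a blank project_id.
sample = (
    "project_id,project_title\n"
    "1U2CDA050098-01,Example center grant\n"
    ",Example study without a number\n"
)
ids, missing = split_rows(sample)
```

Rows that land in the second list correspond to the kind of entries recorded in projects_with_missing_nums.txt.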

Quick Start

Requirements

These instructions assume a Debian-based Linux distribution with bash installed.

  • Python 3.6+
  • pip
  • git
  • venv
  • bash

Setup

  • Clone the repository: git clone https://github.com/jcheadle-rti/heal_segmentation.git
  • Create and activate the virtual environment: python3 -m venv venv; source venv/bin/activate
  • Update pip and install required packages
    • pip install --upgrade pip
    • pip install -r requirements.txt

Run Bash Script

  • Review the bash script (query_nih_reporter.sh) to confirm the parameters are accurate
  • At the command prompt, run bash query_nih_reporter.sh

Outputs

  • heal_awards.csv - a table of the HEAL awards and associated information from the NIH RePORTER API
  • heal_awards_pubs.csv - a table of publications associated with HEAL awards
  • projects_not_in_reporter.txt - a list of project numbers that do not return information from the NIH RePORTER API
  • projects_with_missing_nums.txt - a list of titles of projects that have no project number in the input file and therefore cannot be queried with the NIH RePORTER API
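
One way to picture how a list like projects_not_in_reporter.txt could be derived is to compare the IDs that were requested with the IDs that came back. The sketch below is an assumption about the approach, not the script's actual logic: find_missing is a hypothetical helper, the records are invented, and the appl_id field name should be verified against the API response.

```python
def find_missing(requested_ids, returned_records, id_field="appl_id"):
    """Return the requested IDs that do not appear in the returned records."""
    # Compare as strings so 1001 and "1001" are treated as the same ID.
    returned = {str(record[id_field]) for record in returned_records}
    return [item for item in requested_ids if str(item) not in returned]


# Invented IDs and records for illustration only.
requested = [1001, 1002, 1003]
returned_records = [{"appl_id": 1001}, {"appl_id": 1003}]
not_found = find_missing(requested, returned_records)
```

The result preserves the order of the requested IDs, which keeps the output list easy to compare against the input file.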
