Commit
rewrite readme
fedorov authored Nov 27, 2023
1 parent b622b3d commit 660141c
Showing 1 changed file with 40 additions and 54 deletions.
94 changes: 40 additions & 54 deletions README.md
@@ -1,69 +1,55 @@
# IDC Index
## About

The IDC Index is a Python library designed to query basic metadata and download data hosted on the NCI Imaging Data Commons (IDC).
`idc-index` is a Python package for querying basic metadata and downloading DICOM files hosted by the [NCI Imaging Data Commons (IDC)](https://imaging.datacommons.cancer.gov).

## Installation
## 👷‍♂️🚧 **WARNING**: this package is in its early development stages. Its functionality and API will change. Stay tuned for the updates and documentation, and please share your feedback about it by opening issues in this repository, or by starting a discussion in [IDC User forum](https://discourse.canceridc.dev/).🚧

Install the IDC Index using pip:
```
pip install idc-index
## Usage

There are no prerequisites - just install the package ...
```bash
$ pip install idc-index==0.2.7
```
## Description
... and run queries against the "mini" index of Imaging Data Commons data!
```python
from idc_index import index

The IDC Index offers a suite of functionalities, enabling users to retrieve diverse information regarding collections, patients, studies, series, and images. The library uses an index of data generated by the SQL query available in the [release notes](https://github.com/ImagingDataCommons/idc-index/releases).
client = index.IDCClient()

## Usage
query = """
SELECT
collection_id,
STRING_AGG(DISTINCT(Modality)) as modalities,
STRING_AGG(DISTINCT(BodyPartExamined)) as body_parts
FROM
index
GROUP BY
collection_id
ORDER BY
collection_id ASC
"""

The library provides the following key functionalities along with their available arguments:
client.sql_query(query)
```
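
To show what the query above computes without downloading anything, here is a minimal sketch that runs an equivalent aggregation on a toy table using Python's built-in `sqlite3` module. The rows and collection names are made up for illustration, and `GROUP_CONCAT` stands in for `STRING_AGG`; this is not how `idc-index` executes queries internally, and the real index contains many more columns.

```python
import sqlite3

# Toy stand-in for the index table; these rows are invented for illustration.
rows = [
    ("collection_a", "CT", "CHEST"),
    ("collection_a", "CT", "LUNG"),
    ("collection_a", "SEG", "CHEST"),
    ("collection_b", "MR", "BRAIN"),
]

conn = sqlite3.connect(":memory:")
# "index" is a reserved word in SQLite, so the table name is quoted.
conn.execute(
    'CREATE TABLE "index" (collection_id TEXT, Modality TEXT, BodyPartExamined TEXT)'
)
conn.executemany('INSERT INTO "index" VALUES (?, ?, ?)', rows)

# Same shape as the README query: one row per collection, with the
# distinct modalities and body parts concatenated into strings.
query = """
SELECT
    collection_id,
    GROUP_CONCAT(DISTINCT Modality) AS modalities,
    GROUP_CONCAT(DISTINCT BodyPartExamined) AS body_parts
FROM "index"
GROUP BY collection_id
ORDER BY collection_id ASC
"""

# Concatenation order is not guaranteed, so split the strings into sets.
summary = {
    cid: (set(mods.split(",")), set(parts.split(",")))
    for cid, mods, parts in conn.execute(query)
}

for cid, (mods, parts) in summary.items():
    print(cid, sorted(mods), sorted(parts))
```

Each output row summarizes one collection, which is a quick way to see which modalities and body parts a collection covers before selecting data to download.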

- Initialization: Instantiates the IDC Client Class by reading the CSV index and downloading the s5cmd tool.
- IDC Version:
- get_idc_version() : Get the release version of IDC data
- Data Retrieval:
- get_collections(): Retrieve a list of unique collection IDs.
- get_series_size(seriesInstanceUID): Obtain the size of a series in MB by providing the SeriesInstanceUID.
- get_patients(collection_id=None, outputFormat="list" or ("dict" or "df")): Retrieve information about patients within a collection.
- get_dicom_studies(patientId=None, outputFormat="list" or ("dict" or "df")): Retrieve studies for a patient_id.
- get_dicom_series(studyInstanceUID=None, outputFormat="list" or ("dict" or "df")): Retrieve series within a study.
- download_dicom_series(seriesInstanceUID, downloadDir, dry_run=False, quiet=True ): Download images associated with a SeriesInstanceUID to a specified directory.
- download_from_selection(downloadDir=None, dry_run=True, collection_id=None, patientId=None, studyInstanceUID=None): Download images associated with specific filter(s) to a specified directory.
Details of the attributes included in the index are in the release notes.

## Example
## Tutorial

Here's an example demonstrating how to use the IDC Client:
This package was first presented at the 2023 Annual Meeting of the Radiological Society of North America (RSNA), in the Deep Learning Lab [IDC session](https://github.com/RSNA/AI-Deep-Learning-Lab-2023/tree/main/sessions/idc).

Please check out [this tutorial notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/labs/idc_rsna2023.ipynb) for an introduction to using `idc-index` to navigate IDC data.

### Initialize the IDC Client
```
from idc_index import index
```
```
idc_client = index.IDCClient()
```
### Check IDC Version
```
idc_client.get_idc_version()
```
## Resources

### Query data
```
idc_client.get_collections()
```
```
idc_client.get_patients(collection_id='nsclc_radiomics',outputFormat="list")
```
```
idc_client.get_dicom_studies(patientId='D1-0975', outputFormat="dict")
```
```
idc_client.get_dicom_series(studyInstanceUID='1.3.6.1.4.1.32722.99.99.191411096482148278088383576909215626011', outputFormat="df")
```
### Download data
```
idc_client.download_dicom_series(seriesInstanceUID='1.3.6.1.4.1.32722.99.99.459644025247509819689655120845267405', downloadDir='/content/test')
```
* [Imaging Data Commons Portal](https://imaging.datacommons.cancer.gov/) can be used to explore the content of IDC from the web browser
* [s5cmd](https://github.com/peak/s5cmd) is a highly efficient, open source, multi-platform S3 client that we use for downloading IDC data, which is hosted in public AWS and GCS buckets
* [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser) is a 3D Slicer extension that relies on `idc-index` to search and download IDC data

## Resources
## Acknowledgment

This software is maintained by the IDC team, which has been funded in whole or in part with Federal funds from the NCI, NIH, under task order no. HHSN26110071 under contract no. HHSN261201500003l.

If this package helped your research, we would appreciate it if you could cite the IDC paper below.

* [https://imaging.datacommons.cancer.gov/](https://imaging.datacommons.cancer.gov/)
* [https://github.com/peak/s5cmd](https://github.com/peak/s5cmd)
> Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. _National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence_. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
