Showing 1 changed file with 40 additions and 54 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,69 +1,55 @@ | ||
# IDC Index | ||
## About | ||
|
||
The IDC Index is a Python library designed to query basic metadata and download data hosted on the NCI Imaging Data Commons (IDC). | ||
`idc-index` is a Python package for querying basic metadata and downloading DICOM files hosted by the [NCI Imaging Data Commons (IDC)](https://imaging.datacommons.cancer.gov). | ||
|
||
## Installation | ||
## 👷♂️🚧 **WARNING**: This package is in its early development stages. Its functionality and API will change. Stay tuned for updates and documentation, and please share your feedback by opening issues in this repository or by starting a discussion in the [IDC User forum](https://discourse.canceridc.dev/). 🚧 | ||
|
||
Install the IDC Index using pip: | ||
``` | ||
pip install idc-index | ||
## Usage | ||
|
||
There are no prerequisites - just install the package ... | ||
```bash | ||
$ pip install idc-index==0.2.7 | ||
``` | ||
## Description | ||
... and run queries against the "mini" index of Imaging Data Commons data! | ||
```python | ||
from idc_index import index | ||
|
||
The IDC Index offers a suite of functionalities, enabling users to retrieve diverse information regarding collections, patients, studies, series, and images. The library uses an index of data generated by the SQL query available in the [release notes](https://github.com/ImagingDataCommons/idc-index/releases). | ||
client = index.IDCClient() | ||
|
||
## Usage | ||
query = """ | ||
SELECT | ||
collection_id, | ||
STRING_AGG(DISTINCT(Modality)) as modalities, | ||
STRING_AGG(DISTINCT(BodyPartExamined)) as body_parts | ||
FROM | ||
index | ||
GROUP BY | ||
collection_id | ||
ORDER BY | ||
collection_id ASC | ||
""" | ||
|
||
The library provides the following key functionalities along with their available arguments: | ||
client.sql_query(query) | ||
``` | ||
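The aggregation in the query above can be tried without installing anything. The following is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in; it does not call `idc-index`, and the SQL engine behind `sql_query` is not stated here. The table contents are invented, and `GROUP_CONCAT` replaces `STRING_AGG` because that is the SQLite spelling of the same idea.

```python
import sqlite3

# Invented rows standing in for the real index; only three columns are modeled.
rows = [
    ("collection_a", "CT", "CHEST"),
    ("collection_a", "CT", "CHEST"),
    ("collection_a", "SEG", "CHEST"),
    ("collection_b", "MR", "BRAIN"),
]

conn = sqlite3.connect(":memory:")
# "index" is a reserved word in SQLite, so the table name is quoted.
conn.execute(
    'CREATE TABLE "index" (collection_id TEXT, Modality TEXT, BodyPartExamined TEXT)'
)
conn.executemany('INSERT INTO "index" VALUES (?, ?, ?)', rows)

# Same shape as the README query: one row per collection, with the
# distinct modalities and body parts collapsed into comma-separated strings.
query = """
SELECT
    collection_id,
    GROUP_CONCAT(DISTINCT Modality) AS modalities,
    GROUP_CONCAT(DISTINCT BodyPartExamined) AS body_parts
FROM "index"
GROUP BY collection_id
ORDER BY collection_id ASC
"""
summary = conn.execute(query).fetchall()

for collection_id, modalities, body_parts in summary:
    print(collection_id, modalities, body_parts)
```

The order of values inside each concatenated string is not guaranteed, so compare them as sets rather than as exact strings.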
|
||
- Initialization: Instantiates the IDC Client Class by reading the CSV index and downloading the s5cmd tool. | ||
- IDC Version: | ||
- get_idc_version() : Get the release version of IDC data | ||
- Data Retrieval: | ||
- get_collections(): Retrieve a list of unique collection IDs. | ||
- get_series_size(seriesInstanceUID): Obtain the size of a series in MB by providing the SeriesInstanceUID. | ||
- get_patients(collection_id=None, outputFormat="list"): Retrieve information about patients within a collection. `outputFormat` may be "list", "dict", or "df". | ||
- get_dicom_studies(patientId=None, outputFormat="list"): Retrieve studies for a patient. `outputFormat` may be "list", "dict", or "df". | ||
- get_dicom_series(studyInstanceUID=None, outputFormat="list"): Retrieve series within a study. `outputFormat` may be "list", "dict", or "df". | ||
- download_dicom_series(seriesInstanceUID, downloadDir, dry_run=False, quiet=True ): Download images associated with a SeriesInstanceUID to a specified directory. | ||
- download_from_selection(downloadDir=None, dry_run=True, collection_id=None, patientId=None, studyInstanceUID=None): Download images associated with specific filter(s) to a specified directory. | ||
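The optional filters listed for `download_from_selection` can be pictured as narrowing the index before anything is downloaded. The sketch below is only an illustration under stated assumptions: `select_series` and `TOY_INDEX` are hypothetical names, the column names and identifiers are invented, and combining the filters with AND is an assumption rather than documented behavior of the package.

```python
from typing import Dict, List, Optional

# Hypothetical miniature index; every identifier below is invented.
TOY_INDEX: List[Dict[str, str]] = [
    {"collection_id": "collection_a", "PatientID": "P1",
     "StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1"},
    {"collection_id": "collection_a", "PatientID": "P1",
     "StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.2"},
    {"collection_id": "collection_a", "PatientID": "P2",
     "StudyInstanceUID": "1.2.4", "SeriesInstanceUID": "1.2.4.1"},
    {"collection_id": "collection_b", "PatientID": "P3",
     "StudyInstanceUID": "1.2.5", "SeriesInstanceUID": "1.2.5.1"},
]


def select_series(
    index_rows: List[Dict[str, str]],
    collection_id: Optional[str] = None,
    patientId: Optional[str] = None,
    studyInstanceUID: Optional[str] = None,
) -> List[str]:
    """Return the SeriesInstanceUIDs of rows matching every filter that is set.

    Hypothetical helper: filters left as None are ignored, and the
    remaining ones are combined with AND (an assumption).
    """
    filters = {
        "collection_id": collection_id,
        "PatientID": patientId,
        "StudyInstanceUID": studyInstanceUID,
    }
    active = {key: value for key, value in filters.items() if value is not None}
    return [
        row["SeriesInstanceUID"]
        for row in index_rows
        if all(row[key] == value for key, value in active.items())
    ]


# A "dry run" in this sketch simply lists what would be selected.
print(select_series(TOY_INDEX, collection_id="collection_a", patientId="P2"))
```

Listing the selection first, as a dry run would, makes it easy to check the scope of a download before any data is transferred.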
Details of the attributes included in the index are in the release notes. | ||
|
||
## Example | ||
## Tutorial | ||
|
||
Here's an example demonstrating how to use the IDC Client: | ||
This package was first presented at the 2023 Annual Meeting of the Radiological Society of North America (RSNA) Deep Learning Lab [IDC session](https://github.com/RSNA/AI-Deep-Learning-Lab-2023/tree/main/sessions/idc). | ||
|
||
Please check out [this tutorial notebook](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/labs/idc_rsna2023.ipynb) for an introduction to using `idc-index` to navigate IDC data. | ||
|
||
### Initialize the IDC Client | ||
``` | ||
from idc_index import index | ||
``` | ||
``` | ||
idc_client = index.IDCClient() | ||
``` | ||
### Check IDC Version | ||
``` | ||
idc_client.get_idc_version() | ||
``` | ||
## Resources | ||
|
||
### Query data | ||
``` | ||
idc_client.get_collections() | ||
``` | ||
``` | ||
idc_client.get_patients(collection_id='nsclc_radiomics',outputFormat="list") | ||
``` | ||
``` | ||
idc_client.get_dicom_studies(patientId='D1-0975', outputFormat="dict") | ||
``` | ||
``` | ||
idc_client.get_dicom_series(studyInstanceUID='1.3.6.1.4.1.32722.99.99.191411096482148278088383576909215626011', outputFormat="df") | ||
``` | ||
### Download data | ||
``` | ||
idc_client.download_dicom_series(seriesInstanceUID='1.3.6.1.4.1.32722.99.99.459644025247509819689655120845267405', downloadDir='/content/test') | ||
``` | ||
* [Imaging Data Commons Portal](https://imaging.datacommons.cancer.gov/) can be used to explore the content of IDC from the web browser | ||
* [s5cmd](https://github.com/peak/s5cmd) is a highly efficient, open source, multi-platform S3 client that we use for downloading IDC data, which is hosted in public AWS and GCS buckets | ||
* [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser) is a 3D Slicer extension that relies on `idc-index` for search and download of IDC data | ||
|
||
## Resources | ||
## Acknowledgment | ||
|
||
This software is maintained by the IDC team, which has been funded in whole or in part with Federal funds from the NCI, NIH, under task order no. HHSN26110071 under contract no. HHSN261201500003l. | ||
|
||
If this package helped your research, we would appreciate it if you could cite the IDC paper below. | ||
|
||
* [https://imaging.datacommons.cancer.gov/](https://imaging.datacommons.cancer.gov/) | ||
* [https://github.com/peak/s5cmd](https://github.com/peak/s5cmd) | ||
> Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. _National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence_. RadioGraphics (2023). https://doi.org/10.1148/rg.230180 |