Documentation of the underlying index tables #126

Open
fedorov opened this issue Sep 12, 2024 · 9 comments
Labels
documentation Improvements or additions to documentation

Comments

@fedorov
Member

fedorov commented Sep 12, 2024

We need to set up a process where we could have schemas and relationships among the growing number of those smaller tables automatically reflected in our documentation, and ideally have a visual browser where users could explore those relationships - automatically generated from the schema documents.

Related thread with ideas and relevant technologies: https://discord.com/channels/909674491309850675/921073327009853451/1283795006477565983

@fedorov fedorov added the documentation Improvements or additions to documentation label Sep 12, 2024
@fedorov
Member Author

fedorov commented Sep 16, 2024

Some NCI components use this for describing the model: https://github.com/CBIIT/c3dc-model

@vkt1414
Collaborator

vkt1414 commented Sep 29, 2024

@fedorov in the past, I discussed having a relationship diagram with Deepa.

https://github.com/drawdb-io/drawdb was the tool I found. It's pretty good in my opinion. If you find it good as well, I can help with this issue.

@fedorov
Member Author

fedorov commented Sep 30, 2024

The other tool mentioned in the thread above, Mermaid, seemed like a nice solution.

drawdb looks sleek, but I think the question is what comes next once you have modeled it there. I don't want to create yet another manual task for anyone.

On the other hand, we can automatically generate Mermaid code directly from the Parquet files (column name + data type). We could then embed that Mermaid code into the docs. We could also augment idc-index-data with a mechanism to either inject descriptions of the columns directly into the Parquet file metadata fields, or require a JSON schema to accompany each query. Or, if we want to play nice with CRDC, we could use Bento MDF. We could then generate the Mermaid diagram code as part of the release, which could be picked up downstream in the IDC documentation and/or the idc-index documentation.
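To make the "column name + data type" idea concrete, here is a minimal sketch of turning per-table column listings into Mermaid `erDiagram` entity blocks. It is only an illustration under assumptions: the function names, table names, and columns are hypothetical, and the (name, type) pairs are hard-coded here, whereas in practice they would presumably be read from the Parquet file schema.

```python
import re


def mermaid_entity(table_name, columns):
    """Render one table as a Mermaid erDiagram entity block.

    columns: iterable of (column_name, data_type) pairs.
    """
    lines = [f"    {table_name} {{"]
    for name, dtype in columns:
        # Mermaid attribute types cannot contain spaces or angle brackets,
        # so nested type strings are flattened to a safe token.
        safe_type = re.sub(r"[^A-Za-z0-9_]", "_", dtype)
        lines.append(f"        {safe_type} {name}")
    lines.append("    }")
    return "\n".join(lines)


def mermaid_er_diagram(tables):
    """tables: dict mapping table name -> list of (column_name, data_type)."""
    blocks = [mermaid_entity(name, cols) for name, cols in tables.items()]
    return "erDiagram\n" + "\n".join(blocks)


# Hypothetical tables and columns, for illustration only.
tables = {
    "index": [("SeriesInstanceUID", "string"), ("series_size_MB", "double")],
    "sm_index": [("SeriesInstanceUID", "string"), ("ObjectiveLensPower", "int64")],
}
diagram = mermaid_er_diagram(tables)
print(diagram)
```

The resulting text can be pasted into a fenced `mermaid` block in Markdown. Relationship lines between tables are not produced by this sketch and would need extra information, such as which columns act as join keys.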

@vkt1414
Collaborator

vkt1414 commented Oct 1, 2024

Looks very cool. I'll try to learn how it works.

@fedorov
Member Author

fedorov commented Oct 2, 2024

If you plan to contribute your time working on this, I would prefer if we find the time to chat before you jump into getting a PR ready! ;-)

@vkt1414
Collaborator

vkt1414 commented Oct 2, 2024

Sure @fedorov! My progress can be slow, but Mermaid seems exciting, so I'll see if it can stop me from sleeping early. I can meet tomorrow or Friday after 4 pm.

@fedorov
Member Author

fedorov commented Oct 3, 2024

Sounds good - let's coordinate on Discord! Thank you!

@vkt1414
Collaborator

vkt1414 commented Oct 4, 2024

@fedorov where can I find the column descriptions for the tables in idc-index?

@fedorov
Member Author

fedorov commented Oct 7, 2024

As discussed last week, here's the tentative plan:

  1. Add a YAML or JSON file accompanying each of the tables generated by the queries in https://github.com/ImagingDataCommons/idc-index-data/tree/main/scripts and https://github.com/ImagingDataCommons/idc-index-data/tree/main/assets in idc-index-data. This existing documentation can be used to initialize the column descriptions: https://github.com/ImagingDataCommons/idc-index/blob/main/docs/column_descriptions.md.
  2. While generating the Parquet files with the resulting tables, inject the column descriptions into the Parquet schema (see related discussion in https://www.perplexity.ai/search/i-am-distributing-sql-tables-a-yWRtF.V8SMOgIcTiZQ6Dsw).
  3. When generating a release in idc-index-data, include the schema files along with the Parquet files as release assets.
  4. While generating documentation here in idc-index, replace https://github.com/ImagingDataCommons/idc-index/blob/main/docs/column_descriptions.md with content automatically generated from Parquet.
  5. Generate Mermaid descriptions automatically to render the relationships across tables.
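As a rough illustration of how steps 1 and 4 could fit together, the sketch below loads a per-table JSON schema file and renders it as a Markdown table of column descriptions. Everything specific here is an assumption for the sake of the example: the sidecar layout (`table`, `columns`, `name`, `type`, `description`), the function name, and the sample columns and descriptions are hypothetical, and the plan above leaves the real format (YAML or JSON, exact keys) undecided.

```python
import json

# Hypothetical sidecar schema for one table; the key names are assumptions.
SIDECAR_JSON = """
{
  "table": "index",
  "columns": [
    {"name": "collection_id", "type": "string",
     "description": "Identifier of the collection."},
    {"name": "series_size_MB", "type": "double",
     "description": "Series size in megabytes."}
  ]
}
"""


def schema_to_markdown(schema):
    """Render a table schema dict as a Markdown section with a column table."""
    lines = [
        f"## {schema['table']}",
        "",
        "| Column | Type | Description |",
        "| --- | --- | --- |",
    ]
    for col in schema["columns"]:
        # Escape pipes so a description cannot break the Markdown table.
        desc = col.get("description", "").replace("|", "\\|")
        lines.append(f"| `{col['name']}` | {col['type']} | {desc} |")
    return "\n".join(lines)


schema = json.loads(SIDECAR_JSON)
markdown = schema_to_markdown(schema)
print(markdown)
```

If the descriptions end up stored in the Parquet schema metadata instead (step 2), the same rendering function could be fed from that metadata rather than from a separate file.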


2 participants