[Feature]: Support annotation of substrings with HERD or another system #1092

Open
rly opened this issue Apr 5, 2024 · 3 comments

rly commented Apr 5, 2024

What would you like to see added to HDMF?

Use case 1: HED tags are strings that can contain multiple keys, separated by commas, in any order. A DynamicTable may have a column of HED tags. We want to associate these keys with persistent identifiers in the HED schema, but I'm not 100% sure that is necessary. HED already provides tools for processing the HED tags and linking them to the HED schema.
Use case 2: HDMF-ML permits the storage of a PyTorch model output as a long text field. We want to be able to annotate terms within this output with the AI Ontology. A similar hypothetical use case is a user who wants to store text from a scientific paper, device configuration file, or software output in HDMF and associate terms within these strings with external resources.
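
As a rough illustration of use case 1, the sketch below splits each HED string in a column into its comma-separated keys and records which rows each key appears in. It is plain Python, not HDMF or HED tooling: `split_hed_keys` and `index_keys_by_row` are hypothetical helper names, the tag strings are only illustrative, and grouping parentheses are simply discarded, which is a simplification of how HED strings are really structured.

```python
from collections import defaultdict


def split_hed_keys(hed_string: str) -> list[str]:
    """Split a HED string into keys (hypothetical helper).

    Assumption: grouping parentheses are dropped and every
    comma-separated part is treated as an independent key.
    """
    cleaned = hed_string.replace("(", "").replace(")", "")
    return [part.strip() for part in cleaned.split(",") if part.strip()]


def index_keys_by_row(column: list[str]) -> dict[str, list[int]]:
    """Map each key to the row indices of the column where it appears."""
    rows_by_key = defaultdict(list)
    for row, hed_string in enumerate(column):
        # dict.fromkeys removes duplicate keys within a row, keeping order
        for key in dict.fromkeys(split_hed_keys(hed_string)):
            rows_by_key[key].append(row)
    return dict(rows_by_key)


# Illustrative stand-in for a one-dimensional column of HED strings
hed_column = [
    "Sensory-event, Visual-presentation",
    "Visual-presentation, (Red, Square)",
]
rows_by_key = index_keys_by_row(hed_column)
```

An index like this would let one external-resource entry per key cover every row that contains the key, regardless of where in the string the key appears.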

A single string may not be the ideal representation for these data, but sometimes that is what we have to work with.

In use case 1, the key can be anywhere in any string in the one-dimensional VectorData.
In use case 2, we want to annotate a particular substring of a scalar text field. The same substring may appear multiple times with different meanings (rare), so it would be important to store the starting index of the substring.
These probably require different solutions.
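
To make the use case 2 requirement concrete, here is a minimal sketch of an annotation that stores the starting index of a substring so that repeated occurrences can be told apart. `SubstringAnnotation`, `find_occurrences`, the example text, and the entity URI are all hypothetical and exist only for illustration; none of this is an existing HDMF or HERD API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstringAnnotation:
    """Hypothetical record linking one occurrence of a term to an entity."""
    start: int       # offset of the first character in the source text
    length: int      # number of characters in the annotated substring
    entity_uri: str  # placeholder URI, not a real ontology entry

    def text(self, source: str) -> str:
        """Return the annotated substring of the source text."""
        return source[self.start:self.start + self.length]


def find_occurrences(source: str, term: str) -> list[int]:
    """Return the starting offset of every occurrence of term in source."""
    offsets = []
    pos = source.find(term)
    while pos != -1:
        offsets.append(pos)
        pos = source.find(term, pos + 1)
    return offsets


# The same word appears twice with different meanings
model_output = "The transformer model was powered through a transformer unit."
offsets = find_occurrences(model_output, "transformer")

# Annotate only the first occurrence (the model architecture)
annotation = SubstringAnnotation(
    start=offsets[0],
    length=len("transformer"),
    entity_uri="https://example.org/ontology/transformer-architecture",
)
```

Storing the offset rather than only the term means the second occurrence can be left unannotated or linked to a different entity.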

It may also be useful to have a way to refer to substrings in general for annotation, like DynamicTableRegion for row slicing of tables and TimeIntervals for annotating time series in time.
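
One possible shape for such a general substring reference, sketched under the assumption that it would point at a row of a string column plus a character span, much as DynamicTableRegion points at rows. `StringRegion` and the sample column are hypothetical and not part of HDMF.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StringRegion:
    """Hypothetical reference to a character span within one row of a string column."""
    row: int
    start: int
    stop: int  # exclusive, following Python slice conventions

    def resolve(self, column: Sequence[str]) -> str:
        """Return the referenced substring, checking that the span is valid."""
        value = column[self.row]
        if not 0 <= self.start <= self.stop <= len(value):
            raise ValueError(
                f"span [{self.start}, {self.stop}) is outside row {self.row} "
                f"of length {len(value)}"
            )
        return value[self.start:self.stop]


# Illustrative stand-in for a column of free-text values
notes = ["electrode impedance 1.2 MOhm", "probe: Neuropixels 1.0"]
region = StringRegion(row=1, start=7, stop=18)
substring = region.resolve(notes)
```

Keeping the reference separate from the annotation itself would allow the same region type to be reused for external resources, comments, or other metadata.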

I'm open to ideas. Just wanted to start a discussion.

What solution would you like?

^

Do you have any interest in helping implement the feature?

Yes.

@rly rly added category: proposal proposed enhancements or new features priority: low alternative solution already working and/or relevant to only specific user(s) labels Apr 5, 2024
mavaylon1 commented

Focusing on use case 2, what does it mean to store a PyTorch model output as a long text field? If I had a model that does semantic segmentation and I predicted a segmented image, would the matrix be stored as a string?

@mavaylon1 mavaylon1 added this to the Future milestone May 13, 2024
VisLab commented Jul 8, 2024

@rly with the release of HED version 8.3.0, HED now has persistent identifiers for each HED tag (and auxiliary items such as unit classes etc.). HED now has an associated Ontology (see https://bioportal.bioontology.org/ontologies/HED).

Is there any more documentation on the roadmap for HERD and the needed support?

mavaylon1 commented Jul 8, 2024

@VisLab Hi there. I'm the main developer of HERD. The next planned stage is a continuation of user-facing tools to more easily automate term validation and HERD population when writing the file.

We do have some ideas beyond user-facing tools that have not been formalized in a community-facing roadmap.

That being said, the team and I are more than happy to discuss expanding HERD. I can talk with the team next week, and then get back to you.
