Changes to the spec and schemata for RFC-2 #242

Open: wants to merge 7 commits into base: main

normanrz
Contributor

@normanrz normanrz commented May 31, 2024

This is the companion PR for RFC-2, which adds the changes to the spec document, JSON schemata, examples and test files. The PR is meant to support the review process of RFC-2 by providing the specifics.

Again, a brief summary of the main changes:

  • Zarr v3 is used for OME-Zarr, which means that all metadata moves from .zattrs to zarr.json
  • The OME-Zarr metadata will live under the ome key in the attributes of the zarr.json files
  • Renaming the spec doc to OME-Zarr
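
As a rough illustration of the layout summarized above, the following sketch builds a Zarr v3 group `zarr.json` with the OME-Zarr metadata nested under `attributes` and then `ome`. The `version` value, its placement inside `ome`, and the empty `multiscales` list are placeholder assumptions for illustration, not content copied from the spec changes in this PR.

```python
import json

# Zarr v3 core metadata for a group. "zarr_format" and "node_type" are
# defined by the Zarr v3 core spec.
group_metadata = {
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {
        # All OME-Zarr metadata sits under the "ome" key; in Zarr v2 it
        # would have been stored in .zattrs.
        "ome": {
            "version": "0.5",   # assumed placement and value
            "multiscales": [],  # placeholder; real content omitted
        }
    },
}

zarr_json_text = json.dumps(group_metadata, indent=2)
print(zarr_json_text)
```

A reader would then reach the OME-Zarr metadata via `attributes["ome"]` after parsing the file.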

Contributor

github-actions bot commented May 31, 2024

Automated Review URLs

latest/index.bs (outdated, resolved)
@@ -24,19 +24,13 @@ Status Text: will be provided between numbered versions. Data written with these
Status Text: (an "editor's draft") will not necessarily be supported.
Member

Row above: Status Text: <a href="../0.4/index.html">0.4</a>. looks like it needs manual update to 0.5?

Same for line 612: This edition of the specification is [https://ngff.openmicroscopy.org/0.4/](https://ngff.openmicroscopy.org/0.4/]).

├── A # First row of the plate
│ ├── .zgroup
│ ├── zarr.json
Member

Does this zarr.json need to be present? I'm not entirely clear from reading the Zarr v3 spec whether you are allowed to have empty directories? (or if the rules are different from Zarr v2 with .zgroup)?

If you aren't allowed empty directories, then does this need a change/clarification to the labels section above where we have:

Intermediate folders are permitted but not necessary and currently contain no extra metadata

Do we need zarr.json to be shown within the original directory?

Contributor Author

There is an ongoing discussion about "implicit" groups. I think the community is leaning towards disallowing these, i.e. requiring zarr.json files for intermediate folders.
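
If implicit groups end up being disallowed, a store could be checked along the following lines. This is only a sketch: `find_implicit_groups` is a hypothetical helper, it assumes a local directory store, and it stops descending at arrays so that chunk directories are not mistaken for groups.

```python
import json
import tempfile
from pathlib import Path


def find_implicit_groups(root: Path) -> list:
    """Return directories at or below `root` that lack a zarr.json file."""
    missing = []

    def visit(directory: Path) -> None:
        meta_file = directory / "zarr.json"
        if not meta_file.is_file():
            missing.append(directory)
        elif json.loads(meta_file.read_text()).get("node_type") == "array":
            return  # chunk directories below an array carry no metadata
        for child in sorted(directory.iterdir()):
            if child.is_dir():
                visit(child)

    visit(root)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    plate = Path(tmp) / "plate.zarr"
    (plate / "A" / "1").mkdir(parents=True)
    group = json.dumps({"zarr_format": 3, "node_type": "group"})
    (plate / "zarr.json").write_text(group)
    (plate / "A" / "1" / "zarr.json").write_text(group)
    # The row folder "A" has no zarr.json, so it is reported.
    missing_names = [
        p.relative_to(plate).as_posix() for p in find_implicit_groups(plate)
    ]
    print(missing_names)
```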

@will-moore
Member

Unchanged in this PR but index.bs still has:

Each "multiscales" dictionary SHOULD contain the field "name". It MUST contain the field "version", which indicates the version of the multiscale metadata of this image (current version is [NGFFVERSION]).

Even the example below that text doesn't contain version. Also, it seems that version is no longer needed since that's provided by the https://ngff.openmicroscopy.org/0.5 key?

latest/index.bs (outdated, resolved)
@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/bioformats2raw-removing-resolution-index-from-zarr-hierarchy-seems-to-forfeit-some-metadata/97347/4

@will-moore
Member

In trying to implement support for reading the proposed v0.5 data (in ome-ngff-validator), I am finding that using the versioned key https://ngff.openmicroscopy.org/0.5 is a bit painful when you want to get_version(), since I have to iterate through a list of potential versions to check whether the key exists.
This means that I will always have to update the code to support new versions, instead of retrieving the version and using that to automatically pick the correct schema to validate against.

So I find that I am in agreement with various comments on RFC-2 about the concerns of using a version string as a key.
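
To make the problem concrete, here is a minimal sketch of what version detection looks like with a versioned key. `KNOWN_VERSIONS`, `KEY_TEMPLATE` and the function name are hypothetical and not taken from ome-ngff-validator; the `0.6` key stands in for any future version the reader has not been updated for.

```python
from typing import Optional

# Hypothetical list of versions this reader knows about. It has to be
# extended by hand whenever a new version is released.
KNOWN_VERSIONS = ["0.5"]
KEY_TEMPLATE = "https://ngff.openmicroscopy.org/{version}"


def get_version_from_versioned_key(attributes: dict) -> Optional[str]:
    """Probe each known versioned key and return the first version found."""
    for version in KNOWN_VERSIONS:
        if KEY_TEMPLATE.format(version=version) in attributes:
            return version
    return None  # unknown or future version: nothing to report


attrs_v05 = {"https://ngff.openmicroscopy.org/0.5": {"multiscales": []}}
attrs_future = {"https://ngff.openmicroscopy.org/0.6": {"multiscales": []}}

found_v05 = get_version_from_versioned_key(attrs_v05)
found_future = get_version_from_versioned_key(attrs_future)
print(found_v05, found_future)
```

The second lookup returns `None` even though the metadata is present, because the reader cannot name the key without knowing the version in advance.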

@d-v-b
Contributor

d-v-b commented Jun 12, 2024

Instead of using a URL-with-a-version-inside as a key, I think it would be better to pick a name like "ome" or "ome-ngff" as the key for an object, and have a version field and a schema_url field in that object. This would be much clearer, and it would allow parsers to check the version of the metadata without knowing the version beforehand.

@normanrz normanrz mentioned this pull request Jul 2, 2024
@normanrz
Contributor Author

normanrz commented Jul 2, 2024

I updated this PR for the RFC-2 revision. The namespace key is now ome and there is a separate version attribute.
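
With a fixed namespace key, reading the version no longer requires knowing it beforehand. The sketch below assumes the `version` attribute sits inside the `ome` object under `attributes`; the function name and sample documents are illustrative only.

```python
from typing import Optional


def get_version(zarr_json: dict) -> Optional[str]:
    """Read the OME-Zarr version from a parsed zarr.json, if present."""
    ome = zarr_json.get("attributes", {}).get("ome", {})
    return ome.get("version")


ome_group = {
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {"ome": {"version": "0.5"}},  # assumed placement
}
plain_group = {"zarr_format": 3, "node_type": "group"}

ome_version = get_version(ome_group)
plain_version = get_version(plain_group)
print(ome_version, plain_version)
```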

@will-moore will-moore mentioned this pull request Jul 3, 2024
3 tasks
@will-moore
Member

Working with these schemas and those from @d-v-b's dev1 branch e.g. https://github.com/ome/ngff/blob/7da3d7bbd7c49db29b44e54a6bf5fd7e1387f100/0.5-dev1/schemas/image.schema in the ome-ngff-validator, I noticed that in this PR, the schemas include the attributes (and ome), so that you can validate the raw zarr.json against the schema, whereas in the dev1 branch, the attributes were not included in the schemas, so you needed to validate against the contents of the attributes key. This approach may have been chosen to reduce the number of changes in going from zarr v2 -> v3.

I don't know which approach is most useful to the community, given the various tools that might want to consume these schemas? Is it most useful to be able to validate against a whole zarr.json file or against the root.attrs as loaded in hand via zarr-python?
In the case of ome-ngff-validator I'm happy to use either approach, so I just wanted to flag it up for discussion in case others have strong views?

rfc/2/index.md (outdated, resolved)
@normanrz
Contributor Author

normanrz commented Jul 8, 2024

Is there a json schema for the base zarr.json where we could plug in the OME-Zarr metadata schema? cc @d-v-b

@d-v-b
Contributor

d-v-b commented Jul 8, 2024

Is there a json schema for the base zarr.json where we could plug in the OME-Zarr metadata schema? cc @d-v-b

I'm not aware of one, but we should a) make one and b) include it with the zarr v3 spec. Were I to work on this today, I would start by fixing up the rather meager v3 support in pydantic-zarr, and then use that to generate the schema. But any way of generating such a schema is valid.

If part of [[#multiscale-md]], the length of "axes" MUST be equal to the number of dimensions of the arrays that contain the image data.
The "axes" are used as part of [[#multiscale-md]]. The length of "axes" MUST be equal to the number of dimensions of the arrays that contain the image data.

The "dimension_names" attribute in the `zarr.json` of the Zarr array of a multiscale level MUST match the names in the "axes" metadata.

"dimension_names" are redundant in an OME-Zarr multiscale image, so must they be mandatory? Perhaps this restriction could be relaxed to something like:

If the "dimension_names" attribute is specified in the zarr.json of the Zarr array of a multiscale level, it SHOULD match the names in the "axes" metadata.

This will enable arrays with undesirable/non-descriptive/missing dimension names to be used in an OME-Zarr hierarchy without any array metadata changes.
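
The difference between the two wordings can be sketched as a small check. `check_dimension_names` is a hypothetical helper; treating a missing `dimension_names` as an error under the MUST wording is an assumption about how a validator might read it, and the sample names are made up for illustration.

```python
from typing import Optional


def check_dimension_names(axes: list, dimension_names: Optional[list],
                          strict: bool = True):
    """Compare Zarr `dimension_names` against the OME-Zarr axis names.

    Returns (errors, warnings). strict=True follows the MUST wording in the
    diff; strict=False follows the relaxed SHOULD wording proposed here.
    """
    axis_names = [axis["name"] for axis in axes]
    errors, warnings = [], []
    if dimension_names is None:
        if strict:
            errors.append("dimension_names is missing")
    elif list(dimension_names) != axis_names:
        message = f"dimension_names {list(dimension_names)} != axes {axis_names}"
        (errors if strict else warnings).append(message)
    return errors, warnings


axes = [{"name": "z"}, {"name": "y"}, {"name": "x"}]
# Illustrative non-descriptive names on a pre-existing array.
legacy_names = ["dim_0", "dim_1", "dim_2"]

strict_result = check_dimension_names(axes, legacy_names, strict=True)
relaxed_result = check_dimension_names(axes, legacy_names, strict=False)
missing_result = check_dimension_names(axes, None, strict=False)
print(strict_result, relaxed_result, missing_result)
```

Under the relaxed rule the same array passes with a warning, so its metadata would not need to be rewritten.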

Contributor Author

I am open to discussing this. Weakening this restriction could cause conflicts between the array metadata and the OME-Zarr metadata. We would need to define a precedence order.
I wonder under what circumstances you could add the OME-Zarr metadata at the group level but could not adjust the "dimension_names" attribute in the array metadata to match?

I can always work around this, so it is not essential.

I have legacy multiscale arrays that have been converted from NetCDF that I would prefer to keep immutable and a one-to-one mapping to their source. I want to slap OME-Zarr on top with sensible axes names (e.g. z, y, x). However, the dimension_names of the underlying arrays may encode other information and be inconsistent between scales (e.g. segmented_x_1.23_um).

The 0.4 spec has many restrictions on the underlying arrays (consecutively numbered groups, the order of dimensions, nested directory layout, etc.) that have since been addressed by this RFC and RFC-3. As far as I can see [1], the dimension_names restriction introduced in this RFC is the only remaining restriction that could make arrays incompatible with OME-Zarr metadata (aside from requiring the length of axes to match the number of dimensions of the arrays, which is necessary).

Footnotes

  1. I've only read the spec for multiscales and dependent metadata

Contributor Author

@d-v-b What do you think about this? I believe you advocated for strictly keeping dimension_names in sync.

Contributor

I think they should be kept in sync, because the alternative is confusing -- how should clients interpret variance between axes and dimension_names? And if clients are supposed to just ignore dimension_names, then why did we add it to the zarr v3 spec in the first place?

@joshmoore
Member

The three approving reviews of RFC-2 have now been merged: #261

Minor changes here to address the above discussions, #259 and any issues of versioning, etc. are welcome.

@normanrz
Contributor Author

I changed the JSON schema files to use the attributes as a root instead of the root of the zarr.json. I think that composes better because we don't have to redefine the Zarr core metadata in our schema.
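
One way to picture the composition argument: a schema rooted at the attributes can be embedded in a schema for the whole `zarr.json` without restating the Zarr core metadata. The sketch below uses only standard JSON Schema keywords; `wrap_for_zarr_json` and the simplified `ome_attributes_schema` are hypothetical and do not reproduce the schema files in this PR.

```python
import json


def wrap_for_zarr_json(attributes_schema: dict) -> dict:
    """Embed an attributes-rooted schema in a schema for a whole zarr.json.

    Only the `attributes` member is constrained here; core members such as
    `zarr_format` and `node_type` would be left to a separate core schema.
    """
    return {
        "type": "object",
        "properties": {"attributes": attributes_schema},
        "required": ["attributes"],
    }


# Hypothetical, heavily simplified stand-in for an OME-Zarr schema.
ome_attributes_schema = {
    "type": "object",
    "properties": {
        "ome": {"type": "object", "required": ["version"]},
    },
    "required": ["ome"],
}

zarr_json_schema = wrap_for_zarr_json(ome_attributes_schema)
print(json.dumps(zarr_json_schema, indent=2))
```

Tools that already hold the attributes can validate against the inner schema directly, while tools that hold the raw file can use the wrapped form.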

I also added schema_url as a new property but removed it again because it caused issues with the ome-ngff-challenge. We should discuss whether to add schema_url to the OME-Zarr metadata. Personally, I am not convinced that this is necessary because it is trivial to look up the schema in this repository based on the version attribute. That makes schema_url somewhat redundant and verbose.
cc @joshmoore @d-v-b
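
For reference, deriving the schema location from the version could look like the sketch below. The URL template is an assumption modeled on the `<version>/schemas/image.schema` path layout linked earlier in this thread; it is not a confirmed published location, and `schema_url_for` is a hypothetical helper.

```python
# Assumed URL layout, modeled on the repository's <version>/schemas/ folders.
SCHEMA_URL_TEMPLATE = "https://ngff.openmicroscopy.org/{version}/schemas/{name}.schema"


def schema_url_for(version: str, name: str = "image") -> str:
    """Build a schema URL from the metadata version and a schema name."""
    return SCHEMA_URL_TEMPLATE.format(version=version, name=name)


image_schema_url = schema_url_for("0.5")
print(image_schema_url)
```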
