Changes to the spec and schemata for RFC-2 #242

normanrz · 2024-05-31T16:11:17Z

This is the companion PR for RFC-2, which adds the changes to the spec document, JSON schemata, examples and test files. The PR is meant to support the review process of RFC-2 by providing the specifics.

Again, a brief summary of the main changes:

- Zarr v3 is used for OME-Zarr, which means that all metadata moves from .zattrs to zarr.json.
- The OME-Zarr metadata will live under the ome key in the attributes of the zarr.json files.
- The spec doc is renamed to OME-Zarr.
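
As a rough illustration of the layout summarized above, the sketch below parses a hand-written group `zarr.json` and pulls out the metadata under the `ome` key. The document contents (including the `"version": "0.5"` value and the empty `multiscales` list) and the helper name `get_ome_metadata` are assumptions made for this example, not text taken from the PR.

```python
import json

# Hypothetical group metadata document; the values are illustrative only.
zarr_json = json.loads("""
{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "ome": {
      "version": "0.5",
      "multiscales": []
    }
  }
}
""")


def get_ome_metadata(doc: dict) -> dict:
    """Return the OME-Zarr metadata stored under attributes["ome"], or {} if absent."""
    return doc.get("attributes", {}).get("ome", {})


ome = get_ome_metadata(zarr_json)
print(ome["version"])
```

With Zarr v2 the same information would have been read from a separate `.zattrs` file next to `.zgroup`.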

github-actions · 2024-05-31T16:11:28Z

Automated Review URLs

latest/index.bs

will-moore · 2024-06-03T10:53:44Z

latest/index.bs

@@ -24,19 +24,13 @@ Status Text: will be provided between numbered versions. Data written with these
 Status Text: (an "editor's draft") will not necessarily be supported.


The row above, `Status Text: <a href="../0.4/index.html">0.4</a>.`, looks like it needs a manual update to 0.5?

Same for line 612: This edition of the specification is [https://ngff.openmicroscopy.org/0.4/](https://ngff.openmicroscopy.org/0.4/).

will-moore · 2024-06-03T11:36:36Z

latest/index.bs

    ├── A                     # First row of the plate
-    │   ├── .zgroup
+    │   ├── zarr.json


Does this zarr.json need to be present? It is not entirely clear to me from reading the Zarr v3 spec whether you are allowed to have empty directories (or whether the rules are different from Zarr v2 with .zgroup).

If you aren't allowed empty directories, then does this need a change/clarification to the labels section above where we have:

Intermediate folders are permitted but not necessary and currently contain no extra metadata

Do we need zarr.json to be shown within the original directory?

normanrz · 2024-06-03

There is an ongoing discussion about "implicit" groups. I think the community is leaning towards disallowing these, i.e. requiring zarr.json files for intermediate folders.
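
If intermediate folders do end up requiring a zarr.json, a check along the following lines could flag "implicit" groups in a hierarchy on a local filesystem. This is only a sketch under that assumption: the function name `find_implicit_groups` and the tiny plate layout in the demo are hypothetical, and directories below an array are skipped because they hold chunks rather than hierarchy nodes.

```python
import json
import tempfile
from pathlib import Path


def find_implicit_groups(root: Path) -> list[str]:
    """List directories at or below root that lack a zarr.json file."""
    missing = []
    stack = [root]
    while stack:
        node = stack.pop()
        meta_path = node / "zarr.json"
        if meta_path.is_file():
            meta = json.loads(meta_path.read_text())
            if meta.get("node_type") == "array":
                continue  # chunk directories live here; do not descend
        else:
            missing.append(node.relative_to(root).as_posix())
        stack.extend(p for p in node.iterdir() if p.is_dir())
    return sorted(missing)


# Demo: a plate whose row folder "A" has no zarr.json of its own.
with tempfile.TemporaryDirectory() as tmp:
    plate = Path(tmp)
    group_meta = json.dumps({"zarr_format": 3, "node_type": "group"})
    (plate / "zarr.json").write_text(group_meta)
    well = plate / "A" / "1"
    well.mkdir(parents=True)
    (well / "zarr.json").write_text(group_meta)
    implicit = find_implicit_groups(plate)

print(implicit)  # ['A']
```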

will-moore · 2024-06-03T12:20:06Z

Unchanged in this PR but index.bs still has:

Each "multiscales" dictionary SHOULD contain the field "name". It MUST contain the field "version", which indicates the version of the multiscale metadata of this image (current version is [NGFFVERSION]).

Even the example below that text doesn't contain version. Also, it seems that version is no longer needed, since that's provided by the https://ngff.openmicroscopy.org/0.5 key?

imagesc-bot · 2024-06-10T16:18:28Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/bioformats2raw-removing-resolution-index-from-zarr-hierarchy-seems-to-forfeit-some-metadata/97347/4

will-moore · 2024-06-12T08:49:50Z

In looking to implement support for reading the proposed V0.5 data (in ome-ngff-validator), I am finding the usage of the versioned key https://ngff.openmicroscopy.org/0.5 is a bit painful when you want to get_version() since I have to iterate through a list of potential versions to check if the key exists.
This means that I will always have to update the code to support new versions, instead of retrieving the version and using that to automatically pick the correct schema to validate against.

So I find that I am in agreement with various comments on RFC-2 about the concerns of using a version string as a key.

d-v-b · 2024-06-12T08:52:42Z

Instead of using a URL-with-a-version-inside as a key, I think it would be better to pick a name like "ome" or "ome-ngff" as the key for an object, and have a version field in that object, and a schema_url field in that object. This is much clearer, and it would allow parsers to check the version of the metadata without knowing the version beforehand.
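
The difference between the two layouts discussed in this and the previous comment can be sketched as follows. Both helper names, the `KNOWN_VERSION_KEYS` table and the sample attribute dictionaries are assumptions for illustration; they are not code from ome-ngff-validator.

```python
from typing import Optional

# With a versioned URL as the key, the reader needs a table of every known version.
KNOWN_VERSION_KEYS = {
    "https://ngff.openmicroscopy.org/0.5": "0.5",
}


def get_version_from_url_key(attrs: dict) -> Optional[str]:
    """Probe each known versioned key in turn; unknown versions are not detected."""
    for key, version in KNOWN_VERSION_KEYS.items():
        if key in attrs:
            return version
    return None


def get_version_from_field(attrs: dict, namespace: str = "ome") -> Optional[str]:
    """Read the version directly from a fixed namespace object."""
    return attrs.get(namespace, {}).get("version")


url_style = {"https://ngff.openmicroscopy.org/0.5": {"multiscales": []}}
field_style = {"ome": {"version": "0.5", "multiscales": []}}

print(get_version_from_url_key(url_style))
print(get_version_from_field(field_style))
```

With the fixed key, a later version string is returned without any code change, whereas the lookup table has to be extended for every new release.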

normanrz · 2024-07-02T15:48:00Z

I updated this PR for the RFC-2 revision. The namespace key is now ome and there is a separate version attribute.

will-moore · 2024-07-04T09:56:04Z

Working with these schemas and those from @d-v-b's dev1 branch e.g. https://github.com/ome/ngff/blob/7da3d7bbd7c49db29b44e54a6bf5fd7e1387f100/0.5-dev1/schemas/image.schema in the ome-ngff-validator, I noticed that in this PR, the schemas include the attributes (and ome), so that you can validate the raw zarr.json against the schema, whereas in the dev1 branch, the attributes were not included in the schemas, so you needed to validate against the contents of the attributes key. This approach may have been chosen to reduce the number of changes in going from zarr v2 -> v3.

I don't know which approach is most useful to the community, given the various tools that might want to consume these schemas. Is it more useful to be able to validate against a whole zarr.json file, or against the root.attrs as loaded via zarr-python?
In the case of ome-ngff-validator I'm happy to use either approach, so I just wanted to flag it up for discussion in case others have strong views.

normanrz · 2024-07-08T19:24:54Z

Is there a json schema for the base zarr.json where we could plug in the OME-Zarr metadata schema? cc @d-v-b

d-v-b · 2024-07-08T19:28:14Z

> Is there a json schema for the base zarr.json where we could plug in the OME-Zarr metadata schema? cc @d-v-b

I'm not aware of one, but we should a) make one and b) include it with the Zarr v3 spec. Were I to work on this today, I would start by fixing up the rather meager v3 support in pydantic-zarr, and then use that to generate the schema. But any way of generating such a schema is valid.

LDeakin · 2024-07-10T23:40:02Z

latest/index.bs

-If part of [[#multiscale-md]], the length of "axes" MUST be equal to the number of dimensions of the arrays that contain the image data.
+The "axes" are used as part of [[#multiscale-md]]. The length of "axes" MUST be equal to the number of dimensions of the arrays that contain the image data.
+
+The "dimension_names" attribute in the `zarr.json` of the Zarr array of a multiscale level MUST match the names in the "axes" metadata.


"dimension_names" are redundant in an OME-Zarr multiscale image, so must they be mandatory? Perhaps this restriction could be relaxed to something like:

If the "dimension_names" attribute is specified in the zarr.json of the Zarr array of a multiscale level, it SHOULD match the names in the "axes" metadata.

This will enable arrays with undesirable/non-descriptive/missing dimension names to be used in an OME-Zarr hierarchy without any array metadata changes.

normanrz · 2024-07-11

I am open to discussing this. Weakening this restriction could cause conflicts between the array metadata and the OME-Zarr metadata. We would need to define a precedence order.
I wonder what the circumstances would be that you can add the OME-Zarr metadata on the group level, but cannot adjust the array metadata to match the "dimension_names" attribute?

LDeakin · 2024-07-11

I can always work around this, so it is not essential.

I have legacy multiscale arrays that have been converted from NetCDF that I would prefer to keep immutable and a one-to-one mapping to their source. I want to slap OME-Zarr on top with sensible axes names (e.g. z, y, x). However, the dimension_names of the underlying arrays may encode other information and be inconsistent between scales (e.g. segmented_x_1.23_um).

The 0.4 spec has many restrictions on the underlying arrays (consecutively numbered groups, the order of dimensions, nested directory layout, etc.) that have since been addressed by this RFC and RFC-3. As far as I can see¹, the dimension_names restriction introduced in this RFC is the only remaining restriction that could make arrays incompatible with OME-Zarr metadata (aside from requiring the length of axes to match the number of dimensions of the arrays, which is necessary).

¹ I've only read the spec for multiscales and dependent metadata.

normanrz · 2024-07-22

@d-v-b What do you think about this? I believe you advocated for strictly keeping dimension_names in sync.

d-v-b · 2024-07-22

I think they should be kept in sync, because the alternative is confusing -- how should clients interpret variance between axes and dimension_names? And if clients are supposed to just ignore dimension_names, then why did we add it to the zarr v3 spec in the first place?
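
To make the two positions in this thread concrete, here is a minimal sketch of a consistency check between the "axes" of a multiscale and the "dimension_names" of one of its arrays. The function name `check_dimension_names` and the `strict` flag are hypothetical: `strict=True` treats a missing "dimension_names" as an error (the MUST wording in the diff), while `strict=False` only compares the names when they are present (the relaxed wording proposed above).

```python
from typing import Optional


def check_dimension_names(
    axes: list[dict],
    dimension_names: Optional[list],
    strict: bool = True,
) -> list[str]:
    """Return a list of problems; an empty list means the names are consistent."""
    axis_names = [axis["name"] for axis in axes]
    if dimension_names is None:
        return ["dimension_names is missing"] if strict else []
    if len(dimension_names) != len(axis_names):
        return [
            f"expected {len(axis_names)} dimension names, found {len(dimension_names)}"
        ]
    return [
        f"dimension {i}: {dim!r} != axis {name!r}"
        for i, (dim, name) in enumerate(zip(dimension_names, axis_names))
        if dim != name
    ]


axes = [
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]

print(check_dimension_names(axes, ["z", "y", "x"]))
print(check_dimension_names(axes, ["z", "y", "segmented_x_1.23_um"]))
print(check_dimension_names(axes, None, strict=False))
```

Whether a reported mismatch should be an error or a warning is exactly the open question in the discussion; the sketch only reports it.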

joshmoore · 2024-09-13T09:47:50Z

The three approving reviews of RFC-2 have now been merged: #261

Minor changes here to address the above discussions, #259 and any issues of versioning, etc. are welcome.

normanrz · 2024-09-27T11:33:04Z

I changed the JSON schema files to use the attributes as a root instead of the root of the zarr.json. I think that composes better because we don't have to redefine the Zarr core metadata in our schema.
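
One way to read "composes better": a schema rooted at the attributes can be nested inside a schema for the whole zarr.json by a consumer that wants to validate the raw file. The sketch below shows that nesting with plain dictionaries; `wrap_attributes_schema` and the toy `attributes_schema` are assumptions for illustration and are not the schema files from this PR.

```python
import copy


def wrap_attributes_schema(attributes_schema: dict) -> dict:
    """Embed a schema rooted at 'attributes' in a schema for a whole zarr.json."""
    return {
        "type": "object",
        "properties": {"attributes": copy.deepcopy(attributes_schema)},
        "required": ["attributes"],
    }


# Toy stand-in for an attributes-rooted OME-Zarr schema.
attributes_schema = {
    "type": "object",
    "properties": {"ome": {"type": "object", "required": ["version"]}},
    "required": ["ome"],
}

document_schema = wrap_attributes_schema(attributes_schema)
print(sorted(document_schema["properties"]))
```

A tool that already has the attributes in hand can use the inner schema directly, while a tool working on the raw file can use the wrapped one or simply pass `doc["attributes"]` to the validator.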

I also added schema_url as a new property but removed it again because it caused issues with the ome-ngff-challenge. We should discuss whether to add schema_url to the OME-Zarr metadata. Personally, I am not convinced that this is necessary because it is trivial to look up the schema in this repository based on the version attribute. That makes schema_url somewhat redundant and verbose.
cc @joshmoore @d-v-b
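
The lookup mentioned above might be as small as the following sketch. It assumes the repository keeps schemas at `<version>/schemas/<name>.schema`, a layout inferred from the `0.5-dev1/schemas/image.schema` link earlier in this thread; the function name `schema_path` is hypothetical.

```python
def schema_path(version: str, name: str = "image") -> str:
    """Build the repository-relative path of a schema from the version attribute."""
    # Assumed layout: <version>/schemas/<name>.schema
    return f"{version}/schemas/{name}.schema"


print(schema_path("0.5"))
```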

Changes to the spec and schemata for RFC-2

ff76581

will-moore reviewed Jun 1, 2024

typo

81d8742

will-moore reviewed Jun 3, 2024

will-moore reviewed Jun 3, 2024

will-moore mentioned this pull request Jun 13, 2024

V05 zarr v3 ome/ome-ngff-validator#34

Closed

normanrz mentioned this pull request Jun 14, 2024

Support zarr v3 #249

Open

normanrz added 2 commits June 28, 2024 14:28

change version in link

00371a3

adapt spec to RFC-2 revision

ee4ce14

normanrz mentioned this pull request Jul 2, 2024

RFC-2 revision 1 #250

Merged

will-moore mentioned this pull request Jul 3, 2024

V05 dev2 ome/ome-ngff-validator#36

Open

3 tasks

will-moore reviewed Jul 4, 2024

Update rfc/2/index.md

f1bab0c

LDeakin reviewed Jul 10, 2024

normanrz mentioned this pull request Jul 18, 2024

Add zarr3 streaming scalableminds/webknossos#7933

Closed

normanrz added 2 commits September 26, 2024 15:43

adds schema_url + changes the root object for the JSON schemas to 'attributes' within a zarr.json

4e4bf5a

rm schema_url

4df0940
