Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise ZEP0 #59

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Conversation

MSanKeys963
Copy link
Member

@MSanKeys963 MSanKeys963 commented Mar 4, 2024

Hi all. 👋🏻

This PR was last updated on: 10/15/2024

I'm proposing changes to our ZEP process, i.e. ZEP0, based on the discussions at #44, #52 and #55. Here are the major changes I'm proposing through this PR:

  1. Addition of new ZEP type under Specification ZEPs category → Protocol Extension ZEPs → Removed in 3ae1fc5. Please check Revise ZEP0 #59 (comment).

Until now, we have only been using Core ZEPs to make changes to the specification. Extension ZEPs would help define and clearly add new extensions to the specification via the extension mechanism.

  1. Adding a new ZEP category → Lean ZEPs → Removed in 81b7ca2. Please check Revise ZEP0 #59 (comment).

This category is suitable for adding significantly small and straightforward changes to the Zarr specification directly via a pull request.

  1. The process of submitting a ZEP

Until now, if anyone wishes to submit a ZEP proposal, they'd need to create two PRs, first in zeps repository and the other in zarr-specs repository. I've proposed simplifying this process by combining the zeps and zarr-specs repositories. Now, the author could submit only one PR to submit a ZEP proposal for the Zarr specification.

  1. Discussing a ZEP

As pointed out in #55, discussing a ZEP proposal in multiple issues and PR fragments the conversation, leading to a loss of motivation and momentum. I've simplified this and proposed that all the discussions occur in a single PR. I've also kept an option to open separate issues if the conversation becomes too long and hard to follow, but this depends on the need and the author's choice. But please keep in mind that all the conversation and issues will happen in zarr-specs repository.

  1. How does a ZEP become accepted?

This is mostly formalising the various phases I've outlined in #55 (comment). Until now, we have been using a voting mechanism to decide on the fate of the ZEP proposals. Over the past couple of years, we've realised we need more involved participation from the stakeholders. I've proposed to add a couple of phases before we start voting on the ZEPs.

  1. ZEP Format

I've proposed that the ZEP be in .rst instead of .md format, similar to the specification text format.

There is also a question regarding the versioning of the ZEP proposal, which are modified after acceptance, for example ZEP0. There's a suggestion of using semantic versioning by @LDeakin in #55 (comment), which is something we can discuss and adopt.
PS. If we adopt semantic versioning, this update would be ZEP0.1. :)

Many thanks to @joshmoore, @rabernat, @jbms and others for the initial discussions regarding the ZEP0 revision.

Also, I've been fiddling around to merge the zeps and zarr-specs repository and have come up with: https://docs-test-sanket.readthedocs.io/en/latest/. I'd appreciate any feedback or comments on the new webpage design.

Please let me know what you think of this proposal. Also, please send this to folks interested in this conversation.

Preview: https://zeps--59.org.readthedocs.build/en/59/active/ZEP0000.html

CC: @zarr-developers/implementation-council @zarr-developers/steering-council

functionality, data types and features to accommodate emerging and essential
use cases.

- **Lean ZEP**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example for a Lean ZEP? Where would adding a codec fall?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this comment. In particular, how will an author choose between an Extension ZEP and an Lean ZEP?

Copy link
Member Author

@MSanKeys963 MSanKeys963 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #59 (comment).


The PR should contain the narrative document explaining the need, importance,
and use case for the change of the ZEP and should be submitted in the
zarr-specs repository with the name `zep-<n>: <title>.rst` under
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any fans of RST here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to move to markdown.


In unusual cases, the [Zarr Steering Council] may be asked to decide whether a
controversial ZEP is `Accepted`.
1. **Stakeholder Engagement**:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reader might ask: Who are the stakeholders and how do I contact them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly the ZIC and ZSC but also the broader community.
The author is supposed to use the submitted PR to spread the word to everyone, but we also have Google groups set up for notification purposes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to defining stakeholders clearly.

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this Sanket! I think the introduction of the "lean ZEP" is a major improvement in process. Having all the changes and discussion in a single-threaded PR will make it much easier to discuss and review ZEPs.

I see no reason to limit the lean ZEP process to only minor changes. If anything, the more significant the proposal, the more important it is to be able to review and discuss clearly. I predict that, if we adopt this, nearly all future ZEPs will be declared as "lean ZEPs" mostly because it is easier for everyone involved. Who will be the arbiter of whether a ZEP is minor enough to qualify for a Lean ZEP? What will happen if someone submits a Lean ZEP which some people feel is scoped too large? Who will enforce turning it into a regular (old-style) ZEP?

So I would like the see the "lean ZEP" process simply become the process for all ZEPs, rather than the more complicated multiple PR / multiple repo process. I won't insist on that if there are others who think maintaining the original process is important. However, I'm curious who that is. Please 👍 this comment if you are in favor of simply making the "lean ZEP" process the default for all ZEPs. Or 👎 this comment if you would like to keep the multiple repo / multiple PR process for some ZEPs.

functionality, data types and features to accommodate emerging and essential
use cases.

- **Lean ZEP**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this comment. In particular, how will an author choose between an Extension ZEP and an Lean ZEP?


The PR should contain the narrative document explaining the need, importance,
and use case for the change of the ZEP and should be submitted in the
zarr-specs repository with the name `zep-<n>: <title>.rst` under
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to move to markdown.

The PR should contain the specific changes to the Zarr specification without
requiring a separate narrative document. As Lean ZEPs involve minor and
non-controversial changes, a narrative document and the use of the template and
instructions file are not necessary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a great way to do ZEPs. Everything in one place! 🎉

@normanrz
Copy link
Contributor

normanrz commented Mar 5, 2024

I agree with @rabernat. Lean ZEPs sound great!

My learning from the last ZEPs was that we need to engage the stakeholders earlier in the process and not only when presenting them with a final vote. Having the feedback earlier increases buy-in and reduces the number of post-vote changes. I don't think the solution to that problem is more phases and more votes in the process. On the contrary, I think we need to make the conversation easier to follow and actively reach out to ZIC and ZSC. That is why the Lean ZEP approach is appealing. It simplifies the process/paperwork and streamlines the conversation.
In any case, the ZEP process only works with the expectation that members of the ZIC and ZSC bring enough time and attention to follow the conversations.

I still think we need a mechanism of forcing a vote, though. Consensus seems ill-defined in a distributed, volunteer-driven community.

@normanrz
Copy link
Contributor

normanrz commented Mar 5, 2024

There is also a question regarding the versioning of the ZEP proposal, which are modified after acceptance, for example ZEP0. There's a suggestion of using semantic versioning by @LDeakin in #55 (comment), which is something we can discuss and adopt.
PS. If we adopt semantic versioning, this update would be ZEP0.1. :)

In the spirit of copying PRs, I would advocate for keeping linear ZEP numbering. The ZEP process is its own artifact that gets created and modified through ZEPs. ZEP 0 introduced the process. This proposed modification should become its own ZEP.

The wish for ZEP versioning almost seems like an artifact of a too complex ZEP process. If it were as easy to make and approve ZEPs like a PR, we could just introduce changes like the sharding "index_location" in a new ZEP.

@jbms
Copy link

jbms commented Mar 5, 2024

Regarding versioning, as I see it, a "ZEP" would just be the additional process taken for reviewing a non-editorial/non-trivial spec change (i.e. PR). Therefore, there would be no reason to version a "ZEP" except perhaps in its draft state, where you could potentially identify versions by git commit hashes, because once it is accepted it is merged into the spec and any further revisions would be done as a separate PR. In fact I would propose that the ZEP number be assigned based on the PR number.

For example, let's say we add a "pack_bits" codec in PR 1234, that initially only supports "bool" data type. Then someone in PR 1255 adds support for a "uint4" data type and modifies the description of the "pack_bits" codec to also support "uint4". When we document what features are supported by each implementation, if not already explicit we could at this point update the descriptions to indicate that certain implementations support pack_bits with bool only while others support both bool and uint4.

It might be useful to define as part of the spec a precise way to indicate support for various features. For example we could say that support for a given data type can be indicated by the "codecs/pack_bits/<data_type>" feature identifier. I'm not sure how important it is to specify these feature flags precisely, though.

@normanrz
Copy link
Contributor

normanrz commented Mar 5, 2024

I think it is important to separate support for a feature of the Zarr specification in an implemention from the ZEP that introduced the feature.

It might be useful to define as part of the spec a precise way to indicate support for various features. For example we could say that support for a given data type can be indicated by the "codecs/pack_bits/<data_type>" feature identifier. I'm not sure how important it is to specify these feature flags precisely, though.

I think an implementation needs to be able to decide during metadata parsing whether it can correctly understand a Zarr array/group. If it encounters unknown attributes, values or identifiers that would already indicate a lack of support. Basically, that means that new changes can only be additive (e.g. new attributes on existing codecs, new values for attributes, new codecs).

@jbms
Copy link

jbms commented Mar 5, 2024

I think it is important to separate support for a feature of the Zarr specification in an implemention from the ZEP that introduced the feature.

It might be useful to define as part of the spec a precise way to indicate support for various features. For example we could say that support for a given data type can be indicated by the "codecs/pack_bits/<data_type>" feature identifier. I'm not sure how important it is to specify these feature flags precisely, though.

I think an implementation needs to be able to decide during metadata parsing whether it can correctly understand a Zarr array/group. If it encounters unknown attributes, values or identifiers that would already indicate a lack of support. Basically, that means that new changes can only be additive (e.g. new attributes on existing codecs, new values for attributes, new codecs).

To be clear, I was imagining that these feature identifiers would be solely for documentation purposes, similar to caniuse.com, not that they would appear in the metadata anywhere.

@MSanKeys963
Copy link
Member Author

Thanks all for the feedback so far.

Here are my responses:

Who will be the arbiter of whether a ZEP is minor enough to qualify for a Lean ZEP? What will happen if someone submits a Lean ZEP which some people feel is scoped too large? Who will enforce turning it into a regular (old-style) ZEP?

@rabernat: The changes which are minor and non-controversial would fall under the Lean ZEP category, e.g. zarr-developers/zarr-specs#256 or adding a new data type. The ZEP, which proposes a significant change to the specification and requires a reference implementation of the sort, should not be submitted under Lean ZEP, e.g. major specification version update (V3 → V4), extensions like sharding, variable chunking, etc.
In the ZEP workflow section, the author has various options to ask around if their changes are minor enough to be considered under a Lean ZEP.

Categorising all ZEPs under Lean might lead to less involved participation in an important proposal from everyone involved. The reading and implementation phases ensure that the stakeholders are given enough time to read and work on implementing the proposal before it is finalised. I've outlined various issues we've encountered here: #55 (comment).

I've also proposed removing the option for multiple repo/PR processes so that everything would take place in the zarr-specs repository.

I don't think the solution to that problem is more phases and more votes in the process.

@normanrz: Why do you think a multi-phase approach would not help have more engaged participation from the stakeholders?

@normanrz
Copy link
Contributor

e.g. zarr-developers/zarr-specs#256 or adding a new data type

Adding data types or codecs can be controversial, too. I don't think it is possible to know a priori. That is why a community discussion is needed for every ZEP. PRs are a great tool for that.

requires a reference implementation of the sort

Shouldn't all ZEPs come with a reference implementation or at least a proof-of-concept?

I don't think the solution to that problem is more phases and more votes in the process.

@normanrz: Why do you think a multi-phase approach would not help have more engaged participation from the stakeholders?

Everybody in the Zarr community is volunteering and very busy. Multi-phase processes require a lot of effort to understand and to follow. I might be wrong but I think that more complex processes can be intimidating and annoying both on the side of the ZEP authors, who need to chase the feedback/votes, and on the side of the community members, who are frequently pressured to issue feedback/votes.

@MSanKeys963
Copy link
Member Author

Resurrecting this PR after having discussions and feedback from various community and ZEP meetings. I have extended the scope of Lean ZEPs to add breaking changes if there is unanimous approval from participating stakeholders. This change is specifically targeted to include zarr-developers/zarr-specs#292 and other similar changes in the future.

@normanrz, here are my responses to your last comment —

Adding data types or codecs can be controversial, too. I don't think it is possible to know a priori. That is why a community discussion is needed for every ZEP. PRs are a great tool for that.

It can be. That's why it's recommended and preferred that an issue in the zarr-specs repo be created to have a community discussion before working on the actual PR for a ZEP; see here - 3rd para.

Shouldn't all ZEPs come with a reference implementation or at least a proof-of-concept?

ZEPs which add new functionality by changing or extending the existing specification should be accompanied by an implementation/POC.
Other ZEPs like Lean, which includes minor changes (addition or removal of non-controversial changes) or Informational or Process, don't require a POC.

complex processes can be intimidating and annoying both on the side of the ZEP authors, who need to chase the feedback/votes, and on the side of the community members, who are frequently pressured to issue feedback/votes.

I understand that the phased approach requires involved participation from the active maintainers and implementors. I've outlined issues we've faced with the existing version of the ZEP process at #55 (comment) during ZEP1. We need a mechanism to avoid repeating those issues. I think the phased approach is one solution that could help us. If you have suggestions, let's discuss them. I'm all ears. :)

Also, having all the discussions and comments in a single PR for the entirety of a ZEP proposal would be convenient for the author to facilitate feedback and votes from everyone involved.

The SPEC ZEP (abbreviation of Specification ZEP) for now introduces changes in
the core specification of Zarr. The changes related to core specification follow a
Specification ZEPs deal with the changes related to [Zarr Specifications]. The
SPEC ZEP (abbreviation of Specification ZEP) introduces changes to the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless "SPEC" is an acronym, it should not be all caps.

strict and thorough review process and should be adopted by everyone in the
Zarr community.
Zarr community. Specification ZEPs are of two types. They are:

- **Core protocol ZEP**
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the difference between a specification zep and a core protocol zep?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I see, a core protocol zep is a subtype of a specification zep. What does the word "protocol" mean here? Are we just using that word to avoid re-using the word "specification"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'protocol' terminology was coined when we first introduced the idea of the ZEP process back in 2022.

Are we just using that word to avoid re-using the word "specification"?

I'm not sure about that. IIRC, the term 'protocol' was used heavily during V3 development in early 2019. I have no strong opinions about 'Protocol'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:+1 to eliminating Protocol and sticking with Specification.

@normanrz
Copy link
Contributor

Thanks for addressing some of our feedback and revising the ZEP.

I think the lean ZEPs help to move the process in the right direction. However, a main problem we are currently facing is lack of (timely) responses from the council members. Without solving that problem, I don't think we will see significant improvements to the process. So, I think it is important to also move zarr-developers/governance#44 forward.

@d-v-b
Copy link

d-v-b commented Aug 26, 2024

It seems like a "lean ZEP" as defined in this PR is a spec change that gets approved without a formal ZEP process, and so I would argue that a "lean ZEP" is is not a ZEP, or at least it communicates very little to categorize such changes as ZEPs.

Instead of requiring that every spec change fall into some ZEP category, It would be simpler to reserve the concept of "ZEP" for formal proposals that do observe a ZEP process. Instead of defining a "lean ZEP", this document can simply state that not all spec changes happen via ZEPs -- some spec changes, like typo fixes, or even semantically significant changes, can be proposed, discussed, and accepted via github PRs provided there is an exceptional level of consensus, bypassing the ZEP process completely.

@normanrz wrote:

I think the lean ZEPs help to move the process in the right direction. However, a main problem we are currently facing is lack of (timely) responses from the council members. Without solving that problem, I don't think we will see significant improvements to the process. So, I think it is important to also move zarr-developers/governance#44 forward.

I agree with this completely. We are not bottlenecked by the lack of new ZEP categories (although it's important to define those carefully). We are bottlenecked because there is not enough developer time going into the spec.

@jhamman
Copy link
Member

jhamman commented Aug 26, 2024

We are not bottlenecked by the lack of new ZEP categories (although it's important to define those carefully).

Extrapolating on this a bit, let me introduce a new idea here. Rather than adding additional ZEP categories, we do away with all but core-protocol ZEPs, reserving these for major and/or breaking changes to the specification. All other changes (e.g. bug fixes, clarifications) and/or extensions to the specification can be done as pull requests directly to the spec repo and merged through a much simpler process (ideally managed entirely by the spec core team -- not the ZEP process).

@joshmoore
Copy link
Member

I think the primary additions I'd make to your comment, @jhamman, would be:

  • We could probably use a pretty good common definition, or at least understanding, of what a breaking specification change is, including from provisional versions. Ultimately, this might come down to a question of versioning.
  • For extensions, I think it's important to separate between the implenentation of extensions (adding a codec ✅) and their definition. The latter is designing an interface and I think having wider feedback on those would be import for the health of the ecosystem (cf. Added consolidated metadata to spec zarr-specs#309 (comment))

@MSanKeys963
Copy link
Member Author

Thanks, everyone, for your feedback.

I understand the importance of adding changes to the spec via GitHub PRs. I'm in favour, but I'd like to keep the Lean ZEPs category in the process, as it'll help us maintain a record of changes that have gone in the spec in a readable manner — the original idea I proposed in this PR to combine zarr-specs and ZEP repository and maintain a single website for both of them (sample here). The webpage would host and display a list of Lean ZEPs (which would essentially grab text from the PRs, i.e. lean ZEPs).

Let me know what you think. ^


The discussion at zarr-developers/governance#44 is important, and I like what Josh proposed in zarr-developers/governance#44 (comment).

I'd like to propose merging this PR unless strong objections exist to this current version. Successive PRs can be added for more changes or to reflect the progress/conclusion from zarr-developers/governance#44.

active/ZEP0000.md Outdated Show resolved Hide resolved
active/ZEP0000.md Outdated Show resolved Hide resolved
active/ZEP0000.md Outdated Show resolved Hide resolved
@d-v-b
Copy link

d-v-b commented Sep 20, 2024

I'd like to propose merging this PR unless strong objections exist to this current version.

I think @jhamman's proposal to slim down the ZEP process counts for me as a strong objection. We should at least discuss that as a viable alternative to this PR.

@MSanKeys963 if the main goal is to record all spec changes on the zeps web page, then this can be done without introducing a new ZEP category -- we could display ZEPs and PRs together on that site, with a clear indication of which is which.

@MSanKeys963
Copy link
Member Author

MSanKeys963 commented Oct 9, 2024

As per our @jhamman's #59 (comment), and discussion in the last community and ZEP meeting (notes here) with @joshmoore and @d-v-b — I've removed Lean ZEPs category in 81b7ca2.

Also, see: #59 (comment).

We discussed that zarr-developers/governance#44, when finalised, would mention when the Zarr-Specs core devs think it's necessary to follow the ZEP0 process to change the specification. The definition of these changes is yet to be decided, but they're mostly going to revolve around major changes like incrementing the specification version, change that involves participation from multiple implementations, and similar things.

All other changes to the specifications would be done simply via a GitHub PRs. These PRs would be approved and merged by the Zarr-Specs core devs team (when formed).

I've added multiple comments in the markdown document like:
<!-- Mention Zarr-Spec devs in line: x-y, after zarr-developers/governance/#44 is finalised --> to change the text when the time comes.

Please review. Thanks!

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few initial comments

active/ZEP0000.md Outdated Show resolved Hide resolved
template/template.md Outdated Show resolved Hide resolved
active/ZEP0000.md Outdated Show resolved Hide resolved
MSanKeys963 and others added 2 commits October 11, 2024 01:13
@joshmoore
Copy link
Member

@rabernat, @jakirkham, @alimanfoo, @ryan-williams: are there any changes you would like to see here?

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great progress. Nice work @MSanKeys963!

I am excited by the simplifications this will bring.

My main feedback

  • Let's clarify whether Specification ZEPs generate a new spec major version
  • Let's eliminate extension ZEPs entirely

Meta question: should this change be done via a ZEP? 🤣

active/ZEP0000.md Outdated Show resolved Hide resolved
strict and thorough review process and should be adopted by everyone in the
Zarr community.
Zarr community. Specification ZEPs are of two types. They are:

- **Core protocol ZEP**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:+1 to eliminating Protocol and sticking with Specification.

active/ZEP0000.md Outdated Show resolved Hide resolved
active/ZEP0000.md Outdated Show resolved Hide resolved
active/ZEP0000.md Show resolved Hide resolved

In unusual cases, the [Zarr Steering Council] may be asked to decide whether a
controversial ZEP is `Accepted`.
1. **Stakeholder Engagement**:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to defining stakeholders clearly.

active/ZEP0000.md Outdated Show resolved Hide resolved
@MSanKeys963
Copy link
Member Author

MSanKeys963 commented Oct 15, 2024

Thanks for your reviews, @joshmoore and @rabernat!

As per your feedback, I've removed extension ZEPs entirely as we're working on defining a new way to add extensions — currently under discussion in zarr-developers/zarr-specs#312 (comment) and zarr-developers/zarr-specs#316.

I've also removed the SPEC ZEPs categorisation as we previously had core and extension ZEPs under the category. Now, there is only Core ZEPs.

Also, every core ZEP (removed the term 'protocol') will lead to a major version increment of the Zarr specification.


Meta question: should this change be done via a ZEP? 🤣

@rabernat: Instead of a new ZEP, let's use this PR to add the changes so we can move ahead! ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants