
[WebNN EP] TFLite backend only supports limit ranges for Clip #20863

Merged (2 commits) Jun 6, 2024

Conversation

Honry
Contributor

@Honry Honry commented May 30, 2024

No description provided.

@Honry
Contributor Author

Honry commented May 30, 2024

@guschmue, @fs-eire, @fdwr, PTAL, thanks!

@fs-eire
Contributor

fs-eire commented May 30, 2024

It looks to me that this change prioritizes implementation details over the standard specification. Although WebNN currently uses TFLite, that does not mean it will always use it in every environment, and even if it did, a future version of TFLite may well support arbitrary min/max attributes for Clip/Clamp.

We cannot control which browser versions and platforms users run, so they may be on a different version. This is why a standard matters: in my understanding, the code should stick to the spec as much as possible, rather than to the status of one particular underlying implementation.

A few other PRs are doing something similar, and I assume that is because Chromium WebNN's underlying CPU engine switched from XNNPACK to TFLite. I understand there are technical reasons to do this to make WebNN work end-to-end. In the long term, however, it would be better to give users a way to know the gap between the spec and the implementation, whether via feature detection through a WebNN API or by introducing a WebNN version. A developer should have a better way to know whether a node is supported in the current WebNN environment than tracking the detailed status of the underlying engine, which is supposed to be transparent.

@Honry
Contributor Author

Honry commented May 30, 2024

> It looks to me that this change prioritizes implementation details over the standard specification. Although WebNN currently uses TFLite, that does not mean it will always use it in every environment, and even if it did, a future version of TFLite may well support arbitrary min/max attributes for Clip/Clamp.
>
> We cannot control which browser versions and platforms users run, so they may be on a different version. This is why a standard matters: in my understanding, the code should stick to the spec as much as possible, rather than to the status of one particular underlying implementation.
>
> A few other PRs are doing something similar, and I assume that is because Chromium WebNN's underlying CPU engine switched from XNNPACK to TFLite. I understand there are technical reasons to do this to make WebNN work end-to-end. In the long term, however, it would be better to give users a way to know the gap between the spec and the implementation, whether via feature detection through a WebNN API or by introducing a WebNN version. A developer should have a better way to know whether a node is supported in the current WebNN environment than tracking the detailed status of the underlying engine, which is supposed to be transparent.

Thanks @fs-eire, very good point! It is really painful for users at the current stage, as both the WebNN spec and its implementation are still evolving rapidly. @fujunwei is working on these unsupported constraints, filling the gaps by emulation or other methods.

We will eventually close the gap, and this table will be kept up to date with the latest op support status and constraints.

In the long term, once everything is stable and the gap is small (e.g. once WebNN passes the Origin Trial in Chromium), we can give users more change information along with the Chrome version. WDYT?

cc/ @huningxin, @ibelem

@fs-eire
Contributor

fs-eire commented May 30, 2024

> Thanks @fs-eire, very good point! It is really painful for users at the current stage, as both the WebNN spec and its implementation are still evolving rapidly. @fujunwei is working on these unsupported constraints, filling the gaps by emulation or other methods.
>
> We will eventually close the gap, and this table will be kept up to date with the latest op support status and constraints.
>
> In the long term, once everything is stable and the gap is small (e.g. once WebNN passes the Origin Trial in Chromium), we can give users more change information along with the Chrome version. WDYT?

Thank you for the information.

Is there an existing discussion or proposal about:

  • APIs that allow users to query the implementation status
  • versioning

I think the gap may always exist because everything is moving: the spec will update and accept new operators (and maybe deprecate less-used ones), and the implementation will upgrade too. So I am not optimistic about "everything is stable", and a standard way to manage the gap seems more reasonable to me.

@Honry
Contributor Author

Honry commented May 30, 2024

> Is there an existing discussion or proposal about:
>
>   • APIs that allow users to query the implementation status

This issue webmachinelearning/webnn#463 discusses exposing operator/type support status for each backend.

>   • versioning

I am not aware of any discussion about versioning. @huningxin, do you know of one?

> I think the gap may always exist because everything is moving: the spec will update and accept new operators (and maybe deprecate less-used ones), and the implementation will upgrade too. So I am not optimistic about "everything is stable", and a standard way to manage the gap seems more reasonable to me.

You are right; I mean we can record more change information once the gap is smaller and changes are less frequent.

We have a WebNN Status page that tracks the implementation status of each op for each backend, along with the Chrome version.

Maybe we can add Chrome and ORT-Web version info to this table in the future.

@Honry
Contributor Author

Honry commented May 30, 2024

BTW, @fs-eire, do you know how the WebGPU EP manages versioning?

@huningxin

huningxin commented May 30, 2024

@fs-eire, thanks for your feedback!

> This is why a standard matters: in my understanding, the code should stick to the spec as much as possible, rather than to the status of one particular underlying implementation.

+1

> it is still possible that a future version of TFLite may support arbitrary min/max attributes of Clip/Clamp.

Before an underlying runtime like TFLite gains that support, the WebNN implementation can emulate the operator by composition. In the WebNN spec, every operator that can be decomposed comes with emulation sample code. For example, clamp can be emulated with the min and max operators, as in the following sample:

```js
if (options.minValue === undefined) {
  if (options.maxValue === undefined) {
    return input;
  } else {
    return builder.min(input, builder.constant(options.maxValue));
  }
} else {
  if (options.maxValue === undefined) {
    return builder.max(input, builder.constant(options.minValue));
  } else {
    return builder.min(
        builder.max(input, builder.constant(options.minValue)),
        builder.constant(options.maxValue));
  }
}
```

The Chromium implementation ([TFLite] Support other range for Clamp operator) can refer to the above sample code for the decomposition, so frameworks should stick to the spec for consistency.

>   • APIs that allow users to query the implementation status
>
> This issue webmachinelearning/webnn#463 discusses exposing operator/type support status for each backend.

Correct, webmachinelearning/webnn#463 is the right place for the feature-detection discussion.

>   • versioning
>
> I am not aware of any discussion about versioning. @huningxin, do you know of one?

Generally, Web specs don't have versioning, and the WebNN spec follows the same principle. The WebNN spec is now in CR status for browser prototyping and developer preview; this kind of interop feedback is great input to the WG. Once the spec reaches a more stable status, browser implementations should stick to the latest released version and maintain backward compatibility.

@fs-eire
Contributor

fs-eire commented May 31, 2024

@Honry @huningxin thank you for your detailed explanation.

> https://webmachinelearning.github.io/webnn-status/

This link is helpful. Perhaps I can add it somewhere in the ORT documentation.

> how does the WebGPU EP manage versioning?

Technically, WebGPU does feature detection based on "capabilities". For example, f16 support can be detected with this API. Good to know that WebNN has this discussion: webmachinelearning/webnn#463.
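The capability-style detection described above can be sketched as follows. This is a minimal illustration, not production code: `hasFeature` is a hypothetical pure helper (so the decision logic is testable without a GPU), and `detectShaderF16` assumes a browser environment exposing `navigator.gpu`.

```javascript
// Hypothetical helper: `features` is any Set-like collection,
// e.g. a GPUAdapter.features (GPUSupportedFeatures) object.
function hasFeature(features, name) {
  return features.has(name);
}

// Sketch of f16 capability detection in a WebGPU-enabled browser.
async function detectShaderF16() {
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return false; // no WebGPU adapter available
  return hasFeature(adapter.features, "shader-f16");
}
```

The point of the comparison is that an execution provider can query a small set of named capabilities up front instead of probing the behavior of individual operations.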

@guschmue guschmue added the ep:WebNN WebNN execution provider label Jun 3, 2024
@guschmue
Contributor

guschmue commented Jun 3, 2024

I think it is clear that versioning for WebNN needs work.
I am not sure comparing with WebGPU is right: WebGPU operates at a lower level, and its surface is a set of feature-level capabilities the EP can use, while WebNN has N ops, each of which can change between versions.

@guschmue
Contributor

guschmue commented Jun 3, 2024

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue
Contributor

guschmue commented Jun 3, 2024

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue
Contributor

guschmue commented Jun 3, 2024

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

Azure Pipelines successfully started running 3 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

Azure Pipelines successfully started running 7 pipeline(s).

@fdwr
Contributor

fdwr commented Jun 4, 2024

Wanming, can we implement the min/max decomposition in ORT when using the WebNN CPU backend, then remove it when Junwei implements it in Chromium's CPU backend? Then we'd have fewer fragmented partitions, and it would be closer to the expected end result. In any case, I'm glad you added the Chromium todo comment, and I still approve knowing that proper support in the web API is coming soon.

@Honry
Contributor Author

Honry commented Jun 4, 2024

> Wanming, can we implement the min/max decomposition in ORT when using the WebNN CPU backend, then remove it when Junwei implements it in Chromium's CPU backend? Then we'd have fewer fragmented partitions, and it would be closer to the expected end result. In any case, I'm glad you added the Chromium todo comment, and I still approve knowing that proper support in the web API is coming soon.

👍Good point, I will follow up.
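For illustration, the decomposition fdwr suggests could be planned as below. This is a hedged sketch, not actual ORT WebNN EP code: `planClipDecomposition` is a hypothetical pure helper that decides which WebNN ops to emit for a Clip node, mirroring the spec's clamp emulation sample quoted earlier in this thread.

```javascript
// Hypothetical planner: given Clip's optional min/max attributes, return
// the sequence of WebNN ops needed to emulate it. Mirrors the WebNN spec's
// clamp emulation: max applies the lower bound, min applies the upper bound.
function planClipDecomposition(minValue, maxValue) {
  if (minValue === undefined && maxValue === undefined) return ["identity"];
  if (minValue === undefined) return ["min"];  // clip from above only
  if (maxValue === undefined) return ["max"];  // clip from below only
  return ["max", "min"];                       // lower bound first, then upper
}
```

An EP could then build the actual graph with `builder.max`/`builder.min` calls following this plan, and delete the whole path once Chromium's CPU backend supports arbitrary Clamp ranges natively.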

@guschmue guschmue merged commit da1f8f9 into microsoft:main Jun 6, 2024
70 checks passed