Approximate conversion of bmv2 p4 performance to hardware performance #1234

RithvikChuppala · 2024-03-13T20:26:27Z

I know that bmv2's performance is not production-grade and that many hardware-dependent factors are involved, but is there an approximate conversion factor or methodology for estimating how a P4 program's performance on bmv2's software switch would translate to a production hardware switch? Something based on clock cycles, CPU utilization, etc.?

jafingerhut · 2024-03-13T21:34:17Z

It really depends upon the production switch you have in mind.

For example, Tofino's hardware architecture is such that at a basic introductory level you can say it has the following performance model:

If your P4 program fits into one pass, it operates at X billion packets per second of throughput, guaranteed
If your P4 program does not fit into one pass, it operates at 0 packets per second of throughput, guaranteed

Now of course you can get more nuanced than that, by allowing P4 programs that explicitly recirculate packets, and have other operating points like this:

If your P4 program fits into K passes, it operates at X billion packets per second, but only a fraction 1/K of the ports are usable, with the rest dedicated as recirculation ports
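As a rough illustration of the operating points above, here is a minimal Python sketch. The function names and the `allow_recirculation` flag are hypothetical, and the last function assumes traffic is spread evenly across ports so that externally visible throughput scales with the usable-port fraction. It restates the introductory model only and is not a description of any real device's behavior.

```python
from fractions import Fraction


def usable_port_fraction(passes_needed: int, allow_recirculation: bool = True) -> Fraction:
    """Fraction of ports left for external traffic under the simple model.

    passes_needed is K, the number of pipeline passes the program requires.
    """
    if passes_needed < 1:
        raise ValueError("passes_needed must be at least 1")
    if passes_needed == 1:
        # Program fits in one pass: every port is usable.
        return Fraction(1)
    if not allow_recirculation:
        # Program does not fit and may not recirculate: it cannot run at all.
        return Fraction(0)
    # K passes: 1/K of the ports stay usable, the rest serve recirculation.
    return Fraction(1, passes_needed)


def external_throughput_bpps(line_rate_bpps: float, passes_needed: int,
                             allow_recirculation: bool = True) -> float:
    """Externally visible throughput in billions of packets per second.

    Assumption (not from the thread): traffic is spread evenly over ports,
    so external throughput is X times the usable-port fraction.
    """
    return line_rate_bpps * float(usable_port_fraction(passes_needed, allow_recirculation))
```

For example, with a hypothetical X of 4.0, a program needing two passes would leave half the ports usable, so `external_throughput_bpps(4.0, 2)` gives 2.0 under these assumptions.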

There are other hardware architectures where performance degrades more gradually than that if you go "a little bit over" the budget of what can be done at X billion packets per second.

Some will have caches between the packet processing core and DRAM, and then cache hit rates play a huge part in the throughput and latency.
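To see why cache hit rate matters so much on such architectures, here is a back-of-the-envelope Python sketch. All names and numbers are hypothetical, and it assumes a single core that issues one memory access at a time (no overlapping accesses), which real devices may not match.

```python
def avg_access_ns(hit_rate: float, cache_ns: float, dram_ns: float) -> float:
    """Average memory access time, weighted by the cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns


def packets_per_second(hit_rate: float, cache_ns: float, dram_ns: float,
                       lookups_per_packet: int) -> float:
    """Upper bound on per-core packet rate when memory lookups dominate.

    Assumption: lookups are strictly sequential, so time per packet is
    lookups_per_packet times the average access time.
    """
    per_packet_ns = lookups_per_packet * avg_access_ns(hit_rate, cache_ns, dram_ns)
    return 1e9 / per_packet_ns
```

With a hypothetical 2 ns cache and 100 ns DRAM, dropping the hit rate from 100% to 50% raises the average access time from 2 ns to 51 ns, which cuts the estimated packet rate by about 25x in this model.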

Sorry I can't give you a more specific answer, but if you dive at least a bit into two different-enough hardware architectures, you will start to see more of the reasons that "it depends" is the correct answer.

RithvikChuppala · 2024-03-13T21:42:23Z

Thanks for the quick reply!

For my use case, I'm implementing packet processing functionality to perform tunneling (stripping tunnel headers, adding new egress headers, etc.). I aim to show that running this packet processing in a programmable switch improves throughput and latency compared to the usual software-based approach.

However, since bmv2 isn't an accurate representation of performance, what proxy metric for ideal hardware performance do you think makes the most sense?

jafingerhut · 2024-03-13T23:17:19Z

If you can, the best approach is to implement it and measure the relevant performance metrics on a real hardware device.

If for some reason that is not possible, then the next best thing is to learn about some hardware device in enough detail that you can make a good educated guess what the performance metrics would be.

github-actions · 2024-09-10T00:02:53Z

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days

github-actions bot added the lifecycle/stale label Sep 10, 2024
