Approximate conversion of bmv2 p4 performance to hardware performance #1234

Open
RithvikChuppala opened this issue Mar 13, 2024 · 4 comments

@RithvikChuppala
RithvikChuppala commented Mar 13, 2024

I know that bmv2's performance is not production-grade and that many factors are hardware-dependent, but is there an approximate conversion factor or methodology for gauging how a P4 program's performance on the bmv2 software switch relates to its performance on a production hardware switch? Something based on clock cycles, CPU utilization, etc.?

@jafingerhut
Contributor

It really depends upon the production switch you have in mind.

For example, Tofino's hardware architecture is such that at a basic introductory level you can say it has the following performance model:

  • If your P4 program fits into one pass, it operates at X billion packets per second of throughput, guaranteed
  • If your P4 program does not fit into one pass, it operates at 0 packets per second of throughput, guaranteed

Now of course you can get more nuanced than that, by allowing P4 programs that explicitly recirculate packets, and have other operating points like this:

  • If your P4 program fits into K passes, it operates at X billion packets per second, but only for a fraction 1/K of the ports being usable, with the rest being dedicated as recirculation ports
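As a rough illustration of the operating points above, here is a minimal Python sketch. The function name `pass_model` and its parameters are made up for this example and are not part of any Tofino tool or API; it only encodes the simplified "fits or does not fit" model described in this comment.

```python
def pass_model(passes_needed, pipeline_pps, max_passes=1):
    """Toy model of a fixed-pipeline switch (hypothetical helper).

    passes_needed: passes the P4 program needs per packet (K).
    pipeline_pps:  pipeline rate in packets per second (X).
    max_passes:    passes you are willing to allow via recirculation.

    Returns (pipeline rate, fraction of ports usable for traffic).
    """
    if passes_needed < 1 or max_passes < 1:
        raise ValueError("pass counts must be at least 1")
    if passes_needed > max_passes:
        # The program does not fit: it does not run at all.
        return (0.0, 0.0)
    # The pipeline still runs at full rate, but only 1/K of the
    # ports carry traffic; the rest are used for recirculation.
    return (float(pipeline_pps), 1.0 / passes_needed)


if __name__ == "__main__":
    print(pass_model(1, 1e9))                # fits in one pass
    print(pass_model(2, 1e9))                # does not fit, no recirculation
    print(pass_model(2, 1e9, max_passes=2))  # two passes, half the ports
```

Note that nothing in this model depends on how long bmv2 takes to run the same program, which is why a conversion factor from bmv2 numbers does not exist for this kind of target.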

There are other hardware architectures where the performance will degrade more gradually than that, if you go "a little bit over" the budget of what can be done at X billion packets per second.
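One simple way to picture gradual degradation is sketched below. It assumes throughput scales inversely with per-packet work once the budget is exceeded, which is only an assumption for illustration; real devices may degrade in other ways. The name `gradual_throughput` and the unit of "work" are hypothetical.

```python
def gradual_throughput(work_per_packet, budget_per_packet, peak_pps):
    """Toy model of a target that slows down instead of failing.

    work_per_packet:   per-packet work the program needs
                       (for example cycles or instructions).
    budget_per_packet: per-packet work available at peak rate.
    peak_pps:          packets per second when within budget.

    Assumption: beyond the budget, throughput drops in proportion
    to budget / work.
    """
    if work_per_packet <= 0 or budget_per_packet <= 0:
        raise ValueError("work and budget must be positive")
    return peak_pps * min(1.0, budget_per_packet / work_per_packet)


if __name__ == "__main__":
    # 10% over budget gives roughly 91% of peak under this assumption.
    print(gradual_throughput(110, 100, 1e9))
```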

Some will have caches between the packet processing core and DRAM, and then cache hit rates play a huge part in the throughput and latency.
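To see why hit rate matters so much, here is a small sketch using the standard weighted-average access time. The latency numbers are placeholders chosen for the example, not measurements from any device, and the function name is made up.

```python
def avg_lookup_ns(hit_rate, hit_ns, miss_ns):
    """Average table lookup latency for a cache in front of DRAM.

    hit_rate: fraction of lookups served from the cache (0.0 to 1.0).
    hit_ns:   latency of a cache hit, in nanoseconds.
    miss_ns:  latency of a miss that goes to DRAM, in nanoseconds.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns


if __name__ == "__main__":
    # Placeholder latencies: 10 ns hit, 100 ns miss.
    for rate in (1.0, 0.9, 0.5):
        print(rate, avg_lookup_ns(rate, 10.0, 100.0))
```

With these placeholder numbers, dropping from a 100% to a 90% hit rate nearly doubles the average lookup latency, so the same program can perform very differently depending on the traffic it sees.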

Sorry I can't give you a more specific answer, but if you dive at least a bit into two different-enough hardware architectures, you will start to see more of the reasons that "it depends" is the correct answer.

@RithvikChuppala
Author

Thanks for the quick reply!

For my use case, I'm implementing packet processing functionality to perform tunneling (stripping tunnel headers, adding new egress headers, etc.). I aim to show that executing this functionality in a programmable switch improves throughput and latency compared to the usual software-based approach.

However, since bmv2 isn't an accurate representation of performance, what proxy metric for ideal hardware performance do you think makes the most sense?

@jafingerhut
Contributor

If you can, the truly best measure is to implement it and measure the relevant performance metrics on a real hardware device.

If for some reason that is not possible, then the next best thing is to learn about some hardware device in enough detail that you can make a good educated guess what the performance metrics would be.

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days
