4.7 million prims usdstage::load is slow even if multithreaded #3351
Comments
Filed as internal issue #USD-10263
Hey, congrats on NSI being so awesome, @pberto! USD has to do quite a lot of work that I don't expect a flat NSI does. A good chunk of it comes from the fact that USD prepares for authoring/editing in addition to just extracting the data. A project we are hoping to get to in the next year is allowing clients to open a UsdStage without (or while deferring) the cost of computing and building all of those editing dependencies, which I expect should help here, though we haven't yet done measurements to guide our expectations.
I prefer "without" over "defer", @spiffmon :-) Actually, thanks to this test I think NSI reading will become multi-threaded in the future (to the extent that is possible). Here are some stats just using USD:
This is the
Btw, for completeness' sake 🍶, here are the stats with everything set at
Any feedback here? Spending 100 seconds to build the stage is a bit scary, hehe. I'd like to know what the plan is in the medium term.
So I have a scene with ~4.7M prims, of which just 0.5M are instances.
The time to create the stage is 40 seconds. I see all cores spinning, so USD is obviously trying to read it in parallel, as `UsdStage::Load` should. Now, if I dump the whole scene to an NSI file and load it on a single core, it also takes 40 seconds. So it seems there is room for USD to do a better job with multi-threaded reading ;-)
The scene is the Activision Caldera scene with full geom on all the assets prior to `over "st_main"`. I also switched off the layers with player data, so you can replicate this easily. Again, the problem here is that the parallelized `UsdStage::Load` does not seem very efficient despite using all cores. It would be nice to understand why. I can send you the `.usda` file if needed.
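For anyone trying to reproduce the numbers, here is a minimal sketch of how the measurement could be split up with the USD Python bindings. It is not the script used for the stats in this thread. The helper names `timed` and `profile_stage`, the `thread_limit` parameter, and the file name in the usage line are all made up for illustration. It assumes `pxr` is importable and that the thread limit is set before USD has done any threaded work.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (elapsed_seconds, result) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result


def profile_stage(path, thread_limit=None):
    """Time stage open and payload load separately, then count prims.

    `path` and `thread_limit` are illustrative parameters (assumptions),
    not something taken from the issue.
    """
    # Imported lazily so the timing helper above works without USD installed.
    from pxr import Usd, Work

    if thread_limit is not None:
        # Caps the number of threads USD's work library may use.
        Work.SetConcurrencyLimit(thread_limit)

    # Open without loading payloads, so the open cost is measured on its own.
    open_s, stage = timed(Usd.Stage.Open, path, Usd.Stage.LoadNone)
    # Then load everything from the root (Python counterpart of UsdStage::Load).
    load_s, _ = timed(stage.Load)

    # Default traversal: active, loaded, defined, non-abstract prims only;
    # it does not descend into instance proxies.
    total = instances = 0
    for prim in stage.Traverse():
        total += 1
        if prim.IsInstance():
            instances += 1

    return {"open_s": open_s, "load_s": load_s,
            "prims": total, "instances": instances}
```

Something like `profile_stage("scene.usda", thread_limit=1)` run next to the same call without a limit (in separate processes) would show how much of the wall-clock time actually scales with cores, which is the comparison being asked about here.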