
Definition of feature caching for node classification #548

Open
mfbalin opened this issue Sep 22, 2024 · 4 comments

mfbalin commented Sep 22, 2024

https://github.com/mlcommons/training_policies/blob/master/training_rules.adoc#14-appendix-benchmark-specific-rules

Here, it is stated that feature caching is not allowed. What is the definition of feature caching?

We are preparing a submission using the GraphBolt GNN dataloader. Our framework supports feature and graph caching on GPUs with no redundancy across GPUs, as well as caching in system memory, so I am wondering whether any of these components can be used in a valid closed MLPerf submission for GNN node classification.

GraphBolt's caching facilities:
https://www.dgl.ai/dgl_docs/generated/dgl.graphbolt.CPUCachedFeature.html
https://www.dgl.ai/dgl_docs/generated/dgl.graphbolt.GPUCachedFeature.html
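
For reference, a minimal sketch of how these wrappers are used, based on the linked docs. This is an illustration only: it assumes the constructor takes the fallback feature and a cache size in bytes, and the tensor shapes and cache sizes here are made up.

```python
import torch
import dgl.graphbolt as gb

# Plain in-memory feature store (pinned so the GPU can read it directly).
feat = gb.TorchBasedFeature(torch.randn(1_000_000, 128).pin_memory())

# Wrap it in a GPU-resident cache; frequently read rows are served from
# device memory, and misses fall back to the wrapped feature.
cached_feat = gb.GPUCachedFeature(feat, 256 * 1024**2)  # ~256 MiB cache

ids = torch.randint(0, 1_000_000, (4096,), device="cuda")
batch = cached_feat.read(ids)  # same read() interface as the uncached feature
```

Because the cached wrapper exposes the same `read()` interface as the feature it wraps, it is a drop-in replacement in the dataloading pipeline.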

ShriyaPalsamudram (Contributor) commented

Based on the rules, feature caching of any form is not allowed.

@drcanchi can you please review GraphBolt's caching and comment on whether this is any different and whether it can be used?

mfbalin (Author) commented Sep 23, 2024

@ShriyaPalsamudram why is such caching not allowed? Both CPU and GPU memory hierarchies consist of multiple levels, and caching is used pervasively to make anything run fast on real hardware.

In our case, we treat GPU memory as a cache for the CPU memory, which is in turn a cache for the SSD storage.
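
Concretely, that hierarchy can be composed by nesting the wrappers. A sketch only: the exact constructor arguments of `DiskBasedFeature` vary across DGL versions, and the path and cache sizes here are hypothetical.

```python
import dgl.graphbolt as gb

# SSD tier: features kept on disk and read on demand.
disk_feat = gb.DiskBasedFeature(path="node_feat.npy")  # hypothetical path

# CPU tier: system memory acts as a cache in front of the SSD.
cpu_feat = gb.CPUCachedFeature(disk_feat, 8 * 1024**3)  # ~8 GiB host cache

# GPU tier: device memory acts as a cache in front of system memory.
gpu_feat = gb.GPUCachedFeature(cpu_feat, 2 * 1024**3)   # ~2 GiB HBM cache

# A read() checks the GPU cache first, then the CPU cache, then the SSD.
```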

ShriyaPalsamudram (Contributor) commented Sep 23, 2024

The reason for disallowing feature caching is to keep the benchmark representative of real-world GNN workloads, which typically operate on much larger datasets (and features). Because we could not find an open-source dataset of matching size, we had to settle for a smaller one while keeping the benchmark as representative as possible.

mfbalin (Author) commented Sep 23, 2024

> The reason for disallowing feature caching is to keep the benchmark representative of real-world GNN workloads, which typically operate on much larger datasets (and features). Because we could not find an open-source dataset of matching size, we had to settle for a smaller one while keeping the benchmark as representative as possible.

Even when the dataset is large, caching will be employed to extract maximum performance from the underlying hardware. I guess we will have to make a submission in the open category to showcase what our software is capable of. Will any future submission utilizing caching qualify for the open division?
