Skip to content

Performance Investigation

Sherlock edited this page Mar 12, 2021 · 21 revisions

Profiling Tools

  • nvprof

    • try run with/without --print-gpu-summary
    • try --profile-child-processes
    • Action: profile a training run
  • Visual Profiler UI

    • Use ruler to measure a time span
    • Identify the top hitters in kernels
    • Compare two sets of profiling results to identify the performance gap
    • Can you identify the start/end of a train_step from the timeline view?
  • torch profiler

  • Linux perf

CUDA Kernels Optimization