[L0] Phase 2 of Counter-Based Event Implementation #1698

Merged 16 commits on Sep 19, 2024

winstonzhang-intel (Contributor) commented May 31, 2024

- Enable counter-based events for regular command lists.
- Counter-based events may be reused even if they have not completed yet.
- When an event's reference count drops to the value that means no external client holds it, the event may be reused by subsequent calls.
- Move events that are no longer externally visible to a reusable pool and reuse them more aggressively.
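
The reuse policy in this description can be pictured with a small sketch. It is not the adapter's code (the adapter is C++). `Event`, `EventPool`, `acquire`, `release` and the `INTERNAL_ONLY_REFS` threshold are hypothetical names chosen for illustration, and the sketch assumes a counter-based event can safely be handed out again before it has completed.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical threshold: the reference count at which only the runtime
# itself (no external client) still holds the event.
INTERNAL_ONLY_REFS = 1


@dataclass
class Event:
    ident: int
    counter_based: bool = True
    ref_count: int = INTERNAL_ONLY_REFS
    completed: bool = False


@dataclass
class EventPool:
    """Toy pool that recycles events once no external client holds them."""
    reusable: List[Event] = field(default_factory=list)
    created: int = 0

    def acquire(self) -> Event:
        # Prefer a recycled event; a counter-based event may be reused
        # even if it has not completed yet.
        if self.reusable:
            event = self.reusable.pop()
            event.completed = False
        else:
            event = Event(ident=self.created)
            self.created += 1
        event.ref_count = INTERNAL_ONLY_REFS + 1  # one external holder
        return event

    def release(self, event: Event) -> None:
        # Drop one external reference; recycle when none are left.
        event.ref_count -= 1
        if event.ref_count == INTERNAL_ONLY_REFS and event.counter_based:
            self.reusable.append(event)


pool = EventPool()
first = pool.acquire()
pool.release(first)      # no external holders left, so it returns to the pool
second = pool.acquire()  # the same object is handed out again
```

In this toy model only one event is ever created, because the second `acquire` finds the released event in the reusable list.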

intel/llvm PR: intel/llvm#14754

@github-actions github-actions bot added the level-zero L0 adapter specific issues label May 31, 2024
pbalcer (Contributor) commented Jun 6, 2024

This does not compile with the L0 adapter enabled. Also, feel free to add a relevant benchmark scenario to https://github.com/oneapi-src/unified-runtime/blob/main/.github/scripts/compute_benchmarks.py, or just run the existing benchmarks with whatever environment variables are needed. You can run them from: https://github.com/oneapi-src/unified-runtime/actions/workflows/benchmarks_compute.yml

You can reach out to me if you need help or advice.

winstonzhang-intel (Contributor, Author) commented

@pbalcer It should compile now. I'm still working through some of the e2e tests that are failing.

nrspruit (Contributor) commented

@winstonzhang-intel, please link the intel/llvm PR related to this issue so we can see the full e2e test results.

Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1):
https://github.com/oneapi-src/unified-runtime/actions/runs/9694638615
Job status: failure. Test status: skipped.

winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Jul 2, 2024
@winstonzhang-intel winstonzhang-intel marked this pull request as ready for review July 2, 2024 07:03
@winstonzhang-intel winstonzhang-intel requested a review from a team as a code owner July 2, 2024 07:03
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Jul 2, 2024
@nrspruit nrspruit requested review from MichalMrozek and pbalcer and removed request for MichalMrozek July 2, 2024 20:01
github-actions bot commented Jul 3, 2024

Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1):
https://github.com/oneapi-src/unified-runtime/actions/runs/9780598178
Job status: success. Test status: success.

Benchmark Results

---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl, mean execution time per 10 kernels (μs)
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>Imm-CmdLists-OFF

        This PR (38.675 us)   : crit, 0, 38

        baseline (38.357 us)   :  0, 38

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>Imm-CmdLists-OFF

        This PR (36.082 us)   : crit, 0, 36

        baseline (36.972 us)   :  0, 36

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>

        This PR (40.549 us)   : crit, 0, 40

        baseline (41.505 us)   :  0, 41

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>

        This PR (40.023 us)   : crit, 0, 40

        baseline (41.129 us)   :  0, 41

    -   : 0, 0

    -   : 0, 0
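
The chart above is Mermaid gantt source in which each bar runs from 0 to the integer part of the measured value (38.675 becomes 38). A rough sketch of how one such section could be generated follows; `gantt_section` is a hypothetical helper, not the code the benchmark workflow actually uses.

```python
def gantt_section(label: str, pr: float, baseline: float, unit: str) -> str:
    """Render one gantt section with a bar for this PR and one for the baseline."""
    lines = [
        f"    section {label}",
        # int() truncates, matching the bar lengths seen in the chart source.
        f"        This PR ({pr} {unit})   : crit, 0, {int(pr)}",
        f"        baseline ({baseline} {unit})   :  0, {int(baseline)}",
    ]
    return "\n".join(lines)


section = gantt_section("SubmitKernel in order", 38.675, 38.357, "us")
print(section)
```

Because both values truncate to 38, the two bars in that section have the same length even though the means differ by about 0.3 us.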


Details

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),38.675,38.403,4.91%,37.600,206.755,[CPU],[us]
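
For anyone reproducing this run locally, here is a minimal sketch of how the environment variables and flags above combine into one invocation. The flag and variable names are copied from the log; the helper names (`build_env`, `build_command`, `run_benchmark`) and the relative binary path are assumptions, not part of the benchmark scripts.

```python
import os
import subprocess
from typing import Dict, List


def build_env(overrides: Dict[str, str]) -> Dict[str, str]:
    """Copy the current environment and apply the UR_L0_* overrides."""
    env = dict(os.environ)
    env.update(overrides)
    return env


def build_command(binary: str, test: str, **params: object) -> List[str]:
    """Assemble the argument list in the order shown in the log."""
    args = [binary, f"--test={test}"]
    args += [f"--{name}={value}" for name, value in params.items()]
    args += ["--csv", "--noHeaders"]
    return args


def run_benchmark(cmd: List[str], env: Dict[str, str]) -> str:
    """Run the benchmark and return its CSV output (needs a built binary)."""
    done = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return done.stdout


env = build_env({
    "UR_L0_USE_IMMEDIATE_COMMANDLISTS": "0",
    "UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS": "1",
    "UR_L0_USE_DRIVER_INORDER_LISTS": "1",
})
cmd = build_command(
    "./api_overhead_benchmark_sycl",  # assumed path to the built binary
    "SubmitKernel",
    Ioq=1, DiscardEvents=0, MeasureCompletion=0, iterations=10000,
    Profiling=0, NumKernels=10, KernelExecTime=1,
)
```

`run_benchmark(cmd, env)` is deliberately not called here, since it only works on a machine where the compute-benchmarks binaries have been built.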

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),36.082,36.040,2.38%,35.332,112.299,[CPU],[us]
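
One thing to watch when post-processing these logs: the header names seven columns, but each result row has eight comma-separated fields, because the `Type` column is printed as `[CPU],[us]`. A minimal parsing sketch, assuming that layout holds for every row; `parse_row` and the dictionary keys are names made up for this example.

```python
import re
from typing import Dict, Union

Row = Dict[str, Union[str, float]]


def parse_row(line: str) -> Row:
    """Parse one --csv --noHeaders result line into named fields."""
    # Split from the right so a comma inside the test-case label cannot
    # shift the numeric columns.
    name, mean, median, stddev, low, high, kind, unit = line.strip().rsplit(",", 7)
    # The unit is printed in brackets, sometimes with a prefix ("bw [GB/s]").
    unit_match = re.search(r"\[([^\]]+)\]", unit)
    return {
        "test_case": name,
        "mean": float(mean),
        "median": float(median),
        "stddev_pct": float(stddev.rstrip("%")),
        "min": float(low),
        "max": float(high),
        "kind": kind.strip("[]"),
        "unit": unit_match.group(1) if unit_match else unit,
    }


row = parse_row(
    "SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),36.082,36.040,2.38%,35.332,112.299,[CPU],[us]"
)
```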

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),40.549,40.484,2.12%,39.520,109.681,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),40.023,39.999,2.41%,38.600,109.795,[CPU],[us]

@oneapi-src oneapi-src deleted a comment from github-actions bot Jul 3, 2024
@oneapi-src oneapi-src deleted a comment from github-actions bot Jul 3, 2024
source/adapters/level_zero/context.hpp: 6 review threads (outdated, resolved)
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Jul 10, 2024
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Jul 10, 2024
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Jul 10, 2024
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Jul 10, 2024
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Jul 11, 2024
@winstonzhang-intel winstonzhang-intel requested a review from a team July 11, 2024 15:18
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Sep 11, 2024
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Sep 11, 2024
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Sep 11, 2024
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Sep 13, 2024
MichalMrozek (Contributor) left a comment

lgtm

Compute Benchmarks level_zero run (--env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1):
https://github.com/oneapi-src/unified-runtime/actions/runs/10880913609
Job status: success. Test status: success.

Summary

result is better

| Benchmark | This PR | baseline | Unit |
|---|---|---|---|
| api_overhead_benchmark_sycl SubmitKernel out of order | 48.362 | 50.631 | μs |
| api_overhead_benchmark_sycl SubmitKernel in order | 47.024 | 49.385 | μs |
| api_overhead_benchmark_ur SubmitKernel out of order | 31.312 | 31.93 | μs |
| api_overhead_benchmark_ur SubmitKernel in order | 25.546 | 28.586 | μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 424.685 | 423.457 | μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 261.384 | 253.906 | μs |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 10.089 | 9.179 | μs |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.002 | 1.854 | GB/s |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 2.143 | 4.506 | μs |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 2.096 | 3.613 | μs |
| miscellaneous_benchmark_sycl VectorSum | 858.416 | 863.651 | GB/s |
| Velocity-Bench Hashtable | 207.852567 | 178.291413 | M keys/sec |
| Velocity-Bench Bitcracker | 35.6076 | 35.8407 | s |
| Velocity-Bench CudaSift | 256.843 | 283.294 | ms |
| Velocity-Bench Easywave | 446 | 457.0 | ms |
| Velocity-Bench QuickSilver | 90.08 | 115.63 | MMS/CTT |
| Velocity-Bench Sobel Filter | 985.857 | 934.963 | ms |
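
The summary mixes time measurements with throughput measurements, so "better" depends on the row. A small sketch of comparing a few of the figures above, assuming lower is better for times in μs and higher is better for bandwidth in GB/s; `relative_change` and `improved` are hypothetical helpers, and only rows whose units appear in the Details output are used.

```python
def relative_change(pr: float, baseline: float) -> float:
    """Percent change of this PR relative to the baseline."""
    return (pr - baseline) / baseline * 100.0


def improved(pr: float, baseline: float, higher_is_better: bool) -> bool:
    """True if the PR value is better under the given direction."""
    return pr > baseline if higher_is_better else pr < baseline


# (label, this PR, baseline, higher_is_better)
rows = [
    ("api_overhead_benchmark_ur SubmitKernel in order", 25.546, 28.586, False),
    ("memory_benchmark_sycl QueueMemcpy D2D", 10.089, 9.179, False),
    ("memory_benchmark_sycl StreamMemory Triad", 3.002, 1.854, True),
]

for label, pr, baseline, higher in rows:
    verdict = "better" if improved(pr, baseline, higher) else "worse"
    print(f"{label}: {relative_change(pr, baseline):+.1f}% ({verdict})")
```

Under those assumptions the in-order UR SubmitKernel case improves by roughly a tenth, while QueueMemcpy regresses by a similar fraction, which the single "result is better" line does not show.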

Charts

api_overhead_benchmark_sycl SubmitKernel out of order
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl SubmitKernel out of order
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)

        This PR (48.362 μs)   : crit, 0, 48

        baseline (50.631 μs)   :  0, 50

    -   : 0, 0

    -   : 0, 0

api_overhead_benchmark_sycl SubmitKernel in order
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl SubmitKernel in order
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)

        This PR (47.024 μs)   : crit, 0, 47

        baseline (49.385 μs)   :  0, 49

    -   : 0, 0

    -   : 0, 0

api_overhead_benchmark_ur SubmitKernel out of order
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_ur SubmitKernel out of order
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=ur<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)

        This PR (31.312 μs)   : crit, 0, 31

        baseline (31.93 μs)   :  0, 31

    -   : 0, 0

    -   : 0, 0

api_overhead_benchmark_ur SubmitKernel in order
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_ur SubmitKernel in order
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=ur<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)

        This PR (25.546 μs)   : crit, 0, 25

        baseline (28.586 μs)   :  0, 28

    -   : 0, 0

    -   : 0, 0

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueInOrderMemcpy(api=sycl<br>IsCopyOnly=0<br>sourcePlacement=Device<br>destinationPlacement=Device<br>size=1KB<br>count=100)

        This PR (424.685 μs)   : crit, 0, 424

        baseline (423.457 μs)   :  0, 423

    -   : 0, 0

    -   : 0, 0

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueInOrderMemcpy(api=sycl<br>IsCopyOnly=0<br>sourcePlacement=Host<br>destinationPlacement=Device<br>size=1KB<br>count=100)

        This PR (261.384 μs)   : crit, 0, 261

        baseline (253.906 μs)   :  0, 253

    -   : 0, 0

    -   : 0, 0

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueMemcpy(api=sycl<br>sourcePlacement=Device<br>destinationPlacement=Device<br>size=1KB)

        This PR (10.089 μs)   : crit, 0, 10

        baseline (9.179 μs)   :  0, 9

    -   : 0, 0

    -   : 0, 0

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
    todayMarker off
    dateFormat  X
    axisFormat %s

    section StreamMemory(api=sycl<br>type=Triad<br>size=10KB<br>useEvents=0<br>contents=Zeros<br>memoryPlacement=Device)

        This PR (3.002 GB/s)   : crit, 0, 3

        baseline (1.854 GB/s)   :  0, 1

    -   : 0, 0

    -   : 0, 0

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section ExecImmediateCopyQueue(api=sycl<br>IsCopyOnly=1<br>MeasureCompletionTime=0<br>src=Device<br>dst=Device<br>size=1KB<br>ioq=0)

        This PR (2.143 μs)   : crit, 0, 2

        baseline (4.506 μs)   :  0, 4

    -   : 0, 0

    -   : 0, 0

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section ExecImmediateCopyQueue(api=sycl<br>IsCopyOnly=1<br>MeasureCompletionTime=0<br>src=Host<br>dst=Host<br>size=1KB<br>ioq=1)

        This PR (2.096 μs)   : crit, 0, 2

        baseline (3.613 μs)   :  0, 3

    -   : 0, 0

    -   : 0, 0

miscellaneous_benchmark_sycl VectorSum
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title miscellaneous_benchmark_sycl VectorSum
    todayMarker off
    dateFormat  X
    axisFormat %s

    section VectorSum(api=sycl<br>numberOfElementsX=512<br>numberOfElementsY=256<br>numberOfElementsZ=256)

        This PR (858.416 GB/s)   : crit, 0, 858

        baseline (863.651 GB/s)   :  0, 863

    -   : 0, 0

    -   : 0, 0

Velocity-Bench Hashtable
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Hashtable
    todayMarker off
    dateFormat  X
    axisFormat %s

    section hashtable

        This PR (207.852567 M keys/sec)   : crit, 0, 207

        baseline (178.291413 M keys/sec)   :  0, 178

    -   : 0, 0

    -   : 0, 0

Velocity-Bench Bitcracker
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Bitcracker
    todayMarker off
    dateFormat  X
    axisFormat %s

    section bitcracker

        This PR (35.6076 s)   : crit, 0, 35

        baseline (35.8407 s)   :  0, 35

    -   : 0, 0

    -   : 0, 0

Velocity-Bench CudaSift
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench CudaSift
    todayMarker off
    dateFormat  X
    axisFormat %s

    section cudaSift

        This PR (256.843 ms)   : crit, 0, 256

        baseline (283.294 ms)   :  0, 283

    -   : 0, 0

    -   : 0, 0

Velocity-Bench Easywave
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Easywave
    todayMarker off
    dateFormat  X
    axisFormat %s

    section easywave

        This PR (446 ms)   : crit, 0, 446

        baseline (457.0 ms)   :  0, 457

    -   : 0, 0

    -   : 0, 0

Velocity-Bench QuickSilver
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench QuickSilver
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QuickSilver

        This PR (90.08 MMS/CTT)   : crit, 0, 90

        baseline (115.63 MMS/CTT)   :  0, 115

    -   : 0, 0

    -   : 0, 0

Velocity-Bench Sobel Filter
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Sobel Filter
    todayMarker off
    dateFormat  X
    axisFormat %s

    section sobel_filter

        This PR (985.857 ms)   : crit, 0, 985

        baseline (934.963 ms)   :  0, 934

    -   : 0, 0

    -   : 0, 0


Details

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),48.362,47.646,7.34%,43.188,547.322,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),47.024,46.508,6.65%,44.278,209.617,[CPU],[us]

SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),31.312,31.050,6.53%,29.597,503.558,[CPU],[us]

SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.546,29.884,27.77%,13.324,230.644,[CPU],[us]
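
This row is worth a second look before reading too much into the mean: StdDev is 27.77% and the mean (25.546) sits well below the median (29.884). A rough sketch of a check that flags such rows; `is_noisy` and both 10% thresholds are arbitrary assumptions for illustration, not values used by the benchmark workflow.

```python
def is_noisy(mean: float, median: float, stddev_pct: float,
             max_stddev_pct: float = 10.0, max_gap_pct: float = 10.0) -> bool:
    """Flag a result whose spread or mean/median gap exceeds the thresholds."""
    gap_pct = abs(mean - median) / median * 100.0
    return stddev_pct > max_stddev_pct or gap_pct > max_gap_pct


# SubmitKernel(api=ur ... Ioq=1): wide spread, mean far from median.
in_order = is_noisy(mean=25.546, median=29.884, stddev_pct=27.77)

# SubmitKernel(api=ur ... Ioq=0): tight spread, mean close to median.
out_of_order = is_noisy(mean=31.312, median=31.050, stddev_pct=6.53)
```

With these thresholds the in-order run is flagged and the out-of-order run is not, which suggests re-running the in-order case before treating its improvement as settled.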

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),424.685,467.871,19.83%,246.890,870.042,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),261.384,238.517,22.09%,230.359,746.004,[CPU],[us]

QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),10.089,9.944,18.73%,7.751,150.687,[CPU],[us]

StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.002,3.081,6.77%,0.382,3.365,[CPU],[GB/s]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.143,2.101,14.10%,1.894,75.835,[CPU],[us]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),2.096,1.670,45.10%,1.554,28.530,[CPU],[us]

VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),858.416,858.902,0.49%,821.607,879.002,[GPU],bw [GB/s]

hashtable

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.645735 s
207.852567 million keys/second

bitcracker

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.0101897 s
bitcracker - total time for whole calculation: 35.6076 s

cudaSift

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1185 1247 32.1749% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1256 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1138 1277 30.8987% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1253 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1267 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1140 1265 30.953% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1262 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1263 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1270 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1259 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1155 1257 31.3603% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1273 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1258 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1274 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1268 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1163 1253 31.5775% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1249 1284 33.9126% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1256 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1070 1268 29.0524% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1273 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1254 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1268 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1159 1261 31.4689% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1261 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1260 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1267 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1256 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1275 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1099 1259 29.8398% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1267 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1108 1276 30.0842% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1249 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1263 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1205 1271 32.7179% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1131 1264 30.7087% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1134 1274 30.7901% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1270 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1131 1263 30.7087% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1250 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1257 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1267 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1272 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 256.843 ms
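
The percentage on each "Number of matching features" line appears to be the first matching count divided by the first original count (for example 1232 of 3683). The sketch below reproduces that arithmetic; the function name `matchPercent` is hypothetical and this reading of the log columns is an assumption, not something the benchmark documents here.

```cpp
#include <cassert>

// Hypothetical helper: percentage of original features that were matched.
// Assumes the logged percentage is matching / original for the first image.
double matchPercent(int matching, int original) {
    assert(original > 0);
    return 100.0 * static_cast<double>(matching) / static_cast<double>(original);
}
```

Calling `matchPercent(1232, 3683)` gives a value close to the 33.451% printed in the log, and `matchPercent(1140, 3683)` is close to 30.953%.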

easywave

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

QuickSilver

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1
QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.411710e-01 8.249170e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.726020e-01 9.738420e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 5.810500e-01 1.006878e+00 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 6.018250e-01 1.105585e+00 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 5.608290e-01 1.040724e+00 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.749500e-01 9.924050e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 5.696560e-01 1.000601e+00 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 5.518340e-01 1.028976e+00 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 5.396320e-01 1.035437e+00 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 5.596030e-01 9.911010e-01 0.000000e+00

Timer                     Cumulative  Cumulative  Cumulative  Cumulative  Cumulative  Cumulative
Name                          number   microSecs   microSecs   microSecs   microSecs  Efficiency
                            of calls         min         avg         max      stddev      Rating
main                               1   1.516e+07   1.516e+07   1.516e+07   0.000e+00      100.00
cycleInit                         10   5.153e+06   5.153e+06   5.153e+06   0.000e+00      100.00
cycleTracking                     10   1.000e+07   1.000e+07   1.000e+07   0.000e+00      100.00
cycleTracking_Kernel             104   4.942e+06   4.942e+06   4.942e+06   0.000e+00      100.00
cycleTracking_MPI                117   2.556e+05   2.556e+05   2.556e+05   0.000e+00      100.00
cycleTracking_Test_Done            0   0.000e+00   0.000e+00   0.000e+00   0.000e+00        0.00
cycleFinalize                     20   7.140e+02   7.140e+02   7.140e+02   0.000e+00      100.00
Figure Of Merit 90.08 [Num Mega Segments / Cycle Tracking Time]
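
The figure of merit can be cross-checked against the per-cycle table above. The sketch below sums the `num_seg` column, converts it to mega segments, and divides by the summed `cycleTracking` column. It assumes that column is in seconds and that Quicksilver aggregates over all cycles this way; `CycleRow`, `loggedCycles`, and `figureOfMerit` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// One row of the per-cycle table: segment count and tracking time.
struct CycleRow {
    long long numSeg;        // num_seg column
    double cycleTrackingSec; // cycleTracking column, assumed to be seconds
};

// Values copied from the ten cycles printed in the log above.
std::vector<CycleRow> loggedCycles() {
    return {
        {55527935, 8.249170e-01}, {94633679, 9.738420e-01},
        {95010375, 1.006878e+00}, {94953591, 1.105585e+00},
        {94599337, 1.040724e+00}, {94148236, 9.924050e-01},
        {93689264, 1.000601e+00}, {93216931, 1.028976e+00},
        {92768273, 1.035437e+00}, {92324678, 9.911010e-01},
    };
}

// Mega segments per second of cycle tracking time, summed over all rows.
double figureOfMerit(const std::vector<CycleRow> &rows) {
    assert(!rows.empty());
    double segments = 0.0;
    double seconds = 0.0;
    for (const CycleRow &row : rows) {
        segments += static_cast<double>(row.numSeg);
        seconds += row.cycleTrackingSec;
    }
    return (segments / 1e6) / seconds;
}
```

With the logged rows, `figureOfMerit(loggedCycles())` comes out at roughly 90.08, in line with the reported value. The summed tracking time is also about 10 s, consistent with the 1.000e+07 microseconds shown for `cycleTracking` in the timer table.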

sobel_filter

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1
UR_L0_USE_DRIVER_INORDER_LISTS=1
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 14.9964 s
sobelfilter - total time for whole calculation: 0.985857 s

-enable counter-based events for regular command lists
-counter-based events may be reused even if they have not completed
-when an event's ref count drops to the value that means no external client holds it, the event may be reused by subsequent calls
-move events that are no longer externally visible to a reusable pool and reuse them more aggressively
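
The reuse rule described in these notes can be sketched as a small pool. This is a minimal illustration under stated assumptions, not the adapter's actual code: `Event`, `EventPool`, `acquire`, `releaseExternal`, and `reusableCount` are hypothetical names, and the real Level Zero adapter tracks more state than this.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an adapter event (not the real UR/L0 type).
struct Event {
    int externalRefs = 0;      // references held by external clients
    bool counterBased = false; // created as a counter-based event
    bool completed = false;    // device has signaled completion
};

// Hypothetical pool illustrating the reuse rule from the commit notes.
class EventPool {
public:
    // Hand out a recycled event of the requested kind, or create a new one.
    Event *acquire(bool counterBased) {
        std::vector<Event *> &cache =
            counterBased ? reusableCounterBased_ : reusableRegular_;
        Event *e = nullptr;
        if (!cache.empty()) {
            e = cache.back();
            cache.pop_back();
        } else {
            storage_.push_back(std::make_unique<Event>());
            e = storage_.back().get();
            e->counterBased = counterBased;
        }
        e->externalRefs = 1;
        e->completed = false;
        return e;
    }

    // Drop one external reference. Once no external client holds the event,
    // a counter-based event is recycled even if it has not completed;
    // a regular event is recycled only if it has completed.
    void releaseExternal(Event *e) {
        assert(e != nullptr && e->externalRefs > 0);
        if (--e->externalRefs > 0) {
            return;
        }
        if (e->counterBased) {
            reusableCounterBased_.push_back(e);
        } else if (e->completed) {
            reusableRegular_.push_back(e);
        }
        // Regular events still in flight are not recycled in this sketch.
    }

    std::size_t reusableCount() const {
        return reusableCounterBased_.size() + reusableRegular_.size();
    }

private:
    std::vector<std::unique_ptr<Event>> storage_; // owns every event
    std::vector<Event *> reusableCounterBased_;
    std::vector<Event *> reusableRegular_;
};
```

In this sketch, releasing the last external reference to an unfinished counter-based event makes it immediately available to the next `acquire(true)` call, while an unfinished regular event is held back.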

Signed-off-by: Winston Zhang <[email protected]>
winstonzhang-intel added a commit to winstonzhang-intel/llvm that referenced this pull request Sep 18, 2024
@pbalcer pbalcer merged commit 77187f6 into oneapi-src:main Sep 19, 2024
74 checks passed
Labels
conformance Conformance test suite issues. level-zero L0 adapter specific issues
4 participants