
Portable CM script failed (name = build-docker-image, return code = 256) #1276

Closed · Agalakdak opened this issue Jul 31, 2024 · 2 comments

Agalakdak commented Jul 31, 2024

I followed this guide: https://access.cknowledge.org/playground/?action=install

Then I pulled the repository:

cm pull repo mlcommons@cm4mlops --branch=dev

Ran this command:

cmr "run-mlperf inference _find-performance _full _r4.1" \
    --model=bert-99 \
    --implementation=nvidia \
    --framework=tensorrt \
    --category=datacenter \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --docker \
    --docker_cm_repo=mlcommons@cm4mlops \
    --docker_cm_repo_flags="--branch=mlperf-inference" \
    --test_query_count=100 \
    --quiet

After about 30 minutes, the build failed with the following error:

1 warning found (use docker --debug to expand):

  • SecretsUsedInArgOrEnv: Do not use ARG or ENV instructions for sensitive data (ARG "CM_GH_TOKEN") (line 14)
    mlperf-inference:mlpinf-v4.0-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-public.Dockerfile:45

43 |
44 | # Run commands
45 | >>> RUN cm run script --tags=app,mlperf,inference,generic,_nvidia,_bert-99,_tensorrt,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=nvidia --env.CM_MLPERF_MODEL=bert-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=tensorrt --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=100 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=yes --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.0 --env.CM_SUT_DESC_CACHE=no --env.CM_SUT_META_EXISTS=yes --env.CM_MODEL=bert-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --v=False --print_env=False --print_deps=False --dump_version_info=True --quiet --fake_run --env.CM_RUN_STATE_DOCKER=True
46 |


ERROR: failed to solve: process "/bin/bash -c cm run script --tags=app,mlperf,inference,generic,_nvidia,_bert-99,_tensorrt,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=nvidia --env.CM_MLPERF_MODEL=bert-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=tensorrt --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=100 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=yes --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.0 --env.CM_SUT_DESC_CACHE=no --env.CM_SUT_META_EXISTS=yes --env.CM_MODEL=bert-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --v=False --print_env=False --print_deps=False --dump_version_info=True --quiet --fake_run --env.CM_RUN_STATE_DOCKER=True" did not complete successfully: exit code: 2
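As an aside, the SecretsUsedInArgOrEnv warning above is worth noting: passing CM_GH_TOKEN as a build ARG bakes the value into the image history. A hedged sketch of the BuildKit alternative (the secret id `gh_token` and the truncated cm command are placeholders, not the actual contents of the generated Dockerfile):

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical sketch: mount the token as a BuildKit secret instead of
# using ARG CM_GH_TOKEN. The value is only visible during this RUN step
# and is never stored in an image layer or in the build history.
RUN --mount=type=secret,id=gh_token \
    CM_GH_TOKEN=$(cat /run/secrets/gh_token) cm run script --tags=...
```

Such an image would be built with `docker build --secret id=gh_token,src=token.txt .`, keeping the token out of `docker history` output.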

CM error: Portable CM script failed (name = build-docker-image, return code = 256)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!
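A note on the reported number: "return code = 256" is likely a raw POSIX wait status (as returned by `os.system`) rather than a shell exit code. The real exit code lives in the high byte, so 256 decodes to exit code 1 (a minimal sketch, assuming CM reports the raw status):

```python
# Decode a raw POSIX wait status into the actual process exit code.
# A status of 256 (as in the CM error above) corresponds to exit code 1.
raw_status = 256          # value reported by CM
exit_code = raw_status >> 8
print(exit_code)          # 1
```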

Do you need information about my system? If so, let me know here.

Agalakdak (Author) commented:
/home/user
INFO:root: ! call "postprocess" from /home/user/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/customize.py
GPU Device ID: 0
GPU Name: Quadro RTX 5000
GPU compute capability: 7.5
CUDA driver version: 12.4
CUDA runtime version: 11.8
Global memory: 16892952576
Max clock rate: 1815.000000 MHz
Total amount of shared memory per block: 49152
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block X: 1024
Max dimension size of a thread block Y: 1024
Max dimension size of a thread block Z: 64
Max dimension size of a grid size X: 2147483647
Max dimension size of a grid size Y: 65535
Max dimension size of a grid size Z: 65535
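For what it's worth, the version numbers in this log can be sanity-checked in isolation. This is a hypothetical check interpreting the printed values, not CM code:

```python
# Sanity checks on the values reported by get-cuda-devices above.

driver = (12, 4)    # "CUDA driver version: 12.4"
runtime = (11, 8)   # "CUDA runtime version: 11.8"

# CUDA drivers are backward compatible: a 12.4 driver can run an 11.8
# runtime, so this pairing alone should not explain the build failure.
assert driver >= runtime

# The raw global-memory count matches the card's nominal 16 GB.
gib = 16892952576 / 2**30
print(f"{gib:.2f} GiB")   # 15.73 GiB
```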

arjunsuresh (Contributor) commented:

Followed up here.
