submission generator abruptly terminating for short run #289

Open
anandhu-eng opened this issue Sep 25, 2024 · 1 comment

@anandhu-eng
Contributor

Error:

INFO:root:* cm run script "generate inference submission"
INFO:root:  * cm run script "get python3"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/f0de2faca1cd4d70/cm-cached-state.json
INFO:root:Path to Python: /home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/bin/python3
INFO:root:Python version: 3.10.13
INFO:root:  * cm run script "mlcommons inference src"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/1deddf35572d4ee8/cm-cached-state.json
INFO:root:  * cm run script "get sut system-description"
INFO:root:    * cm run script "detect os"
INFO:root:           ! cd /home/anandhu/CM/repos/anandhu-eng@cm4mlops
INFO:root:           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
INFO:root:    * cm run script "detect cpu"
INFO:root:      * cm run script "detect os"
INFO:root:             ! cd /home/anandhu/CM/repos/anandhu-eng@cm4mlops
INFO:root:             ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root:             ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-os/customize.py
INFO:root:           ! cd /home/anandhu/CM/repos/anandhu-eng@cm4mlops
INFO:root:           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-cpu/customize.py
INFO:root:    * cm run script "get python3"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/f0de2faca1cd4d70/cm-cached-state.json
INFO:root:Path to Python: /home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/bin/python3
INFO:root:Python version: 3.10.13
INFO:root:    * cm run script "get compiler"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/b6567b69e07a4e1a/cm-cached-state.json
INFO:root:    * cm run script "detect sudo"
Check sudo -p [sudo] password for %u:
INFO:root:           ! cd /home/anandhu/CM/repos/anandhu-eng@cm4mlops
INFO:root:           ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-sudo/run.sh from tmp-run.sh
INFO:root:           ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/detect-sudo/customize.py
INFO:root:    * cm run script "get generic-python-lib _package.dmiparser"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/58291d09c2744ec5/cm-cached-state.json
INFO:root:    * cm run script "get cache dir _name.mlperf-inference-sut-descriptions"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/f04a49145ce24056/cm-cached-state.json
Generating SUT description file for intel_spr_i9
INFO:root:         ! cd /home/anandhu/CM/repos/anandhu-eng@cm4mlops
INFO:root:         ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-sut-description/detect_memory.sh from tmp-run.sh
INFO:root:         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-sut-description/customize.py
INFO:root:  * cm run script "install pip-package for-cmind-python _package.tabulate"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/a2fa268cf0da4e5f/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference utils"
INFO:root:    * cm run script "get mlperf inference src"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/1deddf35572d4ee8/cm-cached-state.json
INFO:root:         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-utils/customize.py
INFO:root:       ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/generate-mlperf-inference-submission/customize.py
=================================================
Cleaning /home/anandhu/gh_action_submissions ...
=================================================
* MLPerf inference submission dir: /home/anandhu/gh_action_submissions
* MLPerf inference results dir: /home/anandhu/gh_action_results/test_results
* MLPerf inference division: open
* MLPerf inference submitter: MLCommons
intel_spr_i9-reference-gpu-pytorch-v2.4.1-default_config
sut info completely filled from /home/anandhu/gh_action_results/test_results/intel_spr_i9-reference-gpu-pytorch-v2.4.1-default_config/cm-sut-info.json!
* MLPerf inference model: stable-diffusion-xl
 * mlperf_log_detail.txt
 * mlperf_log_accuracy.json
 * mlperf_log_summary.txt
INFO:MLPerfLog:Sucessfully loaded MLPerf log from /home/anandhu/gh_action_results/test_results/intel_spr_i9-reference-gpu-pytorch-v2.4.1-default_config/stable-diffusion-xl/offline/performance/run_1/mlperf_log_detail.txt.
INFO:MLPerfLog:Sucessfully loaded MLPerf log from /home/anandhu/gh_action_results/test_results/intel_spr_i9-reference-gpu-pytorch-v2.4.1-default_config/stable-diffusion-xl/offline/performance/run_1/mlperf_log_detail.txt.
Traceback (most recent call last):
  File "/home/anandhu/.local/bin/cm", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 212, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1552, in _run
    r = prepare_and_run_script_with_postprocessing(run_script_input)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 4693, in prepare_and_run_script_with_postprocessing
    rr = run_postprocess(customize_code, customize_common_input, recursion_spaces, env, state, const,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 4743, in run_postprocess
    r = customize_code.postprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/generate-mlperf-inference-submission/customize.py", line 436, in postprocess
    r = generate_submission(i)
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/generate-mlperf-inference-submission/customize.py", line 414, in generate_submission
    result_string, result = mlperf_utils.get_result_string(env['CM_MLPERF_LAST_RELEASE'], model, scenario, result_scenario_path, power_run, sub_res, division, system_file, model_precision)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-utils/mlperf_utils.py", line 194, in get_result_string
    acc_valid, acc_results, acc_targets, acc_limits = get_accuracy_metric(config, mlperf_model, accuracy_path)
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-utils/mlperf_utils.py", line 98, in get_accuracy_metric
    with open(os.path.join(path, "accuracy.txt"), "r", encoding="utf-8") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/anandhu/gh_action_results/test_results/intel_spr_i9-reference-gpu-pytorch-v2.4.1-default_config/stable-diffusion-xl/offline/accuracy/accuracy.txt'

Command used for run:

cm run script --tags=run-mlperf,inference,_find-performance,_r4.1-dev,_short,_scc24-base --model=sdxl --implementation=reference --backend=pytorch --category=datacenter --scenario=Offline --execution_mode=test --device=cuda --precision=float16 --quiet --results_dir=$HOME/gh_action_results --submission_dir=$HOME/gh_action_submissions --precision=float16 --clean --adr.cuda.version=12.4.1 --rerun

Command used for submission generation:

cm run script --tags=generate,inference,submission --clean --preprocess_submission=yes --run-checker --tar=yes --env.CM_TAR_OUTFILE=submission.tar.gz --division=open --category=datacenter --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes --run_style=test --adr.submission-checker.tags=_short-run --quiet --submitter=MLCommons --results_dir=$HOME/gh_action_results/test_results --submission_dir=$HOME/gh_action_submissions

It's terminating at this line of code.
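
For context, the traceback ends in get_accuracy_metric() in mlperf_utils.py, which opens accuracy.txt unconditionally. Below is a minimal sketch of a defensive check (hypothetical, not the actual code in the repo) that would warn and skip instead of crashing when a short, performance-only run has produced no accuracy output:

import os

def read_accuracy_log(scenario_path):
    # Hypothetical guard: the real get_accuracy_metric() opens the file
    # directly, which raises the FileNotFoundError shown above.
    accuracy_file = os.path.join(scenario_path, "accuracy.txt")
    if not os.path.exists(accuracy_file):
        print(f"WARNING: {accuracy_file} not found; skipping accuracy parsing")
        return None
    with open(accuracy_file, "r", encoding="utf-8") as f:
        return f.read()
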

@arjunsuresh
Contributor

It says accuracy.txt is missing, right?
