stress_test: Control the level of workload of each components #1

Closed
da-ekchajzer opened this issue Sep 16, 2022 · 8 comments

@da-ekchajzer
Contributor

Problem

We need to automate the collection of point-in-time power consumption measurements per component at different workload levels.

Example for an Intel Xeon Platinum CPU:

[Image: 168419401-f07653f2-066f-43b4-a3a9-340636a00c9a]

We need to control the workload level of each component involved in the evaluation.

Solution

I propose using a stress module to control the workload level. We can start with steps of 10%: 0%, 10%, 20%, ..., 100%.

@github-benjamin-davy, you have already developed this kind of module. Do you have any advice or resources that could be useful?

Article: https://medium.com/teads-engineering/estimating-aws-ec2-instances-power-consumption-c9745e347959

@github-benjamin-davy

Hello @da-ekchajzer, we used stress-ng on our side. We found it quite flexible for simulating different kinds of workloads and, most importantly, for targeting a specific CPU load, which isn't always easy with other stress tools. To reflect average usage, we tested several options.
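As a rough illustration of how the 10% steps proposed above could map onto stress-ng, here is a minimal sketch. The function name `gen_cpu_load_commands`, the 60-second duration, and the choice to treat the 0% level as an idle baseline (a plain `sleep`) are assumptions for illustration, not something decided in this thread. The sketch only prints the commands; it does not run them.

```shell
#!/bin/bash

# Print one command per CPU load level, from 0% to 100%.
# $1: step between two levels, in percent (default: 10)
# $2: duration of each level, in seconds (default: 60)
gen_cpu_load_commands() {
    local step="${1:-10}" duration="${2:-60}" load
    for load in $(seq 0 "$step" 100); do
        if [ "$load" -eq 0 ]; then
            # Assumption: the 0% level is an idle baseline, so no stressor runs.
            echo "sleep $duration"
        else
            # --cpu 0 starts one CPU worker per online CPU;
            # --cpu-load sets the target load of those workers.
            echo "stress-ng --cpu 0 --cpu-load $load --timeout ${duration}s"
        fi
    done
}

gen_cpu_load_commands 10 60
```

Each printed line could then be fed to a runner that samples power while the command is active.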

To run on many platforms, it should be coupled with power measurement (software-based or physical wattmeters). What was missing by default on our side for CPU was support for AMD machines, but that's easily doable (although without memory consumption for AMD). The same goes for GPUs if looking at ML-oriented hardware (Nvidia and AMD provide tooling for this).
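For the software-based side, one possible approach on Linux is to sample an energy counter twice and divide the difference by the elapsed time. The sketch below assumes Intel RAPL counters exposed through the powercap interface under `/sys/class/powercap/intel-rapl:0`; this path, the `RAPL_DIR` variable, and both function names are assumptions for illustration, and reading the counter may require root on recent kernels.

```shell
#!/bin/bash

# Assumption: Intel RAPL package counters exposed by the Linux powercap interface.
RAPL_DIR="${RAPL_DIR:-/sys/class/powercap/intel-rapl:0}"

# Average power in watts between two energy readings.
# $1, $2: energy counter values in microjoules
# $3: elapsed time in seconds
# $4: counter range in microjoules, used if the counter wrapped (optional)
average_power_watts() {
    awk -v e1="$1" -v e2="$2" -v dt="$3" -v range="${4:-0}" 'BEGIN {
        delta = e2 - e1
        if (delta < 0) delta += range   # the counter wrapped around
        printf "%.2f\n", delta / 1e6 / dt
    }'
}

# Sample the energy counter over $1 seconds and print the average power.
measure_package_power() {
    local seconds="${1:-5}" e1 e2 range
    range=$(cat "$RAPL_DIR/max_energy_range_uj")
    e1=$(cat "$RAPL_DIR/energy_uj")
    sleep "$seconds"
    e2=$(cat "$RAPL_DIR/energy_uj")
    average_power_watts "$e1" "$e2" "$seconds" "$range"
}
```

For example, 5 J consumed over 5 seconds (`average_power_watts 1000000 6000000 5`) corresponds to an average of 1 W.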

@da-ekchajzer
Contributor Author

da-ekchajzer commented Sep 16, 2022

Thank you for the resource.

About power measurement, I'll let you share your opinion or advice in #2.

da-ekchajzer changed the title from "Control the level of workload of each components : Stress test" to "stress_test: Control the level of workload of each components" on Sep 30, 2022

@da-ekchajzer
Contributor Author

da-ekchajzer commented Oct 4, 2022

To-do

  • List the types of stress tests we want to implement
  • List the programs that will run the stress tests and the parameter values for each chosen stress test

@da-ekchajzer
Contributor Author

See https://github.com/teads/turbostress for an implementation based on stress-ng in a similar context.

@maethor
Collaborator

maethor commented Oct 24, 2022

I think we should provide a complete example based on stress-ng, but the list of stress tests could be left to the user. We should just load a list of commands from a file.

Something like this could work fine:

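#!/bin/bash

# We don't want to leave a stress test running after this script exits.
trap 'kill $(jobs -p) 2>/dev/null' EXIT

INTERVAL=5   # seconds between two measurements
WARMUP=20    # seconds to wait before the first measurement
TIMEOUT=60   # duration of each stress test, in seconds

# In the final version, this list could be loaded from a config file.
# --cpu-load only applies to --cpu workers; --cpu 0 starts one per online CPU.
stresstests="
stress-ng --cpu 1
stress-ng --cpu 4
stress-ng --cpu 0 --cpu-load 100
"

# Read the list from a here-string rather than a pipe, so the loop runs in
# this shell and the EXIT trap can see the background jobs.
while IFS= read -r stresstest; do
    if [ -n "$stresstest" ]; then
        echo "Running $stresstest"
        # Left unquoted on purpose so the command is split into words.
        $stresstest > /dev/null 2>&1 &
        testpid=$!

        i=0
        while [ $((INTERVAL * i)) -lt "$TIMEOUT" ]; do
            if [ $((INTERVAL * i)) -gt "$WARMUP" ]; then
                uptime # Here we call the complete "get_states" function to get power consumption and various metrics.
            fi
            sleep "$INTERVAL"
            i=$((i+1))
        done

        kill "$testpid"
        # Make sure the test has stopped before starting the next one.
        wait "$testpid" 2>/dev/null
    fi
done <<< "$stresstests"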
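Loading the list of commands from a file, as suggested above, could look like the following sketch. The function name `load_stresstests` and the file format (one command per line, with blank lines and `#` comments ignored) are assumptions for illustration.

```shell
#!/bin/bash

# Print the stress test commands found in a file, one per line.
# Blank lines and lines starting with '#' are skipped.
load_stresstests() {
    local file="$1" line
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Strip leading whitespace.
        line="${line#"${line%%[![:space:]]*}"}"
        case "$line" in
            ''|'#'*) continue ;;
        esac
        echo "$line"
    done < "$file"
}
```

The main loop could then read from `< <(load_stresstests "$config_file")` instead of a hard-coded string, where `config_file` is a hypothetical variable holding the path to the list.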

@bpetit

bpetit commented Nov 2, 2022

Hey there!

Just had a great discussion with Arne from Green Coding. They have very valuable insights, and also a tool for this that accounts for very important variables like hyper-threading, Turbo Boost, etc. We should synchronize with them before heading in a direction on our own!

We should have a look at https://github.com/green-coding-berlin/tools.
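To make measurements comparable, the state of these variables could be recorded alongside each run. The sketch below is not taken from the Green Coding tools; it assumes the Linux sysfs files `smt/active`, `intel_pstate/no_turbo`, and `cpufreq/boost` under `/sys/devices/system/cpu`, which are only present on some kernels and drivers. The function name `report_cpu_features` is hypothetical.

```shell
#!/bin/bash

# Print the hyper-threading (SMT) and turbo state as "smt=<state> turbo=<state>".
# $1: sysfs CPU directory (default: /sys/devices/system/cpu)
report_cpu_features() {
    local root="${1:-/sys/devices/system/cpu}"
    local smt="unknown" turbo="unknown"

    if [ -r "$root/smt/active" ]; then
        if [ "$(cat "$root/smt/active")" = "1" ]; then smt="on"; else smt="off"; fi
    fi

    if [ -r "$root/intel_pstate/no_turbo" ]; then
        # no_turbo is inverted: 1 means turbo is disabled.
        if [ "$(cat "$root/intel_pstate/no_turbo")" = "1" ]; then turbo="off"; else turbo="on"; fi
    elif [ -r "$root/cpufreq/boost" ]; then
        if [ "$(cat "$root/cpufreq/boost")" = "1" ]; then turbo="on"; else turbo="off"; fi
    fi

    echo "smt=$smt turbo=$turbo"
}
```

When a file is missing, the corresponding state is reported as `unknown` rather than guessed.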

@da-ekchajzer
Contributor Author

I guess we should have an interface to easily implement different types of tests.
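One way such an interface might look in shell is a naming convention: each test type is a function called `stress_<component>` that takes a load level and a duration and prints the command to run. The function names, the argument order, and the mapping of a memory "load" to `--vm-bytes` are all assumptions for illustration.

```shell
#!/bin/bash

# Hypothetical interface: one function per component, named stress_<component>.
# $1: load level in percent, $2: duration in seconds. Prints the command to run.
stress_cpu() {
    echo "stress-ng --cpu 0 --cpu-load $1 --timeout ${2}s"
}

stress_memory() {
    # Assumption: the load level is expressed as a share of available memory.
    echo "stress-ng --vm 1 --vm-bytes ${1}% --timeout ${2}s"
}

# Look up the function for a component and print the matching command.
# $1: component name, $2: load level in percent, $3: duration (default: 60)
build_stress_command() {
    local component="$1" load="$2" duration="${3:-60}"
    if ! declare -F "stress_$component" > /dev/null; then
        echo "unknown component: $component" >&2
        return 1
    fi
    "stress_$component" "$load" "$duration"
}
```

Adding a new test type then only requires defining another `stress_<component>` function; the runner itself does not change.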

@maethor
Collaborator

maethor commented Dec 12, 2022

Now that we have decided to use stress-ng, I think we should continue the discussion in #19.

4 participants