Help for tracing error #11

Open
Synlvejo opened this issue Apr 28, 2020 · 18 comments

@Synlvejo

Synlvejo commented Apr 28, 2020

Hello,

I can correctly collect profile.cubex using scorep, and scorep-score tells me that SCOREP_TOTAL_MEMORY should be larger than 43MB.
But when I install scorep_plugin_x86_energy and set the environment like this (without SCOREP_TOTAL_MEMORY):
##export SCOREP_ENABLE_TRACING="true"
##export SCOREP_ENABLE_PROFILING="false"
##export SCOREP_METRIC_PLUGINS=x86_energy_sync_plugin
##export SCOREP_METRIC_PLUGINS_SEP=";"
##export SCOREP_METRIC_X86_ENERGY_SYNC_PLUGIN="BLADE/E"
##export SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US=0
##export SCOREP_METRIC_X86_ENERGY_SYNC_PLUGIN_OFFSET=70

and run the application again, I get an error like this:

NAS Parallel Benchmarks (NPB3.4-OMP) - IS Benchmark
Size: 33554432 (class B)
Iterations: 10
Number of available threads: 20
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[OTF2] src/otf2_archive_int.c:2122: error: This could not be done with the given memory: Can't create event writer!
[OTF2] src/OTF2_Archive.c:977: error: This could not be done with the given memory: Could not get local event writer
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] src/measurement/SCOREP_Memory.c:175: Error: No free memory page available: [OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[OTF2] src/otf2_archive_int.c:2122: error: This could not be done with the given memory: Can't create event writer!
[OTF2] src/OTF2_Archive.c:977: error: This could not be done with the given memory: Could not get local event writer
[Score-P] src/measurement/SCOREP_Memory.c:175: Error: No free memory page available: Out of memory. Please increase SCOREP_TOTAL_MEMORY=16384000 and try again.
[Score-P] src/measurement/SCOREP_Memory.c:179: Error: No free memory page available: Please ensure that there are at least 2MB available for each intended location.
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[Score-P] src/measurement/SCOREP_Memory.c:183: Error: No free memory page available: Where there are currently 20 locations in use in this failing process.
[Score-P] Memory usage of rank 0
[Score-P] Memory used so far:
Out of memory. Please increase SCOREP_TOTAL_MEMORY=16384000 and try again.
……
……
[Score-P] src/measurement/SCOREP_Memory.c:179: Error: No free memory page available: Please ensure that there are at least 2MB available for each intended location.
[Score-P] Score-P runtime-management memory tracking:
Aborted

Then I set SCOREP_TOTAL_MEMORY to a value that should be large enough:
##export SCOREP_TOTAL_MEMORY=64000000 (about 64MB)
and I get this message in a loop:

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

Furthermore, I set:
##export SCOREP_TOTAL_MEMORY=6400000000 (about 6.4GB)
and get this error:
is.B.x: ../../build-backend/../src/measurement/scorep_environment.c:299: SCOREP_Env_GetPageSize: Assertion `env_total_memory <= (4294967295U)' failed.
Aborted

Is there a problem with how I set the environment, or with some other step?

Thanks for any help!

@rschoene
Member

Hi,
It looks like you're running into two problems: buffers that are too large (6.4GB) and buffers that are too small (64MB).
Why is 64 MB supposedly too small, you might ask. Probably, you did not activate this plugin for the profile that you fed to 'scorep-score'. With this plugin enabled, you will record metrics with every synchronous event (enter/exit) that occurs. Each of these will add to the byte count.
Why does 6.4 GB not work, you might ask. This can be explained by the internals of Score-P, which uses 32-bit values for storing offsets, if I remember correctly. Hence, only 4 GB is allowed.
Please try something like
'export SCOREP_TOTAL_MEMORY=1G'
Btw, yes, you can add postfix scaling like M and G to make your 'SCOREP_TOTAL_MEMORY' setting more readable.
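
A minimal sketch of the two points above (suffix scaling and the upper limit), written in Python for illustration only. The bound comes from the assertion in the original report (`env_total_memory <= (4294967295U)`). The helper names `parse_total_memory` and `fits_32bit_limit` are hypothetical, and treating K/M/G as multiples of 1024 is an assumption, not something confirmed in this thread.

```python
import re

# Upper bound taken from the assertion quoted in the report:
# env_total_memory <= (4294967295U)
MAX_TOTAL_MEMORY = 2**32 - 1

# Assumption: suffixes are binary multiples (K = 1024 bytes, and so on).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_total_memory(value: str) -> int:
    """Convert a string such as '64000000', '640M' or '1G' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)B?\s*", value.upper())
    if match is None:
        raise ValueError(f"cannot parse size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def fits_32bit_limit(value: str) -> bool:
    """True if the size does not exceed the limit from the assertion."""
    return parse_total_memory(value) <= MAX_TOTAL_MEMORY


if __name__ == "__main__":
    # The settings tried in this thread.
    for setting in ("64000000", "1G", "3G", "6400000000"):
        print(setting, parse_total_memory(setting), fits_32bit_limit(setting))
```

Under these assumptions, 1G and 3G stay below the limit, while 6400000000 exceeds it, which matches the failed assertion reported above.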

@Synlvejo
Author

Synlvejo commented Apr 29, 2020

Thanks,
Actually, I have also tried 640MB, and the error is the same as with 64MB:

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

I also just tried 1G, 2G, and 3G, but the error is the same.
The code at src/measurement/tracing/SCOREP_Tracing.c:226 is:

    /* ignore allocation failures, OTF2 will flush and free chunks */

#if HAVE( UTILS_DEBUG )
    if ( !chunk )
    {
        UTILS_WARNING( "Cannot allocate %" PRIu64 " bytes for tracing; but OTF2 will flush and free chunks.", chunkSize );
    }
#endif
    return chunk;
}

Maybe it's helpful?

And btw, I tuned SCOREP_TOTAL_MEMORY by adding or removing a '0', lol

@bmario
Member

bmario commented Apr 29, 2020

Looks like you're setting the sampling rate to infinity?

##export SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US=0

@Synlvejo
Author

Oh, doesn't 0 mean the default value?
This is the value from the installation guide, at the bottom of page 17.
Thanks!

@bmario
Member

bmario commented Apr 29, 2020

The default value is 50000, which corresponds to 50ms.
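
The relation between the interval and the sampling rate is plain arithmetic, sketched here in Python. The function name `samples_per_second` is hypothetical, and reading an interval of 0 as "no pause between samples" is an assumption that follows the question raised earlier in the thread; it is not taken from the plugin's source.

```python
import math


def samples_per_second(interval_us: int) -> float:
    """Sampling rate implied by an interval given in microseconds."""
    if interval_us < 0:
        raise ValueError("interval must not be negative")
    if interval_us == 0:
        # Assumption: no pause between samples, so the rate is unbounded.
        return math.inf
    return 1_000_000 / interval_us


if __name__ == "__main__":
    print(samples_per_second(50000))  # default of 50000 us, i.e. 50 ms
    print(samples_per_second(0))
```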

@Synlvejo
Author

Thanks,

I tried setting the value to 50000 without setting SCOREP_TOTAL_MEMORY. The shell output includes:

[Score-P] Memory: Location-Misc
[Score-P] Memory allocated [bytes] 8192
[Score-P] Memory used [bytes] 984
[Score-P] Memory available [bytes] 7208
[Score-P] Number of pages allocated 1
[Score-P] Number of pages used 1

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

#0 0x7ffb3d58bb9a in ???
#1 0x7ffb3d58adc3 in ???
#1 0x7ffb3d58adc3 in ???
#1 0x7ffb3d58adc3 in ???
#2 0x7ffb3ccbb3af in ???

And it ends with:
#15 0x7ffb3d05ae64 in ???
#16 0x7ffb3cd8388c in ???
#17 0xffffffffffffffff in ???
Aborted

But when I set SCOREP_TOTAL_MEMORY=1G or larger, the message loops again:

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

And when I set it back to 16MB, the message is like the first one.

@bmario
Member

bmario commented Apr 29, 2020

nvm. I just realized you're using the sync plugin. Then there's no need to set this environment variable anyways. However, you may still run into a similar issue. I recommend looking into filtering some regions.

@Synlvejo
Author

It doesn't matter.
I just want to get the energy information without using HDEEM, so I chose scorep_plugin_x86_energy.
Do you mean that I don't need to change SCOREP_TOTAL_MEMORY? According to the error message, that is the only thing I can do.
And is the suggestion to filter some regions and try again?
I will try it.

@bmario
Member

bmario commented Apr 29, 2020

I meant that the environment variable SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US isn't required in your case, because you use the sync plugin, and for this plugin the sampling rate is determined by the rate of enter and leave events in your application run. Further, if you run into memory issues, the sampling rate is too high. Thus, you need to lower the sampling rate, which is done by lowering the number of enter and leave events. Hence the hint to look into filtering regions. In some cases, you can get away with just increasing the available memory, but that doesn't seem to be the case for you.
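
As a rough sketch of what filtering regions can look like: the Python snippet below writes a Score-P filter file that excludes regions by name. The block keywords (SCOREP_REGION_NAMES_BEGIN, EXCLUDE, SCOREP_REGION_NAMES_END) and the SCOREP_FILTERING_FILE variable are the usual Score-P filter mechanism as I understand it, but check the Score-P manual for your version. The region patterns and the file name `is.filt` are placeholders, not regions known to be hot in NPB IS; pick the real candidates from your own scorep-score output.

```python
def make_filter(excluded_regions):
    """Build the text of a Score-P filter file excluding the given regions."""
    lines = ["SCOREP_REGION_NAMES_BEGIN"]
    # One EXCLUDE rule per region name or wildcard pattern.
    lines += [f"  EXCLUDE {name}" for name in excluded_regions]
    lines.append("SCOREP_REGION_NAMES_END")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder patterns: replace with small, frequently called regions.
    text = make_filter(["hot_helper_*", "tiny_inner_function"])
    with open("is.filt", "w") as handle:
        handle.write(text)
    print(text)
```

The measurement would then be started with SCOREP_FILTERING_FILE pointing at the generated file. Fewer recorded enter and leave events also means fewer metric readings from the sync plugin.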

@Synlvejo
Author

Okay, thanks.
I will add a .filt file to filter some regions and try again.

@Synlvejo
Author

Filtering doesn't work, but there are 14 .evt files in the scorep-measurement-tmp folder. Are they helpful for this issue?
In addition, I have tried NPB/IS with other problem sizes. Even with class A the error occurs. Maybe there are other possible reasons?

@Synlvejo
Author

I set tracing to false and profiling to true, and now I can get the energy consumption in profile.cubex.
Is this okay?

@umbreensabirmain
Collaborator

Yes, with the sync plugin this should be okay.

@Synlvejo
Author

Thank you very much.
I will use this configuration for my comparison experiments.

@AndreasGocht
Collaborator

But when I set SCOREP_TOTAL_MEMORY=1G or larger, the message loops again:

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

And when I set it back to 16MB, the message is like the first one.

Another guess: do you use MPI? If so, are there a lot of things happening before the actual MPI_INIT? This might cause some issues for Score-P.

Best,
Andreas

@Synlvejo
Author

Synlvejo commented Apr 30, 2020

I only use OpenMP.
One more question: I found that the energy values in profile.cubex have the format ( a , b , c ) : d , e . I guess this means (times , min value , max value) : average value , ? . What does "e" mean? Is it the standard deviation or the variance? Or is there documentation for this format?

Best, too

@umbreensabirmain
Collaborator

It looks like you are getting the Cube tuple format. Did you use the environment variable SCOREP_PROFILING_FORMAT? Yes, you are right: in this format the order of values is (times, min value, max value) : average value, standard deviation.
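
For post-processing, a small Python sketch that splits one such tuple into named fields, using the order given above. It assumes the value has already been exported as text in the form `( a , b , c ) : d , e`. The names `TupleStats` and `parse_tuple` are hypothetical, and the numbers in the example are made up.

```python
import re
from typing import NamedTuple


class TupleStats(NamedTuple):
    count: int       # "times": number of recorded values
    minimum: float
    maximum: float
    mean: float
    stddev: float


# A decimal number, optionally signed and with an exponent.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
_PATTERN = re.compile(
    rf"\(\s*({_NUM})\s*,\s*({_NUM})\s*,\s*({_NUM})\s*\)\s*:\s*({_NUM})\s*,\s*({_NUM})"
)


def parse_tuple(text: str) -> TupleStats:
    """Parse '( a , b , c ) : d , e' into count, min, max, mean, stddev."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError(f"not a tuple value: {text!r}")
    count, low, high, mean, stddev = (float(g) for g in match.groups())
    return TupleStats(int(count), low, high, mean, stddev)


if __name__ == "__main__":
    # Made-up example value, not taken from a real measurement.
    print(parse_tuple("( 10 , 1.5 , 4.5 ) : 3.0 , 0.5"))
```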

@Synlvejo
Author

Yes. I cannot enable tracing because of the error, so if I want the energy values I have to get them from profile.cubex.
But this way I have to process the data myself with Python to analyse the energy consumption per region. So painful.
Thanks for reply.
