
EnableLogCleaner() interface is not thread safe? #753

Open
kimi20071025 opened this issue Dec 7, 2021 · 16 comments

Comments

@kimi20071025

I read the functions EnableLogCleaner, DisableLogCleaner, and LogCleaner::enabled().

I noticed that the private member enabled_ of LogCleaner is not guarded by a lock.

If one thread calls EnableLogCleaner while another thread is logging, could this lead to undefined behaviour?

@sergiud
Collaborator

sergiud commented Dec 7, 2021

Could you please point out the exact location where the problem occurs?

@kimi20071025
Author

kimi20071025 commented Dec 8, 2021

Could you please point out the exact location where the problem occurs?

When one thread is logging to a file, it eventually calls LogFileObject::Write(), which calls log_cleaner.enabled(). log_cleaner.enabled() reads enabled_ without holding a lock.

Meanwhile, another thread may change enabled_ by calling EnableLogCleaner or DisableLogCleaner, both of which write enabled_ without holding a lock.

So perhaps this leads to undefined behaviour? I am not sure about this; maybe I am wrong.
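To make the scenario concrete, here is a minimal standalone sketch of what I mean (my own test program, not glog's internal code; it assumes /tmp/glog exists and only shows the access pattern, not a guaranteed failure):

#include "glog/logging.h"
#include <thread>

int main(int argc, char* argv[])
{
    google::InitGoogleLogging(argv[0]);
    google::SetLogDestination(google::GLOG_ERROR, "/tmp/glog/ERROR_");

    // Thread A: logs continuously; every LogFileObject::Write() reads
    // log_cleaner.enabled() without a lock.
    std::thread logger([] {
        for (int i = 0; i < 100000; ++i)
        {
            LOG(ERROR) << "glog error " << i;
        }
    });

    // Thread B: toggles the cleaner; EnableLogCleaner()/DisableLogCleaner()
    // write enabled_ without a lock, concurrently with the reads above.
    std::thread toggler([] {
        for (int i = 0; i < 1000; ++i)
        {
            google::EnableLogCleaner(1);   // keep logs for 1 day
            google::DisableLogCleaner();
        }
    });

    logger.join();
    toggler.join();
    return 0;
}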

Thanks for the response.

@sergiud
Collaborator

sergiud commented Dec 9, 2021

Is it possible that you mean a race condition and not undefined behavior? The former can occur because EnableLogCleaner and DisableLogCleaner are not atomic. However, I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

cc @aesophor

@kimi20071025
Author

kimi20071025 commented Dec 10, 2021

Is it possible that you mean a race condition and not undefined behavior? The former can occur because EnableLogCleaner and DisableLogCleaner are not atomic. However, I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

cc @aesophor

Yes.
One thread reads enabled_ while another writes it. Could this race condition lead to undefined behavior?

Or is it guaranteed not to cause undefined behavior because enabled_ is a bool?

@sergiud
Collaborator

sergiud commented Dec 10, 2021

I'm not sure what you are asking. Just because a boolean is not atomic does not mean it will immediately cause undefined behavior. Also, undefined behavior is generally not prescribed by a library but by the language standard.

@kimi20071025
Author

I'm not sure what you are asking. Just because a boolean is not atomic does not mean it will immediately cause undefined behavior. Also, undefined behavior is generally not prescribed by a library but by the language standard.

OK. If enabled_ were a uint8_t instead, would the race condition then likely cause undefined behavior?

@aesophor
Contributor

Quoting from @sergiud:

I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

@kimi20071025 We understand the possible race condition you've mentioned. Are you working on something that can trigger this race condition in a way that is intolerable?

IMO, if we want to eliminate the race condition completely, just making enabled_ a std::atomic is probably not enough: imagine two threads calling LogCleaner::Enable(int days); the final value of LogCleaner::overdue_days_ can be nondeterministic. However, if we use a mutex to protect the entire body of both LogCleaner::Enable() and LogCleaner::enabled(), it will add overhead for those who use this API from a single thread.

In this case, is there a better option other than a mutex?
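For illustration, here is a rough sketch of that trade-off on a hypothetical simplified cleaner class (invented names, not glog's actual code): enabled_ becomes a std::atomic<bool> so the per-write check stays a single lock-free load, while a mutex only guards the rarely-called Enable()/Disable(), which also touch overdue_days_.

#include <atomic>
#include <mutex>

// Hypothetical simplified cleaner, for discussion only.
class Cleaner {
 public:
  void Enable(int days) {
    std::lock_guard<std::mutex> lock(mutex_);  // rare call, locking cost acceptable
    overdue_days_ = days;
    enabled_.store(true, std::memory_order_release);
  }

  void Disable() {
    std::lock_guard<std::mutex> lock(mutex_);
    enabled_.store(false, std::memory_order_release);
  }

  // Called on every log write: a single atomic load, no mutex.
  bool enabled() const { return enabled_.load(std::memory_order_acquire); }

 private:
  std::mutex mutex_;
  int overdue_days_ = 0;
  std::atomic<bool> enabled_{false};
};

The cleanup pass itself would still need to read overdue_days_ under the mutex (or make it atomic as well), so this only sketches where the costs would land, not a complete fix.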

@kimi20071025
Author

Quoting from @sergiud:

I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

@kimi20071025 We understand the possible race condition you've mentioned. Are you working on something that can trigger this race condition in a way that is intolerable?

IMO, if we want to eliminate the race condition completely, just making enabled_ a std::atomic is probably not enough: imagine two threads calling LogCleaner::Enable(int days); the final value of LogCleaner::overdue_days_ can be nondeterministic. However, if we use a mutex to protect the entire body of both LogCleaner::Enable() and LogCleaner::enabled(), it will add overhead for those who use this API from a single thread.

In this case, is there a better option other than a mutex?

What I am worried about is not a race condition but a data race.

The glog library tries to clean overdue logs every time something is logged. I think this may cost a lot of performance if the LogCleaner is enabled all the time. So once a day, at night, I enable the log cleaner (EnableLogCleaner), write an error log entry, and then disable the log cleaner (DisableLogCleaner).
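Roughly like this (a hypothetical sketch of my nightly job; the google:: calls are the same glog API discussed in this issue, everything else is illustrative):

#include "glog/logging.h"

// Hypothetical nightly job: clean once, then keep the cleaner off.
int main(int argc, char* argv[])
{
    google::InitGoogleLogging(argv[0]);
    google::SetLogDestination(google::GLOG_ERROR, "/tmp/glog/ERROR_");

    google::EnableLogCleaner(1);                 // remove logs older than 1 day
    LOG(ERROR) << "nightly housekeeping entry";  // this write triggers the cleanup pass
    google::DisableLogCleaner();                 // no per-write cleanup cost for the rest of the day
    return 0;
}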

When a thread is logging, it eventually calls LogFileObject::Write, which calls log_cleaner.enabled(); that means the thread reads enabled_.

Meanwhile, another thread may call EnableLogCleaner or DisableLogCleaner, which writes enabled_.

So what I am worried about is that this situation may cause a data race.

@sergiud
Collaborator

sergiud commented Dec 13, 2021

I actually had to look this up, but it seems that concurrent reading and writing of a variable is indeed undefined behavior in C++.

However, at this point I would rather claim that your use case is not supported, in the sense that you should not be calling EnableLogCleaner and DisableLogCleaner from different threads. As @aesophor already mentioned, a mutex would be necessary to fix the data race, which would however incur an overhead at little or no gain. We can certainly reconsider once we have the means for proper synchronization (i.e., C++11 <thread> functionality). But at this point, the effort of making these functions thread safe is not worth it.

I think this may cost a lot of performance

How did you determine the performance penalty, or is this only a guess? If it's the latter, you should first measure quantitatively and let us know how severe the performance degradation is when the cleaner is enabled. This is something worth improving.

@kimi20071025
Author

Regarding log cleaner performance, I share the same opinion as issue #702: there is a significant performance cost when the log cleaner is enabled.

I also read the benchmark in #761. Could @aesophor help by adding another benchmark comparing logging performance with the log cleaner enabled versus disabled?

If there is a large performance penalty compared to running with the log cleaner disabled, it means the LogCleaner cannot be left enabled all the time.

@aesophor
Contributor

Platform:

  • Arch Linux
  • CPU: Intel i7-10700 CPU @ 2.90GHz (16-core)
  • Memory: 16 GB
    ...

When there are 100 files in the log directories, the execution time of LogCleaner::Run() is 0.157 ms on average.

Upon each call to LogFileObject::Write(), if the log cleaner is enabled and there are 100 files to check, it only takes an extra 0.157 ms to run.

@sergiud
Collaborator

sergiud commented Dec 15, 2021

Regarding log cleaner performance, I share the same opinion as issue #702: there is a significant performance cost when the log cleaner is enabled.

What exactly is a "significant performance cost" in absolute numbers? Issue #702 did not provide any details either. Therefore I don't see any factual arguments.

@kimi20071025
Author

Platform

  • Arch: Linux
  • CPU: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (4 cores)
  • Memory: 4 GB

Test code

#include "glog/logging.h"
#include <unistd.h>
#include <chrono>
#include <iostream>

using namespace std;
using namespace chrono;

int main(int argc, char *argv[])
{
    google::InitGoogleLogging("");    
    FLAGS_logbufsecs = 0;
    FLAGS_max_log_size = 100;

    google::SetLogDestination(google::GLOG_ERROR, "/tmp/glog/ERROR_");
    google::SetLogDestination(google::GLOG_WARNING, "");
    google::SetLogDestination(google::GLOG_INFO, "");

    FLAGS_colorlogtostderr = false;
    FLAGS_alsologtostderr = false;
    google::SetStderrLogging(google::GLOG_FATAL);


    google::EnableLogCleaner(1);

    auto enableLogCleanStart = system_clock::now();

    for (uint16_t i = 0; i < 50000; i++)
    {
        LOG(ERROR)<<"glog error";
    }
    
    auto enableLogCleanEnd = system_clock::now();

    std::chrono::duration<double, std::milli> enableLogCleanDuration = enableLogCleanEnd - enableLogCleanStart;

    cout<<"cost: "<<enableLogCleanDuration.count()<<" ms when enable  log clean"<<endl;


    google::DisableLogCleaner();

    auto disableLogCleanStart = system_clock::now();

    for (uint16_t i = 0; i < 50000; i++)
    {
        LOG(ERROR)<<"glog error";
    }

    auto disableLogCleanEnd = system_clock::now();

    std::chrono::duration<double, std::milli> disableLogCleanDuration = disableLogCleanEnd - disableLogCleanStart;

    cout<<"cost: "<<disableLogCleanDuration.count()<<" ms when disable log clean"<<endl;

    return 0;
}

Result

cost: 518.541 ms when enable  log clean
cost: 225.845 ms when disable log clean

Conclusion

When there is 1 file in the log directory and the log cleaner is enabled, logging takes roughly 2.3x as long (518.541 ms vs. 225.845 ms) as when the log cleaner is disabled.

@aesophor
Contributor

aesophor commented Dec 16, 2021

I believe that the overhead (518.541 ms) is because:

  • you write 50,000 times, and each write triggers the log cleaner.
  • each log cleaner run takes approximately 0.01037 ms.
  • 518.541 ms / 50000 = 0.01037 ms

If your application needs to write this frequently but is performance critical at the same time, then you can use another thread to occasionally perform cleanup. You are also welcome to submit a PR and improve it.

@aesophor
Contributor

aesophor commented Dec 16, 2021

A possible optimization strategy: use CycleClock_Now() to maintain an additional timer such as next_cleanup_time_.
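A rough sketch of that idea (hypothetical code, not glog's: std::chrono::steady_clock stands in for CycleClock_Now(), and next_cleanup_time_ plus the one-hour interval are made up for illustration):

#include <chrono>

class Cleaner {
 public:
  // Called from the write path; only runs the expensive directory scan
  // once the throttling interval has elapsed.
  void MaybeRun() {
    const auto now = std::chrono::steady_clock::now();
    if (now < next_cleanup_time_) return;  // cheap early-out on most writes
    next_cleanup_time_ = now + std::chrono::hours(1);
    Run();                                 // the actual scan/unlink pass
  }

 private:
  void Run() { /* scan log directories and delete overdue files */ }

  std::chrono::steady_clock::time_point next_cleanup_time_{};
};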

@kimi20071025
Author

kimi20071025 commented Dec 16, 2021

you can use another thread to occasionally perform cleanup

This is what I already do, but the data race on enabled_ should be avoided, as I said at the beginning of this issue.
