Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Appender may caused deadlock on AsyncLoggerDisruptor #2893

Open
quaff opened this issue Aug 28, 2024 · 8 comments
Open

Appender may caused deadlock on AsyncLoggerDisruptor #2893

quaff opened this issue Aug 28, 2024 · 8 comments
Labels
async Affects asynchronous loggers or appenders waiting-for-user More information is needed from the user

Comments

@quaff
Copy link
Contributor

quaff commented Aug 28, 2024

Description

Log operation is blocked when the RingBuffer is full, then waiting for appender to consume LogEvent's from the RingBuffer, but the appender itself may operate logging before the consumption to trigger deadlock.

Here is my actual case in production, I'm using Kafka appender, the Kafka client logging caused deadlock, see TransactionManager.java

"kafka-producer-network-thread | producer-1" #44 daemon prio=5 os_prio=0 tid=0x00007f468310c800 nid=0x256255 waiting for monitor entry [0x00007f467ad6c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:246)
    - waiting to lock <0x0000000701070490> (a java.lang.Object)
    at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
    at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
    at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:202)
    at org.apache.logging.log4j.core.async.AsyncLogger.access$100(AsyncLogger.java:67)
    at org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:157)
    at org.apache.logging.log4j.core.async.AsyncLogger.logMessage(AsyncLogger.java:130)
    at org.apache.logging.slf4j.Log4jLogger.log(Log4jLogger.java:378)
    at org.apache.kafka.common.utils.LogContext$LocationAwareKafkaLogger.writeLog(LogContext.java:434)
    at org.apache.kafka.common.utils.LogContext$LocationAwareKafkaLogger.info(LogContext.java:387)
    at org.apache.kafka.clients.producer.internals.TransactionManager.setProducerIdAndEpoch(TransactionManager.java:505)
    at org.apache.kafka.clients.producer.internals.TransactionManager.bumpIdempotentProducerEpoch(TransactionManager.java:542)
    at org.apache.kafka.clients.producer.internals.TransactionManager.bumpIdempotentEpochAndResetIdIfNeeded(TransactionManager.java:559)
    - locked <0x00000007016c0898> (a org.apache.kafka.clients.producer.internals.TransactionManager)
    at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:330)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:247)
    at java.lang.Thread.run(Thread.java:748) 
"http-nio-8080-exec-273" #3682 daemon prio=5 os_prio=0 tid=0x00007f465b422800 nid=0x2908d3 runnable [0x00007f4648b48000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
    at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
    at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
    at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
    at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
    at org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:247)
    - locked <0x0000000701070490> (a java.lang.Object)
    at org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
    at org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
    at org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
    at org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
    at org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
    at org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
    at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2168)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2122)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2105)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:1997)
    at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1862)
    at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:185)

Proposal

The issue is not Kafka specified, log4j should discard LogEvent from appenders if the underlying RingBuffer is full.

@tcmot
Copy link

tcmot commented Aug 28, 2024

Even if it is not a Kafka client component , any other client component that integrates log4j2 has similar issues.
The client component that integrates log4j2 also produces log event.

Disable log4j2 for client component.
Increase the capacity of RingBuffer.

@quaff
Copy link
Contributor Author

quaff commented Aug 28, 2024

Disable log4j2 for client component.

Is there a way to disable logging for particular lib except increasing logging level threshold?

Increase the capacity of RingBuffer.

Not an option, it will be full soon or later.

@vy vy added bug Incorrect, unexpected, or unintended behavior of existing code async Affects asynchronous loggers or appenders labels Aug 28, 2024
@ppkarwasz
Copy link
Contributor

ppkarwasz commented Aug 28, 2024

Tip

TL;DR: Please set the log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull Log4j configuration property to false. See Property Sources for a list of ways to do it.

@quaff,

There are already a lot of configuration properties that regulate what happens when the queue becomes full:

Note that Log4j Core never blocks on the asynchronous thread used by AsyncLoggerDisruptor:

private boolean synchronizeEnqueueWhenQueueFull() {
return DisruptorUtil.ASYNC_LOGGER_SYNCHRONIZE_ENQUEUE_WHEN_QUEUE_FULL
// Background thread must never block
&& backgroundThreadId != Thread.currentThread().getId()
// Threads owned by log4j are most likely to result in
// deadlocks because they generally consume events.
// This prevents deadlocks between AsyncLoggerContext
// disruptors.
&& !(Thread.currentThread() instanceof Log4jThread);
}

Your problem arises, because the Kafka appender uses its own background thread to send log events and Log4j Core does not know that it should not block on that thread. You have several solutions to this problem:

  • You can simply tell Log4j Core not to block on any thread by setting log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull to false.
  • You can stop using asynchronous loggers with the Kafka Appender: IMHO it doesn't make sense to have two asynchronous barriers that log events must pass before being logged.
  • You can use a synchronous logger for org.apache.kafka and an asynchronous logger for the rest (see Mixed sync/async loggers):
    <AsyncRoot level="INFO">
      <AppenderRef ref="KAFKA"/>
    </AsyncRoot>
    <Logger name="org.apache.kafka">
      <AppenderRef ref="CONSOLE"/>
    </Logger>

Warning

To switch from the "All asynchronous loggers" to the "Mixed sync/async loggers", you need to restore the default value of the log4j2.contextSelector configuration property.

@vy
Copy link
Member

vy commented Aug 28, 2024

@quaff, @ppkarwasz is spot on with his analysis and suggestions. My 2 cents are...

  1. Stick to the simplest setup (i.e., only use synchronous loggers) unless you can prove that this causes a significant performance regression for your application.
  2. Kafka Appender is planned to be removed. You should consider more future-proof alternatives.

@vy vy added waiting-for-user More information is needed from the user and removed bug Incorrect, unexpected, or unintended behavior of existing code waiting-for-maintainer labels Aug 28, 2024
@quaff
Copy link
Contributor Author

quaff commented Aug 28, 2024

Thanks for your quick response.
Currently I'm setting log4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector, and using two type of appenders, RollingFile for local file and Kafka for centralized log collection, I think Mixed sync/async is suitable for such use case that make RollingFile async and Kafka sync, please correct me if I'm wrong.

@quaff
Copy link
Contributor Author

quaff commented Aug 28, 2024

log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull to false.

The correct system property key is AsyncLogger.SynchronizeEnqueueWhenQueueFull.

@ppkarwasz
Copy link
Contributor

Thanks for your quick response. Currently I'm setting log4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector, and using two type of appenders, RollingFile for local file and Kafka for centralized log collection, I think Mixed sync/async is suitable for such use case that make RollingFile async and Kafka sync, please correct me if I'm wrong.

Using RollingFile asynchronously for the org.apache.kafka package and Kafka synchrounously for the rest should work, although I would recommend to only use synchrounous loggers.

When you use asynchronous elements, you loose the ability of receiving LoggingExceptions if an appender fails. If logging is important for your application, consider using a synchronous setup with a Failover appender that uses Kafka as primary and Rollingfile as failover.

@ppkarwasz
Copy link
Contributor

log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull to false.

The correct system property key is AsyncLogger.SynchronizeEnqueueWhenQueueFull.

That is the legacy pre-2.10 name of the configuration property. While it still works for backward compatibility, AsyncLogger.SynchronizeEnqueueWhenQueueFull is deprecated and you should use

  • log4j2.asyncLoggerSynchronizeEnqueueWhenQueueFull if you use Java System properties or a log4j2.component.properties file
  • LOG4J_ASYNC_LOGGER_SYNCHRONIZE_ENQUEUE_WHEN_QUEUE_FULL if you want to set an OS environment variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async Affects asynchronous loggers or appenders waiting-for-user More information is needed from the user
Projects
None yet
Development

No branches or pull requests

4 participants