[WIP] Cleanup Global Error handler #2181
@@ -146,7 +145,7 @@ impl Drop for LoggerProviderInner {
     fn drop(&mut self) {
         for processor in &mut self.processors {
             if let Err(err) = processor.shutdown() {
-                global::handle_error(err);
+                otel_error!(name: "LoggerProviderInner.Drop", otel_name = "LoggerProviderInner.Drop", error = format!("{:?}", err));
Any warning- or error-level message must be understandable to users and should not refer to internal constructs like 'Inner'.
name: LoggerProviderShutdownOnDropFailure
msg = "LoggerProvider shutdown() failed when invoked from Drop"
error = "error"
It's okay to handle this in a follow-up, so this PR can focus on replacing the global error handler with internal logs.
We can have the name follow a pattern like:
name = <Module>.<Function>.<Action>
So in this case it would be LoggerProvider.Drop.Error or LoggerProvider.Drop.ShutdownError. The actual error message would be contained in the error attribute, and more attributes can be added for specific field values if required.
The module.func.action pattern is a good one to follow for debug logging that owners of this crate use to debug. I don't think we can use such a pattern for user-actionable logs, as users don't need to know which module or component triggered the error.
@@ -175,7 +174,7 @@ impl<R: RuntimeChannel> LogProcessor for BatchLogProcessor<R> {
                 name: "batch_log_processor_emit_error",
                 error = format!("{:?}", err)
             );
-            global::handle_error(LogError::Other(err.into()));
+            otel_error!(name: "BatchLogProcessor.emit", otel_error = "BatchLogProcessor.emit", error = format!("{:?}", LogError::Other(err.into())));
Need to revisit this, as it can flood the logs if the channel is full.
We need to discuss whether these errors should be handled internally, with logic that prevents log flooding. That could involve an "already-sent" flag and tracking the last-sent timestamp to reduce the frequency of repeated error messages.
Or else, let the user handle this flooding by adding a filter logic.
"Or else, let the user handle this flooding by adding a filter logic."
The OTel SDK, by default, should not burden the user with such requirements!
A few thoughts:
- If the item is being dropped because the buffer is full, emit a log only once in the app lifetime. If an item is dropped for other, unknown reasons, it may be fine to emit a log for each occurrence. I prefer that this be a simple bool flag only, not a sophisticated throttling mechanism.
- We should use an internal metric for items dropped. This may require more refactoring and design work.
- Keep a count of items dropped (using an atomic uint, etc.). During shutdown/drop, if that count is greater than zero, emit a warning log "N items were dropped due to buffer full".
All of these are good future additions, so there is no need to tackle them in this PR.
My only minimum bar for the RC release: do not log each time an item is dropped because the buffer is full. (We had a similar issue with Metrics overflow that was fixed recently.)
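The "simple bool flag" and "count of items dropped" ideas above could be sketched roughly as follows. This is a minimal illustration using only the standard library; `DropTracker`, `record_drop`, and `shutdown_summary` are hypothetical names that do not exist in the SDK, and the methods return the message instead of logging it so the behavior is easy to check.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical helper (not part of the OpenTelemetry SDK) that tracks
/// items dropped because a buffer was full.
pub struct DropTracker {
    /// Set once the first "buffer full" warning has been produced.
    warned: AtomicBool,
    /// Total number of items dropped so far.
    dropped: AtomicUsize,
}

impl DropTracker {
    pub const fn new() -> Self {
        DropTracker {
            warned: AtomicBool::new(false),
            dropped: AtomicUsize::new(0),
        }
    }

    /// Records one dropped item. Returns a warning message for the first
    /// drop only; every later call returns `None`, so logs are not flooded.
    pub fn record_drop(&self) -> Option<String> {
        self.dropped.fetch_add(1, Ordering::Relaxed);
        // `swap` returns the previous value, so only one caller sees `false`.
        if !self.warned.swap(true, Ordering::Relaxed) {
            Some("Item dropped because the buffer is full".to_string())
        } else {
            None
        }
    }

    /// Intended to be called from shutdown/drop: returns a summary message
    /// if any items were dropped, otherwise `None`.
    pub fn shutdown_summary(&self) -> Option<String> {
        match self.dropped.load(Ordering::Relaxed) {
            0 => None,
            n => Some(format!("{} items were dropped due to buffer full", n)),
        }
    }
}
```

A processor would call `record_drop()` where it currently logs on a full channel and pass any returned message to the error-level log, then call `shutdown_summary()` once during shutdown and emit the result as a warning.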
@@ -47,40 +43,3 @@ impl<T> From<PoisonError<T>> for Error {
         Error::Other(err.to_string())
     }
 }
-
-struct ErrorHandler(Box<dyn Fn(Error) + Send + Sync>);
I'd suggest splitting the PR into smaller ones:
- Review and replace the global handler with error! internal logs, one crate (or an even smaller scope) per PR.
- When we are done with all crates, removing the global handler can be a simple PR.
- Revisit each internal log: if it is not user-actionable and only useful to SDK owners, it should be debug level.
@@ -50,12 +50,24 @@ macro_rules! otel_warn {
         {
             tracing::warn!(name: $name, target: env!("CARGO_PKG_NAME"), "");
         }
         #[cfg(not(feature = "internal-logs"))]
I think this is best left for Error level only.
@@ -55,7 +55,9 @@ impl SpanRef<'_> {
         if let Some(ref inner) = self.0.inner {
             match inner.lock() {
                 Ok(mut locked) => f(&mut locked),
-                Err(err) => global::handle_error(err),
+                Err(_err) => {
+                    otel_error!(name: "SpanRef.with_inner_mut", otel_name = "SpanRef.with_inner_mut", error = format!("{:?}", _err));
Instead of repeating otel_name, let's do it in the wrapper itself for now, until tracing:fmt is fixed to display the event name.
Yes, I thought so, but then this will force this behavior for anyone using these macros in their custom exporters or processors. Though that's not a big reason for not adding it in the macros :)
"this will force this behavior for anyone using these macros in their custom exporters or processors."
That seems okay. And when fmt is fixed, we can just fix the wrapper instead of asking everyone to change their code.
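To illustrate the idea of having the wrapper add otel_name itself, here is a rough standard-library-only sketch. `sketch_otel_error!` and `render_event` are hypothetical stand-ins, not the real otel_error! macro or the tracing formatter; `render_event` deliberately ignores the event name to mimic a formatter that does not display it.

```rust
/// Hypothetical stand-in for a log formatter that prints fields but,
/// like the behavior discussed above, does not print the event name.
pub fn render_event(_name: &str, fields: &[(&str, String)]) -> String {
    let mut line = String::from("ERROR");
    for (key, value) in fields {
        line.push_str(&format!(" {}={}", key, value));
    }
    line
}

/// Hypothetical wrapper macro: callers pass the name once, and the macro
/// injects it again as an `otel_name` field so it shows up in the output.
macro_rules! sketch_otel_error {
    (name: $name:expr $(, $key:ident = $value:expr)* $(,)?) => {
        render_event(
            $name,
            &[
                // Added by the wrapper, not by the caller.
                ("otel_name", $name.to_string()),
                $((stringify!($key), $value.to_string()),)*
            ],
        )
    };
}
```

With this shape, a call such as `sketch_otel_error!(name: "SpanRef.with_inner_mut", error = "lock poisoned")` no longer repeats the name at the call site, and dropping the injected field later would only require changing the macro.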
Fixes #2175
Changes
In Progress. TODO:
<signal>Error, CoW, and a few other types inside otel macros. As of now, the caller needs to use format! to invoke their Debug impl, which is cumbersome.
For future improvement:
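One possible direction for avoiding the explicit format! call, sketched with the standard library only: a macro arm that accepts a `?` prefix and applies the Debug impl itself, similar in spirit to the `?value` shorthand in the tracing macros. `sketch_field!` is a hypothetical name, and this is not how the otel macros are implemented.

```rust
/// Hypothetical field-building macro. `key = ?value` formats the value
/// with its Debug impl; `key = value` formats it with Display.
macro_rules! sketch_field {
    // The `?` arm must come first so that `?value` is not parsed as an expression.
    ($key:ident = ?$value:expr) => {
        (stringify!($key), format!("{:?}", $value))
    };
    ($key:ident = $value:expr) => {
        (stringify!($key), format!("{}", $value))
    };
}
```

A caller could then write `sketch_field!(error = ?err)` for any type that implements Debug, instead of `error = format!("{:?}", err)`.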
Merge requirement checklist
CHANGELOG.md files updated for non-trivial, user-facing changes