
SimpleExporter is not useable with OTLP/gRPC #2188

Open
cijothomas opened this issue Oct 9, 2024 · 12 comments

@cijothomas
Member

cijothomas commented Oct 9, 2024

SimpleExporter is not useable with OTLP/gRPC (Tonic client used for gRPC)
Minimal repro:

use tracing::info;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::Registry;
use tracing_subscriber::util::SubscriberInitExt;
use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_otlp::WithExportConfig;

fn main() {
    let logger_provider = opentelemetry_otlp::new_pipeline()
        .logging()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        .install_simple()
        .unwrap();

    // Create a new OpenTelemetryTracingBridge using the above LoggerProvider.
    let layer = OpenTelemetryTracingBridge::new(&logger_provider);
    tracing_subscriber::registry()
        .with(layer)
        .init();

    info!(name: "my-event", target: "my-target", "hello from {}. My price is {}", "apple", 1.99);

    let _ = logger_provider.shutdown();
}

Panics with

there is no reactor running, must be called from the context of a Tokio 1.x runtime

If main is made #[tokio::main], then the export hangs forever.

We need to modify the exporters so that they don't require async runtimes and can make normal blocking calls (assuming there are APIs to do that).
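A minimal sketch of that idea, assuming the exporter owns a small dedicated Tokio runtime and blocks on it so callers never need to be inside an async context (the names here are illustrative, not existing SDK APIs):

use tokio::runtime::{Builder, Runtime};

// Hypothetical wrapper: owns a single-threaded runtime so an async gRPC export
// can be driven to completion with a plain blocking call from any thread.
struct BlockingExporter {
    rt: Runtime,
}

impl BlockingExporter {
    fn new() -> std::io::Result<Self> {
        let rt = Builder::new_current_thread().enable_all().build()?;
        Ok(Self { rt })
    }

    // Blocks the calling thread until the async export future completes.
    fn export_blocking<F, T>(&self, export_call: F) -> T
    where
        F: std::future::Future<Output = T>,
    {
        self.rt.block_on(export_call)
    }
}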

@cijothomas cijothomas changed the title SimpleExporter is not useable with OTLP SimpleExporter is not useable with OTLP/gRPC Oct 9, 2024
@RichardChukwu

Is this open to be worked on, @cijothomas?

@lalitb
Member

lalitb commented Oct 16, 2024

@RichardChukwu thanks for volunteering to fix the issue. Do you have an approach in mind that we can discuss here? The tonic crate used for gRPC seems to require the runtime in all scenarios, while SimpleExporter is supposed to work with the futures crate.

P.S. @cijothomas is traveling, so there could be a delay in his response. Assigning it to you for now.

@mzabaluev

mzabaluev commented Oct 16, 2024

The problem is that libraries in the exporter's network stack enter tracing spans of their own (normally at the debug or trace level). These go to the global subscriber, and with the simple exporter this leads to a worker thread calling into the OpenTelemetry tracer while the main thread holds a lock in the exporter.
Try to prevent the reentrancy in tracing by filtering the OpenTelemetry layer, e.g.:

use tracing::Level;
use tracing_subscriber::{filter, Layer};

let layer = OpenTelemetryTracingBridge::new(&logger_provider)
    .with_filter(
        filter::Targets::new().with_target("my-target", Level::INFO)
    );

@mzabaluev

Panics with

there is no reactor running, must be called from the context of a Tokio 1.x runtime

Yes, the opentelemetry-otlp tonic exporter requires a Tokio runtime context, regardless of the choice of simple vs. batch. And you likely want the batch processor with a networked exporter anyway.
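For reference, a sketch of the batch setup, assuming the same opentelemetry_otlp pipeline API and imports as the repro above; the batch processor is driven by the Tokio runtime:

#[tokio::main]
async fn main() {
    let logger_provider = opentelemetry_otlp::new_pipeline()
        .logging()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        .install_batch(opentelemetry_sdk::runtime::Tokio)
        .unwrap();

    // ... install the OpenTelemetryTracingBridge layer as in the repro ...

    let _ = logger_provider.shutdown();
}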

@RichardChukwu

RichardChukwu commented Oct 16, 2024

@mzabaluev would you say the Tokio runtime needs to be initialized properly to handle async tasks then?

@mzabaluev

mzabaluev commented Oct 16, 2024

would you say the Tokio runtime needs to be initialized properly to handle async tasks then?

Yes, putting the initialization into the tokio runtime context will solve the panic.

@lalitb
Member

lalitb commented Oct 16, 2024

@cijothomas I see you mentioned that the exporter hangs with tokio::main. Was it because of an infinite-loop scenario similar to #1745? Otherwise, the code below works fine for me:

fn init_logs() -> Result<opentelemetry_sdk::logs::LoggerProvider, LogError> {
    let tonic_exporter = opentelemetry_otlp::new_exporter()
        .tonic()
        .with_endpoint("http://localhost:4317");
    let res = opentelemetry_otlp::new_pipeline()
        .logging()
        .with_resource(RESOURCE.clone())
        .with_exporter(
            tonic_exporter,
        )
        .install_simple();
        //.install_batch(opentelemetry_sdk::runtime::Tokio);

    if let Err(ref err) = res {
        println!("Error initializing logs: {:?}", err);
    }
    res
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
    let logger_provider = init_logs().unwrap();
    // Create a new OpenTelemetryTracingBridge using the above LoggerProvider.
    let layer = OpenTelemetryTracingBridge::new(&logger_provider);
    let filter = EnvFilter::new("info")
        .add_directive("hyper=error".parse().unwrap())
        .add_directive("tonic=error".parse().unwrap())
        .add_directive("reqwest=error".parse().unwrap());

    tracing_subscriber::registry()
        .with(filter)
        .with(layer)
        .init();
    info!(name: "my-event", target: "my-target", "hello from {}. My price is {}", "apple", 1.99);

    logger_provider.shutdown()?;
    Ok(())
}

@lalitb
Member

lalitb commented Oct 16, 2024

Need to modify exporters in such a way that they don't require async runtimes, and can do normal blocking calls (assuming there are APIs to do that).

tonic has a hard dependency on the Tokio runtime and does not appear to support blocking gRPC calls. As @TommyCpp mentioned in the last community meeting, we previously supported grpc-io, which does not rely on any runtime (it uses futures), but it was removed as part of #1534 due to the library's poor maintenance; its last release was over a year ago.

@cijothomas
Member Author

The problem is that libraries in the exporter's network stack enter tracing spans of their own (normally at the debug or trace level). These go to the global subscriber, and with the simple exporter this leads to a worker thread calling into the OpenTelemetry tracer while the main thread holds a lock in the exporter. Try to prevent the reentrancy in tracing by filtering the OpenTelemetry layer, e.g.:

use tracing::Level;
use tracing_subscriber::{filter, Layer};

let layer = OpenTelemetryTracingBridge::new(&logger_provider)
    .with_filter(
        filter::Targets::new().with_target("my-target", Level::INFO)
    );

You are right! I added #2199 recently to prove this with tests, so the fix can be easily validated.
Filtering is a temporary hack; this has to be solved properly using a SuppressionFlag stored in the Context.
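For illustration only, the rough shape of such a suppression mechanism; the real design would store the flag in the OpenTelemetry Context rather than a thread-local, and none of this is an existing SDK API:

use std::cell::Cell;

thread_local! {
    // Hypothetical flag; the eventual design would carry this in the Context.
    static SUPPRESS_TELEMETRY: Cell<bool> = Cell::new(false);
}

// The exporter wraps its network calls in this, so any tracing events they
// emit are marked as internal.
fn with_suppression<T>(work: impl FnOnce() -> T) -> T {
    SUPPRESS_TELEMETRY.with(|f| f.set(true));
    let result = work();
    SUPPRESS_TELEMETRY.with(|f| f.set(false));
    result
}

// The log/span processor checks this before re-entering the pipeline.
fn is_suppressed() -> bool {
    SUPPRESS_TELEMETRY.with(|f| f.get())
}

fn main() {
    assert!(!is_suppressed());
    with_suppression(|| assert!(is_suppressed()));
    assert!(!is_suppressed());
}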

@cijothomas
Member Author

@cijothomas I see you mentioned that the exporter hangs with tokio::main. Was it because of an infinite-loop scenario similar to #1745?

Yes, exactly. It's a deadlock caused by reentrancy in the SimpleLogProcessor.
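A stripped-down illustration of that reentrancy (these are not the actual SDK types, and try_lock is used so the example reports the conflict instead of hanging):

use std::sync::Mutex;

struct FakeExporter;

struct SimpleProcessor {
    exporter: Mutex<FakeExporter>,
}

impl FakeExporter {
    fn export(&self, processor: &SimpleProcessor) {
        // The gRPC stack emits a tracing event of its own, which the bridge
        // routes back into the same processor while the lock is still held.
        processor.nested_emit();
    }
}

impl SimpleProcessor {
    fn emit(&self) {
        let exporter = self.exporter.lock().unwrap();
        exporter.export(self);
        // The lock is only released here, too late for the nested call.
    }

    fn nested_emit(&self) {
        // With lock() this would block forever; try_lock() makes the conflict visible.
        assert!(self.exporter.try_lock().is_err());
        println!("nested emit hit the exporter lock: would deadlock with lock()");
    }
}

fn main() {
    let processor = SimpleProcessor { exporter: Mutex::new(FakeExporter) };
    processor.emit();
}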

@mzabaluev

mzabaluev commented Oct 17, 2024

You are right! I added #2199 recently to prove this with tests, so the fix can be easily validated. Filtering is a temporary hack; this has to be solved properly using a SuppressionFlag stored in the Context.

I thought about that, but I doubt the tracing spans created in the exporter stack will always get the right Context. The thread where this happens may be a worker thread that doesn't have an inherited Context, and the OpenTelemetry layer has no visibility into the causal relationship between the export and any tracing caused by it.
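A small demonstration of that concern, using the opentelemetry crate's Context API (the current Context is tracked per thread and is not inherited by spawned worker threads):

use opentelemetry::Context;

#[derive(Debug)]
struct Marker;

fn main() {
    // Attach a marker value to the current Context on this thread.
    let cx = Context::current().with_value(Marker);
    let _guard = cx.attach();
    assert!(Context::current().get::<Marker>().is_some());

    // A worker thread starts from the default Context, so the marker is gone.
    std::thread::spawn(|| {
        assert!(Context::current().get::<Marker>().is_none());
    })
    .join()
    .unwrap();
}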

@cijothomas
Member Author

You are right! I added #2199 recently to prove this with tests, so the fix can be easily validated. Filtering is a temporary hack; this has to be solved properly using a SuppressionFlag stored in the Context.

I thought about that, but I doubt the tracing spans created in the exporter stack will always get the right Context. The OpenTelemetry layer may not have visibility into these due to whatever filtering the application has set up, and the thread where this happens may be a worker thread that doesn't have an inherited Context.

Spot on! That is exactly why solving the "Context" problem is important for OTel. It is non-trivial; @lalitb has done a lot of research into this and came to the same conclusion you shared.

This is something we'll need to solve, most likely after the initial stable release of logs and metrics.
