[Bug]: Using `futures_executor::block_on` blocks forever in WASM #2047

mendess · 2024-08-23T14:46:35Z

What happened?

Using this library in WASM causes the program to block forever and use 100% CPU, due to the usage of futures_executor::block_on

API Version

0.24.0

SDK Version

0.24.1

What Exporter(s) are you seeing the problem on?

N/A

Using futures_executor::block_on is harmful to performance and correctness.

In single-threaded contexts, blocking in place leads to blocking the entire runtime forever, and block_on causes that thread to busy loop until the future resolves, using 100% of CPU.

In a multithreaded environment it is also harmful as it blocks an entire thread, and if that thread happens to have other tasks scheduled on it, this will cause those tasks to be starved of resources, worse still if the future the thread is blocking on is waiting for a task that's scheduled on that same thread via a mechanism such as the tokio LIFO slot, leading to a deadlock.

See this draft PR as a first attempt to address this problem.

In order to make this work for WASM I introduced a spawn_future function, which does the right thing in WASM but continues using the bad futures_executor::block_on function in other environments. With this patch I stopped having problems in the WASM project I'm working on, thus proving that this was the root of the issue.

Next I made some more methods async. The trait methods I made async are:

LogProcessor::emit
LogProcessor::force_flush
LogProcessor::shutdown
SpanProcessor::on_end
SpanProcessor::force_flush
SpanProcessor::shutdown

This avoids having to call spawn_future inside these functions. I couldn't however avoid calling this inside the Drop implementations that needed to call SpanProcessor::shutdown, as well as, inside periodic_reader.rs because I was unsure as to how best to proceed. So there are still some open questions.

The text was updated successfully, but these errors were encountered:

cijothomas · 2024-08-23T15:38:05Z

Thanks for opening the issue with detailed analysis and the PR! I left a comment on the PR https://github.com/open-telemetry/opentelemetry-rust/pull/2048/files#r1729165305 .

We need to see what is the right way to execute all the backgrounds tasks (batch exporting, metric reader). @ThomsonTan was also looking into this. Tagging @TommyCpp and @lalitb also to share ideas/concerns.

cijothomas · 2024-09-18T00:02:50Z

In a multithreaded environment it is also harmful as it blocks an entire thread, and if that thread happens to have other tasks scheduled on it, this will cause those tasks to be starved of resources, worse still if the future the thread is blocking on is waiting for a task that's scheduled on that same thread via a mechanism such as the tokio LIFO slot, leading to a deadlock.

We are exploring a design where OTel will have its own dedicated thread, where these issues are going to be avoided. Do you consider it an issue, if Otel does a blocking call in its own background thread? That thread is OTel's own, and won't have any other tasks scheduled on it.

demurgos · 2024-09-29T22:38:54Z

I keep hitting the same issues on regular x86_64 linux builds when using Tokio. I am not entirely sure what's causing this but all runtime.block_on calls hang. The presence of blocking calls seems to make most of the API brittle.

mzabaluev · 2024-10-16T11:04:39Z

Yes, block_on is problematic also in async contexts run by Tokio. This minimized test blocks:

use opentelemetry_sdk::{runtime, testing::trace::NoopSpanExporter, trace::TracerProvider};

#[tokio::test]
async fn shutdown_in_async_scope() {
    let provider = TracerProvider::builder()
        .with_batch_exporter(NoopSpanExporter::new(), runtime::Tokio)
        .build();

    provider.shutdown().unwrap();
}

One workaround is to call shutdown in a worker thread:

#[tokio::test]
async fn shutdown_in_tokio_worker() {
    let provider = TracerProvider::builder()
        .with_batch_exporter(NoopSpanExporter::new(), runtime::Tokio)
        .build();

    tokio::task::spawn_blocking(move || provider.shutdown())
        .await
        .unwrap()
        .unwrap();
}

mendess added bug Something isn't working triage:todo Needs to be traiged. labels Aug 23, 2024

mendess mentioned this issue Aug 23, 2024

[WIP] Remove usage of futures_executor::block_on in WASM targets #2048

Open

3 tasks

OliverEvans96 mentioned this issue Sep 2, 2024

[WIP] Remove Send bounds on wasm32 #2074

Open

4 tasks

This was referenced Oct 16, 2024

[Bug]: Shutdown doesn't flush when used with global subscriber #1961

Open

Adding tracing_opentelemetry::layer to tracing_subscriber::registry breaks span exporting to Jaeger tokio-rs/tracing-opentelemetry#159

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Using `futures_executor::block_on` blocks forever in WASM #2047

[Bug]: Using `futures_executor::block_on` blocks forever in WASM #2047

mendess commented Aug 23, 2024 •

edited

Loading

cijothomas commented Aug 23, 2024

cijothomas commented Sep 18, 2024

demurgos commented Sep 29, 2024

mzabaluev commented Oct 16, 2024 •

edited

Loading

[Bug]: Using futures_executor::block_on blocks forever in WASM #2047

[Bug]: Using futures_executor::block_on blocks forever in WASM #2047

Comments

mendess commented Aug 23, 2024 • edited Loading

What happened?

API Version

SDK Version

What Exporter(s) are you seeing the problem on?

cijothomas commented Aug 23, 2024

cijothomas commented Sep 18, 2024

demurgos commented Sep 29, 2024

mzabaluev commented Oct 16, 2024 • edited Loading

[Bug]: Using `futures_executor::block_on` blocks forever in WASM #2047

[Bug]: Using `futures_executor::block_on` blocks forever in WASM #2047

mendess commented Aug 23, 2024 •

edited

Loading

mzabaluev commented Oct 16, 2024 •

edited

Loading