Fix thread safety issue in HTTP exporters #481

justinhporter

We were seeing occasional crashes that were caused by pendingLogRecords being accessed from multiple threads. After making these changes, the crashes went away.

linux-foundation-easycla bot commented Oct 26, 2023

CLA Not Signed

Member

@nachoBonafonte nachoBonafonte left a comment

Looks good, but I would prefer that you use a Lock instead of a queue, as is done in the other exporters. I have added some code showing the usage.

@@ -14,7 +14,8 @@ public func defaultOltpHttpLoggingEndpoint() -> URL {
public class OtlpHttpLogExporter : OtlpHttpExporterBase, LogRecordExporter {

var pendingLogRecords: [ReadableLogRecord] = []

let dispatchQueue = DispatchQueue(label: "OtlpHttpLogExporter Queue")
Member

I would prefer that you import the Locks.swift file, which is already used in other parts of the project (e.g. in the SDK, although its methods are internal there), and use it instead of GCD queues, like this:

private let exporterLock = Lock()
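
For readers unfamiliar with the pattern, here is a minimal sketch of what a lock with a `withLockVoid` helper might look like and how it serializes access to shared state. It is an assumption for illustration only: the `Lock` class below simply wraps `NSLock` and is not the project's Locks.swift, and `RecordBuffer` is a hypothetical type.

```swift
import Foundation

// Hypothetical stand-in for the project's Lock type. The real Locks.swift
// may be implemented differently; this sketch just wraps NSLock.
final class Lock {
    private let underlying = NSLock()

    // Runs `body` while holding the lock; `defer` releases it even if `body` throws.
    func withLockVoid(_ body: () throws -> Void) rethrows {
        underlying.lock()
        defer { underlying.unlock() }
        try body()
    }
}

// Illustrative shared state (hypothetical type), guarded by the lock.
final class RecordBuffer {
    private let exporterLock = Lock()
    private var records: [Int] = []

    func append(_ value: Int) {
        exporterLock.withLockVoid { records.append(value) }
    }

    var count: Int {
        var result = 0
        exporterLock.withLockVoid { result = records.count }
        return result
    }
}

// 1,000 concurrent appends; without the lock this would be a data race.
let buffer = RecordBuffer()
DispatchQueue.concurrentPerform(iterations: 1_000) { i in
    buffer.append(i)
}
print(buffer.count) // 1000
```

Compared with a serial `DispatchQueue`, a plain lock avoids dispatching a block for every access, which is presumably part of why the other exporters use one.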

Author

It appears that Lock is internal to OpenTelemetrySdk. Would you like me to make it public, or copy it to OpenTelemetryProtocolHttp?

Member

Copy it into the exporter folder. Thanks.

Author

Updated.

Comment on lines 27 to 32
var sendingLogRecords: [ReadableLogRecord]!
dispatchQueue.sync {
    pendingLogRecords.append(contentsOf: logRecords)
    sendingLogRecords = pendingLogRecords
    pendingLogRecords = []
}
Member

Instead of this, you can use:

var sendingLogRecords: [ReadableLogRecord] = []
exporterLock.withLockVoid {
    pendingLogRecords.append(contentsOf: logRecords)
    sendingLogRecords = pendingLogRecords
    pendingLogRecords = []
}
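
The suggestion above follows a "swap under lock" pattern: append the new records and take ownership of everything pending inside one short critical section, then work on the local copy with the lock released. The sketch below illustrates the idea under stated assumptions: `BufferingExporter`, `takeBatch`, and `requeue` are hypothetical names, `String` stands in for `ReadableLogRecord`, and `NSLock` stands in for the project's `Lock`.

```swift
import Foundation

// Hypothetical, simplified exporter: String stands in for ReadableLogRecord,
// and NSLock stands in for the project's Lock type.
final class BufferingExporter {
    private let lock = NSLock()
    private var pendingRecords: [String] = []

    // Appends the new records and takes ownership of everything pending,
    // all inside one critical section. The returned array is a local copy,
    // so it can be encoded and sent without holding the lock.
    func takeBatch(adding records: [String]) -> [String] {
        lock.lock()
        defer { lock.unlock() }
        pendingRecords.append(contentsOf: records)
        let batch = pendingRecords
        pendingRecords = []
        return batch
    }

    // Puts a failed batch back so a later export can retry it.
    func requeue(_ batch: [String]) {
        lock.lock()
        defer { lock.unlock() }
        pendingRecords.append(contentsOf: batch)
    }
}

let exporter = BufferingExporter()
exporter.requeue(["a"])
let batch = exporter.takeBatch(adding: ["b", "c"])
print(batch) // ["a", "b", "c"]
```

Keeping the network call outside the critical section means a slow request never blocks other threads that only need to append records.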

let body = Opentelemetry_Proto_Collector_Logs_V1_ExportLogsServiceRequest.with { request in
    request.resourceLogs = LogRecordAdapter.toProtoResourceRecordLog(logRecordList: sendingLogRecords)
}

var request = createRequest(body: body, endpoint: endpoint)
request.timeoutInterval = min(explicitTimeout ?? TimeInterval.greatestFiniteMagnitude, config.timeout)
httpClient.send(request: request) { [weak self] result in
    guard let self = self else { return }
Member

This is wrong. Ideally we would like to know if there was an error, and that error is no longer being printed.
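
One possible reading of this concern is that an early `guard let self` return drops the error whenever the exporter has already been deallocated. The sketch below shows one way to report the failure before touching `self`. It is an assumption, not the code that was merged: `RetryBuffer`, `ExportError`, and the `log` parameter are hypothetical names, and `String` stands in for `ReadableLogRecord`.

```swift
import Foundation

// Hypothetical names throughout; none of these types come from the PR.
struct ExportError: Error { let message: String }

final class RetryBuffer {
    private let lock = NSLock()
    private var pending: [String] = []

    // Puts a failed batch back under the lock so it can be retried later.
    func requeue(_ batch: [String]) {
        lock.lock()
        defer { lock.unlock() }
        pending.append(contentsOf: batch)
    }

    var pendingCount: Int {
        lock.lock()
        defer { lock.unlock() }
        return pending.count
    }

    // Builds a completion handler for one in-flight batch.
    func completion(for batch: [String],
                    log: @escaping (String) -> Void) -> (Result<Void, Error>) -> Void {
        return { [weak self] result in
            switch result {
            case .success:
                break
            case .failure(let error):
                // Log before touching `self`, so the error is reported even
                // if the exporter has already been deallocated.
                log("export failed: \(error)")
                self?.requeue(batch)
            }
        }
    }
}

var messages: [String] = []
let retryBuffer = RetryBuffer()
let done = retryBuffer.completion(for: ["a", "b"]) { messages.append($0) }
done(.failure(ExportError(message: "timeout")))
done(.success(()))
print(messages.count, retryBuffer.pendingCount) // 1 2
```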

switch result {
case .success(_):
    break
case .failure(let error):
    self?.pendingLogRecords.append(contentsOf: sendingLogRecords)
    self.dispatchQueue.sync { [weak self] in
Member

Same here: use exporterLock.withLockVoid.

@@ -52,7 +59,10 @@ public class OtlpHttpLogExporter : OtlpHttpExporterBase, LogRecordExporter {

public func flush(explicitTimeout: TimeInterval? = nil) -> ExportResult {
    var exporterResult: ExportResult = .success

    var pendingLogRecords: [ReadableLogRecord]!
Member

Same here: use exporterLock.withLockVoid.

codecov bot commented Oct 26, 2023

Codecov Report

Attention: 42 lines in your changes are missing coverage. Please review.

| Files | Coverage Δ |
|---|---|
| ...penTelemetryProtocolCommon/trace/SpanAdapter.swift | 99.00% <100.00%> (ø) |
| ...etryProtocolGrpc/trace/OtlpTraceJsonExporter.swift | 88.57% <100.00%> (ø) |
| ...lemetryProtocolHttp/logs/OtlpHttpLogExporter.swift | 67.64% <100.00%> (+4.93%) ⬆️ |
| ...ryProtocolHttp/metric/OltpHTTPMetricExporter.swift | 66.12% <100.00%> (+6.51%) ⬆️ |
| ...ocolHttp/metric/StableOtlpHTTPMetricExporter.swift | 73.33% <100.00%> (+4.10%) ⬆️ |
| ...etryProtocolHttp/trace/OtlpHttpTraceExporter.swift | 60.00% <100.00%> (+6.66%) ⬆️ |
| ...ers/OpenTelemetryProtocolHttp/Internal/Locks.swift | 40.00% <40.00%> (ø) |

@nachoBonafonte
Member

You will have to sign the easyCLA before the PR can be merged.

@nachoBonafonte
Member

Any updates about the CLA @justinhporter ?

@justinhporter
Author

> Any updates about the CLA @justinhporter ?

We're working on getting it signed.

@nachoBonafonte
Member

Any updates @justinhporter ?

@justinhporter
Author

> Any updates @justinhporter ?

Sorry, it's taking longer than I expected to get sign-off from our legal department. People are out this week, but I'll try again after Thanksgiving.

@bryce-b
Member

bryce-b commented Dec 7, 2023

@justinhporter any progress on getting the CLA signed? If you can't get it done soon, we'll have to recreate the PR to get this fixed.

@justinhporter
Author

> @justinhporter any progress on getting the CLA signed? If you can't get it done soon, we'll have to recreate the PR to get this fixed.

Still trying, but at this point feel free to recreate this PR.

@bryce-b bryce-b closed this Dec 19, 2023

3 participants