Clear selected pair on ICE failed #612

boks1971 · 2023-09-02T09:50:42Z

When ICE times out and fails, all candidates are deleted.

That means all the candidates are closed and their underlying conns are clsoed.

But, the selected pair could still be valid. On a subsequenct Write, ICE transport conn will get the selected pair and write to the pair. As the pair is still valid, write will flow through to the local candidate writeTo.

But, as all candidates and their underlying conns are closed, Write will return a io.ErrClosedPipe error.

There are cases where it is not ignored and causes a broken pipe after an ICERestart.

When the Write error propagates back to sctp/association, the writeLoop is exited.

So, sending data channel traffic after a successful ICERestart still fails as the SCTP association errored out and write loop exited.

I have copied the changes that are done when ICERestart happens to when ICE state is set to failed (except for gathering state and resetting ufrag/pwd). In my testing, it is working well, i. e. can continue data channel after ICE Restart whereas previously it was failing every time. But, I am not sure of all the implications of this change.

codecov · 2023-09-02T09:51:44Z

Codecov Report

Patch coverage is 100.00% of modified lines.

❗ Current head 17c38f0 differs from pull request most recent head 54ae7ea. Consider uploading reports for the commit 54ae7ea to get more accurate results

Files Changed	Coverage
agent.go	`100.00%`

📢 Thoughts on this report? Let us know!.

When ICE times out and fails, all candidates are deleted. That means all the candidates are closed and their underlying conns are clsoed. But, the selected pair could still be valid. On a subsequenct `Write`, ICE transport conn will get the selected pair and write to the pair. As the pair is still valid, write will flow through to the local candidate `writeTo`. But, as all candidates and their underlying conns are closed, `Write` will return a `io.ErrClosedPipe` error. There are cases where it is not ignored and causes a broken pipe after an ICERestart. When the `Write` error propagates back to sctp/association, the writeLoop is exited. So, sending data channel traffic after a successful ICERestart still fails as the SCTP association errored out and write loop exited. I have copied the changes that are done when ICERestart happens to when ICE state is set to failed (except for gathering state and resetting ufrag/pwd). In my testing, it is working well, i. e. can continue data channel after ICE Restart whereas previously it was failing every time. But, I am not sure of all the implications of this change. Update authors Update AUTHORS.txt

boks1971 · 2023-09-02T12:22:50Z

To elaborate more on my concerns that I mentioned in the PR commit notes,

With most of the actions happening in the task queue, I could not quite understand all the interactions. Something like validateSelectedPair has a comment Note: the caller should hold the agent lock.. But, that method is also directly called from selector. Is all the locking/interaction getting disturbed by my changes?
Is it okay to reset checkList and pendingBindingRequests when ICE state changes to failed? I am guessing it is fine. But, would binding requests happen in that state and add back to pending? If it does, is it bad?
Don't think this is still air tight (not talking change in this PR only, but the ICERestart scenario also). Here is the sequence I am thinking about
- transport.go conn WriteTo does getSelectedPair which is not going via task queue (quite rightly).
- That could return a valid pair.
- And then the ICERestart code or the newly added reset on failed could run and close the underlying conn.
- After that, the transport write could run and see a closed connection and return the error io.ErrClosedPipe and that is the same thing as the root of this issue.

Agreed that the possibility of that happening is small, but it is still a possibility.

davidzhao

This looks good to me!

I don't see any issues with locking interactions. For the second concern, I think the expectation is that a restart is necessary after reaching failed state. so when ICE Restart is performed, the same logic will be executed again.

Agreed on the last one.. I guess that can be addressed separately since that race is not introduced by this PR.

boks1971 · 2023-09-02T17:29:35Z

Merging this based on @davidzhao 's review + approval. If there is further feedback, I will address it in a subsequent PR.

boks1971 force-pushed the raja_ice_failed branch from 17c38f0 to 54ae7ea Compare September 2, 2023 10:02

boks1971 requested review from stv0g, davidzhao, Sean-Der and cnderrauber September 2, 2023 10:06

davidzhao approved these changes Sep 2, 2023

View reviewed changes

boks1971 merged commit c62fd28 into master Sep 2, 2023
16 checks passed

boks1971 deleted the raja_ice_failed branch September 2, 2023 17:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clear selected pair on ICE failed #612

Clear selected pair on ICE failed #612

boks1971 commented Sep 2, 2023 •

edited

Loading

codecov bot commented Sep 2, 2023 •

edited

Loading

boks1971 commented Sep 2, 2023 •

edited

Loading

davidzhao left a comment

boks1971 commented Sep 2, 2023

Clear selected pair on ICE failed #612

Clear selected pair on ICE failed #612

Conversation

boks1971 commented Sep 2, 2023 • edited Loading

codecov bot commented Sep 2, 2023 • edited Loading

Codecov Report

boks1971 commented Sep 2, 2023 • edited Loading

davidzhao left a comment

Choose a reason for hiding this comment

boks1971 commented Sep 2, 2023

boks1971 commented Sep 2, 2023 •

edited

Loading

codecov bot commented Sep 2, 2023 •

edited

Loading

boks1971 commented Sep 2, 2023 •

edited

Loading