Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear selected pair on ICE failed #612

Merged
merged 1 commit into from
Sep 2, 2023
Merged

Clear selected pair on ICE failed #612

merged 1 commit into from
Sep 2, 2023

Conversation

boks1971
Copy link
Contributor

@boks1971 boks1971 commented Sep 2, 2023

When ICE times out and fails, all candidates are deleted.

That means all the candidates are closed and their underlying conns are clsoed.

But, the selected pair could still be valid. On a subsequenct Write, ICE transport conn will get the selected pair and write to the pair. As the pair is still valid, write will flow through to the local candidate writeTo.

But, as all candidates and their underlying conns are closed, Write will return a io.ErrClosedPipe error.

There are cases where it is not ignored and causes a broken pipe after an ICERestart.

When the Write error propagates back to sctp/association, the writeLoop is exited.

So, sending data channel traffic after a successful ICERestart still fails as the SCTP association errored out and write loop exited.

I have copied the changes that are done when ICERestart happens to when ICE state is set to failed (except for gathering state and resetting ufrag/pwd). In my testing, it is working well, i. e. can continue data channel after ICE Restart whereas previously it was failing every time. But, I am not sure of all the implications of this change.

@codecov
Copy link

codecov bot commented Sep 2, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

❗ Current head 17c38f0 differs from pull request most recent head 54ae7ea. Consider uploading reports for the commit 54ae7ea to get more accurate results

Files Changed Coverage
agent.go 100.00%

📢 Thoughts on this report? Let us know!.

When ICE times out and fails, all candidates are deleted.

That means all the candidates are closed and their underlying
conns are clsoed.

But, the selected pair could still be valid. On a subsequenct
`Write`, ICE transport conn will get the selected pair and
write to the pair. As the pair is still valid, write will
flow through to the local candidate `writeTo`.

But, as all candidates and their underlying conns are closed,
`Write` will return a `io.ErrClosedPipe` error.

There are cases where it is not ignored and causes a broken
pipe after an ICERestart.

When the `Write` error propagates back to sctp/association,
the writeLoop is exited.

So, sending data channel traffic after a successful ICERestart
still fails as the SCTP association errored out and write loop exited.

I have copied the changes that are done when ICERestart happens
to when ICE state is set to failed (except for gathering state
and resetting ufrag/pwd). In my testing, it is working well,
i. e. can continue data channel after ICE Restart whereas
previously it was failing every time. But, I am not sure of all
the implications of this change.

Update authors

Update AUTHORS.txt
@boks1971
Copy link
Contributor Author

boks1971 commented Sep 2, 2023

To elaborate more on my concerns that I mentioned in the PR commit notes,

  • With most of the actions happening in the task queue, I could not quite understand all the interactions. Something like validateSelectedPair has a comment Note: the caller should hold the agent lock.. But, that method is also directly called from selector. Is all the locking/interaction getting disturbed by my changes?
  • Is it okay to reset checkList and pendingBindingRequests when ICE state changes to failed? I am guessing it is fine. But, would binding requests happen in that state and add back to pending? If it does, is it bad?
  • Don't think this is still air tight (not talking change in this PR only, but the ICERestart scenario also). Here is the sequence I am thinking about
    • transport.go conn WriteTo does getSelectedPair which is not going via task queue (quite rightly).
    • That could return a valid pair.
    • And then the ICERestart code or the newly added reset on failed could run and close the underlying conn.
    • After that, the transport write could run and see a closed connection and return the error io.ErrClosedPipe and that is the same thing as the root of this issue.

Agreed that the possibility of that happening is small, but it is still a possibility.

Copy link
Member

@davidzhao davidzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

I don't see any issues with locking interactions. For the second concern, I think the expectation is that a restart is necessary after reaching failed state. so when ICE Restart is performed, the same logic will be executed again.

Agreed on the last one.. I guess that can be addressed separately since that race is not introduced by this PR.

@boks1971
Copy link
Contributor Author

boks1971 commented Sep 2, 2023

Merging this based on @davidzhao 's review + approval. If there is further feedback, I will address it in a subsequent PR.

@boks1971 boks1971 merged commit c62fd28 into master Sep 2, 2023
16 checks passed
@boks1971 boks1971 deleted the raja_ice_failed branch September 2, 2023 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants