Tessera does not consistently connect to all peers in the network in Recovery mode #18

Merged: 2 commits, Jul 11, 2024

john-sobrepena-partior commented Jul 4, 2024

Fixes the issue where a node cannot consistently connect to its peers in recovery mode.

In recovery mode, Tessera waits 10 seconds before starting the main recovery procedure, and this value is currently hard-coded. If a peer's partyInfoInterval exceeds 10 seconds, that peer may not send its party info to the recovering node before the wait ends. The recovering node may then treat the peer as inactive.

To address this, the wait is no longer fixed at 10 seconds; it is computed as twice the node's partyInfoInterval. This gives the recovering node and its peers enough time to exchange partyInfo before recovery starts, which makes the recovery process more reliable.
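The timing argument above can be sketched in a few lines of Java. This is an illustration only, not code from this PR: the class `RecoveryWait` and the method `coversPeer` are hypothetical names, and the check assumes a peer publishes its party info once per interval while ignoring network latency and scheduling jitter.

```java
/** Hypothetical helper for illustration; not part of Tessera. */
final class RecoveryWait {

    /** The fixed delay used before this change (10 seconds). */
    static final long LEGACY_WAIT_MILLIS = 10_000L;

    private RecoveryWait() {}

    /** Wait derived from the node's own partyInfoInterval: interval * 2. */
    static long waitMillis(long partyInfoIntervalMillis) {
        if (partyInfoIntervalMillis <= 0) {
            throw new IllegalArgumentException("partyInfoInterval must be positive");
        }
        return Math.multiplyExact(partyInfoIntervalMillis, 2L);
    }

    /**
     * Simplifying assumption: a peer that publishes once per interval is
     * heard at least once if the wait is at least one full interval long.
     */
    static boolean coversPeer(long waitMillis, long peerIntervalMillis) {
        return waitMillis >= peerIntervalMillis;
    }

    public static void main(String[] args) {
        long peerInterval = 15_000L; // example value: a peer publishing every 15 s

        // Fixed 10 s wait versus a wait derived from a 15 s local interval.
        System.out.println("legacy wait covers peer:  "
                + coversPeer(LEGACY_WAIT_MILLIS, peerInterval));
        System.out.println("derived wait covers peer: "
                + coversPeer(waitMillis(15_000L), peerInterval));
    }
}
```

Under these assumptions, the fixed 10-second wait is shorter than the 15-second peer interval, while a wait derived from a 15-second local interval (30 seconds) is long enough.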

Fixed Issue(s)

  • SET-530

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

john-sobrepena-partior marked this pull request as ready for review July 5, 2024 02:57
LOGGER.info("Waiting for nodes to synchronise with peers");
Thread.sleep(10000);
final var waitTimeBeforeRecoveryStartsInMillis =
        intervalPropertyHelper.partyInfoInterval() * 2L;


This part is a grey area: another node's party info might be scheduled at a different interval from your own Tessera node's.

Author

That is correct. Since the startup delay is now configurable, the recovery node should set its partyInfoInterval to the largest partyInfoInterval used by any of its peers. This gives the recovery node enough time for all active peers to sync with it.
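As a rough sketch of that recommendation, the following Java picks the largest peer interval as the recovery node's own partyInfoInterval and derives the startup wait from it. The class and method names (`IntervalPlanner`, `recommendedPartyInfoInterval`, `startupWaitMillis`) are hypothetical, and the peer intervals are example values that an operator would have to collect from each peer's configuration; nothing here is discovered automatically by Tessera.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of Tessera. */
final class IntervalPlanner {

    private IntervalPlanner() {}

    /** Returns the largest partyInfoInterval (in milliseconds) among the given peers. */
    static long recommendedPartyInfoInterval(List<Long> peerIntervalsMillis) {
        return peerIntervalsMillis.stream()
                .mapToLong(Long::longValue)
                .max()
                .orElseThrow(() -> new IllegalArgumentException("no peer intervals given"));
    }

    /** Startup wait as computed in this PR: partyInfoInterval * 2. */
    static long startupWaitMillis(long partyInfoIntervalMillis) {
        return Math.multiplyExact(partyInfoIntervalMillis, 2L);
    }

    public static void main(String[] args) {
        // Example values only: peers publishing every 5 s, 15 s, and 8 s.
        List<Long> peers = List.of(5_000L, 15_000L, 8_000L);

        long interval = recommendedPartyInfoInterval(peers);
        System.out.println("partyInfoInterval for recovery node (ms): " + interval);
        System.out.println("resulting startup wait (ms): " + startupWaitMillis(interval));
    }
}
```

With the example values, the recovery node would use a 15-second partyInfoInterval and wait 30 seconds, which is at least twice the interval of its slowest peer.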

krishnan-narayanan-partior merged commit 50e4acd into main Jul 11, 2024
16 checks passed
krishnan-narayanan-partior deleted the hotfix/set530_fix_recovery_mode branch July 11, 2024 06:02

3 participants