Fix bugs mainly caused by snapshot #118

Open
wants to merge 4 commits into
base: master

Commits on Aug 27, 2021

  1. small fix

    .gitignore: *.a *.so *.gcda .hypothesis *.gcov CLinkedListQueue/ tests/main_test.c tests_main python
    make clean: GCOV_OUTPUT tests.c CLinkedListQueue .hypothesis
    Makefile CFLAGS: add the override keyword so the Makefile's flags still apply when CFLAGS is assigned on the command line
    virtraft2.py: remove the namedtuple verbose argument (removed in Python 3.7)
    tangruize committed Aug 27, 2021
    0030906
  2. add testcases

    test_cluster.h: virtual network, raft controller, fault injection, and invariant checking functions
    test_cluster.c: testcases that violate invariants (constructed from TLA+ traces)
    test_cluster_more.c: generated testcases
    tangruize committed Aug 27, 2021
    fed1fa6
  3. Fix bugs found by TLA+.

    1. (critical) raft_recv_appendentries: when ae->prev_log_idx == 0 and the
    log has been compacted, entries that were already compacted are appended to
    the log again. This can lead to log inconsistency, such as a committed entry
    not being replicated to a majority of servers.
    
    2. raft_recv_appendentries_response: next_idx can be decreased until it
    equals match_idx, so the already-matched entry is retransmitted.
    
    3. (critical) raft_send_appendentries: an empty AE can be sent while there
    are log entries to send, causing the follower to update its commit idx
    incorrectly. This can lead to log inconsistency, such as a committed entry
    being rolled back.
    
    4. (critical) raft_begin_load_snapshot: the log may mismatch. The snapshot
    must still be loaded in this case; otherwise the server lags behind until it
    receives the next snapshot.
    
    5. (critical) raft_begin_load_snapshot: changes current_term directly, which
    is dangerous and can lead to two leaders in one term.
    
    6. (critical) raft_recv_appendentries: the entry at prev_log_idx may have
    been compacted. In this case, compacted entries should be treated as
    committed, and the remaining matching entries should be appended to the log.
    
    7. (critical) raft_begin_load_snapshot: memory leak. The reference held in
    me->nodes[0] is lost if me->nodes[0] is not this server's own node.
    
    8. raft_send_appendentries_all returns immediately if raft_send_appendentries
    returns a non-zero value. However, the return value may be
    RAFT_ERR_NEEDS_SNAPSHOT, and sending AE to one server should be independent
    of sending to the others. (Combined with bug 4, this can prevent the entire
    cluster from making progress.)
    
    9. (critical) raft_get_last_log_term: current_idx may point at the snapshot.
    In that case the function returns 0, which causes other servers not to grant
    their vote. If the logs of all servers in the cluster are compacted, no
    leader can be elected.
    tangruize committed Aug 27, 2021
    0159f41
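Bug 8 in the commit above is a control-flow issue: a failure for one peer should not stop the leader from contacting the remaining peers. The following is a minimal sketch of such a loop, assuming a hypothetical callback-based sender. The names `send_to_all`, `demo_send`, `struct demo_ctx`, and `SKETCH_ERR_NEEDS_SNAPSHOT` (including its numeric value) are illustrative only; they are not the library's API and this is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for RAFT_ERR_NEEDS_SNAPSHOT; the real value may differ. */
#define SKETCH_ERR_NEEDS_SNAPSHOT (-6)

/* Hypothetical per-peer sender: returns 0 on success, non-zero on error. */
typedef int (*send_fn)(int node_id, void *ctx);

/* Contact every peer except ourselves. A failure for one peer is remembered
 * but does not prevent the remaining peers from being contacted. */
static int send_to_all(const int *node_ids, size_t n, int self_id,
                       send_fn send, void *ctx)
{
    int first_err = 0;

    assert(node_ids != NULL && send != NULL);
    for (size_t i = 0; i < n; i++) {
        if (node_ids[i] == self_id)
            continue;                 /* never send to ourselves */
        int e = send(node_ids[i], ctx);
        if (e != 0 && first_err == 0)
            first_err = e;            /* report the first error, keep going */
    }
    return first_err;
}

/* Demo transport for the usage example: node 2 always needs a snapshot,
 * and every attempt is counted. */
struct demo_ctx {
    int attempts;
};

static int demo_send(int node_id, void *ctx)
{
    struct demo_ctx *d = ctx;

    d->attempts++;
    return node_id == 2 ? SKETCH_ERR_NEEDS_SNAPSHOT : 0;
}
```

With peers 1 to 4 and node 1 as the sender, `send_to_all` still attempts nodes 3 and 4 after node 2 fails, and reports node 2's error to the caller. Returning the first error is one possible design; a caller could equally ignore per-peer errors or collect all of them.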
  4. Fix tests.

    test_server.c: ae.prev_log_term should be 0 if ae.prev_log_idx is 0.
    
    TestRaft_follower_load_from_snapshot
    committed entry conflict
    
    TestRaft_follower_load_from_snapshot_does_not_break_cluster_safety
    log mismatch
    
    TestRaft_leader_sends_snapshot_when_node_next_index_was_compacted
    should send snapshot if next idx was compacted
    
    TestRaft_leader_not_send_appendentries_when_snapshotted
    should send snapshot
    
    TestRaft_leader_sends_snapshot_if_log_was_compacted
    it is OK to send a snapshot when the request times out
    tangruize committed Aug 27, 2021
    9d03b48
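Several of the items above (bug 9, the prev_log_term rule in test_server.c, and the "send snapshot if next idx was compacted" tests) depend on how a term is looked up once part of the log has been compacted. The sketch below models that lookup under stated assumptions: the `sketch_*` types and functions are hypothetical, the log is reduced to an array of terms, and the rule `next_idx <= snapshot_last_idx` is one reading of the test descriptions, not code taken from the library.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long sketch_idx;
typedef unsigned long sketch_term;

/* Hypothetical model of a compacted log: entries with index
 * <= snapshot_last_idx are gone; terms[i] is the term of the entry at
 * index snapshot_last_idx + 1 + i. */
struct sketch_log {
    sketch_idx snapshot_last_idx;
    sketch_term snapshot_last_term;
    const sketch_term *terms;
    size_t count;
};

static sketch_idx sketch_current_idx(const struct sketch_log *log)
{
    return log->snapshot_last_idx + log->count;
}

/* Store the term at idx in *out and return 1, or return 0 if the entry was
 * compacted away or lies past the end of the log. */
static int sketch_term_at(const struct sketch_log *log, sketch_idx idx,
                          sketch_term *out)
{
    assert(log != NULL && out != NULL);
    if (idx == 0) {                        /* before the first entry: term 0 */
        *out = 0;
        return 1;
    }
    if (idx == log->snapshot_last_idx) {   /* covered by the snapshot itself */
        *out = log->snapshot_last_term;
        return 1;
    }
    if (idx < log->snapshot_last_idx || idx > sketch_current_idx(log))
        return 0;
    *out = log->terms[idx - log->snapshot_last_idx - 1];
    return 1;
}

/* Term of the last entry; falls back to the snapshot term instead of 0 when
 * every entry has been compacted (the situation described in bug 9). */
static sketch_term sketch_last_log_term(const struct sketch_log *log)
{
    sketch_term t = 0;

    sketch_term_at(log, sketch_current_idx(log), &t);
    return t;
}

/* A leader cannot build an AE starting at next_idx if that entry was
 * compacted, so it has to send a snapshot instead. */
static int sketch_needs_snapshot(const struct sketch_log *log,
                                 sketch_idx next_idx)
{
    return next_idx <= log->snapshot_last_idx;
}
```

For a log with a snapshot up to index 5 (term 2) and live entries 6 to 8, a lookup at index 5 yields the snapshot term, a lookup at index 4 fails because the entry is compacted, and a follower whose next index is 5 needs a snapshot while one at 6 can receive a normal AE.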