incentives: cache top online accounts and use when building AbsentParticipationAccounts #6085

Open: wants to merge 25 commits into master


@cce (Contributor) commented Jul 26, 2024

Summary

#5757 introduced a mechanism to suspend "absentee" accounts that don't participate (by proposing a block, or by sending a heartbeat as in #5799). It does this by adding a block header field, AbsentParticipationAccounts, similar to ExpiredParticipationAccounts.

Currently, the list is generated by considering accounts touched by a transaction in the current block, since this data is readily available at endOfBlock(). This PR adds a periodically updated cache of top online accounts to the ledger, so that online accounts not mentioned in the current block can also be found.

All of these tracked addresses (accounts touched in the block plus the cached top online accounts) are now checked for absentee or expired status each round. To get a recent list of top online accounts, this PR reuses work already done by the votersTracker and the state proof worker: every 256 rounds, the state proof system performs a TopOnlineAccounts query. The PR adds access to the votersTracker to fetch the most recent list of top online addresses, and for each address it looks up the latest round's data from the online account cache.
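The candidate selection described above can be sketched as a simple union. This is a minimal illustration, not the PR's code: `Address` is a hypothetical stand-in for `basics.Address`, and `mergeCandidates` is an invented helper; the real evaluator keys a map by address rather than returning a sorted slice.

```go
package main

import (
	"fmt"
	"sort"
)

// Address is a hypothetical stand-in for basics.Address.
type Address string

// mergeCandidates returns the union of the accounts touched in the current
// block and the cached top online accounts, without duplicates. The result
// is sorted only so that this sketch prints deterministic output.
func mergeCandidates(touched, topOnline []Address) []Address {
	seen := make(map[Address]struct{}, len(touched)+len(topOnline))
	out := make([]Address, 0, len(touched)+len(topOnline))
	for _, list := range [][]Address{touched, topOnline} {
		for _, a := range list {
			if _, dup := seen[a]; dup {
				continue
			}
			seen[a] = struct{}{}
			out = append(out, a)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	touched := []Address{"carol", "alice"} // accounts seen in this block's transactions
	topOnline := []Address{"alice", "bob"} // hypothetical cached top online list
	fmt.Println(mergeCandidates(touched, topOnline))
}
```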

This PR also adds LastProposed and LastHeartbeat to the DB representation of the online accounts table. It also fixes an issue introduced in #5965, where uses of ledgercore.OnlineAccountData (which lacked the LastHeartbeat/LastProposed fields) were replaced by basics.OnlineAccountData (which has them), but a couple of conversions from AccountData never set those fields.
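To illustrate the kind of bug being fixed, here is a sketch of a conversion that copies every field the online view needs. The two structs are hypothetical and heavily simplified (the real `basics.AccountData` and `basics.OnlineAccountData` carry many more fields); the point is only that a field left out of the struct literal silently becomes zero.

```go
package main

import "fmt"

// Round is a hypothetical stand-in for basics.Round.
type Round uint64

// AccountData is a hypothetical, simplified account record showing only the
// fields relevant to absentee tracking.
type AccountData struct {
	MicroAlgos    uint64
	VoteLastValid Round
	LastProposed  Round
	LastHeartbeat Round
}

// OnlineAccountData is a hypothetical, simplified online-only view.
type OnlineAccountData struct {
	MicroAlgos    uint64
	VoteLastValid Round
	LastProposed  Round
	LastHeartbeat Round
}

// toOnline copies every field the online view needs. Dropping LastProposed
// or LastHeartbeat from this literal would leave them at zero.
func toOnline(ad AccountData) OnlineAccountData {
	return OnlineAccountData{
		MicroAlgos:    ad.MicroAlgos,
		VoteLastValid: ad.VoteLastValid,
		LastProposed:  ad.LastProposed,
		LastHeartbeat: ad.LastHeartbeat,
	}
}

func main() {
	ad := AccountData{MicroAlgos: 1_000_000, VoteLastValid: 5000, LastProposed: 1200, LastHeartbeat: 1300}
	fmt.Printf("%+v\n", toOnline(ad))
}
```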

Test Plan

  • update TestAbsenteeChecks
  • update TestExpiredAccountGeneration
  • maybe update TestExpiredAccountGenerationWithDiskFailure?
  • update TestAbsentTracking
  • added new TestLatestCompletedVotersUpTo
  • update TestAbsenteeChallenges
  • update TestEvalFunctionForExpiredAccounts
  • disable Payouts for the state proof E2E test TestTotalWeightChanges (TODO: revisit once auto-heartbeating is added, to ensure the accounts aren't suspended)
  • update test/e2e-go/features/incentives/suspension_test.go (TODO: revisit once heartbeats are added)

@cce force-pushed the track-incentive-candidates branch from 975ddb4 to 21db44d on July 26, 2024 17:35
ledger/ledgercore/onlineacct.go (outdated; resolved thread)
@algorandskiy (Contributor) left a comment

LGTM.

Just a thought: why not initialize the top online list on startup and then maintain it in acctonline while processing incoming blocks?

@cce (Contributor Author) commented Jul 27, 2024

Just a thought: why not initialize the top online list on startup and then maintain it in acctonline while processing incoming blocks?

My first approach was to make it a field in the onlineAccounts tracker, like the ao.voters sub-tracker. But as I cut it down to an MVP, I removed its access to onlineAccounts.deltas and stopped calling it from onlineAccounts.newBlockImpl(); with no dependencies left, I moved it out. I kept finding myself adding duplicate state that onlineAccounts already maintained, along with duplicate lookup logic. So I wanted to start with the most minimal approach: rely entirely on the onlineAccounts tracker's caching system (which we already know works correctly) rather than add a new cached list of online accounts and delta-processing code.
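Relying on the voters tracker means picking the most recent completed TopOnlineAccounts result at or before the current round, which is roughly what the new TestLatestCompletedVotersUpTo exercises. The sketch below assumes a hypothetical `votersSnapshot` slice sorted by round; it is not the votersTracker API. It also shows the case where no snapshot exists yet, for example before the first state proof interval completes.

```go
package main

import (
	"fmt"
	"sort"
)

// Round and Address are hypothetical stand-ins for basics types.
type Round uint64
type Address string

// votersSnapshot is a hypothetical record of one TopOnlineAccounts result,
// tagged with the round it was computed for.
type votersSnapshot struct {
	round Round
	top   []Address
}

// latestUpTo returns the most recent snapshot whose round is <= r, or false
// if none exists yet. snaps must be sorted by ascending round.
func latestUpTo(snaps []votersSnapshot, r Round) (votersSnapshot, bool) {
	// i is the index of the first snapshot with round > r.
	i := sort.Search(len(snaps), func(i int) bool { return snaps[i].round > r })
	if i == 0 {
		return votersSnapshot{}, false
	}
	return snaps[i-1], true
}

func main() {
	snaps := []votersSnapshot{
		{round: 256, top: []Address{"alice"}},
		{round: 512, top: []Address{"alice", "bob"}},
	}
	for _, r := range []Round{100, 300, 600} {
		s, ok := latestUpTo(snaps, r)
		fmt.Println(r, ok, s.round, s.top)
	}
}
```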

cmd/tealdbg/localLedger.go (outdated; resolved thread)
ledger/eval/eval.go (outdated; resolved thread)
ledger/eval/prefetcher/prefetcher_alignment_test.go (outdated; resolved thread)
ledger/toponline.go (outdated; resolved thread)
ledger/eval/eval.go (resolved thread)
codecov bot commented Aug 9, 2024

Codecov Report

Attention: Patch coverage is 77.57009% with 24 lines in your changes missing coverage. Please review.

Project coverage is 56.24%. Comparing base (619d257) to head (c558d59).
Report is 16 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ledger/eval/eval.go | 87.27% | 4 missing and 3 partials ⚠️ |
| ledger/ledger.go | 75.00% | 2 missing and 2 partials ⚠️ |
| ledger/ledgercore/votersForRound.go | 0.00% | 4 missing ⚠️ |
| ledger/tracker.go | 0.00% | 3 missing ⚠️ |
| cmd/tealdbg/localLedger.go | 0.00% | 2 missing ⚠️ |
| daemon/algod/api/server/v2/dryrun.go | 0.00% | 2 missing ⚠️ |
| ledger/ledgercore/accountdata.go | 0.00% | 2 missing ⚠️ |

Additional details and impacted files
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6085      +/-   ##
==========================================
+ Coverage   56.22%   56.24%   +0.02%     
==========================================
  Files         494      494              
  Lines       69954    70040      +86     
==========================================
+ Hits        39330    39394      +64     
- Misses      27947    27962      +15     
- Partials     2677     2684       +7     


ledger/eval/eval.go (resolved thread)
ledger/eval/eval.go (resolved thread)
ledger/ledger.go (outdated; resolved thread)
Co-authored-by: John Jannotti <[email protected]>
@cce requested a review from algorandskiy on August 21, 2024 15:22
ledger/ledger.go (outdated; resolved thread)
@cce marked this pull request as ready for review on October 4, 2024 17:31
@@ -458,7 +458,7 @@ func TestOnlineAcctModelSimple(t *testing.T) {
})
// test same scenario on double ledger
t.Run("DoubleLedger", func(t *testing.T) {
-m := newDoubleLedgerAcctModel(t, protocol.ConsensusFuture, true)
+m := newDoubleLedgerAcctModel(t, protocol.ConsensusV39, true) // TODO simulate heartbeats
Contributor comment:

Why not keep this on future?

@cce (Contributor Author) replied Oct 10, 2024:

It fails because heartbeats aren't implemented yet and proposers aren't being set, so the big accounts are challenged and kicked offline, and none of the stake numbers match the test expectations. I could have tried to fix this by making sure every test account shows up as a proposer often enough to avoid suspension, but I thought it would be better to wait until heartbeats are implemented and see whether that makes the tests pass with less modification.
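One way to see why the largest accounts are hit first: under a stake-proportional absentee rule, an account's allowed silence shrinks as its share of online stake grows. The sketch assumes a hypothetical rule in which an account is absent after going unseen for more than `k * totalStake / acctStake` rounds; the factor `k`, the function name `isAbsent`, and the exact formula are assumptions for illustration, not taken from this PR.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Round is a hypothetical stand-in for basics.Round.
type Round uint64

// isAbsent (hypothetical rule) reports whether an account has gone unseen,
// by proposal or heartbeat, for more than k * totalStake / acctStake rounds.
func isAbsent(totalStake, acctStake uint64, lastSeen, current Round, k uint64) bool {
	if acctStake == 0 || lastSeen == 0 {
		return false // no stake, or never seen: nothing to measure against
	}
	// Use a 128-bit intermediate product so k*totalStake cannot overflow.
	hi, lo := bits.Mul64(k, totalStake)
	if hi >= acctStake {
		return false // allowed lag does not fit in 64 bits: effectively unbounded
	}
	allowedLag, _ := bits.Div64(hi, lo, acctStake)
	if current <= lastSeen {
		return false
	}
	return uint64(current-lastSeen) > allowedLag
}

func main() {
	const total = 1_000_000 // hypothetical total online stake
	// 10% of stake with k=10: allowed lag is 100 rounds, so 150 unseen is absent.
	fmt.Println(isAbsent(total, 100_000, 1000, 1150, 10))
	// 0.1% of stake with k=10: allowed lag is 10,000 rounds.
	fmt.Println(isAbsent(total, 1_000, 1000, 1150, 10))
}
```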

@@ -47,6 +47,7 @@ type roundCowParent interface {
// lookup retrieves agreement data about an address, querying the ledger if necessary.
lookupAgreement(basics.Address) (basics.OnlineAccountData, error)
onlineStake() (basics.MicroAlgos, error)
+knockOfflineCandidates() (map[basics.Address]basics.OnlineAccountData, error)
Contributor comment:

Maybe a nit: should we actually call this topOnlineAccounts, or something similar? The comments make it very clear that's what we are requesting, so it's more a question of whether the name should reflect where the data comes from or the use case we currently have for it.

@cce (Contributor Author) replied Oct 10, 2024:

It is a potentially stale list of top online accounts: if new accounts came online in the last 256 rounds (since the last state proof), they wouldn't appear in it. The word "candidates" was meant to signal that this isn't the definitive, complete list of top online accounts for the round. But I'm happy to pick any other name; I wasn't particularly happy with this one.

The "knockOffline" part comes from generateKnockOfflineAccountsList, the name JJ already gave the method in #5757.

ledger/eval/eval.go (resolved thread)
@@ -810,6 +810,9 @@ func TestTotalWeightChanges(t *testing.T) {
a := require.New(fixtures.SynchronizedTest(t))

consensusParams := getDefaultStateProofConsensusParams()
+consensusParams.Payouts = config.ProposerPayoutRules{} // TODO re-enable payouts when nodes aren't suspended
Contributor comment:

How are we tracking these?

@cce (Contributor Author) replied:

I should open an issue to track the "update this test once heartbeats are implemented" TODOs in this PR.

@jannotti (Contributor) left a comment

I think we will not have the top online cache before the first state proof, right? Maybe it would make sense to seed it during genesis (since the online accounts are listed for us in the genesis file, I think). That could avoid special cases in the tests.
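Seeding the cache from genesis, as suggested here, would amount to selecting the top N online accounts by stake from the genesis allocation. This is a sketch under assumptions: `genesisAccount` is a hypothetical simplified allocation entry, ties are broken by address only to make the result deterministic, and the real TopOnlineAccounts ordering may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Address is a hypothetical stand-in for basics.Address.
type Address string

// genesisAccount is a hypothetical, simplified genesis allocation entry.
type genesisAccount struct {
	Addr   Address
	Stake  uint64
	Online bool
}

// topOnlineFromGenesis returns up to n online addresses ordered by stake,
// descending, with ties broken by address.
func topOnlineFromGenesis(alloc []genesisAccount, n int) []Address {
	online := make([]genesisAccount, 0, len(alloc))
	for _, a := range alloc {
		if a.Online && a.Stake > 0 {
			online = append(online, a)
		}
	}
	sort.Slice(online, func(i, j int) bool {
		if online[i].Stake != online[j].Stake {
			return online[i].Stake > online[j].Stake
		}
		return online[i].Addr < online[j].Addr
	})
	if len(online) > n {
		online = online[:n]
	}
	out := make([]Address, len(online))
	for i, a := range online {
		out[i] = a.Addr
	}
	return out
}

func main() {
	alloc := []genesisAccount{
		{"alice", 500, true},
		{"bob", 900, true},
		{"carol", 700, false}, // offline: excluded
		{"dave", 500, true},
	}
	fmt.Println(topOnlineFromGenesis(alloc, 2))
}
```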

ledger/eval/eval.go (resolved thread)
-func (eval *BlockEvaluator) endOfBlock() error {
+// When generating a block, participating addresses are passed to prevent a
+// proposer from suspending itself.
+func (eval *BlockEvaluator) endOfBlock(participating ...basics.Address) error {
Contributor comment:

Why ...basics.Address instead of []basics.Address? I assume callers always have a slice, as opposed to call sites with, say, 5 explicit arguments.

@cce (Contributor Author) replied:

True; this is just me optimizing for a smaller diff, so the other endOfBlock callers don't change, but the idea is to pass a slice. I can change it.
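The trade-off discussed here is plain Go: a variadic parameter lets existing zero-argument call sites compile unchanged, while a caller holding a slice spreads it with `...`. The two functions below are hypothetical stand-ins that only count their arguments.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for basics.Address.
type Address string

// endOfBlockVariadic mimics the variadic signature: existing callers can
// keep calling it with no arguments.
func endOfBlockVariadic(participating ...Address) int {
	return len(participating)
}

// endOfBlockSlice is the explicit-slice alternative: every caller must pass
// something, even if it is nil.
func endOfBlockSlice(participating []Address) int {
	return len(participating)
}

func main() {
	addrs := []Address{"alice", "bob"}
	fmt.Println(endOfBlockVariadic())         // old call sites compile unchanged
	fmt.Println(endOfBlockVariadic(addrs...)) // a slice is spread with ...
	fmt.Println(endOfBlockSlice(nil))         // slice form: nil means "none"
	fmt.Println(endOfBlockSlice(addrs))
}
```

Spreading an existing slice with `addrs...` passes that slice as-is rather than copying it, so both forms behave the same at run time; the choice is mostly about call-site readability and diff size.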

IncentiveEligible bool // currently unused below, but may be needed in the future
}
candidates := make(map[basics.Address]candidateData)
partAddrs := util.MakeSet(participating...)
Contributor comment:

Do we do anything else with this slice? Maybe we should push the Set type up through the callers, so that it is built as a Set when it is first created to pass to endOfBlock?

@cce (Contributor Author) replied:

It's used in GenerateBlock to build a map of end-of-block account state for participating addresses, which is included in the UnfinishedBlock. If we pushed the Set up to GenerateBlock, it would also avoid looking up the same participating address twice when duplicate addresses are passed to GenerateBlock.
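Building the set once, where the participating addresses are first collected, would deduplicate them for every later use. The sketch below is a standalone generic set that mirrors the `util.MakeSet(participating...)` call in the diff; it is not claimed to match the util package, and whether the real set type exposes a `Contains` method is an assumption here.

```go
package main

import "fmt"

// Set is a minimal generic set (standalone sketch, not the util package).
type Set[T comparable] map[T]struct{}

// MakeSet builds a set from its arguments, silently dropping duplicates.
func MakeSet[T comparable](elems ...T) Set[T] {
	s := make(Set[T], len(elems))
	for _, e := range elems {
		s[e] = struct{}{}
	}
	return s
}

// Contains reports whether elem is in the set.
func (s Set[T]) Contains(elem T) bool {
	_, ok := s[elem]
	return ok
}

func main() {
	participating := []string{"alice", "bob", "alice"} // duplicate on purpose
	parts := MakeSet(participating...)
	fmt.Println(len(parts), parts.Contains("alice"), parts.Contains("carol"))
}
```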

ledger/eval/eval.go (resolved thread)
if maxSuspensions > 0 {
knockOfflineCandidates, err := eval.state.knockOfflineCandidates()
if err != nil {
// Log an error and keep going; generating lists of absent and expired
Contributor comment:

So this implies some nodes can "choose" not to search for absent/expired accounts.


// Now, check these candidate accounts to see if they are expired or absent.
for accountAddr, acctData := range candidates {
if acctData.MicroAlgosWithRewards.IsZero() {
Contributor comment:

100% of the time, does a zero balance imply the account is closed?
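The candidate loop can be sketched as a classification step. This assumes (hypothetically) that an account is expired once the current round is past its `VoteLastValid`, that expiry is checked before absence, and that the absentee verdict is already computed; zero-balance accounts are skipped exactly as in the quoted `IsZero()` check, regardless of whether they were closed.

```go
package main

import "fmt"

// Address and Round are hypothetical stand-ins for basics types.
type Address string
type Round uint64

// candidate is a hypothetical, simplified view of an online account.
type candidate struct {
	Balance       uint64 // stand-in for MicroAlgosWithRewards
	VoteLastValid Round  // last round the participation key is valid
	Absent        bool   // precomputed absentee verdict, to keep the sketch short
}

// classify splits candidates into expired and absent lists.
func classify(cands map[Address]candidate, current Round) (expired, absent []Address) {
	for addr, c := range cands {
		switch {
		case c.Balance == 0:
			// Zero balance: skipped, whether or not the account was closed.
		case c.VoteLastValid < current:
			expired = append(expired, addr)
		case c.Absent:
			absent = append(absent, addr)
		}
	}
	return expired, absent
}

func main() {
	cands := map[Address]candidate{
		"alice": {Balance: 10, VoteLastValid: 2000},               // healthy
		"bob":   {Balance: 10, VoteLastValid: 900},                // key expired
		"carol": {Balance: 10, VoteLastValid: 2000, Absent: true}, // absent
		"dave":  {Balance: 0, VoteLastValid: 900},                 // zero balance: skipped
	}
	expired, absent := classify(cands, 1000)
	fmt.Println("expired:", expired, "absent:", absent)
}
```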

//
// This function is passed a list of participating addresses so a node will not
// propose a block that suspends or expires itself.
func (eval *BlockEvaluator) generateKnockOfflineAccountsList(participating []basics.Address) {
Contributor comment:

participating is really "participating accounts excluding any I host"
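Excluding the node's own participating addresses, so that a proposer never lists itself for suspension or expiry, is a filter over the candidate list. This is a minimal sketch with a hypothetical `excludeParticipating` helper and a plain map used as the set; it preserves the input order.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for basics.Address.
type Address string

// excludeParticipating drops any candidate that appears in participating,
// preserving the order of the remaining candidates.
func excludeParticipating(cands []Address, participating map[Address]struct{}) []Address {
	out := make([]Address, 0, len(cands))
	for _, a := range cands {
		if _, mine := participating[a]; mine {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	mine := map[Address]struct{}{"alice": {}}
	fmt.Println(excludeParticipating([]Address{"alice", "bob", "carol"}, mine))
}
```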

@gmalouf (Contributor) left a comment

I'm good in general, a few small comments.

blkEval = l.nextBlock(t)
//require.Empty(t, vb.Block().ExpiredParticipationAccounts)
Contributor comment:

Why is this added commented out?

challenge := byte(0)
for i := uint64(0); i < uint64(1210); i++ { // A bit past one grace period (200) past challenge at 1000.
vb := l.endBlock(t, blkEval)
for i := uint64(0); i < uint64(1200); i++ { // Just before first suspension at 1171
Contributor comment:

Wouldn't this go past the first suspension? Why 1200?

}

st := txn.Sign(keys[0])
err = eval.Transaction(st, transactions.ApplyData{})
Contributor comment:

Why remove all of these eval.Transaction calls?

}

// fetch fresh data up to this round from online account cache. These accounts should all
// be in cache, as long as proto.StateProofTopVoters < onlineAccountsCacheMaxSize.
Contributor comment:

This feels like a condition to call out in the consensus file.
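One way to act on this suggestion is a startup or unit-test check that enforces the strict inequality from the code comment. The constant values below are hypothetical placeholders, not the real consensus or ledger values, and `checkCacheSizing` is an invented helper.

```go
package main

import "fmt"

// Hypothetical placeholder values; the real ones live in the consensus
// parameters and the ledger package and may differ.
const (
	stateProofTopVoters        = 1024
	onlineAccountsCacheMaxSize = 2500
)

// checkCacheSizing returns an error unless topVoters < cacheMax, the
// condition under which every top voter's fresh data is expected to be
// served from the online account cache.
func checkCacheSizing(topVoters, cacheMax int) error {
	if topVoters >= cacheMax {
		return fmt.Errorf("StateProofTopVoters (%d) must be < onlineAccountsCacheMaxSize (%d)", topVoters, cacheMax)
	}
	return nil
}

func main() {
	if err := checkCacheSizing(stateProofTopVoters, onlineAccountsCacheMaxSize); err != nil {
		panic(err)
	}
	fmt.Println("cache sizing ok")
}
```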
