
publish: build checkpoint files during ledger close #4446

Open · wants to merge 9 commits into master

Conversation

@marta-lokhova (Contributor) commented on Aug 29, 2024:

Resolves #4212
Resolves #4469

The change removes the publish path's dependency on the txhistory and ledgerheaders SQL tables, and instead constructs the ledger-header/tx-set/tx-result checkpoint files at ledger close. This allows us to deprecate and drop support for txhistory entirely, which is beneficial for performance. The code also becomes simpler, and the xdr->sql->xdr round-trip processing is gone (as part of this, the txsethistory table was dropped as well). The tradeoff of this approach is that we lose the atomicity we had with SQL transactions, so core now has to handle crashes manually. This is done by always constructing "tmp" checkpoints first, and only promoting them after a checkpoint ledger has been committed to the database. If a tmp file has been written but core crashed during apply and rolled back the transaction, automatic cleanup on startup trims any uncommitted ledgers from the temp file.

This change also drops the publishqueue SQL table completely in favor of checkpoint files containing the HAS. A similar recovery mechanism cleans up files after a crash, except it's simpler: whole checkpoint files can just be deleted instead of truncated.
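
To make the recovery story concrete, here's a rough sketch of the tmp-then-promote pattern described above. The helper names are illustrative only, not the actual stellar-core API:

#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace stdfs = std::filesystem;

// 1. While ledgers close, serialized entries are appended to a ".tmp"
//    checkpoint file.
void
appendToTmpCheckpoint(stdfs::path const& tmpFile, std::string const& xdrEntry)
{
    std::ofstream out(tmpFile, std::ios::binary | std::ios::app);
    out.write(xdrEntry.data(), xdrEntry.size());
}

// 2. Only after the checkpoint ledger has been committed to the database is
//    the tmp file promoted (renamed) to its final name; a crash before this
//    point leaves only the tmp file behind.
void
promoteTmpCheckpoint(stdfs::path const& tmpFile, stdfs::path const& finalFile)
{
    stdfs::rename(tmpFile, finalFile);
}

// 3. On startup, anything in the tmp file beyond the last ledger the
//    database actually committed is trimmed, so a rolled-back ledger never
//    leaks into a published checkpoint. (For the HAS checkpoint files
//    mentioned above, recovery can simply delete whole uncommitted files
//    instead.)
void
trimUncommitted(stdfs::path const& tmpFile, std::uintmax_t committedBytes)
{
    if (stdfs::exists(tmpFile) && stdfs::file_size(tmpFile) > committedBytes)
    {
        stdfs::resize_file(tmpFile, committedBytes);
    }
}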

Note: now that we drop txhistory, we can also deprecate the --force-back flag.

@marta-lokhova (Contributor, Author) commented:

Note: the current version of this PR drops the txhistory table for simplicity; however, depending on who relies on it, we might need to keep supporting it while also building checkpoint files directly.

@marta-lokhova changed the title from "Build checkpoint files during ledger close" to "publish: build checkpoint files during ledger close" on Sep 5, 2024
@marta-lokhova force-pushed the publishSQlPath branch 5 times, most recently from ae408dc to ccf6b27 on September 12, 2024 at 01:33
@marta-lokhova force-pushed the publishSQlPath branch 2 times, most recently from 8315d30 to 17bd805 on October 11, 2024 at 03:50
@graydon (Contributor) left a comment:

Broadly speaking: great work! Handful of nits, handful of clarifying questions / requests for documentation about what's going on, and a couple places I'm uncertain about the recovery logic that I'd like to tighten up a bit (or understand what I'm currently failing to). But overall happy to see, and happy to collaborate on wrapping it up tomorrow.

@@ -368,7 +365,7 @@ class SimulationHelper

// State has been rebuilt and node is properly in sync
REQUIRE(checkState(*app));
REQUIRE(app->getLedgerManager().getLastClosedLedgerNum() ==
REQUIRE(app->getLedgerManager().getLastClosedLedgerNum() + 1 >=
graydon (Contributor):

What's this change about?

Comment on lines +508 to +511
hm.mThrowOnLastAppend = true;
REQUIRE_THROWS_AS(
catchupSimulation.ensureOfflineCatchupPossible(
checkpointLedger),
graydon (Contributor):

By my reading (which might be wrong!) I think mThrowOnLastAppend will not actually produce a corrupt file; it'll just throw after writing the last (correct) entry to the checkpoint file.

I might also be wrong, though, or perhaps you're relying on, say, only the transactions checkpoint file being short because it would have been written after the ledger checkpoint file that throws. In any case, given that the goal here is to do our best to recover from a short write, I'd suggest this test check, after the throw, that the file(s) that got mis-written really are short/corrupt.

Otherwise, testing that truncation-recovery works gives potentially false confidence in the thing working, y'know? This is the key thing we're interested in being sure of.

marta-lokhova (Contributor, Author):

mThrowOnLastAppend throws after an entry is appended to a checkpoint, but before the ledger is committed. So the ledger rolls back, leaving "dirty" data in the checkpoint. The test verifies that on restart the dirty data is cleaned up, then appended again when the same ledger commits for real (this is exercised by running a successful catchup).
I can certainly add checks that the files are indeed corrupt, but I wanted to clarify what the test is doing first.
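
For reference, a minimal sketch of what such a check could look like; the path and expected-size values are hypothetical and would come from the test's known checkpoint layout:

#include <filesystem>

// Returns true if the checkpoint file on disk is larger than what the
// committed ledger range accounts for, i.e. it still carries a dirty,
// uncommitted tail that startup recovery should trim.
bool
hasDirtyTail(std::filesystem::path const& checkpointFile,
             std::uintmax_t expectedCommittedSize)
{
    return std::filesystem::exists(checkpointFile) &&
           std::filesystem::file_size(checkpointFile) > expectedCommittedSize;
}

// In the test, after the forced throw and before restarting the app:
//     REQUIRE(hasDirtyTail(tmpCheckpointPath, expectedCommittedSize));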

Comment on lines -113 to -118
// [1]: Probably our read transaction was serialized ahead of the write
// transaction composing the history itself, despite occurring in the
// opposite wall-clock order, this is legal behavior in SERIALIZABLE
// transaction-isolation level -- the highest offered! -- as txns only have
// to be applied in isolation and in _some_ order, not the wall-clock order
// we issued them. Anyway this is transient and should go away upon retry.
graydon (Contributor):

Interestingly we never checked in any way that the SCP message count was "correct" (in the sense of not-exhibiting a concurrency anomaly), in the same way as this is trying to check against the anticipated ledger count. I guess this is .. ok? Ish? We make no promises about SCP messages, they're best effort, but I wonder if there's a way to decide we got enough vs. didn't and should retry.

marta-lokhova (Contributor, Author):

Tbh, I'm not certain what the guarantees are exactly wrt SCP messages - my understanding is that we can be missing arbitrary ledgers (if we got in sync via catchup, for example). I agree it might be nice to strengthen this check, but I think it's probably out of scope for this PR, since I'm just preserving the behavior in master for now.

@@ -65,15 +68,12 @@ StateSnapshot::writeHistoryBlocks() const
// All files are streamed out of the database, entry-by-entry.
graydon (Contributor):

This comment isn't totally accurate anymore. Maybe mention that only one file is streamed from the db and that the others are formed incrementally with the CheckpointBuilder?

graydon (Contributor):

(It might even be worth renaming this method? It's not exactly forming a "history block" anymore)

fs::mkpath(HistoryManager::publishQueuePath(cfg));
}

std::string
graydon (Contributor):

try to use std::filesystem stuff in new code
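
E.g. a directory-creation helper in new code could look something like this (just a sketch, with an illustrative name):

#include <filesystem>
#include <string>

// std::filesystem equivalent of fs::mkpath: creates all missing parent
// directories and is a no-op if the path already exists.
void
ensurePublishQueueDir(std::string const& dir)
{
    std::filesystem::create_directories(std::filesystem::path(dir));
}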

bool
isPublishFile(std::string const& name)
{
std::regex re("^[a-z0-9]{8}\\.json$");
graydon (Contributor):

[a-f0-9], it's only hex :)

Also it should be static so you don't re-parse and re-compile the regex on each call.
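
Something along these lines would address both (a sketch; the rest of the function body isn't shown above, so the match call is my guess at it):

#include <regex>
#include <string>

bool
isPublishFile(std::string const& name)
{
    // Hex-only character class, and compiled once rather than per call.
    static std::regex const re("^[a-f0-9]{8}\\.json$");
    return std::regex_match(name, re);
}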

{
uint32_t seq;
std::stringstream ss;
ss << std::hex << file.substr(0, 8);
graydon (Contributor):

Hmm, std::hex is a numeric formatter, but you're not inserting a number here at all, you're inserting an 8-char substring. I guess std::hex is influencing the extraction on the next line? Even so, it can fail silently unless you also check the stream status.

I would just go with std::stoul(file.substr(0, 8), nullptr, 16): fewer moving pieces, and it fails with an exception.
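
I.e. something like this (a sketch; the enclosing function name is just illustrative):

#include <cstdint>
#include <string>

uint32_t
parseCheckpointSequence(std::string const& file)
{
    // Throws std::invalid_argument or std::out_of_range on bad input
    // instead of failing silently like an unchecked stream extraction.
    return static_cast<uint32_t>(std::stoul(file.substr(0, 8), nullptr, 16));
}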

// committed state.
// 3. Finalize any new checkpoint files _after_ the commit. If a crash
// occurs
// between commit and this step, core will try to rotate files again on
graydon (Contributor):

(minor) Reflow comment here.

(more serious) I don't quite know what each word in "try to rotate files again" means. If it is only "trying", can it fail as well? What does "rotate" mean exactly (will it recover-by-copying-prefixes, or delete-and-re-enqueue finished files, or something else)? If it's doing so "again", did it .. already do so sometime earlier?

Again sorry for being fussy but this is the focal point of the commit path where something-going-wrong either is or isn't recoverable, and I want to be quite sure that it is, and how!

@@ -94,7 +94,8 @@ class LedgerManagerImpl : public LedgerManager
std::unique_ptr<LedgerCloseMetaFrame> const& ledgerCloseMeta,
uint32_t initialLedgerVers);

void storeCurrentLedger(LedgerHeader const& header, bool storeHeader);
void storeCurrentLedger(LedgerHeader const& header, bool storeHeader,
bool checkpoint);
graydon (Contributor):

I misread checkpoint as a parameter name and thought it meant "this ledger is a checkpoint boundary" rather than "this call should store to the checkpoint-in-progress". Maybe rename it alsoAppendToCheckpoint?

ZoneScoped;
if (!mStartupValidationComplete &&
header.ledgerSeq != LedgerManager::GENESIS_LEDGER_SEQ &&
!skipStartupCheck)
graydon (Contributor):

Hmm I'm a bit nervous about two things here:

  • The use of mStartupValidationComplete. Is it possible to validate the CheckpointBuilder's on-disk state at construction, rather than having this half-done state where it's been constructed but might be invalid? When is that state useful?
  • Which leads me to skipStartupCheck, which seems to be ignoring the flag! Does this enable appending to an invalid (non-recovered) state? That seems like something we should probably never do. It looks like it's called from the DB migration code, which I assume usually has a trivially valid state (no CheckpointBuilder-managed publish queue at all), but if we got halfway through building one, failed during migration, then crashed and restarted the migration, would we then be writing a second copy of the data onto the end of the half-migrated data?
