in-memory realtime sync #1115

Merged
0xOlias merged 22 commits into main from kjs/mem on Oct 10, 2024

Conversation

@kyscott18 kyscott18 commented Sep 20, 2024

The "sync" portion of ponder, which handles extracting and ordering events, is now entirely in-memory, removing the need for database queries. In addition, only finalized data is added to the sync-store. This is a combination of #1092 and #1103. Closes #1091.

This involves:

  • storing child address information in memory
  • filtering, formatting, and ordering events (replacing getEvents())

Child address methodology

  • separate unfinalizedChildAddresses and finalizedChildAddresses
  • track factoryLogsPerBlock, so that unfinalizedChildAddresses can always be recomputed
  • on reorg, evict reorged blocks from factoryLogsPerBlock and recompute unfinalizedChildAddresses
  • on finalization, add child addresses from finalized blocks to finalizedChildAddresses, evict finalized blocks from factoryLogsPerBlock and recompute unfinalizedChildAddresses
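A minimal sketch of the child address bookkeeping described above, not the code in this PR. The `ChildAddressTracker` class, its method names, and the `FactoryLog` / `BlockRef` shapes are hypothetical; only `factoryLogsPerBlock`, `unfinalizedChildAddresses`, and `finalizedChildAddresses` come from the description.

```typescript
type Address = string;
type BlockHash = string;

// Hypothetical, simplified shapes for illustration only.
interface FactoryLog {
  childAddress: Address;
}
interface BlockRef {
  hash: BlockHash;
}

class ChildAddressTracker {
  readonly finalizedChildAddresses = new Set<Address>();
  unfinalizedChildAddresses = new Set<Address>();
  // Factory logs for every unfinalized block, keyed by block hash.
  private factoryLogsPerBlock = new Map<BlockHash, FactoryLog[]>();

  // Rebuild the unfinalized set from whatever blocks are still tracked.
  private recompute(): void {
    const next = new Set<Address>();
    for (const logs of this.factoryLogsPerBlock.values()) {
      for (const log of logs) next.add(log.childAddress);
    }
    this.unfinalizedChildAddresses = next;
  }

  onBlock(block: BlockRef, logs: FactoryLog[]): void {
    this.factoryLogsPerBlock.set(block.hash, logs);
    for (const log of logs) this.unfinalizedChildAddresses.add(log.childAddress);
  }

  // Reorg: evict the reorged blocks, then recompute.
  onReorg(reorgedBlocks: BlockRef[]): void {
    for (const block of reorgedBlocks) this.factoryLogsPerBlock.delete(block.hash);
    this.recompute();
  }

  // Finalization: promote child addresses, evict the blocks, then recompute.
  onFinalize(finalizedBlocks: BlockRef[]): void {
    for (const block of finalizedBlocks) {
      for (const log of this.factoryLogsPerBlock.get(block.hash) ?? []) {
        this.finalizedChildAddresses.add(log.childAddress);
      }
      this.factoryLogsPerBlock.delete(block.hash);
    }
    this.recompute();
  }

  isChild(address: Address): boolean {
    return (
      this.finalizedChildAddresses.has(address) ||
      this.unfinalizedChildAddresses.has(address)
    );
  }
}
```

Keeping the per-block logs around is what makes the recompute step cheap to reason about: the unfinalized set is always a pure function of the blocks still tracked.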

Sync ordering methodology

  • store unindexed events in unindexedEvents
  • store unfinalized event data in unfinalizedEventData
  • when a new block is received, format with buildEvents() and add to unindexedEvents
  • when the overall checkpoint is moved forward, select events that are now able to be indexed, remove them from unindexedEvents, and pass them to the indexing functions
  • on reorg, remove events with reorged block hashes from unindexedEvents and unfinalizedEventData
  • on finalization, print a warning if the chain's finalized checkpoint is farther along than the overall indexing checkpoint
  • on finalization, remove newly finalized data from unfinalizedEventData and insert it into the sync-store
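The buffering steps above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the `EventBuffer` class and the reduced `RawEvent` shape are hypothetical, and checkpoints are assumed to be fixed-width strings whose lexicographic order matches chain order.

```typescript
// Hypothetical, reduced event shape. Checkpoints are assumed to be
// fixed-width strings, so plain string comparison orders them correctly.
interface RawEvent {
  checkpoint: string;
  blockHash: string;
}

class EventBuffer {
  private unindexedEvents: RawEvent[] = [];

  // Called after a new block has been formatted into events.
  add(events: RawEvent[]): void {
    this.unindexedEvents.push(...events);
  }

  // Remove and return, in checkpoint order, every event at or before `to`.
  release(to: string): RawEvent[] {
    const ready = this.unindexedEvents.filter((e) => e.checkpoint <= to);
    this.unindexedEvents = this.unindexedEvents.filter((e) => e.checkpoint > to);
    return ready.sort((a, b) =>
      a.checkpoint < b.checkpoint ? -1 : a.checkpoint > b.checkpoint ? 1 : 0,
    );
  }

  // Drop every buffered event that belongs to a reorged block.
  evictReorged(reorgedHashes: Set<string>): void {
    this.unindexedEvents = this.unindexedEvents.filter(
      (e) => !reorgedHashes.has(e.blockHash),
    );
  }

  get size(): number {
    return this.unindexedEvents.length;
  }
}
```

Events released by `release()` would then be handed to the indexing functions; anything past the checkpoint stays buffered until the checkpoint advances.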

Other

A metric that tracks how far indexing progress is from the tip (known block - indexed block) might be helpful. However, it may also be unintuitive for many consumers when using omnichain ordering, as the metric will quite consistently appear far behind the tip.
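For illustration only, the proposed metric for a single chain could be as small as the helper below. `indexingLag` is a hypothetical name, and clamping at zero is an assumption made here, not something discussed in the PR.

```typescript
// Hypothetical helper: block-based distance from tip for one chain.
// Clamped at zero (an assumption) so a stale `knownBlock` never yields
// a negative lag.
function indexingLag(knownBlock: number, indexedBlock: number): number {
  return Math.max(0, knownBlock - indexedBlock);
}
```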

@kyscott18 kyscott18 marked this pull request as ready for review October 8, 2024 00:24
@kyscott18 kyscott18 requested a review from 0xOlias October 8, 2024 00:24

@0xOlias 0xOlias left a comment

Looking great, couple questions.

packages/core/src/sync-realtime/index.ts (resolved)
packages/core/src/sync-realtime/index.ts (outdated, resolved)
packages/core/src/sync-realtime/index.ts (resolved)
) as SyncCallTrace[];
callTraces = traces
.filter((trace) => trace.type === "call")
.filter((trace) => trace.result !== null) as SyncCallTrace[];
Collaborator

What's the story behind this change?

Collaborator Author

Previously, we were relying on the getEvents query to filter out traces with errors. Now it happens here.

Collaborator

Makes sense. Though, this means we'll technically be storing different data in the database in historical vs realtime, correct? In realtime, we filter before insertion, but in historical we filter after.

transactionIndex: eb.ref("excluded.transactionIndex"),
})),
)
.onConflict((oc) => oc.column("hash").doNothing())
Collaborator

What's the context behind this change?

Collaborator Author

This previous logic was to handle the case when a reorg occurred. It no longer applies because all data in the sync-store is finalized.

const events: RawEvent[] = unindexedEvents.filter(
({ checkpoint }) => checkpoint <= to,
);
unindexedEvents = unindexedEvents.filter(
Collaborator

I find it a bit confusing that this is called unindexedEvents, because this service does not actually do the indexing. Am I correct that this is the list of events (across all chains) that the sync service is holding onto, but can't pass up to the runtime because there is another network with a lagging checkpoint? Essentially a buffer?

Collaborator Author

Yes, exactly. Is eventBuffer more clear?

Collaborator
@0xOlias 0xOlias Oct 9, 2024

Maybe bufferedEvents or pendingEvents ?
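To make the "lagging network" point in this thread concrete, here is a small sketch. It assumes the overall checkpoint is the minimum of the per-network checkpoints, which is an interpretation of the discussion rather than something stated in it; `overallCheckpoint` is a hypothetical helper, and checkpoints are again assumed to be fixed-width, lexicographically comparable strings.

```typescript
// Hypothetical helper: events can only be released up to the slowest
// network's checkpoint, so the overall checkpoint is the minimum.
// Keys are chain IDs; values are fixed-width checkpoint strings (assumed).
function overallCheckpoint(
  perNetwork: Map<number, string>,
): string | undefined {
  let min: string | undefined;
  for (const checkpoint of perNetwork.values()) {
    if (min === undefined || checkpoint < min) min = checkpoint;
  }
  return min;
}
```

Under this assumption, a chain that is far ahead simply accumulates events in the buffer until the slowest chain catches up.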

Comment on lines 742 to 746
...args.sources
.filter(({ filter }) => filter.chainId === network.chainId)
.map(({ filter }) =>
args.syncStore.insertInterval({ filter, interval }),
),
Collaborator

Really nice that this is now happening in the same spot - not necessary for this PR, but ultimately I think it would be great to use a database transaction here.

Comment on lines 784 to 797
localSyncContext.get(network)!.unfinalizedEventData =
unfinalizedEventData.filter(
(led) =>
hexToNumber(led.block.number) <= hexToNumber(event.block.number),
);

break;
const reorgedHashes = new Set<Hash>();
for (const b of event.reorgedBlocks) {
reorgedHashes.add(b.hash);
}

default:
never(event);
unindexedEvents = unindexedEvents.filter(
(e) => reorgedHashes.has(e.block.hash) === false,
);
Collaborator

I'm getting a bit confused by the difference between unfinalizedEventData and unindexedEvents here. It seems like they contain the same information, why do we need both?

Collaborator Author

They do contain the same information, where unindexedEvents is derived from unfinalizedEventData. However, when building events, point-in-time child addresses are needed. Also, the lifetime of each of these variables is different.
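A rough sketch of why point-in-time child addresses matter when deriving events from raw block data. `buildEventsSketch` is a hypothetical stand-in, not the real `buildEvents()` signature, and the `Log` / `BlockData` shapes are simplified for illustration.

```typescript
// Hypothetical, simplified shapes.
interface Log {
  address: string;
}
interface BlockData {
  number: number;
  logs: Log[];
}

// Derive events from raw block data using the child addresses known at
// that block. The same raw data yields different events depending on
// which snapshot of child addresses is passed in.
function buildEventsSketch(
  data: BlockData,
  childAddresses: ReadonlySet<string>,
): Log[] {
  return data.logs.filter((log) => childAddresses.has(log.address));
}
```

This is one reason to keep the raw data separately: the derived events cannot be rebuilt without both the raw block data and the correct child address snapshot.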

Comment on lines -788 to -796
eventQueue.pause();
eventQueue.clear();
promises.push(eventQueue.onIdle());
Collaborator

Why'd you remove these?

Collaborator Author

The sync module no longer contains a queue. Instead, it is synchronous with sync-realtime. This is primarily to maintain consistency for factory addresses. Without this change, it would be almost impossible to maintain unfinalizedChildAddresses and pass the correct value for the correct block height to buildEvents.

// @ts-ignore
block.transactions = undefined;

await args.onEvent({
Collaborator

Why did you add the await here?

Collaborator Author
@kyscott18 kyscott18 Oct 9, 2024

]);

// Add corresponding intervals to the sync-store
// Note: this should happen after so the database doesn't become corrupted
Collaborator

Nice catch, I bet this would have bit us.
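A toy model of why the write order in that code comment matters. Everything here is hypothetical (`InMemorySyncStore`, `insertBlocks`, `isConsistent`), and it simplifies heavily by assuming an interval means "every block in this range is stored". It only illustrates that a crash between the two writes is harmless when data goes in first.

```typescript
// Hypothetical: inclusive block range that the store claims to cover.
interface Interval {
  from: number;
  to: number;
}

class InMemorySyncStore {
  readonly blocks = new Set<number>();
  readonly intervals: Interval[] = [];

  insertBlocks(numbers: number[]): void {
    for (const n of numbers) this.blocks.add(n);
  }

  insertInterval(interval: Interval): void {
    this.intervals.push(interval);
  }

  // Consistent means every recorded interval is fully backed by data.
  isConsistent(): boolean {
    return this.intervals.every(({ from, to }) => {
      for (let n = from; n <= to; n++) {
        if (!this.blocks.has(n)) return false;
      }
      return true;
    });
  }
}

// Data first, then a simulated crash: no interval recorded, still consistent.
const dataFirst = new InMemorySyncStore();
dataFirst.insertBlocks([1, 2, 3]);

// Interval first, then a simulated crash: the interval claims data that
// was never written, so the store is corrupted.
const intervalFirst = new InMemorySyncStore();
intervalFirst.insertInterval({ from: 1, to: 3 });
```

Wrapping both writes in a single database transaction, as suggested earlier in the review, would remove the ordering concern entirely.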


@0xOlias 0xOlias left a comment

LGTM

@0xOlias 0xOlias merged commit 1a7d1f8 into main Oct 10, 2024
8 of 9 checks passed
@0xOlias 0xOlias deleted the kjs/mem branch October 10, 2024 00:26
Successfully merging this pull request may close these issues.

Hold all factory child addresses in memory
2 participants