Maintain a time index to support an akka read journal #103

Open
wants to merge 1 commit into
base: master


@jypma jypma commented Oct 15, 2015

This adds an index table to Cassandra, so events can be queried "roughly" by time. The Akka journal query plugin implementation is in a separate library.

The way this works: for every time window (say, 1 minute), a persistenceId is added to the index table once if it changed during that window. Indexing only the first change to a persistenceId per time window keeps the index size somewhat limited.
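
To make the windowing concrete, here is a minimal in-memory sketch of the idea. It is not the code in this PR: the class name `TimeWindowIndexSketch`, its methods, and the map standing in for the Cassandra index table are all assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the index table; not the plugin's code.
final class TimeWindowIndexSketch {
    private final long windowMillis;
    // Window start (epoch millis) -> persistenceIds that changed in that window.
    private final Map<Long, Set<String>> index = new HashMap<>();

    TimeWindowIndexSketch(Duration window) {
        this.windowMillis = window.toMillis();
    }

    // Rounds a timestamp down to the start of the window that contains it.
    long windowStart(Instant timestamp) {
        return Math.floorDiv(timestamp.toEpochMilli(), windowMillis) * windowMillis;
    }

    // Records a change; returns true only if a new index entry was added,
    // i.e. this is the first change to persistenceId within its window.
    boolean recordChange(String persistenceId, Instant timestamp) {
        return index
            .computeIfAbsent(windowStart(timestamp), k -> new LinkedHashSet<>())
            .add(persistenceId);
    }

    // Total number of (window, persistenceId) entries in the index.
    int entryCount() {
        int count = 0;
        for (Set<String> ids : index.values()) {
            count += ids.size();
        }
        return count;
    }
}
```

With a 1-minute window, two changes to the same persistenceId at 10:00:05 and 10:00:15 produce a single index entry, while a further change at 10:01:05 produces a second one.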

The query API can then find what changed when, up to the accuracy of a time window. This allows remote / distributed views to resume into the event stream, without having to re-start from 0.
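
As a rough illustration of what "up to the accuracy of a time window" means on the query side, the sketch below looks up everything that changed in the window containing a given instant or in any later window. The class `TimeIndexQuerySketch`, the method `changedSince`, and the sorted map standing in for the index table are hypothetical and are not the API of the query plugin.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical query-side sketch over an in-memory index; not the plugin's API.
final class TimeIndexQuerySketch {
    private final long windowMillis;
    // Window start (epoch millis) -> persistenceIds changed in that window, sorted by window.
    private final NavigableMap<Long, Set<String>> index = new TreeMap<>();

    TimeIndexQuerySketch(Duration window) {
        this.windowMillis = window.toMillis();
    }

    private long windowStart(Instant timestamp) {
        return Math.floorDiv(timestamp.toEpochMilli(), windowMillis) * windowMillis;
    }

    void recordChange(String persistenceId, Instant timestamp) {
        index.computeIfAbsent(windowStart(timestamp), k -> new LinkedHashSet<>())
             .add(persistenceId);
    }

    // Returns every persistenceId indexed in the window containing `since`
    // or in any later window. Changes from earlier in the same window are
    // included too, which is the window-level inaccuracy.
    Set<String> changedSince(Instant since) {
        Set<String> result = new LinkedHashSet<>();
        for (Set<String> ids : index.tailMap(windowStart(since), true).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```

A view resuming from 10:01:45 would also be handed a persistenceId that changed at 10:01:30, because both fall in the same 1-minute window. The view then has to filter or de-duplicate events it has already seen.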

There are working integration tests in the query implementation repository.

@krasserm
Owner

@jypma thanks for your pull request. I'll review and comment in the next couple of days. Cheers, Martin

@jypma
Author

jypma commented Oct 15, 2015

Thanks a lot. I think initial feedback could cover the following questions:

  • Is this feature general enough to warrant the index table always being created?
  • Is the configuration mechanism to specify what to index generic enough?
  • Is there a smarter table structure to store this lookup information in Cassandra?

@krasserm
Owner

@jypma we designed writes to Cassandra in a way that they always go to a single partition in order to avoid issues discussed in #48. With your addition, writes may again go to different partitions, resulting in a logged batch which suffers from the problems described in #48.

We are currently discussing a general architecture for creating indices and supporting akka-persistence-query in #77 (/cc @zapletal-martin). The index is created asynchronously so that writing additional tables is not on the fast write path of akka-persistence. It would be great to additionally implement a time index based on this architecture. WDYT?

@jypma jypma mentioned this pull request Oct 19, 2015
6 tasks
@jypma
Author

jypma commented Oct 19, 2015

Just commented over there. I fully agree this should go in the same direction. However, our project timelines might require us to go on with this forked branch for the moment. I'll at least add some test cases for the query side seeing index values and main table values out of order.

Secondly, I'll experiment with extracting the time window from the main event itself (as an offline indexer would have to do). That would make it easier to upgrade/transition later.

@krasserm
Owner

@jypma I fully understand that waiting for #77 to be ready is in conflict with your project timelines. I'm willing to merge your contribution as a temporary solution to support your query plugin but it needs modification so that writes go to a single partition. Can you imagine creating the time index with a background indexer running concurrently to the journal actor? Or do you plan to continue with the current implementation on your fork?

@jypma
Author

jypma commented Oct 20, 2015

@krasserm I understand your concerns. I've changed the PR itself to at least not touch the main events table, and to derive an event's timestamp from the event itself (which would be needed anyway to allow async, replayable indexing). This way, upgrading should be a little easier.
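
A sketch of what deriving the timestamp from the event could look like is given here, assuming events carry their own time. The event type `OrderPlaced`, its fields, and the `EXTRACT_TIMESTAMP` function are made up for illustration; the PR's actual configuration mechanism may differ.

```java
import java.time.Instant;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; event type and extractor are illustrative assumptions.
final class EventTimeSketch {
    // Example event that records when it occurred as part of its payload.
    static final class OrderPlaced {
        final String orderId;
        final Instant occurredAt;

        OrderPlaced(String orderId, Instant occurredAt) {
            this.orderId = orderId;
            this.occurredAt = occurredAt;
        }
    }

    // Returns the event's own timestamp if it carries one, otherwise empty.
    // Because the result depends only on the stored event, replaying the
    // journal later yields the same timestamp as indexing at write time.
    static final Function<Object, Optional<Instant>> EXTRACT_TIMESTAMP = event ->
        event instanceof OrderPlaced
            ? Optional.of(((OrderPlaced) event).occurredAt)
            : Optional.empty();
}
```

Since the extractor never consults the wall clock, an asynchronous indexer that replays events later would assign each event to the same time window as an indexer running inline with the write.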

I'm undecided whether I'd go on to make the actual indexing async at this point. I expect I'd run into some of the same challenges that #77 tries to address. Plus, our particular application is somewhat latency-sensitive (time from an event being emitted to any real-time source picking it up shouldn't be more than a second or so).

Let's just keep the PR open for now, for reference, and come back to it when an async indexer is in play.

By the way, since Akka 2.4 targets Java 8+, can this plugin do so as well? I prefer to use java.time.Instant over the type-less Long.
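
For illustration, this is roughly how an `Instant`-based API could still store a plain epoch-millisecond `Long` underneath. The class and method names are hypothetical, and storing the value as epoch milliseconds is an assumption.

```java
import java.time.Instant;

// Hypothetical helper; the storage format (epoch milliseconds) is an assumption.
final class InstantColumnSketch {
    // Converts the typed timestamp to the Long that would be stored.
    // Precision below one millisecond is dropped by this conversion.
    static long toColumn(Instant timestamp) {
        return timestamp.toEpochMilli();
    }

    // Restores the typed timestamp from the stored Long.
    static Instant fromColumn(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }
}
```

The typed signature rules out passing a sequence number or a seconds-based value where a timestamp is expected, which a bare `Long` parameter cannot do.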

@krasserm
Owner

@jypma ok, let's keep the PR open for now. Thanks anyway for your contribution. Regarding the Java version, we should of course also target Java 8+.

2 participants