Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run solid_queue with SQLite #29

Merged
merged 1 commit into from
Nov 1, 2023
Merged

Run solid_queue with SQLite #29

merged 1 commit into from
Nov 1, 2023

Conversation

djmb
Copy link
Contributor

@djmb djmb commented Oct 16, 2023

Update the tests to run solid_queue on SQLite.

They can be triggered by:

TARGET_DB=sqlite rails test

The adapter is prone to raising SQLite3::BusyExceptions when concurrent transactions occur. Preventing this requires a couple of patches to the adapter.

  1. Implement retry with backoff - add a retries config setting to the adapter and sleep progressively longer for each retry. This setting is currently in the Rails main branch, but that implementation has no backoff. That doesn't work well in our case.
  2. Always create immediate transactions - SQLite by default creates deferred transactions, which don't take a write lock. Then later if there is a write it tries to upgrade the lock. This won't work if the transaction has a stale read so retrying the write by itself is not possible. Starting with an immediate transaction moves the write lock to that point and ensures that we only get blocked on a retryable transaction.

cc @rosa, @kevinmcconnell

@@ -5,7 +5,7 @@ default: &default
default:
pool_size: 5
scheduler:
polling_interval: 300
polling_interval: 1
Copy link
Contributor Author

@djmb djmb Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test that SIGKILLs the supervisor

Without this the scheduler process hangs round for a long time, as it only notices that the supervisor is gone after the polling interval elapses.

@djmb djmb force-pushed the sqlite branch 5 times, most recently from 30ad653 to de38dd5 Compare October 16, 2023 14:40
Update the tests to run solid_queue on SQLite.

They can be triggered by:
```
TARGET_DB=sqlite rails test
```

The adapter is prone to raising `SQLite3::BusyException`s when
concurrent transactions occur. Preventing this requires a couple of
patches to the adapter.

1. Implement retry with backoff - add a `retries` config setting
to the adapter and sleep progressively longer for each retry. This
setting is currently in the Rails main branch, but that implementation
has no backoff. That doesn't work well in our case.
1. Always create immediate transactions - SQLite by default creates
deferred transactions, which don't take a write lock. Then later if
there is a write it tries to upgrade the lock. This won't work if the
transaction has a stale read so retrying the write by itself is not
possible. Starting with an immediate transaction moves the write lock to
that point and ensures that we only get blocked on a retryable
transaction.
@@ -47,6 +47,7 @@ class SupervisorTest < ActiveSupport::TestCase
end

test "abort if there's already a pidfile for a supervisor" do
FileUtils.mkdir_p(File.dirname(@pidfile))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This failed a couple of times, I think it was relying on the test order to ensure that the directory was already there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh yes! My bad, thanks for catching it 🙏

def begin_db_transaction
log("begin immediate transaction", "TRANSACTION") do
with_raw_connection(allow_retry: true, materialize_transactions: false) do |conn|
conn.transaction(:immediate)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if Solid Queue could do this for its own operations without the host application having to patch the adapter. If I'm understanding it correctly, we need this to avoid it doing stale reads when fetching items from the queue, so a SQLite app that wants to use Solid Queue will have to do this, is that right? And having Solid Queue patch it globally might be too heavy-handed.

Perhaps we'll find it's a general enough problem that it should go into Rails anyway. I'm not sure whether that's the case or not.

An alternative would be to use a write operation to fetch from the queue (that's what we're doing in Campfire at the moment). That means having different statements for the different adapters, but given that the SQLite adapter already ignores part of the intended statement (the lock part), the behaviour differences do exist, so maybe not a bad idea for it to be explicit?

I could see it working either way, so, food for thought I guess :) What do you think?

If we do stick with the patch, we'll probably need to document this requirement, at least.

Copy link
Contributor Author

@djmb djmb Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLite will prevent you from reading stale data, even if you use a deferred transaction (the default for Rails). But when it realises that it has stale data we will get a busy exception without any retries. So without things will be correct, but with lots of exceptions.

If you know you are going to write in the transaction I think you's always want an immediate one, which means that for a lot of apps all transactions probably be immediate, but that's not something we should be deciding, so I think a way to choose per transaction would be useful (or maybe mark a model as using immediate transactions?)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, yeah that's what I meant 👍 It's not that it would use stale data, it's that reading it will send it into that error state where the transaction is effectively broken.

I agree that having some way to mark which transactions should be immediate would be nice, and then Solid Queue could get the desired behaviour regardless of what the rest of the app is doing.

I think that's why I haven't run into this with my other queue code too, since it uses a write operation to fetch from the queue, so this particular situation doesn't come up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on why you haven't had this issue, begin deferred transaction + a write is equivalent to calling begin immediate just before the write occurs (see https://www.sqlite.org/lang_transaction.html#deferred_immediate_and_exclusive_transactions).

A big reason why I think we'd want to default to immediate transactions is the transactions created internally by ActiveRecord updates (calling save!, create etc).

There could be reads before writes with these depending on what the before_ callbacks do, if so we'd hit the stale read issue with these, unless we wrapped the call in our own immediate transaction.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, totally. It does seem like it'd make sense for immediate transactions to be the Rails default when using SQLite, because of this behaviour. We could always provide a way to start a deferred transaction manually, and/or override the default. But having a safer default would be useful for making SQLite apps work better out of the box.

It feels like overreaching for an individual library to patch that, but that's not the case for Rails itself. It already provides good defaults for lots of things, and this seems similarly appropriate.

In the short term we can patch it in our app code anyway, that's fine. My original point was just that for Solid Queue to be easily used with SQLite by other folks, it'd be nice if the problem is solved elsewhere so apps don't have to be concerned with it. But maybe that just means upstreaming the patch.

if @config[:retries]
retries = self.class.type_cast_config_to_integer(@config[:retries])
raw_connection.busy_handler do |count|
(count <= retries).tap { |result| sleep count * 0.001 if result }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite a clever way of expressing this logic. Nice! 👍

I have to imagine we'll need a backoff like this in Rails. This is how I was handling it too, and it was working reliably. But since the new default removes the sleep between retries, it seems like it'd be easy to blow through all the retries very quickly while waiting for a lock to resolve. Is that what you were seeing when you tried it with the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's exactly what I was seeing 👍

@djmb djmb merged commit 290b360 into main Nov 1, 2023
4 checks passed
@djmb djmb deleted the sqlite branch November 1, 2023 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants