
Run solid_queue with SQLite #29

Merged: 1 commit, Nov 1, 2023
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -9,7 +9,7 @@ jobs:
fail-fast: false
matrix:
ruby-version: [3.2.2]
database: [mysql]
database: [mysql, sqlite]
services:
mysql:
image: mysql:8.0.31
18 changes: 18 additions & 0 deletions test/dummy/config/database.yml
@@ -5,6 +5,22 @@
# gem "mysql2"
#

<% if ENV["TARGET_DB"] == "sqlite" %>
default: &default
adapter: sqlite3
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 50 } %>
retries: 100

development:
<<: *default
database: db/development.sqlite3

test:
<<: *default
pool: 20
database: db/test.sqlite3

<% else %>
default: &default
adapter: mysql2
username: root
@@ -20,3 +36,5 @@ test:
<<: *default
pool: 20
database: solid_queue_test
<% end %>

30 changes: 30 additions & 0 deletions test/dummy/config/initializers/sqlite3.rb
@@ -0,0 +1,30 @@
module SqliteImmediateTransactions
def begin_db_transaction
log("begin immediate transaction", "TRANSACTION") do
with_raw_connection(allow_retry: true, materialize_transactions: false) do |conn|
conn.transaction(:immediate)


It would be nice if Solid Queue could do this for its own operations without the host application having to patch the adapter. If I'm understanding it correctly, we need this to avoid it doing stale reads when fetching items from the queue, so a SQLite app that wants to use Solid Queue will have to do this, is that right? And having Solid Queue patch it globally might be too heavy-handed.

Perhaps we'll find it's a general enough problem that it should go into Rails anyway. I'm not sure whether that's the case or not.

An alternative would be to use a write operation to fetch from the queue (that's what we're doing in Campfire at the moment). That means having different statements for the different adapters, but given that the SQLite adapter already ignores part of the intended statement (the lock part), the behaviour differences do exist, so maybe not a bad idea for it to be explicit?

I could see it working either way, so, food for thought I guess :) What do you think?

If we do stick with the patch, we'll probably need to document this requirement, at least.
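For context, a write-based fetch along the lines mentioned above might look roughly like this. It's only a rough sketch: the database file, table and column names are illustrative rather than Solid Queue's real schema, and RETURNING needs SQLite 3.35 or later.

require "sqlite3"

db = SQLite3::Database.new("demo.sqlite3")
db.execute("CREATE TABLE IF NOT EXISTS ready_jobs (id INTEGER PRIMARY KEY, claimed_at TEXT)")
db.execute("INSERT INTO ready_jobs (claimed_at) VALUES (NULL)")

# A single write statement both claims and returns the row, so there is no
# read-then-write window where a deferred transaction could go stale.
claimed = db.execute(<<~SQL)
  UPDATE ready_jobs
     SET claimed_at = datetime('now')
   WHERE id = (SELECT id FROM ready_jobs WHERE claimed_at IS NULL LIMIT 1)
  RETURNING id
SQL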

@djmb (Contributor Author), Oct 23, 2023:

SQLite will prevent you from reading stale data, even if you use a deferred transaction (the default for Rails). But when it realises that it has stale data, we will get a busy exception without any retries. So without this, things will be correct, but with lots of exceptions.

If you know you are going to write in the transaction, I think you'd always want an immediate one, which means that for a lot of apps all transactions would probably be immediate. But that's not something we should be deciding, so I think a way to choose per transaction would be useful (or maybe mark a model as using immediate transactions?).
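For reference, the :immediate mode the patch above passes to the raw connection comes straight from the sqlite3 gem; a standalone sketch of the behaviour (the file and table names are just for illustration):

require "sqlite3"

db = SQLite3::Database.new("demo.sqlite3")
db.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, claimed INTEGER DEFAULT 0)")

# BEGIN IMMEDIATE takes the write lock up front, so a concurrent writer is held
# off by the busy handler here, instead of this transaction reading first and
# then failing with a non-retryable busy error when it tries to write.
db.transaction(:immediate) do
  row = db.get_first_row("SELECT id FROM jobs WHERE claimed = 0 LIMIT 1")
  db.execute("UPDATE jobs SET claimed = 1 WHERE id = ?", [row[0]]) if row
end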


Cool, yeah that's what I meant 👍 It's not that it would use stale data, it's that reading it will send it into that error state where the transaction is effectively broken.

I agree that having some way to mark which transactions should be immediate would be nice, and then Solid Queue could get the desired behaviour regardless of what the rest of the app is doing.

I think that's why I haven't run into this with my other queue code too, since it uses a write operation to fetch from the queue, so this particular situation doesn't come up.

@djmb (Contributor Author):

Agreed on why you haven't had this issue: begin deferred transaction + a write is equivalent to calling begin immediate just before the write occurs (see https://www.sqlite.org/lang_transaction.html#deferred_immediate_and_exclusive_transactions).

A big reason why I think we'd want to default to immediate transactions is the transactions created internally by ActiveRecord updates (calling save!, create, etc.).

There could be reads before writes with these, depending on what the before_ callbacks do; if so, we'd hit the stale read issue unless we wrapped the call in our own immediate transaction.
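To make that concrete, here's a sketch of the kind of internal transaction that reads before it writes; the model and callback are made up purely for illustration.

class Job < ApplicationRecord
  before_create :check_queue_depth

  private
    # create/save! wrap the INSERT in a (deferred) transaction, and this SELECT
    # runs inside it first. If another connection commits a write in between,
    # the INSERT then fails with a busy error that no amount of retrying fixes,
    # unless the whole call is wrapped in an immediate transaction.
    def check_queue_depth
      throw :abort if Job.where(queue_name: queue_name).count >= 10_000
    end
end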


Yes, totally. It does seem like it'd make sense for immediate transactions to be the Rails default when using SQLite, because of this behaviour. We could always provide a way to start a deferred transaction manually, and/or override the default. But having a safer default would be useful for making SQLite apps work better out of the box.

It feels like overreaching for an individual library to patch that, but that's not the case for Rails itself. It already provides good defaults for lots of things, and this seems similarly appropriate.

In the short term we can patch it in our app code anyway, that's fine. My original point was just that for Solid Queue to be easily used with SQLite by other folks, it'd be nice if the problem is solved elsewhere so apps don't have to be concerned with it. But maybe that just means upstreaming the patch.

end
end
end
end

module SQLite3Configuration
private
def configure_connection
super

if @config[:retries]
retries = self.class.type_cast_config_to_integer(@config[:retries])
raw_connection.busy_handler do |count|
(count <= retries).tap { |result| sleep count * 0.001 if result }


That's quite a clever way of expressing this logic. Nice! 👍

I have to imagine we'll need a backoff like this in Rails. This is how I was handling it too, and it was working reliably. But since the new default removes the sleep between retries, it seems like it'd be easy to blow through all the retries very quickly while waiting for a lock to resolve. Is that what you were seeing when you tried it with the default?

@djmb (Contributor Author):

Yes that's exactly what I was seeing 👍
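For a sense of scale, with the retries: 100 setting from database.yml above, this linear backoff allows roughly five seconds of waiting in total before the busy error is surfaced; a quick back-of-the-envelope check:

# Each attempt sleeps `count * 0.001` seconds, so the worst case before giving
# up is 1ms + 2ms + ... + 100ms.
retries = 100
max_wait = (1..retries).sum { |count| count * 0.001 }
puts max_wait.round(2) # => 5.05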

end
end
end
end

ActiveSupport.on_load :active_record do
if defined?(ActiveRecord::ConnectionAdapters::SQLite3Adapter)
ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend SqliteImmediateTransactions
ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend SQLite3Configuration
end
end
2 changes: 1 addition & 1 deletion test/dummy/config/solid_queue.yml
@@ -5,7 +5,7 @@ default: &default
default:
pool_size: 5
scheduler:
polling_interval: 300
polling_interval: 1
@djmb (Contributor Author), Oct 16, 2023:

There's a test that SIGKILLs the supervisor.

Without this, the scheduler process hangs around for a long time, as it only notices that the supervisor is gone after the polling interval elapses.

batch_size: 500

development:
2 changes: 1 addition & 1 deletion test/integration/jobs_lifecycle_test.rb
@@ -44,7 +44,7 @@ class JobsLifecycleTest < ActiveSupport::TestCase

travel_to 5.days.from_now

wait_for_jobs_to_finish_for(0.5.seconds)
wait_for_jobs_to_finish_for(5.seconds)

assert_equal 2, JobBuffer.size
assert_equal "I'm scheduled later", JobBuffer.last_value
2 changes: 1 addition & 1 deletion test/integration/processes_lifecycle_test.rb
@@ -113,7 +113,7 @@ class ProcessLifecycleTest < ActiveSupport::TestCase

test "term supervisor exceeding timeout while there are jobs in-flight" do
no_pause = enqueue_store_result_job("no pause")
pause = enqueue_store_result_job("pause", pause: SolidQueue.shutdown_timeout + 0.1.second)
pause = enqueue_store_result_job("pause", pause: SolidQueue.shutdown_timeout + 1.second)

signal_process(@pid, :TERM, wait: 0.1.second)
wait_for_jobs_to_finish_for(SolidQueue.shutdown_timeout + 0.1.second)
10 changes: 10 additions & 0 deletions test/test_helper.rb
@@ -15,6 +15,16 @@
ActiveSupport::TestCase.fixtures :all
end

module BlockLogDeviceTimeoutExceptions
def write(...)
# Prevents Timeout exceptions from occurring during log writing, where they will be swallowed
# See https://bugs.ruby-lang.org/issues/9115
Thread.handle_interrupt(Timeout::Error => :never, Timeout::ExitException => :never) { super }
end
end

Logger::LogDevice.prepend(BlockLogDeviceTimeoutExceptions)

class ActiveSupport::TestCase
setup do
SolidQueue.logger = ActiveSupport::Logger.new(nil)
1 change: 1 addition & 0 deletions test/unit/supervisor_test.rb
@@ -47,6 +47,7 @@ class SupervisorTest < ActiveSupport::TestCase
end

test "abort if there's already a pidfile for a supervisor" do
FileUtils.mkdir_p(File.dirname(@pidfile))
@djmb (Contributor Author):

This failed a couple of times; I think it was relying on the test order to ensure that the directory was already there.

Member:

Ohh yes! My bad, thanks for catching it 🙏

File.write(@pidfile, ::Process.pid.to_s)

pid = run_supervisor_as_fork(mode: :all)