performance upgrade for tag subqueries #921

Open: 5 commits proposed for merge into base: master

@tylercrocker tylercrocker commented Sep 19, 2018

While upgrading a number of gems in our system, we noticed that acts-as-taggable-on has performance issues when running a match_all query.

Below is a fairly simple example, generated when fetching images that have two tags:

SELECT `assets`.*
FROM   `assets`
       INNER JOIN `taggings` `asset_taggings_1c1918a`
               ON `asset_taggings_1c1918a`.`taggable_id` = `assets`.`id`
                  AND `asset_taggings_1c1918a`.`taggable_type` = 'Asset'
                  AND `asset_taggings_1c1918a`.`tag_id` IN (
                      SELECT `tags`.`id`
                      FROM   `tags`
                      WHERE  Lower(`tags`.`name`) LIKE 'school!_logos' escape '!')
       INNER JOIN `taggings` `asset_taggings_0430ff7`
               ON `asset_taggings_0430ff7`.`taggable_id` = `assets`.`id`
                  AND `asset_taggings_0430ff7`.`taggable_type` = 'Asset'
                  AND `asset_taggings_0430ff7`.`tag_id` IN (
                      SELECT `tags`.`id`
                      FROM   `tags`
                      WHERE  Lower(`tags`.`name`) LIKE 'uofdenver' escape '!')
       LEFT OUTER JOIN `taggings`
                    ON `taggings`.`taggable_id` = `assets`.`id`
                       AND `taggings`.`taggable_type` = 'Asset'
WHERE  `assets`.`type` IN ( 'Assets::Image' )
GROUP  BY `assets`.`id`
HAVING Count(`taggings`.`taggable_id`) = (
           SELECT Count(*)
           FROM   `tags`
           WHERE  ( Lower(`tags`.`name`) LIKE 'school!_logos' escape '!'
                    OR Lower(`tags`.`name`) LIKE 'uofdenver' escape '!' ))
LIMIT  1;

After experimenting with the query to find which part was slow, I landed on the subquery comparisons in the two INNER JOINs, the conditions of the form tag_id IN (SUBQUERY). Simply changing the IN to an equals (=) made the query roughly 110ms faster.

This change should be fine as long as we're not using wildcards: the tags table has a unique constraint on name, so each subquery should return at most one row.
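To make the idea concrete, here is a minimal Ruby sketch of the operator choice described above. It is not the code in this pull request or in the gem: `tag_id_condition` and its `wild:` parameter are hypothetical names used only for illustration, and the hand-rolled quoting stands in for whatever the database adapter would normally do.

```ruby
# Hypothetical helper (not part of acts-as-taggable-on): builds the tag_id
# condition for one taggings join and picks the comparison operator.
def tag_id_condition(taggings_alias, tag_name, wild: false)
  # Escape LIKE metacharacters with '!', matching the ESCAPE '!' clause
  # used in the generated query.
  escaped = tag_name.downcase.gsub(/[!%_]/) { |char| "!#{char}" }
  # Naive single-quote doubling, for illustration only; real code should
  # rely on the adapter's quoting.
  escaped = escaped.gsub("'", "''")

  # A wildcard search can match many tags, so it has to keep IN.
  # An exact name is assumed to match at most one tag, so = is enough.
  pattern  = wild ? "%#{escaped}%" : escaped
  operator = wild ? 'IN' : '='

  subquery = 'SELECT `tags`.`id` FROM `tags` ' \
             "WHERE LOWER(`tags`.`name`) LIKE '#{pattern}' ESCAPE '!'"
  "`#{taggings_alias}`.`tag_id` #{operator} (#{subquery})"
end

puts tag_id_condition('asset_taggings_1c1918a', 'school_logos')
puts tag_id_condition('asset_taggings_0430ff7', 'uofdenver', wild: true)
```

The first call produces a condition using `=` with the pattern 'school!_logos'; the second keeps `IN` with the pattern '%uofdenver%'. One caveat worth keeping in mind: MySQL raises a "Subquery returns more than 1 row" error when a subquery compared with `=` yields several rows, so the exact-match branch relies on the uniqueness assumption holding under the table's collation.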

For context on the performance numbers: we're working with about 6500 tags, 1.8M taggings, and 330k assets, on MySQL 5.7, Ruby 2.5.1, and Rails 5.2, and I ran this on a newer MacBook Pro.

I don't know what you usually do about version bumping in pull requests, but I bumped the version from what it was, since that's what we do for our internal gems.
