Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize url_rewrite performance #1240

Closed
wants to merge 1 commit into from
Closed

Conversation

greglamy
Copy link

@greglamy greglamy commented Sep 29, 2020

Description (*)

The goal of this PR is to optimize url rewrite performance for big catalogs.

Use of the 'like' operator in sql queries can generate particularly long execution times when the number of records concerned by the query is large.

Mage_Catalog_Model_Url defines a new field (url_type) for categories and products $rewriteData array.

SQL queries in Mage_Catalog_Helper_Category_Url_Rewrite and Mage_Catalog_Helper_Product_Url_Rewrite have been modified to avoid use of 'like' operator.
'eq' operator is used instead.

catalog_setup has been modified to :

  • create 'url_type' new field in core_url_rewrite table and associated index
  • update existing core_url_rewrite content to match new behavior

Catalog version now defined as 1.6.0.0.19.1.7

Contribution checklist (*)

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All automated tests passed successfully (all builds are green)

@tmotyl tmotyl added performance Performance related url rewrites labels Sep 29, 2020
@tmotyl
Copy link
Contributor

tmotyl commented Sep 29, 2020

@greglamy do you have any stats showing the performance improvement?
Does your patch also handles the updating the core_rewrite table when url is changed? AT first glips I didn't find a code for it.
FYI, there are two issues which might be interesting for you regarding performance of the url_rewrite (not directly related to this PR, just FYI)
#931 #932

@github-actions github-actions bot added the Component: Catalog Relates to Mage_Catalog label Sep 29, 2020
@joshua-bn
Copy link
Contributor

Just curious, does it still work without anything after category_id IS NOT NULL? Seems like if you don't even need the additional column/condition.

@colinmollenhour
Copy link
Member

Thanks for the PR!

Please provide some queries with timing comparisons. The id_path column does have an index on it and LIKE can use this index when the clause is a prefix search ('category/%') so I wouldn't be surprised if the results were similar.

Also note that you could use an ENUM('category','product','other') instead of VARCHAR for a more efficient storage and index.

@joshua-bn
Copy link
Contributor

@greglamy you still interested in this PR? Looks like it would be useful.

@greglamy
Copy link
Author

greglamy commented Feb 4, 2022

@greglamy you still interested in this PR? Looks like it would be useful.

This modification is included in all my projects.
But I'm not sure to have enough time to continue working on performance testing.
I would need for this a huge catalog of 120K+ test products within 1K+ categories which was the real life context that made me develop this patch to win precious minutes.
Can try giving some more time to this PR.

/** @var Mage_Catalog_Model_Resource_Setup $installer */
$installer = $this;
$connection = $installer->getConnection();
$connection->addColumn($installer->getTable('core_url_rewrite'), 'url_type', 'varchar(50) NULL AFTER `product_id`');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's missing an index here. Probably several indexes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or better like colinmollenhour say's: change varchar to enum?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, it should still have an index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you leave off the AFTER product_id part MySQL 8.0 will be able to add the column without copying the entire table which can reduce downtime while upgrading.

empiricompany
empiricompany previously approved these changes Aug 10, 2022
@elidrissidev
Copy link
Member

This looks to be a good change. Would love if someone who tried this with a large catalog can share their query times.

@colinmollenhour
Copy link
Member

Here is a script that you might find handy for this type of need to capture queries from a request. With it you can enable MySQL "general log" in seconds, watch the queries happening in real time, press Ctrl+C to stop monitoring and capture the output to a temporary file for later inspection.
https://gist.github.com/colinmollenhour/5646619

@fballiano fballiano changed the base branch from 1.9.4.x to main May 23, 2023 19:13
@fballiano
Copy link
Contributor

@joshua-bn @luigifab @colinmollenhour could you check your comments and decide what to do with this PR?

@colinmollenhour
Copy link
Member

colinmollenhour commented May 24, 2023

I vote close if there is no activity for now as it is not proven if it has a benefit and there are some changes requested. A query on indexed varchar using a prefix match should be just about as fast or as fast as an indexed exact string comparison.

It is a good observation, though. A proper compound index might give a better result. For example, this compound index:

UNIQUE KEY `UNQ_CORE_URL_REWRITE_ID_PATH_IS_SYSTEM_STORE_ID` (`id_path`,`is_system`,`store_id`),

was probably meant to address these queries but doesn't work - it should be changed to:

UNIQUE KEY `UNQ_CORE_URL_REWRITE_ID_PATH_IS_SYSTEM_STORE_ID` (`is_system`,`store_id`,`id_path`),

So that it can be used with is_system=1 AND store_id=? AND id_path LIKE .... Currently it cannot since id_path is not the right-most column of the index.

@fballiano
Copy link
Contributor

it seems a very good idea so it deserves a "don't forget this pr"

@fballiano fballiano closed this May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants