Replace name and entity regular expressions with specific functions for ~15% performance improvement #216

Open: wants to merge 1 commit into main

@lovell (Contributor) commented Jun 28, 2017

Hello,

Regular expressions are generally fast and become more efficient as strings get longer.

However, this module tests input character by character, so extracting the character code and using equality and range checks makes element and entity name detection considerably faster.
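As a rough illustration of the idea described above, here is a minimal sketch that compares a single-character regular expression test with the equivalent character code checks. The function names and the ASCII-only character set are assumptions made for this example; the real name rules in sax also cover Unicode ranges, and this is not the code from the commit.

```javascript
// Hypothetical, ASCII-only illustration; not the code from this pull request.
// Both functions answer: "may this single character start an XML name?"

// Regex approach: one regular expression test per character.
const NAME_START_RE = /[:_A-Za-z]/

function isNameStartRegex (c) {
  return NAME_START_RE.test(c)
}

// Char code approach: read the code unit once, then use plain
// equality and range comparisons on the number.
function isNameStartCharCode (c) {
  const code = c.charCodeAt(0)
  return (
    code === 58 || // ':'
    code === 95 || // '_'
    (code >= 65 && code <= 90) || // 'A' to 'Z'
    (code >= 97 && code <= 122) // 'a' to 'z'
  )
}
```

Both functions should agree on every single ASCII character, so in a sketch like this the second can stand in for the first inside a character-by-character loop.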

Running the node-expat benchmark tests shows that this gain is around 15%.

sax v1.2.4:

sax x 174,389 ops/sec ±1.67% (86 runs sampled)
node-xml x 138,412 ops/sec ±1.25% (88 runs sampled)
libxmljs x 240,261 ops/sec ±1.00% (84 runs sampled)
node-expat x 468,442 ops/sec ±0.88% (90 runs sampled)

with this change:

sax x 208,245 ops/sec ±0.77% (88 runs sampled)
node-xml x 138,796 ops/sec ±1.07% (88 runs sampled)
libxmljs x 253,781 ops/sec ±0.98% (85 runs sampled)
node-expat x 469,078 ops/sec ±0.84% (90 runs sampled)

The existing test suite continues to pass after this change.

This is the third and most likely final performance improvement I'm going to be able to make to sax, at least in the short term. When this change is viewed together with #204 and #208, it appears we've been able to improve performance by a factor of at least 3 since v1.2.1.

Once again, thank you for all your time maintaining this heavily depended-upon module.

with specific functions using char code equality or range checks

Increases performance by 10-15%

jacktuck commented Jun 28, 2017

LGTM - more readable too :)

Nice work @lovell!

Curious what sax would yield now on https://github.com/AndreasMadsen/htmlparser-benchmark
