Skip to content

Commit

Permalink
Fix performance issue caused by using repeated > characters inside …
Browse files Browse the repository at this point in the history
…`CDATA [ PAYLOAD ]` (#172)

A `<` is treated as a string delimiter. 
In certain cases, if `<` is used in succession, read and match are
repeated, which slows down the process. Therefore, the following is used
to read ahead to a specific part of the string in advance.
  • Loading branch information
Watson1978 authored Jul 16, 2024
1 parent c1b64c1 commit 9f1415a
Show file tree
Hide file tree
Showing 2 changed files with 19 additions and 1 deletion.
3 changes: 2 additions & 1 deletion lib/rexml/parsers/baseparser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,7 @@ module Private
INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
INSTRUCTION_TERM = "?>"
COMMENT_TERM = "-->"
CDATA_TERM = "]]>"
TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
Expand Down Expand Up @@ -431,7 +432,7 @@ def pull_event

return [ :comment, md[1] ]
else
md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true)
md = @source.match(/\[CDATA\[(.*?)\]\]>/um, true, term: Private::CDATA_TERM)
return [ :cdata, md[1] ] if md
end
raise REXML::ParseException.new( "Declarations can only occur "+
Expand Down
17 changes: 17 additions & 0 deletions test/parse/test_cdata.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
require "test/unit"
require "core_assertions"

require "rexml/document"

module REXMLTests
class TestParseCData < Test::Unit::TestCase
include Test::Unit::CoreAssertions

def test_gt_linear_performance
seq = [10000, 50000, 100000, 150000, 200000]
assert_linear_performance(seq, rehearsal: 10) do |n|
REXML::Document.new('<description><![CDATA[ ' + ">" * n + ' ]]></description>')
end
end
end
end

0 comments on commit 9f1415a

Please sign in to comment.