Update extract algorithm to include space.
BernieHuang2008 authored Oct 3, 2023
1 parent b992987 commit c125098
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion devtools/search/extract_text.py
@@ -29,7 +29,8 @@ def fetch_all():
 content = browser.execute_script("return document.body.innerText")

 content = replace_punctuations(content)  # Remove punctuation
-content = content.strip().replace('\n', ' ')  # Remove leading and trailing spaces and newlines
+content = re.sub(r'[\n]+', '\n', content)  # Collapse consecutive newlines into one
+content = content.strip().replace('\n', ' <br> ')  # Trim the ends, then mark each line break with <br>
 content = re.sub(r'\s+', ' ', content)  # Collapse runs of whitespace into a single space

 with open(f"{curr_dir}/../search/text/{filename}.txt", 'w') as f:
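The whitespace handling in this hunk can be tried in isolation. The following is a minimal sketch, assuming the three steps run in the order shown above; `normalize_text` is a hypothetical helper name that does not appear in the repository, and `replace_punctuations` is left out because its definition is not part of this diff.

```python
import re


def normalize_text(content: str) -> str:
    """Hypothetical helper mirroring the whitespace steps in this hunk."""
    # Collapse runs of consecutive newlines into a single newline.
    content = re.sub(r'[\n]+', '\n', content)
    # Trim the ends, then mark each remaining line break with a <br> token.
    content = content.strip().replace('\n', ' <br> ')
    # Collapse any run of whitespace into a single space.
    content = re.sub(r'\s+', ' ', content)
    return content


print(normalize_text("  Title\n\n\nFirst line\nSecond   line \n"))
# prints: Title <br> First line <br> Second line
```

Only directly adjacent newlines are merged by the first step. A line that contains nothing but spaces still sits between two separate newlines, so `"a\n \nb"` becomes `a <br> <br> b` with two markers.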