feat: implement $replace modifier, improve option parsing #3897

Merged
merged 88 commits into from
Aug 26, 2024

Conversation

seia-soto
Member

@seia-soto seia-soto commented Apr 9, 2024

Fixes #3886; built on top of #3887.

https://adguard.com/kb/general/ad-filtering/create-own-filters/#replace-modifier

Following the AdGuard spec:

  • $replace rules apply to any text response, but will not apply to binary (media, image, object, etc.). (feat: perform html filtering on headers received seia-soto/adblocker#14)
  • $replace rules do not apply if the size of the original response is more than 10 MB (feat: perform html filtering on headers received seia-soto/adblocker#14)
  • $replace rules have a higher priority than other basic rules (including exception rules). So if a request corresponds to two different rules one of which has the $replace modifier, this rule will be applied.
  • Document-level exception rules with $content or $document modifiers do disable $replace rules for requests matching them.
  • Other document-level exception rules ($generichide, $elemhide or $jsinject modifiers) are applied alongside $replace rules. It means that you can modify the page content with a $replace rule and disable cosmetic rules there at the same time.
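The first two bullets above amount to a gate that runs before any replacement is attempted. A minimal sketch of such a gate follows; `canApplyReplace`, `isTextContentType`, the list of MIME types treated as text, and the reading of "10 MB" as 10 × 1024 × 1024 bytes are all assumptions for illustration, not the library's API.

```typescript
// Hypothetical limit: the spec says "10 MB"; interpreting it as binary megabytes is an assumption.
const MAX_REPLACE_BODY_BYTES = 10 * 1024 * 1024;

// Hypothetical helper: decides whether a Content-Type header value denotes a text response.
// The exact set of text-like MIME types is an assumption.
function isTextContentType(contentType: string): boolean {
  const mime = (contentType.split(';')[0] ?? '').trim().toLowerCase();
  return (
    mime.startsWith('text/') ||
    mime === 'application/json' ||
    mime === 'application/javascript' ||
    mime.endsWith('+json') ||
    mime.endsWith('+xml')
  );
}

// Hypothetical gate: true only for text responses no larger than the size limit.
function canApplyReplace(contentType: string | undefined, bodySizeBytes: number): boolean {
  if (contentType === undefined) {
    return false; // Without a content type we cannot tell text from binary.
  }
  return isTextContentType(contentType) && bodySizeBytes <= MAX_REPLACE_BODY_BYTES;
}
```

For example, `canApplyReplace('text/html; charset=utf-8', 1024)` passes the gate, while an `image/png` response or an 11 MiB JSON body does not.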

@seia-soto seia-soto changed the title fix: properly find the filter options index feat: implement $replace modifier Apr 9, 2024
@seia-soto
Member Author

Note: only option was set in the test.

@seia-soto
Member Author

seia-soto commented Apr 9, 2024

@chrmod chrmod added the PR: New Feature 🚀 Increment minor version when merged label Apr 15, 2024
@remusao
Collaborator

remusao commented Apr 22, 2024

@seia-soto Could you share more information on the $replace option? In particular:

  1. How many filters rely on this option currently?
  2. Can you list some (or all) of them as examples explaining the behavior?

Thanks, I think it will help review the implementation.

@seia-soto
Member Author

@seia-soto Could you share more information on the $replace option? In particular:

  1. How many filters rely on this option currently?
  2. Can you list some (or all) of them as examples explaining the behavior?

Thanks, I think it will help review the implementation.

@remusao From the adblocker library's perspective, $replace is much less important than implementing full HTML filtering capability. I'd like to build an integrated system for HTML filtering because most of them are handled with regexps. However, from the Ghostery product's perspective, this is high priority since these filters play an important role in YouTube blocking.

@chrmod
Member

chrmod commented Apr 29, 2024

All uBO replace filters that are currently in use:

filters/filters-2024.txt
30:||alliptvlinks.com/tktk-content/plugins/$script,1p,replace=/\bconst now.+?, 100/clearInterval(timer);resolve();}, 100/gms
579:/theme/002/js/application.js?2.0|$script,1p,replace=/video\.maxPop/0/

filters/unbreak.txt
4802:||s3media.247sports.com/Scripts/Bundle/*/videoPlayer.js^$script,1p,replace=/;if\(!\([a-z]+\|\|\(null===[^{]+/;if(false)/

filters/filters-2023.txt
2931:||dehlinks.ir/link_download.php?Mozojadid_Id=$doc,replace=/content="15;/content="0;/
3017:||rekidai-info.github.io/_app/immutable/components/pages/index/_page.svelte-$script,replace=/try\{.*?catch.*?push\(\)\}catch\{//
3018:||rekidai-info.github.io/_app/immutable/components/pages/index/_page.svelte-$script,replace=/throw new Error\("Error Loading Rekidai Data."\)\}throw new Error\("Ad block detected."\)//
5289:||veev.to/assets/videoplayer/*.js$script,replace=/\bhttps:\/\/pagead2\.googlesyndication\.com\/pagead\/js\/adsbygoogle\.js/https:\/\/veev.to\/assets\/videoplayer\/17c088d.js/

filters/filters-2022.txt
3502:||theappstore.org/script.js?v=$script,1p,replace=/result\.length \> 10000/result.length < 10000/g
3606:/loader.min.js$xhr,script,domain=loawa.com|ygosu.com|sportalkorea.com|enetnews.co.kr|edaily.co.kr|economist.co.kr|etoday.co.kr|hankyung.com|isplus.com|hometownstation.com|inven.co.kr|honkailab.com|warcraftrumbledeck.com|genshinlab.com|thestockmarketwatch.com|thephoblographer.com|issuya.com|dogdrip.net|worldhistory.org|bamgosu.site,replace=/\)\{var [a-z]{1,2},[a-z]{1,2},[a-z]{1,2},[a-z]{1,2}\=[a-z]{2};return [a-z]\(\)/){return;/g
3607:/loader.min.js$xhr,script,domain=loawa.com|ygosu.com|sportalkorea.com|enetnews.co.kr|edaily.co.kr|economist.co.kr|etoday.co.kr|hankyung.com|isplus.com|hometownstation.com|inven.co.kr|honkailab.com|warcraftrumbledeck.com|genshinlab.com|thestockmarketwatch.com|thephoblographer.com|issuya.com|dogdrip.net|worldhistory.org|bamgosu.site,replace=/\)\{var [a-z]{1,2},[a-z]{1,2},[a-z]{1,2};.*?return [a-z]\(\)/){return; return c()/g
3608:/loader.min.js$xhr,script,domain=loawa.com|ygosu.com|sportalkorea.com|enetnews.co.kr|edaily.co.kr|economist.co.kr|etoday.co.kr|hankyung.com|isplus.com|hometownstation.com|inven.co.kr|honkailab.com|warcraftrumbledeck.com|genshinlab.com|thestockmarketwatch.com|thephoblographer.com|issuya.com|dogdrip.net|worldhistory.org,replace=/\.mark\(\(function [a-z0-9]{1,2}\([a-z0-9]{1,2},[a-z0-9]{1,2}\){var.*\]\]\)\}\)\)\),/.mark((function neutralized(a,b){var none = false;}))),/g
4298:||bitcotasks.com/assets/js/mainjs.php$script,1p,replace=/entry.duration > 0/entry.duration < 10/

filters/quick-fixes.txt
129:||d3lj2s469wtjp0.cloudfront.net/build/js/public/$script,3p,replace=/\{try\{.*?clip-path.*?catch\(/{try{}catch(/,domain=puzzle-loop.com|puzzle-words.com|puzzle-chess.com|puzzle-thermometers.com|puzzle-norinori.com|puzzle-minesweeper.com|puzzle-slant.com|puzzle-lits.com|puzzle-galaxies.com|puzzle-tents.com|puzzle-battleships.com|puzzle-pipes.com|puzzle-hitori.com|puzzle-heyawake.com|puzzle-shingoki.com|puzzle-masyu.com|puzzle-stitches.com|puzzle-aquarium.com|puzzle-tapa.com|puzzle-star-battle.com|puzzle-kakurasu.com|puzzle-skyscrapers.com|puzzle-futoshiki.com|puzzle-shakashaka.com|puzzle-kakuro.com|puzzle-jigsaw-sudoku.com|puzzle-killer-sudoku.com|puzzle-binairo.com|puzzle-nonograms.com|puzzle-sudoku.com|puzzle-light-up.com|puzzle-bridges.com|puzzle-shikaku.com|puzzle-nurikabe.com|puzzle-dominosa.com
139:||statics.1mv.xyz/statics/*.js|$script,3p,replace=/;return _0x[a-z0-9]+\['[_a-z]+'\]\['s'\]/;return false/
140:||statics.1mv.xyz/statics/*.js|$script,3p,replace=/;if\(null!==\(_0x[a-z0-9]+=this\['[_a-z]+'\]\)[^)]+\)return;/;if(true)return;/
153:||in-jpn.com^$script,replace=/var w_status[\s\S\n]+?doSakigake\(\);[\s\S\n]+?\}//,badfilter
154:||in-jpn.com^$script,replace=/var w_\w+[\s\S\n]+?doSakigake\(\);[\s\S\n]+?\}//

filters/annoyances-others.txt
396:||www.facebook.com/api/graphql/$xhr,replace=/\{"brs_content_label":[^,]+,"category":"ENGAGEMENT[^\n]+"cursor":"[^"]+"\}/{}/g
7177:||solarmovie.vip/js/$script,1p,replace=/\(\{checkers\:.*?\]\}\)/({checkers:[]})/g
7484:||tver.jp/_next/static/chunks/$replace=/e\?(e\(\):\(n\.play\(\))/!1?\$1/,script

filters/filters.txt
25:||www.youtube.com/playlist?list=$xhr,1p,replace=/"adPlacements.*?([A-Z]"\}|"\}{2\,4})\}\]\,//
26:||www.youtube.com/playlist?list=$xhr,1p,replace=/"adSlots.*?\}\]\}\}\]\,//
27:||www.youtube.com/watch?v=$xhr,1p,replace=/"adPlacements.*?([A-Z]"\}|"\}{2\,4})\}\]\,//
28:||www.youtube.com/watch?v=$xhr,1p,replace=/"adSlots.*?\}\]\}\}\]\,//
29:||www.youtube.com/youtubei/v1/player?$xhr,1p,replace=/"adPlacements.*?([A-Z]"\}|"\}{2\,4})\}\]\,//
30:||www.youtube.com/youtubei/v1/player?$xhr,1p,replace=/"adSlots.*?\}\]\}\}\]\,//
489:||www.facebook.com/api/graphql/$xhr,replace=/\{"brs_content_label":[^,]+,"category":"SPONSORED"[^\n]+"cursor":"[^"]+"\}/{}/
490:||www.facebook.com/api/graphql/$xhr,replace=/\{"node":\{"role":"SEARCH_ADS"[^\n]+?cursor":[^}]+\}/{}/g
491:||www.facebook.com/api/graphql/$xhr,replace=/\{"node":\{"__typename":"MarketplaceFeedAdStory"[^\n]+?"cursor":(?:null|"\{[^\n]+?\}"|[^\n]+?MarketplaceSearchFeedStoriesEdge")\}/{}/g
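Every filter above carries its option value in the form `/pattern/replacement/flags`. A minimal sketch of how such a value could be split and applied is shown below. `parseReplaceValue` is a hypothetical helper, not the parser added in this PR, and it makes simplifying assumptions: it splits on unescaped slashes only (so an unescaped `/` inside a character class would confuse it) and it unescapes `\/`, `\,` and `\$` in the replacement part.

```typescript
interface ParsedReplace {
  pattern: RegExp;
  replacement: string;
}

// Hypothetical helper: parses "/pattern/replacement/flags" into a RegExp and a replacement string.
// Returns null when the value does not have that shape or the pattern is invalid.
function parseReplaceValue(value: string): ParsedReplace | null {
  if (!value.startsWith('/')) {
    return null;
  }
  const parts: string[] = [];
  let current = '';
  for (let i = 1; i < value.length; i += 1) {
    const ch = value[i];
    if (ch === '\\' && i + 1 < value.length) {
      // Keep escape sequences intact so the RegExp constructor still sees them.
      current += ch + value[i + 1];
      i += 1;
    } else if (ch === '/' && parts.length < 2) {
      // An unescaped slash ends the pattern, then the replacement.
      parts.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (parts.length !== 2) {
    return null;
  }
  const source = parts[0] ?? '';
  const rawReplacement = parts[1] ?? '';
  const flags = current; // Whatever follows the last delimiter is treated as flags.
  try {
    return {
      pattern: new RegExp(source, flags),
      // Assumption: only these escapes need to be undone in the replacement.
      replacement: rawReplacement.replace(/\\([\/,$])/g, '$1'),
    };
  } catch {
    return null; // Invalid pattern or flags.
  }
}

// Usage with one of the listed filters (replace=/video\.maxPop/0/).
const parsed = parseReplaceValue('/video\\.maxPop/0/');
const patched = parsed === null ? null : 'if (video.maxPop) {}'.replace(parsed.pattern, parsed.replacement);
```

With the filter from `filters-2024.txt`, `patched` becomes `if (0) {}`: the matched `video.maxPop` is swapped for the literal `0`.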

@seia-soto
Member Author

Just a note: better filter selection should be done from performHTMLFiltering

@chrmod
Member

chrmod commented Apr 29, 2024

Current matching logic in Ghostery 10 for Firefox:
https://github.com/ghostery/ghostery-extension/blob/d2542406174fb59ff939095b6d6d925bea79a3b9/extension-manifest-v3/src/background/adblocker.js#L356

will have to be changed from:

  1. for main frames - apply html cosmetic filters
  2. for all other - match network filter

to:

  1. match all network filters and html cosmetic filters
  2. for main frames - apply html cosmetic filters and html network filters
  3. for all other - block if block network filter matched, filter html if any html filter matched
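The proposed flow can be sketched as a small decision function. The types and names below (`MatchResult`, `Decision`, `decide`) are hypothetical and only illustrate the three steps; they are not the extension's actual code.

```typescript
// Hypothetical result of step 1: everything is matched up front.
interface MatchResult {
  blockingFilterMatched: boolean; // a blocking network filter matched the request
  htmlCosmeticFilters: string[]; // matched html cosmetic filters
  htmlNetworkFilters: string[]; // matched html network filters, e.g. $replace
}

interface Decision {
  block: boolean;
  filterHtml: boolean;
}

function decide(isMainFrame: boolean, match: MatchResult): Decision {
  const hasHtmlFilters =
    match.htmlCosmeticFilters.length > 0 || match.htmlNetworkFilters.length > 0;

  if (isMainFrame) {
    // Step 2: main frames are never blocked here, only filtered.
    return { block: false, filterHtml: hasHtmlFilters };
  }
  if (match.blockingFilterMatched) {
    // Step 3a: other requests are blocked when a blocking filter matched.
    return { block: true, filterHtml: false };
  }
  // Step 3b: otherwise filter the body if any html filter matched.
  return { block: false, filterHtml: hasHtmlFilters };
}
```

In this sketch blocking takes precedence over filtering for non-main-frame requests, which is one reading of step 3. The spec quoted in the PR description gives $replace a higher priority than other basic rules, so a real implementation may need to order these differently.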

@seia-soto
Member Author

Note: This PR requires additional updates coming from: seia-soto#3

@seia-soto seia-soto force-pushed the support-replace-mod branch 6 times, most recently from 2d34e34 to 18bd35b Compare June 6, 2024 13:44
@seia-soto seia-soto changed the title feat: implement $replace modifier feat: implement $replace modifier, improve option parsing Jun 13, 2024
@seia-soto
Member Author

seia-soto commented Jun 13, 2024

This also requires a performance improvement.

The last result is not valid since the logic had a parsing issue. Below are the corrected values.

Ratio offset from initialisation time: 1.1422831189.
Corrected value for the list parsing time of ghostery:master: 111,494.7324346903.

(seia-soto:support-replace-mod)

Avg serialization time (100 samples): 193.007 μs
Avg deserialization time (100 samples): 35.182 μs
Serialized size: 1,866 KiB
List parsing time: 140,669.292 μs (~26% incr)
Initialization time: 23,179.875 μs

(ghostery:master)
Avg serialization time (100 samples): 270.544 μs
Avg deserialization time (100 samples): 22.112 μs
Serialized size: 1,866 KiB
List parsing time: 97,606.916 μs
Initialization time: 20,292.583 μs

*sha:dc9d1b6a0711c22464c6e3006338165cc26d4ba2
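The correction above appears to scale the master parsing time by the ratio of the two initialization times before comparing. That reading is an assumption, but the numbers in the comment are consistent with it, as this small check shows (variable names are illustrative only):

```typescript
// Values copied from the benchmark output above, in microseconds.
const branchInitUs = 23179.875;
const masterInitUs = 20292.583;
const branchParsingUs = 140669.292;
const masterParsingUs = 97606.916;

// "Ratio offset from initialisation time"
const initRatio = branchInitUs / masterInitUs;

// "Corrected value for the list parsing time of ghostery:master"
const correctedMasterParsingUs = masterParsingUs * initRatio;

// Relative increase of the branch over the corrected master value.
const increase = branchParsingUs / correctedMasterParsingUs - 1;

console.log(initRatio.toFixed(6), correctedMasterParsingUs.toFixed(1), `${(increase * 100).toFixed(1)}%`);
```

Under this assumption the uncorrected comparison (140,669.292 vs 97,606.916) would show an increase of roughly 44%, which is why the corrected master value matters for the reported ~26%.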

Comment on lines +99 to +108
type CosmeticFilterMatchingContext =
  | {
      url: string;
      callerContext: any; // Additional context given from user
      filterType: FilterType.COSMETIC;
    }
  | {
      request: Request; // For HTML Filters
      filterType: FilterType.COSMETIC;
    };
Collaborator


Should we have an attribute which can be used to discriminate (as in "tagged union") which of the two options this is when manipulating a value? Or do we want to implicitly discriminate based on the presence of url, etc.?

Suggested change
- type CosmeticFilterMatchingContext =
-   | {
-       url: string;
-       callerContext: any; // Additional context given from user
-       filterType: FilterType.COSMETIC;
-     }
-   | {
-       request: Request; // For HTML Filters
-       filterType: FilterType.COSMETIC;
-     };
+ type CosmeticFilterMatchingContext =
+   | {
+       type: 'elementHiding',
+       url: string;
+       callerContext: any; // Additional context given from user
+       filterType: FilterType.COSMETIC;
+     }
+   | {
+       type: 'htmlFiltering',
+       request: Request; // For HTML Filters
+       filterType: FilterType.COSMETIC;
+     };
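To make the trade-off in this thread concrete, here is a small sketch of both ways to narrow the union. `RequestLike` and the string literal used for `filterType` are stand-ins (assumptions) so the example is self-contained; the real code uses the library's `Request` class and `FilterType.COSMETIC`.

```typescript
// Stand-in for the library's Request class (assumption for this sketch).
interface RequestLike {
  url: string;
}

// Variant with an explicit discriminant, as in the suggested change.
type TaggedContext =
  | { type: 'elementHiding'; url: string; callerContext: unknown; filterType: 'COSMETIC' }
  | { type: 'htmlFiltering'; request: RequestLike; filterType: 'COSMETIC' };

// Variant without a tag, as in the original code.
type UntaggedContext =
  | { url: string; callerContext: unknown; filterType: 'COSMETIC' }
  | { request: RequestLike; filterType: 'COSMETIC' };

// Explicit discrimination: the compiler narrows on the `type` field.
function urlFromTagged(context: TaggedContext): string {
  switch (context.type) {
    case 'elementHiding':
      return context.url;
    case 'htmlFiltering':
      return context.request.url;
  }
}

// Implicit discrimination: narrowing relies on the presence of `url`.
function urlFromUntagged(context: UntaggedContext): string {
  return 'url' in context ? context.url : context.request.url;
}
```

Both forms type-check. The tagged form keeps working if a later variant also gains a `url` field, while the implicit form needs no extra property on every value.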

Member


honestly I don't know yet - I'm fine to keep it as is

chrmod and others added 8 commits August 23, 2024 11:51
The latest Firefox automatically lowercases all header names.

```
import Fastify from 'fastify'
const fastify = Fastify({
  logger: true
})

// Declare a route
fastify.get('/', async function handler(request, reply) {
  reply.header('UPPERCASED', '1')
  return { hello: 'world' }
})

// Run the server!
try {
  await fastify.listen({ port: 3000 })
} catch (err) {
  fastify.log.error(err)
  process.exit(1)
}
```
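Since header names may arrive lowercased, any code that looks up a header such as `content-type` is safer comparing names case-insensitively. A minimal sketch, assuming headers arrive as an array of name/value pairs like the WebExtensions `webRequest` API provides; `findHeader` is a hypothetical helper, not a function from this PR.

```typescript
// Shape modelled on webRequest.HttpHeaders entries (binary values omitted for brevity).
interface HttpHeader {
  name: string;
  value?: string;
}

// Hypothetical helper: returns the value of the first header whose name matches, ignoring case.
function findHeader(headers: HttpHeader[], name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const header of headers) {
    if (header.name.toLowerCase() === wanted) {
      return header.value;
    }
  }
  return undefined;
}

// The server above sends 'UPPERCASED'; the browser may report it as 'uppercased'.
const reported: HttpHeader[] = [{ name: 'uppercased', value: '1' }];
const found = findHeader(reported, 'UPPERCASED');
```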
feat: perform html filtering on headers received
@chrmod chrmod merged commit 8b4ae41 into ghostery:master Aug 26, 2024
4 checks passed
Development

Successfully merging this pull request may close these issues.

Support $replace
4 participants