Allow using default browser context #471

Open: ibrahima wants to merge 5 commits into main

@ibrahima ibrahima commented Jul 4, 2024

This PR adds an option for Ferrum to use the default browser context instead of a created BrowserContext that doesn't have access to the persisted browser state.

I know that historically for testing purposes it's been recommended here to use clean browser contexts for reproducibility, but I think this is a reasonable option for some use cases (that may not be related to testing). Other similar libraries provide such options. For example, Playwright lets you do launchPersistentContext to launch a browser with a persistent context. Puppeteer seems to default to using the default browser context and only creates new BrowserContexts if you manually create them. I don't have experience with these libraries, but I did a lot of digging through their source to see how they handled Target creation and BrowserContext creation.

Fixes #47

Re: this comment from that thread:

I think the default context that Chrome creates has a lot of limitations like inability to create pages inside it and so on.

I seem to be able to create pages/targets within the default context fine. However, I haven't explored much so there could very well be bugs associated with this change. All I know is that it works for my use case and it would be very useful to have this in the upstream gem. I'm happy to discuss whether this change makes sense or not and I don't mind if the answer is that it doesn't. But at least in my case, the save/load cookies feature from 99cfa84 doesn't solve my use case because the persisted browser state that I'm trying to restore is coming from a browser hosting service (https://www.browserbase.com/), so the only way I can access the state is through the browser's default context.

Thank you for your consideration!

This has access to the browser's persisted state (e.g. cookies,
sessions, history, etc).
This would grant access to persistent state such as cookies that are
associated with the default browser profile.
This was probably leftover from my local monkeypatch.
@contexts[default_context_id] = ::Ferrum::Context.new(@client, self, nil)
end

# Compute the default context ID by looking for contexts not returned by Target.getBrowserContexts
Author

This is kinda hacky but I didn't find a better way to find this looking through https://chromedevtools.github.io/devtools-protocol. Target.getBrowserContexts does not return the default context, but when you create targets without specifying the browserContextId it returns this context ID. This is why we need to create an entry in @contexts with that ID to receive the targetCreated and put the Target in the right Context.

Another way to get the ID that might be more reliable is to create a target in the default context and then get the ID from there, but that would require creating a throwaway target or something. I guess it'd be fine to create a Target, get its browserContextId, and then destroy it.
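
Both ideas above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeClient` and its canned data are hypothetical stand-ins for a real CDP connection, and the only assumption about the client is a `command(method, **params)` call that returns the protocol result as a hash. The CDP method names (`Target.getTargets`, `Target.getBrowserContexts`, `Target.createTarget`, `Target.getTargetInfo`, `Target.closeTarget`) come from the DevTools protocol.

```ruby
# Hypothetical stand-in for a CDP client, used only for illustration.
class FakeClient
  def initialize
    @targets = [
      { "targetId" => "T1", "type" => "page", "browserContextId" => "DEFAULT" },
      { "targetId" => "T2", "type" => "page", "browserContextId" => "CTX-A" }
    ]
    @created_contexts = ["CTX-A"]
  end

  def command(method, **params)
    case method
    when "Target.getTargets"
      { "targetInfos" => @targets.map(&:dup) }
    when "Target.getBrowserContexts"
      # The default context is never listed here.
      { "browserContextIds" => @created_contexts.dup }
    when "Target.createTarget"
      id = "T#{@targets.size + 1}"
      # No browserContextId given: the target lands in the default context.
      context_id = params.fetch(:browserContextId, "DEFAULT")
      @targets << { "targetId" => id, "type" => "page", "browserContextId" => context_id }
      { "targetId" => id }
    when "Target.getTargetInfo"
      { "targetInfo" => @targets.find { |t| t["targetId"] == params[:targetId] } }
    when "Target.closeTarget"
      @targets.reject! { |t| t["targetId"] == params[:targetId] }
      { "success" => true }
    else
      raise ArgumentError, "unsupported method: #{method}"
    end
  end
end

# Approach 1: context IDs seen on targets, minus those from getBrowserContexts.
def default_context_id_by_difference(client)
  seen = client.command("Target.getTargets")["targetInfos"]
               .map { |info| info["browserContextId"] }.compact.uniq
  created = client.command("Target.getBrowserContexts")["browserContextIds"]
  (seen - created).first # nil when the default context has no targets
end

# Approach 2: create a throwaway target, read its context ID, then close it.
def default_context_id_by_probe(client)
  target_id = client.command("Target.createTarget", url: "about:blank")["targetId"]
  info = client.command("Target.getTargetInfo", targetId: target_id)["targetInfo"]
  info["browserContextId"]
ensure
  client.command("Target.closeTarget", targetId: target_id) if target_id
end
```

The difference-based version needs at least one existing target in the default context, while the probe version does not, at the cost of briefly opening and closing a tab.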

Author

@ibrahima ibrahima Jul 9, 2024

Hm this method was also suggested (independently) by one of the Chromium engineers, so it might be fine. I am slightly nervous about the case when no tabs exist in the default context which presumably could happen, but I think generally people using browser automation should be able to ensure that the correct prerequisites are met.

One other approach might be to have the Contexts class (optionally) enumerate all existing contexts at startup and create entries for them, and then any targets that aren't part of the discovered contexts are probably from the default context. This could get out of sync potentially, but I'm not sure if there is really any problem with a Target (in Ruby) being assigned to the incorrect Context object from the Contexts map, since the Target seems to have sufficient information to operate on its own and doesn't really need info from the Context. I think the main purpose of that association is that when Context#dispose is called, it closes the connections to the Targets in that Context, so that's probably not going to be an issue in practice because if Ferrum is not aware of that Context you're never going to call #dispose on it.

Edit: The above might be slightly confusing, but this is what I mean: ibrahima@b1b7732
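
A rough sketch of that fallback idea, for readers who do not want to follow the commit link. It is not the code from the linked commit: `ContextRegistry`, its `Context` struct, and the `fallback` name are hypothetical, and a plain `Hash` stands in for the `Concurrent::Map` that the real `Contexts` class uses.

```ruby
# Hypothetical, simplified registry; not Ferrum's actual Contexts class.
class ContextRegistry
  Context = Struct.new(:id, :targets)

  attr_reader :fallback

  # known_context_ids: IDs enumerated at startup (e.g. via Target.getBrowserContexts).
  def initialize(known_context_ids)
    @contexts = known_context_ids.to_h { |id| [id, Context.new(id, [])] }
    # Catch-all for targets whose context was not discovered; these are
    # presumed to belong to the browser's default context.
    @fallback = Context.new(nil, [])
  end

  def [](id)
    @contexts[id]
  end

  # Files the target under its own context, or under the fallback when the
  # context ID is unknown. Returns the context that received the target.
  def add_target(info)
    context = @contexts.fetch(info["browserContextId"], @fallback)
    context.targets << info["targetId"]
    context
  end
end

registry = ContextRegistry.new(["CTX-A"])
registry.add_target({ "targetId" => "T1", "browserContextId" => "CTX-A" })
registry.add_target({ "targetId" => "T2", "browserContextId" => "SOMETHING-ELSE" })
```

With this shape, a target from an unknown context is still tracked, which matches the point above that a wrong association mostly matters when `dispose` is called on the context.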

@@ -125,6 +125,9 @@ class Browser
# @option options [Hash] :env
# Environment variables you'd like to pass through to the process.
#
# @option options [Boolean] :use_default_context
Author

I don't know what the best name for this is... it's kinda confusing since Contexts#default_context is already a thing and the current behavior of that method is to create a new BrowserContext. But I felt like "default" should be the right thing because the effective change is to not pass a browserContextId when creating a target.

Perhaps a less confusing name would be use_persistent_context? Similar to how Playwright has launchPersistentContext.

@@ -11,6 +11,7 @@ class Contexts
def initialize(client)
@contexts = Concurrent::Map.new
@client = client
@default_context = create_default_context if @client.options.use_default_context
Author

@ibrahima ibrahima Jul 4, 2024

One side effect of setting this in the constructor is that any existing tabs/targets in the browser's default context will be available as targets (e.g. browser.default_context.targets will return a nonzero number of targets). This felt right to me, because then it handles use cases like #320 as well. Similar to the fork mentioned in #320 (comment) but in my opinion cleaner because it just gives you access to all the existing tabs to do with what you will, rather than somewhat arbitrarily picking the default (latest?) tab to be the default target.

It also seems to me that doing this actually makes the discover method call in the constructor do something, which I was wondering about. Without setting up any initial contexts, calling discover will start enumerating targets but because no Contexts exist yet, it won't do anything with those targets.
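
To make the point about `discover` concrete, here is a simplified model of how discovered targets get routed by `browserContextId`. It is a sketch under assumptions, not Ferrum's implementation: `assign_targets` and the sample data are hypothetical, and contexts are modeled as a plain hash from context ID to a list of target IDs.

```ruby
# Simplified, hypothetical model of target discovery.
def assign_targets(contexts, target_infos)
  target_infos.each do |info|
    next unless info["type"] == "page"

    # Safe navigation: targets whose context is not registered are skipped.
    contexts[info["browserContextId"]]&.push(info["targetId"])
  end
  contexts
end

# Tabs that already exist in the browser's default context at startup.
existing_tabs = [
  { "targetId" => "T1", "type" => "page", "browserContextId" => "DEFAULT" },
  { "targetId" => "T2", "type" => "page", "browserContextId" => "DEFAULT" }
]

# No contexts registered yet: discovery enumerates targets but keeps none.
without_default = assign_targets({}, existing_tabs)

# Default context registered first: the existing tabs show up as its targets.
with_default = assign_targets({ "DEFAULT" => [] }, existing_tabs)
```

In this model, registering the default context before discovery is what makes the existing tabs visible, which is the side effect described above.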

Member

route commented Jul 10, 2024

@ibrahima I'm on vacation currently and will take a look on Monday or sooner if I have time

yann120 commented Jul 22, 2024

Hey @ibrahima, thanks for this PR.
This is exactly the feature I am missing in order to move to Ferrum!
I need to stay logged in on a website, and I don't want to log in again every 5 minutes.

@route have you found time to review this PR? 🙈
Thanks a lot!

matti commented Aug 26, 2024

@route how about reviewing this?

Contributor

sebyx07 commented Aug 28, 2024

@yann120 also reusing the chrome data dir will help you

ryanstout commented Aug 29, 2024

Hey @ibrahima. Thanks for working on this! I was trying to add the same functionality.

I'm seeing an error when I run the following:

require "ferrum"

browser = Ferrum::Browser.new(
  headless: false,
  use_default_context: true,
)

browser.go_to("https://google.com")

I get:

/Users/ryanstout/Sites/arsenal/instagrab_2024/ferrum/lib/ferrum/context.rb:52:in `create_target': Ferrum::NoSuchTargetError (Ferrum::NoSuchTargetError)
        from /Users/ryanstout/Sites/arsenal/instagrab_2024/ferrum/lib/ferrum/context.rb:20:in `default_target'
        from /Users/ryanstout/Sites/arsenal/instagrab_2024/ferrum/lib/ferrum/context.rb:24:in `page'
        from /Users/ryanstout/.asdf/installs/ruby/3.2.2/lib/ruby/3.2.0/forwardable.rb:240:in `page'
        from /Users/ryanstout/.asdf/installs/ruby/3.2.2/lib/ruby/3.2.0/forwardable.rb:234:in `go_to'
        from test.rb:15:in `<main>'

I'm digging in on it, but I wasn't sure if you had seen this. Thanks!

yann120 commented Sep 19, 2024

@yann120 also reusing the chrome data dir will help you

Thanks, I ended up saving the cookies and reloading them, and it works. Maybe the easiest solution :)

Successfully merging this pull request may close these issues.

Allow Ferrum to use default (non incognito) context
6 participants