Browser management. #172

Open
divol89 opened this issue Oct 2, 2024 · 4 comments
Comments

@divol89

divol89 commented Oct 2, 2024

Agent Zero could reach its full potential if it were feasible to
integrate Google browser management in a visual way, for example
asking it to open YouTube and having it do so. Say you want to
automate interaction with a website: instead of creating a full
bot, you could have Agent Zero run automatically, given only the
prompts. It should be able to read or snapshot the website to
understand what to do there.

Right now, it's not possible.

@alexey2baranov

It would be great!
As I understand it, this requires extra tooling, say a Browser tool which can navigate, analyze, and type input into a webpage.
I have limited experience with such systems. Playwright is, IMO, a good, popular choice for managing a browser in headless mode.

At every loop step, such a tool might create a PDF screenshot and give it as input to a multimodal model to recognize.
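
A minimal sketch of that loop, not an existing Agent Zero tool. It assumes Playwright's sync `Page` methods `goto`, `screenshot`, `click`, and `fill`, and it takes a PNG screenshot rather than a PDF. The `Action` type, the `decide` callback (a stand-in for the multimodal model call), and `run_browser_loop` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class PageLike(Protocol):
    """The subset of Playwright's sync Page API that the loop relies on."""

    def goto(self, url: str) -> object: ...
    def screenshot(self) -> bytes: ...
    def click(self, selector: str) -> None: ...
    def fill(self, selector: str, value: str) -> None: ...


@dataclass
class Action:
    """Hypothetical action format returned by the model wrapper."""

    kind: str            # "click", "fill", or "done"
    selector: str = ""   # CSS selector the action targets
    text: str = ""       # text to type for "fill"


# Hypothetical callback: receives (goal, screenshot bytes), returns the next Action.
# In a real tool this would wrap a call to a multimodal model.
Decider = Callable[[str, bytes], Action]


def run_browser_loop(
    page: PageLike,
    start_url: str,
    goal: str,
    decide: Decider,
    max_steps: int = 10,
) -> List[Action]:
    """Screenshot the page, ask `decide` what to do, apply it, and repeat."""
    page.goto(start_url)
    history: List[Action] = []
    for _ in range(max_steps):
        shot = page.screenshot()      # image bytes of the current viewport
        action = decide(goal, shot)   # the model picks the next step
        history.append(action)
        if action.kind == "done":
            break
        if action.kind == "click":
            page.click(action.selector)
        elif action.kind == "fill":
            page.fill(action.selector, action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind!r}")
    return history


def main() -> None:
    """Run the loop in headless Chromium (requires the playwright package)."""
    from playwright.sync_api import sync_playwright

    def decide(goal: str, png: bytes) -> Action:
        # Placeholder decision: stop immediately instead of calling a model.
        return Action("done")

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        run_browser_loop(page, "https://www.youtube.com", "open YouTube", decide)
        browser.close()
```

Calling `main()` runs one pass against a real headless browser. Keeping the model behind a plain callback means the loop can be exercised with a fake page and a scripted `decide` before any model or browser is wired in.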

@divol89
Author

divol89 commented Oct 3, 2024

> It would be great! As I understand it, this requires extra tooling, say a Browser tool which can navigate, analyze, and type input into a webpage. I have limited experience with such systems. Playwright is, IMO, a good, popular choice for managing a browser in headless mode.
>
> At every loop step, such a tool might create a PDF screenshot and give it as input to a multimodal model to recognize.

Yes, exactly. Imagine configuring your Agent Zero on a trading website platform and running it automatically, without the insane stress.

@TerminallyLazy

I mentioned this in the Discord a couple of weeks or so ago: basically some OSS agent-q-type functionality, or possibly an integration of agent-q that could be called like a tool. It's on my to-do list. I've just been super busy, but I will work on something like this when I get a chance.

@divol89
Author

divol89 commented Oct 5, 2024

> I mentioned this in the Discord a couple of weeks or so ago: basically some OSS agent-q-type functionality, or possibly an integration of agent-q that could be called like a tool. It's on my to-do list. I've just been super busy, but I will work on something like this when I get a chance.

Great, I'll be waiting to collaborate if it's possible.
