AI agents that can control a web browser and perform tasks the way a human would are no longer just a concept. Tools like ChatGPT Operator are powerful but come at a price. Instead of paying up, I went looking for a free alternative, and I found one that works surprisingly well.
Browser Use: The Best Open-Source Alternative I Found
ChatGPT Operator can control a web browser and perform actions such as clicking and scrolling all by itself. You just tell ChatGPT what needs to be done, such as booking tickets or writing text in Google Docs, and it does it. But access comes with a hefty price tag: $200 per month as part of ChatGPT’s Pro tier. I couldn’t afford that subscription, so I went looking for an alternative and found Browser Use.
Browser Use is an open-source AI agent similar to ChatGPT Operator. It can interact with a web browser, navigate through websites, and perform actions. However, it costs only a fraction of what ChatGPT’s offering does, and there are two ways to run it.
The first option is to pay a $30 subscription, which runs the AI agent on their cloud service. The other option is to set it up yourself locally, and it costs the least (you will only be charged for API usage). I went with the most affordable option.
Setting up Browser Use isn’t as straightforward as ChatGPT Operator, but with a few lines of code, I had it up and running. If I could do it, you can too!
How I Set Up Browser Use on My PC
To get started, you’ll need two things: Python 3.11 installed on your computer and API access from OpenAI (or a locally hosted LLM if you prefer).
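Before installing anything, a quick preflight check can confirm both prerequisites. The following is a minimal sketch, not part of Browser Use: the function name is made up for illustration, it treats Python 3.11 as a minimum version (an assumption), and it looks for the key under the name OPENAI_API_KEY, which matches the .env entry used later in this article.

```python
import os
import sys

REQUIRED_PYTHON = (3, 11)    # version named in the article; treated as a minimum (assumption)
KEY_NAME = "OPENAI_API_KEY"  # same variable name as the .env entry in this article


def missing_prerequisites(version_info=sys.version_info, env=os.environ):
    """Return a list of problems; an empty list means both prerequisites are met."""
    problems = []
    if tuple(version_info[:2]) < REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}+ needed, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # An empty or whitespace-only value counts as missing
    if not env.get(KEY_NAME, "").strip():
        problems.append(f"{KEY_NAME} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing:", problem)
```

If the script prints nothing, the interpreter is recent enough and the key is visible in the current environment. A key stored only in a .env file will still be reported as missing here, because this check reads environment variables only.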
Since Browser Use is an AI agent, it requires a large language model (LLM) to function. For that, you can get API access from OpenAI’s website or any other API that works with Browser Use. The benefit of using an API is that you get the flexibility to choose between different models (such as GPT-3.5 and GPT-4), and you only have to pay for what you use—instead of an upfront subscription fee.
In my testing, I used the GPT-4o model. I was charged less than $1 for all seven tasks I asked Browser Use to perform. However, if you pair it with the DeepSeek API, it should be several times cheaper.
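To put that figure next to the subscription prices mentioned earlier, here is a rough back-of-the-envelope sketch. It uses only numbers from this article and treats the $1 as an upper bound on what seven tasks cost; real tasks vary a lot in token usage, so read the output as a ballpark rather than a forecast. The function name is made up for illustration.

```python
import math


def tasks_covered(budget_usd, spend_usd, tasks_run):
    """How many tasks a budget covers at the observed average spend per task."""
    if spend_usd <= 0 or tasks_run <= 0:
        raise ValueError("spend_usd and tasks_run must be positive")
    return math.floor(budget_usd * tasks_run / spend_usd)


OBSERVED_SPEND_USD = 1.00  # upper bound: the article reports "less than $1"
OBSERVED_TASKS = 7         # number of tasks run in the article's testing

for label, fee_usd in [("Browser Use cloud", 30), ("ChatGPT Pro", 200)]:
    count = tasks_covered(fee_usd, OBSERVED_SPEND_USD, OBSERVED_TASKS)
    print(f"{label}: ${fee_usd} would cover at least {count} similar tasks via the API")
```

At that rate, $30 of API usage covers at least 210 comparable tasks and $200 covers at least 1,400, which is why the pay-per-use route is attractive for occasional experiments.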
You could also use a local LLM on your computer. However, running a local LLM comparable to GPT-4o requires significant computing power, which most people likely won’t have. I did test DeepSeek’s 7B model on my computer, and the performance was unsurprisingly bad. So, I would recommend sticking with an API for now.
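If you stay with an API and want to try the cheaper DeepSeek route mentioned earlier, one possible approach is to point the same ChatOpenAI class at DeepSeek's OpenAI-compatible endpoint. This is a sketch under assumptions, not something tested in this article: the base URL, the model name "deepseek-chat", and the DEEPSEEK_API_KEY variable name are all assumptions to check against DeepSeek's own documentation, and the helper names are made up for illustration.

```python
import os

DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"  # assumption: OpenAI-compatible endpoint
DEEPSEEK_MODEL = "deepseek-chat"                   # assumption: general chat model name
DEEPSEEK_KEY_NAME = "DEEPSEEK_API_KEY"             # hypothetical variable name for your key


def deepseek_settings(env=os.environ):
    """Collect the keyword arguments for ChatOpenAI, failing early if the key is missing."""
    api_key = env.get(DEEPSEEK_KEY_NAME, "")
    if not api_key:
        raise KeyError(f"{DEEPSEEK_KEY_NAME} is not set")
    return {"base_url": DEEPSEEK_BASE_URL, "model": DEEPSEEK_MODEL, "api_key": api_key}


def make_deepseek_llm(env=os.environ):
    """Build the LLM object; needs the langchain-openai package installed."""
    from langchain_openai import ChatOpenAI  # imported lazily so the settings can be checked alone

    return ChatOpenAI(**deepseek_settings(env))
```

The result of make_deepseek_llm() would then be passed as the llm argument when creating the agent, in place of ChatOpenAI(model="gpt-4o").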
Once you obtain API access, you can create a virtual environment in VS Code by going to View > Command Palette and typing create environment. Then, open a new terminal and install Browser Use using pip.
pip install browser-use
Create a .env file inside the folder and add your API key.
OPENAI_API_KEY="Your API Here"
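The script later in this article calls load_dotenv() to read this file. To show roughly what that step does, here is a simplified sketch of parsing KEY=value lines. It is an illustration only, not python-dotenv's actual implementation, which handles more cases (such as export prefixes and variable expansion); the function name is made up.

```python
def parse_env_text(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, as in the article's example
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = 'OPENAI_API_KEY="Your API Here"\n'
print(parse_env_text(sample))  # {'OPENAI_API_KEY': 'Your API Here'}
```

The takeaway is that the quotes are not part of the key, and that the .env file is plain text sitting next to your script, so keep it out of version control.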
Create a new Python file with the name app.py and paste the following code.
from langchain_openai import ChatOpenAI
from browser_use import Agent
import asyncio
from dotenv import load_dotenv

# Read OPENAI_API_KEY from the .env file into the environment
load_dotenv()

async def main():
    # The agent pairs a plain-language task with the LLM that drives the browser
    agent = Agent(
        task="Go to Reddit, search for 'browser-use', click on the first post and return the first comment.",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
Replace the prompt with your own, like “Search for Albert Einstein and open his Wikipedia page.” Finally, run app.py from the terminal.
python app.py
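Editing the file for every new prompt gets tedious. One option is a small wrapper that takes the task from the command line instead. The sketch below reuses the same Agent and ChatOpenAI calls as app.py; the file name run_task.py and the function names are made up for illustration.

```python
import argparse
import asyncio
import sys


def parse_task(argv=None):
    """Join all command-line words into a single task string."""
    parser = argparse.ArgumentParser(description="Run one Browser Use task")
    parser.add_argument("task", nargs="+", help="plain-language task for the agent")
    args = parser.parse_args(argv)
    return " ".join(args.task)


async def run_task(task):
    # Imported here so the argument parsing above works without the packages installed
    from browser_use import Agent
    from dotenv import load_dotenv
    from langchain_openai import ChatOpenAI

    load_dotenv()  # read OPENAI_API_KEY from the .env file
    agent = Agent(task=task, llm=ChatOpenAI(model="gpt-4o"))
    return await agent.run()


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(asyncio.run(run_task(parse_task())))
    else:
        print('usage: python run_task.py "your task here"')
```

With this saved as run_task.py, a call such as python run_task.py "Search for Albert Einstein and open his Wikipedia page." starts the agent with that prompt.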
Putting It to the Test: Does It Live Up to Expectations?
I started my testing with simple tasks, such as Googling “Albert Einstein” and opening his Wikipedia page. When I ran the script, the AI agent opened a new browser window and executed the task flawlessly.
Next, I asked it to search for gaming laptops on Amazon and open the first result. Again, the AI agent completed the task successfully.
At this point, I was convinced that Browser Use could intelligently navigate the web. To push it further, I instructed it to visit Yahoo News and summarize the top five articles. To my surprise, Browser Use could complete the task within a few minutes. The summaries were short and to the point. You can see the results below.
However, things became tricky for Browser Use when I asked it to search for flights from London to Paris on skyscanner.com. Initially, the website’s bot detection blocked access, so I had to step in and get past it myself. Even then, Browser Use struggled: it clicked the search button without correctly entering “London” and “Paris” into the respective fields.
You can pair Browser Use with your main browser, where all your accounts are logged in. This allows the AI agent to enter data into a Google Sheet or paste Yahoo News summaries into a Google Doc. However, I ran into issues setting it up with my active browser, so I put it on hold for now.
Overall, it was a fun experiment. Watching an AI agent navigate the web and perform tasks was fascinating. While Browser Use is far from perfect, it’s a solid AI agent that can browse the web.
Still, this technology is in its infancy, so we can expect improvements in the future.
For now, if you’re willing to tinker with the setup and don’t mind occasional hiccups, boot up your computer and install Browser Use. Feel free to mention it in the threads if you get stuck and need a helping hand.