OpenClaw Web Search: How to Make Your Agent Actually Read the Web
Hiba Fathima
Mar 27, 2026 (updated)

TL;DR

  • web_search sends a query to your configured provider (Brave by default) and returns results: title, URL, and snippet per result
  • web_fetch takes a specific URL, does an HTTP fetch, and extracts readable content from the HTML as markdown or plain text
  • Both tools are enabled together under group:web but can be allowlisted individually (web_search / web_fetch)
  • web_fetch does not execute JavaScript, so JS-rendered pages return empty or incomplete content without a fallback
  • Adding your Firecrawl API key gives web_fetch a real-browser fallback for pages Readability can't extract
  • Installing the Firecrawl CLI skill adds a firecrawl search command that returns search results and full page content in a single step
  • The Firecrawl /interact endpoint lets your agent act on a page after scraping it: click buttons, fill forms, and navigate to reach content that only appears after an interaction — something no search provider can do on its own

Send your OpenClaw agent a research task and the failure mode is predictable: web_search returns URLs, web_fetch tries to read them, and a lot of the modern web doesn't cooperate with plain HTTP requests.

This guide explains how the pipeline works, what breaks it, and how Firecrawl fixes it.

For a broader look at the full Firecrawl integration with OpenClaw including browser automation, see the OpenClaw + Firecrawl guide.

How OpenClaw's web tools actually work

OpenClaw ships two distinct web tools: web_search and web_fetch. They serve different purposes and are configured separately. Both are enabled together under group:web but can be allowlisted individually.

web_search sends a search query to your configured provider and returns a list of results. With Brave (the default), each result is a structured object: title, URL, and a short snippet. It returns 5 results by default (configurable up to 10), and results are cached for 15 minutes. The tool won't run without an API key: if none is configured, it returns a setup error rather than failing silently.

web_fetch takes a specific URL, makes a plain HTTP GET request, and extracts readable content from the HTML response as markdown or plain text. It does not execute JavaScript.

In practice, these two tools run in sequence. The agent searches for URLs, then fetches each one to read the content. But that handoff is where things break. Brave gives the agent URLs. web_fetch tries to read them. Many modern sites return JavaScript shells to plain HTTP requests: the HTML loads, but the meaningful content renders later in the browser. Others serve 403 errors to anything that doesn't look like an active browser session. web_fetch gets back an empty page or nothing, and the agent proceeds with whatever it has.

The internal extraction order for web_fetch is:

  1. Readability: local main-content extraction from the raw HTML
  2. Firecrawl: if an API key is configured, routes through Firecrawl's API with real browser rendering and bot circumvention
  3. Basic HTML cleanup: strips tags and returns whatever text remains

If Readability fails and Firecrawl isn't configured, the agent falls through to basic cleanup, which often returns navigation links, cookie banners, and other noise instead of article content.

| | web_search | web_fetch |
|---|---|---|
| Input | Search query string | Specific URL |
| Output | Title, URL, and snippet per result | Full page content as markdown or plain text |
| JavaScript execution | N/A | No — plain HTTP GET only |
| Default results | 5 (configurable up to 10) | Single page |
| Cache | 15 minutes | Configurable via Firecrawl maxAgeMs (default 2 days) |
| Requires API key | Yes — provider key (Brave, Perplexity, or Gemini) | No — but a Firecrawl API key adds real-browser rendering as a fallback |
| Allowlist token | web_search | web_fetch |

The search provider options

OpenClaw supports three built-in providers for web_search. If no provider is explicitly set, OpenClaw auto-detects based on which API keys are present, checking in order: Brave → Gemini → Perplexity → Grok. For a full comparison of OpenClaw search providers — including Firecrawl, Tavily, SearXNG, and pricing for each — see the dedicated guide. For a broader view of web search for AI agents — covering Exa, Tavily, and Perplexity's standalone APIs used outside OpenClaw — see the search engine comparison.

| Provider | What it returns | API key |
|---|---|---|
| Brave (default) | Title, URL, snippet per result | BRAVE_API_KEY |
| Perplexity Sonar | AI-synthesized answer with inline citations | PERPLEXITY_API_KEY or OPENROUTER_API_KEY |
| Gemini | AI-synthesized answer grounded in Google Search | GEMINI_API_KEY |

Firecrawl is not a web_search provider in the configuration sense above. It doesn't plug into the web_search tool. Instead it connects to the pipeline in two other ways: as a web_fetch fallback (via the API key config), and as the Firecrawl CLI skill, which gives your agent a firecrawl search command that runs search independently of the web_search tool entirely. More on both below.

What Firecrawl adds to the pipeline

Firecrawl connects to the OpenClaw web pipeline in two places. It's worth being precise about which is which, because they solve different problems.

Improving web_fetch

Adding your Firecrawl API key to the web_fetch config gives it a second extraction attempt for pages where Readability fails. Instead of falling through to basic HTML cleanup, it routes the request through Firecrawl's API, which uses real browser rendering and bot circumvention automatically.

{
  "tools": {
    "web": {
      "fetch": {
        "firecrawl": {
          "apiKey": "fc-YOUR-API-KEY",
          "onlyMainContent": true,
          "maxAgeMs": 172800000
        }
      }
    }
  }
}

maxAgeMs controls how fresh cached results need to be (in milliseconds). The default is 2 days, fine for content that doesn't change often. For time-sensitive pages like pricing or release notes, lower this to force fresher fetches.
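Since maxAgeMs takes raw milliseconds, a small helper avoids off-by-a-zero mistakes when tuning it (a trivial sketch, not part of OpenClaw):

```python
def max_age_ms(days: float = 0, hours: float = 0) -> int:
    """Convert a cache freshness window to the milliseconds maxAgeMs expects."""
    return int((days * 24 + hours) * 60 * 60 * 1000)

# The default from the config above: 2 days
assert max_age_ms(days=2) == 172_800_000
```

For a pricing page you might drop to max_age_ms(hours=1), i.e. 3600000.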

This configuration doesn't change how web_search works. The agent still searches via Brave and still calls web_fetch as a second step. But when web_fetch would otherwise fail on a JS-heavy or bot-protected page, Firecrawl catches it and returns actual content. See the OpenClaw Firecrawl docs for the full config reference.

The CLI skill: search with content in one step

The Firecrawl CLI skill changes the search step itself. Instead of web_search returning a list of URLs that the agent must then fetch individually, your agent runs firecrawl search, which returns search results and the scraped content of each result in a single call.

Install the skill with:

npx -y firecrawl-cli@latest init --all

Or install everything separately:

npm install -g firecrawl-cli
firecrawl init skills
export FIRECRAWL_API_KEY="fc-YOUR-API-KEY"

Verify the setup:

firecrawl --status

Once installed, your agent can run:

# Search and return top results
firecrawl search "OpenClaw release notes February 2026" --limit 10
 
# Search and return results with full scraped content
firecrawl search "OpenClaw release notes February 2026" --scrape --scrape-formats markdown --limit 5

Each result in the --scrape response includes the URL, title, description, and the full markdown content of the page. No separate web_fetch call needed, and no 403 errors, because Firecrawl handles the actual extraction. For a deeper look at what the search endpoint returns, see Mastering the Firecrawl Search Endpoint.
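Once the agent has the --scrape output, a post-filter step is often useful, for example dropping results whose scraped content is too thin to be worth reading. The field names below ("url", "markdown") are assumptions about the JSON shape; check firecrawl's actual output before relying on them:

```python
def pick_readable_results(results: list[dict], min_chars: int = 200) -> list[dict]:
    """Keep only results whose scraped markdown is long enough to be useful.

    Assumes each result dict carries 'url' and 'markdown' keys; the exact
    field names in firecrawl's response may differ.
    """
    return [r for r in results if len(r.get("markdown", "")) >= min_chars]
```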

web_fetch fallback vs. CLI skill: which to use

These two integrations are independent and serve different purposes:

| | web_fetch fallback | Firecrawl CLI skill |
|---|---|---|
| Configured via | JSON config (tools.web.fetch.firecrawl) | CLI install (npx -y firecrawl-cli@latest init --all) |
| What it affects | web_fetch only, as a fallback when Readability fails | Adds firecrawl search, firecrawl scrape, crawl, and map as agent commands |
| Search step | No change: agent still uses web_search (Brave etc.) | Replaces the search step: firecrawl search returns results and content |
| Best for | Fixing fetch failures on JS-heavy or bot-protected pages | Research workflows where you want content alongside results from the start |

You can run both at the same time. Use the API key config to harden web_fetch, and use the CLI skill when the task calls for search-first workflows with full page content.

Scraping, crawling, and mapping

The CLI skill also gives your agent scraping, crawling, and mapping capabilities for when search isn't the right tool.

# Scrape a single page
firecrawl https://example.com --only-main-content
 
# Scrape with specific formats
firecrawl https://example.com --format markdown,links --pretty

This is useful when you need to pull structured data from a known URL rather than find it first, or when you want to crawl an entire docs site and process the output.

Interact: when the data is behind an action

Scraping stops at the page. Most of the web data agents actually care about sits behind a search form, a "load more" button, a login, or a filter dropdown. Static scraping — whether via web_fetch or firecrawl scrape — returns what the page renders on first load, and nothing else.

No search provider solves this. Brave, Perplexity, and Gemini all return links or synthesized answers. None of them lets your agent take an action inside a live page and extract what comes back. web_fetch doesn't execute JavaScript. firecrawl scrape returns the initial render. The problem isn't finding the page — it's what happens after you land on it.

Firecrawl's /interact endpoint addresses this directly. After scraping a page with firecrawl scrape, you stay in that browser session and tell it what to do next — in plain English or Playwright code:

# Scrape a page to open it
firecrawl scrape https://example.com/search
 
# Then interact with it
firecrawl interact "Search for 'quarterly earnings' and click the first result"
firecrawl interact "Extract the table on this page"
firecrawl interact stop

Sessions stay live for up to 10 minutes, and you can chain as many interaction calls as needed within that window.

For OpenClaw agents doing research that hits paywalls, paginated results, or login-gated content, /interact is the part of the pipeline that scraping alone can't cover.

Browser: when scraping isn't enough

OpenClaw's default is to drive a local browser. That works for simple workflows, but the costs show up quickly: the agent runs in the same environment as your real browsing state, parallel sessions spike RAM, and runs get flaky under load. Local browsers behave like dev tooling, not infrastructure. Running agents in an isolated agent sandbox removes those risks — no shared state, no local resource pressure.

Firecrawl Browser Sandbox moves that work into a secure, remote, disposable environment. No local Chromium install, no driver setup. agent-browser and Playwright are pre-installed. Your OpenClaw agent can run on a free-tier EC2 instance or a Raspberry Pi while the actual browsing happens elsewhere.

Your agent just issues intent-level commands (open, click, fill, snapshot, scrape) through the firecrawl browser shorthand. Playwright is still available if you need it.

firecrawl browser "open https://news.ycombinator.com"
firecrawl browser "snapshot"
firecrawl browser "scrape"
firecrawl browser close

A few mechanics worth knowing:

  • Shorthand auto-session: the shorthand form (firecrawl browser "...") auto-launches a sandbox session if one isn't active, so your agent doesn't need to manage session lifecycle up front
  • Token efficiency: the agent gets back clean artifacts (snapshot, extracted content) instead of raw DOM or driver logs in the context window
  • Context offloading: fetched pages and interactions are saved to the file system and queried only when needed

You can give your agent a prompt like: "Use Firecrawl Browser Sandbox to open Hacker News and get the top 5 stories and the first 10 comments on each." The agent figures out the rest.

See the Browser Sandbox docs for the full command reference.

Search patterns that work

A few patterns that get consistent results from the OpenClaw web pipeline:

Targeted queries over broad ones. Specific search terms outperform general ones. "OpenClaw changelog entries February 2026" gives the agent something actionable; "OpenClaw news" surfaces a mixed bag. When your agent's task depends on finding accurate, specific information, guide it toward narrow queries.

Multiple queries instead of one. A single broad query returns mixed results. Running two or three targeted queries in sequence, like "OpenClaw memory tool documentation" and "OpenClaw memory tool community issues" as separate calls, and then combining the results gives the agent better raw material than one catch-all query.
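Combining several targeted queries works best with simple URL-level deduplication. A minimal sketch, assuming each result is a dict with a "url" key matching the title/URL/snippet shape described earlier:

```python
def merge_results(*result_lists: list[dict]) -> list[dict]:
    """Merge result lists from multiple queries, keeping the first hit per URL."""
    seen: set[str] = set()
    merged: list[dict] = []
    for results in result_lists:
        for r in results:
            if r["url"] not in seen:   # drop duplicate URLs across queries
                seen.add(r["url"])
                merged.append(r)
    return merged
```

The agent then reads one combined list instead of reconciling overlapping result sets per query.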

Use --scrape for content-heavy tasks. When the task requires reading actual page content rather than just titles and snippets, firecrawl search ... --scrape returns full markdown in one call and skips the web_fetch round-trip entirely.

Prompt examples that work well:

Search for the three most recent Firecrawl changelog entries and summarize what changed in each.

Find the pricing page for [product] and extract plan names, monthly prices, and any seat or usage limits.

Use Firecrawl to search for "OpenClaw memory tool site:github.com" and read the top result in full.

Find community discussion about OpenClaw search providers from the past week and summarize the most common complaints.

Conclusion

The web_search to web_fetch pipeline is the right mental model for understanding OpenClaw's web access. Each tool has a distinct role, and the failure modes are specific: the search provider delivers links, web_fetch tries to read them, and a lot of the modern web doesn't cooperate with plain HTTP requests.

Firecrawl addresses this at two levels. The API key config patches web_fetch for pages Readability can't handle, adding real browser rendering as a fallback with no change to how search works. The CLI skill goes further: firecrawl search replaces the two-step search-then-fetch pattern entirely, returning content alongside results in one call. And for pages that need an actual browser session, firecrawl browser handles interactive automation in a remote sandbox with no local Chromium required.

For the full configuration reference, the OpenClaw web tools docs cover every parameter for both web_search and web_fetch. And if you're setting up Firecrawl with OpenClaw for the first time, the OpenClaw + Firecrawl guide covers the broader integration including browser automation and deployment. To expand what your agent can do beyond web access, the best OpenClaw skills on ClawHub covers top picks across Gmail, GitHub, memory, and more.

Frequently Asked Questions

How can I make OpenClaw actually read the full content of search results instead of just snippets?

web_search only returns a title, URL, and short snippet per result — not the page content. To get full content, the agent calls web_fetch on each URL as a second step. The problem is that web_fetch makes a plain HTTP GET request and can't execute JavaScript, so JS-rendered pages or bot-protected sites often return empty or incomplete content. The cleanest fix is the Firecrawl CLI skill: run firecrawl search with the --scrape flag and each result comes back with the full scraped page content in a single call, no separate web_fetch round-trip needed. For cases where the agent still uses web_fetch, adding your Firecrawl API key to the config gives it a real-browser fallback for pages Readability can't handle.

Can Firecrawl be used for search in OpenClaw?

Yes, but not through the web_search tool. Firecrawl search works via the CLI skill: install it with npx -y firecrawl-cli init --all and your agent can run firecrawl search to get back results with scraped page content in a single call. That means no separate web_fetch round-trip, no 403 errors, and full markdown content for each result. Firecrawl also connects separately as a web_fetch fallback via the API key config.

What's the difference between the Firecrawl API key config and the Firecrawl CLI skill?

The API key config (tools.web.fetch.firecrawl.apiKey) only affects web_fetch. It gives the tool a fallback extractor for pages that Readability can't handle: JS-heavy sites, pages behind bot protection, etc. The Firecrawl CLI skill is separate. It installs the firecrawl command on your agent and adds search, scrape, crawl, and map capabilities that operate independently of web_search and web_fetch entirely. You can use both at the same time.

Does web_fetch execute JavaScript?

No. web_fetch makes a plain HTTP GET request and extracts readable content from the raw HTML response. It does not execute JavaScript. Pages that render content client-side will return an empty or incomplete result. The Firecrawl fallback (configured via the API key) uses real browser rendering and handles these pages correctly. For full browser automation, the Firecrawl CLI skill also provides browser commands via firecrawl browser.

How do I enable web search in OpenClaw?

Web search is enabled by default under the group:web tool group. To activate it, add an API key for at least one supported provider: set BRAVE_API_KEY for Brave (the default), GEMINI_API_KEY for Gemini, or PERPLEXITY_API_KEY for Perplexity Sonar. OpenClaw auto-detects which provider to use based on which key is present, checking in order: Brave, then Gemini, then Perplexity, then Grok. If you want to enable web_search without web_fetch (or vice versa), allowlist them individually by name instead of using group:web.

Where and how do I configure the web search provider in the openclaw.json file?

Provider selection is driven by environment variables, not openclaw.json directly. Set BRAVE_API_KEY, GEMINI_API_KEY, or PERPLEXITY_API_KEY in your environment and OpenClaw picks the right provider automatically based on which key is present. If you want to pin a specific provider rather than relying on auto-detection, set the provider field explicitly in your config. The one thing that does go in openclaw.json is the Firecrawl web_fetch fallback: add your Firecrawl API key under tools.web.fetch.firecrawl.apiKey. For the full config schema covering both web_search and web_fetch parameters, see the OpenClaw web tools docs at docs.openclaw.ai/tools/web.

Why do I get a Brave Search API 422 error with web_search?

This is a known issue on non-English locale setups, particularly Chinese (issue #42746). OpenClaw auto-detects your locale and passes it as search_lang to Brave, but Brave only accepts specific regional codes — zh-hans for Simplified Chinese and zh-hant for Traditional Chinese. The bare zh code fails Brave's validation and causes a 422 error, which can cascade into task failures or timeouts. The workaround while a fix ships: set tools.web.search.search_lang to zh-hans (or zh-hant) explicitly in your config to bypass auto-detection. Other locales may have similar issues if they produce codes Brave doesn't accept — if you see 422s in a non-Chinese locale, try pinning search_lang manually.
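If you normalize locales in your own tooling before they reach the config, a small mapping covers the known-bad case. The bare "zh" entry follows the workaround described above; the regional variants are plausible assumptions, so verify them against Brave's accepted codes:

```python
# Codes Brave's API rejects, mapped to codes it accepts.
# "zh" -> "zh-hans" matches the documented workaround; the zh-CN/zh-TW
# entries are assumptions worth checking against Brave's docs.
BRAVE_SEARCH_LANG_FIXES = {
    "zh": "zh-hans",
    "zh-CN": "zh-hans",
    "zh-TW": "zh-hant",
}

def normalize_search_lang(locale: str) -> str:
    """Return a search_lang value Brave will accept for a detected locale."""
    return BRAVE_SEARCH_LANG_FIXES.get(locale, locale)
```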

Why is OpenClaw's web search performance inconsistent or 'weak' on some setups?

Usually one of three causes. First, the provider: Brave returns a title, URL, and short snippet per result — not full page content. If your agent is working from snippets alone without calling web_fetch, it has limited material to work with. Second, web_fetch itself: it makes a plain HTTP GET request and does not execute JavaScript. JS-rendered pages and bot-protected sites return empty or incomplete content. Without a Firecrawl API key configured, web_fetch falls through to basic HTML cleanup, which often returns navigation links and cookie banners instead of article text. Third, result caching: web_search results are cached for 15 minutes, so repeated queries on fast-changing topics can surface stale data. Adding the Firecrawl API key to the web_fetch config and using firecrawl search --scrape instead of web_search addresses the first two causes directly.

What happens if I don't configure any search provider API key?

web_search is enabled by default but requires an API key to function. OpenClaw auto-detects which provider to use based on available keys, checking in the order: Brave → Gemini → Perplexity → Grok. If no keys are found, it returns a short error prompting you to configure one. An alternative is the Firecrawl CLI skill, which provides firecrawl search without needing a web_search provider configured at all.

How do I verify my Firecrawl setup in OpenClaw?

Run firecrawl --status from the terminal to check that the CLI skill is installed and authenticated. For the web_fetch fallback, run openclaw doctor to verify your full tool configuration. You can also test directly by asking your agent to fetch a JS-heavy page and checking whether it returns actual content or empty output.

Can my OpenClaw agent interact with a page after scraping it?

Yes, via the Firecrawl /interact endpoint. After running firecrawl scrape on a URL, the session stays open and the agent can issue follow-up commands: click buttons, fill forms, navigate, and extract content that only appears after an interaction. Actions can be described in plain English or written as Playwright code. No other part of the OpenClaw web pipeline — not web_search, not web_fetch, not any search provider — gives the agent this capability. See the /interact docs at docs.firecrawl.dev/features/interact.

Why use Firecrawl Browser Sandbox instead of OpenClaw's local browser?

OpenClaw's default browser runs locally, which creates two problems. First, the agent operates inside the same environment as your real browsing state, which is a security risk. Second, parallel sessions spike RAM and make runs flaky — local browsers behave like dev tooling, not infrastructure. Firecrawl Browser Sandbox moves each session into a secure, remote, disposable environment. No local Chromium install, no driver setup, no RAM pressure. Your agent can run on a free-tier EC2 instance or a Raspberry Pi while the actual browsing happens elsewhere. agent-browser and Playwright are pre-installed, and the agent gets back clean artifacts instead of raw DOM or driver logs in its context window.
