How to let an AI agent read web pages without getting prompt-injected

You ask your AI agent to “check this page and summarise it,” or you give it a tool that browses documentation while it works. It fetches the URL, reads the text, and folds what it found into its thinking. That is the entire point of handing an agent the web — and it is also the opening.

To a language model there is no firm line between the page I was told to read and an instruction I should follow. A web page can carry text a human never sees but the model reads as plainly as anything else: invisible Unicode characters, white-on-white prose, content tucked inside HTML comments, a counterfeit “system” delimiter designed to look like the start of a real instruction. Security researchers have turned exactly this into working attacks — pages that quietly told a coding agent to leak an API token, run a command, or rewrite its own configuration. “I read a web page” becomes “the web page wrote my next move.”

safe-fetch puts a barrier in that path. It fetches the page inside a locked-down, throwaway container, strips the hidden tricks, and hands the result back clearly labelled as untrusted data — something to read, never a set of orders to obey.

The problem, in plain terms

The danger is not a virus in the usual sense; it is language. Anything an agent reads becomes part of the context it reasons from, and an attacker who can get text in front of the model — by owning a page it visits, or planting it in a file the agent opens — can try to phrase that text as a command. This is called indirect prompt injection, and OWASP ranks prompt injection as the number-one risk for LLM applications. The unsettling part is that the payload is invisible to you: the page looks normal in a browser while carrying instructions only the model will “hear.”

Who runs into this

Anyone whose AI reads things it did not write: Claude Code following a link, a custom agent scraping documentation, a browsing tool summarising search results. The more useful you make an agent by letting it read the open web, the more this matters.

What safe-fetch does — and how it works

safe-fetch <url> is a drop-in fetcher: point it at a page and it returns the cleaned, wrapped text instead of the raw HTML.

A booby-trapped page can’t become a command: safe-fetch labels it as untrusted data first.

Four independent layers

safe-fetch does not rely on a single trick. The fetch runs inside a hardened container (no host access, no network escalation, no state kept between calls), so even a flaw in the cleaner can’t reach your machine. Inside, a sanitizer strips the known injection vectors — zero-width and bidirectional Unicode, homoglyphs, scripts, comments, off-screen and same-colour CSS, base64 payloads, fake model delimiters. The cleaned text is then wrapped in an untrusted-content envelope your agent is told to treat as data, with any attempt to forge the wrapper inside the content neutralised. Finally, --install-claude-hooks writes the model rule that makes the agent honour the envelope: read for facts, never run what is inside.

Honest limits

A plain-prose attack with no technical tells — literally typing “ignore your previous instructions and…” — is not pattern-matched by the sanitizer, on purpose: regex over natural language produces false alarms on real prose and misses every rephrasing, while giving false confidence. That class is the job of the model rule (layer four), which is why the --install-claude-hooks step matters. safe-fetch is explicit about what each layer does and does not cover.

Install & use it

Install

brew install sharkyger/tap/safe-fetch

Use it

safe-fetch https://example.com          # fetch, sanitise, wrap
safe-fetch --install-claude-hooks       # add the model rule + Claude Code hooks

Point your agent at safe-fetch instead of a raw HTTP fetch, and run the install-hooks step once so the untrusted- data rule is in place. Full options, the threat model and source are on GitHub: github.com/sharkyger/safe-fetch (MIT).

FAQ

What is a prompt-injection sanitizer?

A filter that removes the hidden tricks an attacker plants in web content to hijack an AI — invisible Unicode, off-screen text, fake delimiters, encoded payloads — before the page reaches the model. safe-fetch runs that filter on every page it fetches.

How do I let an AI agent read a web page safely?

Fetch it through safe-fetch instead of directly. The page is downloaded inside a locked-down container, stripped of known injection vectors, and handed back wrapped in an untrusted-content envelope your agent treats as data — so a booby-trapped page can’t become a command.

Does it catch every prompt injection?

No, and it says so. Plain-prose attacks with no technical tells aren’t pattern-matched — that class is better handled by a model rule, which safe-fetch installs via --install-claude-hooks. The container isolation and the untrusted-data wrapper are the layers that always apply.

safe-fetch is one of the 5bats AI-agent security tools. For Claude Desktop and other MCP clients there is mcp-safe-fetch; to stop fetched or sub-agent text being run as instructions inside Claude Code, see claude-code-prompt-injection-gate.

See the threat model, options and source on GitHub →