Miasma: The Open-Source Tarpit That Feeds AI Crawlers an Endless Buffet of Garbage

One developer got so tired of AI scrapers hammering his site that he built a poison well and left the door open.

Miasma is a Rust-based tool that traps AI web crawlers in an infinite loop of fake content and self-referencing links. It doesn’t block bots. It doesn’t rate-limit them. It wastes their time and poisons their training data. And as of this week, it’s sitting on Hacker News’ front page with 222 points and 171 comments — because apparently a lot of people have been waiting for exactly this kind of weapon.

The Problem That Won’t Go Away

Here’s where we are in 2026: AI companies collectively crawl billions of pages per day. Meta’s bots alone account for 52% of all AI crawler traffic — more than double Google’s 23% or OpenAI’s 20%. One website owner reported getting hit with 39,000 requests per minute at peak load. The Read the Docs project found that blocking AI crawlers dropped their bandwidth from 800GB to 200GB daily, saving around $1,500 a month.

And robots.txt? Most AI crawlers ignore it. A Clutch survey found 57% of small and medium businesses are now actively blocking AI crawlers, and over 80% of Cloudflare customers have chosen to block these bots entirely.

The frustration isn’t abstract. It costs real money, degrades real performance, and the companies doing the crawling aren’t paying a cent for the content they’re vacuuming up. Blocking works to a point, but bots that rotate through residential proxies, sending a single request per IP, render traditional IP blocking basically useless.

So Austin Weeks took a different approach: instead of keeping bots out, invite them in — and make them regret it.

How Miasma Actually Works

The mechanism is clever in its simplicity. It works in three steps.

You embed invisible HTML links on your website pointing to a /bots path. These links use display: none, aria-hidden="true", and tabindex="1" — completely invisible to humans and screen readers, but visible to crawlers that parse raw HTML. A real person will never click these links. A bot will.
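Concretely, the kind of link you would embed looks something like this (the path and anchor text here are illustrative, not Miasma's exact markup):

```html
<!-- A sketch of the honeypot link described above. display:none hides it
     from humans, aria-hidden hides it from screen readers, but a scraper
     parsing raw HTML still sees and follows it. -->
<a href="/bots/start" style="display: none" aria-hidden="true" tabindex="1">archive</a>
```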

When a crawler follows one of those hidden links, it hits the Miasma server. Miasma responds with a page full of poisoned content pulled from an external “poison fountain” source, plus five more self-referencing links. The bot follows those links, gets more garbage and more links, follows those, gets more garbage… you see where this is going. It’s a roach motel. The bot checks in but it never checks out.
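The loop is easy to picture in code. Here is a minimal sketch of the idea in Rust (a toy for illustration, not Miasma's actual implementation): a tiny server that answers every request with filler text plus five fresh links pointing back into the trap.

```rust
use std::io::{Read, Write};
use std::net::TcpListener;
use std::sync::atomic::{AtomicU64, Ordering};

static SEED: AtomicU64 = AtomicU64::new(1);

// Build one trap page: filler text plus five links that lead deeper into the trap.
fn trap_page(seed: u64) -> String {
    let links: String = (0..5u64)
        .map(|i| {
            // Derive five pseudo-random paths so every page looks "new" to the bot.
            let id = seed.wrapping_mul(6364136223846793005).wrapping_add(i);
            format!("<a href=\"/bots/{id:x}\">more</a>\n")
        })
        .collect();
    format!("<html><body><p>(poisoned filler text would go here)</p>\n{links}</body></html>")
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        let mut buf = [0u8; 1024];
        let _ = stream.read(&mut buf); // drain the request; every path gets the same treatment
        let body = trap_page(SEED.fetch_add(1, Ordering::Relaxed));
        let resp = format!(
            "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: {}\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(resp.as_bytes())?;
    }
    Ok(())
}
```

Every page the bot fetches yields five more pages to fetch, so the frontier grows faster than the crawler can drain it.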

The poisoned data isn’t random noise — it’s Markov-chain generated text that looks plausible enough to pass initial filters but is semantically worthless. If that data makes it into a training set, it degrades model quality. One Hacker News commenter claimed they “made up references to a nonexistent Python library” and later found it in ChatGPT. Whether that’s anecdotal or systematic, the principle stands: garbage in, garbage out.
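To make "plausible but worthless" concrete, here is a toy first-order Markov generator in Rust. It is illustrative only (not Miasma's generator): the output is locally fluent because each word really does follow its predecessor somewhere in the seed corpus, but globally it means nothing.

```rust
use std::collections::HashMap;

// Emit `words` words of Markov babble derived from `corpus`.
fn markov_babble(corpus: &str, words: usize, mut seed: u64) -> String {
    // Map each word to the list of words that follow it in the corpus.
    let tokens: Vec<&str> = corpus.split_whitespace().collect();
    let mut chain: HashMap<&str, Vec<&str>> = HashMap::new();
    for pair in tokens.windows(2) {
        chain.entry(pair[0]).or_default().push(pair[1]);
    }
    // Walk the chain with a tiny xorshift PRNG instead of pulling in a crate.
    let mut current = tokens[0];
    let mut out = vec![current];
    for _ in 1..words {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        current = match chain.get(current) {
            Some(followers) => followers[(seed as usize) % followers.len()],
            None => tokens[(seed as usize) % tokens.len()], // dead end: restart anywhere
        };
        out.push(current);
    }
    out.join(" ")
}

fn main() {
    let corpus = "the crawler follows the link and the link leads the crawler \
                  to another page where the crawler follows another link";
    println!("{}", markov_babble(corpus, 30, 42));
}
```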

Miasma is written in Rust, which means it’s fast and lightweight. At 50 concurrent connections, it peaks at about 50-60 MB of memory. It runs behind a reverse proxy like Nginx, so deployment is straightforward. The whole thing is designed so that defending your site costs you almost nothing while costing the bot operator everything.
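In practice, the deployment is a few lines of proxy config. A minimal sketch, assuming Miasma listens on localhost port 8080 and /bots is the trap prefix (both are assumptions; adjust to your own setup):

```nginx
# Route the honeypot path to the local Miasma instance; everything else
# is served by your site as usual. Port and prefix here are assumptions.
location /bots/ {
    proxy_pass http://127.0.0.1:8080;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```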

What Makes Miasma Different from Nepenthes and Iocaine

Miasma isn’t the first AI tarpit. Nepenthes and Iocaine have been around since early 2025, and they share the same basic philosophy: waste bot resources and corrupt training data. But the implementation details matter.

Nepenthes creates an infinite maze of static files filled with Markov-generated nonsense. It’s effective but heavier — it generates and serves its own content. Reports say OpenAI’s crawler has actually managed to escape Nepenthes’ traps in some cases.

Iocaine takes the opposite approach. Instead of trapping crawlers, it lets them crawl freely but injects subtly corrupted data into the content they collect. It operates as a reverse proxy and reportedly killed 94% of bot traffic on one test site. The trade-off is that your real content is still being served — just with poison mixed in.

Miasma sits in between. It doesn’t touch your real content at all — bots only see poison once they follow the hidden links into the trap. And instead of generating content locally, it proxies from an external poison source, keeping its own resource footprint minimal. The configuration is also more granular: you can set concurrent request limits, customize link counts per page, force gzip compression to reduce bandwidth costs, and define custom path prefixes.
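To give a feel for that granularity, here is a purely hypothetical config sketch. The key names below are invented for this article and are not Miasma's actual schema; its README documents the real options.

```toml
# Hypothetical illustration only: invented key names, not Miasma's schema.
max_in_flight = 500      # cap on concurrent requests (429 beyond this)
links_per_page = 5       # self-referencing links served per trap page
force_gzip = true        # compress responses to cut bandwidth costs
path_prefix = "/bots"    # where the trap lives
```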

One line from Miasma’s README captures its philosophy: “An endless buffet of slop for the slop machines.” Fair enough.

The Hacker News Debate: Does This Actually Work?

The 171-comment HN thread reveals a community that’s genuinely split.

The skeptics make fair points. Several commenters noted that inserting hidden links violates Google’s webmaster guidelines — if Google detects display:none honeypot links, your site could get penalized in search rankings. Miasma’s workaround is to whitelist Googlebot, Bingbot, and DuckDuckBot in robots.txt, but that’s an imperfect solution. Google’s guidelines specifically say hidden links are a no-no regardless of intent.
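For what it's worth, the protective part of that workaround is only a couple of lines. A sketch, assuming /bots is the trap prefix (the README's exact file may differ): compliant crawlers honor the Disallow and never see the honeypot, while scrapers that ignore robots.txt follow the hidden links straight in.

```
# Keep rule-following crawlers out of the tarpit path.
User-agent: *
Disallow: /bots/
```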

Others argue this is the digital equivalent of keeping a phone scammer on the line for 45 minutes — satisfying but ultimately not scalable as a defense strategy. Sophisticated scrapers could easily filter out display:none elements before following links. And data-quality filters in training pipelines might catch obviously garbage text before it ever enters a training set.

But the defenders have a point too: you don’t need to stop every bot. You just need to make scraping your site expensive enough that the cost-benefit tips against it. Even if 50% of bots detect the trap, the other 50% are stuck in an infinite loop burning compute. And the data poisoning angle adds a second layer — even if the bot eventually escapes, the garbage it collected on the way in still has to be filtered out.

The broader debate underneath all of this is about data rights. Is web scraping theft? Is publicly posted content fair game? Nobody has a clean answer, and Miasma doesn’t pretend to be one. It’s a tool for people who’ve decided they don’t want to feed the machine and are willing to fight dirty about it.

The Practical Considerations

Before you rush to deploy this, a few things worth knowing.

The Google penalty risk is real. If you depend on organic search traffic, you need to think carefully about how you implement the hidden links. Miasma recommends putting the honeypot links only on pages you’re comfortable experimenting with, and maintaining a clean robots.txt for legitimate crawlers.

Memory usage scales linearly with concurrent connections. The default max-in-flight limit is 500 requests, and anything beyond that gets a 429 response. For most sites, this is more than enough. But if you’re running a high-traffic site that’s already under heavy bot pressure, you’ll want to tune this.
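The shape of that limit is simple enough to sketch. A toy Rust version of the idea (not Miasma's code): an atomic counter guards entry, and anything over the cap is turned away with a 429 before it can tie up memory.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);
const MAX_IN_FLIGHT: usize = 500; // the default limit described above

// Serve a request only if we are under the cap; otherwise answer 429.
fn handle(respond: impl FnOnce(u16)) {
    if IN_FLIGHT.fetch_add(1, Ordering::SeqCst) >= MAX_IN_FLIGHT {
        IN_FLIGHT.fetch_sub(1, Ordering::SeqCst);
        respond(429); // Too Many Requests
        return;
    }
    respond(200); // ...generate and send a trap page here...
    IN_FLIGHT.fetch_sub(1, Ordering::SeqCst);
}

fn main() {
    handle(|status| println!("responded with {status}"));
}
```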

The project is licensed under GPL-3.0 and is at version 0.1.18 as of March 2026. It’s installable via cargo install miasma. And here’s a detail that’s too good not to mention: the README explicitly states that “primarily AI-generated contributions will be automatically rejected.” A tool built to fight AI, refusing to let AI help build it. There’s a purity to that.

The current GitHub stats show 317 stars and 49 commits — small but growing fast given the HN attention.

FAQ

Is Miasma free to use?
Yes. Miasma is fully open source under the GPL-3.0 license. You install it with cargo (the Rust package manager) and run it on your own server. There are no paid tiers or cloud services involved.

Will Miasma hurt my Google search rankings?
Potentially. Hidden links using display:none violate Google’s webmaster guidelines. Miasma recommends whitelisting major search engine bots in your robots.txt file to mitigate this, but the risk isn’t zero. If SEO is critical to your business, test carefully before deploying site-wide.

How does Miasma compare to just blocking AI crawlers?
Blocking tells bots to go away, and most simply ignore the request. Miasma takes the opposite approach — it lets bots in and wastes their resources while feeding them useless data. The philosophy is that if blocking doesn’t work, make crawling as expensive as possible for the bot operator.

Does Miasma work against all AI scrapers?
Not necessarily. Sophisticated crawlers that pre-filter hidden HTML elements or use headless browsers that render CSS could potentially avoid the trap. But many current AI scrapers parse raw HTML without rendering, which makes them vulnerable to Miasma’s honeypot links.

Can I use Miasma alongside other anti-bot tools?
Yes. Miasma runs as a standalone server behind a reverse proxy. You can combine it with rate limiting, Cloudflare, or tools like Anubis for a layered defense strategy.

