
Methodology

How we measure AI-generated content and bot activity across the web.


Overview

The Dead Internet Monitor tracks two distinct phenomena: the creation of AI-generated content (“AI Slop”) and the consumption of content by automated accounts (“AI Slurp”). We sample content from major platforms, classify it using large language models, and analyse author behaviour for bot-like patterns.

Our goal is not perfect accuracy — which remains elusive even for specialised detectors — but consistent, transparent measurement of trends over time.

   ┌──────────────┐
   │  COLLECTION  │  7 sources, monthly
   └──────┬───────┘
          │
    ┌─────┴─────┐
    ▼           ▼
┌────────┐ ┌──────────┐
│CLASSIFY│ │BOT DETECT│  parallel
│ (LLM)  │ │(7-signal)│
└───┬────┘ └────┬─────┘
    │           │
    └─────┬─────┘
          ▼
   ┌──────────────┐
   │ AGGREGATION  │  DII + Autopsy Matrix
   └──────┬───────┘
          ▼
   ┌──────────────┐
   │  DASHBOARD   │  deadinternetmonitor.com
   └──────────────┘
The classification pipeline.

Data Collection

Content is collected from seven sources on a monthly schedule. Each source receives a proportional share of the classification budget to ensure balanced representation.

Source           Type         Items/run
HackerNews       Tech forum   ~4,000
YouTube          Comments     ~5,000
Mastodon         Fediverse    ~1,000
Bluesky          Social       ~500
Stack Overflow   Q&A          ~400
Lobsters         Tech forum   ~200
Reddit           Social       Paused
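The per-source budgets above can be expressed as a simple configuration. This is an illustrative sketch; the dictionary structure and source keys are assumptions, not the project's actual code.

```python
# Illustrative per-source classification budgets (items per monthly run),
# mirroring the table above. Structure and key names are assumptions.
SOURCE_BUDGETS = {
    "hackernews": 4000,
    "youtube": 5000,
    "mastodon": 1000,
    "bluesky": 500,
    "stackoverflow": 400,
    "lobsters": 200,
    "reddit": 0,  # collection paused
}

def budget_share(source: str) -> float:
    """Fraction of the total classification budget allotted to one source."""
    total = sum(SOURCE_BUDGETS.values())
    return SOURCE_BUDGETS[source] / total
```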

Mastodon and Lobsters serve as control groups — decentralised or invite-only platforms with lower bot incentive.


Classification

Each item is classified by a large language model using a structured prompt (v3.0) that applies Bayesian calibration with platform-specific base rates derived from Ahrefs and Originality.ai research. This counters the documented tendency of LLM classifiers to default to “human” (RAID 2024 found 10–15% false-negative rates).

Models

Role       Model                   Provider
Primary    Gemini 2.5 Flash Lite   Google
Fallback   Claude Haiku 4.5        Anthropic

Fallback triggers when primary confidence is below 0.5. Models are hot-swappable via configuration — no redeployment required.
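The fallback logic can be sketched as follows. The callable interface and result shape are assumptions for illustration; the real pipeline's API is not shown in this document.

```python
def classify(item: str, primary, fallback, threshold: float = 0.5) -> dict:
    """Run the primary model; fall back when confidence is below threshold.

    `primary` and `fallback` are callables returning a dict with at least
    "label" and "confidence" keys (a hypothetical interface).
    """
    result = primary(item)
    if result["confidence"] >= threshold:
        return result
    result = fallback(item)
    if result["confidence"] >= threshold:
        return result
    # Neither model was confident enough: defer to "uncertain".
    return {"label": "uncertain", "confidence": result["confidence"]}
```

Because the models are passed in as plain callables, swapping providers is a configuration change, matching the hot-swap behaviour described above.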

AI Indicators

The classifier looks for research-validated signals of AI generation:

Human Indicators

Output

Each classification returns: a label (ai_generated, human_created, or uncertain), a confidence score (0.0–1.0), specific indicators observed, and a brief reasoning explanation.
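The record described above might be modelled like this; field names are illustrative and the stored schema may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Classification:
    """One classification result, per the output description above."""
    label: str                                    # "ai_generated" | "human_created" | "uncertain"
    confidence: float                             # 0.0 - 1.0
    indicators: list[str] = field(default_factory=list)  # signals observed
    reasoning: str = ""                           # brief explanation
```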

Post-Processing

After the LLM returns its classification, a post-processing step corrects for the documented human-default bias. Items classified as “human” but carrying multiple AI indicators are reclassified as uncertain or AI-generated. Short content (<100 characters) is capped at 0.60 confidence.
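A minimal sketch of that correction step, using the thresholds stated above (two or more AI indicators, the 100-character cutoff, the 0.60 cap). The choice of when to downgrade to “uncertain” versus “ai_generated” is an assumption; the text does not specify the exact rule.

```python
def post_process(label: str, confidence: float,
                 ai_indicators: list[str], text: str) -> tuple[str, float]:
    """Correct the human-default bias (illustrative rules, not the real code)."""
    # "Human" verdicts carrying multiple AI signals are downgraded.
    # Assumption: exactly two signals -> uncertain, three or more -> AI.
    if label == "human_created" and len(ai_indicators) >= 2:
        label = "uncertain" if len(ai_indicators) == 2 else "ai_generated"
    # Short content gives the model little evidence; cap the confidence.
    if len(text) < 100:
        confidence = min(confidence, 0.60)
    return label, confidence
```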

┌──────────────┐
│ Content Item │
└──────┬───────┘
       ▼
┌──────────────┐   confidence
│Primary Model │──── ≥ 0.5 ──▶ RESULT
│ Gemini Flash │
└──────┬───────┘
       │ < 0.5
       ▼
┌──────────────┐
│Fallback Model│──── ≥ 0.5 ──▶ RESULT
│ Claude Haiku │
└──────┬───────┘
       │ < 0.5
       ▼
  "uncertain"
       │
       ▼
┌──────────────┐
│Post-Process  │  correct false negatives
│ Recalibrate  │  using signal evidence
└──────┬───────┘
       ▼
   FINAL LABEL
ai / human / uncertain
Classification and post-processing flow.

Bot Detection

Separate from content classification, we analyse author behaviour using a 7-signal weighted scoring system grounded in peer-reviewed research. Authors with 2+ collected items receive a bot score.

Signal                                            Weight   Research
Posting frequency (posts per hour)                0.20     Gilani et al. 2017
AI content ratio (% of posts classified as AI)    0.20     Novel signal
Content diversity (topic/subreddit entropy)       0.15     Oentaryo et al. 2016
Timing entropy (Shannon entropy of posting hours) 0.15     Chu et al. 2012
Response latency (median seconds between posts)   0.10     Ferrara et al. 2016
Karma velocity (karma gained per day)             0.10     Multiple studies
Account age ratio (age vs activity volume)        0.10     Cresci et al. 2015

Scores above 0.7 are flagged as likely bots; scores between 0.4 and 0.7 are suspicious; scores below 0.4 are likely human.
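The weighted score can be sketched as below. The weights come from the table above; how each raw signal is normalised into [0, 1] is not specified here, so that step is assumed to have already happened.

```python
# Signal weights from the table above (they sum to 1.0).
WEIGHTS = {
    "posting_frequency": 0.20,
    "ai_content_ratio": 0.20,
    "content_diversity": 0.15,
    "timing_entropy": 0.15,
    "response_latency": 0.10,
    "karma_velocity": 0.10,
    "account_age_ratio": 0.10,
}

def bot_score(signals: dict[str, float]) -> float:
    """Weighted sum of signals, each assumed pre-normalised to [0, 1]."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

def verdict(score: float) -> str:
    """Map a score onto the thresholds stated above."""
    if score > 0.7:
        return "likely_bot"
    if score >= 0.4:
        return "suspicious"
    return "likely_human"
```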


The Autopsy

The homepage Autopsy Matrix crosses content origin (human vs AI) with audience type (human vs bot) to produce four quadrants:

                Human Audience   Bot Audience
Human Content   Alive            Zombified
AI Content      Polluted         Dead

Bot audience share is estimated using Cloudflare Radar data blended with the 2025 Imperva Bad Bot Report, which found automated traffic surpassed human traffic for the first time at 51% of all web requests in 2024.
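If content origin and audience type are treated as independent, the four quadrant shares fall out of a simple product. That independence assumption is an illustrative simplification, not a claim about the monitor's actual estimator.

```python
def autopsy_matrix(ai_content_share: float,
                   bot_audience_share: float) -> dict[str, float]:
    """Split activity into the four quadrants above.

    Assumes content origin and audience type are independent
    (an illustrative simplification). Inputs are fractions in [0, 1].
    """
    human_content = 1.0 - ai_content_share
    human_audience = 1.0 - bot_audience_share
    return {
        "alive": human_content * human_audience,        # human content, human audience
        "zombified": human_content * bot_audience_share, # human content, bot audience
        "polluted": ai_content_share * human_audience,   # AI content, human audience
        "dead": ai_content_share * bot_audience_share,   # AI content, bot audience
    }
```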


Dead Internet Index

The DII is a composite score (0–100) measuring how “dead” the internet is. It combines four weighted components:

Component                                                         Weight
AI content % (share classified as AI-generated)                   0.40
Bot engagement % (engagement from bot-flagged authors)            0.25
Slop×Slurp % (AI content from bot authors, the “dead” quadrant)   0.20
Low-confidence human % (“human” verdicts below 0.7 confidence)    0.15

When consumption data is available (Cloudflare Radar, robots.txt monitoring), a fifth component (0.20 weight) is added and the other weights adjust downward.


Limitations


Transparency

Every classification record stores the model provider, model name, prompt version, token counts, estimated cost, and latency. This metadata enables full auditability and comparison across models over time.

The trend matters more than any single number. We are watching the watchers.