Papers

Research I have been working on.

March 2026 · cs.IR / cs.AI · Research paper

The Semantic Object Model: A Token-Efficient Web Representation for AI Agents

Introduces SOM, a structured format that compresses web pages into semantic JSON for LLM consumption. Evaluates compression across 100 real-world websites with nightly CI-driven coverage.

March 2026 · cs.AI / cs.CY · Position paper

The Agentic Web: Rethinking Web Infrastructure for Machine Consumption

A position paper arguing that the web is entering a fourth state and proposing three infrastructure primitives: SOM, the Agent Web Protocol (AWP), and cooperative content negotiation via robots.txt directives.

March 2026 · cs.NI / cs.SE · Protocol spec

Agent Web Protocol: A Purpose-Built Communication Protocol for AI Agent-Web Interaction

An in-depth technical specification of AWP, a protocol designed for AI agents interacting with web content. Covers all seven MVP methods, intent-based interaction via semantic element targeting, SOM integration, WebAssembly skill extensibility, and a detailed comparison with the Chrome DevTools Protocol (CDP).

March 2026 · cs.CY / cs.IR · Proposal

Cooperative Content Negotiation for the Agentic Web: Extending robots.txt for AI Agents

Proposes SOM directives for robots.txt that let publishers offer structured semantic representations to AI agents instead of blocking them entirely. Covers the publisher-agent conflict, directive syntax, complementary signaling mechanisms, security considerations, and an adoption pathway.

March 2026 · cs.AI / cs.CY · Research paper

The Hidden Tax: Quantifying Token Waste in Agent-Web Interaction

Estimates the economic cost of HTML presentation noise in agent workloads at $1B to $5B per year. Combines Cloudflare crawl volume data, HTTP Archive page sizes, WebTaskBench token measurements, and a survey of 10 agent frameworks.

March 2026 · cs.IR / cs.AI · Benchmark

Does Format Matter? Agent Task Performance Across Web Representations

Introduces WebTaskBench, a task-based benchmark that measures how page representations affect agent cost and speed. Reports token and latency results for HTML, Markdown, and SOM across GPT-4o and Claude Sonnet 4, and specifies a rubric framework for evaluating accuracy and hallucination in follow-up revisions.

March 2026 · cs.CY / cs.AI · Research paper

The Publisher's Calculus: A Cost-Benefit Analysis of Serving Structured Representations to AI Agents

Presents a cost-benefit framework for web publishers evaluating SOM adoption. Models four publisher strategies across three traffic tiers (10K to 50M agent requests/month), finding that SOM-first serving reduces per-request infrastructure cost by 60 to 80%, with break-even at approximately 50,000 to 170,000 agent requests per month.

March 2026 · cs.AI / cs.IR · Research paper

Information Fidelity Under Semantic Compression: Measuring Task Accuracy, Hallucination, and Grounding Across Web Representations for AI Agents

Evaluates whether SOM's 4x token compression preserves the information agents need for correct task completion. Introduces a web-agent hallucination taxonomy (structural, content, attribution, inference) and a grounding verifiability score enabled by SOM provenance metadata, and maps the accuracy-efficiency frontier across 150 tasks, 4 models, and 3 representations. Extends WebTaskBench with gold-label annotations and a new Interactive task category.