Open specification · v1.0 · 2026-05-14
The Agent-Ready Web Standard
A public, dated specification for how a website should present itself to AI agents. Six dimensions. Fifty-plus checks. Three conformance tiers. Implement-it-yourself, license-free.
1. Status of this Document
This document is the v1.0 release of the Agent-Ready Web Standard (“ARWS”), an open specification for how a public website communicates with AI agents (large-language-model-driven crawlers, assistants, and orchestration runtimes). It is published by StartupHub.ai and dated May 14, 2026. The reference implementation is the public scanner at /agent-readiness, shipped April 20, 2026, which scores every URL on a 0–100 scale against the checks below.
ARWS is anchored on existing IETF and W3C primitives where they exist (RFC 8288, RFC 7234, RFC 9110, Schema.org, OpenAPI 3.1, the Model Context Protocol) and introduces new structure only where the existing primitives leave gaps. The keywords MUST, SHOULD, MAY, REQUIRED, RECOMMENDED, and OPTIONAL are to be interpreted per RFC 2119 and RFC 8174.
2. Abstract
A site is “agent-ready” when an autonomous AI agent that arrives at the site can: (a) discover what the site contains without rendering JavaScript, (b) retrieve content in a machine-readable form, (c) respect the site’s access-control intent, (d) act on the site’s exposed capabilities (API, MCP server, structured forms), (e) transact with the site where commerce is the intent, and (f) trust the content because the site signals editorial quality and provenance. Each is a dimension of agent readiness. A site that scores 100/100 on one dimension and zero on the others is not agent-ready.
3. Scope
In scope: the public HTTP surface of a site. Headers, response bodies, well-known files, sitemap, schema.org markup, robots/llms files, OpenAPI documents, MCP server cards, structured pricing, and visible editorial signals.
Out of scope: authenticated content, private APIs, in-app experiences, browser-only behavior that doesn’t round-trip through HTTP, and the agent’s own reasoning quality.
4. Terminology
- Agent: any HTTP user-agent identifying itself as an LLM-driven crawler, runtime, or assistant. Examples: GPTBot, ClaudeBot, PerplexityBot, Anthropic-AI, Claude-Web.
- Origin: the scheme + host + port triple of a public site, per RFC 6454.
- Markdown twin: a markdown representation of a resource served at the same URL via HTTP content negotiation, or at a parallel .md URL.
- Discovery card: the /.well-known/mcp/server-card.json document (or equivalent), declaring the site’s MCP server endpoint, capabilities, and authentication.
- Capabilities document: the OpenAPI 3.1 (or AsyncAPI) document declaring the public REST/GraphQL API surface the site exposes to agents.
- Conformance tier: one of Basic, Optimized, First as defined in §7.
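The origin triple defined above can be computed mechanically from any absolute URL. The following is a minimal sketch, assuming only http and https sites are of interest; the helper names `origin_of` and `same_origin` are illustrative and are not part of ARWS.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes a public site is assumed to use here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple for an absolute http(s) URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or parts.hostname is None:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # An omitted port means the scheme's default port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    # urlsplit exposes the hostname already lowercased.
    return (scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme, host, and port."""
    return origin_of(a) == origin_of(b)
```

Under this sketch, `https://example.com/x` and `https://example.com:443/y` share an origin, while the http and https variants of the same host do not.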
5. The Six Dimensions
Every check in this specification belongs to exactly one of the following six dimensions. A site’s overall score is the average of its per-dimension scores, computed as described in §6; per-dimension scores are published alongside the overall score so a site can see exactly where it falls short.
5.1 Discoverability
Can an agent find what the site contains without rendering JavaScript?
Discoverability covers the surfaces an agent uses to enumerate the site without executing client-side code: well-known files, sitemap structure, response headers advertising alternate representations, and canonical URL hygiene. The dimension comprises a family of checks anchored on RFC 8288 (Web Linking), sitemaps.org, the robots.txt convention, and the emerging llms.txt convention. Sites scoring well in this dimension can be fully crawled and indexed by an agent in a single fetch budget.
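As an illustration of "response headers advertising alternate representations", an agent might read an RFC 8288 `Link` header to find a markdown alternate. The sketch below is a simplified parser, not a complete RFC 8288 implementation (it ignores escaped quotes and extended `*` parameter encoding); the function names and the sample header are assumptions for illustration, and it does not reproduce any actual ARWS check.

```python
import re

# <target> followed by zero or more ;name=value parameters.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+(?:\s*=\s*(?:"[^"]*"|[^\s;,]+))?)*)')
_PARAM = re.compile(r';\s*([\w*-]+)(?:\s*=\s*(?:"([^"]*)"|([^\s;,]+)))?')


def parse_link_header(value: str) -> list[dict[str, str]]:
    """Split a Link header value into dicts of target plus parameters."""
    links = []
    for target, raw_params in _LINK.findall(value):
        link = {"target": target}
        for name, quoted, bare in _PARAM.findall(raw_params):
            # Keep the first occurrence of a repeated parameter.
            link.setdefault(name.lower(), quoted or bare)
        links.append(link)
    return links


def markdown_alternates(value: str) -> list[str]:
    """Targets advertised as rel=alternate with type text/markdown."""
    return [
        link["target"]
        for link in parse_link_header(value)
        if "alternate" in link.get("rel", "").lower().split()
        and link.get("type", "").lower() == "text/markdown"
    ]
```

For a header such as `</docs/intro.md>; rel="alternate"; type="text/markdown"`, `markdown_alternates` would return `["/docs/intro.md"]`.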
5.2 Content
Can an agent retrieve content without parsing JavaScript-rendered HTML?
Content covers how the body of each page is exposed to an agent once discovered — the noscript HTML surface, machine-readable alternates, structured data (Schema.org / JSON-LD), language and encoding declarations, and how compact the agent-facing representation is relative to the human-facing one. A site that renders meaningfully without JavaScript and signals its structure through Schema.org scores well here.
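To make the structured-data part of this dimension concrete, here is a minimal sketch of how an agent could pull Schema.org JSON-LD out of server-rendered HTML without executing any JavaScript. It uses only the Python standard library; the class and function names are illustrative, and skipping malformed blocks is an assumption of this sketch rather than ARWS behavior.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Assumption: malformed JSON-LD is skipped, not fatal.


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD document found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

Because the parser works on the raw response body, it only sees markup that is present before any client-side rendering, which is the surface this dimension describes.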
5.3 Access
Does the site respect access-control intent expressed by the agent?
Access covers how the site negotiates with agents that identify themselves (both via user-agent string and via the Accept header) and how the site advertises policy (training-data permissions, rate-limit conventions, bot allow/deny lists). It is anchored on RFC 9110 content negotiation semantics, well-known URI conventions (RFC 8615), and the draft IETF rate-limit headers.
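The Accept-based negotiation mentioned above can be sketched as a small server-side selection function. This is a simplified reading of RFC 9110 proactive negotiation (it ignores media-type parameters other than `q`), and the offered types, their order, and the function names are assumptions for illustration.

```python
from typing import Optional


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # Unparseable weight is treated as "not acceptable".
        prefs[media_range] = q
    return prefs


def choose_representation(
    accept: str, offered: tuple[str, ...] = ("text/html", "text/markdown")
) -> Optional[str]:
    """Pick the offered media type the client prefers, or None if none is acceptable."""
    prefs = parse_accept(accept)
    if not prefs:
        return offered[0]  # No stated preference: serve the default representation.

    def quality(media_type: str) -> float:
        major = media_type.split("/")[0]
        # Most specific match wins: exact type, then type/*, then */*.
        for key in (media_type, f"{major}/*", "*/*"):
            if key in prefs:
                return prefs[key]
        return 0.0

    best = max(offered, key=quality)  # Ties go to the earlier (server-preferred) entry.
    return best if quality(best) > 0 else None
```

A server that varies its response this way would normally also send `Vary: Accept` so that caches keep the HTML and markdown representations apart.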
5.4 Capabilities
Can an agent act on the capabilities the site exposes?
Capabilities covers the structured action surfaces an agent can invoke once it knows the site: MCP servers, public APIs documented via OpenAPI 3.1 or GraphQL introspection, structured schema:Action markup on key flows, and webhook schemas. A site that exposes well-typed action surfaces is agent-actionable, not just agent-readable.
5.5 Commerce
Can an agent transact with the site where transaction is the intent?
Commerce covers how transactional intent is exposed and made machine-readable — pricing surfaces, structured offers, programmatic checkout, product feeds, and primary contact actions for sales-led sites. Anchored on Schema.org commerce vocabularies and emerging agent-payment surfaces.
5.6 Quality
Does the content signal editorial quality and provenance?
Quality covers the trust and provenance signals a serious agent evaluates before treating content as authoritative — declared authorship, publication and modification dates, links to authoritative entity profiles, citations, and HTTPS hygiene. A site that signals provenance clearly is more likely to be cited verbatim by a downstream agent.
The full per-check enumeration for each dimension is non-public. Detailed conformance test recipes (including check IDs, individual MUST / SHOULD / MAY requirements, and machine-evaluable assertions) are available under scanner results for any URL you submit, and are licensed separately as part of the implementation suite. Publishing the recipes publicly would enable trivial reproduction of the scoring engine; this specification deliberately publishes the dimensions, scoring, and tiers as a public contract while keeping the per-check details under the scanner’s licensing terms.
6. Scoring
Each check returns one of pass, fail, warn, or n/a. Per-dimension score is the percentage of applicable checks passed within that dimension. The overall score is the simple average of the six dimension scores, mapped to a letter grade as follows:
- A+ ≥ 90
- A 80–89
- B 70–79
- C 60–69
- D 50–59
- F < 50
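The scoring rules above reduce to a few lines of arithmetic. The sketch below makes three assumptions the text does not settle: a `warn` result counts as not passed, `n/a` results are excluded from the denominator, and a fractional score that falls between two bands (for example 89.5) takes the lower grade. Function names are illustrative.

```python
from typing import Optional


def dimension_score(results: list[str]) -> Optional[float]:
    """Percentage of applicable checks that passed; None if nothing applies."""
    applicable = [r for r in results if r != "n/a"]
    if not applicable:
        return None  # The text does not define this case.
    passed = sum(1 for r in applicable if r == "pass")  # Assumption: warn is not a pass.
    return 100.0 * passed / len(applicable)


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Simple (unweighted) average of the per-dimension scores."""
    return sum(dimension_scores.values()) / len(dimension_scores)


def letter_grade(score: float) -> str:
    """Map a 0-100 score to the letter grades listed above."""
    for floor, grade in ((90, "A+"), (80, "A"), (70, "B"), (60, "C"), (50, "D")):
        if score >= floor:
            return grade
    return "F"
```

For example, a dimension with results pass, pass, warn, fail, and n/a scores 50 under these assumptions, and six dimension scores of 100, 80, 90, 50, 40, and 60 average to 70, which maps to a B.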
7. Conformance Tiers
Three named tiers gate which checks are required vs. recommended. A site claiming “Agent-Ready” MUST meet all Basic-tier MUST checks.
- Basic (Agent-Ready): all MUST checks in Discoverability, Content, and Access pass.
- Optimized (Agent-Optimized): Basic + all SHOULD checks in Discoverability, Content, and Access pass; Capabilities and Commerce each score ≥ 50.
- First (Agent-First): Optimized + all SHOULD checks across all six dimensions pass; overall score ≥ 90.
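The tier rules above are cumulative, so they can be evaluated in order from Basic upward. The following sketch assumes each dimension has already been summarized as a dict with `must_pass`, `should_pass`, and `score` fields; that input shape, the lowercase dimension keys, and the function name are hypothetical and are not defined by this specification.

```python
from typing import Optional

CORE = ("discoverability", "content", "access")
ALL_DIMENSIONS = CORE + ("capabilities", "commerce", "quality")


def conformance_tier(dims: dict[str, dict], overall: float) -> Optional[str]:
    """Return the highest tier met: 'Basic', 'Optimized', 'First', or None."""
    # Basic: every MUST check in the three core dimensions passes.
    if not all(dims[d]["must_pass"] for d in CORE):
        return None

    # Optimized: Basic, plus core SHOULD checks and a floor on two more dimensions.
    optimized = (
        all(dims[d]["should_pass"] for d in CORE)
        and dims["capabilities"]["score"] >= 50
        and dims["commerce"]["score"] >= 50
    )
    if not optimized:
        return "Basic"

    # First: Optimized, plus SHOULD checks everywhere and overall score >= 90.
    first = all(dims[d]["should_pass"] for d in ALL_DIMENSIONS) and overall >= 90
    return "First" if first else "Optimized"
```

Evaluating in this order means a site can never be reported at a higher tier while failing a requirement of a lower one.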
8. Discovery Card
Sites that complete a scan and choose to declare their conformance tier MAY publish a discovery card at a well-known URL on their origin. The discovery card is a small JSON document containing the tier achieved, the overall score, the per-dimension scores, and the scan timestamp. The exact location, schema, and validation rules are documented as part of the scanner’s output for sites that pass.
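The four pieces of information listed above could be serialized along the following lines. Because the actual location, schema, and validation rules are only documented in the scanner output, every field name in this sketch (`tier`, `overall_score`, `dimension_scores`, `scanned_at`) is hypothetical, as is the function name; treat it as a shape illustration only.

```python
import json
from datetime import datetime, timezone


def build_discovery_card(
    tier: str,
    overall: float,
    dimension_scores: dict[str, float],
    scanned_at: datetime,
) -> str:
    """Serialize a tier declaration as JSON (hypothetical field names)."""
    if scanned_at.tzinfo is None:
        raise ValueError("scanned_at must be timezone-aware")
    card = {
        "tier": tier,
        "overall_score": overall,
        "dimension_scores": dimension_scores,
        # Timestamps are normalized to UTC in ISO 8601 form.
        "scanned_at": scanned_at.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(card, indent=2, sort_keys=True)
```

A site would then serve the resulting document as `application/json` from whichever well-known URL the scanner output specifies.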
9. IANA Considerations
This specification reuses existing IANA registrations for text/markdown (RFC 7763), HTTP headers (Link, Vary, X-Robots-Tag), and well-known URI prefixes (RFC 8615). No new registrations are requested.
10. Normative References
- RFC 2119 / RFC 8174 — Keywords for Requirement Levels
- RFC 6454 — The Web Origin Concept
- RFC 7234 — Hypertext Transfer Protocol (HTTP/1.1): Caching
- RFC 7763 — The text/markdown Media Type
- RFC 8288 — Web Linking
- RFC 8615 — Well-Known Uniform Resource Identifiers
- RFC 9110 — HTTP Semantics
- OpenAPI Specification 3.1.0
- The Model Context Protocol (MCP) v1.0
- Schema.org — Vocabulary for structured data
- sitemaps.org — XML Sitemap Protocol
- llms.txt convention
11. Versioning & Stability
This standard follows semantic versioning. The v1.0 release (2026-05-14) is the first stable release. Minor versions (v1.x) MAY add new SHOULD or MAY checks but MUST NOT change the semantics of existing MUST checks. Major versions are reserved for breaking changes. The canonical version of this document is always at https://www.startuphub.ai/spec.
12. Changelog
- 2026-05-14 — v1.0 published.
- 2026-04-20 — reference scanner shipped at /agent-readiness.
Licensed under CC BY 4.0. Anyone may implement, fork, or extend this specification with attribution. Cite as: The Agent-Ready Web Standard v1.0, StartupHub.ai, 2026-05-14, https://www.startuphub.ai/spec.