HTTPS surface reachable (robots ✓, sitemap ✗, title ✓)
Why it matters: Public files such as robots.txt, sitemap.xml, and head meta tags are the first things an attacker inspects during reconnaissance. Misadvertised paths, stale sitemaps, and verbose generator tags leak more than intended (ISO 27001 A.8.9).
robots.txt
present
# ============================================================
# Last updated: 2026-03
# ============================================================
# ── Sitemaps ────────────────────────────────────────────────
Sitemap: https://venturebeat.com/sitemap.xml
Sitemap: https://venturebeat.com/news-sitemap.xml
# ── Default rules (all crawlers not listed below) ───────────
User-agent: *
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
Disallow: /sponsored-posts
# ── Search engine crawlers ───────────────────────────────────
User-agent: Googlebot
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
User-agent: Bingbot
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
Crawl-delay: 5
User-agent: MSNBot
Disallow: /_next/
Disallow: /api/
Disallow: /login
Crawl-delay: 5
User-agent: BingPreview
Disallow: /_next/
Disallow: /api/
Disallow: /login
Crawl-delay: 5
User-agent: Slurp
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
# ── AI Crawlers — EXPLICITLY ALLOWED ────────────────────────
# Google Gemini / AI Overviews
User-agent: Google-Extended
Allow: /
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
# OpenAI / ChatGPT
User-agent: GPTBot
Allow: /
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
# Anthropic / Claude
User-agent: ClaudeBot
Allow: /
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
User-agent: Claude-Web
Allow: /
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
User-agent: anthropic-ai
Allow: /
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
# Perplexity AI
User-agent: PerplexityBot
Allow: /
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
# ── Utility / Legitimate bots ────────────────────────────────
User-agent: TermlyBot
Allow: /
# ── Blocked crawlers ─────────────────────────────────────────
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: YandexBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: amazon-QBusiness
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: ImagesiftBot
Disallow: /
User-agent: MotoMinerBot
Disallow: /
User-agent: SentiBot
Disallow: /
User-agent: IRLbot
Disallow: /
User-agent: DF Bot 1.0
Disallow: /
User-agent: proximic
Disallow: /
User-agent: AwarioSmartBot
Disallow: /
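To illustrate how the listing above reads from a reconnaissance point of view, here is a minimal Python sketch that groups Disallow paths by User-agent. The function name `disallowed_paths` and the `SAMPLE` excerpt are assumptions made for illustration. The parsing is deliberately simplified and does not implement the full matching and precedence rules of RFC 9309.

```python
def disallowed_paths(robots_txt: str) -> dict[str, list[str]]:
    """Map each User-agent to the Disallow paths listed in its group."""
    groups: dict[str, list[str]] = {}
    agents: list[str] = []   # agents that share the group being read
    seen_rule = False        # True once a rule line has followed those agents
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            continue  # Sitemap lines do not belong to any group
        if field == "user-agent":
            if seen_rule:
                # A User-agent line after rules starts a new group.
                agents, seen_rule = [], False
            agents.append(value)
            groups.setdefault(value, [])
            continue
        seen_rule = True  # Allow, Disallow, Crawl-delay, ...
        if field == "disallow" and value:
            for agent in agents:
                groups[agent].append(value)
    return groups


# Hypothetical excerpt assembled from the report above (assumption: trimmed).
SAMPLE = """\
Sitemap: https://venturebeat.com/sitemap.xml
# default rules
User-agent: *
Disallow: /search
Disallow: /_next/
Disallow: /api/
Disallow: /login
Disallow: /sponsored-posts
User-agent: AhrefsBot
Disallow: /
"""

result = disallowed_paths(SAMPLE)
for agent, paths in result.items():
    print(agent, paths)
```

In this excerpt the default group advertises /api/, /login, and /sponsored-posts. A path list like this is what a reviewer would scan for anything more sensitive than intended.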
head
- title: Vercel Security Checkpoint
- description: (none)
social
no OpenGraph or Twitter meta tags found
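A check like the head and social findings above could be approximated with Python's standard-library HTML parser. This is a sketch under assumptions: `inspect_head` and the sample markup are hypothetical, and the sample only mirrors the title reported here, not the actual response body.

```python
from html.parser import HTMLParser


class _HeadParser(HTMLParser):
    """Collect the title, meta description, and OpenGraph/Twitter tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = None
        self.description = None
        self.social = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # OpenGraph uses property="og:...", Twitter cards use name="twitter:..."
            key = (attr.get("property") or attr.get("name") or "").lower()
            content = attr.get("content")
            if key == "description":
                self.description = content
            elif key.startswith(("og:", "twitter:")):
                self.social[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def inspect_head(html: str) -> dict:
    """Return the head fields that a report of this kind would list."""
    parser = _HeadParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip() if parser.title is not None else None
    return {"title": title, "description": parser.description, "social": parser.social}


# Hypothetical markup mirroring the title reported above.
SAMPLE_HTML = "<html><head><title>Vercel Security Checkpoint</title></head><body></body></html>"
report = inspect_head(SAMPLE_HTML)
print(report)
```

An empty `social` mapping corresponds to the "no OpenGraph or Twitter meta tags found" result, and a `description` of `None` corresponds to the missing description.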