HTTPS surface reachable (robots ✓, sitemap ✗, title ✗)
Why it matters: Public files such as robots.txt, sitemap.xml, and HTML head metadata are often the first things an attacker reads during reconnaissance. Disallow rules that advertise sensitive paths, stale sitemaps, and verbose generator tags can leak more than intended (ISO 27001 A.8.9).
robots.txt
present
User-agent: usasearch
Crawl-delay: 2
User-agent: Mediapartners-Google*
Disallow: /
User-agent: bingbot
Disallow: /
User-agent: msnbot
Disallow: /
User-agent: IsraBot
Disallow: /
User-agent: Orthogaffe
Disallow: /
User-agent: UbiCrawler
Disallow: /
User-agent: DOC
Disallow: /
User-agent: Zao
Disallow: /
User-agent: sitecheck.internetseer.com
Disallow: /
User-agent: Zealbot
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: Fetch
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: WebZIP
Disallow: /
User-agent: linko
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: Microsoft.URL.Control
Disallow: /
User-agent: Xenu
Disallow: /
User-agent: larbin
Disallow: /
User-agent: libwww
Disallow: /
User-agent: ZyBORG
Disallow: /
User-agent: Download Ninja
Disallow: /
User-agent: wget
Disallow: /
User-agent: grub-client
Disallow: /
User-agent: k2spider
Disallow: /
User-agent: NPBot
Disallow: /
User-agent: WebReaper
Disallow: /
User-agent: *
Disallow: /js/
Disallow: /preview/
Disallow: /*.js$
Disallow: /*.js.map$
Disallow: /*.json$
Disallow: /readingroom/search/
Disallow: /readingroom/advanced-search-view/
Disallow: /readingroom/request/
Crawl-delay: 10
Sitemap: https://www.cia.gov/sitemap/sitemap-0.xml
Sitemap: https://www.cia.gov/readingroom/sitemap.xml
Sitemap: https://www.cia.gov/the-world-factbook/sitemap/sitemap-0.xml
Host: https://www.cia.gov
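To review which paths a robots.txt like the one above advertises, the rules can be grouped by user agent. The following is a minimal sketch, not the scanner's own logic: `parse_robots` and `SAMPLE` are hypothetical names, and `SAMPLE` is a shortened excerpt of the file listed above. It assumes that any directive other than User-agent or Sitemap (for example Crawl-delay) closes the current user-agent group; real crawlers may group differently.

```python
from collections import defaultdict


def parse_robots(text):
    """Group Disallow patterns by user agent and collect Sitemap URLs."""
    rules = defaultdict(list)  # user agent -> list of Disallow patterns
    sitemaps = []
    agents = []                # user agents of the group currently being read
    in_rules = False           # True once the current group has seen a directive
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A directive already followed the previous User-agent lines,
                # so this line starts a new group.
                agents, in_rules = [], False
            agents.append(value)
        elif field == "sitemap":
            # Sitemap lines are global and do not belong to any group.
            sitemaps.append(value)
        else:
            # Assumption: every other directive closes the User-agent list.
            in_rules = True
            if field == "disallow":
                for agent in agents:
                    rules[agent].append(value)
    return dict(rules), sitemaps


# Shortened excerpt of the robots.txt shown above.
SAMPLE = """\
User-agent: usasearch
Crawl-delay: 2
User-agent: wget
Disallow: /
User-agent: *
Disallow: /js/
Disallow: /preview/
Disallow: /readingroom/search/
Crawl-delay: 10
Sitemap: https://www.cia.gov/sitemap/sitemap-0.xml
Host: https://www.cia.gov
"""

rules, sitemaps = parse_robots(SAMPLE)
for agent, patterns in rules.items():
    print(agent, patterns)
print("sitemaps:", sitemaps)
```

Only the Disallow patterns are collected here because those are the lines that name paths. Wildcard patterns such as /*.js$ are kept as plain strings and are not expanded or matched against URLs.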
social
- og:title
- og:description
- og:image
- og:url
- twitter:title
- twitter:description
- twitter:card
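
A list like the one above can be reproduced by collecting og: and twitter: meta tags from a page's HTML. This is a sketch using only the Python standard library, not the tool that produced this report: `SocialMetaParser` and `SAMPLE_HTML` are hypothetical names, and the markup and content values in `SAMPLE_HTML` are invented for illustration.

```python
from html.parser import HTMLParser


class SocialMetaParser(HTMLParser):
    """Collect Open Graph (og:) and Twitter card (twitter:) meta tags."""

    def __init__(self):
        super().__init__()
        self.tags = {}  # tag name -> content value

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph tags usually use property=, Twitter tags usually use name=.
        name = attr.get("property") or attr.get("name") or ""
        if name.startswith(("og:", "twitter:")):
            self.tags[name] = attr.get("content") or ""


# Invented sample markup; the content values are placeholders.
SAMPLE_HTML = """
<html><head>
<meta charset="utf-8">
<meta property="og:title" content="Example title">
<meta property="og:url" content="https://example.com/">
<meta name="twitter:card" content="summary" />
<meta name="description" content="Not a social tag">
</head><body></body></html>
"""

parser = SocialMetaParser()
parser.feed(SAMPLE_HTML)
for name in sorted(parser.tags):
    print("-", name)
```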