HTTPS surface reachable (robots ✓, sitemap ✓, title ✓)
Why it matters: Public files such as robots.txt, sitemap.xml, and the head metadata are among the first things an attacker reads during reconnaissance. Misadvertised paths, stale sitemaps, and verbose generator tags can disclose more than intended (ISO 27001 A.8.9).
robots.txt
present
User-Agent: *
Disallow: /travel/event/search/
Disallow: /car/index.html
Disallow: /housing/index.html
Disallow: /english/newsfeatures.html
Disallow: /english/business.html
Disallow: /english/cooljapan.html
Disallow: /english/sports.html
Disallow: /*/search/results*
Allow: /
Allow: /.well-known/assetlinks.json
Allow: /ads/tu/
User-agent: Googlebot
Disallow: /*klpuid=*
User-agent: CCBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Google-CloudVertexBot
Disallow: /
User-agent: ICC-Crawler
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Claude-SearchBot
Disallow: /
User-agent: Claude-User
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: Cohere-training-data-crawler
Disallow: /
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Perplexity-ai
Disallow: /
User-agent: Perplexity-User
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: Meta-externalfetcher
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Gensparkbot
Disallow: /
User-agent: AmazonBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: Magpie-crawler
Disallow: /
User-agent: Scrapy
Disallow: /
User-agent: Timpibot
Disallow: /
User-agent: Webzio-Extended
Disallow: /
User-agent: SBIntuitionsBot
Disallow: /
User-agent: SBIntuitions-SearchBot
Disallow: /
User-agent: OpenindexSpider
Disallow: /
User-agent: AI2Bot
Disallow: /
sitemap: https://www.asahi.com/sitemap.xml
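To double-check how rules like the ones listed above resolve for a given crawler, a minimal sketch with Python's standard `urllib.robotparser` might look like the following. The excerpt is a shortened copy of the listing, and `ExampleBot`, `build_parser`, and `check_access` are hypothetical names chosen for illustration. Note that the stdlib parser applies the first matching rule and does not interpret `*` wildcards inside paths, so its verdicts can differ from crawlers that use longest-match semantics; the wildcard lines are omitted here for that reason.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt listed above.
# Wildcard path rules are left out because urllib.robotparser
# treats "*" inside a path literally (assumption noted in the lead-in).
ROBOTS_EXCERPT = """\
User-Agent: *
Disallow: /car/index.html
Disallow: /housing/index.html
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def check_access(robots_text: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, per user agent, whether the parsed rules allow fetching url."""
    parser = build_parser(robots_text)
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # "ExampleBot" is a made-up agent that falls through to the "*" group.
    verdicts = check_access(
        ROBOTS_EXCERPT,
        ["ExampleBot", "GPTBot", "ClaudeBot"],
        "https://www.asahi.com/",
    )
    for agent, allowed in verdicts.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

In this sketch an unlisted agent falls back to the `*` group and may fetch the home page, while the named AI crawlers are blocked site-wide. robots.txt is advisory only, so this says what well-behaved crawlers should do, not what is actually enforced.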
sitemap.xml
present — 8 url(s)
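A URL count like the one above can be reproduced by listing the `<loc>` entries of the sitemap. The sketch below assumes the standard sitemaps.org namespace; the sample document and the `example.com` URLs are placeholders, not the contents of the scanned sitemap, and `list_sitemap_locs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Placeholder sitemap for illustration only (not the scanned site's data).
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""


def list_sitemap_locs(xml_text: str) -> list[str]:
    """Return every <loc> value, for both <urlset> and <sitemapindex> roots."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


if __name__ == "__main__":
    locs = list_sitemap_locs(SAMPLE_SITEMAP)
    print(f"{len(locs)} url(s)")
    for loc in locs:
        print(" ", loc)
```

Because a sitemap index lists child sitemaps rather than pages, a small count can simply mean the file is an index. Checking the root element name distinguishes the two cases.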
head
- title
- 朝日新聞:朝日新聞社のニュースサイト (English: "Asahi Shimbun: news site of The Asahi Shimbun Company")
- description
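- 朝日新聞社のニュースサイトです。政治、経済、社会、国際、スポーツ、文化、科学などの速報ニュースに加え、教育、医療、環境などの話題や写真も。 (English: "The news site of The Asahi Shimbun Company. In addition to breaking news on politics, the economy, society, international affairs, sports, culture, and science, it also offers topics such as education, healthcare, and the environment, along with photos.")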
social
- og:locale
- ja_JP
- og:title
- 朝日新聞:朝日新聞社のニュースサイト (English: "Asahi Shimbun: news site of The Asahi Shimbun Company")
- og:url
- https://www.asahi.com/
- og:image
- https://www.asahicom.jp/images/logo_ogp.png
- og:site_name
- 朝日新聞
- og:type
- website
- og:description
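- 朝日新聞社のニュースサイトです。政治、経済、社会、国際、スポーツ、文化、科学などの速報ニュースに加え、教育、医療、環境などの話題や写真も。 (English: "The news site of The Asahi Shimbun Company. In addition to breaking news on politics, the economy, society, international affairs, sports, culture, and science, it also offers topics such as education, healthcare, and the environment, along with photos.")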
- twitter:card
- summary
- twitter:site
- @asahi
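The head and social fields above can be collected from a page's HTML with the standard-library `html.parser`. The following is a minimal sketch, not the scanner's actual implementation. The sample markup is a hand-written stand-in that reuses values from this report, the choice of `property` versus `name` attributes in it is an assumption, and `HeadMetaParser` and `extract_head_meta` are hypothetical names.

```python
from html.parser import HTMLParser

# Hand-written stand-in for a page head; attribute layout is an assumption.
SAMPLE_HTML = """<html><head>
<title>朝日新聞:朝日新聞社のニュースサイト</title>
<meta property="og:type" content="website">
<meta property="og:url" content="https://www.asahi.com/">
<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@asahi">
</head><body></body></html>
"""


class HeadMetaParser(HTMLParser):
    """Collect the <title> text and <meta> name/property -> content pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use "property"; most others use "name".
            key = attr.get("property") or attr.get("name")
            content = attr.get("content")
            if key and content is not None:
                self.meta[key] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_head_meta(html: str) -> tuple[str, dict[str, str]]:
    """Return (title, meta) parsed from an HTML string."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta


if __name__ == "__main__":
    title, meta = extract_head_meta(SAMPLE_HTML)
    print("title:", title)
    for key, value in meta.items():
        print(f"{key}: {value}")
```

A review pass could then flag keys such as `generator` in the returned dictionary, since those are the fields most likely to reveal software names or versions.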