HTTPS surface reachable (robots ✓, sitemap ✗, title ✓)
Why it matters: Public files (robots.txt, sitemap.xml, head meta) are among the first things an attacker inspects during reconnaissance. Misadvertised paths, stale sitemaps, and verbose generator tags leak more than intended (ISO 27001 A.8.9).
robots.txt
present
user-agent: *
Allow: /
Allow: */cricket/test/
Allow: */cricketworldcup2019/Test/
Disallow: /toets/
Disallow: */toets/*
Disallow: /test/
Disallow: */_test/*
Disallow: */testpolar/*
Disallow: */test/*
Disallow: /xArchive/Archive/Illegal-liquor-export-20010319
Disallow: /.well-known/
Disallow: /assetlinks.json
User-agent: Twitterbot
Allow: /
User-agent: ia_archiver
Disallow: /BreakingNewsSms
User-Agent: MauiBot
Disallow: /
# AI Assistants
User-agent: chatgpt-user
Disallow: /
# AI Data Scrapers
User-agent: bytespider
Disallow: /
User-agent: ccbot
Disallow: /
User-agent: diffbot
Disallow: /
User-agent: facebookbot
Disallow: /
User-agent: google-extended
Disallow: /
User-agent: gptbot
Disallow: /
User-agent: omgili
Disallow: /
# AI Search Crawlers
User-agent: amazonbot
Disallow: /
User-agent: applebot
Disallow: /
User-agent: perplexitybot
Disallow: /
User-agent: youbot
Disallow: /
# Scrapers
User-agent: 008
Disallow: /
User-agent: dataprovider-com
Disallow: /
User-agent: dcrawl
Disallow: /
User-agent: httrack
Disallow: /
User-agent: httrack-3-0
Disallow: /
User-agent: metainspector
Disallow: /
User-agent: newspaper
Disallow: /
User-agent: nutch
Disallow: /
User-agent: offline-explorer
Disallow: /
User-agent: scrapy
Disallow: /
# SEO Crawlers
User-agent: ahrefsbot
Disallow: /
User-agent: barkrowler
Disallow: /
User-agent: blexbot
Disallow: /
User-agent: dataforseobot
Disallow: /
User-agent: domainstatsbot
Disallow: /
User-agent: dotbot
Disallow: /
User-agent: hypestat
Disallow: /
User-agent: linkdexbot
Disallow: /
User-agent: mj12bot
Disallow: /
User-agent: screaming-frog-seo-spider
Disallow: /
User-agent: semrushbot
Disallow: /
User-agent: semrushbot-ba
Disallow: /
User-agent: semrushbot-ct
Disallow: /
User-agent: semrushbot-si
Disallow: /
User-agent: semrushbot-swa
Disallow: /
User-agent: serpstatbot
Disallow: /
User-agent: zoombot
Disallow: /
# Undocumented AI Agents
User-agent: anthropic-ai
Disallow: /
User-agent: claude-web
Disallow: /
User-agent: claudebot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: Bytespider
Disallow: /
Sitemap: https://www.news24.com/sitemap
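The listing above can be reviewed the way a reconnaissance script would read it: group the rules by user-agent, then pull out the paths the site asks crawlers to avoid and the sitemap it advertises. The following is a minimal sketch, not the scanner's actual implementation. The names `parse_robots` and `sample` are hypothetical, and `sample` is an abbreviated excerpt of the file above. The sketch only lists patterns; it does not evaluate wildcards such as `*/test/*`.

```python
from collections import defaultdict


def parse_robots(text):
    """Return (rules, sitemaps).

    rules maps a lowercased user-agent to a list of (directive, path) tuples.
    Hypothetical helper for illustration only.
    """
    rules = defaultdict(list)
    sitemaps = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                rules[agent].append((field, value))
        elif field == "sitemap":              # sitemap lines apply file-wide
            sitemaps.append(value)
    return dict(rules), sitemaps


# Abbreviated excerpt of the robots.txt shown above.
sample = """\
user-agent: *
Allow: /
Disallow: /toets/
Disallow: /test/
Disallow: /.well-known/
User-agent: gptbot
Disallow: /
Sitemap: https://www.news24.com/sitemap
"""

rules, sitemaps = parse_robots(sample)
hidden = [path for kind, path in rules["*"] if kind == "disallow"]
print("Disallowed for *:", hidden)
print("Declared sitemaps:", sitemaps)
```

Each disallowed path is a candidate for manual review, since robots.txt advertises a path without protecting it. The declared sitemap URL is worth checking separately, because the summary line marks the sitemap check as failed while the file still advertises one.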
head
- title: Top Stories | News24
- description: (no value)
social
- og:site_name: News24
- og:type: website
- og:image: https://www.news24.com/images/tenants/news24/Logo.svg
- og:image:secure_url: https://www.news24.com/images/tenants/news24/Logo.svg
- og:image:alt: Top Stories | News24
- og:title: Top Stories | News24
- og:description: (no value)
- twitter:widgets:csp: on
- twitter:card: summary_large_image
- twitter:site: news24
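In the listing above, description and og:description appear without a value. A check for missing or empty meta content could look like the sketch below, which uses only the standard library. `MetaCollector`, `empty_meta`, and `sample_html` are hypothetical names, and the sample markup is an assumed reconstruction for illustration, not the page's real source.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> tags keyed by their name or property attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("property")
        if key:
            # A missing content attribute is treated the same as an empty one.
            self.meta[key] = (attributes.get("content") or "").strip()


def empty_meta(html, wanted=("description", "og:description")):
    """Return the wanted meta keys that are absent or have empty content."""
    collector = MetaCollector()
    collector.feed(html)
    return [key for key in wanted if not collector.meta.get(key)]


# Assumed markup, shaped to match the listing above.
sample_html = """
<head>
  <title>Top Stories | News24</title>
  <meta name="description" content="">
  <meta property="og:site_name" content="News24">
  <meta property="og:description" content="">
</head>
"""

print(empty_meta(sample_html))
```

The same function can be pointed at other keys through the `wanted` parameter, for example a generator tag if the review also covers version disclosure.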