HTTPS surface reachable (robots ✓, sitemap ✓, title ✓)
Why it matters: Public files (robots.txt, sitemap.xml, and head meta tags) are the first things an attacker sees during reconnaissance. Paths advertised in robots.txt, stale sitemap entries, and verbose generator tags can disclose more than intended (ISO 27001 A.8.9).
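To review what a robots.txt file advertises, it can help to list the Disallow paths per crawler. The following is a minimal sketch, not the scanner's own logic: the function name `advertised_paths` is hypothetical, and the grouping rules are a simplification of the Robots Exclusion Protocol (only Allow and Disallow count as rule lines, and wildcard matching is not evaluated).

```python
from collections import defaultdict


def advertised_paths(robots_txt: str) -> dict[str, list[str]]:
    """Map each User-agent token to the non-empty Disallow paths in its group.

    Hypothetical helper for manual review; it lists paths, it does not
    decide whether a given URL is allowed.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    current_agents: list[str] = []
    in_rules = False  # True once the current group has seen Allow/Disallow

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value)
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow blocks nothing, so skip it
                for agent in current_agents:
                    groups[agent].append(value)
        elif field == "allow":
            in_rules = True
        # Other fields (Crawl-delay, Sitemap, ...) are ignored here.
    return dict(groups)
```

Calling `advertised_paths(text)["*"]` returns the paths listed for the catch-all group, which are the ones worth checking for unintended disclosure. Note that this sketch does not treat a Sitemap line as ending a group, so Disallow lines that follow one are attributed to the preceding User-agent; other parsers may behave differently.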
robots.txt
present
User-agent: Mediapartners-Google*
Disallow:
User-agent: Nutch
Crawl-delay: 5
Disallow:
User-agent: Slurp
Disallow: /*.gif$
Disallow: /*.jpg$
# --- Amazon / AWS crawlers (begin) ---
# AmazonAdbot is intentionally allowed (ad targeting — see HLMR-6504)
User-agent: AmazonAdbot
Disallow:
User-agent: Amazonbot
Disallow: /
User-agent: Amzn-SearchBot
Disallow: /
User-agent: Amzn-User
Disallow: /
User-agent: amazon-kendra
Disallow: /
User-agent: amazon-QBusiness
Disallow: /
User-agent: bedrockbot
Disallow: /
User-agent: aws-quick-on-behalf-of-
Disallow: /
User-agent: AmazonProductDiscoverybot
Disallow: /
User-agent: AmazonBuyForMe
Disallow: /
User-agent: AmazonSellerInitiatedListing
Disallow: /
User-agent: NovaAct
Disallow: /
# --- Amazon / AWS crawlers (end) ---
User-agent: *
Crawl-delay: 5
Disallow: /linkfwd.php
Disallow: /counters.php
# Wordpress Previews
Disallow: /articles/mnt-*
Disallow: /program/mnt-*
# API Routes
Disallow: /api/*
# Invalid URLs
Disallow: */null$
Disallow: */inline$
User-agent: GPTBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: omgili
Disallow: /
User-agent: Timpibot
Disallow: /
User-agent: Webzio-Extended
Disallow: /
User-agent: DDM-DCipher/1.0.7
Disallow: /
# Sitemaps
Sitemap: https://www.medicalnewstoday.com/sitemap.xml
# Widget Sampler
Disallow: /articles/widget-sampler
# Static Test Articles
Disallow: /test/
# Block AMP URLs
Disallow: /amp/
sitemap.xml
present (2 URLs)
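A count like the one above could be reproduced by collecting the `<loc>` entries of the sitemap. This is a sketch under assumptions: the function name `sitemap_locations` is hypothetical, the XML is assumed to use the standard sitemaps.org namespace, and the report does not say whether the entries are page URLs or child sitemaps in an index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locations(xml_text: str) -> list[str]:
    """Return every <loc> value found in a urlset or sitemapindex document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text  # skip empty <loc/> elements
    ]
```

`len(sitemap_locations(xml_text))` gives the entry count. Comparing the returned URLs against live responses is one way to spot stale entries.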
head
- title: Medical and health information | MedicalNewsToday
- description: Medical news and health news headlines posted throughout the day, every day
social
- og:title: Medical and health information | MedicalNewsToday
- og:description: Medical news and health news headlines posted throughout the day, every day
- og:type: article
- og:url: https://www.medicalnewstoday.com
- og:image: https://assets.medicalnewstoday.com/content/mnt_sharing.png
- twitter:title: Medical and health information | MedicalNewsToday
- twitter:description: Medical news and health news headlines posted throughout the day, every day
- twitter:card: summary_large_image
- twitter:image: https://assets.medicalnewstoday.com/content/mnt_sharing.png
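The head and social fields listed above can be gathered from a page's HTML with the standard library. This is an illustrative sketch only: `HeadMetaParser` and `extract_head` are hypothetical names, and the scanner's actual extraction method is not described in this report.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect the <title> text and <meta> name/property -> content pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use "property"; most others use "name".
            key = attr.get("property") or attr.get("name")
            if key and attr.get("content") is not None:
                self.meta[key] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_head(html: str) -> dict[str, str]:
    """Return the page title plus all meta fields as one flat dictionary."""
    parser = HeadMetaParser()
    parser.feed(html)
    return {"title": parser.title.strip(), **parser.meta}
```

A `generator` key in the result would indicate that the page advertises its CMS or framework, which is the kind of detail worth reviewing for unnecessary disclosure.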