HTTPS surface reachable (robots ✓, sitemap ✗, title ✓)
Why it matters: Public files (robots.txt, sitemap.xml, head meta tags) are the first thing an attacker reads during reconnaissance. Misadvertised paths, stale sitemaps, and verbose generator tags leak more than intended (ISO 27001 A.8.9).
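To illustrate what a reconnaissance pass can read from a robots.txt file, here is a minimal sketch that groups Allow/Disallow rules by user agent and collects the advertised Sitemap URLs. The function name summarize_robots and the SAMPLE excerpt are assumptions made for illustration. The parser is deliberately simplified and is not a full implementation of the Robots Exclusion Protocol.

```python
from collections import defaultdict

# Hypothetical excerpt; a real audit would read the fetched robots.txt body.
SAMPLE = """\
User-agent: *
Allow: /ws/1/live/*
Disallow: /api/
Disallow: /beta
Sitemap: https://www.lemonde.fr/sitemap_index.xml

User-agent: CCBot
Disallow: /
"""


def summarize_robots(text):
    """Return (rules, sitemaps); rules maps each user agent to (directive, path) pairs."""
    rules = defaultdict(list)
    sitemaps = []
    agents = []        # user agents of the group currently being read
    in_rules = False   # True once the current group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a rule was seen, so a new group starts
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                rules[agent].append((field, value))
        elif field == "sitemap":              # sitemap lines are independent of groups
            sitemaps.append(value)
    return dict(rules), sitemaps


rules, sitemaps = summarize_robots(SAMPLE)
disallowed = [path for directive, path in rules["*"] if directive == "disallow"]
print(disallowed)
print(sitemaps)
```

Every Disallow entry in that list is a path the site has publicly named, so it is worth confirming that each one is protected by access control and not only by the crawl directive.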
robots.txt
present
# 16/08/2019
# The use of web crawlers or other automated methods of browsing or navigating this website is prohibited.
# We prohibit crawling our website with a stolen user agent that does not match your identity.
# "Infringement of the database producer's rights - article L 342-1 et seq. of the Code de la propriété intellectuelle".
# We invite you to contact us to arrange a usage license. Only partners are authorized to use our content for anything other than strictly individual use.
User-agent: *
Allow: /ws/1/live/*
Allow: /ws/1/related_content/*
Disallow: /ajax/
Disallow: /ajah/
Disallow: /api/
Disallow: /beta
Disallow: /element/commun/afficher/
Disallow: /petites-annonces/
Disallow: /qui-sommes-nous/
Disallow: /txt/
Disallow: /verification/source/*
Disallow: /noscript/
Disallow: /ws/*
Disallow: /lemonde-beta/*
Disallow: /_rprt/*
Disallow: /layout/*
Disallow: /cgi-bin/*
Disallow: /envoyer-par-email/*
Disallow: /lmdgft/*
Disallow: /article-offert/*
Disallow: /*?s=43260*
Disallow: /*?contributions
Disallow: */reactions/
Disallow: */mmpub/
# WordPress
Disallow: /blog/*/wp-admin/
Disallow: /blog/*/wp-includes/
Disallow: /blog/*/wp-content/plugins/
Disallow: /blog/*/wp-content/themes/
Disallow: /blog/*/wp-login.php
Disallow: /blog/*/wp-register.php
Disallow: /blog/*/author/admin/
# Search
Disallow: /recherche/?*search_keywords=*
# Sitemaps
Sitemap: https://www.lemonde.fr/sitemap_news.xml
Sitemap: https://www.lemonde.fr/sitemap_index.xml
# Sitemaps EN
Sitemap: https://www.lemonde.fr/en/sitemap_news.xml
Sitemap: https://www.lemonde.fr/en/sitemap_index.xml
User-agent: Googlebot-Image
Allow: /image/
User-agent: Googlebot-News
Disallow: /archives/
# Robots excluded from all indexing.
User-agent: Meltwater
Disallow: /
User-agent: Cision
Disallow: /
User-agent: Talkwater
Disallow: /
User-agent: Jetbot
Disallow: /
User-agent: kbcrawl
Disallow: /
User-agent: Newzbin
Disallow: /
User-agent: Qwam content intelligence
Disallow: /
User-agent: flipboard
Disallow: /
User-agent: Youmag
Disallow: /
User-agent: Synthesio
Disallow: /
User-agent: trendybuzz
Disallow: /
User-agent: scoop.it
Disallow: /
User-agent: linkfluence
Disallow: /
User-agent: grub-client
Disallow: /
User-agent: ia_archiver-web.archive.org
Allow: /$
Disallow: /*
User-agent: k2spider
Disallow: /
User-agent: libwww
Disallow: /
User-agent: wget
Disallow: /
User-agent: 5erue
Disallow: /
User-agent: adequat
Disallow: /
User-agent: adequat-systems
Disallow: /
User-agent: coexel
Disallow: /
User-agent: leadbox
Disallow: /
User-agent: mention
Disallow: /
User-agent: mytwip
Disallow: /
User-agent: opinion-tracker
Disallow: /
User-agent: proxem
Disallow: /
User-agent: score3
Disallow: /
User-agent: vecteurplus
Disallow: /
User-agent: verticalsearch
Disallow: /
User-agent: vsw
Disallow: /
User-agent: winello
Disallow: /
User-agent: Fetch
Disallow: /
User-agent: infoseek
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: sitecheck.internetseer.com
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: Zealbot
Disallow: /
User-agent: asknread.com
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: omgili
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: Webzio-Extended
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Timpibot
Disallow: /
User-agent: AI2Bot
Disallow: /
User-agent: Applebot
Disallow: /
User-agent: cohere-training-data-crawler
Disallow: /
User-agent: DuckAssistBot
Disallow: /
User-agent: Kangaroo Bot
Disallow: /
User-agent: PanguBot
Disallow: /
User-agent: MistralAI-User
Disallow: /
# Special case for Facebook bots
User-agent: face
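One way to verify that per-crawler blocks like those listed above behave as intended is to feed the rules to Python's standard urllib.robotparser and ask it which agents may fetch a given URL. This is a sketch under assumptions: the SAMPLE excerpt, the helper name blocked_agents, and the agent name ExampleBrowser are hypothetical. Python's parser matches rule paths by prefix and does not interpret `*` wildcards inside a path, so results for wildcard rules can differ from what major search crawlers do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt of the kind of rules shown in the report.
SAMPLE = """\
User-agent: *
Disallow: /api/

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def blocked_agents(robots_text, agents, url):
    """Return the subset of agents that the rules forbid from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


result = blocked_agents(
    SAMPLE,
    ["ClaudeBot", "CCBot", "ExampleBrowser"],
    "https://www.lemonde.fr/en/",
)
print(result)
```

These directives are advisory: a crawler that ignores robots.txt or spoofs its user agent is not stopped by them.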
head
- title
- Le Monde in English – World news, culture and opinion
- description
- France
social
- og:site_name
- Le Monde.fr
- og:type
- website
- og:locale
- en_US
- og:image:type
- image/jpeg
- og:title
- Le Monde - World news, culture and opinion from the unique perspective of the leading French newspaper
- og:description
- Le Monde - World news, culture and opinion from the unique perspective of the leading French newspaper
- og:url
- https://www.lemonde.fr/en/
- og:image
- https://asset.lemde.fr/medias/img/social-network/default.png
- og:image:width
- 1880
- og:image:height
- 984
- twitter:site
- @LeMonde_EN
- twitter:url
- https://www.lemonde.fr/en/
- twitter:card
- summary
- twitter:image
- https://asset.lemde.fr/medias/img/social-network/default.png
- twitter:title
- Le Monde - World news, culture and opinion from the unique perspective of the leading French newspaper
- twitter:description
- Le Monde - World news, culture and opinion from the unique perspective of the leading French newspaper
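The head and social fields above can be collected with a small parser, which also makes it easy to flag a generator tag if one is present. The sketch below uses Python's standard html.parser. The class name HeadMetaParser and the SAMPLE_HTML page are assumptions made for illustration, and the generator value is invented. No generator tag appears in the scan output above.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use name=, Open Graph tags use property=.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page head; the generator tag is invented for illustration.
SAMPLE_HTML = """<html><head>
<title>Example title</title>
<meta name="description" content="France">
<meta property="og:type" content="website">
<meta name="generator" content="ExampleCMS 1.2.3">
</head><body></body></html>"""

parser = HeadMetaParser()
parser.feed(SAMPLE_HTML)
leaky = {key: value for key, value in parser.meta.items() if key == "generator"}
print(parser.title)
print(leaky)
```

A non-empty result for the generator key would indicate that the page advertises its software and version, which is the kind of detail worth removing from public responses.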