HTTPS surface reachable (robots ✓, sitemap ✓, title ✓)
Why it matters: Public files — robots.txt, sitemap.xml, head meta — are what attackers see first during reconnaissance. Misadvertised paths, stale sitemaps, and verbose generators leak more than intended (ISO 27001 A.8.9).
robots.txt
present
# Robots.txt for MLflow Documentation
# Optimized for AI crawlers to prioritize latest documentation
# Default rules for all crawlers
User-agent: *
# Allow latest documentation
Allow: /docs/latest/
# Disallow all legacy documentation versions
Disallow: /docs/1.*/
Disallow: /docs/2.*/
Disallow: /docs/3.*/
Disallow: /docs/0.*/
# Specific rules for AI crawlers
# OpenAI (ChatGPT)
User-agent: ChatGPT-User
User-agent: GPTBot
Allow: /docs/latest/
Disallow: /docs/1.*/
Disallow: /docs/2.*/
Disallow: /docs/3.*/
Disallow: /docs/0.*/
# Google Gemini
User-agent: Google-Extended
Allow: /docs/latest/
Disallow: /docs/1.*/
Disallow: /docs/2.*/
Disallow: /docs/3.*/
Disallow: /docs/0.*/
# Anthropic Claude
User-agent: ClaudeBot
User-agent: Claude-Web
Allow: /docs/latest/
Disallow: /docs/1.*/
Disallow: /docs/2.*/
Disallow: /docs/3.*/
Disallow: /docs/0.*/
# Common Crawl (used by many AI systems)
User-agent: CCBot
Allow: /docs/latest/
Disallow: /docs/1.*/
Disallow: /docs/2.*/
Disallow: /docs/3.*/
Disallow: /docs/0.*/
# Perplexity
User-agent: PerplexityBot
Allow: /docs/latest/
Disallow: /docs/1.*/
Disallow: /docs/2.*/
Disallow: /docs/0.*/
# Cohere
User-agent: cohere-ai
Allow: /docs/latest/
Disallow: /docs/1.*/
Disallow: /docs/2.*/
Disallow: /docs/3.*/
Disallow: /docs/0.*/
# Sitemap location
Sitemap: https://mlflow.org/sitemap.xml
Sitemap: https://mlflow.org/docs/latest/sitemap.xml
sitemap.xml
present — 2 url(s)
head
- title
- MLflow - Open Source AI Platform for Agents, LLMs & Models
- description
- —
social
no OpenGraph or Twitter meta tags found