HTTPS surface reachable (robots ✓, sitemap ✓, title ✓)
Why it matters: Public files (robots.txt, sitemap.xml, and the page's head metadata) are what attackers see first during reconnaissance. Disallow rules that name sensitive paths, stale sitemaps, and verbose generator meta tags can disclose more than intended (ISO 27001 A.8.9).
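A minimal Python sketch of that reconnaissance view. It lists every path named in an Allow or Disallow line and flags the ones containing keywords that often mark non-public surfaces. The keyword list (SENSITIVE_HINTS) and the function names are hypothetical choices for illustration, not part of any scanner or standard. The sample paths come from the robots.txt captured below.

```python
import re

# Hypothetical keyword list: path fragments that often mark non-public surfaces.
SENSITIVE_HINTS = ("debug", "draft", "test", "admin", "internal", "staging", "exam")

# Matches "Allow: <path>" or "Disallow: <path>" at the start of a cleaned line.
_RULE = re.compile(r"(allow|disallow)\s*:\s*(\S+)", re.IGNORECASE)


def advertised_paths(robots_txt: str) -> list:
    """Every path named in an Allow/Disallow line, de-duplicated, in file order."""
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = _RULE.match(line)
        if match:  # an empty "Disallow:" has no path and is skipped
            seen.setdefault(match.group(2), None)
    return list(seen)


def flag_sensitive(robots_txt: str, hints=SENSITIVE_HINTS) -> list:
    """Advertised paths that contain any hint as a case-insensitive substring."""
    return [p for p in advertised_paths(robots_txt)
            if any(h in p.lower() for h in hints)]


sample = """User-Agent: *
Disallow: /search
Disallow: /blog/draft-blogs
Disallow: /tests/
Disallow: /sentry-debug
Disallow: /credentials/exam*
"""
print(flag_sensitive(sample))
```

Substring matching is deliberately crude: "test" also matches /blog/latest-news, so the output is a list to review, not a verdict.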
robots.txt
present
# ===========================================
# ubuntu.com robots.txt
# Strategy:
# - Block private/transactional/search paths for all.
# - Group major AI Retrieval & Crawling bots to optimize crawl budget.
# - Allow high-value product/doc paths; block noise.
# ===========================================
# ===========================================
# DEFAULT RULES — all crawlers
# ===========================================
User-Agent: *
Disallow: /search
Disallow: /search*
Disallow: /*/search*
Disallow: /account
Disallow: /account/*
Disallow: /login
Disallow: /logout
Disallow: /pro/dashboard
Disallow: /pro/users
Disallow: /pro/account-users
Disallow: /pro/subscribe
Disallow: /pro/activate
Disallow: /pro/attach
Disallow: /pro/offer
Disallow: /pro/offers
Disallow: /pro/renewals/
Disallow: /pro/contracts/
Disallow: /pro/trial/
Disallow: /pro/set-auto-renewal
Disallow: /pro/user-subscriptions
Disallow: /pro/distributor/users
Disallow: /pro/distributor/invoice
Disallow: /pro/distributor/thank-you
Disallow: /account.json
Disallow: /mirrors.json
Disallow: /pro/subscriptions.json
Disallow: /pro/offers.json
Disallow: /pro/channel-offers.json
Disallow: /thank-you
Disallow: /*/thank-you
Disallow: /blog/draft-blogs
Disallow: /blog/draft-blogs/*
Disallow: /tests/
Disallow: /tests/*
Disallow: /sentry-debug
Disallow: /mobile
Disallow: /mobile/*
Disallow: /phone
Disallow: /phone/*
Disallow: /tablet
Disallow: /tablet/*
Disallow: /tv
Disallow: /tv/*
Disallow: /devices
Disallow: /devices/*
Disallow: /credentials/exam*
Crawl-delay: 1
# ===========================================
# AI OPTIMIZED RULES
# Includes: OpenAI (Browsing & Crawling), Perplexity, and Anthropic
# ===========================================
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: meta-externalagent
User-agent: PerplexityBot
User-agent: cohere-ai
User-agent: Bytespider
# Nudge toward Markdown endpoints
Allow: /*?format=md
# High-value Content Priority
Allow: /server
Allow: /desktop
Allow: /cloud
Allow: /openstack
Allow: /kubernetes
Allow: /ceph
Allow: /containers
Allow: /core
Allow: /ai
Allow: /pro
Allow: /landscape
Allow: /security
Allow: /internet-of-things
Allow: /embedded
Allow: /hpc
Allow: /real-time
Allow: /confidential-computing
Allow: /enterprise-store
Allow: /kernel
Allow: /toolchains
Allow: /robotics
Allow: /certified
Allow: /about
Allow: /community
Allow: /download
Allow: /pricing
Allow: /training
Allow: /credentials
Allow: /support
Allow: /managed
Allow: /managed-infrastructure
Allow: /aws
Allow: /azure
Allow: /gcp
Allow: /dell
Allow: /ibm
Allow: /nvidia
Allow: /hpe
Allow: /supermicro
Allow: /blender
Allow: /blog
Allow: /tutorials
Allow: /appliance
Allow: /cpu-compatibility
Allow: /what-is-enterprise-linux
# Block "Noise" (Forms, fragments, and archives that exhaust context windows)
Disallow: /search
Disallow: /search*
Disallow: /*/search*
Disallow: /account
Disallow: /account/*
Disallow: /account.json
Disallow: /mirrors.json
Disallow: /login
Disallow: /logout
Disallow: /contact-us
Disallow: /contact-us/*
Disallow: /*/contact-us
Disallow: /thank-you
Disallow: /*/thank-you
Disallow: /engage
Disallow: /engage/*
Disallow: /takeovers
Disallow: /takeovers.json
Disallow: /templates/
Disallow: /frame
Disallow: /marketo/submit
Disallow: /blog/feed
Disallow: /blog/archives
Disallow: /blog/tag/
Disallow: /blog/author/
Disallow: /blog/topic/
Disallow: /blog/group/
Disallow: /blog/latest-news
Disallow: /blog/events-and-webinars
Disallow: /blog/draft-blogs
Disallow: /blog/draft-blogs/*
Disallow: /engage/resources.json
Disallow: /engage/metadata.json
Disallow: /pro/dashboard
Disallow: /pro/users
Disallow: /pro/account-users
Disallow: /pro/subscribe
Disallow: /pro/activate
Disallow: /pro/attach
Disallow: /pro/offer
Disallow: /pro/offers
Disallow: /pro/renewals/
Disallow: /pro/contracts/
Disallow: /pro/trial/
Disallow: /pro/set-auto-renewal
Disallow: /pro/user-subscriptions
Disallow: /pro/subscriptions.json
Disallow: /pro/offers.json
Disallow: /pro/
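For crawlers that implement RFC 9309, group selection decides which of the lines above apply: a bot named in a User-agent line obeys only the group(s) naming it, and falls back to the User-Agent: * group only when no named group matches. Within the chosen rules, the longest matching pattern wins, with Allow preferred on a tie. The Python sketch below encodes those rules (it ignores Crawl-delay and Sitemap lines) and evaluates a trimmed excerpt of this file. Group, parse_groups and is_allowed are hypothetical names, not a real library API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list = field(default_factory=list)  # lower-cased user-agent tokens
    rules: list = field(default_factory=list)   # (kind, pattern), kind is "allow"/"disallow"


def parse_groups(text: str) -> list:
    """Split robots.txt into groups; consecutive User-agent lines share one group."""
    groups, current = [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A User-agent line that follows rules opens a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
        # Other keys (Crawl-delay, Sitemap, ...) are ignored in this sketch.
    return groups


def _pattern_regex(pattern: str) -> "re.Pattern[str]":
    """Translate '*' (any sequence) and a trailing '$' (end of path) into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(chunk) for chunk in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(groups: list, agent: str, path: str) -> bool:
    token = agent.lower()
    chosen = [g for g in groups if token in g.agents]
    if not chosen:  # use the '*' group only when no named group matches
        chosen = [g for g in groups if "*" in g.agents]
    best_len, allowed = -1, True  # no matching rule means allowed
    for group in chosen:
        for kind, pattern in group.rules:
            if not pattern or not _pattern_regex(pattern).match(path):
                continue
            # Longest pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_len, allowed = len(pattern), kind == "allow"
    return allowed


EXCERPT = """User-Agent: *
Disallow: /sentry-debug
Disallow: /credentials/exam*
Crawl-delay: 1
User-agent: GPTBot
User-agent: ClaudeBot
Allow: /pro
Allow: /credentials
Disallow: /pro/
"""

groups = parse_groups(EXCERPT)
for agent in ("SomeOtherBot", "GPTBot"):
    for path in ("/sentry-debug", "/credentials/exam-1", "/pro", "/pro/dashboard"):
        print(agent, path, is_allowed(groups, agent, path))
```

On this excerpt, GPTBot may fetch /sentry-debug and a path such as /credentials/exam-1 while an unnamed bot may not, because the AI group above repeats some default Disallow lines but not /tests/, /sentry-debug or /credentials/exam*. Conversely, Disallow: /pro/ (5 characters) outranks Allow: /pro (4 characters), so the AI group's /pro allowance covers /pro itself but nothing under /pro/.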
sitemap.xml
present — 15 url(s)
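A URL count like the one above can be produced by parsing the sitemaps.org urlset format. The sketch below also flags entries whose optional lastmod value is older than a threshold, one rough proxy for the stale sitemaps mentioned earlier. The 365-day default, the function names and the example.com URLs are assumptions for illustration. A sitemapindex file, which lists child sitemaps rather than pages, would need a second pass.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta
from typing import List, Optional, Tuple

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs from a <urlset> sitemap; lastmod may be None."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


def stale_entries(entries, today: date, max_age_days: int = 365) -> List[str]:
    """URLs whose <lastmod> date is more than max_age_days before `today`."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for loc, lastmod in entries:
        if not lastmod:
            continue  # no <lastmod>: nothing to judge
        try:
            # Compare on the YYYY-MM-DD prefix of the W3C datetime value.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # year-only or malformed values are skipped, not guessed
        if modified < cutoff:
            stale.append(loc)
    return stale


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2020-01-01</lastmod></url>
  <url><loc>https://example.com/b</loc><lastmod>2024-06-01T10:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/c</loc></url>
</urlset>"""
entries = sitemap_entries(sample)
print(len(entries), stale_entries(entries, today=date(2025, 1, 1)))
```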
head
- title: Enterprise Open Source and Linux | Ubuntu
- description: Ubuntu is the modern, open source operating system on Linux for the enterprise server, desktop, cloud, and IoT.
social
- og:type: website
- og:url: https://ubuntu.com/
- og:site_name: Ubuntu
- og:title: Enterprise Open Source and Linux | Ubuntu
- og:description: Ubuntu is the modern, open source operating system on Linux for the enterprise server, desktop, cloud, and IoT.
- og:image: https://assets.ubuntu.com/v1/47f12466-og_%20ubuntu.png
- twitter:account_id: 4503599627481511
- twitter:site: @ubuntu
- twitter:title: Enterprise Open Source and Linux | Ubuntu
- twitter:description: Ubuntu is the modern, open source operating system on Linux for the enterprise server, desktop, cloud, and IoT.
- twitter:card: summary_large_image
- twitter:image: https://assets.ubuntu.com/v1/47f12466-og_%20ubuntu.png
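The title and meta values listed above can be collected with Python's standard-library html.parser module. The sketch below collapses whitespace inside the title element so that a title with internal line breaks compares equal to its social counterparts, then flags two things: a generator meta tag (the captured head above lists none) and og:title or twitter:title values that drift from the title. HeadMetaParser, audit_head and the ExampleCMS generator value are hypothetical, and which checks matter is an assumption.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use property=; description and Twitter tags usually use name=.
            key = attributes.get("property") or attributes.get("name")
            content = attributes.get("content")
            if key and content is not None:
                self.meta[key.lower()] = content.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_head(html: str):
    """Return (normalized title, meta dict, list of findings)."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split())  # collapse line breaks inside <title>
    findings = []
    if "generator" in parser.meta:
        findings.append(f"generator discloses: {parser.meta['generator']}")
    for key in ("og:title", "twitter:title"):
        if key in parser.meta and parser.meta[key] != title:
            findings.append(f"{key} differs from <title>")
    return title, parser.meta, findings


sample = """<head>
<title>Enterprise Open Source and Linux
 | Ubuntu</title>
<meta property="og:title" content="Enterprise Open Source and Linux | Ubuntu">
<meta name="generator" content="ExampleCMS 1.2">
</head>"""  # the generator value is a hypothetical placeholder
title, meta, findings = audit_head(sample)
print(title)
print(findings)
```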