HTTPS surface reachable (robots ✓, sitemap ✓, title ✓)
Why it matters: Public files (robots.txt, sitemap.xml, head meta) are the first things an attacker reads during reconnaissance. Disallow entries that advertise internal paths, stale sitemaps, and verbose generator tags leak more than intended (ISO 27001 A.8.9).
robots.txt
present
# Robots.txt file CCF
# Updated 11/11/2025
User-agent: AmazonAdBot
# Only allow crawling of areas where we show advertisements
Allow: /health/
Allow: /watch/
Disallow: /
# --- Default directives for all other crawlers ---
User-agent: *
Disallow:
Crawl-delay: 10
Disallow: /Search
Disallow: /search
Disallow: /*/Search
Disallow: /*/search
Disallow: /*?date=*
Disallow: /*?q=*
Disallow: /*?view=*
Disallow: /ccf/media/files/*guide.pdf
Disallow: /cache/clearcache/
Disallow: /atoz/healthinformationletterstatus/
Disallow: /atoz/healthinformationpages/
Disallow: /atoz/healthinformationribbonlinkvisibility/
Disallow: /atoz/institutesdepartmentsletterstatus/
Disallow: /atoz/institutesdepartmentspages/
Disallow: /ribbon/locationspecialtylists/
Disallow: /provider/physicianratings/
Disallow: /provider/image/
Disallow: /video/panel/
Disallow: /clinicaltrial/jsonresults/
Disallow: /patientstory/jsonresults/
Disallow: /patientstory/forsearchindex/
Disallow: /location/waittimes/
Disallow: /treatment-guides/
Sitemap: https://my.clevelandclinic.org/sitemap.xml
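A minimal sketch of how the directives above could be reviewed automatically: it groups Allow/Disallow rules by User-agent and lists Disallow paths containing keywords that tend to attract attention during reconnaissance. The function names and the HINTS keyword list are assumptions made for illustration and are not part of the scan; the sample is a shortened excerpt of the file above.

```python
from collections import defaultdict

HINTS = ("cache", "json", "admin", "internal")  # hypothetical keyword list


def parse_robots(text):
    """Group Allow/Disallow rules by User-agent and collect Sitemap lines."""
    groups = defaultdict(list)  # agent -> [(directive, path), ...]
    sitemaps = []
    agents = []       # agents the current group applies to
    in_rules = False  # True once the current group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule was already seen, so a new group starts
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)
        # other fields (for example Crawl-delay) are ignored in this sketch
    return dict(groups), sitemaps


def revealing_paths(groups):
    """Return Disallow paths that contain one of the HINTS keywords."""
    found = []
    for rules in groups.values():
        for directive, path in rules:
            if directive == "disallow" and any(h in path.lower() for h in HINTS):
                found.append(path)
    return found


sample = """\
User-agent: AmazonAdBot
Allow: /health/
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 10
Disallow: /cache/clearcache/
Disallow: /clinicaltrial/jsonresults/
Sitemap: https://my.clevelandclinic.org/sitemap.xml
"""
groups, sitemaps = parse_robots(sample)
print(revealing_paths(groups))
```

A keyword match is only a prompt for manual review. A Disallow line does not restrict access, so any path listed here still needs its own access control.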
sitemap.xml
present — 2 url(s)
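The entry count can be reproduced with a short sketch like the one below, which also reports whether the file is a urlset or a sitemap index. The report does not say which kind was found; a low count on a large site may simply mean the file is an index pointing at child sitemaps. The sample document and its example.com URLs are placeholders, not data from the scan.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return the root element kind and every <loc> value in the document."""
    root = ET.fromstring(xml_text)
    kind = root.tag.rsplit("}", 1)[-1]  # "urlset" or "sitemapindex"
    locs = [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]
    return kind, locs


# Placeholder document; the URLs are hypothetical
sample_xml = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/sitemap-a.xml</loc></sitemap>"
    "<sitemap><loc>https://example.com/sitemap-b.xml</loc></sitemap>"
    "</sitemapindex>"
)
kind, locs = sitemap_locs(sample_xml)
print(kind, len(locs))
```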
head
- title: Cleveland Clinic: Every Life Deserves World Class Care
- description: Cleveland Clinic, a non-profit academic medical center, provides clinical and hospital care and is a leader in research, education and health information.
social
- og:title: Access Anytime Anywhere | Cleveland Clinic
- og:description: Cleveland Clinic
- og:image: https://my.clevelandclinic.org:443/-/scassets/images/org/logo/logo-ccf.svg
- og:url: https://my.clevelandclinic.org/
- og:type: website
- og:site_name: Cleveland Clinic