HTTPS surface reachable (robots ✓, sitemap ✓, title ✓)
Why it matters: Public files (robots.txt, sitemap.xml) and the page's head metadata are what attackers see first during reconnaissance. Sensitive paths listed in robots.txt, stale sitemaps, and verbose generator tags can leak more than intended (ISO 27001 A.8.9).
robots.txt
present
# robots.txt for https://www.fsf.org/
User-agent: *
Crawl-delay: 10
Disallow: /.git/
Disallow: /?set_language
Disallow: /@@search
Disallow: /associate/forum/
Disallow: /norobotsnorhumansshouldevervisithispage/
Disallow: /search
Disallow: /search_rss
Disallow: /share
Disallow: /software/
Disallow: /software/winboard/whats_new/
Sitemap: https://www.fsf.org/sitemap.xml
# Majestic - SEO
User-agent: MJ12bot
Disallow: /
# DataForSeo - SEO
User-agent: DataForSeoBot
Disallow: /
# webmeup - SEO
User-agent: BLEXBot
Disallow: /
# Ahrefs - SEO
User-agent: AhrefsBot
Disallow: /
# babbar - SEO
User-agent: barkrowler
Disallow: /
# Screamingfrog - SEO
User-agent: Screaming Frog SEO Spider
Disallow: /
# Seozoom - SEO
User-Agent: ZoomBot
Disallow: /
# Brandwatch - SEO
User-agent: magpie-crawler
Disallow: /
# Begin Moz - SEO
# Not to be confused with Mozilla.
User-agent: DotBot
Disallow: /
User-agent: rogerbot
Disallow: /
# End Moz - SEO
# Begin Semrush - SEO
User-agent: SemrushBot
Disallow: /
User-agent: SiteAuditBot
Disallow: /
User-agent: SemrushBot-BA
Disallow: /
User-agent: SemrushBot-SI
Disallow: /
User-agent: SemrushBot-SWA
Disallow: /
User-agent: SplitSignalBot
Disallow: /
User-agent: SemrushBot-OCOB
Disallow: /
# End Semrush - SEO
# cognitiveSEO - SEO
User-agent: JamesBOT
Disallow: /
# oncrawl - SEO
User-agent: Oncrawl
Disallow: /
# BEGIN Awario - Marketing
User-agent: AwarioRssBot
Disallow: /
User-agent: AwarioSmartBot
Disallow: /
User-agent: AwarioBot
Disallow: /
# END Awario - Marketing
# SERPSTAT - SEO
User-agent: serpstatbot
Disallow: /
# website-datenbank.de - Search engine?
User-agent: netEstate NE Crawler
Disallow: /
# Ignores Crawl-delay and does not help us.
User-Agent: panscient.com
Disallow: /
# Latvian Academic Integrity - Plagiarism
User-agent: AcademicBotRTU
Disallow: /
# TurnItIn - Plagiarism
User-agent: TurnitinBot
Disallow: /
# CheckMarkNetwork - Trademark
User-agent: CheckMarkNetwork
Disallow: /
# SEOkicks - SEO
User-agent: SEOkicks
Disallow: /
# Timpi NFT
User-Agent: Timpibot
Disallow: /
# Seobility - SEO
User-agent: Seobility
Disallow: /
# BigSight - Sec
User-agent: BitSightBot
# Meltwater - SEO
User-agent: linkfluence
Disallow: /
# adbeat - Ads
User-agent: adbeat_bot
Disallow: /
# BrandVerity - Ads
User-agent: BrandVerity
Disallow: /
# peer39 - Ads
User-agent: peer39_crawler
Disallow: /
# Pipl - Spy
User-agent: PiplBot
Disallow: /
# Scrapy - Reckless scraper https://github.com/scrapy/scrapy/issues/6597
User-agent: Scrapy
Disallow: /
# BubiNG crawls too fast.
User-agent: BUbiNG
Disallow: /
# undici can be configured to crawl too fast.
User-agent: undici
Disallow: /
# Does not listen to wildcard Disallows.
User-agent: YisouSpider
Disallow: /
# ceramic - Ignores Crawl-delay
User-agent: TerraCotta
Disallow: /
# HaloScan SEO
User-agent: HaloBot
Disallow: /
# seranking.com SEO
User-agent: SERankingBacklinksBot
Disallow: /
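A minimal Python sketch of how a reviewer might group the rules above and pull out the paths they advertise. It follows the RFC 9309 grouping model: consecutive User-agent lines share the rules that follow, comment-only and blank lines do not end a group, and agent tokens are compared case-insensitively. Crawl-delay and Sitemap records are ignored. `parse_robots`, `flag_disallowed`, and `SENSITIVE_HINTS` are hypothetical names, and the hint list is an illustrative assumption, not a standard; the sample is a short excerpt of the file above.

```python
from collections import defaultdict

# Hypothetical path prefixes worth a second look; illustrative, not a standard list.
SENSITIVE_HINTS = ("/.git", "/.env", "/admin", "/backup")


def parse_robots(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group Allow/Disallow rules by lowercased user-agent, RFC 9309 style."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    agents: list[str] = []  # user-agents of the group currently being read
    in_rules = False        # True once that group has seen an Allow/Disallow
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # comment-only lines become empty
        if ":" not in line:
            continue  # skip blank or malformed lines; they do not end a group
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            in_rules = True
            for agent in agents:  # every agent in the group shares the rule
                groups[agent].append((key, value))
        # Other records (Crawl-delay, Sitemap) are ignored in this sketch.
    return dict(groups)


def flag_disallowed(groups, agent="*"):
    """Return Disallow paths for `agent` (exact lookup) matching a hint prefix."""
    return [path for key, path in groups.get(agent, [])
            if key == "disallow" and path.startswith(SENSITIVE_HINTS)]


sample = """User-agent: *
Crawl-delay: 10
Disallow: /.git/
Disallow: /search
Sitemap: https://www.fsf.org/sitemap.xml
# BigSight - Sec
User-agent: BitSightBot
# Meltwater - SEO
User-agent: linkfluence
Disallow: /
"""
groups = parse_robots(sample)
print(flag_disallowed(groups))  # ['/.git/']
print(groups["bitsightbot"])    # [('disallow', '/')]
```

Under this grouping logic, a User-agent line followed only by a comment picks up the rules of the next group, which is worth keeping in mind when reading hand-maintained files like the one above.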
sitemap.xml
present, 212 URLs
head
- title: Front Page — Free Software Foundation — working together for free software
- description: not found
social
no Open Graph or Twitter Card meta tags found
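A minimal sketch of the head and social checks summarized above, using only the standard library's `html.parser`: it records the `<title>` text and every `<meta>` keyed by `name` or `property`, then reports the description, the generator (the "verbose generator" case from the rationale), and any `og:` or `twitter:` keys. `HeadMetaParser`, `summarize_head`, and the sample HTML with its ExampleCMS generator string are hypothetical; a real check would feed the fetched page body.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect <title> text and <meta> content keyed by name or property."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:*"; Twitter Cards usually name="twitter:*".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize_head(html: str) -> dict:
    """Report title, description, generator, and social meta keys."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    social = sorted(k for k in parser.meta if k.startswith(("og:", "twitter:")))
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description"),
        "generator": parser.meta.get("generator"),
        "social": social,
    }


sample = """<html><head>
<title>Example Page</title>
<meta name="generator" content="ExampleCMS 1.0">
<meta property="og:title" content="Example Page">
</head><body></body></html>"""
print(summarize_head(sample))
```

For this sample the description comes back as `None` and the generator as `ExampleCMS 1.0`, which is the kind of version string the rationale flags as leaking more than intended.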