HTTPS surface reachable (robots ✓, sitemap ✓, title ✓)
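Why it matters: Public files (robots.txt, sitemap.xml, and the page's head metadata) are the first things an attacker reads during reconnaissance. Disallow rules that point at sensitive paths, stale sitemaps that still list retired URLs, and verbose generator tags that name the CMS or its version can reveal more than intended (ISO 27001 A.8.9).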
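One concrete form of this reconnaissance is to list every specific path a robots.txt file names. `Disallow` entries other than a blanket `/` tell a reader exactly where the site would rather crawlers not look. A minimal sketch, assuming the file has already been fetched as text; `advertised_paths` is a hypothetical helper, and it deliberately ignores which `User-agent` group each rule belongs to.

```python
def advertised_paths(robots_txt: str) -> list[str]:
    """Return distinct Allow/Disallow paths other than '' and '/', in file order.

    Hypothetical helper: it ignores which User-agent group a rule belongs to.
    """
    paths: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        # Blanket rules ('/' or empty) say nothing about specific paths.
        if key.lower() in ("allow", "disallow") and value not in ("", "/"):
            if value not in paths:
                paths.append(value)
    return paths


sample = """User-agent: *
Disallow: /search/
# Block known SEO crawlers
User-agent: AhrefsBot
Disallow: /
User-agent: Googlebot
Disallow:
"""
print(advertised_paths(sample))  # ['/search/']
```

Against the robots.txt captured below, the only such path is `/search/`; every other rule is a blanket `/`, an `Allow: /`, or an empty `Disallow:`.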
robots.txt
present
User-agent: *
Disallow: /search/
# Block known SEO crawlers and data co
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: DotBot
Disallow: /
User-agent: Zoominfobot
Disallow: /
User-agent: BlexBot
Disallow: /
User-agent: Cliqzbot
Disallow: /
User-agent: DataForSeoBot
Disallow: /
User-agent: MauiBot
Disallow: /
User-agent: MegaIndex.ru
Disallow: /
# Image crawlers
User-agent: Pixray-Seeker
Disallow: /
User-agent: Copyleaks
Disallow: /
User-agent: Plaghunter
Disallow: /
User-agent: ImageMax
Disallow: /
User-agent: Tineye
Disallow: /
User-agent: Copytrack
Disallow: /
User-agent: CopytrackBot
Disallow: /
# Allow major search engines
User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow:
User-agent: DuckDuckBot
Disallow:
User-agent: Baiduspider
Disallow:
User-agent: YandexBot
Disallow:
User-agent: Sogou
Disallow:
User-agent: PetalBot
Disallow:
User-agent: AspiegelBot
Disallow:
User-agent: SeznamBot
Disallow:
User-agent: Bytespider
Disallow:
User-agent: VoilaBot
Disallow:
# Allow social media platforms
User-agent: Twitterbot
Disallow:
User-agent: FacebookExternalHit
Disallow:
User-agent: LinkedInBot
Disallow:
User-agent: Pinterestbot
Disallow:
User-agent: TelegramBot
Disallow:
User-agent: Discordbot
Disallow:
User-agent: WhatsApp
Disallow:
User-agent: Slackbot
Disallow:
User-agent: Redditbot
Disallow:
# Explicitly allow the following crawlers
User-agent: Verity/1.1 (https://gumgum.com/verity; verity-support@gumgum.com)
Allow: /
User-agent: Scope3/2.0 (scope3.com)
Allow: /
User-agent: AdsBot-Google
Allow: /
User-agent: AdsBot-Google-Mobile
Allow: /
User-agent: Googlebot-Image
Allow: /
User-agent: Googlebot-Video
Allow: /
User-agent: Googlebot-News
Allow: /
User-agent: Googlebot-InspectionTool
Allow: /
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ChatGPT-Operator
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Claude-User
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: Claude-Web
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
User-agent: CCBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: Meta-ExternalAgent
Allow: /
User-agent: Meta-ExternalFetcher
Allow: /
User-agent: Facebookbot
Allow: /
User-agent: AmazonAdBot
Allow: /
Sitemap: https://www.salon.com/sitemap_index.xml
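RFC 9309 group selection determines how a compliant crawler reads the file above. It obeys only the group(s) whose `User-agent` value matches its product token case-insensitively, falls back to `*` only when nothing matches, and within the chosen rules the longest matching path wins, with `Allow` preferred on a tie. Two consequences follow if crawlers match this way. First, the search-engine, social, and AI crawlers given their own groups are not bound by the `*` group's `Disallow: /search/`. Second, the `Verity/1.1` and `Scope3/2.0` lines carry full user-agent strings rather than bare product tokens, so whether they match depends on the parser: one that compares whole values, as this sketch does, will not match them, while one that truncates at the first character not allowed in a token would. The sketch below is a simplified model under those assumptions. It handles prefix rules only (no `*` or `$` wildcards), and `parse_robots` and `is_allowed` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)              # lower-cased User-agent values
    rules: list[tuple[bool, str]] = field(default_factory=list)  # (is_allow, path prefix)


def parse_robots(text: str) -> tuple[list[Group], list[str]]:
    """Split robots.txt into groups: consecutive User-agent lines share the rules that follow."""
    groups: list[Group] = []
    sitemaps: list[str] = []
    current: Group | None = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not last_was_agent:
                current = Group()  # a User-agent line after rules starts a new group
                groups.append(current)
            current.agents.append(value.lower())
            last_was_agent = True
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key == "allow", value))
            last_was_agent = False
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap lines are not tied to any group
    return groups, sitemaps


def is_allowed(groups: list[Group], product_token: str, path: str) -> bool:
    """Simplified RFC 9309 check: whole-value token match, prefix rules, longest match wins."""
    token = product_token.lower()
    chosen = [g for g in groups if token in g.agents]
    if not chosen:  # fall back to '*' only when no group names the crawler
        chosen = [g for g in groups if "*" in g.agents]
    best: tuple[bool, str] | None = None
    for is_allow, prefix in (r for g in chosen for r in g.rules):
        if not prefix or not path.startswith(prefix):
            continue  # an empty Disallow/Allow matches nothing
        if best is None or len(prefix) > len(best[1]) or (len(prefix) == len(best[1]) and is_allow):
            best = (is_allow, prefix)
    return True if best is None else best[0]


# Excerpt of the captured file.
robots = """User-agent: *
Disallow: /search/
User-agent: AhrefsBot
Disallow: /
User-agent: Googlebot
Disallow:
User-agent: Verity/1.1 (https://gumgum.com/verity; verity-support@gumgum.com)
Allow: /
Sitemap: https://www.salon.com/sitemap_index.xml
"""
groups, sitemaps = parse_robots(robots)
print(is_allowed(groups, "AhrefsBot", "/news/"))     # False: blanket Disallow
print(is_allowed(groups, "Googlebot", "/search/q"))  # True: own group, '*' rules not applied
print(is_allowed(groups, "Verity", "/search/q"))     # False: no whole-value match, falls back to '*'
```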
sitemap.xml
present — 371 url(s)
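The check reads `sitemap.xml`, while the robots.txt above advertises `sitemap_index.xml`. Under the sitemaps.org protocol an index file lists child sitemaps in `<sitemap><loc>` entries, whereas a `<urlset>` lists pages in `<url><loc>` entries. A URL count therefore depends on which file was read and whether child sitemaps were followed. A minimal sketch, assuming the XML has already been fetched; `classify_sitemap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
SM_TAG = "{" + NS["sm"] + "}"


def classify_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    if root.tag == SM_TAG + "sitemapindex":
        locs = root.findall("sm:sitemap/sm:loc", NS)  # each entry points to another sitemap
        kind = "index"
    elif root.tag == SM_TAG + "urlset":
        locs = root.findall("sm:url/sm:loc", NS)  # each entry is a page URL
        kind = "urlset"
    else:
        raise ValueError(f"unexpected root element {root.tag!r}")
    return kind, [(loc.text or "").strip() for loc in locs]


index_xml = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>"
    "</sitemapindex>"
)
kind, locs = classify_sitemap(index_xml)
print(kind, len(locs))  # index 1  (one child sitemap, not one page)
```

Following each child `<loc>` of an index and summing the `urlset` counts would give a site-wide total; the report does not state whether the 371 URLs came from a single file or from such a walk.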
head
- title
- Salon.com - News, Politics, Culture, Science & Food
- description
- (not set)
social
- og:locale
- en_US
- og:type
- website
- og:url
- https://www.salon.com/?utm_source=website&utm_medium=social&utm_campaign=ogshare&utm_content=og
- og:site_name
- Salon.com
- twitter:card
- summary_large_image
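The head checks above can be approximated with the standard-library `html.parser`: collect the `<title>` text, then key each `<meta>` by its `property` attribute (used by Open Graph) or its `name` attribute (typically used by `twitter:*` and `description`). The sketch flags a missing description, which is what the empty `description` entry in this report shows, and a `generator` tag, the kind of verbose banner the "Why it matters" note warns about. `HeadMetaParser` and `head_findings` are hypothetical names, and the finding strings are illustrative rather than this scanner's actual output.

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect the <title> text and <meta> content keyed by property= or name=."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # html.parser already lower-cases tag and attribute names
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = a.get("property") or a.get("name")  # og:* uses property=, most others name=
            if key and a.get("content") is not None:
                self.meta[key.lower()] = a["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_findings(html: str) -> list[str]:
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    findings = []
    if not parser.title.strip():
        findings.append("missing <title>")
    if not parser.meta.get("description", "").strip():
        findings.append("missing meta description")
    if "generator" in parser.meta:
        findings.append(f"generator disclosed: {parser.meta['generator']}")
    return findings


page = (
    "<head><title>Salon.com - News</title>"
    '<meta property="og:site_name" content="Salon.com">'
    '<meta name="generator" content="ExampleCMS 1.0">'
    "</head>"
)
print(head_findings(page))
# ['missing meta description', 'generator disclosed: ExampleCMS 1.0']
```

The `generator` value in the sample page is invented for illustration; the captured head for this site lists no generator tag.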