Can I exclude URLs from broken links checks?

Yes. Each monitor's broken links check has its own exclusion list for URLs you don't want us to report on.

Why you'd want to exclude a URL

A few common patterns:

  • Third-party hosts that rate-limit aggressive crawling (LinkedIn, some CDNs, certain social networks). These can return 429s for automated requests even when a real browser works fine.
  • URLs that require session cookies you don't want to share with us.
  • Ephemeral URLs, like pre-signed S3 download links that expire quickly and look broken by the time we crawl them.
  • Internal admin pages you explicitly don't want touched by the crawler.
  • URLs that 404 on purpose (like a "test your 404 page" link that's supposed to return 404).

How to add URLs to the exclusion list

  1. Open the monitor.
  2. Go to Settings > Broken Links.
  3. Find Ignored URLs (sometimes shown as "Whitelisted URLs").
  4. Add one URL pattern per line.
  5. Save.

Wildcards (*) are supported, so https://example.com/* ignores every URL on that host.
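To get a feel for how wildcard patterns behave, here is a small sketch using Python's fnmatch-style `*` matching. The patterns are made-up examples, and Oh Dear's own matcher may differ in details, but the idea is the same: a `*` matches any run of characters, so a trailing `/*` covers everything under that prefix.

```python
from fnmatch import fnmatch

# Hypothetical patterns you might put in the Ignored URLs field, one per line.
patterns = [
    "https://example.com/admin/*",    # internal admin pages
    "https://cdn.example.com/*",      # rate-limited CDN host
    "https://example.com/404-test",   # link that 404s on purpose
]

def is_excluded(url: str) -> bool:
    """Return True when any pattern matches the URL."""
    return any(fnmatch(url, pattern) for pattern in patterns)

print(is_excluded("https://example.com/admin/users"))  # True
print(is_excluded("https://example.com/blog/post-1"))  # False
```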

Excluding URLs from being crawled at all

The exclusion list above suppresses reports on specific URLs, but we still visit them. If you want Oh Dear to never touch a path at all, add it to the Do not crawl list instead. Those URLs are never requested.

For site-wide exclusions that apply to every bot (not just Oh Dear), use robots.txt. See Can I add Oh Dear to my robots.txt?.
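If you go the robots.txt route, a minimal file might look like the fragment below. The exact user-agent token Oh Dear honours is covered in the linked question, so treat "OhDear" here as a placeholder rather than the confirmed value:

```text
# Hypothetical robots.txt fragment -- verify the user-agent token
# in "Can I add Oh Dear to my robots.txt?"
User-agent: OhDear
Disallow: /admin/
Disallow: /downloads/
```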

For a large exclusion list, consider managing it via our API. You can keep the list in source control alongside your site configuration and sync it to Oh Dear whenever it changes.
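As a sketch of that source-control workflow, the snippet below parses a patterns file (one pattern per line, with `#` comments and blank lines skipped) and pushes it to the API. The endpoint path and the `ignored_urls` field name are assumptions for illustration only; check the Oh Dear API documentation for the real ones.

```python
import json
import urllib.request

API_TOKEN = "your-api-token"
SITE_ID = 123  # hypothetical site id

def load_patterns(text: str) -> list[str]:
    """Parse a patterns file: one pattern per line, '#' comments and blanks skipped."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns

def sync(patterns: list[str]) -> None:
    """Push the exclusion list to Oh Dear. Endpoint and payload are hypothetical."""
    body = json.dumps({"ignored_urls": patterns}).encode()
    request = urllib.request.Request(
        f"https://ohdear.app/api/sites/{SITE_ID}",  # hypothetical endpoint
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    urllib.request.urlopen(request)
```

Run `sync(load_patterns(...))` from CI whenever the patterns file changes, and the list in Oh Dear stays in step with the one in your repository.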

