What's the crawler page limit, and what happens on very large sites?

The broken links and mixed content crawler visits up to 5,000 pages per monitor per run. If your site has more, we crawl the first 5,000 reachable pages and stop.

Why there's a limit

Two reasons:

  1. Fair use across customers: a single crawler worker running for hours on one enormous site would delay every other customer's crawl.
  2. Diminishing returns: for most sites, the first few thousand pages include the homepage, navigation, primary landing pages, and the most-linked content. Issues in those pages affect most visitors. Issues on page 10,000 of a product catalogue rarely affect SEO or user experience to the same degree.

What gets crawled

The crawler starts from your monitor URL and follows internal links. It respects robots.txt (unless you've disabled that in the monitor settings) and nofollow links. It skips assets (images, CSS, JS) unless you have the relevant check enabled.
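To make the page budget concrete, here is a minimal sketch of a link-following crawl with a page limit. It is not our crawler's implementation. The breadth-first order, the in-memory `SITE` link graph, the `ROBOTS` rules and the `crawl` function are all hypothetical stand-ins; a real crawler would fetch each page and extract links from its HTML. It shows the three rules described above: stay on internal links, honour robots.txt (via Python's standard `urllib.robotparser`) and skip nofollow links, stopping once `max_pages` pages have been visited.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


def crawl(start_url, links, robots_lines, max_pages=5000, respect_robots=True):
    """Breadth-first crawl over a hypothetical in-memory link graph.

    links maps a page URL to a list of (href, is_nofollow) pairs.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []

    # Stop once the page budget is spent, even if links are still queued.
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if respect_robots and not robots.can_fetch("*", url):
            continue  # disallowed by robots.txt: not fetched, not counted
        visited.append(url)
        for href, nofollow in links.get(url, []):
            if nofollow:
                continue  # don't follow rel="nofollow" links
            target = urljoin(url, href)
            if urlparse(target).netloc != host:
                continue  # follow internal links only
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical example site: page -> [(href, is_nofollow), ...]
SITE = {
    "https://example.com/": [("/about", False), ("/blog", False),
                             ("/private/x", False), ("/tracking", True)],
    "https://example.com/about": [("/", False)],
    "https://example.com/blog": [("/blog/post-1", False), ("https://other.org/", False)],
    "https://example.com/blog/post-1": [("/blog", False)],
    "https://example.com/private/x": [],
    "https://example.com/tracking": [],
}
ROBOTS = ["User-agent: *", "Disallow: /private/"]

print(crawl("https://example.com/", SITE, ROBOTS))
print(crawl("https://example.com/", SITE, ROBOTS, max_pages=2))
```

With the default budget this visits the homepage, `/about`, `/blog` and `/blog/post-1`; `/private/x` is excluded by robots.txt, `/tracking` is nofollow and `other.org` is external. With `max_pages=2` it stops after the homepage and `/about`, which is the same "first N reachable pages, then stop" behaviour, just at a smaller scale.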

Strategies for very large sites

If you genuinely need more coverage, a few patterns work well:

  • Use the sitemap monitor instead. The sitemap check takes your site's sitemap and verifies each URL independently, which scales differently from a crawler.
  • Split by section. Add multiple monitors, each anchored at a subsection of your site (example.com/blog, example.com/products, etc.). Each gets its own 5,000-page budget, so a 20,000-page site split into four sections of up to 5,000 pages each can be covered by four monitors.
  • Prioritize with the sitemap. If you care most about specific URLs, make sure those appear in your sitemap and use sitemap monitoring on top of broken links.
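If you want a rough plan for the split-by-section approach, one option is to count the URLs in your sitemap per top-level path and divide by the page budget. The sketch below does that with Python's standard library. It assumes a single standard sitemaps.org `urlset` file (not a sitemap index), treats the first path segment as the "section", and assumes each section monitor's 5,000-page budget is spent on pages within that section. The function names are hypothetical, and sitemap counts are only an approximation, since the crawler can also reach pages that aren't in your sitemap.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

PAGE_LIMIT = 5000  # per-monitor page budget
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> from a single urlset sitemap (not a sitemap index)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def section_of(url):
    """Group URLs by their first path segment, e.g. /blog/post-1 -> /blog."""
    path = urlparse(url).path.strip("/")
    return "/" + path.split("/")[0] if path else "/"


def monitors_needed(urls, page_limit=PAGE_LIMIT):
    """Monitors needed per section if each monitor covers up to page_limit pages."""
    counts = Counter(section_of(u) for u in urls)
    return {section: math.ceil(n / page_limit) for section, n in counts.items()}


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/a</loc></url>
  <url><loc>https://example.com/blog/b</loc></url>
  <url><loc>https://example.com/blog/c</loc></url>
  <url><loc>https://example.com/products/x</loc></url>
</urlset>"""

urls = sitemap_urls(SITEMAP)
# A tiny page_limit stands in for 5,000 so the split is visible.
print(monitors_needed(urls, page_limit=2))  # {'/': 1, '/blog': 2, '/products': 1}
```

A section that is still larger than one budget (here `/blog`) would need splitting further, for example into year or category subsections.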

Increasing the limit

For most sites, 5,000 pages is more than enough. If you have a clear case where it isn't (a news archive of 100,000 articles, a large product catalogue, etc.), contact support and we can discuss options.

The 20-minute crawler time limit

There's also a 20-minute time budget per crawl. Very slow sites can hit this before the 5,000-page limit. You can tune the crawler speed in the monitor's Broken Links settings to balance time-per-page against total throughput.
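To see which limit is likely to end a crawl, you can compare the two budgets. The sketch below assumes a constant average time per page (`seconds_per_page`, a hypothetical figure covering fetch time plus any delay your speed setting adds) and ignores how the crawler schedules requests internally. On those assumptions the break-even point is 1,200 s / 5,000 pages = 0.24 s per page: any slower average and the 20-minute budget runs out before the page limit.

```python
import math

PAGE_LIMIT = 5000          # pages per monitor per run
TIME_BUDGET_S = 20 * 60    # 20-minute crawl budget, in seconds


def crawl_capacity(seconds_per_page, page_limit=PAGE_LIMIT, time_budget_s=TIME_BUDGET_S):
    """Estimate pages crawled and which limit stops the crawl.

    Assumes a constant average time per page; real crawls vary page to page.
    """
    pages_in_time = math.floor(time_budget_s / seconds_per_page)
    if pages_in_time < page_limit:
        return pages_in_time, "time budget"
    return page_limit, "page limit"


for spp in (0.1, 0.5, 2.0):
    print(spp, crawl_capacity(spp))
# 0.1 (5000, 'page limit')
# 0.5 (2400, 'time budget')
# 2.0 (600, 'time budget')
```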

