How does Oh Dear handle large or split sitemaps?

The sitemap monitor works with both single sitemaps and sitemap indexes (files that link to other, smaller sitemaps). There are a handful of practical limits you should know about, especially if your site has a lot of URLs.

The limits

  • Up to 50 sub-sitemaps per sitemap index. If your index references more than 50 child sitemaps, the ones past that limit won't be opened during that run.
  • 20-minute budget per run. If the crawl exceeds 20 minutes (slow responses, rate-limits, huge files), we stop and report on what we managed to check.
  • URL check limit, configurable per monitor. This caps the total number of URLs we fetch per run and keeps a very large site from spending the whole time budget on a single sitemap.

These limits exist so one enormous sitemap can't block checks for every other monitor we run.
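To see whether a sitemap index is likely to run into the sub-sitemap limit, you can count its entries yourself. The snippet below is a minimal sketch, not Oh Dear's implementation: the function names are made up for illustration, and it assumes child sitemaps are taken in document order, which this article does not state.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap and sitemap index documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_SUB_SITEMAPS = 50  # the per-index limit described above


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index document."""
    root = ET.fromstring(index_xml)
    locs = []
    for entry in root.findall(f"{SITEMAP_NS}sitemap"):
        loc = entry.find(f"{SITEMAP_NS}loc")
        if loc is not None and loc.text:
            locs.append(loc.text.strip())
    return locs


def over_limit(index_xml: str, limit: int = MAX_SUB_SITEMAPS) -> list[str]:
    """Return the child sitemaps past the limit (assumption: document order)."""
    return child_sitemaps(index_xml)[limit:]
```

If over_limit returns anything for your index, those child sitemaps are candidates for moving into a separate top-level sitemap.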

What happens when a limit kicks in

You'll see a partial result in the sitemap report. The URLs that were successfully checked will be listed, and any sub-sitemap or URL that we couldn't reach before the limit fired will be marked as skipped.

That's not a failure state. It's "this is what fit in today's run". Subsequent runs start fresh, so the same set of URLs doesn't get permanently ignored.
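The idea of a partial result can be modelled in a few lines. This is a simplified sketch under our own assumptions (the function, its parameters, and the report keys are hypothetical and do not mirror Oh Dear's internals): URLs are checked until either the URL cap or the time budget is reached, and everything after that point is recorded as skipped rather than failed.

```python
import time
from typing import Callable, Iterable


def check_with_budget(
    urls: Iterable[str],
    check: Callable[[str], bool],
    max_urls: int,
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, list[str]]:
    """Check URLs until a limit is hit; later URLs are reported as skipped."""
    started = clock()
    report: dict[str, list[str]] = {"ok": [], "failed": [], "skipped": []}
    checked = 0
    for url in urls:
        # Stop checking once the URL cap or the time budget is used up.
        out_of_budget = checked >= max_urls or clock() - started >= budget_seconds
        if out_of_budget:
            report["skipped"].append(url)
            continue
        report["ok" if check(url) else "failed"].append(url)
        checked += 1
    return report
```

Each call starts with a fresh clock and counter, which matches the point above: a URL skipped in one run is not excluded from the next.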

If your site has thousands of URLs or dozens of sub-sitemaps, a few patterns work well:

  • Split by content type. Instead of one giant sitemap-index.xml pointing at 80 sub-sitemaps, break your site into logical groups (products, blog, marketing pages) and expose separate top-level sitemaps for each. Add each as its own monitor in Oh Dear.
  • Split by volume. If one sub-sitemap has hundreds of thousands of URLs, split it into smaller files (the sitemaps protocol, which Google follows, caps a single sitemap at 50,000 URLs or 50 MB uncompressed).
  • Don't nest sitemap indexes. A sitemap index pointing at another sitemap index isn't something the sitemaps protocol describes (an index is meant to list sitemaps), and it adds fragility. Keep it one level deep.
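The patterns above can be combined in a small generator script. The following is a sketch under stated assumptions: the file naming scheme (products-1.xml, products-index.xml), the function names, and the example.com base URL are all hypothetical. It writes one flat index per content group and keeps every sitemap at or below 50,000 URLs.

```python
from xml.sax.saxutils import escape

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file cap from the sitemaps protocol
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def chunk(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive parts of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def render_sitemap(urls: list[str]) -> str:
    """Render a <urlset> document; URLs are XML-escaped (e.g. & becomes &amp;)."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'{XML_HEADER}<urlset xmlns="{SITEMAP_XMLNS}">{entries}</urlset>'


def render_index(sitemap_urls: list[str]) -> str:
    """Render a one-level <sitemapindex> that lists only plain sitemaps."""
    entries = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_XMLNS}">{entries}</sitemapindex>'


def build_group(base_url: str, group: str, urls: list[str]) -> dict[str, str]:
    """Return {filename: xml} for one content group: its sitemaps plus one index."""
    files = {
        f"{group}-{n}.xml": render_sitemap(part)
        for n, part in enumerate(chunk(urls), start=1)
    }
    sitemap_urls = [f"{base_url}/{name}" for name in files]
    files[f"{group}-index.xml"] = render_index(sitemap_urls)
    return files
```

Calling build_group once per content type (products, blog, marketing pages) gives you one top-level index per group, each of which can be added as its own monitor.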

Why these limits exist

We check every URL in a sitemap with a real HTTP request, which costs time and bandwidth. Without limits, a single large customer could keep a worker busy for hours at a time, which would slow down every other sitemap check we run. Predictable ceilings keep everyone's checks snappy.

If your site genuinely needs more than the default limits, get in touch, tell us about the sitemap structure you're working with, and we'll look at it together.
