Why aren't all my pages crawled?

Oh Dear! crawls websites to report broken links and mixed content. In some circumstances we won't crawl all pages. This page explains some of those situations.

Crawl prevented by robots.txt or similar #

If you have a robots.txt file with content similar to this, Oh Dear! will not crawl a single page on your site:

User-agent: *
Disallow: /

This tells robots (search engines such as Google and Bing, but also our crawler) not to crawl any page on this site, starting from and including the root page /.

If you have a robots.txt file like this, Oh Dear! will report 0 pages scanned.

You can also use your robots.txt for more fine-grained control, allowing or disallowing specific paths.
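For example, a robots.txt like this (the /admin/ path is purely illustrative) blocks all robots from a single directory while leaving the rest of the site crawlable:

User-agent: *
Disallow: /admin/

Rules match by path prefix, so /admin/settings would be excluded as well, while pages like /blog remain crawlable.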

We also respect both the robots meta tag in your HTML and the x-robots-tag HTTP header. If we see a meta tag similar to this, we won't crawl that particular page:

<meta name="robots" content="noindex" />

And here's an example HTTP header that prevents robots from crawling the site:

x-robots-tag: noindex
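This header is typically set in your web server configuration. As a minimal sketch, assuming you want to block crawling of the entire site, it could look like this in nginx or Apache (mod_headers):

# nginx: inside a server or location block
add_header X-Robots-Tag "noindex";

# Apache: requires mod_headers to be enabled
Header set X-Robots-Tag "noindex"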

JavaScript-initiated content #

Our crawler currently does not execute JavaScript. We fetch your site's pages and search the raw HTML response we get back for links.

If your HTML is (nearly) empty because its content is injected with JavaScript during page load, we won't find any links and therefore can't discover and crawl your other pages.
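As a minimal sketch, this is the kind of page our crawler can't follow links on: the raw HTML response contains no <a> tags, and the link only exists after the script has run in a browser:

<body>
  <div id="app"></div>
  <script>
    // Runs in a browser during page load, but not in our crawler,
    // so the link below is invisible to us:
    document.getElementById('app').innerHTML =
      '<a href="/about">About us</a>';
  </script>
</body>

If you rely on client-side rendering like this, consider server-side rendering or pre-rendering the pages you want us to crawl.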

Was this page helpful to you? Feel free to reach out via support@ohdear.app or on Twitter via @OhDearApp if you have any other questions. We'd love to help!