Can I add Oh Dear to my robots.txt?
Yes. Our crawler follows robots.txt, so you can tell it which parts of your site to stay out of, same as you would for Googlebot or any other well-behaved bot.
A typical robots.txt
Most robots.txt files look something like this:
User-agent: *
Disallow:
The empty Disallow line means nothing is off limits: every crawler that respects robots.txt may visit every page.
Adding rules for Oh Dear
To limit what we crawl, add a block for our user agent:
User-agent: OhDear
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /admin/
This tells our crawler to skip those paths while leaving other bots untouched. You can use any disallow rules you like.
To block us completely:
User-agent: OhDear
Disallow: /
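If you want to sanity-check your rules before deploying them, you can feed them to a standard robots.txt parser. This sketch uses Python's built-in urllib.robotparser; it assumes our crawler matches rules the same way a standard parser does, which is what well-behaved bots do, but it is an illustration rather than a description of our exact implementation.

```python
from urllib import robotparser

# Build a parser from the rules directly, so no live site is needed.
rules = """\
User-agent: OhDear
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Paths under a Disallow prefix are blocked for the OhDear user agent.
print(rp.can_fetch("OhDear", "/admin/login"))     # False
# Everything else stays crawlable.
print(rp.can_fetch("OhDear", "/blog/post"))       # True
# Other bots are unaffected, since the block only names OhDear.
print(rp.can_fetch("Googlebot", "/admin/login"))  # True
```

Swap in your own paths to confirm the rules block exactly what you intend and nothing more.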
Ignoring robots.txt (if you need to)
Sometimes you want us to crawl a page that robots.txt blocks for everyone, for example a staging area you still want us to monitor. In that case, head to your monitor's broken links or mixed content settings and toggle Respect robots.txt off. We'll then crawl the full site regardless of your rules.
Want to see our user agent in full? Here's exactly what we send.