We're excited to announce that we've shipped some nice improvements to our broken links & mixed content checks. Both of these checks are powered by the same underlying crawler.
Increasing the number of pages we crawl
When we first launched Oh Dear!, we decided to limit crawls to the first 1.000 unique pages we find. This was mostly a protection against infinite loops, since those are really hard to detect.
Imagine a broken website with infinite pagination, where you can keep clicking through to the next page but still see the same content (there are several WordPress plugins out there exhibiting this behaviour). The body of each page still changes (the page number increments), so we can't simply compare the HTML body to previous pages either.
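The page cap is what guarantees termination in that scenario. As a minimal sketch (not Oh Dear's actual implementation, and with a hypothetical `get_links` callback standing in for real HTML link extraction): even if a site keeps producing fresh "next page" URLs forever, a breadth-first crawl capped at a fixed budget of unique pages will always stop.

```python
from collections import deque

def crawl(start_url, get_links, max_pages=5000):
    """Breadth-first crawl capped at max_pages unique URLs.

    The cap is the safety net against infinite pagination:
    even if get_links yields new "next page" URLs forever,
    the crawl terminates once the budget is spent.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# A hypothetical endless pagination: every page links to the next one.
endless = lambda url: [f"/page/{int(url.rsplit('/', 1)[-1]) + 1}"]
pages = crawl("/page/1", endless, max_pages=10)
# The crawl stops at 10 pages even though the links never run out.
```

Deduplicating on the URL (the `seen` set) rather than on the page body is exactly why the incrementing page number defeats content comparison but not a hard cap.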
Makes you wonder how Google fixes this, doesn't it? :-)
Now, more than a year later, we feel confident enough to raise that limit from 1.000 to 5.000 pages per website.
Thanks to major improvements in the crawler, we can now also change two other important settings.
- Our per-page crawl delay was decreased from 500ms to 250ms
- We now crawl 2 pages concurrently per website instead of 1
Both of these original settings were also a protection mechanism for the website: we don't want to overwhelm your server(s) by crawling too aggressively.
Our goal is to be a monitoring service, not a DDoS service.
Our crawler will now be a bit faster, handle more requests concurrently and crawl more pages than before.