What is a 'robots.txt' file?
If you've ever clicked around a website and stumbled onto yourdomain.com/robots.txt, you probably wondered what that plain little file was doing there. It's unassuming, but it plays a real role in how search engines and crawlers interact with your site.
Let's break down what it does and when you should care.
What is robots.txt, exactly?
robots.txt is a plain text file in the root directory of your website (so yourdomain.com/robots.txt). It tells web crawlers, like Googlebot, Bingbot, and other automated tools, what they can and can't access.
In other words: it's your website's way of saying "Hey robots, here's what I'd like you to do (or not do)."
It's a convention, not a lock. Well-behaved crawlers follow it. Malicious ones don't. More on that further down.
Why do websites use robots.txt?
A few practical reasons:
- Prevent duplicate content from being crawled (print versions, filtered product listings, session-ID URLs)
- Keep sensitive paths like /admin/, /cgi-bin/, or staging folders out of search results
- Stop crawlers from wasting server resources on large downloads or heavy scripts
- Keep crawlers off pages that aren't ready for the public yet
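To make that concrete, here's a hypothetical file covering several of these at once (the paths are placeholders; yours will differ):

User-agent: *
Disallow: /print/
Disallow: /admin/
Disallow: /staging/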
Think of it as a polite sign on the door. It's great for keeping good bots out of rooms they don't belong in, but it won't stop someone determined to break in.
What does a robots.txt file look like?
Plain text, no HTML, no fancy formatting, just rules. A super basic example:
User-agent: *
Disallow: /private/
This tells every crawler to stay out of /private/. A more targeted version:
User-agent: Googlebot
Disallow: /drafts/
This only blocks Google from your drafts folder. Letting everyone crawl everything is the easiest of all:
User-agent: *
Disallow:
An empty Disallow: means "go ahead, crawl everything."
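Rules also stack: one User-agent group can hold several Disallow lines, and anything after a # is a comment. A small sketch with made-up paths:

# Keep all crawlers out of two areas; everything else is fair game
User-agent: *
Disallow: /tmp/
Disallow: /cart/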
Does robots.txt do anything for security?
Here's the catch: robots.txt is a request, not an enforcement mechanism. Bots can ignore it, and malicious ones regularly do.
If there's a page you really don't want anyone seeing (an admin panel, a hidden login, an internal dashboard), protect it with authentication and proper permissions. Putting it in robots.txt alone won't keep anyone out, and it can actually draw attention to the exact paths you'd rather keep quiet.
"Here's the one place we don't want you to look" is not a great signal to send.
Common use cases
A few examples of how people use robots.txt in the real world.
Blocking staging or internal pages
Working on a redesign or testing something new? Keep the staging folder out of the index:
User-agent: *
Disallow: /staging/
Preventing duplicate content
E-commerce sites often generate many URLs that are slight variations of the same page. Keep bots out of the noise:
User-agent: *
Disallow: /search/
Disallow: /filter/
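Google and Bing also honor simple wildcard patterns (* and $), which weren't part of the original standard. A parameter-heavy site might use something like this (the query parameter names here are hypothetical):

User-agent: *
Disallow: /*?sessionid=
Disallow: /*&sort=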
Allowing specific bots only
This pattern blocks everyone except Google. Handy for certain migrations or if you're debugging a crawl issue:
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

Crawlers follow the most specific User-agent group that matches them, so Googlebot obeys its own section here and ignores the catch-all block.
Does robots.txt affect SEO?
Yes, and this is probably why most people care. Used carefully, robots.txt can help your SEO. Used carelessly, it can tank it.
The good:
- Keep low-quality or duplicate pages out of the index
- Guide crawl budget on large sites (important if you have millions of URLs)
- Protect parts of your site from being hit by high-volume bot traffic
The bad:
- Accidentally blocking pages you actually want indexed
- Blocking CSS or JS files Google needs to render the page, which can hurt rankings (see the sketch after this list)
- Leaving Disallow rules in place long after a section was meant to be re-indexed
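On that second pitfall: a rule like this looks harmless, but if your stylesheets and scripts live under a path like /assets/ (a placeholder here), it can stop Google from rendering pages the way visitors see them:

User-agent: *
Disallow: /assets/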
If you're not sure whether a rule is safe, test it in Google Search Console's robots.txt tester before shipping it.
Where to find your own robots.txt file
Visit yourdomain.com/robots.txt. That's it.
You can also use Google Search Console's robots.txt tester to validate your rules and see how Googlebot interprets them.
About third-party tools and robots.txt
Say you're using a website monitoring tool like Oh Dear and you notice our crawler showing up in your logs. You might wonder whether you should add us to robots.txt.
Short answer: you don't need to. Oh Dear doesn't rely on your robots.txt file to function.
That said, if you want to be explicit (maybe for peace of mind, maybe because your WAF logs are louder than they should be), you can add a block for us:
User-agent: OhDear
Disallow: /junk/
This tells our crawler it can visit every part of the site except /junk/. More on how we crawl, and how to whitelist us, in our FAQ entry about Oh Dear in robots.txt.
That's really all there is to it. Not flashy, but getting robots.txt right quietly improves how search engines understand your site, and how well-behaved bots treat your infrastructure.