How to control search engine crawlers?

  • February 10, 2015
  • blog

In this post we will cover robots.txt tips and a short tutorial. Use the power of the robots.txt file to guide and control search engine crawlers (also called spiders or robots). Your website should have a robots.txt file at its root so that it can be accessed as http://redefineinfotech.com/robots.txt or http://www.redefineinfotech.com/robots.txt .

How to create/generate robots.txt file

You can create a basic robots.txt file by hand, or generate one through Google Webmaster Tools.

1. Open a new text file in Notepad (or any plain-text editor).
2. Add the following directives. The code below allows all website pages to be crawled.

User-agent: *
Allow: /

or

User-agent: *
Disallow:

3. Save the file with the name robots.txt.
4. Upload the file to your website's root folder.
5. Browse to the file at http://www.example.com/robots.txt or http://example.com/robots.txt, whichever is your preferred web address.
6. Now test your robots.txt file in Google Webmaster Tools.
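If you prefer to verify the file from the command line rather than in Webmaster Tools, the steps above can be sanity-checked with Python's standard-library robots.txt parser. This is a minimal sketch; the domain is the placeholder from step 5, not a real site, so the rules are fed in directly instead of fetched over HTTP.

```python
from urllib.robotparser import RobotFileParser

# Parse the allow-all file created in step 2 and confirm that any
# crawler is permitted to fetch any page.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",
])

print(rp.can_fetch("Googlebot", "http://www.example.com/any-page"))  # True
```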

If you get a 500 or 404 error when accessing this file, contact your webmaster or website developer.

Robots.txt Tips: How to control search engine crawlers


Allow crawling of all web pages

User-agent: *
Allow: /

or

User-agent: *
Disallow:

Disallow crawling of a specific path or folder

User-agent: *
Disallow: /folder

Robots.txt Wildcard Matching

Wildcards let you disallow URLs by query string or file extension.

Disallow all URLs with a query string
User-agent: *
Disallow: /*?

Disallow all URLs that end with .asp
User-agent: *
Disallow: /*.asp$
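Major crawlers interpret * as "any sequence of characters" and a trailing $ as "end of URL". As a rough illustration of that matching logic (not any crawler's actual implementation), the hypothetical helper below translates a robots.txt path pattern into a regular expression:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # Hypothetical helper: * matches any run of characters, a trailing $
    # anchors the pattern to the end of the URL path; everything else is
    # matched literally.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        regex += "$"
    return re.compile("^" + regex)

asp_rule = robots_pattern_to_regex("/*.asp$")
print(bool(asp_rule.match("/products/list.asp")))   # True  -> disallowed
print(bool(asp_rule.match("/products/list.aspx")))  # False -> allowed

query_rule = robots_pattern_to_regex("/*?")
print(bool(query_rule.match("/page?id=1")))  # True  -> disallowed
print(bool(query_rule.match("/page")))       # False -> allowed
```

Note that Python's built-in urllib.robotparser follows the original 1994 convention and does not honor these wildcards, which is why the sketch uses a regex instead.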

Robots.txt Advanced Tips

If you have a very large website, you can use the Crawl-delay directive so that crawlers do not degrade your website's performance. (Note that Googlebot does not honor Crawl-delay; for Google, set the crawl rate in Google Webmaster Tools instead.)

Example –

User-agent: Googlebot
Crawl-delay: 10

The value 10 is in seconds.
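A crawler that honors this directive reads the delay back per user agent. Python's standard-library parser (Python 3.6+) exposes this directly, as a small sketch shows:

```python
from urllib.robotparser import RobotFileParser

# Parse the Crawl-delay example above and read the delay back for
# the named user agent.
rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Crawl-delay: 10",
])

print(rp.crawl_delay("Googlebot"))  # 10 (seconds)
```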

Write different rules for different crawlers

Example –

User-agent: *
Disallow: /folder1
Disallow: /folder2

User-agent: Googlebot
Disallow: /folder3

User-agent: bingbot
Disallow: /folder4
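When groups for several user agents exist, a crawler obeys only the most specific group that matches its own name, ignoring the * group entirely. The per-crawler rules above can be checked with Python's standard-library parser:

```python
from urllib.robotparser import RobotFileParser

# Parse the multi-crawler example: Googlebot follows only its own
# group, so /folder1 (blocked for everyone else) stays open to it.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /folder1",
    "Disallow: /folder2",
    "",
    "User-agent: Googlebot",
    "Disallow: /folder3",
])

print(rp.can_fetch("Googlebot", "http://www.example.com/folder1/page"))     # True
print(rp.can_fetch("Googlebot", "http://www.example.com/folder3/page"))     # False
print(rp.can_fetch("SomeOtherBot", "http://www.example.com/folder1/page"))  # False
```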

Source: Soni SEO
