[SEO] How to Write robots.txt

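To collect data from web pages, search engines mainly rely on web robots (crawlers) that scan a site's content and update the search engine's database. However, some sites would rather not have robots accessing their pages: besides generating unnecessary network traffic, a robot may also pick up sensitive data. This led to the Robots Exclusion Protocol, a common set of directives that tells robots which parts of a site they may scan.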

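To implement the Robots Exclusion Protocol, a site places a file named /robots.txt in its root directory. When a search engine starts scanning the site, it checks whether this file exists and, if it does, reads its contents. A robots.txt file looks like this: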

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

The example above forbids every robot from scanning anything under /cgi-bin/, /tmp/, and /~joe/. A robots.txt file can also treat different search engines differently, for example:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

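The example above allows only Google's crawler to scan the site and shuts out every other robot. An empty Disallow line means nothing is disallowed for that user agent, while "Disallow: /" blocks the whole site.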
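
To see how these rules are interpreted, here is a minimal sketch that feeds both examples to urllib.robotparser from Python's standard library. The crawler name "ExampleBot", the sample paths, and the helper build_parser are made up for illustration, and the second rule set assumes Google's crawler identifies itself as Googlebot. Real crawlers may differ from Python's parser in edge cases, so treat the output as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# First example: block three directories for every robot.
BLOCK_SOME_PATHS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

# Second example: let Googlebot in, keep everyone else out.
ONLY_GOOGLEBOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

first = build_parser(BLOCK_SOME_PATHS)
second = build_parser(ONLY_GOOGLEBOT)

# "ExampleBot" is a hypothetical crawler name used only for this demo.
print(first.can_fetch("ExampleBot", "/tmp/cache.html"))  # False
print(first.can_fetch("ExampleBot", "/index.html"))      # True
print(second.can_fetch("Googlebot", "/index.html"))      # True
print(second.can_fetch("ExampleBot", "/index.html"))     # False
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download the file before can_fetch() is called.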

Reference

About /robots.txt, robotstxt.org
Create a robots.txt file, Google Support
The robots.txt file, Patrick Sexton
