[SEO] How to write robots.txt

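To collect web page data, search engines mainly rely on web robots (crawlers) that scan a site's content and update the search engine's database. Some sites, however, do not welcome robots accessing their pages: besides generating unnecessary network traffic, a robot may also pick up sensitive data. This led to the Robots Exclusion Protocol, a common set of directives that tells robots which parts of a site they may scan.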

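To implement the Robots Exclusion Protocol, a site usually places a file at /robots.txt in its root directory. When a search engine starts scanning, it checks whether this file exists and, if it does, reads its contents. A robots.txt file looks like this: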

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

The example above disallows robots from scanning anything under /cgi-bin/, /tmp/ and /~joe/. The file can also treat different search engines differently, for example:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

The example above allows only Google to scan the site; all other robots are disallowed.
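
To see how a well-behaved robot might interpret the two examples above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the robot name ExampleBot and the example.com URLs are made up for illustration, and the second rule set assumes Google's crawler identifies itself as Googlebot. Real crawlers may apply their own matching rules, so treat this as an approximation rather than a definitive implementation.

```python
from urllib.robotparser import RobotFileParser

# The first example: block three directories for every robot.
FIRST_EXAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

# The second example: an empty Disallow means "nothing is disallowed"
# for Googlebot, while every other robot is blocked from the whole site.
SECOND_EXAMPLE = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and check one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(FIRST_EXAMPLE, "ExampleBot", "https://example.com/tmp/cache.html"))
    print(is_allowed(FIRST_EXAMPLE, "ExampleBot", "https://example.com/index.html"))
    print(is_allowed(SECOND_EXAMPLE, "Googlebot", "https://example.com/index.html"))
    print(is_allowed(SECOND_EXAMPLE, "ExampleBot", "https://example.com/index.html"))
```

Keep in mind that robots.txt is only a request: a robot that ignores the protocol can still fetch the disallowed paths, so the file should not be relied on to protect sensitive data.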

About C.H. Ling
A .NET / Java developer from Hong Kong, currently based in the United Kingdom. Thanks to Google, which has solved many of my technical problems, I built this blog to give something back. Besides coding and trying out new technology, I enjoy hiking and traveling, so I also write about what I see and feel along the way. Happy reading!
