GAIS About the Robot

 

Why is Gaisbot grabbing webpages from my website?

Gaisbot is the agent software of GAIS. It
crawls web sites all over the world in order
to build a search engine like Google or AltaVista.

 

I do not want my website to be crawled, what should I do?

You can put a file named robots.txt in the root directory of your web server.
It is the standard way to exclude robot programs from
retrieving part or all of your web site.
For a detailed description of robots.txt, please refer to:
http://www.robotstxt.org/wc/norobots.html

 

Why does Gaisbot try to access some non-existing URLs from my website?

Other pages on the web may contain stale links
pointing to URLs that no longer exist on your web site.
Gaisbot crawls the web by following the links in the pages it has gathered,
and thus may request URLs that do not exist.

 

Why doesn't Gaisbot obey my robots.txt?

Gaisbot does try to follow robots.txt by
filtering out URLs that are excluded by its robot exclusion database.
However, the robot exclusion database is updated in batch mode
once a day, so there is a chance that Gaisbot has
retrieved some of the web pages listed in your robots.txt
before your robots.txt was fetched and processed into
our robot exclusion database.
Once Gaisbot has fetched your robots.txt and learned its rules,
it will no longer grab the web pages listed in your robots.txt.
If you still have questions, please email
robot05@gais.cs.ccu.edu.tw
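
The timing issue described above can be illustrated with a minimal sketch. It is not Gaisbot's actual code: the class name ExclusionDB, its methods, and the simple path-prefix matching are all assumptions made for illustration. It only shows why a URL can pass the filter before the daily batch update and be rejected afterwards.

```python
from urllib.parse import urlsplit


class ExclusionDB:
    """Hypothetical robot exclusion database, refreshed in batch mode."""

    def __init__(self):
        # host -> list of disallowed path prefixes (assumed representation)
        self.rules = {}

    def batch_update(self, parsed_rules):
        """Replace all rules at once, as a once-a-day batch job might do."""
        self.rules = {host: list(prefixes) for host, prefixes in parsed_rules.items()}

    def is_allowed(self, url):
        """Return True if no stored prefix for the URL's host matches its path."""
        parts = urlsplit(url)
        path = parts.path or "/"
        prefixes = self.rules.get(parts.netloc, [])
        return not any(path.startswith(prefix) for prefix in prefixes)


db = ExclusionDB()
url = "http://www.example.com/private/a.html"

# Before the batch update the database knows nothing about this host,
# so the URL is not filtered out.
before = db.is_allowed(url)

# After the batch job has processed the site's robots.txt, the same URL is excluded.
db.batch_update({"www.example.com": ["/private/"]})
after = db.is_allowed(url)
```

Here `before` is True and `after` is False, which corresponds to the window in which a page listed in robots.txt may still be retrieved.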

 

How frequently does Gaisbot access web pages from a server?

To avoid overloading the servers it accesses,
Gaisbot uses a BFS (breadth-first search) algorithm to order the URLs it visits,
and tries to spread the URLs of the same site evenly
across the access time interval.
However, different servers have different expectations regarding
access frequency. If you feel that Gaisbot is accessing
your web site too frequently, please send an email to
robot05@gais.cs.ccu.edu.tw and we will look into
the access log and adjust the behaviour.
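
The two ideas above, breadth-first ordering and spreading one site's URLs over the access interval, might look roughly like the following. This is a sketch under stated assumptions, not Gaisbot's implementation: the function names bfs_order and spread_schedule, the in-memory link graph, and the one-hour interval are hypothetical.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def bfs_order(seeds, links):
    """Return URLs in breadth-first order.

    `links` maps a URL to the list of URLs found on that page
    (a stand-in for actually fetching and parsing pages).
    """
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def spread_schedule(urls, interval_seconds):
    """Give each URL a fetch offset so one host's URLs are evenly spaced."""
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlsplit(url).netloc].append(url)
    schedule = []
    for host_urls in by_host.values():
        step = interval_seconds / len(host_urls)
        for i, url in enumerate(host_urls):
            schedule.append((i * step, url))
    # Sort by offset; Python's sort is stable, so ties keep insertion order.
    schedule.sort(key=lambda item: item[0])
    return schedule


links = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/2"],
}
order = bfs_order(["http://a.example/"], links)
schedule = spread_schedule(order, 3600)
```

With three URLs on a.example and a 3600-second interval, requests to that host are scheduled 1200 seconds apart, while the single URL on b.example is scheduled independently.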

 

What should I do if I want Gaisbot to stop accessing my site as soon as possible?

Please send an email to robot05@gais.cs.ccu.edu.tw
to tell us that you want Gaisbot to stop accessing your website.
Please specify the domain name of your server in the request.

 

What is the user-agent name that I should specify in robots.txt to disallow Gaisbot from crawling my web site?

The user-agent name is "Gaisbot" and you may disallow it from
crawling your web site by putting the following text in your
robots.txt file:
User-agent: Gaisbot
Disallow: /
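
If you would like to check the rule before deploying it, one option is Python's standard urllib.robotparser module. The sketch below parses the two lines above and asks whether a given user-agent may fetch a URL. The helper name can_crawl, the example.com URL, and the second bot name are made up for illustration, and this parser is not necessarily the one Gaisbot uses.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Gaisbot
Disallow: /
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `robots_txt` allows `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


gaisbot_allowed = can_crawl("Gaisbot", "http://www.example.com/index.html")
other_allowed = can_crawl("SomeOtherBot", "http://www.example.com/index.html")
```

Under this rule `gaisbot_allowed` is False, while `other_allowed` stays True because the rule names only Gaisbot.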

 

 

Copyright (C)2001 GAIS. All rights reserved