Mod Release: GoogleBot Detector (1.4.1)

Mods etc.

Moderator: Integra Moderator


PostAuthor: ZacFields » Mon Apr 09, 2007 10:26 am

GoogleBot Spider Detector

I think many people will enjoy this mod. It detects the Google spider's presence on your forums and logs the EXACT URLs that it visits. This serves many purposes, such as:

1. Identifying a possible Googlebot problem (Googlebot has been known to hit websites too heavily and to ignore crawl delays set in the robots.txt file).

2. Seeing specifically which pages on your site Google is indexing.

3. Or just seeing how much and how often Google visits your site.
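Point 1 refers to the `Crawl-delay` line in robots.txt. As a minimal sketch of how that directive is read, here is Python's standard `urllib.robotparser` parsing a hypothetical robots.txt. The 120-second value echoes the delay mentioned later in this thread; the `/admin/` rule and the `load_rules` helper are made up for illustration. Note that Google's own robots.txt documentation says Googlebot does not support `Crawl-delay` at all, which fits the behavior described in point 1.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to wait 120 seconds
# between requests and to stay out of an /admin/ directory.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 120
Disallow: /admin/
"""


def load_rules(text: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules()
    # Delay in seconds a crawler that honors Crawl-delay should wait.
    print(rules.crawl_delay("Googlebot"))                     # 120
    # Whether a path may be crawled under the rules above.
    print(rules.can_fetch("Googlebot", "/viewtopic.php?t=1"))  # True
    print(rules.can_fetch("Googlebot", "/admin/index.php"))    # False
```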

Five minutes after I installed this mod, I already had several Google hits. Overnight, Google hit over 800 pages on my site. It's very interesting information to see, and this mod is very simple to install.

This mod works with 1.4.1. I do not know whether it works with 1.4.0, but it should.

Here is the download link:
http://www.brokencar.net/im_mods/googlebot.zip

Zac

ZacFields
Sr Integra Member
Posts: 426
Joined: Wed May 24, 2006 10:14 pm

Re: Mod Release: GoogleBot Detector (1.4.1)

PostAuthor: .QUACK.Major.Pain » Mon Apr 09, 2007 2:43 pm

Sounds cool.

Got it working, but I think it's not registering all of them.

Usually when I go on our site, there are 2-4 Googlebot IPs in the ACP index.
I'll have to check tomorrow and compare the count.

Does it show a duplicate IP if the same one returns at another time?

So far Googlebot has hit me for 7 pages, 23 visits.
I get way more from Lycos a day: 424 pages, 93 visits.

.QUACK.Major.Pain
Sr Integra Member
Posts: 986
Joined: Sat Jan 27, 2007 10:15 am

PostAuthor: ZacFields » Mon Apr 09, 2007 9:22 pm

^Seems to be registering all of them for me... or at least I hope so, with well over 700 hits between 2am and 11am this morning. lol

From all I could tell in the source code, I believe the mod just searches for the name "googlebot", not any specific IP address or IP range.
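Zac's reading of the source is that detection is a plain name match. A minimal sketch of that idea, assuming the check is a case-insensitive substring test on the visitor's hostname or user-agent string (the function name and sample strings are illustrative, not taken from the mod):

```python
def looks_like_googlebot(identifier: str) -> bool:
    """Return True if 'googlebot' appears anywhere in a hostname or user-agent.

    This is a name-based check only: no IP addresses or ranges are involved,
    so any client that puts "googlebot" in its name will also match.
    """
    return "googlebot" in identifier.lower()


if __name__ == "__main__":
    samples = [
        "crawl-66-249-66-1.googlebot.com",  # Googlebot-style reverse-DNS name
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "msnbot-65-55-208-1.search.msn.com",  # a non-Google crawler
    ]
    for s in samples:
        print(looks_like_googlebot(s), s)
```

A name match like this catches every Google crawler regardless of which IP it comes from; the flip side is that it cannot tell a real Googlebot from a client that merely claims the name.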

Zac

PostAuthor: .QUACK.Major.Pain » Tue Apr 10, 2007 3:49 am

I checked last night before I went to bed, and the ACP index showed a Googlebot IP logged in.
When I checked this morning, there was no record of any Googlebots.
I'm sure your site is older, which is probably why it gets more hits.

PostAuthor: Whisky » Tue Apr 10, 2007 5:38 am

Two minutes after installing this, I already have 15 records. lol

Do you think it's possible to display the bot in the Users Online box?
I am the Lizard King, I can do anything

Whisky
Sr Integra Member
Posts: 256
Joined: Thu May 18, 2006 1:28 am
Location: Brussels

PostAuthor: ZacFields » Tue Apr 10, 2007 9:49 am

^That is something I'd like to work on, but right now it isn't possible because that mod has not yet been ported to IM.

I am also working on porting a mod that shows how many results each search engine returns when you search for your site name. You'd be able to access it with a single click from your ACP, but right now only 2 of the 6 search engines work, so I'm trying to update the mod. However, it is a somewhat older mod, so it might not work out.

Zac

PostAuthor: jtadmin » Tue Apr 10, 2007 10:36 am

Does anyone know if this works with 1.4.0?

jtadmin
Newbie
Posts: 8
Joined: Tue Jun 20, 2006 8:57 am

PostAuthor: ZacFields » Tue Apr 10, 2007 10:46 am

jt: I haven't actually tested it, but I am almost 100% confident that it will work with 1.4.0.

Give it a try; there are only a couple of file edits. It should only take about 5 minutes. Let us know if it works.

For what it's worth, my logs show 1,980 hits from Googlebot on my site since 1:00 yesterday, so in roughly 24 hours.

The problem I've noticed with Googlebot is that it can get around your robots.txt restrictions by sending more than one Googlebot IP to your site. Sometimes I have 5-10 Google IPs on my site at once, so even though I have a 120-second crawl delay, it doesn't make a difference when there are so many different bots on.
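Zac's numbers can be sanity-checked with a little arithmetic, assuming each crawler waits exactly the 120-second delay between requests: one crawler could make at most 86,400 / 120 = 720 requests a day, so 1,980 hits in about 24 hours needs at least three of them, while ten could make up to 7,200. A small sketch of that calculation (the function names are illustrative):

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_hits_per_day(crawl_delay_s: float, bots: int = 1) -> float:
    """Most requests per day if each bot waits exactly crawl_delay_s between hits."""
    return bots * SECONDS_PER_DAY / crawl_delay_s


def min_bots_for(observed_hits: int, crawl_delay_s: float) -> int:
    """Fewest delay-respecting bots that could account for observed_hits in one day."""
    return math.ceil(observed_hits / max_hits_per_day(crawl_delay_s))


if __name__ == "__main__":
    print(max_hits_per_day(120))           # 720.0 (one bot)
    print(max_hits_per_day(120, bots=10))  # 7200.0 (ten bots)
    print(min_bots_for(1980, 120))         # 3
```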

Zac

PostAuthor: jtadmin » Tue Apr 10, 2007 11:15 am

ZacFields wrote:
"The problem I've noticed with googlebot is that they can get around your robots.txt restrictions by sending more than one googlebot IP to your site. Sometimes I have 5-10 google IP's on my site so even though I have a 120 second crawl delay it doesn't make a difference when there's so many different bots on."


Is this going to affect the performance of my website? Today I had to figure out how to manually remove over 600 pending bots and sessions from the database. I was wondering what this add-on would give me over what is currently in place for bot management.

PostAuthor: ZacFields » Tue Apr 10, 2007 11:21 am

The only thing this mod is really good for is telling you specifically which URLs are being visited by Googlebot.

It gives you the date/time and then the exact URL, so you can see which topics Googlebot has already spidered.

I haven't noticed any performance difference after installing this modification. It is useful to me to be able to see when Googlebot is hitting my site too hard: when your forum is running slow, it's a pretty easy way to tell whether Googlebot is the cause.
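To use a "date/time plus exact URL" log for spotting heavy crawling, one could count total visits, distinct pages (the same pages-versus-visits split quoted earlier in the thread), and the busiest hour. A minimal sketch, assuming the log can be exported as (timestamp, URL) pairs; the timestamp format, the sample entries, and the `summarize` helper are all hypothetical, not the mod's actual table layout:

```python
from collections import Counter
from datetime import datetime

# Made-up sample entries: (date/time of the hit, URL Googlebot requested).
LOG = [
    ("2007-04-10 02:05:11", "/viewtopic.php?t=12"),
    ("2007-04-10 02:17:42", "/viewtopic.php?t=12"),
    ("2007-04-10 02:40:03", "/viewtopic.php?t=15"),
    ("2007-04-10 03:01:27", "/viewforum.php?f=3"),
]


def summarize(log):
    """Count visits, distinct pages, and the hour with the most hits (log must be non-empty)."""
    times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts, _ in log]
    per_hour = Counter(t.strftime("%Y-%m-%d %H:00") for t in times)
    busiest_hour, busiest_count = per_hour.most_common(1)[0]
    return {
        "visits": len(log),                     # every logged hit
        "pages": len({url for _, url in log}),  # distinct URLs only
        "busiest_hour": busiest_hour,
        "busiest_count": busiest_count,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

A spike in `busiest_count` during the hours the forum feels slow would be the "Googlebot is hitting you too hard" signal described above.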

Zac

PostAuthor: .QUACK.Major.Pain » Tue Apr 10, 2007 1:10 pm

My forum is only a couple of months old, but in the last 23+ hours I haven't had any Googlebots.
But I have had 28 Lycos bots.
As I write this, Googlebot is on my ACP index. It was also on this morning when I looked. This leads me to think that this Googlebot mod and the bot management don't register all of them, unless it's a bot I have already added from the pending bots; in that case I could understand it skipping ones that were already added.

PostAuthor: ZacFields » Tue Apr 10, 2007 1:17 pm

Can't imagine why it's not working for you. I'm getting about a 100% success rate. All it does is search for "googlebot" in the hostname, which all Googlebots have.
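A hostname match alone is easy to fool, since anyone who controls reverse DNS for their own IP can put "googlebot" in the name. Google's documented way to verify Googlebot is a reverse DNS lookup followed by a forward lookup that must return the same IP. That is a different technique from the plain name search described here; below is a minimal sketch using Python's standard `socket` module. The function names are illustrative, the suffix list covers only googlebot.com and google.com (an assumption; check Google's current documentation for the full list), and the resolvers are parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Domain suffixes treated as Google's (assumption: verify against Google's current list).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_dns(ip: str) -> str:
    """Hostname for an IP (PTR lookup); raises OSError if there is none."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(host: str) -> List[str]:
    """IPv4 addresses a hostname resolves to; raises OSError on failure."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_dns,
    forward: Callable[[str], List[str]] = forward_dns,
) -> bool:
    """Reverse-resolve ip, require a Google domain, then confirm the name maps back to ip."""
    try:
        host = reverse(ip).lower().rstrip(".")
    except OSError:
        return False
    # Suffix check, not substring: "googlebot.example.net" must not pass.
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False


if __name__ == "__main__":
    # Offline demo: lambdas stand in for real DNS answers.
    print(is_verified_googlebot("66.249.66.1",
                                lambda ip: "crawl-66-249-66-1.googlebot.com",
                                lambda host: ["66.249.66.1"]))
    print(is_verified_googlebot("203.0.113.5",
                                lambda ip: "googlebot.example.net",
                                lambda host: ["203.0.113.5"]))
```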

Did you remember to run the SQL query from the instructions? I'd assume you'd get an error if you hadn't, but it's just an idea.

Zac

PostAuthor: .QUACK.Major.Pain » Tue Apr 10, 2007 1:27 pm

I did that.
I checked my database and there was a phpbb_googlebot table or similar (I don't recall the exact name, but it was there).

I got 3 or 4 in the first 20 minutes after installing, but nothing since.

PostAuthor: ZacFields » Tue Apr 10, 2007 1:31 pm

That's rather odd. I'd say just leave it for a few more days and see if anything turns up. It could be some sort of compatibility issue with your PHP version or something similar.

Zac

PostAuthor: tekguru » Tue Apr 10, 2007 9:31 pm

417 pages here in 6 hours! Does this mean the site is getting over-Googled?
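Whether 417 pages in 6 hours is "too much" depends on what the server can handle, but turning raw counts into a rate makes reports from different sites comparable: 417 / 6 = 69.5 pages per hour, or one request roughly every 52 seconds, versus 1,980 / 24 = 82.5 per hour on Zac's site. A trivial sketch of that conversion (the helper names are illustrative):

```python
def hits_per_hour(hits: int, hours: float) -> float:
    """Average crawl rate over the observation window."""
    return hits / hours


def seconds_between_hits(hits: int, hours: float) -> float:
    """Average gap between consecutive requests, in seconds."""
    return hours * 3600 / hits


if __name__ == "__main__":
    for site, hits, hours in [("tekguru", 417, 6), ("ZacFields", 1980, 24)]:
        print(f"{site}: {hits_per_hour(hits, hours):.1f}/hour, "
              f"one hit every {seconds_between_hits(hits, hours):.0f} s")
```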

tekguru
Sr Integra Member
Posts: 329
Joined: Tue Mar 28, 2006 10:29 pm
