I’ve added some code to the lgf-referrer script I’m using to block certain domains from the list. Mostly I’ve blocked search engines, since I have them in my other list. But I just figured out two new ones to block today, http://radio.xmlstoragesystem.com/ and http://www.technorati.com/.
http://radio.xmlstoragesystem.com/ is the weblogs.com ping checking that my site has changed. I ping weblogs.com whenever I change the site. Sometimes I go there just to find random new blogs, and I figure other people do the same. But I don’t need their robot in my referrer list. I do like to know when someone comes from there to my site, so weblogs.com still shows up in the list.
Technorati is a cool site, but they hit my site every time I come up on weblogs.com. Unfortunately blocking them means I won’t see when people come from their site to look at the site.
Well, that’s a bummer. We like Technorati to be useful to you, and blocking the indexing spider is your prerogative, but I don’t understand why you would want to do it. Are we hitting your site too hard? Is it a CPU or bandwidth issue?
Always looking to make the Technorati service better…
Dave
Not so much that you are hitting it too hard, no harder than radio is. I just like my list to show real people who are hitting the site, and not robots.
I can’t tell the spider apart from a click from your site to my site, so I can’t filter them differently.
BTW: the only thing that is blocked is your URL going into my referrer list. Your spider can crawl the site as much as it wants.
Ah, OK. No problem then. You might want to filter for the exact URL ‘http://…/‘ – any click-throughs will come from deeper links.
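For anyone wondering what exact-URL filtering might look like, here is a rough sketch in Python. It is not the lgf-referrer code: the names `BLOCKED_EXACT` and `should_list` are made up, and it assumes the robots send the bare site URLs mentioned in the post as their referrer. The deeper link in the usage lines is also just an invented example.

```python
# Hypothetical exact-match blocklist. These are the two URLs named in the
# post; assuming the robots send exactly these strings as the referrer.
BLOCKED_EXACT = {
    "http://radio.xmlstoragesystem.com/",
    "http://www.technorati.com/",
}


def should_list(referrer: str) -> bool:
    """Return True if this referrer should appear in the referrer list.

    Only an exact match is hidden, so a click-through from a deeper
    page on the same site is still recorded.
    """
    return referrer.strip() not in BLOCKED_EXACT


# Usage: the bare URL is hidden, a deeper (made-up) link is kept.
print(should_list("http://www.technorati.com/"))
print(should_list("http://www.technorati.com/some/deeper/page"))
```

Matching the whole string rather than the domain is what lets real visitors through while keeping the robot out of the list.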