Tail Rank, or how to avoid search engine rank spam
I just read a post by Alex Barnett and another one by Robert Scoble about Tail Rank, a new blog search engine that has been released by Kevin Burton. What got my attention was a later post by Kevin commenting on the recent hack of tech.memeorandum that Tara and Alex posted about, which reminded me of the origins of Google.
Before Google came along, most of the popular search engines of the time (Altavista, Lycos, Infoseek, etc.) had a huge spam problem with their ranking algorithms. The main issue was that those algorithms were based on keyword relevance. Since search engines were pretty much counting keyword frequency in pages, spammers figured out that if you hid the word "porn" a million times in your web site (just as an example, no particular preference in the choice), the ranking of your page for the search term "porn" would go through the roof and the page would appear at the top of the results.
What Google did differently was to come up with a ranking method that was far harder to spam (although that might not have been their original motivation). The details are described in Google's original paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (this was when Google was google.stanford.edu and I was doing my PhD, so I read the paper at that time). The reason Google was spam free was the nature of the PageRank algorithm, which relies not only on how many pages link to a particular site but also on the PageRank of those pages (in other words, on how many pages link to the pages that link to you, and so on; it is a recursive definition, and the other problem they solved was how to calculate it efficiently). So basically, to get your site a higher PageRank, you have to cheat not only on your own web site but also on the web sites that link to you, and on the ones that link to them, and so on, which is an infeasible proposition.
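To make the contrast concrete, here is a minimal sketch in Python of the two ranking ideas. It is a textbook power iteration over a tiny made-up link graph, not Google's actual implementation. The function names (`keyword_score`, `pagerank`), the page names, the 0.85 damping factor as a default, and the even redistribution of rank from pages with no outgoing links are all assumptions chosen for illustration.

```python
def keyword_score(text, term):
    """Naive relevance: count how often the term appears in the page text."""
    return text.lower().split().count(term.lower())


def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    A page's rank depends on the rank of the pages linking to it,
    which is the recursive definition described above.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages so the total stays at 1.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "stuffed" repeats a keyword but nobody links to it.
links = {
    "hub": ["a", "b"],
    "a": ["popular"],
    "b": ["popular"],
    "popular": ["hub"],
    "stuffed": [],
}
stuffed_text = "widgets " * 1000 + "for sale"
popular_text = "a page about widgets"

ranks = pagerank(links)
print(keyword_score(stuffed_text, "widgets"), keyword_score(popular_text, "widgets"))
print(ranks["stuffed"], ranks["popular"])
```

Under keyword counting the stuffed page wins easily, while under the link-based score it stays near the baseline because no other page vouches for it. Raising its score would require changing pages the spammer does not control.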
So Kevin's Tail Rank, according to him, "was designed with spam prevention in mind", which is a very good thing in these days of too much information and too many blogs.
Kevin also mentions that: "This isn't a Google-style beta. This isn't a Web 2.0-style beta. This is the old school definition of beta where we need feedback from the community to make a better product."
Well, I tried searching for "mindcamp" and I got two pages of results. Clicking on the second page of results gets me a page that has "Search Results" at the bottom with 6 pages listed, but all of them take me to one of the first two. This is "constructive criticism", as I really want to finally see a useful blog search engine and Tail Rank has the potential to be the one.
CD
Right now our full-text engine only matches posts within the last two weeks. I'll be expanding this a bit later.
Note that we're only indexing memes so you might not find niche content.
I might open up the full-text search index to include other posts in the future though.
Thanks!
Kevin
Posted by: Kevin Burton | November 09, 2005 at 09:22 AM