A very sharp and experienced developer named Eric wrote the first version of similarities that made it out to the Amazon website. It was great working with Eric. I learned much from him over the years.
The first version of similarities was quite popular. But it had a problem, the Harry Potter problem.
Oh, yes, Harry Potter. Harry Potter is a runaway bestseller. Kids buy it. Adults buy it. Everyone buys it.
So, take a book, any book. If you look at all the customers who bought that book, then look at what other books they bought, rest assured, most of them have bought Harry Potter.
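To make that concrete, here is a minimal sketch of the naive "customers who bought this also bought" count. It is not Amazon's actual code: the customer names, the book titles other than Harry Potter, and the helper names `also_bought` and `top_similar` are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy purchase histories, invented for this example.
purchases = {
    "alice": {"Python Cookbook", "Ruby Cookbook", "Harry Potter"},
    "bob": {"Python Cookbook", "Harry Potter"},
    "carol": {"Gardening Basics", "Harry Potter"},
    "dave": {"Gardening Basics", "Rose Pruning", "Harry Potter"},
    "erin": {"Python Cookbook", "Ruby Cookbook"},
    "frank": {"Python Cookbook", "Harry Potter"},
    "gina": {"Rose Pruning", "Harry Potter"},
}


def also_bought(book, purchases):
    """Count every other book bought by customers who bought `book`."""
    counts = Counter()
    for basket in purchases.values():
        if book in basket:
            counts.update(basket - {book})
    return counts


def top_similar(book, purchases):
    """Return the single most co-purchased book (raw counts only)."""
    return also_bought(book, purchases).most_common(1)[0][0]


for title in ("Python Cookbook", "Gardening Basics"):
    print(title, "->", top_similar(title, purchases))
```

With raw counts, the bestseller comes out on top for both the programming book and the gardening book, even though it says nothing about either reader's interests.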
When I worked on the personalization team, we were still struggling with the problem. There are definite ways to identify a Harry Potter problem, but you have to remember to apply them. Adding to that, within certain genres there are Harry Potter books and music albums that are runaway successes only within those genres. Compared against the general list of books that Amazon sells, they don't look like books that everyone has bought. But if you narrow the scope to only related books, you'll find that they are crazy popular.
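One common way to spot a Harry Potter is to divide how often two books are bought together by how popular the second book is overall (often called "lift"). The sketch below is only an illustration under that assumption, not a description of what the personalization team actually ran; the data and the function name `lift_scores` are hypothetical.

```python
from collections import Counter

# Hypothetical toy baskets, one set of titles per customer.
baskets = [
    {"Python Cookbook", "Ruby Cookbook", "Harry Potter"},
    {"Python Cookbook", "Harry Potter"},
    {"Gardening Basics", "Harry Potter"},
    {"Gardening Basics", "Rose Pruning", "Harry Potter"},
    {"Python Cookbook", "Ruby Cookbook"},
    {"Python Cookbook", "Harry Potter"},
    {"Rose Pruning", "Harry Potter"},
]


def lift_scores(book, baskets):
    """Score each co-purchased title as P(other | book) / P(other).

    A score near or below 1 means buyers of `book` are no more likely
    than the average customer to buy `other`: a Harry Potter.
    """
    total = len(baskets)
    popularity = Counter()
    for basket in baskets:
        popularity.update(basket)

    with_book = [b for b in baskets if book in b]
    co_counts = Counter()
    for basket in with_book:
        co_counts.update(basket - {book})

    scores = {}
    for other, co in co_counts.items():
        p_other_given_book = co / len(with_book)
        p_other = popularity[other] / total
        scores[other] = p_other_given_book / p_other
    return scores


scores = lift_scores("Python Cookbook", baskets)
best = max(scores, key=scores.get)
print(best, scores)
```

The genre-level version of the problem shows up when you change what `baskets` contains. Pass in every customer and a genre favorite looks ordinary; pass in only the customers of one genre and its popularity within that group becomes visible, so the check has to be run at the right scope.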
The biggest side effect of the Harry Potter problem is that it weakens recommendations. For instance, I've bought the O'Reilly regex pocket book, the O'Reilly Python Cookbook, and the Ruby Cookbook. From those three books, you can pretty easily peg me as a web nerd and safely recommend a Steve Souders website performance book. Those are very strongly correlated purchases in a narrow band of interest. However, because I'm a geek, I've also bought Neal Stephenson's latest book, Anathem. As have a few hundred thousand OTHER geeks. We could say that Anathem is a nerd's Harry Potter.
So I received an email today from Amazon with a list of recommended books, most of which were based on Anathem and Daniel Silva's latest book, Moscow Rules (a great book, but also a bit of a widely bought Harry Potter). As you might guess, the recommendations were really bad. I wish that email had a link I could click that would say "never recommend any of these books to me again, please". I could go to each detail page and mark that, but it would take a massive amount of time.