So this post is a brain dump of something that's been tickling my brain for a while now. It's mostly a collection of poorly cohered thoughts, but they're rattling around in my head at night and preventing me from sleeping well, so I need to get them out of my brain and into a semi-permanent medium. I'd also like to get feedback on the concept... I think it could be interesting.
The starting idea is the concept of A/B testing webpages. That is, testing variations of a webpage, randomized to eliminate interference by dependent variables, to see which web page is more effective at getting a user to "convert" (i.e., do what we want.)
The next concept is that of the prediction algorithm -- using (usually past) data to predict what a user is going to do. Or in this case, which page (A or B) will drive the user to a conversion.
The Netflix Prize
Now, the best prediction algorithms in the world right now are probably those designed to win the Netflix prize, so I'll discuss those next. The Bellkor prize solution isn't quite as broadly applicable to other problems (mostly describing the weights of the winning algorithm), so its slightly less interesting, but it does convey one tidbit we should keep in mind. This is that taking a wide selection of recommendation algorithms (in the Bellkor paper they use 107 elements) and taking their results in some weighted fashion is probably better than going "deep" into one algorithm.
What's more interesting is the algorithm they published in the paper Improved Neighborhood-based Collaborative Filtering. The paper's not too difficult, but it is analyzed quite nicely here at Algorithms Analyzed (seen here), so I'll use that as a reference.
So, to make sure we're clear, if you were to compare A/B testing to the netflix recommender, movies would be the equivalent of page variants. So we'd want to predict which page variant a user would "prefer", based on some characteristic of the user and the pages. Got it? Good. Now it gets more interesting.
Neighborhood-based Collaborative Filtering
The Algorithms Analyzed explanation reduced the prediction problem down to essentially two formulas. In these formulas, the variables u,v represent two users and the variables i,j,k represent the movies.
So, in the first case, you'd be considering the similarity between two movies and combining that with the user's preference for one of the movies. In the second case, you'd be comparing between two similar users and seeking how the other person rated that movie. To consider the parallel case of A/B testing pages, you'd only use the second formula (which page similar users prefer), rather than trying to compare different pages in the A/B test (which would be impossible, if the user's never been to your site before.)
And one of the best things about this concept is that the hard work isn't done on the fly. The most difficult part of the process is deriving all the weights for the equation, and that can be done from log data. How to accomplish this is left as an exercise to the reader; you could use Singular Value Decomposition if you want high precision, or just use a neural network or genetic algorithm for a good approximation that is simpler than doing the SVD. Regardless, once you figure out the weights, the only thing that needs to be done in real-time is *applying* the formula, which is just a few lines of code.
Linearity and Variable Dependence
One possible problem with the nearest neighbor algorithm as defined above is that for simplicity sake, all variables are assumed to be linear functions, neither of which may hold -- certainly not linearity, and most likely not even the assumption they're functions (i.e., do they pass the vertical line test?). The other is the idea of variable dependence, which I will illustrate anecdotally from our A/B testing example -- if all variables are independent, then it assumes all rich people will respond a certain way, regardless of if they're in the east, west, or south. So it precludes even the possibility that there's a function where a poor new englander will respond more like a rich westerner, or anything like that. This model is clearly very simplified.
Please leave any thoughts or feedback in the comments, or e-mail me at email@example.com