**Wait, what?**

Every so often, I like to code up something random. I think of it as a productive way to expand my technical abilities, and although I don't have any concrete evidence, I figure it probably gives me a broader base against which to think about problems in other areas. So, as a weekend project, I thought it would be a fun challenge to teach myself something about image processing. The field of "computer vision" is evolving rapidly, and I didn't (still don't) know a thing about it. I initially thought it would be cool to build something that could identify features of a face... geometry detection. But after reading a bit about it, I found out that almost every modern method of doing this is based on computing geometry from detected edges in the image. So, after clicking through a few Wikipedia links, I realized I first needed to learn about edge detection.

**Edge Detection**

After some brief research, I found the Canny Edge Detection Algorithm. Although it was developed over 20 years ago (in 1986), it looks like almost all modern methods are based on, or at least acknowledge, Canny's ideas, and it seems to remain one of the best methods for finding edges in an image.

**Method**

As interesting as the topic is, the description of the edge method would be really boring to read as plain text. Not that boring is a problem, since the focus of this blog isn't to be interesting so much as to get things down on paper. But since I think I can make it more interesting, I will try to do so.

Here's a nice picture of a person playing Donkey Konga for the GameCube.

And (looking ahead to the solution), here's the edge detected version. In the rest of this post, I'll explain how I got there.

**Step 1: Noise Reduction**

Now, the first process discussed as part of the Canny detector is a noise-reduction stage. To prepare for this, I first reduced the color space to greyscale. Since I'm not an expert in the field, I just picked a formula from the internet that looked good. It is:

```perl
my $grey = byte(77/256 * $red + 150/256 * $green + 29/256 * $blue);
```

Having done this, I then sought to blur the image. Thankfully, Wikipedia provided a suggested blur filter: a convolution of the image (A) with the square matrix below. Since PDL (the Perl Data Language) provides a function for a 2D convolution, this was actually remarkably easy.
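To make the arithmetic concrete, here's a rough NumPy sketch of these two steps (my real code was Perl/PDL; the 5×5 kernel below, normalized by 1/159, is the one commonly suggested on Wikipedia's Canny page):

```python
import numpy as np

def to_grey(rgb):
    # 77 + 150 + 29 = 256, so this is an integer-friendly luma approximation
    r = rgb[..., 0].astype(np.int32)
    g = rgb[..., 1].astype(np.int32)
    b = rgb[..., 2].astype(np.int32)
    return ((77 * r + 150 * g + 29 * b) // 256).astype(np.uint8)

# 5x5 Gaussian kernel often quoted for the Canny blur step
KERNEL = np.array([[2,  4,  5,  4, 2],
                   [4,  9, 12,  9, 4],
                   [5, 12, 15, 12, 5],
                   [4,  9, 12,  9, 4],
                   [2,  4,  5,  4, 2]]) / 159.0

def blur(img, kernel=KERNEL):
    # plain 2D convolution with edge padding (PDL's conv2d did this for me)
    kh, kw = kernel.shape
    padded = np.pad(img.astype(float), ((kh // 2,), (kw // 2,)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0],
                                           dx:dx + img.shape[1]]
    return out
```

Since the kernel entries sum to 159, dividing by 159 makes the weights sum to 1, so a flat region stays at its original intensity after blurring.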

As a side note, I did find a few papers suggesting that using a different color space will make it way easier to detect skin, but I thought it might be better to build a generic detector first, then specialize the program later if I wanted to work on face geometry.

This yielded a greyscale, slightly blurry version of the original picture, seen here:

**Step 2: Gradient Detection**

Next, I attempt to find the actual edges by finding the intensity gradient of the image. This is very easy: I just find the delta-intensity between each pair of adjacent pixels in the North-South and East-West directions. In the images below, areas of high gradient (high first derivative) have higher intensity values (they are whiter). As you can see, the NS edges are primarily horizontal and the EW edges are primarily vertical. Here are the gradients in the North-South (NS) direction.

And here's the gradients in the East-West (EW) direction.

There's also some work out there that suggests using the second derivative instead of a simple gradient to find the edges (zeros of the second derivative correspond to maxima in the gradient). This would naturally obtain higher accuracy, thin down the lines, and eliminate the need for the non-maximum suppression (step 5) later on.

The Canny algorithm also calls for gradients to be calculated in the NW/SE and NE/SW directions (4 gradients instead of 2). For simplicity's sake, I only implemented the NS and EW gradients here.

**Side Note:**

From an implementation standpoint, it's clearly redundant to write separate functions for the NS and EW directions. Since the image manipulation functions that I wrote below all operate in-place, I can transpose the matrix to operate on it "sideways", then change it back.

So, more specifically, to get the EW implementation, I take the original input, transpose it, and pass it into the NS function. Then, I take the resulting image, and transpose it again to get the EW result out.
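In NumPy (rather than the Perl/PDL I actually used), the gradient step plus the transpose trick might look like this:

```python
import numpy as np

def ns_gradient(img):
    # absolute difference between each pixel and the one below it (North-South);
    # the last row has no neighbor below, so it stays zero
    grad = np.zeros(img.shape, dtype=float)
    grad[:-1, :] = np.abs(np.diff(img.astype(float), axis=0))
    return grad

def ew_gradient(img):
    # transpose, reuse the NS code "sideways", transpose back
    return ns_gradient(img.T).T
```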

**Step 3: Thresholding**

The third step is to identify which edges are important and which are not. The algorithm assumes that areas of high difference (where the gradient is large) are more important edges than those where it is smaller. Thus, a threshold can be used to find the important lines.

The trade-off between noise and accurately detecting all the important edges depends on the threshold chosen. Below, I'll pick two thresholds, one at 20% of the maximum gradient and one at 10%. (That is, if the greatest pixel-to-pixel difference were 100, I'd only show lines that were over 20 and over 10, respectively.)
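As a sketch (again in NumPy rather than the Perl I actually used), the thresholding itself is a one-liner:

```python
import numpy as np

def threshold(grad, fraction):
    # keep only pixels whose gradient exceeds fraction * (largest gradient anywhere)
    return grad > fraction * grad.max()
```

Calling `threshold(grad, 0.20)` and `threshold(grad, 0.10)` would produce the high- and low-threshold boolean edge maps.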

And of course, these are only in the NS direction. We'll also want to do this in the EW direction. As seen here (high):

And with the EW, low (10%) threshold:

As you can see, the higher threshold has less noise but, in both the NS and EW directions, fails to pick up the full line that you can see with the lower threshold.

**Step 4: Edge Tracing**

The algorithmic solution to the problem above is to use the high threshold as a baseline to figure out which lines are the important ones, and then use the lower threshold to follow those lines. I'll quote Wikipedia's explanation, which is relatively clear:

*Making the assumption that important edges should be along continuous
curves in the image allows us to follow a faint section of a given line
and to discard a few noisy pixels that do not constitute a line but
have produced large gradients. Therefore we begin by applying a high
threshold. This marks out the edges we can be fairly sure are genuine.
Starting from these, using the directional information derived earlier,
edges can be traced through the image. While tracing an edge, we apply
the lower threshold, allowing us to trace faint sections of edges as
long as we find a starting point.*

So, starting with the pixels in the high-threshold image, I then add any pixels in the low-threshold image that are adjacent. My implementation of this was akin to the Paint Bucket tool in an image editing program, spreading (recursively) until all the adjacent pixels in the region were detected. Again, I ignored the diagonal (NE/SW-type) directions for simplicity.
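A minimal sketch of that flood-fill (in Python/NumPy rather than my Perl, and with an explicit stack instead of recursion so large regions don't blow the call stack):

```python
import numpy as np

def trace_edges(high, low):
    # hysteresis: keep any low-threshold pixel connected (4-way, no diagonals)
    # to at least one high-threshold pixel
    out = np.zeros(low.shape, dtype=bool)
    stack = list(zip(*np.nonzero(high)))  # seed with the "sure" edge pixels
    while stack:
        y, x = stack.pop()
        if out[y, x] or not low[y, x]:
            continue
        out[y, x] = True
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < low.shape[0] and 0 <= nx < low.shape[1]:
                stack.append((ny, nx))
    return out
```

Note this assumes every high-threshold pixel is also in the low-threshold map, which is guaranteed when both come from the same gradient image.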

The edge-traced images are shown below, first in the NS direction:

And also in the EW direction:

As you can see, the major lines are fully traced (better than with the 20% threshold), while most of the noise is eliminated (much better than with the 10% threshold).

**Step 5: Non-Maximum Suppression**

The lines in the above images are all really thick, and all the extra pixels would make the detected edges hard to use in practice, for example, in a geometry detection algorithm. What we'd like to do is thin down the lines so that only the highest delta in a region shows up. To do this, we refer back to the original (post-step-1) greyscale image and find the local maxima of the gradient function. My implementation (for the NS direction) looks at each "column" of pixels independently and finds the local maximum of each "cluster" of white. The resulting edges are shown below:
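Here's a rough Python sketch of that per-column suppression (my real implementation was in Perl; this version assumes a boolean edge map from step 4 plus the gradient image from step 2):

```python
import numpy as np

def nms_ns(edges, grad):
    # within each contiguous vertical run ("cluster") of edge pixels,
    # keep only the pixel with the largest gradient
    out = np.zeros(edges.shape, dtype=bool)
    h, w = edges.shape
    for x in range(w):
        y = 0
        while y < h:
            if edges[y, x]:
                start = y
                while y < h and edges[y, x]:  # walk to the end of the run
                    y += 1
                run = grad[start:y, x]
                out[start + int(np.argmax(run)), x] = True
            else:
                y += 1
    return out
```

The EW version would again just transpose the inputs, call `nms_ns`, and transpose the result back.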

North-South:

And EW:

**Step 6: Composite Images**

Finally, we need to join the NS and EW edges into a single image. The implementation is easy: just binary-OR each pixel of the two images. Hooray!
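In NumPy terms (assuming boolean edge maps), the whole step is one expression:

```python
import numpy as np

ns_edges = np.array([[True, False], [False, False]])
ew_edges = np.array([[False, False], [False, True]])
combined = ns_edges | ew_edges  # element-wise binary OR of the two edge maps
```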

As you can see below, we finally have a clean edge-detection image to use.

So this post is a brain dump of something that's been tickling my brain for a while now. It's mostly a collection of poorly cohered thoughts, but they're rattling around in my head at night and preventing me from sleeping well, so I need to get them out of my brain and into a semi-permanent medium. I'd also like to get feedback on the concept... I think it could be interesting.

--

**A/B Testing**

The starting idea is the concept of A/B testing webpages. That is, testing variations of a webpage, randomized to eliminate interference from confounding variables, to see which page is more effective at getting a user to "convert" (i.e., do what we want).

**Prediction Algorithms**

The next concept is that of the prediction algorithm -- using (usually past) data to predict what a user is going to do. Or in this case, which page (A or B) will drive the user to a conversion.

**The Netflix Prize**

Now, the best prediction algorithms in the world right now are probably those designed to win the Netflix Prize, so I'll discuss those next. The BellKor prize solution isn't quite as broadly applicable to other problems (it mostly describes the weights of the winning algorithm), so it's slightly less interesting, but it does convey one tidbit we should keep in mind: taking a wide selection of recommendation algorithms (the BellKor paper uses 107 of them) and combining their results in some weighted fashion is probably better than going "deep" into one algorithm.

What's more interesting is the algorithm they published in the paper Improved Neighborhood-based Collaborative Filtering. The paper's not too difficult, but it is analyzed quite nicely at Algorithms Analyzed, so I'll use that as a reference.

**Parallels**

So, to make sure we're clear: if you were to compare A/B testing to the Netflix recommender, movies would be the equivalent of page variants. So we'd want to predict which page variant a user would "prefer", based on some characteristics of the user and the pages. Got it? Good. Now it gets more interesting.

**Neighborhood-based Collaborative Filtering**

The Algorithms Analyzed explanation reduced the prediction problem down to essentially two formulas. In these formulas, the variables u,v represent two users and the variables i,j,k represent the movies.

predicted rating(u,i) = Σⱼ s(i,j) · r(u,j) / Σⱼ s(i,j)    (movie-to-movie similarity)

or

predicted rating(u,i) = Σᵥ s(u,v) · r(v,i) / Σᵥ s(u,v)    (user-to-user similarity)

So, in the first case, you'd be considering the similarity between two movies and combining that with the user's preference for one of them. In the second case, you'd be comparing two similar users and looking at how the other user rated that movie. To consider the parallel case of A/B testing pages, you'd only use the second formula (which page similar users prefer), rather than trying to compare different pages in the A/B test (which would be impossible if the user has never been to your site before).

**Similarity Functions**

So, assuming you could actually pull off the above and implement the algorithm for an A/B test, you still have to come up with some way to compute the similarity function between two users. How could you do that? A few ideas readily come to mind. IP geolocation gives you pretty accurate data for location. If you could narrow it to a set of zip codes, you could use demographic information such as household income, age distribution, and race distribution. You have data about the user's computer from the browser's user agent (browser, operating system, .NET installation), as well as via JavaScript (screen resolution, color depth). You can guess their ISP. You could even portscan them if you really had to.

**Processing Power**

And one of the best things about this concept is that the hard work isn't done on the fly. The most difficult part of the process is deriving all the weights for the equation, and that can be done from log data. How to accomplish this is left as an exercise for the reader; you could use Singular Value Decomposition if you want high precision, or just use a neural network or genetic algorithm for a good approximation that is simpler than doing the SVD. Regardless, once you figure out the weights, the only thing that needs to be done in real-time is *applying* the formula, which is just a few lines of code.

**Linearity and Variable Dependence**

One possible problem with the nearest-neighbor algorithm as defined above is that, for simplicity's sake, all variables are assumed to be linear functions -- and neither assumption may hold: certainly not linearity, and most likely not even the assumption that they're functions (i.e., do they pass the vertical line test?). The other issue is variable dependence, which I'll illustrate anecdotally with our A/B testing example: if all variables are independent, then the model assumes all rich people will respond a certain way, regardless of whether they're in the east, west, or south. So it precludes even the possibility that a poor New Englander will respond more like a rich Westerner, or anything like that. This model is clearly very simplified.

**Feedback**

Please leave any thoughts or feedback in the comments, or e-mail me at blog@ryankosai.com

Design is hard enough without trying to reproduce a commonly used visual element. Here are some stencils that cover the basics.

**Yahoo's GUI Stencils:**

This is a fairly complete package of common web elements as well as basic viewports for handheld devices. It also has some nice "Windows" elements and the sizes of banner ads. *http://developer.yahoo.com/ypatterns/wireframes/*

Yahoo's stencil set has some elements for the iPhone, but this package gives you some additional iPhone options that aren't in the Yahoo package.

This package contains cursors and common web elements for a variety of browsers on different operating systems.