Every so often, I like to code up something random. I think of it as a productive way to expand my technical abilities, and although I don't have any concrete evidence, I figure it probably gives me a broader base against which to think about problems in other areas. So, as a weekend project, I thought it would be a fun challenge to teach myself something about image processing. The field of "computer vision" is evolving rapidly, and I didn't (still don't) know a thing about it. I initially thought it would be cool to build something that could identify features of a face... geometry detection. But after reading a bit about it, I found out that almost every modern method of doing this is based on computing geometry from detected edges in the image. And so, after clicking through a few Wikipedia links, I realized I first needed to learn about edge detection.
After some brief research, I found the Canny Edge Detection Algorithm. Although it was developed over 20 years ago (in 1986), it looks like almost all modern methods are based on, or at least acknowledge, Canny's ideas, and it seems to remain one of the best methods for finding edges in an image.
As interesting as the topic is, a plain-text description of the edge-detection method would be really boring to read. Not that boring is a problem, since the focus of the blog isn't to be interesting so much as it is to get things down on paper. But since I think I can make it more interesting, I will try to do so.
Here's a nice picture of a person playing Donkey Conga for Game Cube.
And (looking ahead to the solution), here's the edge detected version. In the rest of this post, I'll explain how I got there.
Step 1: Noise Reduction
Now, the first stage of the Canny detector is noise reduction. Before doing any actual smoothing, I reduced the color space to greyscale. Since I'm not an expert in the field, I just picked a formula from the internet that I thought looked good. It is:
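For reference, a typical choice for this kind of conversion is the standard BT.601 luminance weighting; here's a quick Python sketch of that style of conversion (the exact weights are an assumption on my part, not necessarily the formula above, and my real code used PDL):

```python
def to_greyscale(pixel):
    """Convert an (R, G, B) pixel to a single intensity value using
    the common ITU-R BT.601 luminance weights (an assumed formula)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

# A pure-green pixel: 0.587 * 255 is roughly 149.7
print(round(to_greyscale((0, 255, 0))))  # 150
```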
Having done this, I then sought to blur the image. Thankfully, Wikipedia provided a suggested blur filter: a convolution of the image (A) with the square matrix below. Since the PDL (Perl Data Language) library provides a function for 2D convolution, this was actually remarkably easy.
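For the curious, the 5×5 kernel on Wikipedia's Canny page (a Gaussian with sigma around 1.4, normalized by 1/159) and a naive convolution look roughly like this in Python. PDL's convolution does the real work in my version; this is just a sketch:

```python
# The 5x5 Gaussian kernel suggested on Wikipedia's Canny page.
KERNEL = [
    [2, 4, 5, 4, 2],
    [4, 9, 12, 9, 4],
    [5, 12, 15, 12, 5],
    [4, 9, 12, 9, 4],
    [2, 4, 5, 4, 2],
]

def convolve(image, kernel):
    """Naive 2D convolution; coordinates near the border are clamped
    to the image edge, and the result is divided by the kernel sum
    (159 here) so overall brightness is unchanged."""
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    norm = sum(sum(row) for row in kernel)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += image[yy][xx] * kernel[dy + k][dx + k]
            out[y][x] = acc / norm
    return out
```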
As a side note, I did find a few papers suggesting that using a different color space will make it way easier to detect skin, but I thought it might be better to build a generic detector first, then specialize the program later if I wanted to work on face geometry.
This yielded a greyscale, slightly blurry version of the original picture, seen here:
Step 2: Gradient Detection
Next, I attempt to find the actual edges by computing the intensity gradient of the image. This is very easy: I just find the delta-intensity between each pair of adjacent pixels in the North-South and East-West directions. In the images below, areas of high gradient (high first derivative) have higher intensity values (they are whiter). As you can see, the NS edges are primarily horizontal and the EW edges are primarily vertical. Here are the gradients in the North-South (NS) direction.
And here are the gradients in the East-West (EW) direction.
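The gradient computation itself is just a row-by-row difference. A minimal Python sketch for the NS direction (again, my actual code was PDL):

```python
def ns_gradient(image):
    """Absolute intensity difference between each pixel and its
    southern neighbour; the last row is left as zeros."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w):
            out[y][x] = abs(image[y + 1][x] - image[y][x])
    return out

# A horizontal step edge lights up the row where the jump happens:
edge = [[0, 0, 0],
        [0, 0, 0],
        [9, 9, 9],
        [9, 9, 9]]
print(ns_gradient(edge)[1])  # [9, 9, 9]
```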
There's also some work out there that suggests using the second derivative instead of a simple gradient to find the edges (zeros of the second derivative correspond to maxima of the gradient). This would naturally obtain higher accuracy, thin down the lines, and eliminate the need for the non-maximum suppression (step 5) later on.
The Canny algorithm also calls for gradients to be calculated in the NW/SE and NE/SW directions (four gradients instead of two). For simplicity's sake, I only implemented the NS and EW gradients here.
From an implementation standpoint, it's clearly redundant to write separate functions for the NS and EW directions. Since the image manipulation functions that I wrote below all operate in-place, I can transpose the matrix to operate on it "sideways", then change it back.
So, more specifically, to get the EW implementation, I take the original input, transpose it, and pass it into the NS function. Then, I take the resulting image, and transpose it again to get the EW result out.
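In Python terms, the transpose trick looks like the sketch below (this functional version sidesteps the in-place detail of my real code, and `ns_diff` is just a stand-in for whatever NS routine you want to reuse):

```python
def transpose(image):
    """Swap rows and columns."""
    return [list(row) for row in zip(*image)]

def ew_version(ns_function, image):
    """Get the EW result by transposing, applying the NS routine,
    and transposing the result back."""
    return transpose(ns_function(transpose(image)))

# Example with a trivial "NS" routine that differences adjacent rows:
def ns_diff(img):
    return [[abs(img[y + 1][x] - img[y][x]) for x in range(len(img[0]))]
            for y in range(len(img) - 1)]

# A vertical step edge shows up in the EW result:
print(ew_version(ns_diff, [[0, 5], [0, 5]]))  # [[5], [5]]
```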
Step 3: Thresholding
The third step is to identify which edges are important and which are not. The algorithm assumes that areas of high difference (where the gradient is large) are more important edges than areas where the gradient is small. Thus, a threshold can be used to find the important lines.
How much noise gets through versus how many important edges get detected really depends on the threshold chosen. Below, I'll pick two thresholds, one at 20% of the maximum gradient and one at 10% of the maximum gradient. (That is, if the greatest pixel-to-pixel difference were 100, I'd only show lines that were over 20 and over 10, respectively.)
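Thresholding itself is a one-liner per pixel. A sketch, assuming the gradient image is a list of rows of numbers:

```python
def threshold(gradient, fraction):
    """Keep only pixels whose gradient exceeds fraction * max gradient,
    producing a binary mask."""
    peak = max(max(row) for row in gradient)
    cutoff = fraction * peak
    return [[1 if v > cutoff else 0 for v in row] for row in gradient]

grad = [[0, 12, 100],
        [0, 25, 3]]
print(threshold(grad, 0.20))  # [[0, 0, 1], [0, 1, 0]]
```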
And of course, these are only in the NS direction. We'll also want to do this in the EW direction. As seen here (high):
And with the EW, low (10%) threshold:
As you can see, the higher threshold has less noise, but in both the NS and EW direction, fails to pick up the full line that you can see in the lower threshold.
Step 4: Edge Tracing
The algorithmic solution to the problem above is to use the high threshold as a baseline to figure out which lines are the important ones, and use the lower threshold to follow those lines. Wikipedia's explanation is relatively clear, so I'll quote it:
Making the assumption that important edges should be along continuous curves in the image allows us to follow a faint section of a given line and to discard a few noisy pixels that do not constitute a line but have produced large gradients. Therefore we begin by applying a high threshold. This marks out the edges we can be fairly sure are genuine. Starting from these, using the directional information derived earlier, edges can be traced through the image. While tracing an edge, we apply the lower threshold, allowing us to trace faint sections of edges as long as we find a starting point.
So, starting with the pixels in the high-threshold image, I then include any adjacent pixels from the low-threshold image. My implementation of this was akin to the Paint Bucket tool in an image editing program, spreading (recursively) until all the adjacent pixels in the region were detected. Again, I ignored the NE/SW-type directions for simplicity.
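Here's a sketch of that idea in Python, using an explicit stack instead of the recursion my version used. `high` and `low` are the binary masks from the two thresholds:

```python
def trace_edges(high, low):
    """Hysteresis: start from pixels in the high-threshold mask and
    flood-fill (4-connected, like a paint bucket) through any pixels
    that pass the low threshold."""
    h, w = len(high), len(high[0])
    out = [[0] * w for _ in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w) if high[y][x]]
    while stack:
        y, x = stack.pop()
        if 0 <= y < h and 0 <= x < w and low[y][x] and not out[y][x]:
            out[y][x] = 1
            stack.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    return out

# The faint pixel above the seed is traced; the two isolated
# low-threshold pixels in the bottom row are discarded as noise:
high = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
low = [[0, 1, 0],
       [0, 1, 0],
       [1, 0, 1]]
print(trace_edges(high, low))  # [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
```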
The edge-traced images are shown below, first in the NS direction:
And also in the EW direction:
As you can see, the major lines are fully traced (better than with the 20% threshold), while most of the noise is eliminated (much better than with the 10% threshold).
Step 5: Non-Maximum Suppression
The lines in the above images are all really thick, and all the extra pixels would make the detected edges hard to use in practice, for example, in a geometry detection algorithm. What we'd like to do is thin down the lines so that only the highest delta in a region shows up. To do this, we refer to the original (post-step-1) greyscale image and find the local maxima of the gradient function. My implementation (for the NS direction) looks at each "column" of pixels independently, and finds the local maxima of each "cluster" of white. The resulting edges are shown below:
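A per-column sketch of that suppression in Python, assuming a binary mask column (the traced edges) and the matching column of gradient magnitudes:

```python
def thin_column(mask_col, grad_col):
    """Within each contiguous run of edge pixels in one column, keep
    only the pixel with the largest gradient magnitude."""
    out = [0] * len(mask_col)
    y = 0
    while y < len(mask_col):
        if mask_col[y]:
            start = y
            while y < len(mask_col) and mask_col[y]:
                y += 1
            # y is now one past the run; keep the strongest pixel in it
            best = max(range(start, y), key=lambda i: grad_col[i])
            out[best] = 1
        else:
            y += 1
    return out

# A thick 3-pixel run collapses to its strongest pixel:
print(thin_column([0, 1, 1, 1, 0], [0, 3, 9, 4, 0]))  # [0, 0, 1, 0, 0]
```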
Step 6: Composite Images
Finally, we need to join the NS and EW edges into a single image. The implementation is easy: just binary-OR each pixel of the two images. Hooray!
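The composite is a trivial per-pixel OR of the two binary edge masks; sketched:

```python
def composite(ns, ew):
    """Binary-OR two edge masks pixel by pixel, so a pixel is an edge
    if either direction detected it."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(ns, ew)]

print(composite([[1, 0], [0, 0]], [[0, 0], [0, 1]]))  # [[1, 0], [0, 1]]
```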
As you can see below, we finally have a clean edge-detection image to use.