Wednesday, October 3, 2007

Computer Vision (35): Segmentation versus Stereo Correspondence

One question that keeps rattling around in my mind is whether segmentation is a 2D or a stereo phenomenon. Analysis of a 2D image can get us no more than a set of colors, edges and intensities. A segmentation algorithm, even though at its best it would try to exploit one or more of these image features, would still fall short of what is expected of it. This is because, when we define segmentation as the process of segregating the different objects in our surroundings or in a given input image, not all of them can be extracted using a combination of the cited features. A lot of the time we might have to coalesce more than one segment to form an object, and the rule for doing this has kept our brains cerebrating for decades. The image below is a simple illustration of this.

I hope it isn't too difficult, for your brain at least, to make out the contents of the image. On observing keenly, it shows a dog testing its olfactory system to find something good for its stomach. You can almost recognize the dog as a Dalmatian. Now I challenge anyone to show me a generalized segmentation algorithm that can extract the dog from this image!
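
Just to make the point concrete, here is a rough sketch (in Python with OpenCV; the file name is only a placeholder for the image above) of the kind of low-level cues a purely 2D algorithm has to work with. Edges and color clusters fragment the Dalmatian into pieces that look exactly like pieces of the ground, which is why coalescing the right segments is the hard part.

```python
# A minimal sketch of bottom-up 2D cues: colors, edges and intensities.
# "dalmatian.png" is just a placeholder for the image discussed above.
import cv2
import numpy as np

img = cv2.imread("dalmatian.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Edges: on a spotted dog against dappled ground, these fragments do not
# group into a dog-shaped region by themselves.
edges = cv2.Canny(gray, 50, 150)

# Color/intensity clustering (k-means on pixel values): the dog's black spots
# and white patches land in the same clusters as the background.
pixels = img.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 4, None, criteria, 5,
                                cv2.KMEANS_RANDOM_CENTERS)
segmented = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)

cv2.imwrite("edges.png", edges)
cv2.imwrite("clusters.png", segmented)
```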

Some people might argue that it is almost impossible to achieve this from a 2D image, since there is no way to distinguish the plane of the dog from that of the ground. Remember, your brain has already done it! In a real scenario, even if we came across such a view, our stereo vision would ensure that the dog forms an image separate from the plane of the ground, and it would therefore get segmented due to the variation in depth. Our brain can still do it from the 2D image here because of the tremendous amount of knowledge it has gathered over the years. In short, stereo vision helped us build this knowledge, and that knowledge now helps us segment objects even from a 2D image. The BIG question is: how do we do it in a computer?
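
To make the depth argument concrete, here is a hedged sketch, assuming we somehow already had a per-pixel depth map of the scene (the file name and the threshold are made up, not from anything in this post). With depth in hand, pulling the dog off the ground plane reduces to little more than a threshold.

```python
# If a depth map were available, segmenting the dog from the ground plane
# would be almost trivial: it sits at a different depth. "depth.png" is a
# placeholder for a per-pixel depth image; the threshold value is arbitrary.
import cv2
import numpy as np

depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Pixels noticeably nearer than the dominant (ground) depth are treated as a
# separate object. A real system would fit the ground plane rather than use
# a single global threshold.
ground_depth = np.median(depth)
object_mask = (depth < ground_depth - 10).astype(np.uint8) * 255

cv2.imwrite("object_mask.png", object_mask)
```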

Whenever I start to think about a solution to the problem of stereo correspondence, the problem of segmentation barricades it. Here is why. The first step to understanding or solving stereo correspondence is to experiment with two cameras taking images at an offset. Below is a sample image.

It is obvious that we cannot match the images pixel by pixel. Which blue pixel of the sky in the left image would you pair with a particular blue pixel in the right? Some pixels in the right image would have no correspondence in the left and vice versa, but how do you know where those pixels are? This loops us back to using some segmentation technique to match similar objects in the two images, but we had just concluded that segmentation comes from stereo!
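
To see the ambiguity in numbers, here is a small sketch of the standard sum-of-absolute-differences matching cost along a row of a rectified stereo pair (file names, pixel coordinates and window size are placeholders, not from this post). For a pixel in the sky the cost curve comes out nearly flat, so no single disparity stands out as the match.

```python
# Why pixel-by-pixel matching breaks down in textureless regions: the SAD
# matching cost along the epipolar line is nearly flat for a sky pixel, so
# many disparities look equally "good".
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

def sad_costs(y, x, window=5, max_disp=64):
    """SAD matching cost for pixel (y, x) of the left image over candidate
    disparities 0..max_disp-1 along the same row of the right image."""
    h = window // 2
    ref = left[y - h:y + h + 1, x - h:x + h + 1]
    costs = []
    for d in range(max_disp):
        cand = right[y - h:y + h + 1, x - d - h:x - d + h + 1]
        costs.append(np.abs(ref - cand).sum())
    return np.array(costs)

# Hypothetical coordinates: a sky pixel gives an almost flat cost curve
# (ambiguous), a textured pixel gives a clear minimum.
print(sad_costs(50, 400))
print(sad_costs(300, 400))
```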
