Tuesday, October 30, 2007

Computer Vision (36): Mechanical or Knowledge-based CORRESPONDENCE

In spite of spending sleepless nights giving deep thought to what technique our brain might be using to solve the problem of depth perception, my brain only gave me a drowsier day ahead. So I started to filter out the possibilities and narrow down to a solution. The question I asked myself was: is our brain using knowledge to correspond the left and the right images, or is it something that happens more mechanically? I had tried out a lot of knowledge-based approaches, but only in vain, and even the discussion that we had in the earlier post concluded nothing. I wanted to take a different route by thinking of a more mechanical and less knowledge-based approach. My brain then pointed me to the age-old experiment proposed by Thomas Young to demonstrate the wave nature (interference) of light: “The Double-Slit Experiment”. How could this be of use in solving a seemingly unrelated problem like depth perception? On comparing the two setups you will find a few things in common: both deal with light, and both pass the surrounding light through two openings and combine it later. I excitedly thought, have I unlocked the puzzle?

Let’s analyze it and understand it better to find out if I really did! I am neither an expert in physics nor in biology, so I can only build a wrapper around this concept and not verify its complete path.

Young’s experiment used a single source of monochromatic light, passed it through two slits equidistant from the source, and produced an interference pattern on the screen behind them. The approximate formula for this experiment is given below.

x = λD / s

Where,

λ is the wavelength of the light

s is the separation of the (slits/eyes)

x is the distance between the bands of light (also called the fringe spacing)

D is the distance from the slits to the screen (in the eye, from the lens to the retina)
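
To make the formula concrete, here is a minimal Python sketch (my own illustration, not part of the original experiment) that evaluates x = λD/s for a few slit separations; the wavelength and screen distance are assumed values chosen purely for illustration.

```python
# Fringe spacing x = λD / s in Young's double-slit experiment.
# λ (wavelength) and D (slit-to-screen distance) are held constant and
# only the slit separation s is varied; all numbers are assumed values.

wavelength = 550e-9    # 550 nm, green light
D = 1.0                # slits-to-screen distance in meters

for s in (0.5e-3, 1e-3, 2e-3):     # slit separations in meters
    x = wavelength * D / s         # fringe spacing on the screen
    print(f"s = {s * 1e3:.1f} mm  ->  x = {x * 1e3:.3f} mm")

# Halving s doubles x: the bands spread apart, i.e. the spatial
# frequency of the pattern on the screen drops.
```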

In this experiment the distance between the light source and the slits was kept constant while the separation between the slits was varied. In our case the distance between the eyes remains the same, while the object being seen varies in depth in the 3D space. Assuming D and λ to be constants, decreasing s will increase the fringe spacing x, that is, lower the spatial frequency of the pattern on the screen. Conceptually, decreasing the distance between the slits is equivalent to increasing the distance between the source and the slits. So the frequency of the pattern on the screen can be said to depend on the depth of the source from the slits in the 3D space. If the source is placed along the line bisecting the two slits, the pattern would be symmetric on either side of this line on the screen. Every point along this line in the 3D space would have a unique frequency and hence a unique pattern. The diagrams here are only for conceptual understanding and not the exact experimental outcome. As the source starts to move away from this bisecting line, the symmetry in the pattern should start to degrade.

If a light source is placed at three different locations equidistant from the center of the slits, the one marked in red would produce a symmetric pattern, while the other two, I guess, would not. I have not experimented with this, hence the letters NX (Not eXperimented). If my guess is right, a light source placed anywhere in the 3D space would produce a unique pattern on the screen!!! This means an analysis of this pattern would tell us the exact location of the source in the 3D space.
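
Since I have marked this NX, a cheap way to probe the guess without a physical bench is a numerical sketch. The Python snippet below (my own illustration; the slit separation, wavelength, source distance and screen size are all assumed values) sums the exact path-length phases from a point source through two slits and then measures how mirror-symmetric the resulting pattern is for an on-axis versus an off-axis source, both equidistant from the center of the slits.

```python
import numpy as np

# Two-slit pattern from a point source at an arbitrary 3D position,
# computed by summing the path-length phases through the two slits.
wavelength = 550e-9                # 550 nm green light (assumed)
k = 2 * np.pi / wavelength         # wavenumber
s = 1e-3                           # slit separation in meters (assumed)
D = 1.0                            # slit-to-screen distance (assumed)

slits = np.array([[-s / 2, 0.0, 0.0],   # the two slits sit in the z = 0 plane
                  [+s / 2, 0.0, 0.0]])

def pattern(source, X):
    """Intensity along a horizontal strip of the screen (z = D) for a point
    source at `source`, obtained by summing the two path-length phases."""
    screen = np.stack([X, np.zeros_like(X), np.full_like(X, D)], axis=1)
    field = np.zeros(len(X), dtype=complex)
    for slit in slits:
        d1 = np.linalg.norm(source - slit)           # source -> slit
        d2 = np.linalg.norm(screen - slit, axis=1)   # slit -> each screen point
        field += np.exp(1j * k * (d1 + d2))          # unit-amplitude wave
    return np.abs(field) ** 2

def asymmetry(intensity):
    """0 for a pattern that is mirror-symmetric about the screen center."""
    return np.abs(intensity - intensity[::-1]).sum() / intensity.sum()

X = np.linspace(-5e-3, 5e-3, 2001)   # a 1 cm strip of screen, centered

r = 0.5  # both sources sit 0.5 m from the center of the slits
on_axis  = pattern(np.array([0.0, 0.0, -r]), X)   # the "red" source
off_axis = pattern(np.array([r * np.sin(0.2), 0.0, -r * np.cos(0.2)]), X)

print("on-axis  asymmetry:", round(asymmetry(on_axis), 6))    # ~0: symmetric
print("off-axis asymmetry:", round(asymmetry(off_axis), 6))   # > 0: symmetry degrades
```

Whether every position in 3D space really maps to a unique pattern is exactly the part such a sketch cannot settle by itself; it only lets the guess be poked at cheaply before any real experiment.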

This concept can be applied to our vision by replacing the light source with any point in space that reflects light, acting as a passive source.

Tuesday, October 16, 2007

Photography and Travel: Kudremukha

This is the season of the year when nature embraces the dry mountains of the Western Ghats with a green velvety floral carpet. The season after the rains brings with it a fine spray of mist over these hillocks. The drive along the winding mountain roads, amidst fresh vegetation and serene valleys, makes the experience even more invigorating.

Wednesday, October 3, 2007

Computer Vision (35): Segmentation Versus Stereo Correspondence

One question that always keeps rumbling in my mind is whether segmentation is a 2D or a stereo phenomenon. A 2D image, on analysis, can give us no more than a set of colors, edges and intensities. A segmentation algorithm, even though it would at its best try to exploit one or more of these image lineaments, would still fall short of what is expected from it. This is because when we define segmentation as the process of segregating the different objects in our surroundings, or in a given input image, not all of them can be extracted using a combination of the cited features. A lot of the time we might have to coalesce more than one segment to form an object, and the rule for doing this has kept our brains cerebrating for decades. The image below is a simple illustration of this.

I hope it doesn’t get too difficult for your brain, at least, to make out the contents of the image. On observing keenly, it shows a dog testing its olfactory system to find something good for its stomach. You can almost recognize the dog as a Dalmatian. Now I bet no one can get me a generalized segmentation algorithm that can extract the dog from this image!!!
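
To see why I am confident in this bet, consider what a purely low-level, knowledge-free segmenter actually does. The sketch below (my own illustration; the image is a random stand-in, and k and the iteration count are arbitrary assumptions) groups pixels only by color similarity using k-means. On a picture like the Dalmatian, where the dog's spots share their colors with the dappled ground, segments of this kind cannot line up with the dog.

```python
import numpy as np

def kmeans_color_segments(image, k=3, iters=10, seed=0):
    """Cluster the pixels of an H x W x 3 float image into k color groups,
    with no notion of objects at all."""
    pixels = image.reshape(-1, 3)
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign every pixel to its nearest color center.
        dist = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean color of its pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels.reshape(image.shape[:2])

# Usage on a random stand-in image; a real photo would be loaded instead.
img = np.random.default_rng(1).random((64, 64, 3))
labels = kmeans_color_segments(img, k=3)
print("segment sizes:", np.bincount(labels.ravel()))
```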

Some people might argue that it’s almost impossible to achieve this from a 2D image, since there is no way to distinguish the plane of the dog from that of the ground. Remember, your brain has already done it! In a real scenario, even if we came across such a view, our stereo vision would ensure that the dog forms an image separate from the plane of the ground, and hence it would get segmented due to the variation in depth. Our brain can still do it from the 2D image here because of the tremendous amount of knowledge it has gathered over the years. In short, stereo vision helped us build this knowledge over the years, and this knowledge is now helping us segment objects even from a 2D image. The BIG question is, how do we do it in a computer?

Whenever I start to think about a solution to the problem of stereo correspondence, the problem of segmentation barricades it. Here is why. The first step to understanding or solving stereo correspondence is to experiment with two cameras taking images at an offset. Below is a sample image.

It is obvious that we cannot correspond the images pixel by pixel. Which blue pixel of the sky in the left image would you pair with a particular blue pixel in the right? Some pixels in the right image would have no correspondence in the left, and vice versa, but how do you know where these pixels are? This loops us back to using some segmentation technique to match similar objects in the two images, but I think we just concluded that segmentation was due to stereo!!!
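
To make the ambiguity tangible, here is a small sketch of the naive window-based matching this paragraph argues against; the images are random stand-ins, and the window size, disparity range and the "near-optimal" tolerance are all assumed values.

```python
import numpy as np

def match_scanline(left, right, y, half=3, max_disp=16):
    """For each pixel on row y of `left`, find the disparity with the lowest
    SSD cost in `right`, and count how many disparities come close to it."""
    w = left.shape[1]
    disparities, ambiguity = [], []
    for x in range(half + max_disp, w - half):
        patch = left[y - half:y + half + 1, x - half:x + half + 1]
        costs = []
        for d in range(max_disp + 1):
            cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
            costs.append(((patch - cand) ** 2).sum())  # sum of squared differences
        costs = np.array(costs)
        disparities.append(int(costs.argmin()))
        # Count the disparities whose cost is nearly as low as the best one;
        # anything above 1 means the match is ambiguous.
        ambiguity.append(int((costs <= 1.05 * costs.min() + 1e-9).sum()))
    return np.array(disparities), np.array(ambiguity)

rng = np.random.default_rng(0)
textured = rng.random((40, 80))    # stand-in for a richly textured region
flat = np.full((40, 80), 0.5)      # stand-in for a patch of clear blue sky

# The "right" image is just a copy of the left, the simplest possible pair;
# a real pair would come from two cameras at an offset.
for img, name in ((textured, "textured"), (flat, "flat")):
    _, amb = match_scanline(img, img.copy(), y=20)
    print(f"{name}: near-optimal matches per pixel = {amb.mean():.1f}")
```

On the textured stand-in every pixel has a single clear winner, while on the flat one every disparity in the search range looks equally good, which is precisely the blue-sky problem described above.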