Wednesday, April 2, 2008

Stargazing Olympics 2008

With my MSMM software close to its beta release, I find the 2008 Olympics the ideal stage to demonstrate it. Here is the schedule of events: http://en.beijing2008.cn/cptvenues/schedule/. MSMM can effectively depict motion events like Athletics, Badminton, Basketball, Canoe/Kayak -- Slalom, Artistic Gymnastics, Gymnastics -- Trampoline, Rhythmic Gymnastics, Aquatics -- Diving, Taekwondo, Volleyball and Beach Volleyball. In fact, from a video transmission perspective, once an MSMM image is created there is no need to transmit a replay of the entire sequence, as this single frame depicts the whole motion. So, in a way, it is a kind of compression.
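For the curious, the core idea can be sketched in a few lines. MSMM itself does much more, but assuming a fixed camera and a clearly moving subject, a minimal motion montage could be built like this (an illustrative sketch, not the actual MSMM code):

```python
import numpy as np
import cv2  # OpenCV

def motion_montage(frames, thresh=30):
    """Composite a moving subject from several frames into one image.

    Minimal sketch: estimate a static background as the per-pixel
    median of the frames, then paste in the pixels of each frame that
    differ from it. A real system needs registration, shadow handling
    and blending; this is illustration only.
    """
    stack = np.stack(frames)                                # (N, H, W, 3)
    background = np.median(stack, axis=0).astype(np.uint8)
    montage = background.copy()
    for frame in frames:
        diff = cv2.absdiff(frame, background).max(axis=2)   # per-pixel change
        mask = diff > thresh                                # moving pixels
        montage[mask] = frame[mask]
    return montage

# Usage with frames sampled from a fixed-camera clip of, say, a dive:
# frames = [cv2.imread(f"dive_{i:02d}.png") for i in range(8)]
# cv2.imwrite("montage.png", motion_montage(frames))
```

Transmitting one such frame instead of the whole replay is the compression angle mentioned above.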

Any company willing to promote its camera through this software??? Check out the online demo here: http://www.multishotimaging.com/

Friday, March 7, 2008

Voice munching

Lip reading in computer vision tries to uncover a conversation from a video sequence that has no audio. It exploits the movements of the lips and the jaw, which are assumed to correspond uniquely to what we speak. Take the broadcast of a news show: ideally it would suffice to transmit only the video of the anchor, and software running locally could recover the news through lip reading. But would it really be worth the effort? How much bandwidth does the audio signal take, after all? I would be far more impressed if the whole concept were reversed: generate the lip movement by looking at the voice. Of course this would not replicate the exact video of the person talking, but I can think of no better example to make the case. Transmit only the first frame of video and synthesize the following frames, I mean the lip movement, from the transmitted voice. The bandwidth needed to transmit a news channel would then just equal the bandwidth of voice, which means I could use my landline phone, with no special modem, to make a video call. Crazy stuff! But it all depends on how well you can make the person’s lips dance to the tune of his voice.
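Some rough arithmetic on why the bandwidth argument is attractive, using assumed but typical figures (the exact numbers do not change the conclusion):

```python
# Back-of-the-envelope bandwidth comparison. All figures are assumed,
# typical values, not measurements.
video_kbps = 1500   # assumed: a modest broadcast-quality video stream
voice_kbps = 64     # uncompressed telephone voice (G.711 PCM)
codec_kbps = 8      # low-bit-rate compressed speech

print(f"video / raw voice:        {video_kbps / voice_kbps:.0f}x")   # ~23x
print(f"video / compressed voice: {video_kbps / codec_kbps:.0f}x")   # ~188x
# If lip motion can be synthesized from voice alone, everything after
# the first frame rides on the voice channel, so a talking-head
# broadcast could shrink by roughly this ratio.
```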

Thursday, December 27, 2007

Computer Vision (37): Sensing through Seismics, The Golden Mole

Nature has always outwitted humans in its creativity and optimization. Humans are among the few creatures bestowed with a complex and highly developed visual sense. Even though we ourselves haven't been able to crack the algorithms of our visual cortex, researchers are trying hard to replicate the behavior in robots. I myself have strived for years to unravel the enigma, but in vain. I then started to look for other ways a robot could become autonomous, and came across a family of owls that can pinpoint their prey purely by hearing, which I have already blogged about.
Some pythons can sense the infrared radiation given off by other creatures and even use it to hunt down prey; the heat-sensing organs responsible are called pit organs, and snakes bearing them are loosely called pit snakes. Though not very well developed, these snakes still have eyes for vision, which makes them less special than the golden moles that I came across recently.


These creatures have no eyes at all. They have extremely sensitive hearing and vibration detection, and can navigate underground with unerring accuracy. Morphological analysis of the middle ear has revealed a massive malleus, which likely enables them to detect seismic cues. They make use of this seismic sensitivity to detect prey as well as to navigate while burrowing through sand. While vibrations are used over long distances to detect prey, smell is possibly used over shorter distances.

FORSAKEN FANFARE

Travel and Photography
I had been to Kerala recently and wanted to put this post under my usual Travel and Photography theme, but the message I wanted to convey was much more than just that, so I made it a subheading instead. I asked myself: what does it take to be a celebrity? Fame doesn't shake hands with those who are merely talented. There is something else missing in these people which I have yet to discover, or perhaps will find out from you. As they say, your family name will most of the time do all the magic in the film industry. A country with as bursting a population as ours must have many such people who fail to exhibit their endowment at the right place and time. I met one such case in Fort Kochi.
This person could embellish an algae-clad wall with just a few colored chalks and, of course, a lot of his esteemed ability. People gathered to watch him chalk out his imagination, but pretended not to recognize that it was not a charity show. He stood there smiling at the audience, hoping to at least recover the money he had spent on the chalks. It was shocking to see everyone disperse without even a single penny flying his way. Honestly, I feel my Canon 350D failed to reproduce the shades he could create on such a dirty wall (in fact, I borrowed this snap from my friend). With a canvas, I think he will touch the skies.
Here are some glimpses of Kerala (Cochin, Attirapalli and the Alleppey backwaters) through my camera:
http://www.flickr.com/photos/57078108@N00/.

Tuesday, October 30, 2007

Computer Vision (36): Mechanical or Knowledge based CORRESPONDENCE

In spite of spending sleepless nights thinking deeply about what technique our brain might use to solve the problem of depth perception, my brain only gave me a drowsier day ahead. So I started to filter out the possibilities to narrow down on a solution. The question I asked myself was: is our brain using knowledge to correspond the left and right images, or is it something that happens more mechanically? I had tried out a lot of knowledge-based approaches, but in vain, and even the discussion we had in the earlier post concluded nothing. I wanted to take a different route by thinking of a more mechanical and less knowledge-based approach. My brain then pointed me to the age-old experiment proposed by Thomas Young to explain the wave nature (interference) of light: the double-slit experiment. How could this be of use in solving the seemingly unrelated problem of depth perception? On comparison you will find a few things in common between the two setups: both deal with light, and both pass the surrounding light through two openings and combine it later. I excitedly thought, have I unlocked the puzzle?

Let’s analyze it better to see whether I really did! I am neither a physicist nor a biologist, so I can only build a wrapper around this concept, not verify its complete path.

Young’s experiment passed a single source of monochromatic light through two slits and produced an interference pattern on the rear screen, the two slits being equidistant from the source. The approximate formula for this experiment is given below:

x = λD / s

Where,


λ is the wavelength of the light

s is the separation of the (slits/eyes)

x is the distance between the bands of light (also called fringe distance)

D is the distance from the (slits to the screen/eye and retina)

In this experiment the distance between the light source and the slits was kept constant and the separation between the slits was varied. In our case the distance between the eyes remains the same, while the object being seen varies in depth across 3D space. Assuming D and λ to be constants, decreasing s increases the fringe spacing x, that is, it lowers the frequency of the pattern on the screen. Conceptually, decreasing the distance between the slits is equivalent to increasing the distance between the source and the slits. So the frequency of the pattern on the screen can be said to depend on the depth of the source from the slits in 3D space. If the source is placed along the line bisecting the two slits, the pattern will be symmetric on either side of this line on the screen. Every point along this line in 3D space would then have a unique frequency and hence a unique pattern. The diagrams here are only for conceptual understanding and are not the exact experimental outcome.
As the source starts to move away from this bisecting line the symmetry in the pattern should start to degrade.

If a light source is placed at three different locations equidistant from the center of the slits, the one marked in red would produce a symmetric pattern and the other two, I guess, would not. I have not experimented with this, hence the letters NX (Not eXperimented). If my guess is right, a light source placed anywhere in 3D space would produce a unique pattern on the screen!!! That means an analysis of this pattern would tell us the exact location of the source in 3D space.
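Since I haven't run the physical experiment, here is at least a numerical sketch one could try: treat each slit as a point, sum the two spherical waves on the screen, and compare the patterns for different source positions (idealized scalar waves, amplitude falloff ignored; all dimensions are assumed lab-scale values):

```python
import numpy as np

lam = 550e-9          # wavelength (m), green light
s = 0.5e-3            # slit separation (m)
D = 1.0               # slit-to-screen distance (m)
screen_x = np.linspace(-5e-3, 5e-3, 2001)   # screen positions (m)

def pattern(src_x, src_z):
    """Screen intensity for a point source src_z metres behind the
    slits, offset src_x metres from the bisecting line."""
    total = 0j
    for slit_x in (-s / 2, s / 2):
        d1 = np.hypot(slit_x - src_x, src_z)        # source -> slit
        d2 = np.hypot(screen_x - slit_x, D)         # slit -> screen points
        total = total + np.exp(2j * np.pi * (d1 + d2) / lam)
    return np.abs(total) ** 2

on_axis = pattern(0.0, 0.2)     # source on the bisecting line
off_axis = pattern(2e-3, 0.2)   # source moved sideways
print("on-axis pattern symmetric: ", np.allclose(on_axis, on_axis[::-1]))
print("off-axis pattern symmetric:", np.allclose(off_axis, off_axis[::-1]))
# Expected: True, then False -- moving the source off the bisecting
# line shifts the fringes and breaks the symmetry, consistent with
# the guess above.
```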

This concept can be applied to our vision by replacing the light source with any point in space that reflects light, acting as a passive source.

Tuesday, October 16, 2007

Photography and Travel: Kudremukha

This is the season of the year when nature drapes the dry mountains of the Western Ghats in a green, velvety floral carpet. The season after the rains brings with it a fine spray of mist over these hillocks. The drive along the winding mountain roads, amidst fresh vegetation and serene valleys, makes the experience even more invigorating.

Wednesday, October 3, 2007

Computer Vision (35): Segmentation Versus Stereo Correspondence

One question that keeps rattling around in my mind is whether segmentation is a 2D or a stereo phenomenon. Analyzing a 2D image can get us no more than a set of colors, edges and intensities. A segmentation algorithm, even at its best, can only exploit one or more of these image features, and it still falls short of what is expected of it. This is because, when we define segmentation as the process of segregating the different objects in our surroundings or in a given input image, not all of them can be extracted using any combination of the cited features. A lot of the time we have to coalesce more than one segment to form an object, and the rule for doing this has kept our brains cerebrating for decades. The image below is a simple illustration of this.

I hope it isn't difficult for your brain, at least, to get the contents of the image. Observed keenly, it shows a dog testing its olfactory system to find something good for its stomach. You can almost recognize the dog as a Dalmatian. Now I challenge anyone to get me a generalized segmentation algorithm that can extract the dog from this image!!!
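To see how far the cited features alone take us, here is what a colour-only segmenter does, sketched with k-means clustering in OpenCV (the filename and the number of clusters are placeholders):

```python
import numpy as np
import cv2

img = cv2.imread("scene.png")                  # any 2D input image
pixels = img.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
k = 6                                          # number of colour clusters
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                cv2.KMEANS_RANDOM_CENTERS)
segmented = centers.astype(np.uint8)[labels.flatten()].reshape(img.shape)
cv2.imwrite("segments.png", segmented)
# Each label is a colour-coherent region, not an object: a dog with
# black and white patches lands in at least two clusters, and no rule
# in the image itself says which clusters belong together.
```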

Some people might argue that it's almost impossible to achieve this from a 2D image, since there is no way to distinguish the plane of the dog from that of the ground. Remember, your brain has already done it! In a real scenario, even if we came across such a view, our stereo vision would ensure that the dog forms an image separate from the plane of the ground and hence gets segmented out by the variation in depth. Our brain can still do it from the 2D image here because of the tremendous amount of knowledge it has gathered over the years. In short, stereo vision helped us build this knowledge over the years, and this knowledge now helps us segment objects even from a 2D image. The BIG question is: how do we do it in a computer?
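Note how trivial the problem becomes once depth is available. A toy sketch, assuming a disparity map from some stereo matcher (all inputs here are faked for illustration):

```python
import numpy as np

def segment_by_depth(disparity, depth_gap=5.0):
    """Label pixels clearly nearer than the dominant background surface."""
    background = np.median(disparity)           # disparity of the ground
    return disparity > background + depth_gap   # nearer => larger disparity

# Fake data standing in for a real matcher's output:
disparity = np.full((100, 100), 20.0)           # flat ground plane
disparity[30:60, 40:80] = 35.0                  # a nearer object (the "dog")
mask = segment_by_depth(disparity)
print("foreground pixels:", mask.sum())         # 30 * 40 = 1200
```

No colour, edge or texture reasoning is involved; depth alone pops the object out of the ground plane.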

Whenever I start to think about a solution to the problem of stereo correspondence, the problem of segmentation barricades it. Here is why. The first step in understanding or solving stereo correspondence is to experiment with two cameras taking images at an offset. Below is a sample image.

It is very obvious that we cannot correspond the images pixel by pixel. Which blue pixel of the sky in the left image would you pair with a particular blue pixel in the right? Some pixels in the right image have no correspondence in the left, and vice versa, but how do you know where those pixels are? This loops us back to using some segmentation technique to match similar objects in the two images, but I think we had just concluded that segmentation was due to stereo!!!
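For completeness, the standard mechanical workaround is to match small windows rather than single pixels, for example by minimising the sum of absolute differences (SAD) along the same row of a rectified pair. A minimal sketch (assumes rectified grayscale inputs and an interior pixel); note that it still breaks down exactly where argued above, in textureless regions like the sky, where every window looks alike:

```python
import numpy as np

def sad_disparity(left, right, y, x, win=5, max_disp=32):
    """Disparity at pixel (y, x) for a rectified grayscale pair."""
    h = win // 2
    patch = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float32)
    best_cost, best_d = np.inf, 0
    for d in range(max_disp):
        if x - d - h < 0:                       # ran off the image edge
            break
        cand = right[y - h:y + h + 1,
                     x - d - h:x - d + h + 1].astype(np.float32)
        cost = np.abs(patch - cand).sum()       # window dissimilarity
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# left, right = the two offset images above, as uint8 grayscale arrays
# sad_disparity(left, right, y=120, x=200) -> disparity, i.e. a depth cue
```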