Friday, March 7, 2008

Voice munching

Lip reading in computer vision tries to recover a conversation from a video sequence that has no audio. It exploits the movement of the lips and the jaw, which are assumed to correspond uniquely to what we speak. When it comes to broadcasting media content, say a news show, ideally it would only be necessary to transmit the video of the person, and software residing locally would give you the news through lip reading. But would it really be worth the effort? How much bandwidth would the audio signal take, after all? I would be more impressed if the whole concept were reversed: recreate the lip movement from the voice. Of course this would not replicate the exact video of the person talking, but I could not find a better example to support the concept. Transmit only the first frame of video and guess the following frames, I mean the lip movement, from the transmitted voice. The bandwidth needed to transmit a news channel would then be just the bandwidth of the voice, which means I would be able to use my landline phone with no special modem to make a video call. Crazy stuff! But it all depends on how well you can make the person’s lips dance to the tune of his voice.

Thursday, December 27, 2007

Computer Vision (37): Sensing through Seismics, The Golden Mole

Nature has always outwitted humans in its creativity and optimization. Humans are one of the few creatures bestowed with a complex and highly developed visual sense. Even though we haven't been able to crack the algorithms of our own visual cortex, researchers are trying hard to replicate its behavior in robots. I myself have strived for years to unravel the enigma, but in vain. I then started to look for other ways to let robots become autonomous, came across a category of owls that can pinpoint their prey through hearing, and have already blogged about it.
Some snakes, such as pit vipers and certain pythons and boas, can sense the infrared radiation given off by other creatures and even use it to hunt down their prey. They still have eyes for vision, though not very well developed ones, which leaves them less special than the golden moles that I came across recently.


These creatures do not have eyes at all. They have extremely sensitive hearing and vibration detection, and can navigate underground with unerring accuracy. Morphological analysis of the middle ear has revealed a massive malleus, which likely enables them to detect seismic cues. They make use of this seismic sensitivity to detect prey as well as to navigate when burrowing through sand. While vibrations are used over long distances to detect prey, smell is possibly used over shorter distances.

FORSAKEN FANFARE

Travel and Photography
I had been to Kerala recently and wanted to have this post under my usual Travel and Photography theme, but the message I wanted to convey was much more than just this, so I made that a sub heading. I asked myself: what does it take to be a celebrity? Fame doesn't shake hands with those who are merely talented. There is something else missing in these people which I have yet to discover, or probably find out from you. As they say, your family name will most of the time do all the magic in the film industry. Our country, with such a bursting population, must have a lot of such cases who fail to exhibit their endowment at the right place and time. I met one such case in Fort Kochi.
This person could embellish the algae clad wall with just a few colored chalks and, of course, a lot of his esteemed abilities. People gathered to watch him chalk his imagination, but pretended not to recognize that it was not a charity show. He stood there smiling at the audience, waiting to at least recover the money he had spent on the chalks. It was shocking to see everyone disperse without even a single penny flying to his side. Seriously, I feel that my Canon 350D failed to reproduce the shades (in fact I borrowed this snap from my friend) that he could create on such a dirty wall. With a canvas I think he will touch the skies.
Here are some glimpses of Kerala (Cochin, Attirapalli and Alleppey backwaters) through my camera:
http://www.flickr.com/photos/57078108@N00/.

Tuesday, October 30, 2007

Computer Vision (36): Mechanical or Knowledge based CORRESPONDENCE

In spite of spending sleepless nights thinking hard about what technique our brain might use to solve the problem of depth perception, my brain only gave me a drowsier day ahead. So I started to filter the possibilities to narrow down on a solution. The question I asked myself was: does our brain use knowledge to match the left and right images, or is it something that happens more mechanically? I had tried out a lot of knowledge based approaches, all in vain, and even the discussion in the earlier post led to no conclusion. I wanted to take a different route and think of a more mechanical, less knowledge based approach. My brain then pointed me to the age-old experiment devised by Thomas Young to demonstrate the wave nature (interference) of light, “The Double Slit Experiment”. How could this help solve a seemingly unrelated problem like depth perception? On comparing them you will find a few things in common between the two setups. Both deal with light, and both pass the surrounding light through two openings and combine it later. I excitedly thought, have I unlocked the puzzle?

Let’s analyze and understand it better to know if I really did! I am neither an expert in physics nor biology, so I can only build a wrapper around this concept and not verify its complete path.

Young’s experiment passed light from a single monochromatic source through two slits, both equidistant from the source, and produced an interference pattern on the rear screen. The approximate formula for this experiment is given below.

x ≈ λD / s

Where,


λ is the wavelength of the light

s is the separation of the (slits/eyes)

x is the distance between the bands of light (also called fringe distance)

D is the distance from the (slits to the screen/eye and retina)
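To get a feel for the numbers, here is a minimal Python sketch of the standard small-angle approximation x ≈ λD / s, using the symbols defined above. The function name `fringe_spacing` and the sample values (550 nm light, a screen 1 m away) are my own assumptions for illustration, not measurements from any experiment.

```python
def fringe_spacing(wavelength_m, slit_separation_m, screen_distance_m):
    """Small-angle fringe spacing x = wavelength * D / s, all lengths in metres."""
    if slit_separation_m <= 0:
        raise ValueError("slit separation must be positive")
    return wavelength_m * screen_distance_m / slit_separation_m


if __name__ == "__main__":
    wavelength = 550e-9  # assumed: green light, 550 nm
    screen = 1.0         # assumed: screen 1 m behind the slits
    for s in (0.5e-3, 0.25e-3, 0.125e-3):  # assumed slit separations
        x = fringe_spacing(wavelength, s, screen)
        # 1 / x is the spatial frequency of the pattern (fringes per metre).
        print(f"s = {s * 1e3:.3f} mm  x = {x * 1e3:.2f} mm  {1 / x:.0f} fringes/m")
```

In this approximation, halving s doubles the fringe distance x, so the bands spread out and their spatial frequency on the screen drops.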

In this experiment the distance between the light source and the slits was kept constant and the separation between the slits was varied. In our case the distance between the eyes stays the same, while the object being seen varies in depth in the 3D space. Assuming D and λ to be constants, decreasing s increases the fringe distance x, that is, it lowers the frequency of the pattern on the screen. Conceptually, decreasing the distance between the slits is equivalent to increasing the distance between the source and the slits. So the frequency of the pattern on the screen can be said to depend on the depth of the source from the slits in the 3D space. If the source is placed along a line bisecting the two slits, the pattern would be symmetric on either side of this line on the screen. Every point along this line in the 3D space would have a unique frequency and hence a unique pattern. The diagrams here are only for conceptual understanding and not the exact experimental outcome.
As the source starts to move away from this bisecting line, the symmetry in the pattern should start to degrade.

If a light source is placed at 3 different locations equidistant from the center of the slits, the one marked red would produce a symmetric pattern and the other two, I guess, would not. I have not tested this experimentally, hence the letters NX (Not eXperimented). If my guess is right, a light source placed anywhere in the 3D space would produce a unique pattern on the screen!!! This means an analysis of this pattern would tell us the exact location of the source in the 3D space.
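One part of this guess can be checked with plain geometry, without any optics: how far the source is from each slit. The sketch below is only that, a geometric illustration. The coordinate layout (slits on the x-axis at -s/2 and +s/2), the function name `path_difference`, and the sample positions are all assumptions of mine. It does not simulate the interference pattern itself.

```python
import math


def path_difference(source_xy, slit_separation):
    """Distance from the source to the left slit minus distance to the right slit.

    Assumed layout: slits lie on the x-axis at -s/2 and +s/2,
    and the source sits at (x, y) in front of them.
    """
    sx, sy = source_xy
    half = slit_separation / 2.0
    to_left = math.hypot(sx + half, sy)
    to_right = math.hypot(sx - half, sy)
    return to_left - to_right


if __name__ == "__main__":
    s = 1e-3  # assumed slit separation: 1 mm
    r = 1.0   # all three sources are 1 m from the center of the slits
    for degrees in (90, 60, 120):  # 90 degrees lies on the bisecting line
        angle = math.radians(degrees)
        source = (r * math.cos(angle), r * math.sin(angle))
        print(degrees, path_difference(source, s))
```

On the bisecting line the two path lengths are equal, so light reaches both slits in step. Off the line, one slit is reached later than the other, and the two mirrored positions give differences of opposite sign, which is at least consistent with the pattern losing its centered symmetry.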

This concept can be applied to our vision by replacing the light source with any point in space that reflects light, acting as a passive source.

Tuesday, October 16, 2007

Photography and Travel: Kudremukha

This is the season of the year when nature drapes the dry mountains of the Western Ghats in a green, velvety floral carpet. The season after the rains brings with it a fine spray of mist over these hillocks. The drive along the mountain curves, amidst fresh vegetation and serene valleys, makes the experience even more invigorating.

Wednesday, October 3, 2007

Computer Vision (35): Segmentation Versus Stereo Correspondence

One question that always keeps running through my mind is whether segmentation is a 2D or a stereo phenomenon. Analysis of a 2D image can get us no more than a set of colors, edges and intensities. A segmentation algorithm, even if it exploits one or more of these image features as well as it can, would still fall short of what is expected from it. This is because when we define segmentation as the process of separating the different objects in our surroundings or in a given input image, not all of them can be extracted using a combination of the cited features. Often we have to coalesce more than one segment to form an object, and the rule for doing this has kept our brains busy for decades. The image below is a simple illustration of this.

I hope your brain, at least, has no difficulty making out the contents of the image. On observing keenly, you will see a dog testing its olfactory system to find something good for its stomach. You can almost recognize the dog as a Dalmatian. Now I challenge anyone to get me a generalized segmentation algorithm that can extract the dog from this image!!!

Some people might argue that it’s almost impossible to achieve this from a 2D image, since there is no way to distinguish the plane of the dog from that of the ground. Remember, your brain has already done it! In a real scenario, if we came across such a view, our stereo vision would ensure that the dog forms an image separate from the plane of the ground, and hence it would get segmented due to the variation in depth. Our brain can still do it from the 2D image here thanks to the tremendous amount of knowledge it has gathered over the years. In short, stereo vision helped us build this knowledge over the years, and this knowledge now helps us segment objects even from a 2D image. The BIG question is, how do we do it in a computer?
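To make "segmented due to the variation in depth" concrete, here is a toy Python sketch that splits one row of depth samples wherever the depth jumps sharply. It assumes the depth values are already known, which is exactly the hard part that stereo has to solve. The function name `segment_by_depth`, the threshold and the sample depths are hypothetical.

```python
def segment_by_depth(depths, max_jump):
    """Label consecutive samples; a new label starts when depth jumps by more than max_jump."""
    if not depths:
        return []
    labels = [0]
    for previous, current in zip(depths, depths[1:]):
        step = 1 if abs(current - previous) > max_jump else 0
        labels.append(labels[-1] + step)
    return labels


if __name__ == "__main__":
    # Hypothetical depths in metres: ground, then a nearer object, then ground again.
    row = [5.0, 5.0, 5.1, 2.0, 2.1, 2.0, 5.2, 5.1]
    print(segment_by_depth(row, max_jump=1.0))  # [0, 0, 0, 1, 1, 1, 2, 2]
```

The nearer object gets its own label no matter what color or texture it has. Note that the ground on either side of it ends up with two different labels, so even this toy version still needs some rule for merging segments.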

Whenever I start to think about a solution to the problem of stereo correspondence, the problem of segmentation barricades it. Here is why. The first step towards understanding or solving stereo correspondence is to experiment with two cameras taking images at an offset. Below is a sample image.

It is very obvious that we cannot match the images pixel by pixel. Which blue pixel of the sky in the left image would you pair with a particular blue pixel in the right? Some pixels in the right image have no counterpart in the left and vice versa, but how do you know where these pixels are? This again loops us back to using some segmentation technique to match similar objects in the two images, but I think we had just concluded that segmentation was due to stereo!!!
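The "which blue pixel" problem shows up even in a one-dimensional toy. The sketch below uses plain window matching with a sum of absolute differences (SAD), a common textbook baseline that I am substituting for illustration; it is not a claim about how the brain does it. The scanlines, intensities and function names are made up.

```python
def sad_costs(left, right, x, window, max_disparity):
    """SAD between the window starting at x in `left` and the same window
    shifted d pixels to the left in `right`, for d = 0..max_disparity."""
    if x < 0 or x + window > len(left):
        raise ValueError("window does not fit inside the scanline")
    costs = []
    for d in range(max_disparity + 1):
        if x - d < 0:
            break  # shifted window would fall off the right image
        cost = sum(abs(left[x + i] - right[x - d + i]) for i in range(window))
        costs.append(cost)
    return costs


def best_disparities(costs):
    """All disparities that share the lowest cost (more than one means ambiguity)."""
    lowest = min(costs)
    return [d for d, c in enumerate(costs) if c == lowest]


if __name__ == "__main__":
    # Hypothetical scanlines: bright sky (200) with a dark object (50),
    # which appears 2 pixels further left in the right image.
    left = [200] * 6 + [50] * 3 + [200] * 6
    right = [200] * 4 + [50] * 3 + [200] * 8

    on_object = sad_costs(left, right, x=6, window=3, max_disparity=3)
    in_sky = sad_costs(left, right, x=11, window=3, max_disparity=3)
    print(on_object, best_disparities(on_object))  # [300, 150, 0, 150] [2]
    print(in_sky, best_disparities(in_sky))        # [0, 0, 0, 0] [0, 1, 2, 3]
```

On the object the cost has a single minimum at the true shift of 2 pixels. In the uniform sky every shift matches equally well, so the matcher has nothing to choose between.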

Thursday, September 20, 2007

Computer Vision and Photography (34): The Focus Story Continues…

I was thinking of ways to solve the auto focus problem perfectly, even for surfaces without any contrast, and came up with a few proposals; I do not know whether they are practical. As we all know, the light from a point that a lens converges forms a 3D cone between that point and the surface of the lens facing it. This cone will be right circular only for points on the optical axis. The lens only sees the 2D projection of the light points in the 3D space around it, so there can only be one point on the optical axis that the lens can see at any instant of time. This is the point that I am trying to FOCUS on. I need to somehow come up with a technique to detect the rays emerging from a point on the optical axis.
One difference between a point on the optical axis and any other point is that rays emanating from the former all meet a circle of radius R, centered on the center of the lens, at the same angle.

But any point on the lens receives light from all the points visible around it. So at any point on the lens, light rays converge from every possible angle, which leaves us with no way to pinpoint the ray that started from the optical axis. There are many more problems with this way of thinking. From the perspective of the lens we never know where the real point is located on the optical axis. Different points in the surrounding space can create the same effect as a real point at a different location on the optical axis. This can indeed happen continuously all along the axis! If we assume that the frequency of light reflected from a real point is almost the same wherever it meets the circle, and that the probability of this happening for a virtual point is zero, the problem could be solved. But if you recall, the very reason I started to think about this was to find a solution for cases where there is zero contrast.
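To see why zero contrast is the sticking point, consider the usual contrast-detection style of auto focus, which scores an image patch by how strongly neighbouring pixels differ. The Python sketch below is a simplified stand-in for that idea, not any camera's actual algorithm. The 3-tap average used to mimic defocus, the function names and the pixel values are all assumptions.

```python
def contrast_score(patch):
    """Sum of squared differences between horizontal neighbours in a 2D list of pixels."""
    return sum(
        (row[i + 1] - row[i]) ** 2
        for row in patch
        for i in range(len(row) - 1)
    )


def box_blur_row(row):
    """3-tap average with clamped edges; a crude stand-in for defocus blur."""
    last = len(row) - 1
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, last)]) / 3.0
        for i in range(len(row))
    ]


if __name__ == "__main__":
    textured = [[0, 255, 0, 255, 0, 255]]  # hypothetical high-contrast patch
    flat = [[128] * 6]                      # hypothetical zero-contrast patch

    for name, patch in (("textured", textured), ("flat", flat)):
        sharp = contrast_score(patch)
        blurred = contrast_score([box_blur_row(r) for r in patch])
        print(name, sharp, blurred)
```

For the textured patch the score drops when the patch is blurred, so the lens can be moved to maximize it. For the flat patch the score is zero both sharp and blurred, so this kind of measure gives no signal at all.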
After a while I came across a theory called QED (quantum electrodynamics) that solved a lot of these problems but kept the hardware required out of our current technology’s reach. According to QED, a photon represents the “particle” of light, and its instantaneous phase the “wave” counterpart. This phase depends on the frequency of the light under consideration. A lens focuses light because the probability that the photons reach the focus point with the same phase is high there and negligible anywhere else. For more details refer to the book “QED: The Strange Theory of Light and Matter”. Applying the same theory to our current scenario, this would hold good only for a real particle. Since phase repeats as the photon travels through space, the random points that form the virtual particle would have to sit at exact locations (which, again, can repeat in space) for their light to meet the point “a” all with the same phase, which is highly improbable in a practical scenario. Now this should work for ZERO contrast!
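The phase argument can be illustrated with the "adding arrows" picture from the QED book: each path contributes a unit arrow whose direction is its phase, and the probability goes with the square of the length of the summed arrow. The sketch below is only a numerical illustration of that picture, not a simulation of a lens. The function name, the number of paths and the random seed are arbitrary choices of mine.

```python
import cmath
import math
import random


def summed_amplitude(phases):
    """Length of the sum of unit arrows (phasors), one arrow per path."""
    return abs(sum(cmath.exp(1j * phase) for phase in phases))


if __name__ == "__main__":
    n_paths = 1000  # arbitrary number of contributing paths

    # Real point at the focus: every path arrives with the same phase.
    aligned = [0.7] * n_paths

    # Unrelated points in space: phases are effectively random.
    rng = random.Random(0)  # fixed seed so the run is repeatable
    scattered = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_paths)]

    print("aligned  :", summed_amplitude(aligned))
    print("scattered:", summed_amplitude(scattered))
```

With aligned phases the arrows add up to a length of about 1000. With random phases they mostly cancel and the total is typically only a few tens, so the squared length, and with it the probability, is smaller by orders of magnitude.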