Thursday, December 27, 2007

Computer Vision (37): Sensing through Seismics, The Golden Mole

Nature has always outwitted humans in its creativity and optimization. Humans are among the few creatures bestowed with a complex and highly developed visual sense. Even though we haven't been able to crack the algorithms of our own visual cortex, researchers are trying hard to replicate the behaviour in robots. I myself have strived for years to unravel the enigma, but in vain. So I started looking out for other ways of letting robots become autonomous in one way or another, and came across a category of owls that can pinpoint their prey through hearing, which I have already blogged about.
Some pythons have the ability to sense the infrared radiation from creatures and can even use it to hunt down their prey. These are usually called pit snakes. Though not very well developed, they still have eyes for vision, which leaves them not quite as special as the golden moles that I came across recently.


These creatures do not have eyes at all. They have extremely sensitive hearing and vibration detection, and can navigate underground with unerring accuracy. Morphological analysis of the middle ear has revealed a massive malleus, which likely enables them to detect seismic cues. They make use of this seismic sensitivity to detect prey as well as to navigate while burrowing through sand. While vibrations are used over long distances to detect prey, smell is possibly used over shorter distances.

FORSAKEN FANFARE

Travel and Photography
Had been to Kerala recently and wanted to have this post under my usual Travel and Photography theme, but the message I wanted to convey was much more than just that, so I made that a sub-heading. I asked myself: what does it take to be a celebrity? Fame doesn't shake hands with those who are merely talented. There is something else missing in these people which I am yet to discover, or will probably find out from you. As they say, your family name most of the time does all the magic in the film industry. A country with such a bursting population is bound to have a lot of such cases that fail to exhibit their endowment at the right place and time. I met one such case in Fort Kochi.
This person could embellish an algae-clad wall with just a few coloured chalks and, of course, a lot of his esteemed ability. People gathered to watch him chalk out his imagination, but pretended not to recognize that it was not a charity show. He stood there smiling at the audience, waiting to at least settle his accounts on the money he had spent on the chalks. It was shocking to see everyone disperse without a single penny flying his way. Seriously, I feel my Canon 350D failed to reproduce the shades he could create on such a dirty wall (in fact I borrowed this snap from my friend). With a canvas I think he would touch the skies.
Here are some glimpses of Kerala (Cochin, Attirapalli and the Alleppey backwaters) through my camera:
http://www.flickr.com/photos/57078108@N00/.

Tuesday, October 30, 2007

Computer Vision (36): Mechanical or Knowledge based CORRESPONDENCE

In spite of spending sleepless nights thinking deeply about what technique our brain might use to solve the problem of depth perception, my brain only gave me a drowsier day ahead. So I started to filter out the possibilities to narrow down to a solution. The question I asked myself was: is our brain using knowledge to correspond the left and the right images, or is it something that happens more mechanically? I had tried out a lot of knowledge-based approaches, but in vain, and even the discussion we had in the earlier post concluded nothing. I wanted to take a different route by thinking of a more mechanical and less knowledge-based approach. My brain then pointed me to the age-old experiment proposed by Thomas Young to explain the wave nature (interference) of light, "The Double Slit Experiment". How could this be of use in solving a seemingly unrelated problem like depth perception? On comparing them you will find a few things in common between the two setups. Both deal with light, and both pass the surrounding light through two openings and combine it later. I excitedly thought, have I unlocked the puzzle?

Let’s analyze and understand it better to know if I really did! I am neither an expert in physics nor biology, so I can only build a wrapper around this concept and not verify its complete path.

Young’s experiment passed a single source of monochromatic light through two slits and produced an interference pattern on the rear screen, the two slits being equidistant from the source. The approximate formula for the fringe spacing in this experiment is:

x = λD / s

Where,

λ is the wavelength of the light

s is the separation of the (slits/eyes)

x is the distance between the bands of light (also called the fringe spacing)

D is the distance from the (slits to the screen/eye and retina)

In this experiment the distance between the light source and the slits was kept constant and the separation between the slits was varied. In our case the distance between the eyes remains the same, while the object to be seen varies in depth in the 3D space. Assuming D and λ to be constants, decreasing s increases the fringe spacing x, i.e., it lowers the frequency of the pattern on the screen. Conceptually, decreasing the distance between the slits is equivalent to increasing the distance between the source and the slits. So the frequency of the pattern on the screen can be said to depend on the depth of the source from the slits in the 3D space. If the source is placed along the line bisecting the two slits, the pattern would be symmetric on either side of this line on the screen. Every point along this line in 3D space would then have a unique frequency and hence a unique pattern. The diagrams here are only for conceptual understanding and not the exact experimental outcome.
As the source starts to move away from this bisecting line the symmetry in the pattern should start to degrade.

If a light source is placed at three different locations equidistant from the center of the slits, the one in red would produce a symmetric pattern and the other two, I guess, would not. I have not experimented with this, hence the letters NX (Not eXperimented). If my guess is right, a light source placed anywhere in 3D space would produce a unique pattern on the screen!!! This means an analysis of this pattern would tell us the exact location of the source in 3D space.

This concept can be applied to our vision by replacing the light source with any point in space that reflects light, acting as a passive source.
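
For anyone who wants to play with this numerically, here is a minimal Python sketch (my own toy, with made-up numbers for the wavelength, slit separation and screen distance) that simply adds the two path-length phasors for a point source placed anywhere behind the slits. You can move the source in depth or off the bisecting line and check for yourself how, and whether, the pattern changes:

    import numpy as np

    # A toy phasor sum (my sketch, with made-up numbers): treat the two slits as
    # ideal point openings re-emitting monochromatic light from a single point
    # source placed behind the slit plane, and add the two path-length phasors at
    # every screen position to get the pattern.

    lam = 550e-9   # wavelength of green light, metres (assumed)
    s = 1e-3       # slit separation
    D = 1.0        # slit plane to screen distance
    slits = [np.array([+s / 2, 0.0]), np.array([-s / 2, 0.0])]

    def pattern(x_src, z_src, screen_x):
        """Intensity on the screen for a point source at (x_src, -z_src)."""
        src = np.array([x_src, -z_src])
        total = np.zeros_like(screen_x, dtype=complex)
        for slit in slits:
            d_src = np.linalg.norm(slit - src)                   # source to slit
            d_scr = np.sqrt(D**2 + (screen_x - slit[0])**2)      # slit to screen
            total += np.exp(2j * np.pi * (d_src + d_scr) / lam)  # add the phasor
        return np.abs(total)**2

    screen_x = np.linspace(-5e-3, 5e-3, 2001)
    for z in (0.2, 0.5, 1.0):                    # source depths behind the slits
        I = pattern(0.0, z, screen_x)
        print(f"source depth {z} m: peak/mean intensity = {I.max() / I.mean():.2f}")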

Tuesday, October 16, 2007

Photography and Travel: Kudremukha

This is the season of the year when nature embraces the dry mountains of the Western Ghats with a green, velvety floral carpet. The season after the rains brings with it a fine spray of mist over these hillocks. The drive along the mountain, veering amidst fresh vegetation and serene valleys, makes the experience even more invigorating.

Wednesday, October 3, 2007

Computer Vision (35): Segmentation versus Stereo Correspondence

One question that always keeps rambling in my mind is whether segmentation is a 2D or a stereo phenomenon. A 2D image, on analysis, can get us no more than a set of colors, edges and intensities. A segmentation algorithm, even though it would at best try to exploit one or more of these image lineaments, would still fall short of what is expected from it. This is because when we define segmentation as the process of segregating the different objects in our surroundings or in a given input image, not all of them can be extracted using a combination of the cited features. A lot of the time we might have to coalesce more than one segment to form an object, and the rule for doing this has kept our brains cerebrating for decades. The image below is a simple illustration of this.

I hope it isn't too difficult for your brain, at least, to get the contents of the image. On observing keenly, it shows a dog testing its olfactory system to find something good for its stomach. You can almost recognize the dog as a Dalmatian. Now I bet no one can get me a generalized segmentation algorithm that can extract the dog from this image!!!

Some people might argue that it's almost impossible to achieve this from a 2D image, since there is no way to distinguish the plane of the dog from that of the ground. Remember, your brain has already done it! In a real scenario, even if we came across such a view, our stereo vision would ensure that the dog forms an image separate from the plane of the ground and hence would get segmented due to the variation in depth. Our brain can still do it from the 2D image here thanks to the tremendous amount of knowledge it has gathered over the years. In short, stereo vision helped us build this knowledge over the years, and this knowledge is now helping us segment objects even from a 2D image. The BIG question is, how do we do it in a computer?

Whenever I start to think about a solution to the problem of stereo correspondence, the problem of segmentation barricades it. Here is why. The first step to understanding or solving stereo correspondence is to experiment with two cameras taking images at an offset. Below is a sample image.

It is very obvious that we cannot correspond the images pixel by pixel. Which blue pixel of the sky in the left image would you use to pair with a particular blue pixel in the right? Some pixels in the right would not correspond with the left and vice versa, but how do you know where these pixels are? This again loops us back to use some segmentation techniques to match similar objects in the two images, but I think we had just now concluded that segmentation was due to stereo!!!
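
To see the ambiguity concretely, here is a toy Python sketch (mine, not a full correspondence algorithm) that runs plain sum-of-squared-differences block matching on a synthetic textured row and on a nearly uniform "sky" row; the textured patch gives a sharp cost minimum at the true shift, while the sky patch gives an almost flat cost curve where every candidate match looks equally good:

    import numpy as np

    # Classic sum-of-squared-differences block matching along one row of a
    # synthetic stereo pair. A textured patch gives a sharp cost minimum at the
    # true shift; a nearly uniform "sky" patch gives an almost flat cost curve,
    # so every candidate match looks equally good.

    rng = np.random.default_rng(0)
    width, true_shift, half = 200, 7, 5

    textured = rng.random(width)                            # row with texture
    sky = np.full(width, 0.6) + 0.001 * rng.random(width)   # nearly uniform row

    def cost_curve(left_row, right_row, x, max_d):
        patch = left_row[x - half:x + half + 1]
        return np.array([np.sum((patch - right_row[x - d - half:x - d + half + 1])**2)
                         for d in range(max_d)])

    for name, row in (("textured", textured), ("sky", sky)):
        right = np.roll(row, -true_shift)           # simulate the camera offset
        c = cost_curve(row, right, x=100, max_d=20)
        print(f"{name:9s} best disparity = {int(np.argmin(c))}, "
              f"cost spread = {c.max() - c.min():.4f}")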

Thursday, September 20, 2007

Computer Vision and Photography (34): The Focus Story Continues…

I was thinking of ways to solve the auto focus problem perfectly, even for surfaces without any contrast, and came up with a few proposals which may or may not be practical. Light converged by a lens, as we all know, forms a 3D cone between the point it takes off from and the surface of the lens. This cone will be right circular only for points on the optical axis. The lens only sees the 2D projection of the light points existing in the 3D space around it, and so there can only be one point on the optical axis that the lens is able to see at any instant of time. This is the point that I am trying to FOCUS on. I need to somehow come up with a technique to detect the rays emerging from a point on the optical axis.
One difference between a point on the optical axis and any other point is that rays emanating from the former meet a circle of radius R, drawn from the center of the lens, at the same angle.

But any point on the lens receives light from all points visible around it. So at any point on the lens, light rays converge from every possible angle, which leaves us with no way to pinpoint the ray that started from the optical axis. There are many more problems with this way of thinking. From the perspective of the lens we never know where the real point is located on the optical axis. Different points from the surrounding space can create the same effect as though there were a real point at a different location on the optical axis, and this can happen continuously all along the axis! Assuming that the frequency of light reflected from a real point will be almost the same when it meets the circle, and that the probability of such a thing happening for a virtual point is zero, the problem could be solved. But if you recall, the very reason I started to think about this was to find a solution for cases where there is zero contrast.
After a while I came across a theory called QED that solved a lot of these problems but kept the hardware required to achieve it out of our current technology's reach. According to QED, a photon represents the "particle" of light, and its instantaneous phase the "wave" counterpart. This phase depends on the frequency of the light under consideration. A lens focuses light because the probability that the photons reach the focus point with the same phase is high there and nearly zero anywhere else. For more details refer to the book "QED: The Strange Theory of Light and Matter". Putting the same theory to work in our current scenario, this would hold good only for a real point. Since phase is something that repeats as the photon travels through space, the random points that form a virtual point would all have to sit at exact locations (which again repeat in space) to meet the point "a" with the same phase, which is highly improbable in a practical scenario. Now this should work even for ZERO contrast!
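
Here is a rough numerical rendering of that phasor argument (my sketch, using the standard thin-lens phase term rather than anything taken from the book): add one phasor per lens zone and see that the summed amplitude stays large only at the thin-lens image point, where all the paths agree in phase:

    import numpy as np

    # Phasor version of the argument above (my sketch, using the standard thin
    # lens phase term, not anything out of the book): add one phasor per lens
    # zone, exp(2*pi*i*path/lambda), where the lens contributes an extra phase
    # of -pi*h^2/(lambda*f) at height h. Only at the thin-lens image point do all
    # the paths agree in phase, so only there does the sum stay large.

    lam = 550e-9    # wavelength (assumed)
    f = 0.05        # focal length of the lens (assumed)
    z_obj = 0.20    # distance of the real point in front of the lens
    z_img = 1 / (1 / f - 1 / z_obj)        # thin-lens image distance

    heights = np.linspace(-5e-3, 5e-3, 400)   # sample points across the lens

    def summed_amplitude(z_probe):
        path_in = np.sqrt(z_obj**2 + heights**2)       # point to lens zone
        path_out = np.sqrt(z_probe**2 + heights**2)    # lens zone to probe point
        lens_phase = -np.pi * heights**2 / (lam * f)   # phase added by the lens
        phases = 2 * np.pi * (path_in + path_out) / lam + lens_phase
        return abs(np.sum(np.exp(1j * phases)))

    for z in (0.5 * z_img, z_img, 1.5 * z_img):
        print(f"probe at {z:.4f} m: summed amplitude = {summed_amplitude(z):.1f}")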

Saturday, August 25, 2007

Computer Vision and Photography (33): Capturing stereo images using a single camera

Chak De INDIA

When I started my work on stereo vision I used to download stereo images from the internet for my experiments. I didn't have a camera then. By the time I could afford one, the phrase "taking images at an offset" had become synonymous with stereo images for me. Since I couldn't afford two similar cameras, both for cost and for reasons of inutility, I started to take my own stereo images by moving the camera a bit to the side (to create that offset) for the second shot. I even went to the extent of thinking of developing an attachment for my tripod to create a flat base for this movement! I always thought that research would sharpen the mind into thinking of new ways to deal with a subject, but it seems it didn't work out in my case at that instant. Instead of developing an attachment for my tripod, why couldn't I think of developing an attachment for my camera, so that I could totally eliminate the need to move it for the second shot? It didn't take much time for this idea to bloom in me, and I immediately rushed to the nearby glass vendor to prepare the arrangement.

It is simple (the figure is only for conceptual understanding). Like our eyes, I keep two plane glasses at an offset and direct light to the lens of my camera through a pair of 45 degree mirrors as shown in the figure. So on a single sensor I capture both images, each on one half of it.
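
On the software side the only extra step is to cut the captured frame into its two halves; a tiny sketch (assuming the left view lands on the left half of the sensor, which depends on how the mirrors are actually arranged) would be:

    import cv2

    # Split one frame from the mirror rig into a stereo pair (a sketch; which half
    # is "left", and whether a half needs flipping, depends on how the mirrors are
    # actually arranged).

    def split_stereo(frame):
        h, w = frame.shape[:2]
        left, right = frame[:, :w // 2], frame[:, w // 2:]
        # right = cv2.flip(right, 1)   # uncomment if the mirror geometry flips it
        return left, right

    # Example with a hypothetical file name:
    # left, right = split_stereo(cv2.imread("mirror_rig_shot.jpg"))
    # cv2.imwrite("left.jpg", left); cv2.imwrite("right.jpg", right)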

Given a camera and a requirement to take stereo images, this arrangement was anyone's mind game. I thought simple things like these need not be documented. A few months after this finding I saw a paper on exactly the same concept from a university in Switzerland. Do we really need PhDs to build this school-level optics? Now I know the reason behind India's very poor contribution to the world's papers and patents. Are we too fainéant to put our findings on paper, or do we think we are not up to the mark in creating new things compared to others? And that too when others are confident enough to publish such simple findings.

CHAK DE INDIA: FLAUNT YOUR FINDINGS

Saturday, August 11, 2007

Computer Vision (32): Monocular Cues

Under this topic, focus was the only cue I wanted to dwell on at length, since it is very much required for my techniques on depth perception. I will mention a few other cues here that, even though very obvious to anyone, are required for the completeness of the topic. Monocular cues are better understood from photographs, so here's one to explain all three that I will be mentioning:

Taking any one of the cars in this picture as reference, we can very easily guess the relative positions of the others in the image. This cue is called "Familiar size". It works not only for similar objects but for anything around you. Where this cue takes a beating, there is sometimes another that drops in to resolve the issue: "Interposition". On the left we have a red and a silver car projecting the same size even though the two are not at the same depth. How do I know? My brain tells me that some portion of the red car is occluded by the silver one, which means the latter must be in front of it.

Bringing in the rest of the image, you can see that the road appears to get narrower the farther it is from the camera. Taking this cue as reference, you can almost separate the different regions of this image into their depth categories. The small hilly region on the right is farther away than the lake on the left. The fountain is definitely closer to the camera than the lake, and so on. This is called "Linear Perspective": the convergence of parallel lines as they move away from you.
All these cues, supplemented with our knowledge, will always give us, if not accurate, at least hazy information about depth, even in a 2D scenario.

Thursday, July 26, 2007

Ideas and Technology: More Intelligent Alarms

From the time I started travelling by bus to office, I have been wasting a minimum of 2 hrs every day. Minimum, because the time it takes depends on various factors like the state of the traffic and the driver. Some of them make it in just 50 min, while some take me on a long 90 min journey. With more traffic it only gets worse. Reading is not something I can do, due to the low light in the evenings, so the best thing would be to take a nap. I have tried this option a lot of times, but my mind always tries to be over-careful so that I don't miss my stop. The result: I only end up resting my eyes. I could probably set an alarm for 50 min from the departure time, but on quite a few occasions it wakes me up midway, whenever a slow driver blends with bad traffic. Why can't alarms be more intelligent? How do I make sure my alarm goes off at almost the same time that my bus reaches a specific PLACE? There you are: my alarm should track location rather than time! A lot of people think of GPS whenever it comes to tracking location, but that would require a GPS receiver to be integrated into your most commonly used, all-in-one device: your mobile. If that is too complex, then forget it; I don't need it to be very accurate for this application. I am OK with my alarm waking me up on reaching the tower nearest to my house, which can be done with conventional mobiles since they know the identity of the tower they are connected to. I think if this were so simple, mobile companies would have implemented it way back, or probably someone already has and I am unaware of it? I found quite a bit of literature on this topic on the net, but most of it just assumes GPS when it comes to location tracking. Hopefully this is a much simpler way out. Well, for my problem at least!
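
Just to make the idea concrete, here is a bare-bones sketch of the loop I have in mind; get_current_cell_id() is a purely hypothetical placeholder, since real phones expose the serving tower's identity through platform-specific APIs:

    import time

    # The loop itself is trivial; get_current_cell_id() is a purely hypothetical
    # placeholder, since the serving tower's identity is exposed through
    # platform-specific phone APIs.

    HOME_CELL_IDS = {"404-45-1021-7733"}    # made-up IDs of towers near my stop
    POLL_SECONDS = 30

    def get_current_cell_id():
        """Hypothetical: return the ID of the tower the phone is connected to."""
        raise NotImplementedError

    def location_alarm():
        while True:
            if get_current_cell_id() in HOME_CELL_IDS:
                print("Wake up, your stop is near!")
                break
            time.sleep(POLL_SECONDS)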

Sunday, July 22, 2007

Computer Vision (31): "Seeing" through ears

Till a few days back even I wasn't aware of the existence of such creatures in Nature. I had not even thought of trying out something like this, even though I have spent years researching this field. Nature has again outwitted us in its design and complexity. I am actually talking about creatures having ears at a vertical offset to extract yet another dimension: depth, which our ears/brain fail to resolve through hearing. The Great Horned Owl (Bubo virginianus), the Barn Owl (Tyto alba) and the Barred Owl are some of Nature's select gifted creatures. This offset helps them home in on a creature with more sensitivity and lets them hunt even in complete darkness. With this ability they don't even spare creatures like mice that usually hide under snow and manage to escape their sight. Evolution has created wonders in Nature. These predators usually live in regions with long and dark winters and hence have developed the ability to "see" through their ears.

But how does it all work? With just a horizontal offset, our ears manage to tell us the direction of a sound in 3D space. Imagine it as an arrow being shot in that particular direction. You don't know the distance of the target; you just fire in that direction. The arrow actually leaves from a point which is the horizontal bisector of your ears. Applying the same concept to a vertical offset, there will be another arrow leaving from a point which is the vertical bisector of the ears (in the case of these specially gifted creatures). From primary school mathematics we all know that two non-parallel straight lines can only meet at one point in space, which in this case happens to be the target.
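
The geometry of those two "arrows" can be written down in a few lines; here is a sketch (mine, not a model of the owl's brain) that takes the two bearing rays and returns the point midway between their closest points, since in practice the two lines will never intersect exactly:

    import numpy as np

    # Two bearing "arrows", one from the midpoint of the horizontal ear pair and
    # one from the midpoint of the vertical ear pair. Real bearings are noisy and
    # the two lines rarely intersect exactly, so take the point midway between
    # their closest points as the estimated target.

    def closest_point_between_rays(p1, d1, p2, d2):
        d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
        a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
        w = p1 - p2
        denom = a * c - b * b                      # zero only for parallel rays
        t1 = (b * (d2 @ w) - c * (d1 @ w)) / denom
        t2 = (a * (d2 @ w) - b * (d1 @ w)) / denom
        return ((p1 + t1 * d1) + (p2 + t2 * d2)) / 2

    prey = np.array([2.0, 1.0, 3.0])       # hidden target (for the demo)
    p_h = np.array([0.0, 0.0, 0.0])        # midpoint of the horizontal ear pair
    p_v = np.array([0.0, 0.3, 0.0])        # midpoint of the vertical ear pair
    print(closest_point_between_rays(p_h, prey - p_h, p_v, prey - p_v))  # ~ [2 1 3]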
Even Nature can only produce best designs and not perfect ones and the Owls will definitely have to starve if their prey manages to remain silent. To make its design more reliable and worthy, Nature has never allowed a prey to have this very thought in its mind.

Saturday, July 21, 2007

Computer Vision (30): Why wasn't our face designed like this?

I have already touched upon the reasons behind having two sensors and their advantages (1). We now know why we were born with two ears, two nostrils and two eyes but only one mouth. What I failed to discuss at that point of time is their placement. I will concentrate more on the hearing and vision (placement of ears and eyes) which are better developed in electronics than the smell.
Our eyes, as any layman will be aware, are 2D sensors similar to the sensor present in a camera (not design-wise of course!). They capture the 2D projection of the 3D environment that we live in. In order to perceive depth (the 3rd dimension), our designers came up with the concept of two eyes (to determine depth through triangulation). For triangulation (2) to work, the eyes only needed to be at an offset; horizontal, vertical, crossed, anything would do. So probably the so-called creator of humans (GOD for some and Nature for others) decided to place them horizontally to give a more symmetric look with respect to our body. But why weren't they placed at the sides of our head in place of our ears? (Hmmm, then where would our ears sit?)
Light, as we know from our high school physics, does not bend around corners (I mean, a bulb in one room cannot light the one beside it), and from my earlier posts we know that to perceive depth our eyes need to capture some common region (the common region is where we perceive depth), so they have to be placed on the same plane and parallel to it. This plane happened to be the plane of our frontal face, and so our eyes came to the front, where they are today.
Let's move on to hearing now! When we hear a sound, our brain is able to determine its direction (listen to some stereo sound), but is not able to pinpoint the exact location (in terms of depth or distance from us) of the source. That is because our ears, unlike our eyes, are single-dimensional sensors able to detect only the intensity of sound (of course they can separate out the frequencies, but that is in no way related to the direction of the sound) at any point in time. In order to derive the direction from which it came, our creators/designers probably thought of reusing the same concept that they had come up with for our sight and so gave us two ears (to derive the direction of sound through the difference in timing when it arrives at each ear). To derive the direction, our ears only needed to be at an offset; horizontal, vertical or crossed, so to give a symmetric look they probably placed them at a horizontal offset. But why were they placed at the sides of our face and not at the top like a rabbit, dog or any other creature?
Again from high school physics we know that sound can bend around corners, pass through tiny gaps and spread out. So you can enjoy your music irrespective of where you are and where you are playing it in your house (well, if you can put up with its differing intensity). So our ears never demanded to be on the same plane and parallel to it! The common signals required to perceive direction would reach them anyway, irrespective of placement, since sound can bend. Secondly, our ears were required to isolate direction in the 360 deg space, unlike our eyes, which only project the frontal 180 deg. Probably the best place to keep them was at the sides of our face.
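
For the curious, the timing argument can be put into numbers with the usual textbook approximation (not a model of the auditory system): the far ear hears the sound later by roughly d*sin(theta)/c, which can be inverted to get the direction:

    import math

    # Textbook interaural-time-difference approximation: a sound from azimuth
    # theta reaches the far ear later by roughly d*sin(theta)/c, so the delay can
    # be inverted to recover the direction (but not the distance).

    SPEED_OF_SOUND = 343.0    # m/s
    EAR_SEPARATION = 0.20     # m, rough human head width (assumed)

    def azimuth_from_itd(dt_seconds):
        s = max(-1.0, min(1.0, SPEED_OF_SOUND * dt_seconds / EAR_SEPARATION))
        return math.degrees(math.asin(s))

    print(azimuth_from_itd(0.0))       # 0 deg, straight ahead
    print(azimuth_from_itd(0.0003))    # ~31 deg off to one side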
Our 2D vision was now capable of perceiving depth and our 1D hearing could locate the direction. Since our visual processing is one of the most complex design elements in our body and very well developed to perceive depth the designers never thought of giving any extra feature for any of our other sensors. But Nature has in fact produced creatures that have a different design to what we have, with ears at a vertical offset, on top of their head, etc, which I will be discussing in my next post.

References:
1. http://puneethbc.blogspot.com/2007/03/computer-vision-4.html
2. http://puneethbc.blogspot.com/2007/04/computer-vision-13.html

Wednesday, July 18, 2007

Photography: Effects of Aperture variation on Clarity

The image above is not related to the title, I will come to it at the end.
One thing I wanted to observe in the previous post's photos (min and max aperture) is that, apart from the increase in sharpness, the minimum aperture should give us truer colors and contrast, closer to what our eyes see. WHY? Applying the same "light cone" concept, the smaller the opening/aperture, the narrower the light cone, which means less mixing of the light from adjacent cones when they are collected at the sensor. The increase in DOF when the aperture is closed is in fact the result of the light cone getting narrower. Ideally, if the aperture were a single point, there would be no cone at all, since the base of the cone is now a point! This means only one ray (assuming light to be a ray for simplicity) from every point in space would pass through this "point aperture" to be captured on the sensor. If the sensor were infinite in resolution, every point in space would be represented by a pixel on the sensor, which would give you a truly represented image/projection of the surrounding space. Practically, since pixels are of finite dimension, the aperture need not be a single point, and keeping the aperture at its minimum should give you a more real picture. We can clearly observe this difference in the images below, at the central bottom portion of the clouds, where the light and dark regions are seen more clearly in the snap taken with the minimum aperture than in the one with the maximum aperture. But again, the factors behind the failure of my experiment mentioned in my earlier post might be responsible for this difference, so I will not conclude anything till some authentic and satisfactory experiments are made :(

A good article on DOF: http://en.wikipedia.org/wiki/Depth_of_field
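
Here is a toy simulation of that "light cone" argument (my sketch, treating defocus as a simple convolution with a disk whose width scales with the aperture, which ignores real lens behaviour): the wider the disk, the more adjacent cones mix and the more the local contrast is washed out:

    import numpy as np

    # Treat defocus as convolving a 1D "scene" with a uniform disk whose width
    # scales with the aperture opening: a wide aperture mixes light from adjacent
    # cones and washes out local contrast, a near-point aperture leaves the scene
    # almost untouched.

    scene = np.tile([0.2, 0.2, 0.9, 0.9], 25)     # alternating dark/bright bands

    def defocused(row, aperture_px):
        k = max(1, aperture_px)                   # blur width grows with aperture
        return np.convolve(row, np.ones(k) / k, mode="same")

    for ap in (1, 5, 21):
        out = defocused(scene, ap)
        print(f"aperture ~{ap:2d} px: remaining contrast = {out.max() - out.min():.3f}")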

I also have a few sample images from my multishot image merging software here: http://www.flickr.com/photos/57078108@N00/. I will be launching it as soon as the GUI gets ready!

Sunday, July 8, 2007

Photography: Effects of aperture variation on sharpness

Taken with f5.6

Taken with f36

Either my concept is junk or my experiment is. My understanding says that a wider aperture should give less sharpness in the image regions where planes other than the focus plane are projected onto the sensor, due to the wider light cone from these planes. When these cones meet the sensor they are more spread out compared to the cones from the same region when the aperture is smaller. But my experiments to prove this are telling a different story altogether.

To make the difference more apparent, I selected the maximum and minimum apertures available on my camera at the telephoto end and kept the focus plane fixed; still, I see the wider aperture giving better sharpness in all regions. Even so, at this point of time I lean towards my concept being true. My experiment turns out to be junk because, in order to verify the concept, the setup should have had no variables other than the aperture setting between the two shots. Here are the two things that changed between the shots unknowingly:
  1. I tried to shoot the clouds with the minimum and maximum apertures, and the shutter speeds I got were 1/80 and 1/4000 respectively. Even though the shots were taken only a few seconds apart, we can easily see that the clouds significantly changed their pattern in this time. This means there was probably more cloud movement in the small-aperture shot than in the wide one, which might have caused more blurring in the former.
  2. Both shots were taken handheld from the top of a pretty tall building under windy conditions, which might have caused more handshake, and in turn more blurring, in the small-aperture (slower shutter) shot than in the wide one.
I am not making excuses to back my concept; I did not retake the shots with a tripod because I wanted the more practical, day-to-day effect of these settings in photography. So even though conceptually a smaller aperture should give more sharpness (proof pending), during practical day-to-day photography it is better to go for a shutter speed high enough to keep any unwanted movement off the sensor.

Sunday, June 24, 2007

DSLR's and Photography

Photography has become a passion for a lot of people these days, with a good number of them going for DSLRs. With DSLRs comes the interest to shoot in manual mode. Manual shooting involves, as a basic step, adjusting the aperture, shutter speed and ISO settings of the camera to get a proper exposure. The camera has a pointer indicating the current exposure on a scale ranging from -2 through 0 to +2. If the needle points to -2, the photo is going to be underexposed; 0 is proper exposure and +2 is overexposure. This again depends on the kind of metering selected for light evaluation. Let me put down some points on the usage of each of these settings:
1. Aperture: The wider you open the aperture, the more blurred objects at depths away from the plane of focus will be. Closing the aperture makes objects over a wider range of distances from this plane appear to be in focus. This is because closing the aperture makes the light cone narrower.
2. Shutter Speed: This tells the camera how long the sensor needs to be exposed to light. The higher the shutter speed, the shorter the exposure. All other parameters being the same, the shutter speed should be lower under low light and higher when there is more light in the surroundings.
3. ISO: This tells the sensor how sensitive it has to be to light. The higher the value, the higher its sensitivity to light. So even under low-light conditions the shutter speed can be kept high by selecting a higher ISO. But this comes at a price: increasing the ISO causes random electrical activity (noise) in the sensor due to the shift away from its normal linear region of operation.
Most people feel it is enough to keep the exposure marker at 0, whether by adjusting the aperture, shutter speed or ISO, if they are not too keen on controlling the depth of field. Shutter speed plays an important role while capturing moving objects: if you want them to appear relatively static, choose a higher shutter speed. With respect to exposure, shutter speed gives an almost linear relationship; I mean the amount of light collected by the sensor in x sec will be almost half of the amount collected in 2x sec (unless of course the sensor is saturated). I don't want to talk about the ISO parameter simply because I started writing this article to explain the consequences of aperture variation on a photograph, so I would like to stick to that. I will come up with some sample images in my next post to explain the effects of aperture variation.
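
For completeness, here is the standard stop arithmetic behind the trade-off (plain exposure bookkeeping, nothing specific to any camera): exposure is proportional to shutter time times ISO divided by the square of the f-number, so different combinations giving the same product give the same exposure:

    import math

    # Exposure is proportional to shutter_time * ISO / f_number^2, so any two
    # settings with the same product give the same exposure.

    def exposure(shutter_s, f_number, iso):
        return shutter_s * iso / (f_number ** 2)

    base = exposure(1 / 125, 8.0, 100)

    # Open up two stops (f/8 to f/4): four times the light, so the shutter can be
    # four times faster for the same exposure.
    print(math.isclose(base, exposure(1 / 500, 4.0, 100)))   # True

    # Or keep f/8 at 1/500 and raise the ISO by two stops instead.
    print(math.isclose(base, exposure(1 / 500, 8.0, 400)))   # True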

Wednesday, June 20, 2007

Photography and Travel: Bhimeshwari and Shivanasamudra

Had been to Bhimeshwari and Shivanasamudra last weekend. The fishing camp and rafting are actually run by JLR, and they don't allow you inside their fenced campus without prior permission, which we didn't have. On the way to Shivanasamudra from Bhimeshwari we found the Cauvery flowing close to the road and so got down to take some snaps. I found this strange creature on one of the rocks and wondered why it had such strange looks! When I went to photograph it, it escaped into one of the bushes, and then I realised the importance of its dress as a disguise. WoW! How does nature correlate similar patterns in two totally different forms of life, one falling under the category of insects and the other being a plant? It's simply amazing!

Thursday, June 14, 2007

Photography, Programming and Algorithms: Merging Multishot Motion Images

I was busy working on a few algorithms these days, so I could not put up posts on a regular basis. So what was the algorithm about? Recently I was watching the movie Matrix on TV, especially the scene where he dodges the bullets. It was excellent! In this particular scene multiple images of him are seen at a single instant of time. I started thinking whether this was possible with a regular camera using its multishot capability or some long-exposure kind of stuff. I also searched for techniques currently available to create such an effect and came across one called the "stroboscopic technique", wherein flashes of light are used to illuminate a moving object against a dark background. But this cannot be used in our day-to-day life in a real environment or surrounding, and long exposures cannot create this sharp an effect.

So, at first I tried to capture my Matrix kind of motion using the multishot capability in DCs (3 fps on my Canon 350D) and merged the frames to get the above effect. It looked wonderful, but the drawback was that the camera was held static during the capture, so it did not require any special software to do the merge; I just used Matlab to get it done. But what I want is far more flexible. Assume I go to watch a motion sport; something like a 20m diving competition (in swimming). I would want to capture the motion of this person from the start to the end, till he drops into the water, and merge the complete set of images into a single one. Why would I want this? Simply because it is a motion sport, and to get its complete effect I need to capture a motion sequence of it; something like a video. But suppose I want a poster of his motion, or of some of his important moves along the dive path, in a single frame; today there is nothing I can do! Single picture frames don't give me the complete story, so I am not interested in them, and I can't get a poster from the video I have captured either.

Simple image averaging and differencing can create such effects if and only if the camera remained static during the capture. But I don't want to put the extra burden of carrying a tripod on the person who wants to capture such a shot. This means the software should be able to merge images with a little offset. I also can't expect the person to have very stable hands, which means the images would have undergone a little rotation as well. The software should take care of even this case :( Memory comes at a cost, and to create a single motion shot we would have captured tens of stills; tens of motion shots would require hundreds of stills, which will quickly fill up the memory. So I would want this to be an embedded-compatible, in-camera software, which means it should use very few resources (both memory and time).

I tried my best to come up with a software that closely matches the above requirements. I do not know what other requirements you people might have. If it is something that impresses me and is practical, I will definitely try to incorporate it in the coming versions (this one will be alpha, still!). The software will be out shortly for you to test and create some jazzy stuff. Multishots of your own stunts, your favorite sports, etc. Let me see how creative you people can get!
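
For the curious, here is a rough sketch of the kind of pipeline I mean (my own toy using OpenCV, not the actual software I will be releasing): align the burst to the first frame with ORB features and a partial affine fit, take the per-pixel median as the static background, and keep, for every pixel, the frame that differs most from that background so each copy of the moving subject survives:

    import cv2
    import numpy as np

    # Align a handheld burst to the first frame with ORB features and a partial
    # affine fit (small shifts and rotations), take the per-pixel median as the
    # static background, then keep for every pixel the frame that differs most
    # from that background, so each copy of the moving subject survives.

    def align_to_reference(ref_gray, img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        orb = cv2.ORB_create(1000)
        k1, d1 = orb.detectAndCompute(ref_gray, None)
        k2, d2 = orb.detectAndCompute(gray, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
        src = np.float32([k2[m.trainIdx].pt for m in matches])
        dst = np.float32([k1[m.queryIdx].pt for m in matches])
        M, _ = cv2.estimateAffinePartial2D(src, dst)
        return cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

    def merge_burst(paths):
        frames = [cv2.imread(p) for p in paths]
        ref_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
        aligned = [frames[0]] + [align_to_reference(ref_gray, f) for f in frames[1:]]
        stack = np.stack(aligned).astype(np.float32)          # (N, H, W, 3)
        background = np.median(stack, axis=0)
        pick = np.abs(stack - background).sum(axis=3).argmax(axis=0)   # (H, W)
        h, w = pick.shape
        out = stack[pick, np.arange(h)[:, None], np.arange(w)[None, :]]
        return out.astype(np.uint8)

    # Example with hypothetical file names:
    # cv2.imwrite("merged.jpg", merge_burst(["dive1.jpg", "dive2.jpg", "dive3.jpg"]))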

Tuesday, June 5, 2007

Computer Vision (29): Motion Segmentation

Motion segmentation is another concept that comes out of motion detection. As a newly born kid, all you see around you is colors; colors that make no sense to you, and you don't even know what colors they are. One way in which our brain can start segmenting objects is through stereo correspondence. But again, the process of stereo correspondence can be mechanical or knowledge based. If it is mechanical then we have to find out how it can be done (I will discuss this later); if it is knowledge based, our brain first has to learn how to correspond. So how does our brain start to segment objects? If you observe the gaze of newly born kids, they seem to be looking somewhere far off, which is the relaxed state of the eye. We need to interrupt the brain so that its visual system starts to concentrate on different things. This is why we get colorful toys that make interesting sounds and play with them in front of babies. Bright colors capture the sight of these kids and draw their attention. But if these objects are placed static, the interrupts stop, and so does the concentration. In order to keep up the interrupts and the concentration, you need to keep swaying the toy in front of them. This not only draws their attention but also helps them latch on to the object through motion segmentation. You can now see that their eyes are actually pointing at the object you are playing with. After repeating this procedure quite a few times, you will see that they fix their sight on the object even when it is static. They have now started to update their knowledge! This knowledge helps them segment objects from the background as the days pass by, and finally they start to grasp them. This is the onset of the perception of depth.

Wednesday, May 30, 2007

Photography and Travel: Ooty

I started riding on the high-class, highly tempting, luxurious Bangalore-Mysore highway last weekend. You cannot stop your fists from rolling the accelerator to the maximum extent possible. After reaching Mysore we decided to go to Ooty via the Bandipur forest range, which had received rain recently and was lush green all along till our destination. The cool weather along with lush green forests and misty mountains was simply mind blowing and an unforgettable experience. Check out the snaps here: http://puneethbc.myphotoalbum.com/view_album.php?set_albumName=album17

Thursday, May 24, 2007

Computer Vision (28): Motion Detection

I know the topic of focus was too much to digest while keeping up your concentration, so I decided to switch the topic a bit towards motion detection. I have covered less than half of the full focus story, so I will come back to it at a later time, when I explain one of my techniques to solve stereo correspondence.
Even though we have been able to build GHz processors and parallel computing systems, we are having a tough time matching the processing power of our brain. One reason for this is that our brain selectively processes only the required information, which we fail to do. On taking an image we do not know what region of the image has to be processed, and so we end up figuring out what each and every pixel present in a huge megapixel image could mean or form. But that's not what our brain does. It selectively puts its power only into those regions where it is required the most. For example, recognition is performed, as explained earlier, only in the region of the fovea. Our brain puts its concentration on the rest of the regions only when some event is detected. This event is motion. A lot of smaller creatures are specialized mainly in this kind of processing, which gives their even smaller brains the power of vision.
Motion detection is a concept that has been exploited in computer vision too. Then why are we still behind? When I say motion there are basically two kinds: motion caused by our visual system itself being in motion (which includes our body's motion) and motion in the surroundings. How do we differentiate between the two? Motion in the surroundings is always limited to a certain region in space, and motion detection in this small region raises an interrupt and draws the attention of our brain towards it. Motion detection is not the only thing that interrupts the brain; in fact the system that generates this interrupt doesn't even know what motion is! All it is concerned with is whether there was a change or not. So even a sudden flash of light can interrupt your brain, even though nothing is moving.
In order to detect this kind of a change in computer vision systems we try to diff the current frame with the previous one. Any change would reflect as an edge in the resultant image which would give us the location of the change. This detected change is not necessarily an interrupt to our system because we still spend time looking for this edge in the entire image. Our computer vision systems no doubt capture the surrounding in parallel at once but the processing still takes place serially, which forces it to take a back seat compared to our brain.
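
A minimal version of that frame-differencing loop looks like this (a sketch, assuming a webcam at index 0 and an arbitrary change threshold):

    import cv2

    # Minimal frame differencing: flag any region where enough pixels changed
    # between consecutive frames and report its bounding box, which acts as the
    # "interrupt" telling the rest of the pipeline where to look.

    cap = cv2.VideoCapture(0)             # assumes a webcam at index 0
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)                     # change since last frame
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        if cv2.countNonZero(mask) > 500:                   # crude "interrupt"
            x, y, w, h = cv2.boundingRect(cv2.findNonZero(mask))
            print(f"motion around x={x}, y={y}, w={w}, h={h}")
        prev = gray
    cap.release()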

Wednesday, May 16, 2007

Photography and Travel: Bellandur Lake

For people staying in Bangalore, if you do not have anything else to do, do pay a visit to this place during sunset and get some good snaps. This is a nice place to experiment with HDR photography. I could not get good HDR snaps, but managed to get a pretty okay long-exposure night shot of the reflection. Thanks to the calm waters that made it possible. You can also get a pretty good panoramic image. For people not interested in photography there is nothing else here, so stay at home.

Computer Vision (27), Optics and Photography

The concept of the cone changes slightly when light reflected from surfaces is taken into account. It is this light that we generally see/perceive in our surroundings, because objects reflect light rather than produce it. This reflection is not the same in all directions, and hence the circular cross section of the cone will not be of uniform intensity and frequency (color). When we perceive it as a point, it is the sum of all these different light rays that we are seeing.

If this sounds too complicated, just place a CD near you and try to observe a particular point where colors can be seen. From different viewpoints you will be able to see different colors. This means that the same point on the CD is diffracting different colors. So if the aperture is big enough to accommodate all these colors, the color of the actual point will be the addition of all of them. Defocusing this point would reveal the individual colors. One more example is the mirror, which I have already touched upon in my earlier posts. In the diagram shown above, the rectangle is the mirror and the circles are either you or your camera. Suppose you fix on a particular point on the mirror and move around it as shown in the figure; you will be able to see different objects at the same selected point on the mirror. The mirror is reflecting light from different objects from the same point on it, which you can capture by moving around.

For all you photographers out there, a bigger aperture might solve ISO problems, but depending on the aperture value you might end up getting a different color for the same pixel in your photograph. The color that your eyes see might not match the one that you get from a camera, even if you match the sensors exactly. This is because the aperture also plays a role in color reproduction! Ideally you don't need a lens at all if the aperture of your camera is a single point, letting just a single ray of light from every point in space around it reach the sensor. Why? You need a lens to see a point in space as a point in the image. The reason that is normally not possible without a lens is that the reflected light from objects is diverging. The lens does the job of converging these rays to a point, which is what focus is. When your aperture makes sure that only one ray is allowed from every point in space, there is no need to focus at all! A proper image of your surroundings can be formed on the sensor without the lens. But for this to happen, your sensor would in fact have to be very sensitive, to register these single rays as visible and differentiable values.
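
The "point aperture" case is just standard pinhole geometry; here is a tiny sketch (not tied to any real camera) of how each 3D point maps to exactly one sensor point by similar triangles:

    import numpy as np

    # Ideal pinhole at the origin, sensor a distance d behind it: every 3D point
    # maps to exactly one sensor point by similar triangles, so nothing needs to
    # be focused; only the sensitivity of the sensor is the (big) problem.

    d = 0.02   # pinhole-to-sensor distance in metres (assumed)

    def project(x, y, z):
        # z is the distance in front of the pinhole; the image is inverted
        return np.array([-d * x / z, -d * y / z])

    for point in ((0.5, 0.2, 2.0), (0.5, 0.2, 4.0), (-1.0, 0.0, 4.0)):
        print(point, "->", project(*point))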

Tuesday, May 15, 2007

Computer Vision (26) and Optics

Here's another set of images that demonstrate this crisscross nature of light cones. Here I placed the matchstick at the corner and blocked any chances of light crossing the stick and reaching the aperture of the lens. You can easily find the difference between the first and the third images. The missing sector of the circle has moved to the other side, from left top to bottom right.

Monday, May 14, 2007

Photography and Travel: Nagarhole

I had been to Nagarhole for two days. Sadly I found no wildlife to photograph, but could only freeze in time the wildness of some tamed creatures. It is a wonderful place near Hunsur and very close to Wayanad in Kerala. If you want to stay at one of the forest dormitories or cottages you will have to book well in advance at the forest department office in Bangalore, Hunsur or Mysore, or you can stay at one of the private lodges in Kutta, which is very close to Nagarhole. Buses are not very frequent, so a private vehicle would be the right choice. Two-wheelers are not allowed inside the forest gate. If you take a Qualis or a similar kind of vehicle you can go on a safari on your own with a guide; otherwise there are government Eichers and private jeeps that will do the job. The place is around a 5-6 hr journey from Bangalore, which is ideal for a weekend plan.

Friday, May 11, 2007

Computer Vision (25) and Optics

The light cone that I was describing till now will be observed when the actual focus point of the object lies beyond the sensor, i.e. the light rays from the object have still not converged when the plane of the sensor was encountered.

After the focus point is reached the rays crisscross and start diverging once again. Again this crisscrossing can be captured on the sensor by moving the focus point beyond the object.


The sequence of images below was taken by moving the focus point behind the object of interest; here, the LED.





In the first image of the sequence, the focus point was moved just behind the LED, and we see an image similar to when the focus point was placed between the matchstick and the LED. But now the rays have actually crisscrossed, which is not apparent here since the cone is symmetric. To demonstrate the crisscross nature, I placed an opaque object over the left half of the lens, which made the right semicircle of the circular projection of the cone disappear! To come back to our proper cone, I moved the focus point back to the matchstick and did the same experiment. Now covering the left portion of the lens masks the left semicircle of the LED! This means there is no crisscross!

Wednesday, May 9, 2007

Computer Vision (24) and Optics

Even though light travels in 3D space, a sensor represents it on a 2D surface. Effectively, what it captures is the state of light at a particular 2D plane, which depends on where the lens is focused. This is something unique: if you change the focus of your lens, the plane that you select to capture on your sensor changes automatically. Changing the plane means selecting a plane at a different distance from the lens. This is why focus, or accommodation, is said to give the depth of an object when it is focused on.

If you closely observe the sequence of three pictures I had in my earlier post, you will understand it easily. In the first image the focus point was at the matchstick, and the LED was at a distance behind it. The light rays diverging from this source, from the perspective of the aperture of the lens, form a 3D cone which is truncated at the matchstick. This is what gives you that circular patch. As I move the focus back, this circle gets smaller and its intensity increases. The light that is reflected and diverging from the matchstick is now captured at a different plane, which makes it blurred. Finally, when the focus point is moved to the plane of the LED, it is recovered completely, even though it was masked by the matchstick completely from the projection perspective of the camera. Due to the further increase in the distance of the focus point, the matchstick becomes even more blurred.
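
The shrinking circle can be put into rough numbers with the thin-lens equation and similar triangles (an estimate with made-up lens values, not a measurement from these shots):

    # Thin-lens estimate of the blur circle (made-up lens values, not measured
    # from these shots): with the lens focused at s_focus, a point light at
    # s_point images in front of or behind the sensor, and similar triangles give
    # the diameter of the circle it smears into.

    def blur_circle_mm(f_mm, aperture_mm, s_focus_mm, s_point_mm):
        v_focus = 1 / (1 / f_mm - 1 / s_focus_mm)   # sensor sits here
        v_point = 1 / (1 / f_mm - 1 / s_point_mm)   # where the point really images
        return aperture_mm * abs(v_point - v_focus) / v_point

    # 50 mm lens, 10 mm aperture, LED fixed at 2 m, focus plane stepped towards it:
    for s_focus in (500, 1000, 1500, 2000):
        c = blur_circle_mm(50, 10, s_focus, 2000)
        print(f"focused at {s_focus} mm -> LED blur circle ~ {c:.2f} mm")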

Tuesday, May 8, 2007

Computer Vision (23) and Optics

The best place to observe these things is in a mirror. You will be able to see any point around you at a specific place on the mirror by positioning yourself properly. This means that there are at least some rays from every point in space reaching the selected point on the mirror from where you are able to see that point in space.
I performed a series of experiments to understand focus and the behavior of light, which I will unravel here:
SECTION 1: The green light source was placed at a certain distance from the matchstick. Even though the matchstick had completely blocked the 2D projection of the light source, which was an LED, it is completely recovered when the focus point is shifted from the matchstick to the LED.



From the perspective of our eye or the camera, the light source forms a 3D cone, the apex of which is at the source itself and the base at the lens or our eye. This is the reason you see a larger circular patch of green light when the matchstick, which is at a distance from the LED, is focused. It is like truncating the 3D cone at a particular distance from its apex. Depending on the distance from the apex at which you truncate, you get circles of different diameters. The larger the diameter, the lower the intensity of the light, because the energy has now spread out.

If you take the focus point to the surface of the lens, you will see that the diameter of the circle will be the same as the diameter of the aperture of the lens.

Sunday, May 6, 2007

Computer Vision (22) and Optics

If I take a point source and place it in space, it will emit light spherically in all directions around it. You will be able to see the point only if rays from it reach your eyes, which means you can see a point source from any place around it. If you just had a sensor (retina) and not the lens in your eye, these rays, which are diverging and almost everywhere in space, would fall all over the retina to form an image which would be a uniform patch of light in your brain. The same applies to non-light sources as well. You will be able to see an object only if it is reflecting light in the direction you are looking from, and again an object can reflect light in almost any direction around it. Without the lens, the reflected light from many points around you can fall at the same place on the retina, as shown below.

The intensity and frequency of the reflected light from these various points can be different and hence get summed up at a point on the retina. This scenario can happen for every pixel on the sensor and hence the image that you will get will just be the summation of the intensities and frequencies of the rays coming out from various points around you. As a result of this you will always end up with a uniform patch of light on the sensor if you try to take an image without a lens.

If you didn’t have a lens in your eyes, you would only be able to know the amount of light present in the surrounding and not the objects present in front of you. The various objects wouldn’t be distinguishable at all.
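
A toy rendering of that argument (a sketch, not an optics simulation): every scene point illuminates every sensor pixel when there is no lens, so each pixel just sums the whole scene and all structure is lost:

    import numpy as np

    # A 1D "scene" of points with different brightness and a 1D "sensor" with no
    # lens in front of it: every scene point reaches every pixel, so each pixel
    # just sums the whole scene and all structure is lost.

    rng = np.random.default_rng(1)
    scene = rng.random(64)                               # brightness of 64 points

    no_lens = np.full(64, scene.sum())                   # every pixel sees it all
    with_lens = scene.copy()                             # ideal lens: one-to-one

    print("without a lens, pixel variation:", no_lens.std())   # 0.0, uniform patch
    print("with a lens, pixel variation:   ", with_lens.std()) # structure remains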

To see a point as a point, we need to converge the rays diverging from it back to a point. The lens does exactly this. Your brain sees the various objects around it as they are because your eye's lens converges the rays coming from them onto the retina.

Friday, April 27, 2007

Computer Vision (21) and Optics

Let's first understand what focus or accommodation is. We have been learning since our school days that light is a ray, a wave, a particle, that it has a dual nature, etc. And lately we also have the famous theory of light by Richard P. Feynman et al., that the dual nature of light can be explained by considering light as a particle having an instantaneous phase associated with it (which gives it the properties of a wave). This theory is called QED, Quantum Electrodynamics, and it promises to combine the particle and wave theories of light into one single framework that can explain almost all phenomena of light to the highest possible accuracy. Interested people can read his book "QED: The Strange Theory of Light and Matter", which contains his famous lectures.
But why am I saying all this? When I focus light, I am bringing light from a region together to a point, and to analyze this I need to decide on a theory. For now I will not get into the complexity of QED, but will try to justify my experiments with simple ray diagrams. I will come to QED after some more posts.
Imagine that we did not have an eyeball, or in other words the lens of our eyes, but just had the retina to capture the light from the surroundings. How would the surroundings appear to you? There is one way to experiment with this (of course, not by removing the lens of your eyes :)). If you have a webcam, just try removing its lens and switch it on, exposing the retina, sorry, the sensor, to the surroundings. What do you see?

Monday, April 23, 2007

Photography and Travel

Kumara Parvatha, also known as KP, is one of the tallest peaks in Karnataka. It is a real challenge to trek it in a day. The interesting part is that you can get down by a different route from the one you took to climb, which makes the trek even more interesting. This photo was taken at the top, which we reached a bit late in the morning (the day after we started to climb :( ). Nevertheless, it is heaven out there at any time. The place is situated near Subramanya. You can get more info on it here: http://www.kumaraparvathaconquered.blogspot.com

Saturday, April 21, 2007

Computer Vision (20)

Even though a lot of people believe that a stereogram is exactly equivalent to seeing with both eyes, there is one major difference. Stereograms are generally shot by moving the camera horizontally by a short distance (in the case of a single-camera system) or by keeping two cameras side by side, which captures the horizontal disparity. Suppose there are two infinitely long horizontal bars, one at a certain distance from the other (both horizontally and vertically) and nothing else around them, and you take a stereo image of this with the camera capturing the projection of their lengths; you will fail to capture the horizontal disparity, because there is none along this direction.

A camera takes the horizontal projection of objects (horizontal line pointing towards you), and so the distance between the horizontal bars along this direction cannot be shown in this 2D image. The vertical distance between them is ‘v’. In other words, if we try to capture this 3D setup in a stereo image pair to get the horizontal depth between the bars you will end up with exactly the same image in the left and right. There is no use seeing it stereoscopically, because, which point in the two images will the brain correspond? Since the camera is moved horizontally, the vertical distance between the two bars remains the same in the stereo image.

In a real scenario, how do our eyes and brain together manage to catch the right point? I mean, form a triangle and get the depth out of it. This is possible because, in addition to the 2D projection, our eyes collect one more parameter in real time: focus. Focus is exactly the same as the accommodation that I was describing under monocular cues. Our eye has to accommodate itself to focus (see sharply) objects at different depths. When an object at one depth is seen sharply, objects at other distances will be blurred, depending on the aperture of our eyes. This means that focus, or accommodation, is dependent on depth and unique for every distance from the eye. So the accommodation value actually gives the absolute depth of the object.
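
This is easy to see with the thin-lens equation (plain arithmetic with rough numbers for the eye, not a physiological model): if we know the focal length at which a point appears sharp on a sensor held at a fixed distance, we can invert the equation to recover the point's distance:

    # Thin-lens arithmetic with rough numbers for the eye (not a physiological
    # model): the sensor (retina) sits at a fixed distance, so the focal length at
    # which a point appears sharp pins down the point's absolute distance via
    # 1/f = 1/d_object + 1/d_image.

    SENSOR_DISTANCE_MM = 17.0     # rough lens-to-retina distance, held fixed

    def object_distance_mm(focal_length_mm):
        return 1 / (1 / focal_length_mm - 1 / SENSOR_DISTANCE_MM)

    for f in (16.95, 16.8, 16.5):        # different "accommodation" settings
        print(f"focal length {f} mm -> object at ~{object_distance_mm(f):.0f} mm")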

Photography

To give the readers a change from the technical stuff I had thought of posting some other things too, so here it is. This is a place called Devarayana Durga in Tumkur, around 70 km from Bangalore. It's a nice place to visit for a day, but it was already 4 when we left Bangalore, so we reached there right at sunset. We managed to capture the last few glimpses of the sun for that day. If you guys plan to go there, better leave a bit early.

Sunday, April 15, 2007

Computer Vision (19)

Disparity is a must to perceive depth in a stereo image pair, and so our brain needs at least two separated points with disparity to extract the distance between them. Disparity at a later stage uses triangulation to perceive depth, but this triangle depends on the separation between the images and not on the depth of the actual object. The image below illustrates triangulation from disparity when a stereogram is cross viewed.

The red lines are traced when the eyes combine the rectangles and the green lines when they combine the circles. The point of intersection of the red lines gives the 3D location of the rectangle, and that of the green lines the location of the circle. As mentioned earlier, the circle appears in front of the rectangle when cross viewed. One of the points needed to form the triangle comes from the point of intersection of either the red or the green lines, and the other two points are the two eyes. The distance of the point of intersection from the two eyes (d) depends on the separation between the images, so the absolute distance of the objects remains unknown in a stereo image pair. The relative depth of different objects from one another is obtained by corresponding objects from the two images, which moves the point of intersection of the lines according to the 3D placement of the objects (as with the red and green lines).

This is not the case when we extract depth from the actual 3D surroundings, because our eyes make use of triangulation from the convergence of the eyes and not just disparity. Our eyes help us perceive the absolute depth of our surroundings, while in stereograms we can only perceive the relative depth of one object from another.
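
For rectified cameras the textbook version of this is Z = f*B/d (the usual computer vision relation, not a claim about how the brain does it): with a known baseline B the disparity d gives absolute depth, while disparity alone only orders points in depth:

    # For two parallel, rectified cameras: depth Z = f * B / d, with focal length
    # f in pixels, baseline B and disparity d. A known baseline gives absolute
    # depth; disparity alone only orders points in depth.

    def depth_from_disparity(f_px, baseline_m, disparity_px):
        return f_px * baseline_m / disparity_px

    f_px, baseline = 800.0, 0.065        # baseline ~ human eye separation
    for d in (40.0, 20.0, 10.0):
        print(f"disparity {d:4.1f} px -> depth {depth_from_disparity(f_px, baseline, d):.2f} m")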

Saturday, April 14, 2007

Computer Vision (18)

Disparity is something that is required to perceive depth from 2D stereo image pairs. For example, to create a stereo image pair in a computer as shown below, I just placed a rectangle and a circle one beside the other in the left image, copied the same thing for the right image as well, and then increased the distance between the rectangle and the circle in the right image.

The 3D interpretation of it is as follows. The image seen below is the top view of the 3D space whose 2D projection is shown above. On cross viewing it, you would see the circle in front of the rectangle. Cross viewing a stereogram means, your left eye would see the image on the right and your right eye the one on the left.

The dotted lines are the angles of view of the eyes (not to scale). The blue lines are the projection lines of the objects onto the respective eyes. Since the eyes are placed at some distance from one another, the projection of objects in 3D space will always be different on the two eyes, except when the objects are on the vertical bisector. This difference in the projection lengths is what disparity is.
Disparity = length of the red line - length of the green line.
From the diagram it is clearly evident why the distance between the rectangle and the circle in the right image (red line) is kept greater than in the left image (green line) to recreate this 3D effect in the brain when viewed stereoscopically. Think about how the gap would have to be to see the circle behind the rectangle.
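
Here is a small script that reproduces exactly this construction (my sketch, using Pillow, with arbitrary sizes and offsets); cross-viewing the two saved images should place the circle in front of the rectangle:

    from PIL import Image, ImageDraw

    # Draw the same rectangle in both images but shift the circle a little more in
    # the right image; cross-viewing the saved pair should place the circle in
    # front of the rectangle, as described above. Sizes and offsets are arbitrary.

    W, H = 300, 200

    def make_frame(circle_x):
        img = Image.new("RGB", (W, H), "white")
        d = ImageDraw.Draw(img)
        d.rectangle([60, 70, 120, 130], fill="black")             # same in both
        d.ellipse([circle_x, 80, circle_x + 40, 120], fill="red") # shifted per image
        return img

    make_frame(circle_x=180).save("stereo_left.png")
    make_frame(circle_x=200).save("stereo_right.png")   # bigger gap on the right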