Wednesday, May 30, 2007

Photography and Travel: Ooty

I started riding on the high-class, highly tempting, luxurious Bangalore-Mysore highway last weekend. You cannot stop your fists from rolling the throttle to the maximum extent possible. After reaching Mysore we decided to go to Ooty via the Bandipur forest range, which had received rain recently and was lush green all along till our destination. The cool weather, along with lush green forests and misty mountains, was simply mind-blowing and an unforgettable experience. Check out the snaps here: http://puneethbc.myphotoalbum.com/view_album.php?set_albumName=album17

Thursday, May 24, 2007

Computer Vision (28): Motion Detection

I know the topic of focus was a lot to digest and hard to keep your concentration on, so I have decided to switch topics a bit towards motion detection. I have covered less than half of the full focus story, so I will come back to it at a later time, when I will also explain one of my techniques for solving stereo correspondence.
Even though we have been able to build GHz processors and parallel computing systems, we are having a tough time matching the processing power of our brain. One reason for this is that our brain selectively processes only the required information, which we fail to do. On taking an image we do not know which region of the image has to be processed, and so we end up figuring out what each and every pixel in a huge megapixel image can mean or form. But that's not what our brain does. It selectively puts its power into only those regions where it is required the most. For example, recognition is performed, as explained earlier, only in the region of the fovea. Our brain will put its concentration on the rest of the regions only when some event is detected. This event is motion. A lot of smaller creatures are specialized mainly in this kind of processing, which gives their even smaller brains the power of vision.
Motion detection is a concept that has been exploited in computer vision as well. Then why are we still behind? When I say motion there are basically two things: motion caused by our visual system itself being in motion (which includes our body motion), and motion in the surroundings. How do we differentiate between the two? Motion in the surroundings is always limited to a certain region in space, and detection of motion in this small region raises an interrupt and draws the attention of our brain towards it. Motion detection is not the only thing that interrupts the brain; in fact the system that generates this interrupt doesn't even know what motion is! All it is concerned with is whether there was a change or not. So even a sudden flash of light can interrupt your brain, even though it is not moving.
In order to detect this kind of a change in computer vision systems, we diff the current frame with the previous one. Any change shows up as an edge in the resulting image, which gives us the location of the change. This detected change is not really an interrupt to our system, because we still spend time looking for this edge in the entire image. Our computer vision systems no doubt capture the surroundings in parallel, all at once, but the processing still takes place serially, which forces them to take a back seat compared to our brain.
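If you want to try this out yourself, here is a minimal frame-differencing sketch in Python using OpenCV; the camera index, the threshold of 25 and the 500-pixel count are just values I have assumed for illustration, not anything prescribed:

import cv2

# Open the default camera; index 0 is an assumption -- use a file path for a video instead.
cap = cv2.VideoCapture(0)

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Difference between the current and previous frame: any change
    # (motion, a sudden flash of light, ...) shows up as bright edges.
    diff = cv2.absdiff(gray, prev_gray)

    # Keep only significant changes (threshold of 25 is an arbitrary choice).
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # A cheap "interrupt": if enough pixels changed, report the changed region
    # instead of processing the whole frame.
    if cv2.countNonZero(mask) > 500:
        x, y, w, h = cv2.boundingRect(cv2.findNonZero(mask))
        print("change detected around region:", (x, y, w, h))

    prev_gray = gray

cap.release()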

Wednesday, May 16, 2007

Photography and Travel: Bellandur Lake

For people staying in Bangalore, if you do not have anything else to do, do pay a visit to this place during sunset and get some good snaps. This is a nice place to experiment with HDR photography. I could not get good HDR snaps, but managed to get a pretty okay long-exposure night shot of the reflection. Thanks to the calm waters that made it possible. You can also get a pretty good panoramic image. For people not interested in photography there is nothing else here, so stay at home.

Computer Vision (27), Optics and Photography

The concept of the cone changes slightly when light reflected from surfaces is taken into account. It is this light that we generally see and perceive in our surroundings, because most objects reflect light rather than produce it. This reflection is not the same in all directions, and hence the circular cross section of the cone will not be of uniform intensity and frequency (color). When we perceive it as a point, what we are seeing is the sum of all these different light rays.

If this sounds too complicated, just place a CD near you and observe a particular point where colors can be seen. From different viewpoints you will be able to see different colors. This means that the same point on the CD is diffracting different colors. So if the aperture is big enough to accommodate all these colors, the color of the actual point will be the sum of all of them. Defocusing this point would reveal all the individual colors. One more example is the mirror, which I have already touched upon in my earlier posts. In the diagram shown above, the rectangle is the mirror and the circles are either you or your camera. If you fix a particular point on the mirror and move around it as shown in the figure, you will be able to see different objects at the same selected point on the mirror. The mirror is reflecting light from different objects from the same point on it, which you are able to capture by moving around.

For all you photographers out there, a bigger aperture might solve ISO problems, but depending on the aperture value you might end up getting a different color for the same pixel in your photograph. The color that your eyes see might not match the one that you get from a camera, even if you match the sensors exactly. This is because aperture also plays a role in color reproduction! Ideally you wouldn't need a lens if the aperture of your camera were a single point, letting just a single ray of light from every point in space around it reach the sensor. Why do you need a lens at all? To see a point in space as a point in the image. The reason that is normally not possible without a lens is that the light reflected from objects is diverging. The lens does the job of converging these rays back to a point, which is what focus is. When your aperture makes sure that only one ray is allowed from every point in space, there is no need to focus it! A proper image of your surroundings can be formed on the sensor without the lens. But for this to happen, your sensor would in fact have to be very sensitive, to register these single rays as visible and distinguishable values.
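That single-point aperture is just the classic pinhole camera. As a rough sketch of the idea (the sensor distance and point coordinates below are made-up numbers, not from any real setup), each scene point maps to exactly one position on the sensor by similar triangles, with no lens involved:

import numpy as np

def pinhole_project(points_xyz, sensor_distance):
    # Project 3D points (camera coordinates, Z > 0) through a pinhole at the
    # origin onto a sensor plane placed sensor_distance behind it. Only one
    # ray per scene point gets through, so no lens is needed to form an image.
    X, Y, Z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    # Similar triangles: the ray through the pinhole lands at (-d*X/Z, -d*Y/Z);
    # the minus sign is the familiar inverted image.
    u = -sensor_distance * X / Z
    v = -sensor_distance * Y / Z
    return np.stack([u, v], axis=1)

# Two made-up points in the same direction but at different depths (metres).
points = np.array([[0.1, 0.2, 1.0],
                   [0.1, 0.2, 2.0]])
print(pinhole_project(points, sensor_distance=0.05))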

Tuesday, May 15, 2007

Computer Vision (26) and Optics

Here's another set of images that demonstrates the crisscross nature of light cones. Here I placed the matchstick at the corner and blocked any chance of light crossing the stick and reaching the aperture of the lens. You can easily see the difference between the first and the third images. The missing sector of the circle has moved to the other side, from top left to bottom right.

Monday, May 14, 2007

Photography and Travel: Nagarhole

I had been to Nagarhole for two days. Sadly I found no wildlife to photograph, but could only freeze in time the wildness of some tamed creatures. It is a wonderful place near Hunsur and very close to Wayanad in Kerala. If you want to stay at one of the forest dormitories or cottages you will have to book well in advance at the forest department office in Bangalore, Hunsur or Mysore, or you can stay at one of the private lodges in Kutta, which is very close to Nagarhole. Buses are not very frequent, so a private vehicle would be the right choice. Two-wheelers are not allowed inside the forest gate. If you take a Qualis or a similar kind of vehicle you can go on a safari on your own with a guide; otherwise there are government Eichers and private jeeps that will do the job. This place is around a 5-6 hour journey from Bangalore, which is ideal for a weekend plan.

Friday, May 11, 2007

Computer Vision (25) and Optics

The light cone that I have been describing till now will be observed when the actual focus point of the object lies beyond the sensor, i.e. the light rays from the object have still not converged when they reach the plane of the sensor.

After the focus point is reached, the rays crisscross and start diverging once again. This crisscrossing too can be captured on the sensor, by moving the focus point beyond the object.


The sequence of images below was taken by moving the focus point behind the object of interest; here, the LED.





In the first image of the sequence, the focus point was moved just behind the LED, and we see an image similar to the one where the focus point was placed between the matchstick and the LED. But now the rays have actually crisscrossed, which is not apparent here since the cone is symmetric. To demonstrate the crisscross nature, I placed an opaque object over the left half of the lens, which made the right semicircle of the circular projection of the cone disappear! To come back to our proper cone, I moved the focus point back to the matchstick and repeated the experiment. Now covering the left portion of the lens masks the left semicircle of the LED! This means there is no crisscross!
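A toy thin-lens ray trace makes this flip easy to see numerically. All the numbers below are assumptions I picked for illustration; the point is only that a ray entering through the left half of the lens lands on the left side of the blur circle before the convergence point, and on the right side after it:

# A toy thin-lens ray trace; all numbers are assumed, in millimetres.
f = 50.0                          # focal length
u = 200.0                         # distance of the LED from the lens
v = 1.0 / (1.0 / f - 1.0 / u)     # where its rays converge behind the lens

def landing_height(lens_height, sensor_distance):
    # A ray from the on-axis LED entering the lens at height lens_height is
    # bent towards the convergence point (v, 0); this is where it meets a
    # sensor placed sensor_distance behind the lens.
    slope = -lens_height / v
    return lens_height + slope * sensor_distance

for s in (0.8 * v, 1.2 * v):      # sensor before and after the convergence point
    left = landing_height(-10.0, s)    # ray through the left half of the lens
    right = landing_height(+10.0, s)   # ray through the right half of the lens
    print(f"sensor at {s:6.1f} mm: left-half ray lands at {left:+.2f} mm, "
          f"right-half ray at {right:+.2f} mm")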

Wednesday, May 9, 2007

Computer Vision (24) and Optics

Even though light travels in 3D space, a sensor represents it on a 2D surface. Effectively, what it captures is the state of light at a particular 2D plane, which depends on where the lens is focused. This is something unique: if you change the focus of your lens, the plane that you are selecting to capture on your sensor changes automatically. Changing the plane means selecting a plane at a different distance from the lens. This is why focus, or accommodation, is said to give the depth of an object when it is focused on.

If you closely observe the sequence of three pictures in my earlier post you will understand this easily. In the first image the focus point was at the matchstick, and the LED was at a distance behind it. The light rays diverging from this source, from the perspective of the aperture of the lens, form a 3D cone which is truncated at the matchstick. This is what gives you that circular patch. As I move the focus back, this circle gets smaller and the intensity increases. The light that is reflected and diverging from the matchstick is now captured at a different plane, which makes it blurred. Finally, when the focus point is moved to the plane of the LED, it is recovered completely, even though it was masked completely by the matchstick from the projection perspective of the camera. As the distance of the focus point increases further, the matchstick becomes even more blurred.
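The standard thin lens relation, 1/f = 1/u + 1/v, is what makes "focus gives depth" concrete: if you know the focal length f and the lens-to-sensor distance v at which an object snaps into focus, its depth u follows. A tiny sketch, with numbers I have assumed purely for illustration:

# Depth from focus using the thin lens equation 1/f = 1/u + 1/v
# (f: focal length, v: lens-to-sensor distance at sharp focus, u: object depth).
# The numbers below are assumptions, in millimetres.
def depth_from_focus(f, v):
    return 1.0 / (1.0 / f - 1.0 / v)

print(depth_from_focus(f=50.0, v=52.0))   # -> 1300.0, i.e. the object is about 1.3 m away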

Tuesday, May 8, 2007

Computer Vision (23) and Optics

The best place to observe these things is in a mirror. You will be able to see any point around you at a specific place on the mirror by positioning yourself properly. This means that there are at least some rays from every point in space reaching the selected point on the mirror from where you are able to see that point in space.
I performed a series of experiments to understand focus and the behavior of light, which I will unravel here:
SECTION 1: The green light source was placed at a certain distance from the matchstick. Even though the matchstick had completely blocked the 2D projection of the light source, which was an LED, the LED is completely recovered when the focus point is shifted from the matchstick to the LED.



From the perspective of our eye or the camera, the light source forms a 3D cone, the apex of which is at the source itself and the base at the lens or our eye. This is the reason you see a larger circular patch of green light when the matchstick, which is at a distance from the LED, is in focus. It is like truncating the 3D cone at a particular distance from its apex. Depending on how far from the apex you truncate it, you will get circles of different diameters. The larger the diameter, the lesser the intensity of the light, because the energy has now spread out.

If you take the focus point to the surface of the lens, you will see that the diameter of the circle will be the same as the diameter of the aperture of the lens.
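The truncation is just similar triangles: the cone's diameter grows linearly from zero at the LED to the full aperture diameter at the lens, and the intensity falls off with the area of the circle. A small sketch, with distances I have assumed for illustration:

# Cross-section of the light cone from the LED, by similar triangles.
# The apex of the cone is at the LED, the base is the lens aperture.
# The aperture diameter and LED-to-lens distance are assumed values (mm).
def cone_diameter(aperture_diameter, led_to_lens, dist_from_led):
    return aperture_diameter * dist_from_led / led_to_lens

A, D = 30.0, 600.0                 # aperture diameter, LED-to-lens distance
for x in (150.0, 300.0, 600.0):    # truncation plane moved from the LED towards the lens
    d = cone_diameter(A, D, x)
    # The total energy in the circle is fixed, so intensity ~ 1 / area.
    print(f"{x:5.0f} mm from the LED: circle diameter {d:4.1f} mm, "
          f"relative intensity {1.0 / d ** 2:.5f}")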

Sunday, May 6, 2007

Computer Vision (22) and Optics

If I take a point source and place it in space, it emits light spherically in all directions around it. You will be able to see that point only if the rays from it reach your eyes. This means that you will be able to see a point source from any place around it. If you just had a sensor (retina) and not the lens in your eye, these rays, which are diverging and present almost everywhere in space, would fall all over the retina to form an image which would be a uniform light patch in your brain. The same applies to objects that are not light sources as well. You will be able to see an object only if it is reflecting light in the direction you are looking from. Again, an object can reflect light in almost any direction around it. Without the lens, the reflected light from many points around you can fall at the same place on the retina, as shown below.

The intensity and frequency of the reflected light from these various points can be different, and they get summed up at a point on the retina. This happens for every pixel on the sensor, and hence the image that you get is just the summation of the intensities and frequencies of the rays coming from the various points around you. As a result, you will always end up with a uniform patch of light on the sensor if you try to take an image without a lens.

If you didn't have a lens in your eyes, you would only be able to know the amount of light present in the surroundings, and not the objects present in front of you. The various objects wouldn't be distinguishable at all.
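A quick way to convince yourself of this numerically is to simulate a lensless sensor where every pixel collects light from every scene point. The scene below is made up, but the outcome is the same for any scene: all the detail collapses into one uniform value.

import numpy as np

# A made-up "scene": a bright square on a dark background.
scene = np.zeros((100, 100))
scene[40:60, 40:60] = 1.0

# Without a lens, every sensor pixel receives light from every scene point,
# so every pixel records (roughly) the same summed value -- a uniform patch.
without_lens = np.full_like(scene, scene.sum())

print("scene detail (std of pixel values):   ", scene.std())
print("lensless image detail (std of values):", without_lens.std())   # 0.0 -- no detail left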

To see a point as a point, we need to converge the rays that are diverging from it back to a point. The lens does exactly this. Your brain sees the various objects around it as they are because your eye's lens converges the rays coming from them onto the retina.