Depth perception through stereo imaging: April 2007

Friday, April 27, 2007

Computer Vision (21) and Optics

Let's first understand what focus, or accommodation, is. We have been learning since our school days that light is a ray, a wave, a particle, that it has a dual nature, and so on. Lately we have also come to know the famous theory of light by Richard P Feynman et al, in which the dual nature of light is explained by treating light as a particle with an instantaneous phase (which gives it the properties of a wave) associated with it. This theory is called QED, quantum electrodynamics, and it combines the particle and wave theories of light into a single framework that can explain almost all phenomena of light to very high accuracy. Interested people can read his own book, "QED: The Strange Theory of Light and Matter", which is based on his famous lectures.

But why am I saying all this? When I focus light, I am bringing together light from a region to a point, and to analyze this I need to decide on a theory. For now I will not get into the complex QED, but will try to justify my experiments with simple ray diagrams. I will come to QED after some more posts.

Imagine that we did not have an eyeball, or in other words the lens of our eye, but just had the retina to capture the light from the surroundings. How would the surroundings appear to you? There is one way to try this out (of course not by removing the lens of your eyes :)). If you have a webcam, just try removing its lens and switching it on, exposing the retina, sorry, the sensor, to the surroundings. What do you see?

Monday, April 23, 2007

Photography and Travel

Kumara Parvatha, also known as KP, is one of the tallest peaks in Karnataka. It is a real challenge to trek this in a day. The interesting part is that you can come down by a different route from the one you took to climb, which makes the trek even more interesting. This photo was taken at the top, which we reached a bit late in the morning (on the day after we started the climb :( ). Nevertheless, it is heaven out there at any time. The place is situated near Subramanya. You can get more info on it here: http://www.kumaraparvathaconquered.blogspot.com

Saturday, April 21, 2007

Computer Vision (20)

Even though a lot of people believe that a stereogram is exactly equivalent to seeing with both eyes, there is one major difference. Stereograms are generally shot by moving the camera horizontally by a short distance (in the case of a single camera system) or by keeping two cameras side by side, both of which capture the horizontal disparity. Suppose there are two infinitely long horizontal bars, one at a certain distance from the other (both horizontally and vertically) with nothing else around them, and you take a stereo image of this, with the camera taking the projection of their lengths. You will fail to capture the horizontal disparity, because there is none in this direction.

A camera takes the horizontal projection of objects (horizontal line pointing towards you), and so the distance between the horizontal bars along this direction cannot be shown in this 2D image. The vertical distance between them is ‘v’. In other words, if we try to capture this 3D setup in a stereo image pair to get the horizontal depth between the bars, we will end up with exactly the same image in the left and the right. There is no use seeing it stereoscopically, because which point in one image should the brain match with which point in the other? Since the camera is moved horizontally, the vertical distance between the two bars remains the same in both images of the stereo pair.

In a real scenario, how do our eyes and brain together manage to catch the right point? I mean, form a triangle and get the depth out of it. This is possible because, in addition to the 2D projection, our eyes collect one more parameter in real time: focus. Focus is exactly the same as the accommodation that I described under monocular cues. Our eye has to accommodate itself to focus on (see sharply) objects at different depths. When an object at one depth is seen sharply, objects at other distances will be blurred to a degree that depends on the aperture of our eyes. This means that focus, or accommodation, depends on depth and is unique for every distance from the eye. So the accommodation value would actually give the absolute depth of the object.
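
As a rough illustration of why a focus setting maps to exactly one depth, the eye can be approximated with the thin-lens equation 1/f = 1/u + 1/v, where u is the object distance and v the lens-to-retina distance. This is only a sketch under assumptions that are not from the post: a single thin lens, a fixed image distance of 17 mm, and hypothetical function names.

```python
def focal_length_for_depth(object_dist_mm, image_dist_mm=17.0):
    """Focal length the lens must take to focus an object at object_dist_mm.

    image_dist_mm (lens to retina) is an assumed, fixed value.
    """
    return 1.0 / (1.0 / object_dist_mm + 1.0 / image_dist_mm)


def depth_from_focal_length(focal_mm, image_dist_mm=17.0):
    """Invert the thin-lens equation: the one depth that is in sharp focus."""
    inverse_depth = 1.0 / focal_mm - 1.0 / image_dist_mm
    if inverse_depth <= 0:
        raise ValueError("lens is focused at or beyond infinity")
    return 1.0 / inverse_depth


# Each depth needs its own focal length, and that focal length
# leads back to the same depth.
for depth in (250.0, 1000.0, 5000.0):
    f = focal_length_for_depth(depth)
    recovered = depth_from_focal_length(f)
    print(f"object at {depth:7.1f} mm -> f = {f:.3f} mm -> depth {recovered:.1f} mm")
```

In this simplified model the mapping between focal length and depth is one-to-one, which is the property the paragraph above relies on.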

Photography

To give the readers a change from the technical stuff I had thought of posting some other things also, so here it is. This is a place called Devarayana Durga in Tumkur, around 70 km from Bangalore. It's a nice place to visit for a day, but it was already 4 when we left Bangalore, so we reached there right at sunset. I managed to capture the last few glimpses of the sun for that day. If you guys plan to go there, better leave a bit early.

Sunday, April 15, 2007

Computer Vision (19)

Disparity is a must to perceive depth in a stereo image pair, and so our brain needs at least two separated points with disparity to extract the distance between them. Disparity at a later stage would use triangulation to perceive depth, but this triangle would depend on the separation between the images and not the depth of the actual object. The image below illustrates triangulation from disparity when a stereogram is cross viewed.

The red lines are traced when the eyes combine the rectangle and the green lines when they combine the circle. The point of intersection of the red lines gives the 3D location of the rectangle and that of the green lines gives the location of the circle. As mentioned earlier, the circle is in front of the rectangle when cross viewed. One of the points for the formation of the triangle comes from the point of intersection of either the red or the green lines, and the other two points are the two eyes. The distance of the point of intersection of the lines from the two eyes (d) depends on the separation between the images, so the absolute distance of the objects remains unknown in the stereo image pair. The relative depth of different objects from one another is obtained by matching objects from the two images, which moves the point of intersection of the lines according to the 3D placement of the objects (similar to the red and green lines).
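
The dependence of d on the image separation can be sketched with similar triangles. The setup here is an assumption for illustration and not taken from the diagram: the eyes are `eye_sep` apart, the stereogram lies on a flat screen at `screen_dist`, and `image_sep` is the horizontal distance on the screen between the two copies of one object (one copy in each image). With crossed viewing the two lines of sight then meet at `screen_dist * eye_sep / (eye_sep + image_sep)` in front of the eyes.

```python
def crossing_distance(eye_sep, screen_dist, image_sep):
    """Distance from the eyes at which crossed lines of sight intersect.

    All arguments share one unit (e.g. cm). image_sep is the on-screen
    distance between the left-image and right-image copies of one object.
    """
    return screen_dist * eye_sep / (eye_sep + image_sep)


eye_sep, screen_dist = 6.5, 50.0  # assumed values in cm

# Copies of the circle are further apart than copies of the rectangle,
# so the circle's lines cross nearer to the eyes.
d_rect = crossing_distance(eye_sep, screen_dist, image_sep=8.0)
d_circle = crossing_distance(eye_sep, screen_dist, image_sep=9.0)
print(f"rectangle at {d_rect:.2f} cm, circle at {d_circle:.2f} cm")

# Sliding the two images 2 cm further apart changes both absolute
# distances, while the circle still stays in front of the rectangle.
d_rect_moved = crossing_distance(eye_sep, screen_dist, image_sep=10.0)
d_circle_moved = crossing_distance(eye_sep, screen_dist, image_sep=11.0)
print(f"after moving: rectangle {d_rect_moved:.2f} cm, circle {d_circle_moved:.2f} cm")
```

The ordering of the objects survives the change in image separation but the absolute distances do not, which matches the point made above.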

This is not the case when we extract depth from the actual 3D surroundings, because our eyes make use of triangulation from the convergence of the eyes and not from disparity. Our eyes help us perceive the absolute depth of our surroundings, while in stereograms we can only perceive the relative depth of one object from another.

Saturday, April 14, 2007

Computer Vision (18)

Disparity is something that is required to perceive depth from 2D stereo image pairs. For example, to create a stereo image pair in a computer as shown below, I just placed a rectangle and a circle one beside the other in the left image, copied the same thing for the right image as well, and then increased the distance between the rectangle and the circle in the right image.

The 3D interpretation of it is as follows. The image seen below is the top view of the 3D space whose 2D projection is shown above. On cross viewing it, you would see the circle in front of the rectangle. Cross viewing a stereogram means, your left eye would see the image on the right and your right eye the one on the left.

The dotted lines are the angle of view of the eyes (not to scale). The blue lines are the projection lines of the objects on the respective eyes. Since the eyes are placed at some distance from one another the projection of the objects in 3D space will always be different on both the eyes, except when the objects are on the vertical bisector. This difference in the projection lengths is what disparity is.

Disparity = length of the red line - length of the green line.

From the diagram it is clearly evident why the distance between the rectangle and the circle in the right image (red line) is kept greater than in the left image (green line) to recreate this 3D effect in the brain when viewed stereoscopically. Think about what the gap should be to view the circle behind the rectangle.
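
The disparity formula can be tried on numbers. This is a minimal sketch that assumes the horizontal positions of the rectangle and the circle (in pixels) are already known in both images. The class, the function names and the pixel values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class StereoPair:
    """Horizontal centre positions (pixels) of the two objects in each image."""
    rect_x_left: float
    circle_x_left: float
    rect_x_right: float
    circle_x_right: float


def gap(rect_x, circle_x):
    """Distance between the rectangle and the circle within one image."""
    return abs(circle_x - rect_x)


def disparity(pair):
    """Gap in the right image (red line) minus gap in the left image (green line)."""
    red = gap(pair.rect_x_right, pair.circle_x_right)
    green = gap(pair.rect_x_left, pair.circle_x_left)
    return red - green


# The right image is a copy of the left one with the circle pushed 10 px away.
pair = StereoPair(rect_x_left=40, circle_x_left=100,
                  rect_x_right=40, circle_x_right=110)
print(disparity(pair))  # 10
```

A pair with identical gaps gives a disparity of zero, in which case both objects are perceived at the same depth.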

Friday, April 13, 2007

Computer Vision (17)

Of all the monocular cues that were mentioned in my earlier post, there are just two that interest me a lot: motion parallax and accommodation. Motion parallax, as explained in the links I had referred to, can easily be observed when you are traveling (say in a train). Objects that are closer to you appear to move faster than objects that are farther away. If you have understood my post on triangulation, motion parallax is no new concept!

Motion parallax, which is a monocular cue, is conceptually similar to stereovision, which is a binocular cue, in the sense that both are perceived due to disparity. In the case of motion parallax, to perceive depth along a particular direction you have to move parallel to it. When you are moving in a train, you only capture horizontal disparity between the objects, in the same way as in stereovision we perceive disparity in the direction parallel to the line along which our eyes are placed at that point of time. The first image below is a stereo image pair made into a gif and the second shows motion parallax.

A good animation on motion parallax: http://psych.hanover.edu/KRANTZ/MotionParallax/MotionParallax.html

Motion parallax is in fact mimicking stereovision but at two different instants of time. Imagine that instead of me, I placed a video camera and shot my train journey. If I extracted any two consecutive frames from it I would have got a stereo image pair, one taken after a small delay delta compared to the other. In the case of our eye these two images are captured at the same instant of time, while in the motion parallax case it is equivalent to moving the camera to the second eye’s place to capture the second of the stereo image pair. So, when disparity can solve for depth between a stereoimage pair, why not in case of motion parallax?
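
To make the "two frames as a stereo pair" idea concrete, here is a small sketch. It assumes a simple pinhole camera looking sideways out of the train, a constant speed, and a focal length expressed in pixels. None of these values or names come from the post. The distance travelled between two frames plays the role of the distance between the eyes.

```python
def image_shift(focal_px, speed, dt, depth):
    """Pixels an object at `depth` moves between two frames taken `dt` apart."""
    baseline = speed * dt  # distance the camera moved between the frames
    return focal_px * baseline / depth


def depth_from_shift(focal_px, speed, dt, shift_px):
    """Invert image_shift: recover depth from the measured shift."""
    return focal_px * speed * dt / shift_px


focal_px, speed, dt = 800.0, 20.0, 0.04  # assumed: 20 m/s train, 25 fps video

for depth in (10.0, 100.0, 1000.0):  # metres
    shift = image_shift(focal_px, speed, dt, depth)
    print(f"object at {depth:6.1f} m shifts {shift:6.2f} px between frames")
```

Nearby objects shift by many pixels and distant ones by very few, which is the "closer objects appear to move faster" effect. The same shift read backwards through `depth_from_shift` gives the depth.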

You can get more stereo images as shown above here: http://www.well.com/~jimg/stereo/stereo_list.html

Thursday, April 12, 2007

Computer Vision (16)

There is some more information, related not to stereo but to depth, that I want to mention before proceeding further.

All the above kinds of depth perception require two images, or in other words two eyes, and disparity forms the main cue to perceive depth. Such cues that the brain uses are called binocular cues. There is another category of cues that our brain uses a lot to guess depth in single images, known as monocular cues. Monocular cues are the result of the enormous amount of knowledge our brain has acquired over the years. Monocular cues help us to perceive depth in 2D images. Some links to know more about monocular cues:

http://webvision.med.utah.edu/KallDepth.html

http://ahsmail.uwaterloo.ca/kin356/cues/mcues.htm

Monday, April 9, 2007

Computer Vision (15)

Now that we know triangulation can give us depth, how do we replicate this behavior in our computer based system? As I have mentioned earlier, the camera sensors that we use are not like our retina which has high density only in the region of the macula and less elsewhere. These sensors have a uniform density of pixels and so the object of interest need not be placed at the center of it, it only needs to be captured somewhere in its area. This eliminates the need to move the camera over an axis as done in the case of our eye. The complexity comes after the images are captured. An example of a stereo image pair is shown in the image below.

In order to get a triangle out of an object or a point we have to find the corresponding matches of that point in the two images. This process is called stereo correspondence. The one reason I love this field is that it has no standards restricting you in any way. You just need to understand the problem and then you are free to come up with solutions and techniques to solve it. The problem in front of you is: for each and every point in one image, how do you find the corresponding point in the other? I want to keep your minds fresh and open for new ideas, so I won’t be detailing the currently available techniques, because there are not one or two, but many! I strongly believe that to solve the problems of nature you just need to have an open mind to think in new ways. I want you all to give deep thought to this problem before even trying to google for what’s already been cooked. I can assure you that you still have a chance to come up with your own perfect recipe even though it’s been worked on for ages.

This was the end of my introduction to “depth perception through stereo imaging”. As I dive much deeper into this problem, try to think about different ways in which you can solve it. As you go on reading my posts from now on you will find that a lot of good techniques that you had thought about wouldn’t really work in many cases. I will reveal the different dimensions to solving this problem along with the merits and demerits of each of them. Also open up a parallel thread and try to find out what people have been able to come up with so far (you will get to know that you are not far off).

After having thought of it for so many years, I am just waiting for my brain to answer its own call!

Saturday, April 7, 2007

Computer Vision (14)

If you closely observe the first diagram in my earlier post, you will see a triangle formed whenever the two eyes see an object. The three lines that form it are: the line joining the left eye and the object it is currently seeing, the line joining the right eye and the object it is currently seeing, and the line joining the two eyes. From the perspective of the eyes, they don’t know where the object is, because each of the eyes has only got a 2D projection of the surroundings, including the object it is currently seeing. Assuming that there is a central system controlling the movement of the eyes, this is what it knows about them. The length of the line joining the two eyes is always a constant. In order to see an object, its image has to be placed on the macula of the retina, and hence the system knows the angle at which the eyes have converged. We can easily solve for the third point, which is where the object is placed, and hence we know its distance, which is the depth we are trying to perceive.

Knowing the distance between the eyes (D) and the angle of view of both the eyes (theta1 and theta2), we can always extend the two lines (shown dotted) to meet at a point (O). The perpendicular drawn from O to the line joining the eyes is the depth of the object from your eyes. Since the two eyes always see a common point, the lines emanating from them always converge and make sure that a triangle is formed for an object anywhere in the common 3D space.
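
The construction above can be written as a few lines of trigonometry. This is a sketch under one stated assumption: theta1 and theta2 are taken to be the interior angles of the triangle, measured at each eye between the line joining the eyes and that eye's line of sight. The diagram may define them differently, and the function name is made up. Under that assumption the perpendicular from O has length D * sin(theta1) * sin(theta2) / sin(theta1 + theta2).

```python
import math


def depth_from_convergence(eye_sep, theta1, theta2):
    """Perpendicular distance from O to the line joining the eyes.

    eye_sep is D; theta1 and theta2 (radians) are assumed to be the
    interior angles between the baseline and each line of sight.
    """
    if theta1 <= 0 or theta2 <= 0 or theta1 + theta2 >= math.pi:
        raise ValueError("lines of sight do not meet in front of the eyes")
    return eye_sep * math.sin(theta1) * math.sin(theta2) / math.sin(theta1 + theta2)


# An object on the vertical bisector, 50 cm away, with eyes 6.5 cm apart:
D = 6.5
theta = math.atan2(50.0, D / 2)  # same angle for both eyes by symmetry
print(f"{depth_from_convergence(D, theta, theta):.2f} cm")
```

As the object moves further away both angles approach 90 degrees and their sum approaches 180 degrees, so small errors in the angles turn into large errors in depth.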

Friday, April 6, 2007

Computer Vision (13)

By now, you all must have understood that to perceive 3D from 2D image(s), the image(s) need(s) to contain similar objects with disparity. How this disparity creates the sensation of depth in our brain is by triangulation, which I will be discussing now.

The vertical line in the diagram below is the bisector between the two eyes. The square, ellipse and circle are three different objects placed at different depths from the eyes. So, what you are viewing here is the top view of the objects along with your eyes. I have not shown the movement of the eye to see the different objects shown here, just to keep the diagram simple. The objects are placed on the vertical line just to get a symmetric image on the sensor and to reduce the complexity of the drawings. The dotted lines give the field of view of each eye. To get a better understanding of what I am trying to explain here, I suggest that you try these out practically as you read through it. This will help you understand the concepts very clearly.

The below diagram gives the 2D projection of the 3D environment shown above, that your eyes send to your brain. For the right eye the image of the square is always to its leftmost followed by the ellipse and the circle. For the left eye the image of the square is always to its rightmost followed by the ellipse and the circle. The left column in the image below is the image captured by the left eye (objects are marked with an ‘l’ on top), the right column is the image formed by the right eye (objects are marked with an ‘r’ on top) and the central column is the combined image formed in the brain (‘l’ is the image that has come from the left eye and ‘r’ is the image that has come from the right eye). ‘op’ in the diagram means the overlap point, the region where the two images are combined, which in our case is the macula. Let me explain it in 3 different cases:

When your eyes look at the square, the square is the region of overlap in the brain and therefore the square forms the center. Other objects are moved to the sides as named in the diagram. Imagine sliding the left and the right images close to each other such that the squares are placed one over the other.
When the eyes look at the ellipse, the ellipse forms the center, which is obtained by sliding the two images further towards each other until the ellipses coincide. Here the square from the right eye overlaps the circle from the left eye, and the circle from the right eye overlaps the square from the left eye. They are shown one above the other for the sake of clarity. How does our brain deal with the overlap of dissimilar objects? Will it average the two or suppress one of them? This is again binocular rivalry, about which I will be posting later.
When the eyes see the circle, the circle forms the center, which is obtained by sliding the two images further towards each other to overlap on the circle.

Try these out practically and verify the results. Oops! This post has grown so long, which I always try to avoid, but... Anyway, I will come back to triangulation in my next post.

Monday, April 2, 2007

Photography with Computer Vision (12)

If you haven’t been successful in viewing stereograms (either cross or parallel) and want to give it up, here’s a simpler technique to get the same experience without straining your eyes. It is called the anaglyph technique. Here, the two photographs are overlapped beforehand, so you don’t need to strain your eyes to view them. Instead you use a color filter to view them. Here’s a link for some examples.

http://www.rainbowsymphony.com/mars-3d-gallery.html

Go to the above link and observe the images before you move further. The concept works like this: each color filter in your glasses should match the color component preserved in one of the two images. The single image that you see in this link is created by taking the red component from one image and the green component from the other and overlapping them. For example, if you are using a red-blue glass combination, one of the images should have the blue and not the red component in it and the other should have the red and not the blue component. I assume that you all know a color image is a mixture of three layers: Red, Green and Blue. When you overlap the two components, the result looks blurry without the glasses due to the disparity present in stereo images. When you look through the glasses, each eyepiece filters out one of the components, so the same image does not reach both eyes, and your brain resolves the difference to perceive depth. It is equivalent to seeing two images either crossed or parallel. Now you know why they give you these colored glasses when you go to watch a 3D movie.
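
The channel mixing described above can be sketched in plain Python. The images here are tiny hand-made lists of (R, G, B) pixels, and the choice of taking red from the left image and blue from the right is an assumption for a red-blue pair of glasses. Which eye gets which filter depends on the glasses, and all function names are made up.

```python
def make_anaglyph(left, right):
    """Combine two equally sized RGB images (lists of rows of (r, g, b)).

    Keeps the red channel of `left` and the blue channel of `right`;
    green is dropped in this red-blue variant.
    """
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("both images must have the same size")
    return [[(l[0], 0, r[2]) for l, r in zip(left_row, right_row)]
            for left_row, right_row in zip(left, right)]


def through_red_filter(image):
    """What an ideal red filter lets through: only the red channel."""
    return [[(r, 0, 0) for r, _, _ in row] for row in image]


def through_blue_filter(image):
    """What an ideal blue filter lets through: only the blue channel."""
    return [[(0, 0, b) for _, _, b in row] for row in image]


left = [[(200, 10, 10), (50, 60, 70)]]     # 1 x 2 pixel "left photograph"
right = [[(20, 30, 240), (90, 80, 70)]]    # 1 x 2 pixel "right photograph"

anaglyph = make_anaglyph(left, right)
print(anaglyph)                       # [[(200, 0, 240), (50, 0, 70)]]
print(through_red_filter(anaglyph))   # [[(200, 0, 0), (50, 0, 0)]]
print(through_blue_filter(anaglyph))  # [[(0, 0, 240), (0, 0, 70)]]
```

Each filter recovers only the channel that came from one of the two photographs, so each eye ends up with a different image even though there is a single picture on the screen.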

People interested in photography can take their own 3D photographs using their single 2D camera. Landscape photographers would be very excited to take such photographs, because it is very difficult to reproduce the 3D landscape effect in a 2D photo. Here are some of the ways to do it.

http://www.feargod.net/3dhowto.php
http://www.funsci.com/fun3_en/stscp/stscp.htm

If you find the explanation in the above links too complex let me know, I can put it in a much simpler way.

Sunday, April 1, 2007

Computer Vision (11)

Practice, practice and practice till you learn to see stereograms. You might strain your eyes a lot during this process, but I bet it is worth it. If you have chosen to work in this field you have to take this up seriously. There are different kinds of stereograms available: photographs, computer generated, random dot, etc. Use whichever you are comfortable with. Use smaller images initially, so you will not have to cross your eyes too much. Also, I feel cross stereo is much simpler to learn than parallel stereo, because crossing your eyes is easier than moving them apart (at least for me). Here are some links where you can find these different kinds of stereograms.

http://www.cut-the-knot.org/Curriculum/Geometry/Stereo.shtml
http://www.eyetricks.com/3dstereo.htm

http://www.focusillusion.com/YuryGallery/Yury01.php

I have created a simple stereogram here in which the circle appears to be in front of the rectangle when viewed stereoscopically (crossed).

Once you have learnt to see stereograms, there is a small observation you will have to make. You can actually do it in the above diagram itself. The gap between the circle and the rectangle is not the same in the two images. This change in the relative distance between two objects in the images is what is called disparity. Solving for depth between the two images is actually solving for this disparity. So you cannot overlap these two images one above the other to fit both the objects perfectly. When your brain combines these two images, the disparity that exists between them is converted to depth. If your observation is very keen you will also be able to observe that when your brain combines the rectangles the circles will not have overlapped perfectly, and when your brain combines the circles the rectangles will have overlapped at an offset. You cannot combine two objects at different depths at the same time in your brain.

The gray color in the overlapped object is shown just to highlight the partial overlap that takes place for objects at different depths other than what is viewed. Our brain does not average the colors, so you won’t see this gray in the single image that your brain creates; instead it will either be the circle from the left image or the right one. This is called binocular rivalry and I want to have a separate post to explain this concept.
