
Wednesday, March 14, 2007

Computer Vision (4)

We humans have five different kinds of senses: touch, smell, sight, hearing and taste (correct me if I missed something). We have one tongue, two eyes, two ears and two nostrils, and of course skin for the sense of touch (skin is a special case that I will come to later). Ever wondered why we don’t have two tongues? Does this number, two or one, make any sense to our senses? Let me illustrate their significance with some examples.
  1. You can pick up a pen that is lying in front of you at one go. (Vision)
  2. When someone calls you from your left you immediately turn towards your left instead of searching for the voice all around you. (perception of sound)
  3. And of course, a fragrance attracts you towards its source. (sense of smell)

Each of these senses is highly developed, in the order mentioned. As you can observe in these examples, when you have a pair of sensors they answer the question WHERE? WHERE is the object, WHERE is the sound coming from, and WHERE is the smell coming from? You don’t have two tongues because you know that to taste something you have to place it on your tongue; you can’t do it wirelessly. WHERE is something that becomes obvious in this case. The final sense is touch, and when it comes to skin there is nothing like one or two; it covers our entire body. But we all know that it is sufficient to be touched at one place to feel it, rather than at two. You have to make contact to have a sense of touch, which eliminates the need to answer the question WHERE?
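To make the WHERE idea concrete, here is a minimal sketch of how a pair of sensors can localize a source purely from the difference in arrival time between them, the way our two ears roughly do for sound. The ear separation, the speed of sound and the simple far-field geometry below are illustrative assumptions, not a model of what the brain actually computes.

```python
import math

def sound_direction(delta_t, ear_distance=0.2, speed_of_sound=343.0):
    """Estimate the direction of a sound source from the difference in
    arrival time between two ears (simplified far-field model).

    delta_t        : arrival time at the right ear minus the left ear, in seconds
    ear_distance   : assumed separation between the ears, in metres
    speed_of_sound : metres per second

    Returns the angle from straight ahead, in degrees (positive means the
    source is to the left, since its sound reaches the left ear first).
    """
    path_difference = speed_of_sound * delta_t
    # Clamp to the physically possible range before taking the arcsine.
    ratio = max(-1.0, min(1.0, path_difference / ear_distance))
    return math.degrees(math.asin(ratio))

# A sound arriving 0.3 ms earlier at the left ear lies roughly 31 degrees to the left.
print(sound_direction(0.0003))
```

With a single ear there is no arrival-time difference to compare, so WHERE cannot be answered; with two, the answer falls out of simple geometry.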

Just the presence of two sensors needn’t always guarantee an answer to WHERE; it is their placement that gives them the extra edge. In general there needs to be some common signal that passes through both sensors of the same kind. Light is a high-frequency wave and cannot bend around corners. I mean, you can’t light up your room and then read by that light from behind the room’s wall outside, while this is not the case with sound or smell. So irrespective of where on your head the two ears or nostrils are placed, common signals will definitely reach them, but you can’t place one eye at the front of your face and one at the back. Light can’t bend, so you wouldn’t get any overlap, in other words a common signal, in both eyes. We humans have both our eyes on the front of our face, so it’s very easy to get common signals. Want to experiment? Fix the position of your face and close one of your eyes, say the left one first. Remember the region that your right eye is seeing. Now close your right eye, open the left, and compare the two regions. Most of the region that one eye sees will also be seen by the other; this is the common region. The right eye will not be able to see the leftmost portion of the region seen by the left eye, and vice versa. It is in this common region that we perceive depth. How? I will explain it in detail later; for now you just need to remember that “2 sensors == 3D perception”.
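To give a flavour of what “2 sensors == 3D perception” means in that common region, here is a minimal sketch of the textbook stereo triangulation relation Z = f · B / d: the farther a point is, the smaller its shift (disparity) between the two views. The focal length and eye separation below are made-up numbers, purely for illustration.

```python
def depth_from_disparity(disparity_px, focal_px=800.0, baseline_m=0.065):
    """Depth of a point seen by both eyes/cameras (i.e. lying in the common
    region), using the rectified-stereo relation  Z = f * B / d.

    disparity_px : horizontal shift of the point between the two images, in pixels
    focal_px     : assumed focal length expressed in pixels
    baseline_m   : assumed separation between the two eyes/cameras, in metres
    """
    if disparity_px <= 0:
        raise ValueError("zero disparity: the point is effectively at infinity")
    return focal_px * baseline_m / disparity_px

# The larger the shift between the two views, the closer the point:
for d_px in (5, 20, 80):
    print(d_px, "px ->", round(depth_from_disparity(d_px), 2), "m")
```

A point that only one eye can see has no disparity to measure, which is exactly why depth is perceived only in the overlapping region.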

Computer Vision (3)

One statement that I usually get to hear from people is: “I understand the significance of depth in our perception of the surroundings, but a photograph, as you said, is a 2D image and my brain still manages to extract all the information from it. So why should robotic vision depend so much on depth? Why can’t we do away with it? Also take movies, for example, which are a sequence of 2D images. You can actually feel depth in them, can’t you? I still don’t understand why we need 3D.”
We understand a 2D image completely because of our previous knowledge, and not necessarily due to the image processing happening in our brain. Previous knowledge does not mean that we must have seen the exact image before; it means that we are aware of the context and the content. It is not that we segment the image first and then understand it, as an image processing pipeline would; in our brain the two go hand in hand. What kind of segmentation our brain uses is still not very clear, but I can demonstrate how knowledge rules over the kind of segmentation that we can perform on an image. Look at the image below, for example. What do you think the image contains? I am sure 100% of people would say “a man and a woman”. You are totally wrong; the artist had actually drawn jumping dolphins on a pot! Now that you know the content of the image (knowledge), you can easily extract the dolphins out.
[Image: a drawing that most viewers see as a man and a woman, but which the artist composed from jumping dolphins on a pot]
You feel the perception of 3D in a cinema due to motion; a cinema is, after all, a motion picture! Motion can be obtained in two ways: by keeping the camera static and having the subject move, or by moving the camera itself, irrespective of the subject. What our brain does using two eyes could also have been done with a single eye, by oscillating it left and right to get the two images that it needs. The only difference would be that the two images would not be from the same instant of time. From the time we start learning about our surroundings, it is 3D vision that helps us segment the objects around us and put them in our database. Once we have gained sufficient knowledge about our surroundings, we no longer need 3D to perceive them, which is why we understand a 2D photograph without any problems.
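Here is a minimal sketch of that single-moving-eye idea (motion parallax), under the same kind of toy pinhole assumptions as before: when the camera slides sideways, nearby objects shift across the image much more than distant ones. Geometrically this is the same disparity two eyes get for free, except that the two views come from different instants of time. The focal length and camera shift below are illustrative values only.

```python
def image_shift(depth_m, camera_shift_m=0.05, focal_px=800.0):
    """Apparent sideways motion (in pixels) of a static point between two frames
    taken before and after a small lateral camera translation.
    This is the same quantity two eyes would measure as disparity, only the
    two views here are separated in time.  (Both parameters are made up.)"""
    return focal_px * camera_shift_m / depth_m

# Nearby objects slide across the image much faster than distant ones,
# which is the motion parallax a moving camera (or an oscillating eye) exploits.
for depth in (1.0, 5.0, 50.0):
    print(depth, "m ->", round(image_shift(depth), 1), "px of apparent motion")
```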
I will be dealing with these topics in detail later on, under illusions and 3D perception through motion. I have only been introducing you to all of them for now.

Monday, March 12, 2007

Computer Vision (2)

First of all, why are we so fascinated by our ability to perceive depth, or, for a layman, what does all this mean? After having vision (eyes) for so many years, imagine a world without it. Frightening, right? Now imagine having sight in just one eye. Most people would be okay with it, and some even ask me: what difference does it make? Now this is what is really frightening to us computer vision researchers. We have been chasing this problem for decades, many researchers have spent their entire lives trying in vain to decode it, and here we have people who do not know its significance in spite of using it. No problem; what’s this article for, then?

There are two major things involved in vision: sight and depth. Many people fail to distinguish between the two. Sight is the perception of light, and depth is the perception of the space around you. “An experience is worth reading 1000 pages”, so you had better try it out yourself. Right from the time you get up in the morning, spend the entire day with one of your eyes closed. Observe whether you can live life as easily as you could with both eyes open. (Disclaimer: I take no responsibility for any accidents that might happen as a result of performing this experiment.) But to get a feel for what drives so many people to pour so much effort into giving a machine the perception of depth, you have to try it out. Do not read my other posts until you have got at least something out of this activity.

One experiment that I don’t want you to miss is this: hang a rope, a wire, a stick, anything from a point such that there is space all around it. Get your fingers ready in the wire-grasping position, move your hand towards the wire in the direction perpendicular to it, and grasp it. Remember to close one eye! If you get it right, believe me, you are the luckiest person. If not, you will definitely want to know the magic that your brain is doing with two images. That is exactly what all our research concentrates on. Also try judging the depth between two objects placed at different depths with just one eye open. Try experimenting on as many objects as possible. It is impossible to know the distance between two objects without opening both eyes, except through monocular cues (I will come to these later).

If you think about it carefully, there is nothing new in what I am saying. When I say one eye, it is equivalent to taking an image from a camera. In a camera image the 3D surroundings are projected onto a 2D surface. From just this projection it is impossible to know at what depth an object originally was. Take a look at the image below. A square and a circle are two objects in front of the sensor. Assume they are initially placed at 10 m (circle) and 15 m (square). Their projections on the sensor would be as shown at the right. Try placing the circle anywhere along its line of projection, and likewise the square along its line. Do you see any difference in where they are projected? Not at all; you get the same image irrespective of where the two are relative to each other along their respective lines. Some people argue that you should definitely be able to observe the change in the size of an object on the sensor as it moves away from the sensor, so in some way you know whether the object is far or near. I totally agree, but what difference does it make? Who knows the size of the objects? I just have their projection at a particular instant of time and nothing else.
When I move an object closer to the sensor, its size on the sensor definitely increases, but here we are talking about the depth between two objects, which our brain judges using two images. Even if the size of the object changes as you move it away from or towards the sensor, how does that give you the absolute depth of the object? We can always solve for two distances and sizes of objects such that one is big and far from the sensor and the other small and close to it, both giving the same projection. Looking at the sensor alone you never know where the objects were, because you don’t know their actual sizes!
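This size-versus-depth ambiguity is easy to check numerically. Below is a minimal pinhole-projection sketch (the focal length and object positions are made-up numbers): a small, near object and a suitably larger, farther object land on exactly the same sensor position, so a single projection cannot tell them apart.

```python
def project(x_m, z_m, focal_px=800.0):
    """Pinhole projection of a point at lateral offset x_m and depth z_m:
    the sensor only records u = f * x / z, so the depth z itself is lost."""
    return focal_px * x_m / z_m

# The edge of a 0.5 m-wide object at 10 m and the edge of a 0.75 m-wide
# object at 15 m project to exactly the same pixel coordinate.
print(project(x_m=0.50, z_m=10.0))   # 40.0
print(project(x_m=0.75, z_m=15.0))   # 40.0
```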

When you look at a photograph you can still largely make out the depth in it, thanks to the many monocular cues that your brain uses along with the knowledge gained over the years. I will have a separate post on monocular cues, so wait for that.