Thursday, March 29, 2007
Computer Vision (10)
Your brain can actually perceive depth from these two 2D images, if viewed properly. You will need some practice for that; here’s how, if you are interested. To appreciate how our brain creates 3D out of these two 2D images, and why we are so keen on copying it, it’s better to learn that first and only then proceed.
Tuesday, March 27, 2007
Computer Vision (9)
Saturday, March 24, 2007
Thursday, March 22, 2007
Computer Vision (8)
- Is solving this problem so difficult?
- Why would we want to solve it the way our brain does? Isn’t there a better way?
- When a camera’s autofocus system can estimate depth using IR, why can’t we use, say, a LASER to get the exact depth?
To explain why we would want to solve it in the same way as our brain does, I would like to quote these lines taken from the introduction section of one of the related papers from MIT. It states,
“The challenge of interacting with humans constrains how our robots appear physically, how they move, how they perceive the world, and how their behaviors are organized. We want it to interact with us as easily as any other human. We want it to do things that are assigned to it with a minimum of our interaction. In other words we can never predict how it is going to react to a stimulus and what decision it is going to take.
For robots and humans to interact meaningfully, it is important that they understand each other enough to be able to shape each other’s behavior. This has several implications. One of the most basic is that robots and humans should have at least some overlapping perceptual abilities. Otherwise, they can have little idea of what the other is sensing and responding to. Vision is one important sensory modality for human interaction, and the one in focus here. We have to endow our robots with visual perception that is human-like in its physical implementation. Similarity of perception requires more than similarity of sensors. Not all sensed stimuli are equally behaviorally relevant. It is important that both human and robot find the same types of stimuli salient in similar conditions. Our robots have a set of perceptual biases based on the human pre-attentive visual system. Computational steps are applied much more selectively, so that behaviorally relevant parts of the visual field can be processed in greater detail.”
I think that this completely justifies the claim made above. What is important is how useful it will be for us humans. Take, for example, the compression algorithms used in audio and image processing. Audio compression is based on our ability to perceive or reject certain frequencies and intensities: the audio is compressed such that, to our auditory system, there is no perceptual difference between the original and the compressed data. For a dog it might really play out weird! Image compression works on the same basic concept.
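As a toy illustration of that idea (entirely my own sketch, not from the post; real codecs use proper psychoacoustic masking models, and the fixed relative threshold here is only a made-up stand-in for one), you could drop frequency components that are too weak to be perceived next to the strong ones:

```python
import numpy as np

def crude_perceptual_compress(signal, relative_threshold=0.01):
    """Zero out frequency components weaker than a fraction of the strongest
    one. Real codecs use psychoacoustic masking models; this fixed relative
    threshold is just a crude stand-in for that idea."""
    spectrum = np.fft.rfft(signal)
    magnitudes = np.abs(spectrum)
    cutoff = relative_threshold * magnitudes.max()
    spectrum[magnitudes < cutoff] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# One second of a loud 440 Hz tone plus a very quiet 3000 Hz tone, at 8 kHz.
t = np.arange(8000) / 8000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.001 * np.sin(2 * np.pi * 3000 * t)
reconstructed = crude_perceptual_compress(signal)

# The quiet component gets dropped; the difference from the original stays tiny.
print("max difference:", np.max(np.abs(signal - reconstructed)))  # ~0.001
```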
As you go on reading my posts you will get to know whether the problem is difficult or not (that is the main reason why I started writing); I can’t answer that question in one or two lines here. Coming to the third question, a LASER will always give you the exact depth or distance of an object, but our brain doesn’t work on exactness. Even though your brain perceives depth, it doesn’t really measure it. Secondly, getting intelligence out of a LASER-based system is tough. If you use a single ray to measure the depth of your surroundings, what if your LASER always ends up pointing at an object moving in unison with it? We need a kind of parallel processing mechanism here, like the one we get from an image: the entire surrounding is captured in one shot and analyzed, which a LASER fails to do. You cannot use multiple LASERs either, because then how would you distinguish the received signals from the ones that were sent out? The ray that leaves the transmitter at a particular point need not come back to the same point (due to deflections). In that case, what should the resolution of the transmitters and receivers be, or how densely should we pack them? And what if there was something we wanted to perceive in the space left between them? This is neither the best way to design a recognition system nor a competitor to our brain, so let’s just throw it away.
Assuming that evolution has designed the best system for us, one that has been tried and tested continuously for millions of years, we don’t want to think of something else. We have a working model in front of us, so why not replicate it? And this is not something new for us; we have designed planes based on birds, boats based on marine animals, robots based on ourselves and other creatures, and so on.
Tuesday, March 20, 2007
Photography with Computer Vision (7)
Monday, March 19, 2007
Computer Vision (6)
This is actually a debatable topic. I tried to find an answer to this by observing small babies, but haven’t been successful enough to conclude anything. Anyway, I have some other observations to share. Depth perception does not produce an interrupt in the brain the way sound, motion or color do. During the initial learning stages it is the interrupt that matters, because you need to draw the attention of a baby’s brain to observe something, so depth takes a back seat. I call it an interrupt because it immediately brings your brain into action. To achieve this you generally tend to get some colorful toys that make interesting sounds and wave them in front of the baby. So how does it work?
Sound, as you know, definitely produces an interrupt in your brain, which is why you use an alarm to wake up in the morning. Colorful objects produce high-contrast images in your brain, which are like step and impulse functions: strong signals that your brain becomes interested in. Now you know what kind of dress to wear to draw the attention of everyone around you!
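Here is a tiny sketch of that step-and-impulse idea (my own illustration, assuming NumPy; the pixel values are arbitrary): a high-contrast edge is like a step, and a derivative filter turns it into a strong impulse-like response, while a low-contrast edge barely registers.

```python
import numpy as np

# Two 1D "image rows": a high-contrast edge and a low-contrast one.
high_contrast = np.array([0, 0, 0, 0, 255, 255, 255, 255], dtype=float)
low_contrast = np.array([100, 100, 100, 100, 120, 120, 120, 120], dtype=float)

# A simple derivative (edge) kernel; a step input gives an impulse-like response.
kernel = np.array([-1.0, 1.0])

def edge_response(row):
    # Convolve the row with the derivative kernel and return the strongest response.
    return np.max(np.abs(np.convolve(row, kernel, mode="valid")))

print("high-contrast edge response:", edge_response(high_contrast))  # 255.0
print("low-contrast edge response:", edge_response(low_contrast))    # 20.0
```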
If you remember awakening a daydreamer by waving your hand in front of him, you know how motion produces an interrupt in your brain. This is actually because of the way our visual processor and retina are designed, which I will come to shortly. So next time you are buying a toy, think about these.

Secondly, the reason the interrupt matters is that a newborn baby’s brain is like a formatted hard disk: ready to accept data, but holding nothing. When it doesn’t understand anything around it, there is absolutely no meaning in perceiving depth. Whether it perceives depth or not, an object is just going to be a colored patch and nothing else. Again, it wouldn’t even know which color it is! So interrupts help it make sense of its surroundings, and once that is done, depth and motion help it segment objects from one another to build its database.
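And an equally crude sketch of a motion “interrupt” (again my own illustration, not anything from the post; the toy frames and thresholds are made-up values): simply differencing two consecutive frames and raising a flag when enough pixels change.

```python
import numpy as np

# Two consecutive grayscale "frames" (tiny 4x4 toy images, values 0-255).
frame_prev = np.zeros((4, 4), dtype=float)
frame_curr = np.zeros((4, 4), dtype=float)
frame_curr[1:3, 1:3] = 200.0  # a bright patch appears, i.e. something moved

# A crude motion "interrupt": raise a flag if enough pixels changed significantly.
diff = np.abs(frame_curr - frame_prev)
changed = diff > 50                    # per-pixel change threshold (arbitrary)
motion_interrupt = changed.sum() > 2   # fire if more than a few pixels changed

print("motion interrupt fired:", motion_interrupt)  # True
```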
Thursday, March 15, 2007
Computer Vision (5)
Wednesday, March 14, 2007
Computer Vision (4)
- You can pick up a pen that is lying in front of you at one go. (Vision)
- When someone calls you from your left you immediately turn towards your left instead of searching for the voice all around you. (perception of sound)
- And of course fragrance definitely attracts you towards it. (sense of smell)
Each of these senses is highly developed, in the order mentioned. As you can observe in these examples, when you have a pair of sensors they answer the question WHERE? WHERE is the object, WHERE is the sound coming from, WHERE is the smell coming from? You don’t have two tongues because you know that to taste something you have to place it on your tongue; you can’t do it wirelessly, so WHERE becomes obvious in this case. The final sense is touch, and when it comes to skin there is nothing like one or two; it covers our entire body. But we all know that it is sufficient to touch us at one place to feel it, rather than at two. You have to make contact to have a sense of touch, which eliminates the need to answer the question WHERE.
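As a side note on how a pair of sensors answers WHERE: with two ears, the arrival-time difference of a sound gives its direction. A minimal sketch of that (my own illustration, not from the post; the ear spacing is an assumed value and the speed of sound is the usual approximation):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature
EAR_SPACING = 0.2       # m, rough head width (assumed for illustration)

def direction_from_time_difference(dt_seconds):
    """Estimate the azimuth of a sound source from the arrival-time difference
    between the two ears (interaural time difference).
    Positive dt means the sound reached the right ear first."""
    path_diff = SPEED_OF_SOUND * dt_seconds          # extra path to the far ear
    ratio = np.clip(path_diff / EAR_SPACING, -1.0, 1.0)  # keep arcsin in range
    return np.degrees(np.arcsin(ratio))              # 0 degrees = straight ahead

# A sound arriving 0.3 ms earlier at the right ear comes from roughly 31 degrees to the right.
print(direction_from_time_difference(0.0003))
```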
Just the presence of two sensors needn’t always guarantee an answer to WHERE; it is their placement that gives them the extra edge. In general there needs to be some common signal that passes through both sensors of the same kind. Light is a high-frequency wave and cannot bend around corners. I mean, you can’t light up your room and then go and stand behind the room’s wall to read something, while this is not the case with sound or smell. So irrespective of where on your head the two ears or nostrils are placed, common signals will definitely reach them, but you can’t place one eye on the front of your face and one behind it: light can’t bend, so you wouldn’t get any overlap, in other words a common signal, in both eyes. We humans have both our eyes on the front of our face, so it’s very easy to get common signals.

Want to experiment? Fix the position of your face and close one of your eyes, say the left one first. Remember the region that your right eye is seeing. Now close your right eye, open the left, and compare the two regions. Most of the region that one eye sees will also be seen by the other; that is the common region. The right eye will not be able to see the leftmost portion of the region seen by the left eye, and vice versa. It is in this common region that we perceive depth. How? I will explain that in detail later; for now you just need to remember that “2 sensors == 3D perception”.
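To make “2 sensors == 3D perception” a little more concrete, here is a minimal sketch of the standard stereo relation, depth = focal length × baseline / disparity; the focal length and baseline below are assumed example values, not anything from the post:

```python
# A nearby point shifts a lot between the left and right views; a distant point barely shifts.
FOCAL_LENGTH_PX = 800.0   # focal length in pixels (assumed example value)
BASELINE_M = 0.065        # separation between the two eyes/cameras in metres (assumed)

def depth_from_disparity(x_left_px, x_right_px):
    """Return depth in metres for a point seen at column x_left_px in the
    left image and x_right_px in the right image (pinhole stereo model)."""
    disparity = x_left_px - x_right_px  # horizontal shift between the views, in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return FOCAL_LENGTH_PX * BASELINE_M / disparity

print(depth_from_disparity(420, 400))  # disparity 20 px -> 2.6 m
print(depth_from_disparity(405, 400))  # disparity  5 px -> 10.4 m
```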
Computer Vision (3)
Monday, March 12, 2007
Computer Vision(2)
When you look at a photograph you can still get a fair sense of the depth in it, thanks to the many monocular cues your brain uses along with the knowledge it has gained over the years. I will have a separate post on monocular cues, so wait for that.