- Is solving this problem so difficult?
- Why would we want to solve it the way our brain does? Isn't there a better way?
- When a camera's autofocus system can estimate depth using IR, why can't we use, say, a LASER to get the exact depth?
To explain why we would want to solve it the same way our brain does, I would like to quote a few lines from the introduction of a related paper from MIT:
“The challenge of interacting with humans constrains how our robots appear physically, how they move, how they perceive the world, and how their behaviors are organized. We want it to interact with us as easily as any other human. We want it to do the things assigned to it with a minimum of our interaction. In other words, we can never predict how it is going to react to a stimulus or what decision it is going to take.
For robots and humans to interact meaningfully, it is important that they understand each other enough to be able to shape each other’s behavior. This has several implications. One of the most basic is that robots and humans should have at least some overlapping perceptual abilities. Otherwise, they can have little idea of what the other is sensing and responding to. Vision is one important sensory modality for human interaction, and the one in focus here. We have to endow our robots with visual perception that is human-like in its physical implementation. Similarity of perception requires more than similarity of sensors. Not all sensed stimuli are equally behaviorally relevant. It is important that both human and robot find the same types of stimuli salient in similar conditions. Our robots have a set of perceptual biases based on the human pre-attentive visual system. Computational steps are applied much more selectively, so that behaviorally relevant parts of the visual field can be processed in greater detail.”
I think that this completely justifies the claim made above. For us, what matters is how useful the system will be to us humans. Take, for example, the compression algorithms used in audio and image processing. Audio compression exploits our ability to perceive or ignore certain frequencies and intensities: the data is compressed so that, to our auditory system, there is no perceptual difference between the original and the compressed version. To a dog it might sound really weird! Image compression works on the same basic principle.
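To make the idea concrete, here is a toy Python sketch of perceptual coding. This is my own illustration, not how MP3 or JPEG actually work: I use a crude loudness threshold in place of a real psychoacoustic model, transform the audio to the frequency domain, and throw away components too faint to hear.

```python
import numpy as np

def toy_perceptual_compress(signal, threshold_ratio=0.05):
    """Zero out frequency components too faint to matter perceptually
    (a crude stand-in for a real psychoacoustic model)."""
    spectrum = np.fft.rfft(signal)
    magnitude = np.abs(spectrum)
    # Keep only components loud enough, relative to the peak, to "hear".
    audible = magnitude >= threshold_ratio * magnitude.max()
    reconstructed = np.fft.irfft(np.where(audible, spectrum, 0), n=len(signal))
    return reconstructed, audible

# A loud 440 Hz tone plus a very faint 3 kHz tone, one second at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 440 * t) + 0.01 * np.sin(2 * np.pi * 3000 * t)

reconstructed, audible = toy_perceptual_compress(audio)
print(f"kept {audible.sum()} of {audible.size} frequency components")
```

Almost the entire spectrum is discarded, yet to a human listener the reconstruction is dominated by the same 440 Hz tone as the original; that is the whole trick.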
As you go on reading my posts you will get to know whether the problem is difficult or not (that is the main reason I started writing); I can't answer that question in one or two lines here.

Coming to the third question: a LASER will always give you the exact depth or distance of an object, but our brain doesn't work on exactness. Even though your brain perceives depth, it doesn't really measure it (the short sketch after this paragraph shows the time-of-flight arithmetic behind that exactness). Secondly, getting intelligence out of a LASER-based system is tough. If you use a single ray to scan the depth of your surroundings, what if the LASER is always pointing at an object moving in unison with it? We need a kind of parallel processing here, like the one an image gives us: the entire surroundings are captured in one shot and analyzed, which a LASER cannot do. Nor can you simply use multiple LASERs, because then how would you distinguish the received signals from the ones that were sent out? The ray that leaves the transmitter at a particular point need not come back to the same point (it may be deflected). And in that case, what should the resolution of the transmitters and receivers be, i.e., how densely should we pack them? What if there were something we wanted to perceive in the gaps between them? This is neither a good way to design a recognition system nor a competitor to our brain, so let's just throw the idea away.
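For the record, the "exact depth" a LASER range finder gives comes from simple time-of-flight arithmetic. Here is a minimal sketch; the round-trip time is a made-up number, just to show the calculation:

```python
C = 299_792_458.0  # speed of light in m/s

def tof_depth(round_trip_time_s):
    # The pulse travels out to the object and back, so halve the path.
    return C * round_trip_time_s / 2.0

# A 20-nanosecond round trip puts the object about 3 metres away.
print(tof_depth(20e-9))  # ~2.998 m
```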
Assuming that evolution has designed the best system for us, one that has been tried and tested continuously for millions of years, we don't need to invent something else. We have a working model in front of us, so why not replicate it? And this is nothing new for us: we have designed planes based on birds, boats based on marine animals, robots based on ourselves and other creatures, and so on.
2 comments:
Puneet... I was reading through your computer vision posts... one strange thing I noticed after reading is that at the end it leads nowhere!! I mean, in what way did you make the reader understand what is meant by computer vision?
OK, let me know what you were expecting. If it was the different techniques used in solving computer vision problems, I will come to those later. The heading of my blog (not the title) says "Depth perception through stereo imaging", which means using two images. I want the reader to first understand the need for two images for perceiving depth, because a common man feels he understands a 2D image completely on its own, just by looking at a photograph or a 2D movie.
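To make that concrete, here is a minimal sketch of the geometry I have in mind. It assumes a standard pinhole stereo model, and the focal length, baseline, and disparities are made-up numbers: the nearer a point is, the more it shifts between the left and right images, and depth falls out of that shift.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    # Standard stereo triangulation: Z = f * B / d.
    # A point's horizontal shift (disparity) between the two images
    # is inversely proportional to its distance from the cameras.
    return focal_px * baseline_m / disparity_px

# Two cameras 6.5 cm apart (about the spacing of human eyes),
# focal length 700 pixels:
print(depth_from_disparity(700, 0.065, 10))  # ~4.55 m away
print(depth_from_disparity(700, 0.065, 50))  # ~0.91 m: closer, bigger shift
```

A single image gives you no disparity at all, which is exactly why one photograph, however complete it looks, carries no measurable depth.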