

Despite spending sleepless nights thinking deeply about the technique our brain might use to solve the problem of depth perception, my brain only gave me a drowsier day ahead. So I started filtering out the possibilities to narrow down to a solution. The question I asked myself was: does our brain use knowledge to correspond the left and the right images, or is it something that happens more mechanically? I had tried out a lot of knowledge-based approaches, but in vain, and even the discussion we had in the earlier post concluded nothing. I wanted to take a different route and think of a more mechanical, less knowledge-based approach. My brain then pointed me to the age-old experiment proposed by Thomas Young to demonstrate the wave nature (interference) of light: the double-slit experiment. How could this be of use in solving a seemingly unrelated problem like depth perception? On comparing the two setups you will find a few things in common: both deal with light, and both pass the surrounding light through two openings and combine it later. I excitedly thought, have I unlocked the puzzle?
Let’s analyze and understand it better to find out if I really did! I am neither an expert in physics nor in biology, so I can only build a wrapper around this concept, not verify its complete path.
λ is the wavelength of the light
s is the separation of the slits (or the eyes)
x is the distance between the bands of light (also called the fringe width)
D is the distance from the slits to the screen (or from the eye’s lens to the retina)
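For completeness, these quantities are related by Young’s standard fringe-width formula, x = λD/s. A quick sanity check with illustrative numbers (my own values, not measurements from any experiment here):

```python
# Young's fringe-width relation: x = lambda * D / s
# The numbers below are purely illustrative.
wavelength = 550e-9   # lambda: green light, in metres
D = 1.0               # slits-to-screen distance, metres
s = 0.5e-3            # slit separation, metres

x = wavelength * D / s          # fringe width
print(f"{x * 1e3:.2f} mm")      # 1.10 mm between adjacent bright bands
```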
If a light source is placed at 3 different locations equidistant from the center of the slits, the one at the red position would produce a symmetric pattern, and the other two, I guess, would not. I have not experimented with this, hence the letters NX (Not eXperimented). If my guess is right, a light source placed anywhere in 3D space would produce a unique pattern on the screen!!! This means an analysis of this pattern would tell us the exact location of the source in 3D space.
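A quick two-beam sketch using only path differences supports the guess. The geometry and source positions below are my own assumptions, and real diffraction is richer than this cosine-squared toy model, but it shows an on-axis source giving a symmetric pattern and an off-axis source breaking that symmetry:

```python
import math

WAVELENGTH = 550e-9   # metres (illustrative)
S = 0.5e-3            # slit separation, metres
D = 1.0               # slits-to-screen distance, metres

def intensity(source_x, source_z, screen_x):
    """Two-beam intensity at screen_x for a point source at (source_x, source_z),
    with the slits on the z = 0 plane and the screen at z = D."""
    paths = []
    for slit_x in (-S / 2, S / 2):
        d_src = math.hypot(slit_x - source_x, source_z)   # source -> slit
        d_scr = math.hypot(screen_x - slit_x, D)          # slit -> screen
        paths.append(d_src + d_scr)
    delta = paths[0] - paths[1]                           # path difference
    return math.cos(math.pi * delta / WAVELENGTH) ** 2

def is_symmetric(source_x, source_z):
    # Compare the pattern at +x and -x across half the screen
    xs = [i * 5e-6 for i in range(1, 1001)]
    return all(
        abs(intensity(source_x, source_z, x) - intensity(source_x, source_z, -x)) < 1e-6
        for x in xs
    )

print(is_symmetric(0.0, -0.2))    # True: on-axis source, symmetric fringes
print(is_symmetric(2e-3, -0.2))   # False: off-axis source shifts the fringes
```

Moving the source sideways adds a constant path offset on the source side, which slides the whole fringe pattern across the screen, so the pattern does encode where the source sits.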
Chak De
In the first image of the sequence, the focus point was moved to just behind the LED, and we see an image similar to the one when the focus point was placed between the matchstick and the LED. But now the rays have actually crisscrossed, which is not apparent here since the cone is symmetric. To demonstrate the crisscross, I covered the left half of the lens with an opaque object, which made the right semicircle of the circular projection of the cone disappear! To come back to our proper cone, I moved the focus point back to the matchstick and repeated the experiment. Now covering the left portion of the lens masks the left semicircle of the LED! This means there is no crisscross!
The intensity and frequency of the light reflected from these various points can differ, and these rays get summed up at a point on the retina. This happens for every pixel on the sensor, so the image you get is just the summation of the intensities and frequencies of the rays coming from various points around you. As a result, you will always end up with a uniform patch of light on the sensor if you try to take an image without a lens.
If you didn’t have a lens in your eye, you would only be able to sense the amount of light in your surroundings, not the objects in front of you. The various objects wouldn’t be distinguishable at all.
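A toy numerical version of that summation (the scene values are random, purely for illustration) shows why every lensless pixel reads the same value, while a pinhole, which selects one ray per pixel, keeps the structure:

```python
import random

random.seed(0)
scene = [random.random() for _ in range(1000)]   # brightness of scene points

# Without a lens, every sensor pixel collects light from every scene point,
# so each pixel records the same total: a uniform patch, no image.
lensless = [sum(scene) for _pixel in range(8)]

# A pinhole (or lens) maps one scene point to one pixel, so pixel values
# vary and the scene structure survives.
pinhole = [scene[i] for i in range(8)]

print(len(set(lensless)))       # 1 -> all pixels identical, uniform patch
print(len(set(pinhole)) > 1)    # True -> structured image
```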
A camera takes a projection of objects along the viewing direction (the horizontal line pointing towards you), so the distance between the horizontal bars along this direction cannot be shown in the 2D image. The vertical distance between them is ‘v’. In other words, if we capture this 3D setup as a stereo image pair hoping to get the depth between the bars, we end up with exactly the same image on the left and the right. There is no use seeing it stereoscopically, because which points in the two images will the brain correspond? Since the camera is moved horizontally, the vertical distance between the two bars remains the same in the stereo pair.
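A minimal pinhole-projection sketch (focal length, baseline, and bar positions are my own illustrative numbers) shows why the horizontal camera shift cannot reveal this depth: the vertical image coordinate of each bar is identical in the left and right views, and the horizontal shift of a featureless horizontal bar leaves nothing to correspond:

```python
F = 1.0    # focal length (illustrative)
B = 0.065  # horizontal baseline between the two cameras (assumption)

def project(X, Y, Z, cam_x):
    # Pinhole model: image coords of point (X, Y, Z) seen from (cam_x, 0, 0)
    return (F * (X - cam_x) / Z, F * Y / Z)

# Two horizontal bars: one nearer, one farther, also separated vertically.
bars = {"near": (0.0, 1.0), "far": (0.1, 1.5)}   # (Y, Z) of each bar

for name, (Y, Z) in bars.items():
    _, v_left = project(0.0, Y, Z, -B / 2)
    _, v_right = project(0.0, Y, Z, +B / 2)
    print(name, v_left == v_right)   # True: vertical coordinate never changes

# A single point does shift horizontally by the disparity F*B/Z, but every
# point on a featureless horizontal bar looks alike, so there is no point
# to correspond and the depth between the bars stays hidden.
print(F * B / bars["near"][1] > F * B / bars["far"][1])  # True: nearer shifts more
```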
The red lines are traced when the eyes combine the rectangle, and the green lines when they combine the circle. The point of intersection of the red lines gives the 3D location of the rectangle, and that of the green lines gives the circle. As mentioned earlier, the circle appears in front of the rectangle when cross viewed. One vertex of the triangle comes from the point of intersection of either the red or the green lines, and the other two vertices are the two eyes. The distance (d) of the point of intersection from the two eyes depends on the separation between the images, so the absolute distance of the objects remains unknown in a stereo image pair. The relative depth of the objects from one another is obtained by corresponding objects from the two images, which moves the point of intersection of the lines according to the 3D placement of the objects (as with the red and green lines).
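This geometry can be sketched numerically. Below is a toy top-view model with my own assumed numbers (eye separation, picture-plane distance, feature separations), not figures from the actual stereo pair: each eye and the feature it fixates define a sight line, and the lines are intersected directly:

```python
E = 0.065        # eye separation in metres (assumption)
PLANE_Z = 0.5    # distance of the picture plane from the eyes (assumption)

def intersect(left_target, right_target):
    """Intersect the sight line of the left eye (at (-E/2, 0)) through
    left_target with that of the right eye (at (+E/2, 0)) through
    right_target; targets are (x, z) features on the picture plane."""
    ex_l, ex_r = -E / 2, E / 2
    dlx, dlz = left_target[0] - ex_l, left_target[1]
    drx, drz = right_target[0] - ex_r, right_target[1]
    # Solve ex_l + t*(dlx, dlz) = ex_r + u*(drx, drz) for t:
    t = (ex_r - ex_l) * drz / (dlx * drz - dlz * drx)
    return (ex_l + t * dlx, t * dlz)

# Cross viewing swaps the images, so the left eye fixates the copy of a
# feature that sits to the right on the picture plane.  A larger separation
# between corresponding features pulls the intersection closer to the eyes.
circle = intersect((0.02, PLANE_Z), (-0.02, PLANE_Z))      # wider separation
rectangle = intersect((0.01, PLANE_Z), (-0.01, PLANE_Z))   # narrower separation
print(circle[1] < rectangle[1])  # True: the circle is perceived in front
```

The absolute depths here depend entirely on the assumed eye separation and plane distance, which mirrors the point above: only the relative ordering of the intersections is meaningful, not their absolute distance.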