Neural Net Depth Perception: Overview

After taking Andrew Ng's Machine Learning course on Coursera, I wanted to do another project using neural nets. At the same time, I was excited about an idea involving computer vision. The two interests came together when I realized a neural net should be able to create a depth map from stereo images, even if I don't know how to match features and areas between the left and right images to measure their displacements.

Neural nets work much like linear regression: they generalize a data set with a "best" fit. Neural nets can also take much higher dimensional data as input. They can match arbitrary best-fit "shapes", where higher dimensions mean generalizing from lines to surfaces, solids and hypercubes! But they are prone to overfitting the data, and they need a large amount of data to prevent it.
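To make the "best fit" idea concrete, here is a minimal sketch of the simplest case: an ordinary least-squares line through a handful of points. The function name `fit_line` and the sample data are made up for illustration. A neural net does something similar in spirit, but with far more parameters and a much more flexible shape.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noisy samples lying roughly along y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

With only two parameters, this fit cannot chase the noise in individual points. A model with many more parameters can, which is why overfitting becomes a concern and more data is needed.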

Neural nets have been found to excel at image recognition (convolutional neural nets) and speech recognition (long short-term memory networks), so they should adapt well to stereo images.

Where do you find a large data set for such a neural net to train on? The plan is to input the stereo images and output a depth map of the space: an image where the brightness of each pixel correlates with its distance from the camera. The data set must contain matching pairs of input and output to train the neural net, and the net likely needs millions of samples to train well. There are 2 options:

1. Set up a Microsoft Kinect or specialized LIDAR in conjunction with stereo cameras and take a million photo scans.
2. Use a video game with realistic lighting, set up the view with 2 cameras, and calculate the depth map from the virtual world.

I decided option 2 would be the more viable way to get a large data set quickly.
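As a rough sketch of what the training target could look like, the snippet below turns a grid of per-pixel distances into 8-bit brightness values. Everything here is an assumption for illustration: the function name `depth_to_brightness`, the `near` and `far` clipping distances, and the choice that nearer means brighter (the mapping could just as well be inverted). A game engine would supply the real distances.

```python
def depth_to_brightness(depth, near, far):
    """Map distances to 0-255 brightness (assumed: nearer = brighter)."""
    span = far - near
    image = []
    for row in depth:
        out_row = []
        for d in row:
            # Clamp so anything beyond the chosen range saturates.
            clamped = min(max(d, near), far)
            # 0.0 at the near limit, 1.0 at the far limit.
            t = (clamped - near) / span
            out_row.append(round(255 * (1.0 - t)))
        image.append(out_row)
    return image


# Hypothetical 2x2 grid of distances from the camera, in arbitrary units.
depth = [[1.0, 4.0], [10.0, 25.0]]
image = depth_to_brightness(depth, near=1.0, far=10.0)
print(image)
```

Each training sample would then pair a left and right image with one such brightness grid rendered from the same viewpoint.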


Thursday, March 31, 2016
