A new technology developed by Qi Pan and fellow researchers at the University of Cambridge lets users create 3D models on the fly by manipulating an object in front of a webcam. The reconstruction of the 3D model from the video can be viewed in real time as the user moves and rotates the object. The program is called ProFORMA, and Pan says it will be publicly released soon.
The following video gives an excellent demonstration.
Previous work has allowed reconstruction of 3D models from photos or video, but it has been limited to offline processing: the algorithm takes a complete piece of video and builds a 3D model from it, so the user cannot capture new perspectives until after the whole model is built. The clear advantage of real-time (or online) processing is that the user can watch the model being built from the video as it is recorded, and can take more video of the object in different positions to correct any problems in the 3D model as they arise.
Some examples of offline model reconstruction include Microsoft’s Photosynth, Stanford’s Make3D, and the University of Adelaide’s Video Trace.
How does it work?
The program uses a single camera and commodity hardware. The demonstration in the video was performed with a 2.4 GHz Intel dual-core processor and a Logitech Quickcam Pro 9000 (640 x 480 @ 15 fps). The program assumes that a single object is being modeled; it cannot model multiple objects simultaneously.
The video camera must be kept stationary and only the object to be modeled is moved and rotated.
The program can be divided into 5 steps:
- Image capture
- Extraction of point cloud
- Delaunay tetrahedralization
- Tetrahedra carving
- Texturing the surface mesh
We will describe each step.
In image capture, the goal is to distinguish the object from the background and the user’s hand, and to collect enough data to form a point cloud representing the object in 3D space. This is done by tracking many small subwindows of the video (say, 5 x 5 pixels), which are called features. Because the camera is not moving, it is easy to determine which features are capturing part of the background: they remain stationary. Features falling on the user’s hand can also be identified easily, because the hand moves in and out of the frame and can change shape. The remaining features are clearly part of the object, and each is identified as a landmark on the object.
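This motion-based classification can be sketched in a few lines. The code below is purely illustrative (not ProFORMA's actual implementation, and the thresholds are made-up values): each feature track is a sequence of (x, y) positions over frames, with NaN where the feature was not found.

```python
import numpy as np

# Illustrative sketch: classify a tracked feature by its motion.
# A track is a list of (x, y) positions over frames; NaN marks frames
# where the feature was not found. Thresholds are arbitrary.

def classify_track(track, still_tol=1.0, visibility_min=0.8):
    track = np.asarray(track, dtype=float)
    visible = ~np.isnan(track[:, 0])
    if visible.mean() < visibility_min:
        return "hand"        # appears and disappears -> likely the hand
    spread = track[visible].std(axis=0).max()
    if spread < still_tol:
        return "background"  # barely moves -> static background
    return "object"          # persistent and moving -> landmark on object

still = [(100.0, 50.0)] * 10                      # stationary feature
moving = [(i * 5.0, i * 3.0) for i in range(10)]  # steadily moving feature
print(classify_track(still))   # background
print(classify_track(moving))  # object
```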
Each landmark specifies a point on the object and taken together they form a point cloud that roughly approximates the shape of the object.
From the point cloud, a process called Delaunay tetrahedralization is run, which essentially creates a rough cut of the model that is “too large.” In other words, it contains more volume than necessary, as can be seen in step 3 of the diagram above.
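To get a feel for this step, SciPy's `scipy.spatial.Delaunay` performs the same kind of tetrahedralization on a 3D point set (a random cloud stands in for the object's landmarks here). The union of the resulting tetrahedra fills the convex hull of the points, which is why the initial model is "too large":

```python
import numpy as np
from scipy.spatial import Delaunay

# Toy point cloud standing in for the object's landmarks.
rng = np.random.default_rng(0)
points = rng.random((50, 3))

tetra = Delaunay(points)           # 3D Delaunay -> tetrahedra
print(tetra.simplices.shape[1])    # 4: each simplex has 4 vertices
```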
The model is so rough that some landmarks observed by the camera would be obscured if the real-life object actually looked like the model. Parts of the model are therefore cut away until every landmark the camera saw is visible on the 3D model; this is the tetrahedra carving stage. One of the program’s primary innovations is a new, more accurate tetrahedra carving algorithm. The picture below shows a model carved with an older algorithm (left) and with the researchers’ new algorithm (right). The new algorithm creates a smoother, more accurate model.
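The visibility idea behind carving can be sketched crudely: if the line of sight from the camera to an observed landmark passes through a tetrahedron, that tetrahedron must actually be empty space, so it is removed. This is a greatly simplified stand-in for the paper's carving algorithm (which is probabilistic and more careful), and in practice the landmark list would contain only points actually observed from that camera pose:

```python
import numpy as np
from scipy.spatial import Delaunay

# Simplified visibility carving: mark every tetrahedron that a
# camera-to-landmark sight line passes through, then drop them.
def carve(tetra, camera, landmarks, samples=32):
    doomed = set()
    t = np.linspace(0.05, 0.95, samples)[:, None]   # strictly between endpoints
    for lm in landmarks:
        pts = camera + t * (lm - camera)            # points along the sight line
        hit = tetra.find_simplex(pts)               # -1 if outside all tetrahedra
        doomed.update(int(s) for s in hit if s >= 0)
    return [i for i in range(len(tetra.simplices)) if i not in doomed]

rng = np.random.default_rng(1)
pts = rng.random((30, 3))
tetra = Delaunay(pts)
camera = np.array([5.0, 5.0, 5.0])
kept = carve(tetra, camera, pts)
print(len(kept), "of", len(tetra.simplices), "tetrahedra survive")
```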
The resulting model is a close approximation of the real-life object in many cases. The model can then be skinned with textures captured from the camera to make it look life-like.
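Texturing boils down to knowing which pixels each surface triangle maps to. With a simple pinhole camera model, every 3D vertex projects to a pixel, and those pixel coordinates become the triangle's texture coordinates. A toy sketch (the focal length and principal point are made-up intrinsics, not values from the paper):

```python
import numpy as np

# Pinhole projection: map 3D vertices (camera coordinates, z = depth)
# to pixel coordinates, usable as texture coordinates for the mesh.
def project(vertices, f=500.0, cx=320.0, cy=240.0):
    v = np.asarray(vertices, dtype=float)
    u = f * v[:, 0] / v[:, 2] + cx   # perspective divide by depth
    w = f * v[:, 1] / v[:, 2] + cy
    return np.stack([u, w], axis=1)

tri = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0), (0.0, 0.1, 2.0)]
print(project(tri))
# [[320. 240.]
#  [345. 240.]
#  [320. 265.]]
```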
Note that all of these steps are performed in real-time, so that the user can actually view the 3D model as it is reconstructed by the program.
The program was used to reconstruct 3D models of several objects as seen in the picture below.
Reconstruction of the church took 75 s and reconstruction of the box took 61 s. Most models took about a minute to build, including time for video capture.
In addition to the limitations noted above (the camera must be stationary, and only one object can be modeled at a time), there are a few other constraints. First, the program works best on highly textured objects: it relies on features to identify landmarks, and in the absence of texture, all features tend to look alike. Second, the program generally assumes that the object is in the center of the frame at the start of the video, though it can later be moved around freely.