New AI method generates 3D point clouds from 2D images

Proximity Attention Point Rendering (PAPR) uses a novel machine learning technique to create fully editable 3D shapes.

Taking a 2D image and transforming it into a 3D object is such a fundamental part of engineering that it goes all the way back to da Vinci’s famous drawings. We’ve come a long way since then, from photogrammetry to computer-aided design (CAD). The latest innovation comes from a team of AI researchers at Simon Fraser University.

Their technique, called Proximity Attention Point Rendering (PAPR), can take a set of images of an object captured from different angles, along with information about the corresponding camera poses, and use them to create a point-based surface representation and rendering pipeline from scratch.

What sets PAPR apart from other recent approaches to 3D reconstruction is its ability to create shapes that can be easily edited. One earlier alternative, neural radiance fields (NeRFs), is too cumbersome to edit because it requires users to provide a description of what happens to every continuous coordinate with every edit. A more recent approach, 3D Gaussian splatting (3DGS), generates shapes that can lose cohesion when edited. This is because the so-called splatting technique distributes the influence of each point over its neighbours, which helps with surface resolution but also makes local adjustments difficult.

In contrast, PAPR represents each point in a 3D cloud as a control point in a continuous interpolator. As a result, when a single point is moved, the rest of the shape changes automatically in an intuitive way. Mathematically defining an interpolator that was up to the task was the researchers’ principal challenge, but machine learning (ML) provided a solution in the form of proximity attention, a mechanism that helps ML models capture local information and contextual dependencies more effectively.
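
To make the control-point idea concrete, here is a minimal Python sketch of a distance-weighted interpolator in one dimension. It is an illustration only, not the researchers' method: PAPR learns its attention from images, whereas this toy uses a fixed softmax over squared distances. The function names, the `temperature` parameter and the sample values are all assumptions made for the example.

```python
import math

def proximity_weights(x, control_xs, temperature=0.25):
    """Softmax over negative squared distances: nearer control points weigh more.

    Hypothetical helper for illustration; the weighting rule is hand-written,
    not learned as in PAPR.
    """
    scores = [-((x - cx) ** 2) / temperature for cx in control_xs]
    top = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def surface_height(x, control_xs, control_ys, temperature=0.25):
    """Height of the interpolated curve at x, blended from all control points."""
    weights = proximity_weights(x, control_xs, temperature)
    return sum(w * y for w, y in zip(weights, control_ys))

# Four control points on a flat line (assumed toy data).
control_xs = [0.0, 1.0, 2.0, 3.0]
flat_ys = [0.0, 0.0, 0.0, 0.0]

# "Edit" the shape by lifting only the second control point.
edited_ys = [0.0, 1.0, 0.0, 0.0]

at_point = surface_height(1.0, control_xs, edited_ys)   # at the moved point
nearby = surface_height(1.5, control_xs, edited_ys)     # halfway to a neighbour
far_away = surface_height(3.0, control_xs, edited_ys)   # two points away

print(at_point, nearby, far_away)
```

Lifting one control point raises the curve strongly at that point, partially between it and its neighbour, and almost not at all farther away. That is the kind of smooth, local response to an edit that the paragraph above describes, here under the simplifying assumption of a fixed weighting rule.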

“The remarkable success of machine learning in areas like computer vision and natural language is inspiring researchers to investigate how traditional 3D graphics pipelines can be re-engineered with the same deep learning-based building blocks that were responsible for the runaway AI success stories of late,” said Ke Li, assistant professor of computer science at Simon Fraser and senior author of the paper, in a press release.

“It turns out that doing so successfully is a lot harder than we anticipated and requires overcoming several technical challenges. What excites me the most is the many possibilities this brings for consumer technology – 3D may become as common a medium for visual communication and expression as 2D is today.”

Li may be excited about the consumer applications, but the engineering possibilities are much more tantalizing. Imagine being able to create a CAD model of an out-of-production part from a few photos snapped on your smartphone. Given the interactions between points, it might even be possible to train PAPR models to recognize entire assemblies and how they function.

Obviously, there’s more work to be done. But with the speed at which AI is advancing, it wouldn’t be surprising if this tool becomes available to engineers sooner than you might expect.