A second look at Polycam, in a conversation with company cofounder Elliott Spelman.
Polycam, the most popular scanning app on Apple’s App Store and perhaps in the world, has found an ingenious way to use AI in its scanning. Polycam users will be able to create a totally immersive view as easily as they create a panorama. The normal process for getting an immersive view involves moving the camera up and down while rotating, what another iPhone-using scanner refers to as a “painting the wall” approach. But Polycam is able to take the panorama and guess the rest, as it were.
You can get a full 360-degree view of an outdoor scene in less than a minute.
Polycam uses a process it has named “Stable Diffusion Complete,” which is based on Stable Diffusion, a technology similar to the one used by media sensation DALL-E to generate fanciful pictures, except that Stable Diffusion runs on your computer rather than in the cloud. To use Stable Diffusion, users must have a GPU and 8GB of VRAM. While DALL-E and Stable Diffusion generate, inpaint and outpaint images (more on that later), Stable Diffusion Complete appears to generate a picture based on the image itself, essentially guessing what the rest of the sky or ground would look like. It takes what is normally a thin, low-resolution strip of stitched-together images and puts you in the center of a full-resolution view that spans 360 degrees about the vertical axis (that’s the yaw axis if you are flying a plane) and a full 180 degrees up and down (the pitch axis).
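To make the geometry concrete, here is a minimal sketch of how a viewing direction (yaw and pitch) lands on a 360-by-180-degree image, and how much of that image a level sweep leaves blank for the AI to fill. It assumes the common equirectangular layout, which the article does not confirm Polycam uses, and the function names and the 60-degree vertical field of view in the usage note are illustrative, not Polycam's.

```python
def direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to (column, row) in an equirectangular image.

    yaw_deg:   rotation about the vertical axis; wraps every 360 degrees.
    pitch_deg: elevation, from -90 (straight down) to +90 (straight up).
    Assumption: column 0 is yaw 0 and row 0 is the zenith.
    """
    if not -90.0 <= pitch_deg <= 90.0:
        raise ValueError("pitch must be within [-90, 90] degrees")
    u = (yaw_deg % 360.0) / 360.0      # 0..1 across the full turn
    v = (90.0 - pitch_deg) / 180.0     # 0 at the zenith, 1 at the nadir
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return col, row


def captured_rows(vertical_fov_deg, height):
    """Top and bottom image rows covered by a level sweep of the camera."""
    half = vertical_fov_deg / 2.0
    top = direction_to_pixel(0.0, half, 1, height)[1]
    bottom = direction_to_pixel(0.0, -half, 1, height)[1]
    return top, bottom


def fraction_to_synthesize(vertical_fov_deg):
    """Share of the image's rows that the sweep did not capture."""
    return 1.0 - vertical_fov_deg / 180.0
```

With a hypothetical 60-degree vertical field of view, `fraction_to_synthesize(60)` says about two-thirds of the rows of the finished image would be sky and ground the camera never saw, which is the part the outpainting has to guess.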
Meeting Elliott
If Elliott Spelman, cofounder of Polycam, doesn’t strike you as the typical Silicon Valley startup founder, it’s because he is not. Spelman is a designer by education with a master’s in fine arts from Stanford. Being at the helm of a cutting-edge imaging technology is fun for Spelman.
Spelman wants scanning to be fun for everyone—not just professionals with expensive LiDAR rigs. We watch Spelman’s short instructional video that demonstrates how simple it is to take a full 360-degree image.
Sure enough, that is all that is needed to record a full 360-degree immersive image. Polycam may have ruined panoramas forever. After seeing the big picture that Polycam can create, a breathtaking, sky-to-ground, all-around and (almost) seamless shot, could you ever go back to the thin, distorted, low-resolution image you get from Apple?
How?
Polycam creates this magic with AI, explains Spelman in a Teams meeting. We expect no less from a company with AI in its domain name.
Although several 360 imagers have the ability to create a high-resolution immersive view (e.g., Matterport, Envisioneer, Luxolis), all of them require that your camera view includes the sky and the ground.
I take it out of the office for a spin. Polycam uses the full resolution of the iPhone’s camera and its image stabilization to take a series of pictures on its own as I rotate in place. The images are uploaded and Polycam returns with what at first looks like any other picture taken with an iPhone. And then you press on the image with your finger, and lo and behold, it’s a full 360-degree, immersive image. Polycam has added a fake sky and ground. It’s pretty good fakery. You have to squint to see where reality stops and fakery starts. The ground below is an excellent guess at what would be underfoot—without me having to extricate myself from the picture.
The stitching together of images in the 360-degree sweep is not perfect, though in all fairness that may be Apple’s imperfection. It’s a rare overcast May day in the Bay Area, and in one place the sky abruptly darkens from left to right. But overall, the result is impressive and the process effortless. The scan took less than a minute, and I didn’t have to wait more than a minute for the processing. Studying the image, I see that tilt has been magically added to the pan. The sky and ground were completely filled in.
“It’s called inpainting,” says Spelman.
Outpainting would be more appropriate, but okay.
How well does it work inside? I ask.
The AI has its limitations, Spelman admits. It can create ceilings, but it may have trouble in the Sistine Chapel.
Should you, though? There are only a million pictures of the Sistine Chapel’s ceiling, snapshots shared on Instagram, for example. In fact, Autodesk was able to create 3D models of the statues destroyed by the Taliban from publicly available photos on Google. Surely, your AI could use photos similarly?
Spelman tolerates impertinent questions graciously. He could have reminded me that Autodesk has over 10,000 employees, whereas Polycam has but 15. It is another sign that he is not your typical hyper, type-A startup founder.
But the lightweight Polycam punches above its weight. Its reality capture is easier to use than Autodesk’s ReCap Pro. Instead of clicking away and taking photos, Polycam takes pictures continuously as needed all on its own. And it generates better images faster than the “pop-up” metaverse seen at PTC’s recently held LiveWorx conference.
Better Than LiDAR?
While having a LiDAR scanner in your pocket, as with the latest iPhones, is initially exciting, the lack of accuracy is something of a detriment. Also, the images from the camera look better.
Responding to a less-than-favorable review of LiDAR scanning in our first look at Polycam, Spelman has a quick solution.
“You should have used the photo mode.”
It’s unexpected that LiDAR is less accurate than photogrammetry, but here we are. The iPhone, even the biggest one, is too small for multiple, accurate LiDAR sensors, which leaves it with an accuracy of plus or minus one inch. Professional LiDAR systems, on the other hand, command millimeter-level accuracy.
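A quick back-of-the-envelope calculation shows what that gap means on a real measurement. This is only a sketch: the 10-foot wall is a made-up example, and 1 mm is assumed here as a stand-in for "millimeter-level" accuracy.

```python
INCH_MM = 25.4  # millimeters per inch


def relative_error_percent(tolerance_mm, length_mm):
    """Measurement tolerance expressed as a percentage of the measured length."""
    return 100.0 * tolerance_mm / length_mm


# Hypothetical 10-foot wall, converted to millimeters.
wall_mm = 10 * 12 * INCH_MM

# iPhone LiDAR at plus or minus one inch, per the article.
phone_pct = relative_error_percent(1 * INCH_MM, wall_mm)

# Professional LiDAR, assuming plus or minus 1 mm for "millimeter-level".
pro_pct = relative_error_percent(1.0, wall_mm)

print(f"iPhone: +/-{phone_pct:.2f}%  professional: +/-{pro_pct:.3f}%")
```

Under these assumptions the phone's tolerance is a little under 1 percent of the wall's length, roughly 25 times looser than the professional figure.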
Why not have professional-quality LiDAR on Apple’s much larger iPad? I have to wonder.
About Polycam
Spelman and Polycam’s other founder, the equally boyish-looking Chris Heimrich, left a scanning company to create an app for simple, easy-to-use scanning. They have taken a page out of the Autodesk playbook, essentially democratizing a sophisticated process once available only to the elite. Anyone can take their smartphone and create a 3D picture with Polycam. And have they ever. The Polycam site (“3D capture for anyone”) gives them all gallery space.
Yet, Spelman insists that Polycam is useful for the professional. It outputs its LiDAR and photogrammetry data to standard formats—for example, DXF for AutoCAD.
It was the iPhone’s ability to store the images that made all the difference. Without it, waiting for each photo to upload would put too much lag in the shooting.
Also, unlike every other startup, Polycam is now profitable. Polycam’s revenue model is simple: $79.99 a year for unlimited use. There is also a free version, which limits usage and export formats.
The free introductory version encourages engagement, and the all-you-can-eat price promotes continued use. Other pricing schemes I have seen, such as price per square foot, discourage continued use.