It doesn't seem to work that well. Once you move off the primary camera axis (rotate around, for example), you notice that there are many regions with only sparse resolution and gaps everywhere. It is totally unusable for anything.
Sure, it solves for the primary view, but this is claiming it is a 3D scene reconstruction/inference technique and in that claim it only sort of works.
What's the use case for putting AI into everything? Pretty much every AI product so far has been, and still is, subject to hallucinations and inaccuracies, and on top of that it's hugely computationally intensive. Sure, it's the best we have right now, and it allows us to do things that were previously next to impossible with manual programming, but it's far from being something that's actually viable. And what would be the use case for turning a picture into an approximate 3D mesh that is only really complete from one angle? LIDAR already does a stunningly accurate job at that, reproducibly (granted, it can't be applied retroactively to existing photos).
So I agree with you, but to be fair it is neat, and I think academia should be allowed to try things with little to no *immediate* commercial value. Being "neat" is enough IMO if there are enough resources to go around.
In the long run, yeah, this *exact* application is sort of pointless. I expected to see the lens parameters factored into the process. They aren't. That means everything is not only dimensionally inaccurate, since there's no reference measurement, but also proportionally inaccurate relative to other things in the scene. You can actually see the effect of that in the "flower car" example (the entire shape of the car is warped). Not to mention that everything that can't be seen in the original photo is made up.
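To make the lens-parameter point concrete, here is a minimal pinhole-camera sketch. It is not Bolt3D's code: `project`, `unproject`, and the focal lengths are made-up illustration values, and it assumes depth is recovered correctly while only the focal length is guessed. It shows that (1) one image can't pin down absolute scale and (2) a wrong focal length stretches an object's width relative to its depth.

```python
# Hypothetical pinhole-camera sketch, not Bolt3D's implementation.

def project(point, f):
    """Project a camera-space point (x, y, z), z > 0, to image coords with focal length f."""
    x, y, z = point
    return (f * x / z, f * y / z)

def unproject(pixel, depth, f):
    """Lift an image point back to 3D, given its depth and an assumed focal length f."""
    u, v = pixel
    return (u * depth / f, v * depth / f, depth)

# Two corners of an object 2 units wide at z = 5, plus a point 2 units further back.
scene = [(-1.0, 0.0, 5.0), (1.0, 0.0, 5.0), (1.0, 0.0, 7.0)]
TRUE_F = 50.0

# (1) Scale ambiguity: a scene 10x larger and 10x further away projects identically.
big = [(10 * x, 10 * y, 10 * z) for (x, y, z) in scene]
same_image = all(
    abs(a - b) < 1e-9
    for p, q in zip(scene, big)
    for a, b in zip(project(p, TRUE_F), project(q, TRUE_F))
)

# (2) Proportion error: keep the correct depths but assume a different focal length.
def width_to_depth_ratio(f_assumed):
    pts = [unproject(project(p, TRUE_F), p[2], f_assumed) for p in scene]
    width = pts[1][0] - pts[0][0]   # lateral extent at z = 5
    depth = pts[2][2] - pts[1][2]   # extent along the viewing axis
    return width / depth

print(same_image)                   # True
print(width_to_depth_ratio(50.0))   # 1.0 with the true focal length
print(width_to_depth_ratio(35.0))   # about 1.43: the object comes out too wide
```

Real photos add lens distortion on top of this, which only makes the guesswork worse.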
Maybe someone would use this to make game assets? But you'd need to fix them up a ton before using them. Other sibling comments point out that there are no wireframes... so we can assume the polygon count here is insane.
Every single time a new "Generate 3D" thing appears, they never show the wireframes of the objects/scenes up front; you always need to download and inspect things yourself. How is this not standard practice already?
Not displaying the wireframes at all, or not even offering sample files so we could at least see for ourselves, just makes it look like you already know the generated results are unusable...
They usually don't show the material channels either, which I assume is because there aren't any, and instead the lighting is statically baked into the asset. That works for a demo where you just wiggle the camera in a circle, but it'll immediately fall apart if the lighting environment changes or anything in the scene moves.
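A minimal sketch of why baked lighting falls apart, assuming a purely diffuse (Lambertian) surface; the names and values are hypothetical. A relightable asset keeps material channels (here just albedo) plus normals and shades at render time, while a baked asset only keeps the shaded result for one fixed light.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def lambert(albedo, normal, light_dir):
    """Diffuse shading: albedo * max(0, n . l), applied per color channel."""
    n_dot_l = max(0.0, sum(a * b for a, b in zip(normalize(normal), normalize(light_dir))))
    return tuple(c * n_dot_l for c in albedo)

albedo = (0.8, 0.2, 0.2)   # the material channel a relightable asset would store
normal = (0.0, 0.0, 1.0)

# Baked asset: only the shaded color for a light straight overhead survives.
baked = lambert(albedo, normal, light_dir=(0.0, 0.0, 1.0))

# Move the light 60 degrees off the normal; only albedo + normal can produce this.
new_light = (math.sin(math.radians(60)), 0.0, math.cos(math.radians(60)))
relit = lambert(albedo, normal, new_light)

print(baked)   # (0.8, 0.2, 0.2): frozen, still shows the old lighting
print(relit)   # roughly (0.4, 0.1, 0.1): half as bright under the moved light
```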
Apples and oranges. This isn't for 3D models and scenes; think of it as a fancy version of Street View or an apartment walkthrough. Those project pictures onto a sphere around you. Gaussian splats are an improvement that uses multiple images to interpolate viewpoints: take two pictures from different angles, and Gaussian splats let you look from between those views, or from above or below them.
This method uses AI to generate even more unseen structure, so with relatively few images you can still represent a real scene with some level of fidelity. It will never need dynamic lights or animation, because the point is just to look as close as possible to a still image. Splats do that FAR better and more efficiently than you ever could with dynamic lighting, triangulated models, and visual effects.
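Roughly how a splat renderer turns overlapping Gaussians into one pixel color, as a simplified sketch: real Gaussian splatting projects anisotropic 3D Gaussians to 2D and sorts them per screen tile, whereas this assumes isotropic 2D footprints and a single pixel. The dictionary fields are hypothetical.

```python
import math

def gaussian_alpha(pixel, center, sigma, opacity):
    """Opacity of an isotropic 2D Gaussian footprint at a pixel (a simplification)."""
    dx, dy = pixel[0] - center[0], pixel[1] - center[1]
    return opacity * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def composite(pixel, splats):
    """Front-to-back alpha compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda sp: sp["depth"]):   # nearest splat first
        a = gaussian_alpha(pixel, s["center"], s["sigma"], s["opacity"])
        for k in range(3):
            color[k] += transmittance * a * s["color"][k]
        transmittance *= 1.0 - a
        if transmittance < 1e-4:   # stop once the pixel is effectively opaque
            break
    return tuple(color)

splats = [
    {"center": (0.0, 0.0), "sigma": 1.0, "opacity": 1.0, "depth": 5.0, "color": (0.0, 0.0, 1.0)},
    {"center": (0.0, 0.0), "sigma": 1.0, "opacity": 0.6, "depth": 2.0, "color": (1.0, 0.0, 0.0)},
]
print(composite((0.0, 0.0), splats))   # red splat in front, blue showing through: (0.6, 0.0, 0.4)
```

The lighting is whatever color each splat stores, which is why this is cheap for a still scene and why it can't be relit.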
I see your point, but also consider that interactivity comes to mind partly because 3D models are so expensive to describe, compared to 2D shapes, that they've largely only been worth it for interactive stuff. With a lower barrier to entry, we might see more innovation on that front.
Plus, scaling dynamic lighting up has always been the Big Bad of computer graphics, and precomputation will always give us an amazing heuristic to use against it. Everything else basically tends towards not mattering: we can only absorb a finite number of details, but we live in a world with virtually infinite lights.
Honestly, I thought this was the most practical and usable example of AI generation I've seen to date. I actually found it refreshing after all the guff we usually see.
I bet in a couple of years it'll be standard for estate agents to show 3D views like this on their web sites, architects converting quick paintovers of existing sites to 3D models, improvements to Street View, and so on. Anywhere where you want a quick 3D view of a space based on a few photos taken on a smartphone and where accuracy isn't 100% important.
For things like games, it still follows the existing photogrammetry workflow (with all of those problems), but it might reduce the number of photos needed to create a point cloud.
Yeah, but isn't the expected outcome still to end up with actual 3D objects, not point clouds? Or have people started integrating point clouds into their 3D workflows already? Apart from things like volumes, I think most of us are still stuck working with polygons for 3D.
Agree, I'm not sure why you'd think that's the only use case for 3D, unless I misunderstand your argument here.
How would you handle visual effects with point clouds, for example? There are so many use cases for proper 3D, and the only use case I can think of for point clouds is environments with static lighting, which seems like a really small part of what people generally consider "3D scenes".
Maybe I missed the mark on “gamedev”, but 3D is larger than just “aesthetically pleasing 3D VFX” for its own sake.
Often I’m trying to use something as a reference for a design where a 3D model isn’t the actual end goal, or I’m performing analytics on a 3D object (in my case, a lot of GIS and simulation work).
The whole “mesh is the be all and end all of 3D modelling” attitude irks me: yes, it’s a really important way of representing an object (especially under real-time constraints), but it doesn’t do justice to the full landscape of techniques and uses for 3D.
It would be like 2D sprite artists from the gamedev world saying “what’s the point of all this vector art you illustrators are doing” or “what’s the point of all these wireframe designs you graphic designers are doing” - “these aren’t raster images!”
I suppose my snipe was trying to communicate the idea that 3D is larger than just a vehicle for entertainment production. It intersects many industries that may eschew polygons because real-time rendering is irrelevant to them.
3D tooling has uses beyond producing 3D scenes, just as Photoshop is used for more than touching up photographs
Edit: for anyone stuck in a rut with meshes, come join the dark side with NURBS. It makes you think about modelling in a radically different way (the unfortunate side effect is that it makes working with meshes feel so, so “dirty”).
The whole “mesh is the be all and end all of 3D modelling”
No one said this. It seems like you are arguing against points nobody made instead of dealing with the actual questions the person you replied to asked.
You can view point clouds and you can warp them around, but working with them and tracing rays becomes a different story.
Once you need something as a jumping off point to start working with, point clouds are not going to work out anymore. People use polygons for a reason. They have flexible UVs, they can be traced easily, they can be worked with easily, their data is direct, standard and minimal.
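To illustrate "they can be traced easily": a ray against a triangle is a closed-form test, shown here with the standard Möller-Trumbore algorithm as a minimal sketch (the helper names are my own). A bare point cloud has no surface for the ray to hit until you connect the points or give each point some extent.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore: return the distance t along the ray to the hit, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:               # ray is parallel to the triangle's plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv              # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv      # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None    # ignore hits behind the ray origin

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri)
miss = ray_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri)
print(hit, miss)   # 1.0 None
```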
Games are the least of it: the vast majority of scientific applications involving physics use meshes rather than point clouds.
This is because a point cloud does not represent a surface or a volume until the points are connected to form, well, a surface or a volume.
And physical problems are most often defined over surfaces or volumes. For instance, waves don't propagate over sparse sets of points, but within continuous domains.
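For the simple case of an organized point cloud (for example, one point per pixel of a depth map), connecting the points just means splitting each grid cell into two triangles. This is a hypothetical sketch: unorganized scans need real surface reconstruction (Delaunay-based methods, ball pivoting, Poisson reconstruction), and it ignores depth discontinuities. Once the points are connected, quantities defined over the surface, like its area, become computable.

```python
import math

def grid_to_triangles(width, height):
    """Connect a row-major width x height grid of points into triangles (two per cell)."""
    tris = []
    for r in range(height - 1):
        for c in range(width - 1):
            i = r * width + c                                 # top-left corner of this cell
            tris.append((i, i + 1, i + width))                # upper-left half
            tris.append((i + 1, i + width + 1, i + width))    # lower-right half
    return tris

def surface_area(points, tris):
    """Sum of triangle areas: half the length of each edge cross product."""
    total = 0.0
    for a, b, c in tris:
        u = [points[b][k] - points[a][k] for k in range(3)]
        v = [points[c][k] - points[a][k] for k in range(3)]
        cx = u[1] * v[2] - u[2] * v[1]
        cy = u[2] * v[0] - u[0] * v[2]
        cz = u[0] * v[1] - u[1] * v[0]
        total += 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)
    return total

# A flat 3 x 3 grid of points with spacing 1 (think: a tiny depth map of a wall).
points = [(float(c), float(r), 0.0) for r in range(3) for c in range(3)]
tris = grid_to_triangles(3, 3)
print(len(tris))                    # 8 triangles for 2 x 2 cells
print(grid_to_triangles(2, 2))      # [(0, 1, 2), (1, 3, 2)]
print(surface_area(points, tris))   # 4.0
```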
However, for applications where geometric accuracy is needed, I think you wouldn't want to use a method based on a minimal number of photographs anyway. For instance, the Lascaux cavern was mapped in 3D a decade ago using "good old" algorithms (not machine learning) and instruments (more sophisticated than a phone camera). So these critiques are missing the point, in my opinion. These Gaussian Splatting methods are very impressive for the constraints they operate under!
I don't know what you mean by lacking structure, but perhaps you are not aware of all the tools that exist, because fixing surface meshes is a rather classic problem. Just search for "surface remeshing" or "surface mesh optimization" on Google Scholar and you'll see thousands of results.
This is a separate problem from triangulation (turning point clouds into meshes), solved with entirely different algorithms. It's likely the software you used assumes the user will then turn to other software to improve the surface mesh.
Even for operations that naturally come in sequence, you will often find that the software for each step is separate. For instance, turning CAD into a surface mesh is done by one program, turning a surface mesh into a tetrahedral volume mesh by another (and if it's hexahedra, by yet another), and optimizing or adapting those meshes by yet another. And yet these steps are carried out every time an engineer goes from CAD to physical simulation. So it's entirely possible the triangulation software you used does not implement any kind of surface optimization and assumes the user will find something else to deal with that.
If you wanted to show someone a walkaround of the Sistine Chapel or David, would you be better off using triangles, PBR, and raycast lighting? You don't really gain anything from all that; you're doing a tremendous amount of computation just to recapture the particular lighting at an exact moment. If you want the same detail that a few good pictures capture (tens of millions of pixels), you need many billions of triangles onscreen.
With splats you can have incredibly high fidelity with identical lighting and detail built in already. If you want to make a game or a movie, don't use splats. If you want to recreate a static scene from pictures, splats work very well.
Splats augment 3D scenes; they don't replace them. I've seen them used for AR/VR, photogrammetry, and high-performance 3D. Going from splats to a 3D model would be a downgrade in terms of performance.
Yeah, someone can say, “Look, we have just created the first color computer and it displays images. Look at this first ever real life photo on this digital screen!” There will always be the people who ask, “Yeah, but does it run Photoshop?”
Isn't https://svraster.github.io/ just superseding Gaussians? Voxels are also not meshes, but might they not prove even more useful for upcoming rendering engines?
I think there should be a standard set of images for comparison, because I've never seen a mesh generator readme that wasn't impressive. I test each one I get my hands on and the results are often disappointing.
This is previous work. Bolt3D uses the same principle of predicting a per-pixel Gaussian splatting representation, but it also trains a diffusion model, which is only feasible if you have substantial compute available.
Given that it's work done at Google, I don't expect them to release source code. But it will be reproduced by someone else soon enough.
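A hedged sketch of what "per-pixel Gaussian" means: each input pixel emits one Gaussian, placed at a predicted depth along that pixel's camera ray. This is not Bolt3D's actual parameterization; the intrinsics (`fx`, `fy`, `cx`, `cy`), the isotropic scale heuristic, and the `Gaussian` fields are assumptions, and the small depth map stands in for whatever the network predicts.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    mean: tuple      # 3D center in camera space
    scale: float     # isotropic radius (real methods predict anisotropic scales + rotation)
    color: tuple
    opacity: float

def pixels_to_gaussians(depth, rgb, fx, fy, cx, cy, opacity=1.0):
    """Emit one Gaussian per pixel, placed at that pixel's depth along its camera ray."""
    gaussians = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            # Assumed heuristic: size each Gaussian to roughly one pixel's footprint at depth d.
            gaussians.append(Gaussian((x, y, d), d / fx, rgb[v][u], opacity))
    return gaussians

depth = [[2.0, 2.0], [4.0, 4.0]]   # stand-in for a predicted depth map
rgb = [[(1, 0, 0), (0, 1, 0)], [(0, 0, 1), (1, 1, 1)]]
gs = pixels_to_gaussians(depth, rgb, fx=100.0, fy=100.0, cx=0.5, cy=0.5)
print(len(gs))      # 4: one Gaussian per input pixel
print(gs[3].mean)   # (0.02, 0.02, 4.0)
```

The output is a pile of Gaussians rather than a connected mesh, which is why questions about wireframes and polygon counts don't map directly onto this kind of method.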
Impressed with the Bolt3D AI model!
- Speed of the 3D generation,
- Accurate 3D mesh deduction.
It's a wonderful shock.
I agree, this is the way forward:
- "Some photos" as input.
- Convenient, a camera is in every pocket (Smartphone).
On weekends, I have been trying for years to generate 3D from photos. My tool now works well, but there is still the big problem of the time it takes to "recreate" the 3D mesh from photos. Remember that photos are in ... 2D. Not convenient.
Here is an example of my tool's output: https://free-visit.net/fr/demo01
Here, Bolt3D turns 4 hours of cumbersome work into an automatic process. Wahoo!
Sure, it solves for the primary view, but this is claiming it is a 3D scene reconstruction/inference technique and in that claim it only sort of works.
For example: https://i.postimg.cc/43tj36jv/Screenshot-2025-03-20-at-8-52-...