The problem of using computer vision to compute distances to moving 3D objects in any weather conditions is too general - it is precisely the generalized vision problem I was talking about.

Ultrasonic, radar, or lidar sensors are much simpler solutions to the distance problem, since they measure range directly rather than relying on computer vision.
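
To see why these sensors sidestep the vision problem entirely, here is a minimal sketch: a lidar or radar return gives range directly from the pulse's time of flight, with no image interpretation involved. The 200 ns round-trip time below is purely illustrative.

    # Range directly from time of flight: distance = c * dt / 2
    SPEED_OF_LIGHT_M_S = 299_792_458.0

    def range_from_time_of_flight(round_trip_s):
        # Half the round trip, times the propagation speed, is the one-way range.
        return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

    print(f"{range_from_time_of_flight(200e-9):.1f} m")  # a 200 ns echo -> ~30 m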

Note that the reason this problem is so hard for computer vision is that there is simply not enough information in a 2D picture to recover 3D distances, even with parallax. Our eyes can't solve this problem either. Instead, our visual system uses numerous heuristics based on our inherent understanding of simple classical physics and of the world around us.

For example, we recognize that a particular blob of color represents a car, and that cars are solid objects whose parts all move at the same speed (an assumption that breaks if something flies off the car). We recognize that objects cast shadows, and we know what effect shadows have on color, so we can easily trace the contours of an object even when it is speckled in light and shadow. We also know the approximate size of a car, which lets us immediately distinguish a distant car from a nearby one. And we know that cars on the road are probably moving while cars on the sidewalk are probably parked, which again helps us estimate their motion from relatively little data.
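
To illustrate just the known-size heuristic, here is a minimal sketch assuming a simple pinhole camera model; the focal length and the car width are made-up numbers, not measurements from any real system.

    # Monocular range estimate from a known object size (pinhole camera model):
    #     distance = focal_length_px * real_width_m / apparent_width_px
    def distance_from_known_size(focal_length_px, real_width_m, apparent_width_px):
        # An object of known physical width looks smaller the further away it is.
        return focal_length_px * real_width_m / apparent_width_px

    focal_length_px = 1000.0  # assumed camera focal length in pixels
    car_width_m = 1.8         # roughly how wide we "know" a car to be
    for apparent_px in (300, 100, 30):
        d = distance_from_known_size(focal_length_px, car_width_m, apparent_px)
        print(f"car spanning {apparent_px:3d} px -> roughly {d:5.1f} m away")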

If you don't believe that depth perception rests on far more than parallax, try to explain how people and animals with a single eye can still estimate distances quite well (not well enough to be marksmen, but more than enough not to run into walls).



> there is simply not enough information in a 2D picture to get 3D distance information, even with parallax. Our eyes also can't solve this problem.

That is certainly false. Perhaps with extremely limited amounts of parallax, yes, but that isn't how this setup works. Humans lean on heuristics not only because our brains are capable of it, but also because our eyes are only a few centimeters apart; with enough parallax you can get phenomenal 3D information out of a visual system. Replacing a single sensor here is not too general a problem to solve well. And my point was never that this is an optimal or perfect solution, only that evolutionary arguments are stupid here, because we build machines and algorithms every day that operate better than anything evolution produced.
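
For what it's worth, the standard rectified-stereo relation makes the baseline point concrete. A minimal sketch, assuming Z = f * B / d and an arbitrary one-pixel disparity error; the focal length and baselines below are made-up numbers.

    # Stereo depth from disparity: Z = f * B / d  (rectified cameras).
    # Differentiating gives the depth uncertainty for a disparity error dd:
    #     dZ ~ Z**2 / (f * B) * dd
    # so depth resolution at range improves linearly with the baseline B.
    def depth_error(distance_m, focal_px, baseline_m, disparity_err_px=1.0):
        # Approximate depth uncertainty at a given range for a stereo rig.
        return distance_m ** 2 / (focal_px * baseline_m) * disparity_err_px

    focal_px = 1000.0
    for baseline_m in (0.065, 0.5, 1.5):  # eye-like spacing vs. wider camera rigs
        err = depth_error(50.0, focal_px, baseline_m)
        print(f"baseline {baseline_m:5.3f} m -> ~{err:5.1f} m error at 50 m")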



