Are we talking about Tesla's cameras or the "best" cameras? There are smartphone cameras that do depth sensing and HDR, and cameras are cheaper than eyeballs, so combining several of them to get more angular resolution seems fine.
ToF/structured-light cameras are honestly not that capable.
The maximum dynamic range of the eye is ~130 dB. It's very difficult to push an imaging system to work well at the dark end of what the eye can do while keeping any decent frame rate.
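For scale: sensor dynamic range is conventionally quoted as 20·log10(brightest/darkest usable signal), so 130 dB is a contrast ratio in the millions. Quick back-of-the-envelope conversion (pure arithmetic; the 120 dB "HDR automotive sensor" figure is just an assumed comparison point, not a spec for any particular part):

```python
import math

def db_to_ratio(db: float) -> float:
    # Image-sensor convention: dynamic range (dB) = 20 * log10(max_signal / min_signal)
    return 10 ** (db / 20)

def ratio_to_stops(ratio: float) -> float:
    # Photographers count dynamic range in stops (powers of two)
    return math.log2(ratio)

eye_db = 130      # adaptive range of the eye, as quoted above
sensor_db = 120   # assumed figure for a good HDR automotive sensor, for comparison

for name, db in [("eye (adapted)", eye_db), ("HDR sensor (assumed)", sensor_db)]:
    r = db_to_ratio(db)
    print(f"{name}: {db} dB ~ {r:,.0f}:1 ~ {ratio_to_stops(r):.1f} stops")
```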
It's not as different as it used to be, but even so: the Mk. I eyeball does pretty damn well compared to quite fancy cameras.
> There are smartphone cameras that do depth sensing and HDR
Depth sensing is, again, either estimated or done with time-of-flight sensors, which are pretty much short-range lidar. HDR is already used in AV perception, but it still loses to your eyeballs in dynamic range and processing time.
Eyeballs have high dynamic range but slow mode switching. Walk from a bright area to a dark area and it'll take seconds for your eyes to adjust. Cameras are so cheap you can just run a regular day camera and a dedicated night vision camera together; switching between feeds can be done in milliseconds.
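Schematically, the feed switch is just a per-frame compare with a bit of hysteresis. Toy sketch with made-up thresholds, nothing resembling a real AV pipeline:

```python
import numpy as np

DAY_TO_NIGHT_LUMA = 0.08   # assumed threshold on normalized mean luminance
NIGHT_TO_DAY_LUMA = 0.15   # hysteresis gap avoids flickering between feeds

def pick_feed(day_frame: np.ndarray, night_frame: np.ndarray, current: str) -> tuple[str, np.ndarray]:
    """Choose between a day camera and a dedicated night camera per frame.

    Frames are float arrays in [0, 1]. The switch itself is a single compare,
    so it costs microseconds, not the seconds the eye needs to dark-adapt.
    """
    luma = day_frame.mean()
    if current == "day" and luma < DAY_TO_NIGHT_LUMA:
        current = "night"
    elif current == "night" and luma > NIGHT_TO_DAY_LUMA:
        current = "day"
    return current, (day_frame if current == "day" else night_frame)
```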
Robots aren't humans. You need accurate depth perception to maneuver a robot precisely, and you need ground-truth depth measurements to train learned depth perceivers as well as to understand their overall performance. Humans learn it by combining their other senses and integrating over a very long time using very powerful compute hardware (the brain). To date, robots learn it best when you just get the raw supervision signal directly from LiDAR.
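Concretely, "raw supervision signal" usually means projecting the LiDAR returns into the camera frame and penalizing the depth network only at pixels where a return actually landed. Minimal sketch with made-up shapes and names, not from any particular AV codebase:

```python
import numpy as np

def sparse_depth_loss(pred_depth: np.ndarray, lidar_depth: np.ndarray) -> float:
    """L1 loss between predicted depth and projected LiDAR depth.

    pred_depth:  (H, W) network output, metres
    lidar_depth: (H, W) LiDAR returns projected into the image; 0 where no
                 return landed (LiDAR is sparse, so most pixels are 0)
    """
    mask = lidar_depth > 0                      # supervise only where we have ground truth
    if not mask.any():
        return 0.0
    return float(np.abs(pred_depth[mask] - lidar_depth[mask]).mean())

# toy example: 4x4 image with three LiDAR returns
pred = np.full((4, 4), 10.0)
gt = np.zeros((4, 4))
gt[1, 2], gt[2, 0], gt[3, 3] = 9.5, 11.0, 10.2
print(sparse_depth_loss(pred, gt))   # mean of |10-9.5|, |10-11|, |10-10.2| = 0.567
```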
> Walk from a bright area to a dark area and it'll take seconds for your eyes to adjust
You do realize cameras have the same issue, and that HDR isn't free / is very computationally intensive?
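For anyone wondering where the cost goes: even the simplest multi-exposure merge is per-pixel work over several frames before the perception stack ever sees an image. Rough sketch of a weighted-average merge, assuming a linear sensor response (which real pipelines don't get for free):

```python
import numpy as np

def merge_exposures(frames: np.ndarray, exposure_times: np.ndarray) -> np.ndarray:
    """Merge an exposure bracket into one radiance map.

    frames:         (N, H, W) raw frames, values in [0, 1], linear response assumed
    exposure_times: (N,) shutter times in seconds

    Weight each sample by how far it is from the clipped extremes, divide by
    exposure to estimate radiance, and average: O(N * H * W) work per output frame.
    """
    w = 1.0 - np.abs(frames - 0.5) * 2.0             # trust mid-tones, distrust clipped pixels
    radiance = frames / exposure_times[:, None, None]
    return (w * radiance).sum(axis=0) / (w.sum(axis=0) + 1e-8)

# three bracketed frames at 1/1000 s, 1/250 s and 1/60 s
frames = np.random.rand(3, 720, 1280)
hdr = merge_exposures(frames, np.array([1 / 1000, 1 / 250, 1 / 60]))
```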