When I experimented with using an STM32H7 for accessing sensors, it was just nowhere near powerful enough for handling real-time 720p @ 30fps on 2 cameras. Also, good luck getting USB3 out of that CPU...
Here's a similar commercial product that - in my opinion - also failed badly: https://openmv.io/collections/cams/products/openmv-cam-h7-r2
At $80, it uses an STM32H743VI CPU, and "Most simple algorithms will run between 25-50 FPS on QVGA (320x240) resolutions and below".
I predict that this will end up as vaporware when the authors switch over to using FPGAs, like everyone else who tried (and failed) to use cheaper alternatives. Also, once you ramp up your numbers, producing FPGA-based 4k @ 60fps USB3 cameras for $50 per piece becomes possible. So there is in my opinion no good reason - neither technology-wise nor price-wise - to use the STM32 here. But welcome to the club, I made the same mistake ;)
Since you write with an authoritative voice, I ask:
* I've most often heard of the STM32 series as microcontrollers instead of microprocessors and never as CPUs. What is your reason for referring to them as a CPU?
* What is wrong with "25-50 FPS on QVGA (320x240) resolutions and below" for low cost use cases, like a step up or longer range than an ultrasonic obstacle sensor?
* What is the advantage of an FPGA over a microcontroller or microprocessor of similar cost? Are there disadvantages in terms of additional technical debt to programming a FPGA instead of a higher level controller or processor API?
shouldn't you be the one speaking with an authoritative voice? ;)
I'd say "microcontroller" and "CPU" are both designations of purpose, whereas "microprocessor" is a designation of technology. So yes, usually the STM32s are used as hardware/sensor controllers alongside other microprocessors, and then it makes sense to call them "microcontrollers". But in this specific hardware design discussed here, the STM32 acts as the central processor coordinating everything. That's why I called it a CPU, to highlight this central function.
"25-50 FPS on QVGA (320x240)" is just too little performance for most practical applications. Also, $80 is very far from "low cost use case". That's why I pointed out that for the same price, one can have much more performance using FPGA. Who wouldn't want 100x faster at the same price?
The big advantage of an FPGA is that you can drive its clock from an external source. If you hook a CPU up to a sensor, it needs to burn CPU cycles (or have dedicated FPGA-like hardware) to read bits from the port when the sensor sends them. With an FPGA, you can take the sensor's clock and use that to drive your circuit.
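To put rough numbers on that "burn CPU cycles" point, here's a back-of-envelope budget (my own illustrative assumptions: the 720p @ 30fps dual-camera case from upthread, and a 480 MHz STM32H7-class core):

```python
# Back-of-envelope: software-sampling two 720p30 sensors on a 480 MHz MCU.
# All figures are assumptions for illustration, not measured values.

width, height, fps, cameras = 1280, 720, 30, 2
pixel_rate = width * height * fps * cameras   # pixels/second to ingest

cpu_hz = 480_000_000                          # STM32H7-class core clock
cycles_per_pixel = cpu_hz / pixel_rate        # cycle budget if done in software

print(f"{pixel_rate / 1e6:.1f} Mpix/s -> {cycles_per_pixel:.1f} cycles/pixel")
```

That leaves under 9 CPU cycles per pixel just to move the data, before any actual processing. An FPGA latching pixels off the sensor's own clock never enters this budget race at all.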
But much simpler than that, CPUs are built to handle a lot of different tasks at an OK price. FPGAs are more like GPUs. They do signal processing very well, but fail badly in other areas. Plus CPUs can be programmed easily in a variety of languages, while using an FPGA well requires obscure tools and knowledge.
CPUs are Easy+Generic+Slow. FPGAs are Difficult+Specialized+Fast.
Thanks for the reply. I found the page below to be a relatively up-to-date source of information. It is interesting to see your note about FPGAs being more akin to GPUs, and then see many development boards pair an FPGA with an Arm processor.
OpenMV is actually pretty great. I agree that if you need 4k at 60fps, or stereoscopic, it's not for you, but there are lots of industrial tasks that require nothing like that. I'm using it now and it's great to program the thing in MicroPython and do blob tracking, fiducial marker recognition, etc. with it at 20-30fps or so, which is all I need.
I can see how you’d think it’s a commercial failure if you approach it with an incompatible use case.
I speculate OP isn't trying to build a commercial product, but rather tinkering with STM32s for this task because they can. Sometimes it's fun to write on very constrained hardware.
Hard to say as the firmware isn't in the repo as far as I could see. Most of the ARM camera interfaces are 14 bit parallel buses so it probably has one input. The cameras also output 10 bit parallel so I imagine it multiplexes somehow. You can capture at 60 fps so provided you hardware trigger the sensors you could read out at 30 and still be OK.
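The "capture at 60 fps, read out each at 30" arrangement can be sketched as a simple time-division schedule (all numbers are my assumptions, since the firmware isn't visible):

```python
# Sketch of time-division multiplexing two sensors over one camera interface.
# Hypothetical figures: the interface sustains 60 fps of readout, and the two
# hardware-triggered sensors alternate slots, giving each an effective 30 fps.

interface_fps = 60
n_sensors = 2
per_sensor_fps = interface_fps / n_sensors    # 30 fps per sensor

slot = 1.0 / interface_fps                    # one readout slot, in seconds
triggers = [(i * slot, i % n_sensors) for i in range(6)]  # (time, sensor index)

for t, cam in triggers:
    print(f"t={t * 1000:6.2f} ms  trigger sensor {cam}")
```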
If you want something that'll do this, Cypress/Infineon makes a USB3 chip specifically for camera control. I think people have also modified the FX3 for multiple cameras. Omni also has an ASIC I think, but good luck if you're not selling a billion units.
It's actually very possible to synchronise any number of Pi cameras, to sub-millisecond precision if required. We do this for a machine vision task, measuring the angle of rotating disks at a very precise time (and use an IR flash which is also precisely timed, which avoids any issues with the rolling shutter).
Yeah, there are a few "bullet time" demos for clusters of Pis, very cool. Does your application use network-based synchronisation, or do you hardware trigger? Whenever I've built computer vision systems it's simplest just to hardware-trigger everything (no freerunning) and then read out the last frame. Almost all image sensors support external trigger inputs, but you need the breakout board to expose that pin (basically all OEM machine vision cameras provide trigger inputs as standard, but they cost more than a Pi+PiCam!).
At IceCube we have 5k sensors synchronised to within a few nanoseconds using GPS fanout and some clever clock distribution algorithms [0]. Cool stuff.
In theory if you use a bright enough flash, the absolute synchronisation of the cameras is less important. For example very high speed photography tends to rely on hardware flash synchronisation with the event. In this case you assume that background illumination is low enough that a many-ms exposure will be black and all your light comes from flash pulse which can easily be sub-ms. Provided the event falls within the camera exposure window, you don't care if there's some uncertainty. You can do (1) camera trigger, start exposure (2) delay to compensate for maximum expected jitter (3) flash triggers (4) brief delay and then stop exposure.
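The four steps above boil down to a small timing calculation. A sketch with made-up durations (the jitter, flash, and margin values are placeholders, not from the original comment):

```python
# Timing sketch for flash-within-exposure capture, following steps (1)-(4).
# All durations are illustrative assumptions, in microseconds.

trigger_jitter_max = 500      # worst-case spread between camera trigger times
flash_duration = 150          # sub-ms flash pulse
margin = 100                  # guard band after the flash ends

flash_start = trigger_jitter_max              # (2) wait out the trigger jitter
flash_end = flash_start + flash_duration      # (3) fire the flash
exposure_time = flash_end + margin            # (4) then close the shutter

# Every camera, wherever it fell in the jitter window, has its shutter
# open for the entire flash pulse - so exact sync no longer matters.
print(f"exposure = {exposure_time} us, flash at {flash_start}-{flash_end} us")
```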
That's pretty much exactly what we do: we get the cameras into sync using the network, then fire a precisely synchronised (us accuracy in our case, so 1000 times worse than yours :) ) 10 to 150us IR flash to capture the specific details we're after (in our case using retroreflective targets). Our main claim to trickery is that we've worked out a way to sync the Pi cameras in software (i.e. no additional hardware needed).
Very cool to get a reply from someone working in your area, I've followed AMANDA etc in Science and Nature for many years - thanks for the comment!
There are Intel Realsense cameras, available off-the-shelf today, which provide stereo vision complete with on-device processing, IR texture projector, and a built-in accelerometer.
If you want to play around with stereo vision and figure out the feasibility of your idea, it could be a good place to start.
They look interesting, but they are being discontinued along with a few other product lines as Intel refocuses their efforts. Do you know of other alternatives?
an STM32 might be a little on the shy side for realtime stereo reconstruction. the OpenCV algorithms i use take quite a bit of time even to process low resolution images (don't have the numbers in mind). there's quite a bit of image processing involved to correct for lens distortion, and the block matching algos are pretty heavy.
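To see why block matching is heavy, here's a minimal pure-NumPy sum-of-absolute-differences matcher (my own toy sketch, not OpenCV's StereoBM, which does this far faster in optimised C++): for every pixel it scans `max_disp` candidate offsets over a (2r+1)^2 window, which is exactly the kind of inner-loop work that swamps a small MCU.

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, r=3):
    """Toy SAD block matcher for a rectified grayscale stereo pair.

    For each left-image pixel, tries max_disp horizontal offsets and
    keeps the one whose (2r+1)x(2r+1) window differs the least.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)
            best_cost, best_d = None, 0
            for d in range(max_disp):
                cand = right[y - r:y + r + 1,
                             x - d - r:x - d + r + 1].astype(np.int32)
                cost = np.abs(patch - cand).sum()
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Even this tiny version does max_disp window comparisons per pixel, and a real pipeline still has rectification and lens-distortion correction in front of it.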
even if the cameras are multiplexed, they can still be triggered at the same time (which is essential for stereo).
Interesting. They are planning to use an OS called "EMBOX" [1] which has the main focus of "[making it possible to use] Linux software everywhere including MCUs".
I wonder how that is going to work with the restricted memory and no virtual memory (which makes e.g. `fork` and dynamic memory allocation unsuitable).
Maybe, just maybe, you could do this on an i.MX RT with a 1 GHz M7 core and another core dedicated to USB or something, but yeah, to do anything vision-wise… I would think even a low-end FPGA could do so much better.
Have you looked into stereo cameras already in the market? Labforge has some cool 4K options that can also be run at lower resolutions and faster speeds.