My never-ending quest of throwing the ESP32 at stuff took a new turn recently. This time the target was machine vision, or more precisely, streaming a video feed from the ESP32 to OpenCV and sending commands back based on the result of the machine vision algorithm. Instead of performing the CV on the ESP32 (as the CameraWebServer example does), we send the frames to a more capable machine (my laptop in this case) and PRESTO, we have object recognition on the ESP32 (sort of).
First I thought I would have to venture deep into the underworld of video streaming formats, but luckily I once again managed to avoid learning anything, since somebody else has already done most of the work. This time that somebody was Kevin Hester, aka geeksville. His Micro-RTSP was really easy to get up and running, and extra cool because it was designed to work with PlatformIO.
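On the laptop side, the stream opens in OpenCV like any other capture source. A minimal sketch of that end (the IP address and stream path here are assumptions; check what your ESP32 actually prints on boot):

```python
import cv2

# URL of the RTSP stream served by the ESP32; the IP and path
# below are placeholders, adjust to match your setup
STREAM_URL = "rtsp://192.168.1.50:8554/mjpeg/1"

cap = cv2.VideoCapture(STREAM_URL)
if not cap.isOpened():
    raise RuntimeError("Could not open stream " + STREAM_URL)

while True:
    ok, frame = cap.read()
    if not ok:
        break  # stream dropped
    cv2.imshow("esp32", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```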
I used this board with the OV2640 camera module. It doesn't have a USB port, so you need to hook it up via a USB-UART breakout. I used one that can toggle GPIO 0 and RST for flashing, so I didn't have to do it manually. If you don't need access to much IO, I would probably recommend the M5Stack camera module over this one, since it seems to be easier to use.
To get an idea of the delay involved with this kind of approach, I cobbled together a simple color detection timing script in Python. The script shows a picture matching the color filter and then measures how long it takes for the ESP32 to acknowledge the detection. This highly scientific setup is shown below.
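The core of it looked roughly like this. A simplified sketch: the HSV bounds, pixel threshold, and stream URL are placeholders, and unlike the real script this only times the detection on the laptop, not the round trip to the ESP32:

```python
import time
import cv2
import numpy as np

STREAM_URL = "rtsp://192.168.1.50:8554/mjpeg/1"  # placeholder, see above

# HSV range for the target color; these bounds roughly match a red
# patch and need tuning for your color and lighting
LOWER = np.array([0, 120, 120])
UPPER = np.array([10, 255, 255])

cap = cv2.VideoCapture(STREAM_URL)

# Show a full-screen patch of the target color and start the clock
patch = np.zeros((480, 640, 3), dtype=np.uint8)
patch[:] = (0, 0, 255)  # BGR red
cv2.imshow("target", patch)
cv2.waitKey(1)
start = time.monotonic()

while True:
    ok, frame = cap.read()
    if not ok:
        continue
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    # Call it a detection once enough pixels pass the color filter;
    # the 5000-pixel threshold is arbitrary
    if cv2.countNonZero(mask) > 5000:
        elapsed_ms = (time.monotonic() - start) * 1000
        print("detected after %.0f ms" % elapsed_ms)
        break
```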
Long story short, it takes around 200 ms on average for the color to be recognized, which was much faster than I expected. Here I used websockets for the communication back (i.e. whether an object was recognized and where), but there are plenty of other options for that as well. The connection was a bit shoddy though, with intermittent lagging and frame drops; it seems slightly more reliable when running the ESP32 in AP mode. The full source is up on my GitLab.
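For reference, the return channel on the laptop side amounted to sending a small message with the detection result. A minimal sketch, assuming the ESP32 runs the websocket server and the laptop connects as a client; the IP, port, and message format here are made up for illustration:

```python
import asyncio
import json
import websockets

# Address of the websocket server on the ESP32; placeholder values
ESP32_WS = "ws://192.168.1.50:81/"

async def report(found, x, y):
    # Send the detection result (and its position) back to the ESP32
    async with websockets.connect(ESP32_WS) as ws:
        await ws.send(json.dumps({"found": found, "x": x, "y": y}))

asyncio.run(report(True, 320, 240))
```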