Using RP2040 PIO to drive a poorly-designed displa

m463 · on April 10, 2023

"Using RP2040 PIO to drive a poorly-designed displa"

need a new modeline, the "y" fell off the back porch.

nyanpasu64 · on April 10, 2023

(video nerd voice) But the beginning of the blanking interval is the front porch...

sgtnoodle · on April 10, 2023

It's neat that the author was able to completely offload the work to the PIO. I'm not sure I understand what's so poor about this display's design, though. It sounds like it's just a framebuffer on an SPI bus. Even if you were stuck with a more limited DMA peripheral on a different MCU, you could poll the touch screen, interrupt on every scanline, and do 30 fps with fewer than 10 kHz of interrupts, and still have plenty of CPU cycles left over.

dmitrygr · on April 10, 2023

The problem is that touch and display share the same SPI bus, which makes it impossible to just set up a repeating DMA to the display. On a non-RP2040 MCU it wouldn't be easy to drive.

Most DMA units wouldn't let you feed a scanline at a time and then poll touch; they aren't that flexible. You'd be stuck with a lot of IRQs.
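To make the shared-bus constraint concrete, here is a host-side C model of the ordering such a driver has to produce: one scanline to the display, then, every few lines, a touch-controller transaction, with the display deselected before the touch chip is selected. This is a hypothetical sketch, not the author's PIO program; `build_schedule`, `TOUCH_POLL` and `lines_per_touch` are illustrative names.

```c
#include <stddef.h>

/* Marker meaning "deselect the display and read the touch controller". */
#define TOUCH_POLL (-1)

/* Hypothetical model: the order of bus transactions for one frame on an
   SPI bus shared by a display and a touch controller. Entries >= 0 are
   scanline indices sent to the display; TOUCH_POLL entries are touch reads.
   Returns the number of entries written (never more than `cap`). */
size_t build_schedule(int lines, int lines_per_touch, int *out, size_t cap)
{
    size_t n = 0;
    for (int y = 0; y < lines && n < cap; y++) {
        out[n++] = y;                                   /* display transfer */
        if (lines_per_touch > 0 && (y + 1) % lines_per_touch == 0 && n < cap)
            out[n++] = TOUCH_POLL;                      /* touch transfer */
    }
    return n;
}
```

On an MCU with a simpler DMA engine, each boundary in this schedule is roughly where an interrupt would have to retarget the DMA or swap chip selects, which is the IRQ cost being discussed.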

sgtnoodle · on April 10, 2023

The last time I worked with DMA was on an Atmel Cortex-M7, definitely a higher-end MCU, though. It supported arbitrarily complex DMA chaining via linked lists in RAM. That being said, handling ~10 kHz of IRQs just to kick off DMA transfers wouldn't have cost much CPU in the grand scheme of things.

For the touch input, is it necessary to truly sample at a high rate continuously, or is it more that you need to oversample and take an average? If it's the latter, you could sample the touch screen between frames rather than interleaving it between scanlines. Likewise, you could read from the SD card between frames. Writing to an SD card would be pretty terrible, though, given that cards often block for hundreds of milliseconds.

Once again, awesome work with the PIO. I imagine you're running the display at its theoretical update limit unless you try overclocking?

dmitrygr · on April 10, 2023

Between frames would be too rare for touch. You really do not want touch at just 45Hz. The extra samples are averaged for smoothing: the touch controller here is noisy, and the board layout Waveshare did does not follow the datasheet's recommendations for lowering noise.
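A common way to smooth a noisy touch reading is to take a short burst of samples and average them, optionally discarding the extremes. The sketch below is a trimmed mean; `touch_trimmed_mean` is a hypothetical helper, and the thread does not say which filter the author actually uses. Averaging a burst only cleans up each individual reading; it adds no temporal resolution, so the sampling rate still matters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smooth a burst of raw touch-ADC readings taken
   back to back. With 3 or more samples, the single lowest and highest
   readings are discarded (to reject noise spikes) and the rest averaged. */
uint16_t touch_trimmed_mean(const uint16_t *samples, size_t n)
{
    if (n == 0)
        return 0;

    uint32_t sum = 0;
    uint16_t lo = samples[0], hi = samples[0];
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }

    if (n >= 3) {                 /* drop one minimum and one maximum */
        sum -= (uint32_t)lo + hi;
        n -= 2;
    }
    return (uint16_t)((sum + n / 2) / n);   /* mean, rounded to nearest */
}
```

For example, the burst {100, 102, 98, 4000, 101} yields 101: the 4000 spike and the lowest reading are dropped before averaging.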

SD access between refreshes is a poor idea for the reason you mention, yes.

And yes, the screen interface runs as fast as the datasheet allows. Overclocking the MCU won't help, as we are not cycle-bound there. Overclocking the SPI bus works, but not by much: past 70MHz it gets glitchy.
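A rough way to see why the bus clock, not the CPU, caps the refresh rate: a full frame is width × height × bits-per-pixel bits, so the best-case frame rate is the bus clock divided by that. The sketch below assumes, hypothetically, a 320x240 panel at 16 bits per pixel and ignores command overhead and the bus time taken by touch reads; the thread does not state the exact resolution, color depth, or clock the driver uses.

```c
/* Best-case full-frame refresh rate over a serial link: the bus clock
   divided by the number of bits in one frame. Ignores command/address
   bytes and bus time given to the touch controller, so real rates are
   lower. */
double max_fps(double bus_hz, unsigned width, unsigned height,
               unsigned bits_per_pixel)
{
    double bits_per_frame = (double)width * height * bits_per_pixel;
    return bus_hz / bits_per_frame;
}
```

Under those assumptions, `max_fps(62.5e6, 320, 240, 16)` is about 50.9 fps, and pushing the bus to 70 MHz only raises the ceiling to about 57 fps.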

sgtnoodle · on April 10, 2023

I specifically meant rapidly sampling the touch screen N times between frames rather than 1 time between frames. Depending on the characteristics of the EMI, it seems like that could work just as well.

dmitrygr · on April 10, 2023

But in terms of temporal resolution you still have 45Hz which is too low for good inking.

crest · on April 10, 2023

You could use the DMA engine to reconfigure the function routing using a pair of DMA channels.

dmitrygr · on April 10, 2023

Sadly, on many chips this is not possible. E.g. the STM32H7 has three types of DMA unit, each with a complex table of where it can read from and write to. Two of those kinds cannot write to any region that contains a DMA controller.

The RP2040 is rather unique in how powerful its DMA units are, thanks to their reach. But it also has faults: its DMA units seemingly cannot access GPIO, since that is bolted directly to the core (via the single-cycle IO port). As far as I can tell, the only way to affect pins from DMA is via PIO.

sgtnoodle · on April 10, 2023

Reminds me of a memorable Ethernet peripheral. The peripheral used the MCU's RAM for its own buffering, and there were frame headers that both the peripheral and the CPU needed to manipulate. Being an M7, though, the CPU had a data cache, so it couldn't safely manipulate the headers without first completely disabling the cache.

I ended up using a DMA channel to indirectly manipulate the headers, bypassing the data cache. It was pretty silly but worked, and was way more efficient than disabling the data cache.

dmitrygr · on April 10, 2023

On an M7, you can use the MPU to assign the non-cacheable attribute to a memory area as small as 32 bytes or as large as 4 GB: any size representable as 2^n - k * 2^(n-3) for k = 0..7 and n >= 8.
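The size rule can be checked mechanically. A minimal sketch under my reading of the ARMv7-M MPU: a region is 2^n bytes, and a region of 256 bytes or more is divided into eight equal subregions that can be disabled individually, so disabling the top k of them leaves 2^n - k * 2^(n-3) usable bytes. `mpu_size_representable` is a hypothetical helper, not a CMSIS function.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `size` bytes can be covered exactly by one ARMv7-M MPU region:
   a 2^n-byte region (n = 8..32) with its top k of 8 subregions disabled
   (k = 0..7), i.e. size == 2^n - k * 2^(n-3). This also yields the
   32/64/128-byte cases. Checks size only, not base alignment. */
bool mpu_size_representable(uint64_t size)
{
    for (unsigned n = 8; n <= 32; n++) {
        uint64_t region = 1ULL << n;
        uint64_t sub = region >> 3;          /* one of 8 equal subregions */
        for (unsigned k = 0; k <= 7; k++) {
            if (size == region - k * sub)
                return true;
        }
    }
    return false;
}
```

For example, 96 bytes works (a 256-byte region with five 32-byte subregions disabled), while 48 bytes does not. The region base must still be aligned to the full 2^n size.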

sgtnoodle · on April 10, 2023

The particular driver was being retrofitted into a large enough variety of existing firmwares that we really wanted to avoid introducing any sort of MPU or link time configuration. The DMA channel approach was the least impactful path.

sgtnoodle · on April 10, 2023

Just for the fun of it, I've been reading a bit about the STM32H7 since you mentioned it. One half-baked idea for achieving only one interrupt per scanline would be to use multiple DMA channels to drive the same SPI peripheral, one for the display SS and the other for the touch SS. Have a scanline ISR start both transfers at the same time, and rely on the well-defined DMA channel priorities to deconflict them. It seems like this could possibly work, given that the high-level DMA documentation says it can control SS.

What I don't know at the moment is how exactly the DMA channel actually controls SS. Does the channel itself have knowledge of the signal, or does it just poke the SPI peripheral at the right time? Said another way, are SS pins assigned to DMA channels, or are they assigned to the SPI peripheral? There's also clock frequency and data mode that would need to be associated with the DMA channel. My guess is it's the latter and my idea wouldn't work.

Both options seem like they would have been reasonable from a design perspective. Being able to associate a specific slave device with a DMA channel and letting the hardware arbitrate the bus would certainly be a beneficial feature. The implementation would be more complicated, though; letting the SPI peripheral do all the configuration and state management, with DMA simply shoveling bytes in and out of the FIFO, is the simplest design. My guess is that it's that absolute simplest implementation, and the SPI peripheral just has a setting to automatically de-assert SS when its TX FIFO and shift register are empty.

Practically, though, I don't think it's too crazy to drive this particular display and touch screen with per-scanline interrupts. Assuming 30 Hz and 240 lines, that's a baseline of 7.2 kHz of IRQs. If you want 400 Hz touch samples, that's only 400 Hz more IRQs. Go crazy and sample the touch screen at 2.8 kHz, as long as there's enough bus bandwidth. While it feels a bit inelegant for the scanlines not to be harmonic with the touch updates, the resulting jitter is on a short enough timescale that it doesn't matter. Also, for this class of hardware, you're already devoting a significant chunk of RAM to any sort of framebuffer, so budgeting a modest amount of CPU to push that framebuffer to the display seems completely reasonable.
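The interrupt-rate arithmetic in this comment, as a tiny C helper. The names are hypothetical; it assumes one IRQ per scanline plus independently timed touch IRQs, exactly as described.

```c
#include <stdint.h>

/* IRQs per second when every scanline gets its own interrupt (to kick
   off that line's transfer) and touch sampling has separately timed
   interrupts. */
uint32_t irq_rate_hz(uint32_t fps, uint32_t lines_per_frame, uint32_t touch_hz)
{
    return fps * lines_per_frame + touch_hz;
}

/* Average spacing between interrupts, in microseconds. */
double irq_period_us(uint32_t rate_hz)
{
    return 1e6 / (double)rate_hz;
}
```

`irq_rate_hz(30, 240, 400)` gives 7600 IRQs per second, one roughly every 131.6 µs; with touch at 2.8 kHz it reaches 10000, the 10 kHz figure from the top of the thread.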

As you mentioned, it seems like the designers' original intent was for folks to treat the display hardware as an SPI-accessible framebuffer. They were probably targeting the 8-bit AVRs popular in 3D printers? Update rate was probably the last of their requirements, and they probably didn't even consider someone trying to drive the display with DMA. The fact that you're able to drive 50fps is quite remarkable, on top of your achievement of fully offloading it on the RP2040.

dmitrygr · on April 10, 2023

Sadly, SS is associated with an SPI peripheral, and only one peripheral will drive a given set of pins, so as far as I can tell this will not work without software manually controlling the nCS lines.

> 7.2 kHz IRQs

The M7 takes at least 14 cycles to enter an IRQ handler and 12 to exit. Your compiler will push at least 2 registers (per the ABI's stack-alignment requirements) and pop 2 at the end: that is 6 more cycles, before you've done anything, AND assuming 0-wait-state flash (unlikely). Add, say, 100 cycles for your IRQ handler's actual code, and suddenly 1 MIPS is gone, AND your main code can see latency spikes of over 132 cycles, which may matter (for example, I have an M7 project where random latency over 6 cycles breaks the required timings).
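Putting this comment's figures together: 14 + 12 + 6 + 100 = 132 cycles per interrupt, and at 7,200 interrupts per second that is 950,400 cycles, i.e. roughly the 1 MIPS mentioned. The sketch below encodes that budget. The cycle counts are the comment's own estimates; the 480 MHz core clock in the usage note is an assumption chosen as a common STM32H7 speed, not something stated in the thread.

```c
#include <stdint.h>

/* Per-interrupt overhead figures from the comment (0-wait-state memory). */
enum {
    IRQ_ENTRY_CYCLES   = 14,  /* exception entry */
    IRQ_EXIT_CYCLES    = 12,  /* exception return */
    IRQ_PUSHPOP_CYCLES = 6,   /* compiler push/pop of 2 registers */
};

/* Total cycles consumed by one interrupt with a handler body of the given cost. */
uint32_t cycles_per_irq(uint32_t handler_body_cycles)
{
    return IRQ_ENTRY_CYCLES + IRQ_EXIT_CYCLES + IRQ_PUSHPOP_CYCLES
           + handler_body_cycles;
}

/* Fraction of the core consumed by `irq_hz` interrupts of `irq_cycles` each. */
double irq_load(uint32_t irq_hz, uint32_t irq_cycles, uint32_t cpu_hz)
{
    return (double)irq_hz * irq_cycles / (double)cpu_hz;
}
```

At an assumed 480 MHz, `irq_load(7200, cycles_per_irq(100), 480000000u)` comes to about 0.002, i.e. 0.2% of the core. That average says nothing about the latency spikes, which are a separate cost.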

sgtnoodle · on April 11, 2023

That's my point, though. 1 MIPS is less than 1% of the available CPU cycles for a typical M7! :-) As long as you can tolerate the jitter caused by interrupts at all, it seems like it should be fine.

6 cycles is pretty darn tight! I'm curious what you're controlling.

dmitrygr · on April 11, 2023

I was pretending to be a Memory Stick (the Sony storage device), and thus had to reply to external commands. On tight deadlines. Very tight.

pinchies · on April 10, 2023

Fascinating to see how the peripheral hardware resources can be used by someone smart who both knows what they are doing and thinks creatively. Very nice work.

I do hope someone takes you up on making this neat PIO LCD API into a library for others to use!

codetiger · on April 10, 2023

I've done something very similar and created a GL class to draw stuff. It was built specifically to support the Waveshare 240×320 display for my gaming console, which is built on the RP2040.

https://github.com/codetiger/GameTiger-Console

dale_glass · on April 10, 2023

Perhaps this might have started as a display for 3D printers using Arduinos?

Given that the Mega has a pitiful 8K RAM, it's not really suitable for driving displays in anything resembling a normal way. The way you get 3D printers with fancy graphics is that the display has another microcontroller attached, plus SD card software, and takes commands in the form of "Draw rectangle here", and "put this image there".

Fortunately 3D printers seem to be transitioning towards hardware that's less insanely limited.

londons_explore · on April 10, 2023

> Fortunately 3D printers seem to be transitioning towards hardware that's less insanely limited.

If you use some of these more powerful CPU'd printers, you find a problem...

Typically the powerful CPUs run an OS (e.g. FreeRTOS) and have multiple processes doing different things.

However, the Marlin software that is common in the 3D printer world is designed to run bare metal. When you put it into an OS thread, it stutters and misses motor steps when moving fast. There are places in the code that are timing-critical, e.g. "this loop must complete in 168 clock cycles or fewer for correct operation".

Modern CPUs might be far faster, but they aren't good at microsecond-accurate timing. Rewriting Marlin so that it doesn't rely on such low latency would be a lot of work.

dale_glass · on April 10, 2023

Marlin barely fits on the Arduino anymore, and that's why recent releases also target ARM and ESP32.

In any case, my point is that the Atmel Arduino architecture is old and limited, and imposes a number of design constraints. There's hardware that's both better and cheaper out there now.

indrora · on April 10, 2023

There is no such thing as the "Atmel Arduino"

You're talking about AVR8, which is the ATmega328P and its children and siblings. There also exists, though it is less popular, a complete rebuild of the AVR architecture in 32 bits called AVR32. The instruction set is different and the peripheral IO is different, but the design is similar. For quite some time AVR32 was powerful enough to run full-fat Linux, though support was dropped for a handful of reasons.

Doxin · on April 11, 2023

Another solution that in my experience works pretty well is what Klipper (the firmware usually paired with the Mainsail web UI) does. The motion planner runs on a Raspberry Pi. Theoretically this needs to be a realtime thing, but in practice the RPi is fast enough that it's never a bottleneck. The actual stepping of the motors is handled by a more traditional microcontroller board running some very basic firmware. As far as I understand, the RPi sends it (step count, direction, rate) commands.
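The split described here keeps the microcontroller's job simple: turn each queued command into precisely timed step pulses. A hypothetical C sketch of that MCU-side expansion, assuming a command of (step count, direction, interval between steps in timer ticks); `struct step_cmd` and `expand_steps` are illustrative names, and real host/MCU protocols such as Klipper's are more elaborate than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command shape: (step count, direction, rate), with the
   rate expressed as a fixed interval between steps in timer ticks. */
struct step_cmd {
    uint16_t count;     /* number of step pulses to emit */
    int8_t   dir;       /* +1 or -1 */
    uint32_t interval;  /* timer ticks between consecutive steps */
};

/* Expand one command into absolute step times following `t0`, updating
   the axis position. Returns the number of timestamps written (<= cap). */
size_t expand_steps(const struct step_cmd *cmd, uint32_t t0,
                    uint32_t *times, size_t cap, int32_t *position)
{
    size_t n = 0;
    uint32_t t = t0;
    for (uint16_t i = 0; i < cmd->count && n < cap; i++) {
        t += cmd->interval;          /* next step is one interval later */
        times[n++] = t;
        *position += cmd->dir;
    }
    return n;
}
```

On real hardware a timer compare interrupt would pulse the STEP pin at each of these timestamps, which is exactly the kind of hard-realtime work a small microcontroller handles well.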

This solves a lot of issues since the microcontroller is doing some really basic things it's well suited for, and the RPi is freed from having hard realtime responsibilities.

rcxdude · on April 10, 2023

Modern CPUs can do microsecond-accurate timing just fine. You do need to design the software correctly, however, not just shove something into an RTOS thread.

RobotToaster · on April 10, 2023

RepRapFirmware was better designed for more powerful CPUs.

HeyLaughingBoy · on April 10, 2023

> the display has another microcontroller attached

No, they just use a better MCU like an ESP32. I bought an MKS DLC32 3D printer board hoping to repurpose it for something else. It has a 2.4" TFT touch panel, 3 or 4 motor drivers, and a bunch of I/O, all controlled by an ESP32. Source code is on GitHub. All that for $50!

indrora · on April 10, 2023

Something people forgot about 3D printer design is just how many THINGS went into early 3D printer hardware.

The early RepRap hardware comprised at least 8 to 10 discrete hardware devices, all speaking RS-485 to each other. The Generation 3 (first MakerBot generation) hardware required a separate, feedback-enabled stepper driver for each axis, including the extruder.

RobotToaster · on April 10, 2023

>Given that the Mega has a pitiful 8K RAM

It has support for external SRAM, unfortunately it takes a lot of pins so isn't used that often.

erosenbe0 · on April 10, 2023

I think a lot of legacy 1980s-style systems already had an 8-bit external data bus (think Z80, 80xx, 68xx), so if you wanted to upgrade them with more I/O, UARTs, seven-segment displays and the like, the ATmega was a flexible choice that could be tacked onto the bus. That's my understanding.

stefan_ · on April 10, 2023

I like people that try to get the most out of the integrated peripherals in creative ways. It's a dying craft in a world where a 300 MHz "micro" is just a dollar away.

ta988 · on April 10, 2023

What kind of 300MHz MCU can you find for a dollar?

polpo · on April 10, 2023

Welll.... the RP2040 is a dollar and can be overclocked to 300MHz (peripheral clocks get a little weird around that point, but they can be clocked down).

_Microft · on April 10, 2023

The name polpo sounds familiar - are you creator of PicoGus?

polpo · on April 10, 2023

Yep, I am.

TJSomething · on April 10, 2023

I remember reading some PIO documentation that explained that PIOs were for handling all sorts of interfaces, from basic flash to cursed displays found on AliExpress.

Brian_K_White · on April 10, 2023

I love this for its example of what to use PIO for, as well as how to do it. A showpiece. Put it right into an IDE's built-in examples/templates menu.

jrexilius · on April 10, 2023

wow. this is amazing work, and answers a question I've had for a while. Thanks for putting this out there!

ta988 · on April 10, 2023

The RP2040 is a really capable little microcontroller. http://wiki.picosystem.com/ https://kilograham.github.io/rp2040-doom/

dmitrygr · on April 10, 2023

> and answers a question I've had for a while

I am curious, which?

jrexilius · on April 10, 2023

If it was possible to drive a touch interface and SPI display of that size from an MCU of that size/speed and still be responsive. I have only driven them from Pi Zeros, PocketBeagles and other much larger/faster systems, and still noticed some drag. Up to now, I've stuck to smaller displays without a touch layer for projects where I'm using the RP2040. [caveat: I am _not_ an embedded systems SWE so still figuring out where limitations are hardware vs. my knowledge]

dmitrygr · on April 10, 2023

Now you can use that display too :)

As the driver uses no CPU, CPU speed doesn’t matter

djmips · on April 10, 2023

Impressive work. I hope the original designers of the RP2040 PIO system are well pleased.

ta988 · on April 10, 2023

I hope they are. I also hope their next generation will have more PIOs and ideally a bit more instructions.

dmitrygr · on April 10, 2023

I only have one wish for PIOv2 in future chips: wire it up to the AHB-Lite bus as a target, behind a small cache, so I can make my own QSPI interface (or 8xSPI, or 16xSPI, or SDRAM, etc.).

HeyLaughingBoy · on April 10, 2023

I mentioned upthread that I bought an ESP32-based DLC-32 3D printer/laser controller board. It uses a 2.4" TFT SPI display. Admittedly, I didn't use it for very long before reflashing it with my own test code, but the display seemed pretty responsive.

I've been doing a lot of work with TFT_eSPI recently, but only on the display side (not using touch input) using the LVGL framework and it is very nice and smooth.

geerlingguy · on April 10, 2023

Possibly "is the Waveshare display really that bad?" As someone who's implemented a few projects with it but suffered from its terrible existing driver support, it's nice to see a more in-depth explanation in this post of what makes it bad (and how to creatively work around that).

gmiller123456 · on April 10, 2023

FYI, cheap = $15.99 US + shipping.

dmitrygr · on April 10, 2023

The cheapest 160x160+ screen that can be bought off-the-shelf and has touch integrated. At least the cheapest I found. Also available on aliexpress for $12 and free shipping :)

If you find a cheaper one, please let me know. I'll thank you.

picture · on April 10, 2023

Is there anything wrong with the ubiquitous ILI9488 3.5 in displays? I see some on AliExpress with a resistive touch screen for under 6 dollars, not including shipping. They are not integrated on a board but have a standard 40-pin FPC connector, with some passives on the ribbon. (I personally prefer this over Waveshare's PCBs.) EDIT: 320 by 480 btw

dmitrygr · on April 10, 2023

Nothing wrong with them, but that is too many pixels for my purpose. PalmOS 5.2 needs a minimum of 64KB of storage RAM, 128K of dynamic RAM, 32K for the kernel, AND a framebuffer to run, and the RP2040 has too little RAM to handle a 320x480 display.
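A quick budget check, taking the PalmOS minimums in this comment at face value. The color depth is an assumption (the comment does not state it), so the sketch takes bits-per-pixel as a parameter; `framebuffer_bytes` and `fits_in_sram` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* RP2040 on-chip SRAM: four 64 KB banks plus two 4 KB banks. */
#define RP2040_SRAM_BYTES (264u * 1024u)

/* PalmOS 5.2 minimums quoted above: 64 KB storage, 128 KB dynamic, 32 KB kernel. */
#define PALMOS_MIN_BYTES ((64u + 128u + 32u) * 1024u)

/* Bytes for a packed framebuffer, rounded up to whole bytes. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return (width * height * bpp + 7u) / 8u;
}

/* Does the OS minimum plus one framebuffer fit? Ignores stacks, driver
   buffers and everything else that also needs RAM. */
bool fits_in_sram(uint32_t fb_bytes)
{
    return PALMOS_MIN_BYTES + fb_bytes <= RP2040_SRAM_BYTES;
}
```

At 16 bits per pixel a 320x480 framebuffer alone is 307,200 bytes, more than the RP2040's 270,336 bytes of SRAM; even at 8 bits per pixel (153,600 bytes) it does not fit alongside the 224 KB OS minimum.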

Narishma · on April 10, 2023

You could use it with a 320x240 framebuffer and just scale 2x vertically when rendering.
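Line doubling is cheap at scan-out time: keep a 320x240 framebuffer and send each source row twice, so destination scanline y comes from source row y / 2. A minimal sketch, assuming a 16-bit-per-pixel, row-major framebuffer; `row_for_scanline` is a hypothetical helper a driver would call once per output line.

```c
#include <stddef.h>
#include <stdint.h>

/* 2x vertical scaling at scan-out: destination scanline dst_y on the
   panel is fed from source row dst_y / 2 of a half-height framebuffer,
   so every source row is transmitted twice and nothing is copied. */
const uint16_t *row_for_scanline(const uint16_t *fb, size_t width,
                                 unsigned dst_y)
{
    return fb + (size_t)(dst_y / 2u) * width;
}
```

This halves the framebuffer (153,600 bytes instead of 307,200 at 16 bpp) at the cost of halving the effective vertical resolution.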

dmitrygr · on April 10, 2023

It is possible. But this screen works better and people who want to play with my project can get it easily. Even those who cannot solder.