Nice! I wrote a 5-stage pipelined MIPS processor in SystemVerilog last semester, in one day, drunk as HELL, but hey, it worked amazingly well. Thank goodness they didn't ask for branch prediction.
Does it do anything different or better than Rocket (or Sodor) that would be notable? Why VHDL instead of the more conventional (heh, for this ISA) Chisel? Would be good to see some notes about architecture in the README just to tell me what I'm looking at.
I am still leery of the "unconventional" HDLs. I don't see most of them as much of an improvement over, say, VHDL-2008: you get records and functions, and instantiation is less verbose. Yes, the language is a little wordy, but it's well proven and I already know it; many other people know it as well, so they can read and understand it without learning a new language.
Higher level generator HDLs shine when you want something heavily parametric. VHDL and Verilog code generation features are way too weak, so having a higher level meta-language helps a lot.
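For a feel of what "heavily parametric" means, here's a toy generator in Python that emits a Verilog shift register of arbitrary width and depth — the kind of thing plain Verilog generate blocks handle awkwardly. (Purely illustrative; real generator HDLs like Chisel build a typed netlist rather than pasting strings.)

```python
def shift_reg(name: str, width: int, depth: int) -> str:
    """Emit Verilog for a `depth`-stage, `width`-bit shift register.

    Toy illustration of meta-language code generation.
    """
    stages = "\n".join(
        f"        stage[{i}] <= stage[{i - 1}];" for i in range(1, depth)
    )
    return f"""\
module {name} #(parameter W = {width}) (
    input  wire         clk,
    input  wire [W-1:0] d,
    output wire [W-1:0] q
);
    reg [W-1:0] stage [0:{depth - 1}];
    always @(posedge clk) begin
        stage[0] <= d;
{stages}
    end
    assign q = stage[{depth - 1}];
endmodule
"""

print(shift_reg("delay8", width=16, depth=8))
```

Trivial example, but the point is that every structural decision (how many stages, how wide) is an ordinary variable in the host language.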
The committed code isn't directly synthesizable; it needs to be transformed to HDL by a tool that is itself still being updated.
I tried what looked to be a supported flow: generate Verilog for FPGAs, then synthesize it for Xilinx Zynq devices. The errors reported against the generated Verilog looked legitimate to me.
Then please submit an issue to the GitHub issue tracker. Don't complain on some internet forum and hope that somebody will read your mind. It works fine for me.
Sure, I don't mean to suggest there aren't valid reasons for writing cores in VHDL, etc., I just want to clarify that Chisel (unlike say Verilog) will, by fiat, only ever generate synthesizable code. If that's ever not the case, you should let them know.
Second, I looked into your tests directory, and at first impression there's not much there. What is there is kinda messy and not conducive to thorough testing.
If I could make a suggestion: you should look at reusing the work of others. You are in the lucky position where some of the work has been done for you! There's a massive set of tests for RISC-V that already exists at https://github.com/riscv/riscv-tests (by weird coincidence, someone posted links about this in another thread a few days ago!)
If I were you, I'd go through that material to figure out how to get those tests running in my environment. E.g., you have to compile a test and create a mechanism to load the resulting binary into your testbench memory, or whatever.
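For the loading part, a minimal sketch (file names hypothetical): convert the raw binary you get from `objcopy -O binary` into the one-word-per-line hex format that Verilog's `$readmemh` — or an equivalent VHDL textio loader — can read into testbench memory:

```python
def bin_to_readmemh(data: bytes, word_bytes: int = 4) -> str:
    """Convert a raw binary image into $readmemh-style hex text,
    one little-endian word per line. Toy sketch, not a real loader."""
    # Pad to a whole number of words.
    data += b"\x00" * (-len(data) % word_bytes)
    lines = []
    for off in range(0, len(data), word_bytes):
        word = int.from_bytes(data[off:off + word_bytes], "little")
        lines.append(f"{word:0{word_bytes * 2}x}")
    return "\n".join(lines)

# Hypothetical usage with a compiled riscv-tests binary:
# with open("rv32ui-p-add.bin", "rb") as f:
#     open("testbench_mem.hex", "w").write(bin_to_readmemh(f.read()))
```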
I think I am one of those who would do it the other way around: from the outside inward, connecting to the outside world first. I already know ALL the I/O for the core is somewhere in the RISC-V design code (exact signal names, etc.). It has to be in the Chisel source somewhere.
I would translate that to VHDL to get my ports. It becomes a stub to build my core-level testbench on. If I can mirror their test environment, I at least have a start point. Maybe I could even reuse their testbench somehow.
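A throwaway script can get you that stub. Here's a rough sketch — the port list and module name below are made up; in practice you'd lift the real ones from the generated Verilog module header:

```python
def vhdl_entity_stub(name, ports):
    """Emit a VHDL entity stub from a (name, direction, width) port list.

    The port list is hypothetical; lift the real one from the
    Chisel-generated Verilog module header."""
    def decl(p):
        pname, direction, width = p
        ptype = ("std_logic" if width == 1
                 else f"std_logic_vector({width - 1} downto 0)")
        return f"    {pname} : {direction} {ptype}"
    body = ";\n".join(decl(p) for p in ports)
    return (
        "library ieee;\nuse ieee.std_logic_1164.all;\n\n"
        f"entity {name} is\n  port (\n{body}\n  );\nend entity {name};\n"
    )

print(vhdl_entity_stub("rocket_core", [
    ("clk",          "in",  1),
    ("reset",        "in",  1),
    ("io_imem_addr", "out", 32),
]))
```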
Then I'd start duplicating the sub modules and their interconnections in the core. And so on.
Do the tough stuff first, then enjoy things getting easier as I go along, hopefully!
Thank you :)
Also thanks for the information; I had been wondering if someone had made some more thorough automated tests, I will definitely check those out!
One thing I've wondered about recent ISAs is regarding split register files. Since most CPUs are single-chip implementations these days, why not have integer, fp, and vector registers unified at the ISA level to allow different implementation points and less state spilling/loading during context switches:
(1) High performance implementations use register renaming anyway, so they can easily use a split register file internally without exposing it at the ISA level.
(2) Low power implementations can use a single register file (at the cost of fewer I/O ports).
This would also mean that when switching threads, only a little bit more state than the vector unit registers would need to be stored and loaded.
The RISC-V ISA manual covers your question on page 37 (riscv.org).
> "a split organization increases the total number of registers accessible with a given instruction width, simplifies provision of enough regfile ports for wide superscalar issue, supports decoupled floating-point unit architectures, and simplifies use of internal floating-point encoding techniques. Compiler support and calling conventions for split register file architectures are well understood, and using dirty bits on floating-point register file state can reduce context-switch overhead."
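The dirty-bit trick at the end of that quote is basically lazy context save. A toy model (heavily simplified — not the actual RISC-V FS-state mechanism):

```python
class Hart:
    """Toy model of dirty-bit-driven lazy FP context save."""
    def __init__(self):
        self.int_regs = [0] * 32
        self.fp_regs = [0.0] * 32
        self.fp_dirty = False  # set by hardware on any FP register write

    def fp_write(self, idx, value):
        self.fp_regs[idx] = value
        self.fp_dirty = True

def save_context(hart):
    """OS context save: FP state is copied out only if it was touched,
    so integer-only threads never pay for the FP register file."""
    saved = {"int": list(hart.int_regs)}
    if hart.fp_dirty:
        saved["fp"] = list(hart.fp_regs)
        hart.fp_dirty = False
    return saved
```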
(1) Not really. It's easy to go from an ISA that says "split" to a processor that uses a unified file internally, but it's much harder to go the other way... the whole point of a unified ISA register file is that you can trivially write to an "FP" register and then read it for an "integer" ALU operation. You make that very hard if you try to split the register file internally.
I see one place seemingly using RISC-V with an MMU in the classic desktop PowerPC style (Linus Torvalds posted a great rant about the stupidity of that MMU) and another place that is seemingly using RISC-V with an MMU that is very much like x86 (the paging part, obviously no segmentation) but with distinct rwx.
Which is it? Did this not get specified? Constantly changing the MMU greatly hurt 32-bit SPARC and PowerPC.
FWIW, this is good: Bits 0..11 direct mapped, bits 12..29 are x86-style page table tree node indexes that are hardware-walked, and bits 30..63 are software-filled like MIPS. (a forest of trees) In the low bits of the bottom level you get: can read, did read, can write, did write, can execute, did execute, user/super (exclusive), type ram/framebuffer/mmio/pte (two bits), reserved, and validity. The "did foo" bits on PTE pages do get updated.
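To make that layout concrete, here's a toy decoder for the bottom-level PTE flags. The exact bit positions are my own guess following the order listed above — nothing here is specified:

```python
# Field order follows the list in the comment above; bit positions
# are an assumption, since the post doesn't pin them down.
FLAGS = [
    ("can_read", 0), ("did_read", 1),
    ("can_write", 2), ("did_write", 3),
    ("can_exec", 4), ("did_exec", 5),
    ("user", 6),  # user vs. supervisor (exclusive)
]
TYPE_SHIFT, TYPE_MASK = 7, 0b11
TYPES = ["ram", "framebuffer", "mmio", "pte"]
VALID_BIT = 10  # after the 2-bit type field; one reserved bit elsewhere

def decode_pte(pte: int) -> dict:
    """Decode the low flag bits of a bottom-level PTE in the proposed
    layout. Purely illustrative."""
    fields = {name: bool(pte >> bit & 1) for name, bit in FLAGS}
    fields["type"] = TYPES[(pte >> TYPE_SHIFT) & TYPE_MASK]
    fields["valid"] = bool(pte >> VALID_BIT & 1)
    return fields
```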
Thanks. The following concerns paging, not the base/limit system:
From a security and reliability perspective, I'm saddened to see that rwx got supported while --x did not get supported. That is backwards. Having to change permissions after code modification is not bad; this provides a convenient point for cache flushing and ASLR-enforced address changes. Preventing executable code from being misused as data is valuable.
I'm also saddened to see that user access implies supervisor access. This too is exactly backwards; nothing should be both user and supervisor accessible. Given that data access can be performed at a less-privileged level by setting MPRV=1, the ability of the supervisor to access user pages normally is especially strange.
Lack of distinct did-execute and did-read bits is mildly annoying. If a page is marked as being accessed and executable, one must assume that it is now in BOTH the instruction cache and the data cache.
I have mixed feelings about having page frame numbers shifted over by two bits. The win is Sv32 getting a reach of 16 GiB. I suppose this is worth the minor annoyance when debugging OS kernel code.
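The 16 GiB figure is simple arithmetic: shifting the page frame number left by two bits gains two physical address bits over a plain 32-bit physical address space.

```python
# Two extra address bits from the PPN shift: 4 GiB -> 16 GiB of reach.
GiB = 2 ** 30
plain_reach = 2 ** 32          # 32-bit physical addresses
shifted_reach = 2 ** (32 + 2)  # PPN shifted left by two bits
assert plain_reach == 4 * GiB
assert shifted_reach == 16 * GiB
```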
Other than that, I like it. It's certainly sane. The traditional page table is pretty good for the middle bits of the virtual address. I think it is less good for the upper bits due to ASLR, and I hate to see anything that encourages a failure to use all 64 bits of the virtual address space.
You should read through the RISC-V mailing list archives (https://lists.riscv.org/lists/) for discussions of these topics, and contribute your own thoughts if they haven't been covered. The ISA-dev list should be the most relevant. They have thought very carefully about these things, and I'm sure they'd appreciate additional feedback on the topic.
Oh, one other problem. The referenced (R) and dirty (D) bits are only updated in leaf nodes. I strongly suspect that any performance you gain here will be more than consumed by OS software needing to scan all the leaf nodes for these bits. If you update non-leaf nodes, then the OS can use those to avoid checking many of the leaf nodes.
Interrupts / Exceptions are probably the most difficult piece and may force a complete redesign. Since they are not done yet, I'd wait a little bit longer...
Basically: detect a few cases in Decode, pass the rest of the instruction down the pipeline to the commit (memory) stage, and let the commit stage detect exceptions and redirect the PC as required.
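That flow, reduced to a toy software model (the opcode set and trap vector address are invented for illustration):

```python
HANDLER_PC = 0x100  # hypothetical trap vector address

def decode(insn):
    """Decode stage: flag exceptions that are detectable early
    (e.g. illegal opcodes) instead of acting on them immediately."""
    out = dict(insn)
    out["illegal"] = insn["opcode"] not in {"add", "lw", "sw"}
    return out

def commit(decoded, mem_fault=False):
    """Commit stage: the one place exceptions take effect.

    Returns the next PC: redirect to the handler on any exception,
    whether flagged early or detected here (like a memory fault)."""
    if decoded["illegal"] or mem_fault:
        return HANDLER_PC
    return decoded["pc"] + 4
```

Keeping the redirect in one late stage is what makes the exceptions precise: nothing earlier in the pipe commits architectural state.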
or1k used a very simple approach: duplicate all the pipeline registers in all the stages and use two register files. Then store the current state once an interrupt is raised, and restore it when the handler is done. This design can easily be derived from an existing one.
If you want to generate some nice block diagrams showing the components at different abstraction levels and the signals connecting them, you could try synthesizing your design in Altera's Quartus (a free version is available on their website) and then using the Netlist / RTL viewers. It exports to PDF.