Nice! I wrote a 5-stage pipelined MIPS processor in SystemVerilog last semester, in one day, drunk as HELL, but hey, it worked amazingly well. Thank goodness they didn't ask for branch prediction.
Does it do anything different or better than Rocket (or Sodor) that would be notable? Why VHDL instead of the more conventional (heh, for this ISA) Chisel? Would be good to see some notes about architecture in the README just to tell me what I'm looking at.
I am still leery of the "unconventional" HDLs. I don't see most of them as much of an improvement over, say, VHDL-2008: you get records and functions, and instantiation is less verbose. Yes, the language is a little wordy, but it's well proven and I already know it; many other people know it as well, so they can read and understand it without learning a new language.
Higher level generator HDLs shine when you want something heavily parametric. VHDL and Verilog code generation features are way too weak, so having a higher level meta-language helps a lot.
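For a feel of what "heavily parametric" means, here's a toy generator in Python that emits a Verilog shift register of arbitrary width and depth — the kind of thing plain Verilog generate blocks handle awkwardly. (Purely illustrative; real generator HDLs like Chisel build a typed netlist rather than pasting strings.)

```python
def shift_reg(name: str, width: int, depth: int) -> str:
    """Emit Verilog for a `depth`-stage, `width`-bit shift register.

    Toy illustration of meta-language code generation.
    """
    stages = "\n".join(
        f"        stage[{i}] <= stage[{i - 1}];" for i in range(1, depth)
    )
    return f"""\
module {name} #(parameter W = {width}) (
    input  wire         clk,
    input  wire [W-1:0] d,
    output wire [W-1:0] q
);
    reg [W-1:0] stage [0:{depth - 1}];
    always @(posedge clk) begin
        stage[0] <= d;
{stages}
    end
    assign q = stage[{depth - 1}];
endmodule
"""

print(shift_reg("delay8", width=16, depth=8))
```

Trivial example, but the point is that every structural decision (how many stages, how wide) is an ordinary variable in the host language.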
The committed code isn't directly synthesizable; it needs to be transformed to HDL by a tool that is itself still being updated.
I tried what looked to be a supported flow: generate Verilog for FPGAs, then synthesize it for Xilinx Zynq devices. The errors reported against the generated Verilog looked legitimate to me.
Then please submit an issue to the GitHub issue tracker. Don't complain on some internet forum and hope that somebody will read your mind. It works fine for me.
Sure, I don't mean to suggest there aren't valid reasons for writing cores in VHDL, etc., I just want to clarify that Chisel (unlike say Verilog) will, by fiat, only ever generate synthesizable code. If that's ever not the case, you should let them know.
Second, I looked into your tests directory, and at first impression there's not much there. What is there is kinda messy and not conducive to thorough testing.
If I could make a suggestion: you should look at reusing the work of others. You are in the lucky position where some of the work has been done for you! There's a massive set of tests for RISC-V that already exists at https://github.com/riscv/riscv-tests (by weird coincidence, someone posted links about this in another thread a few days ago!)
If I were you, I'd go through that material to figure out how to get those tests running in my environment. E.g., you have to compile a test and create a mechanism to load the resulting binary into your testbench memory, or whatever.
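For the loading part, a minimal sketch (file names hypothetical): convert the raw binary you get from `objcopy -O binary` into the one-word-per-line hex format that Verilog's `$readmemh` — or an equivalent VHDL textio loader — can read into testbench memory:

```python
def bin_to_readmemh(data: bytes, word_bytes: int = 4) -> str:
    """Convert a raw binary image into $readmemh-style hex text,
    one little-endian word per line. Toy sketch, not a real loader."""
    # Pad to a whole number of words.
    data += b"\x00" * (-len(data) % word_bytes)
    lines = []
    for off in range(0, len(data), word_bytes):
        word = int.from_bytes(data[off:off + word_bytes], "little")
        lines.append(f"{word:0{word_bytes * 2}x}")
    return "\n".join(lines)

# Hypothetical usage with a compiled riscv-tests binary:
# with open("rv32ui-p-add.bin", "rb") as f:
#     open("testbench_mem.hex", "w").write(bin_to_readmemh(f.read()))
```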
I think I am one of those who would do it the other way around: from the outside inward, connecting to the outside world first. I already know ALL the I/O for the core is somewhere in the RISC-V design code (exact signal names, etc.). It has to be in the Chisel source somewhere.
I would translate that to VHDL to get my ports. It becomes a stub to build my core-level testbench on. If I can mirror their test environment, I at least have a start point. Maybe I could even reuse their testbench somehow.
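A throwaway script can get you that stub. Here's a rough sketch — the port list and module name below are made up; in practice you'd lift the real ones from the generated Verilog module header:

```python
def vhdl_entity_stub(name, ports):
    """Emit a VHDL entity stub from a (name, direction, width) port list.

    The port list is hypothetical; lift the real one from the
    Chisel-generated Verilog module header."""
    def decl(p):
        pname, direction, width = p
        ptype = ("std_logic" if width == 1
                 else f"std_logic_vector({width - 1} downto 0)")
        return f"    {pname} : {direction} {ptype}"
    body = ";\n".join(decl(p) for p in ports)
    return (
        "library ieee;\nuse ieee.std_logic_1164.all;\n\n"
        f"entity {name} is\n  port (\n{body}\n  );\nend entity {name};\n"
    )

print(vhdl_entity_stub("rocket_core", [
    ("clk",          "in",  1),
    ("reset",        "in",  1),
    ("io_imem_addr", "out", 32),
]))
```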
Then I'd start duplicating the sub modules and their interconnections in the core. And so on.
Do the tough stuff first, then enjoy things getting easier as I go along, hopefully!
Thank you :)
Also thanks for the information; I had been wondering if someone had made some more thorough automated tests, I will definitely check those out!
One thing I've wondered about recent ISAs is regarding split register files. Since most CPUs are single-chip implementations these days, why not have integer, fp, and vector registers unified at the ISA level to allow different implementation points and less state spilling/loading during context switches:
(1) High performance implementations use register renaming anyway, so they can easily use a split register file internally without exposing it at the ISA level.
(2) Low power implementations can use a single register file (at the cost of fewer I/O ports).
This would also mean that when switching threads, only a little bit more state than the vector unit registers would need to be stored and loaded.
The RISC-V ISA manual covers your question on page 37 (riscv.org).
> "a split organization increases the total number of registers accessible with a given instruction width, simplifies provision of enough regfile ports for wide superscalar issue, supports decoupled floating-point unit architectures, and simplifies use of internal floating-point encoding techniques. Compiler support and calling conventions for split register file architectures are well understood, and using dirty bits on floating-point register file state can reduce context-switch overhead."
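The dirty-bit trick at the end of that quote is basically lazy context save. A toy model (heavily simplified — not the actual RISC-V FS-state mechanism):

```python
class Hart:
    """Toy model of dirty-bit-driven lazy FP context save."""
    def __init__(self):
        self.int_regs = [0] * 32
        self.fp_regs = [0.0] * 32
        self.fp_dirty = False  # set by hardware on any FP register write

    def fp_write(self, idx, value):
        self.fp_regs[idx] = value
        self.fp_dirty = True

def save_context(hart):
    """OS context save: FP state is copied out only if it was touched,
    so integer-only threads never pay for the FP register file."""
    saved = {"int": list(hart.int_regs)}
    if hart.fp_dirty:
        saved["fp"] = list(hart.fp_regs)
        hart.fp_dirty = False
    return saved
```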
(1) Not really. It's easy to go from an ISA that says "split" to a processor that uses a unified file internally, but it's much harder to go the other way... the whole point of a unified ISA register file is that you can trivially write to an "FP" register and then read it for an "integer" ALU operation. You make that very hard if you try to split the register file internally.
I see one place seemingly using RISC-V with an MMU in the classic desktop PowerPC style (Linus Torvalds posted a great rant about the stupidity of that MMU) and another place that is seemingly using RISC-V with an MMU that is very much like x86 (the paging part, obviously no segmentation) but with distinct rwx.
Which is it? Did this not get specified? Constantly changing the MMU greatly hurt 32-bit SPARC and PowerPC.
FWIW, this is good: Bits 0..11 direct mapped, bits 12..29 are x86-style page table tree node indexes that are hardware-walked, and bits 30..63 are software-filled like MIPS. (a forest of trees) In the low bits of the bottom level you get: can read, did read, can write, did write, can execute, did execute, user/super (exclusive), type ram/framebuffer/mmio/pte (two bits), reserved, and validity. The "did foo" bits on PTE pages do get updated.
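To make that layout concrete, here's a toy decoder for the bottom-level PTE flags. The exact bit positions are my own guess following the order listed above — nothing here is specified:

```python
# Field order follows the list in the comment above; bit positions
# are an assumption, since the post doesn't pin them down.
FLAGS = [
    ("can_read", 0), ("did_read", 1),
    ("can_write", 2), ("did_write", 3),
    ("can_exec", 4), ("did_exec", 5),
    ("user", 6),  # user vs. supervisor (exclusive)
]
TYPE_SHIFT, TYPE_MASK = 7, 0b11
TYPES = ["ram", "framebuffer", "mmio", "pte"]
VALID_BIT = 10  # after the 2-bit type field; one reserved bit elsewhere

def decode_pte(pte: int) -> dict:
    """Decode the low flag bits of a bottom-level PTE in the proposed
    layout. Purely illustrative."""
    fields = {name: bool(pte >> bit & 1) for name, bit in FLAGS}
    fields["type"] = TYPES[(pte >> TYPE_SHIFT) & TYPE_MASK]
    fields["valid"] = bool(pte >> VALID_BIT & 1)
    return fields
```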
Thanks. The following concerns paging, not the base/limit system:
From a security and reliability perspective, I'm saddened to see that rwx got supported while --x did not get supported. That is backwards. Having to change permissions after code modification is not bad; this provides a convenient point for cache flushing and ASLR-enforced address changes. Preventing executable code from being misused as data is valuable.
I'm also saddened to see that user access implies supervisor access. This too is exactly backwards; nothing should be both user and supervisor accessible. Given that data access can be performed at a less-privileged level by setting MPRV=1, the ability of the supervisor to access user pages normally is especially strange.
Lack of distinct did-execute and did-read bits is mildly annoying. If a page is marked as being accessed and executable, one must assume that it is now in BOTH the instruction cache and the data cache.
I have mixed feelings about having page frame numbers shifted over by two bits. The win is Sv32 getting a reach of 16 GiB. I suppose this is worth the minor annoyance when debugging OS kernel code.
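The 16 GiB figure is simple arithmetic: shifting the page frame number left by two bits gains two physical address bits over a plain 32-bit physical address space.

```python
# Two extra address bits from the PPN shift: 4 GiB -> 16 GiB of reach.
GiB = 2 ** 30
plain_reach = 2 ** 32          # 32-bit physical addresses
shifted_reach = 2 ** (32 + 2)  # PPN shifted left by two bits
assert plain_reach == 4 * GiB
assert shifted_reach == 16 * GiB
```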
Other than that, I like it. It's certainly sane. The traditional page table is pretty good for the middle bits of the virtual address. I think it is less good for the upper bits due to ASLR, and I hate to see anything that encourages a failure to use all 64 bits of the virtual address space.
You should read through the RISC-V mailing list archives (https://lists.riscv.org/lists/) for discussions of these topics, and contribute your own thoughts if they haven't been covered. The ISA-dev list should be the most relevant. They have thought very carefully about these things, and I'm sure they'd appreciate additional feedback on the topic.
Oh, one other problem. The referenced (R) and dirty (D) bits are only updated in leaf nodes. I strongly suspect that any performance you gain here will be more than consumed by OS software needing to scan all the leaf nodes for these bits. If you update non-leaf nodes, then the OS can use those to avoid checking many of the leaf nodes.
Interrupts / Exceptions are probably the most difficult piece and may force a complete redesign. Since they are not done yet, I'd wait a little bit longer...
Basically: detect a few cases in Decode, pass the rest of the instruction down the pipeline to the commit (memory) stage, and let the commit stage detect exceptions and redirect the PC as required.
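That flow, reduced to a toy software model (the opcode set and trap vector address are invented for illustration):

```python
HANDLER_PC = 0x100  # hypothetical trap vector address

def decode(insn):
    """Decode stage: flag exceptions that are detectable early
    (e.g. illegal opcodes) instead of acting on them immediately."""
    out = dict(insn)
    out["illegal"] = insn["opcode"] not in {"add", "lw", "sw"}
    return out

def commit(decoded, mem_fault=False):
    """Commit stage: the one place exceptions take effect.

    Returns the next PC: redirect to the handler on any exception,
    whether flagged early or detected here (like a memory fault)."""
    if decoded["illegal"] or mem_fault:
        return HANDLER_PC
    return decoded["pc"] + 4
```

Keeping the redirect in one late stage is what makes the exceptions precise: nothing earlier in the pipe commits architectural state.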
or1k used a very simple approach: duplicate all the pipeline registers in all the stages and use two register files. Then store the current state once an interrupt is raised, and restore it when the handler is done. This design can easily be derived from an existing one.
If you want to generate some nice block diagrams showing the components at different abstraction levels and the signals connecting them, you could try synthesizing your design in Altera's Quartus (a free version is available on their website) and then using the Netlist / RTL viewers. It exports to PDF.