Chisel: Constructing Hardware in a Scala Embedded Language (eecs.berkeley.edu)
57 points by archgoon on July 25, 2015 | 58 comments



Chisel is a nice toolkit that proved itself in the Rocket processor. One of the more unusual uses of it was Chisel-Q for quantum computing:

http://www.cs.berkeley.edu/~kubitron/papers/qarc/pdf/Chisel-...

Chisel is pretty well-known in the academic hardware community. So, here are a few others that you might not have heard of.

Caisson - language-based security meets HDL http://www.cs.ucsb.edu/~chong/papers/109-Caisson-pldi.pdf

SHard - a scheme to hardware compiler http://scheme2006.cs.uchicago.edu/05-saint-mleux.pdf

Cx-Lang - a statically-typed, C-like HLL for hardware http://cx-lang.org/

Note: I'd love for some people familiar with ASIC or FPGA design to check out Cx-Lang to see if it's good for beginners getting results on FPGAs. The IP they sell is so cheap that it's either (a) crap or (b) the result of a productive synthesis tool (Cx). I'd just like to know if it's a decent HLS tool compared to FPGA or EDA company offerings. An additional advantage is that it's open, so it can be reviewed for subversion if one is willing to invest the effort.


Just took a look at the cx-lang website. I think it may be too weird for a beginner looking for results. I've been in ASIC & FPGA engineering for 15+ years. The design flow I think they use is more like writing a program that gets translated into hardware. Very different from the regular methodology. Sounds great, but there's a big leap between that and getting your FPGA working. For a beginner, it's too much to take on.

I think maybe Chisel and those others would be easier.


Yeah, I was actually hoping it was a high-level synthesis tool. I appreciate your review on the ease-of-use aspect.


I think it is a high-level synthesis tool, like SystemC etc.

My concern for a beginner would be how to get your high level design integrated into the FPGA. For a beginner, there are a lot of what-ifs that they are probably going to stumble on.


Chisel is not high level synthesis, which is one of the reasons why I love it. You are actually describing the circuit itself, which is what a hardware description language is supposed to do. While describing the function of a circuit, as MyHDL and many other languages have tried to do, is cute, they have never come close to a real engineer thinking about how to solve the problem with registers and combinational logic directly.
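
To make it concrete, here's roughly what a tiny Chisel module looks like (a sketch from memory, Chisel 2-era syntax, so details may be off):

    import Chisel._

    // A small accumulator: one register, an adder, and an enable.
    // You are naming the register and the combinational logic directly.
    class Accumulator extends Module {
      val io = new Bundle {
        val in  = UInt(INPUT, 8)
        val en  = Bool(INPUT)
        val out = UInt(OUTPUT, 8)
      }
      val acc = Reg(init = UInt(0, 8))
      when (io.en) { acc := acc + io.in }
      io.out := acc
    }

There's no scheduling or inference step in between: that register is the register you get.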


You have it completely backwards. The best "real engineers" do not "think directly" about how to solve the problem with registers and combinational logic. They think about how to solve the problem functionally and let synthesis take care of the logic as much as possible. MyHDL is ideal for that.

That is how ASICs/FPGAs have been designed by the best teams for the last 2 decades. And it is much more productive than the textual schematic entry you are describing. There is simply no comparison. Synthesis works.

As for your word-playing attempt: "description" stands for a number of things, among those "describing behavior". Just check the Verilog/VHDL LRM or MyHDL manual.


Exactly. It's crazy to even think, outside of analog, that someone would want to build a modern SOC with RTL directly. Even the full-custom folks like Intel, correct me if I'm wrong, essentially take an approach where they abstract above the stuff they hand-craft and use EDA to synthesize down to it. Most SOC builders simply don't have the labor to waste working at RTL for a complex design and its inevitable problems. It's why they pay so much to the Big Three for synthesis tools.

For software people following, it would be like trying to build Microsoft Word with assembler while competition was using C/C++.

Note: Chuck Moore of Forth fame may be the exception to this rule in SOC design. Then again, he's an exception to a lot of them. ;)

http://www.ultratechnology.com/okad.htm


It depends on the project.

For SOCs that I was involved with (DSPs, mobile phone SOC, wifi SOC), you are bringing together a lot of different IP from various sources. There's no way to use these HLS tools unless the third parties feel like writing a model for their IP in your choice of HLS tool language. This means you would have gaps in your HL design everywhere their stuff fits in. Verification in the high level language would be tough work until those gaps are filled.

No doubt guys like Intel can use the fancy HLS tools on their SOC because they own every module in the design. Same goes for the RISC-V stuff: They can run Chisel sims because they wrote every part of the design in Chisel.

The rest of us idiots are doing straight Verilog RTL because SOC level is more about gluing a lot of different modules together with low level logic. Maybe there's a tool here or there that generates a little Verilog from some other language for you but that is very piecemeal.

It really is like building Word with assembler. But your job is more about linking together static libraries that already work. It's grunt work and nothing special. Being able to discover a bug via verification is the skillful part.

This is where the cosimulation that MyHDL supports is so handy. Reading up on it a bit more, it sounds very promising for the future. The Verilog RTL parts run in the Verilog simulator while the MyHDL parts run in the MyHDL simulator.


" There's no way to use these HLS tools unless the third parties feel like writing a model for their IP in your choice of HLS tool language."

I worried about that as it's a common problem in any domain integrating different languages or models. Sounds like the hardware equivalent of wrappers in cross-language software development.

"The rest of us idiots are doing straight Verilog RTL because SOC level is more about gluing a lot of different modules together with low level logic."

That's actually good news, and hopeful for the people my research supports, given the stuff you worked on. If it's really grunt work, then all these amateurs digging into actual HDL wanting to do great things might get it done if they leverage FPGA- or ASIC-proven IP, with at least one pro on the team, maybe two if mixed-signal. People like me wanting to cheat it without RTL are apparently screwed lol.

"Being able to discover a bug via verification is the skillful part."

Two have said that in one day. The other person said this: "People can do a design without much skill and it might mostly work right. People screwing up on verification can mess up the whole thing." Rings true as I think of mask costs and Intel's recall.

"This is where the cosimulation that MyHDL supports is so handy."

All this time, I thought co-simulation (i.e. equivalence checking w/ tests) was standard in your industry. I know Sandia's HW people did it, and high assurance does it between abstraction levels too, to catch gaps. It was essential to me in the latter, as an assumption or structural detail would change and throw off safety/security properties. You saying equivalence checking at each layer is not normal in commercial SOC design? That it's essentially only the shops using the best EDA tools and such?

Just surprising is all. Would also seem easy if you just use the execution-trace-based equivalence checks. You can script those to a degree in most domains and languages. They're not perfect, but I thought that MyHDL feature was a knockoff of what industry was already doing haha.


"You saying equivalence checking at each layer is not normal in commercial SOC design? That it's essentially only the shops using the best EDA tools and such?"

We have a difference in terminology. Co-simulation and equivalence checking refer to different things in HW, and neither is what you are referring to.

Co-sim is when you have two models running in simulation and you could possibly compare them through time for mismatches. Or you run some sub-modules of the design in a Verilog simulator and other sub-modules in your HLS tool simulator and the modules can interact.

Equivalence checking usually refers to a different type of tool called a formal equivalence checker (FEC). It analyzes the two models without doing simulation, using weird algorithms like binary decision diagrams. This is usually used to compare the Verilog RTL to the gate-level netlist as an additional quality control measure. The synthesis tool optimizes the logic you expressed in RTL and may end up with a lot of different gates and signals. The FEC usually checks that.

When you have two cycle-accurate models, you usually would try to do FEC. But it's typically both Verilog models! There's probably no Chisel-Verilog FEC or "any HLS"-Verilog FEC tool!

So what you are referring to as co-simulation is typically just called simulation. That IS standard at each layer so you would be correct: everyone does simulation. It's just a question of how thorough.

What MyHDL offers in co-simulation is something more. The ability to mix the MyHDL designs and Verilog RTL designs into one simulation. So if you had a 3rd party mem cache in Verilog you could connect it to your MyHDL CPU and run a simulation.

Industry tools do support co-simulation as well! But for me it's a good surprise that MyHDL manages to knock off that feature, because the other HLS tools don't seem to be able to. Maybe they can though, I am not sure. E.g. Chisel can create a C++ model. I am sure I could hack something together given time.
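
On the Chisel side, a third-party Verilog block can at least be hooked in at generation time as a black box, though that only helps once you're simulating the generated Verilog rather than the C++ model. A rough sketch (I haven't checked the exact API, so treat it as pseudocode):

    import Chisel._

    // Hypothetical wrapper for a 3rd-party memory cache: the implementation
    // stays in Verilog, Chisel only sees the ports and emits an instantiation.
    class ThirdPartyCache extends BlackBox {
      val io = new Bundle {
        val addr  = UInt(INPUT, 32)
        val rdata = UInt(OUTPUT, 32)
        val valid = Bool(OUTPUT)
      }
    }

So the connection problem is solvable, but the mixed simulation that MyHDL co-simulation gives you is a different matter.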


Of course it's not. That's why real HW types use it and people like me don't. ;) One of my goals on HW security side is to bring synthesis to a level that non-HW people can use it for an OK solution. I found Cx when looking at the awesome Qflow method and tools.

Since you post a lot, what's your opinion on Cx as a HLS tool for programmers without expertise as HW engineers? And outside big EDA, what is your recommendation for lowest cost vs effectiveness HLS for those wanting to clean-slate their hardware or at least accelerate things on FPGA's?


"While describing the function of a circuit, as MyHDL and many other languages have tried to do, is cute, they have never come close to a real engineer"

I agree with you on that. For a beginner I think a high level language is the wrong direction to go because it's taking them further away from the method of designing a HW solution. It's adding another layer of abstraction.


That's what people said about going to 3GL's. I'm glad, for my day-to-day computing, that most went in the other direction (pro 3GL's). Software people can't get the kind of results hardware people can while using HLS techniques. That's obvious. Yet, the existing research shows they should be able to get quite a bit of results with little knowledge of hardware. EDA tools actually solved most of the hard parts. The only thing that's lacking is tooling to accomplish this at prices most people can afford.

Two paths developed in parallel. One for those trying to boost their software with hardware generation. One for hardware designers trying to improve their own craft. And with the benefit that one's tools can integrate the other. Best of both worlds.

So, I look for both.


I think maybe the HLS can get you a core design, but then when you try to use it and integrate it into an FPGA you will have to deal with the low level issues of registers and combinatorial logic. This is fine for the experienced HW person, but not for the beginner. In other words, the HLS is better for the experienced HW designer than the beginner even though it looks like it's easy.

Unfortunately there is no infrastructure for you to build onto. It's like a blank slate. All the I/O needs to be done right. It's very seldom just instantiating wires unless your design is blazingly simple. And if it were simple you wouldn't need an FPGA in the first place. A generic coprocessor would kick most FPGAs' butts.


Ok, that all makes more sense. One of my potential flows was actually having the amateurs build lots of functionality (e.g. accelerators, device protocols) with HLS tools that a hardware engineer could finalize for FPGA. Would save on the rare labor. What do you think of that model vs exclusively leveraging HW people?

As far as ASICs, I'd just get HW people, haggle on the tools, use free ones if possible (see Qflow), and do an MPW run for prototyping/production. eASIC's 90nm maskless stuff, potentially. Start it on 180nm-350nm, though, as there's lots of cheap masks and fab capacity there.


"What do you think of that model vs exclusively leveraging HW people?"

Sounds good.

Cheap fab I am not sure whether the costs make sense. Depends on the project I guess. I don't have a lot of experience with those manufacturing processes.


Still, Cx-Lang is having a real hard time finding customers. So what's missing?


That's part of what I'm trying to find out. To be fair, most open-source, high-level synthesis tools don't have a lot of customers. Even Chisel doesn't have nearly as many users as its successes deserve. That's common for ASIC or FPGA tools that aren't the big name.

For now, my preliminary answer is that people: (a) just use Altera and Xilinx HLS tools for FPGA because they're cheap and work well with their products; (b) use HLS tools from big three EDA companies on ASIC design; (c) straight up do HDL (majority probably) for ASIC as experienced HW engineers are used to doing anyway and use good RTL synthesis. Against the competition, there's no comparison of free/open synthesis tools except in price and subversion risk. Those are my reasons for investigating them, though, so I continue to get feedback on what I find.

Note: Not sure of your HW expertise. If you're not a HW person, I'll note that anything synthesized for ASIC's needs to have a rock-solid method because mistakes are expensive. Even older processes still cost hundreds of thousands for the masks that print an instance of a design. Production engineers are hesitant to use unproven technology when $$$ are on the line.


Matthieu, co-founder of Synflow here. Maybe I could help? Don't hesitate to send me an email!


I probably will soon since you took the time to show up. :) I think what I'll do is form a list of questions to assess the nature and capabilities of your method. These will represent what a lot of different kinds of people would ask. Then, your answers might get packaged up in FAQ's, blog posts, whatever that clarify things for more people than just me. How's that sound?


Sounds great! I'll do my best to answer all questions :-) My email is matthieu.wipliez@synflow.com


In my opinion, Chisel feels like "let's write Verilog with Scala syntax." I personally see MyHDL as a better approach, as you can leverage existing libraries for generating code (e.g. use scipy to generate the coefficients for your FIR filter). One plus is that Chisel generates C++ code for testing your design, which is a huge speed increase versus simulating Verilog.


The biggest problem with MyHDL in my opinion (as someone using Chisel to make a commercial processor) is its (non-existent) "real" simulation capabilities. If something is not cycle accurate (the way that Chisel's C++ simulator is), you cannot really be positive of anything... and you don't want to get to physical design and find out you can't meet timing.


Stop the misrepresentation and do your homework first please. MyHDL is as cycle accurate as Verilog & VHDL. It's based on the same event-driven paradigm - the paradigm that real industrial designers have been using to design real products for two decades. And you do not need conversion to Verilog/VHDL to achieve this - the core of MyHDL is a simulator.

The real difference is that in event-driven languages, clock events are explicit, instead of implicit like in Chisel and the whole array of dead HDLs that preceded it. So if history is any guide, Chisel is dead upon arrival.


Here is some information on the speed of MyHDL, http://www.myhdl.org/documentation/performance.html.

Converting to yet another format can be a pain and potentially dangerous. If you are creating a commercial ASIC, you need to guarantee that each of these formats/representations is equivalent; if there is a discrepancy between the generated C++ and the generated Verilog for implementation - oh boy.


This is ridiculous; you can create cycle accurate (and bit accurate) descriptions in MyHDL. Anyone can verify this themselves by creating an RTL description in MyHDL, converting it to Verilog, and performing a co-simulation. One can even co-simulate with the back-annotated layout netlist and verify - yes, cycle accurate.


You saying MyHDL is not cycle accurate? Could you give a citation for that?


Check out the bottom: http://www.myhdl.org/start/whatitisnot.html

I should say that by generating Verilog or VHDL from MyHDL, you can do proper simulations with that... but Chisel's C++ simulator is significantly faster than a Verilog simulator, while still being cycle accurate.


Okay this is getting weird. I just read your link so this is all from my first impressions.

MyHDL is talking about co-simulation on the gate-level netlist. They do not recommend it. However they say MyHDL can do co-simulation on Verilog RTL. That will be cycle accurate I believe.

As I understand it, Chisel can't even do co-simulation at all. Am I incorrect? Essentially, once you translate your Chisel design to Verilog you basically can't reuse your verification environment for the RTL simulation or the gate-level. How are you going to check timing if you wrote all your tests in Chisel?

So Chisel seems worse than MyHDL but neither can support verifying a gate-level sim so you are somewhat screwed either way. It's just a matter of how badly screwed you are.

That's kinda the problem with all these tools I believe. A while ago someone posted about Clash. Same issue.


Chisel doesn't require co-simulation, as it is actually RTL in itself... it describes actual logic (registers and gates) instead of the functionality of the chip (as MyHDL does). Chisel being actually RTL makes it much easier for actual hardware engineers to use, but still allows for very powerful parameterization that cuts down on the total number of lines that have to be written. I would even say that it being embedded within Scala helps software engineers with basic knowledge of hardware be much more productive than if they were trying to write in Verilog.

Getting into my opinion here: High level synthesis has never been (and I doubt it will be for a long time) able to beat an experienced RTL engineer, because a software description (such as MyHDL, Clash, SystemC, etc.) of a piece of logic will never map perfectly to hardware. Chisel doesn't have this problem, as you are just describing (or "constructing") the logic itself. No "translation" process happens, and thus you don't have any problem actually simulating at the same RTL level as Verilog.
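
To illustrate the parameterization point, here's roughly what I mean (an untested sketch in Chisel 2-ish syntax): the width and depth are ordinary Scala parameters, but every stage that gets elaborated is an explicit register:

    import Chisel._

    // Hypothetical parameterized delay line: 'width' and 'depth' are plain
    // Scala parameters, yet each stage is a register you asked for by name.
    class DelayLine(width: Int, depth: Int) extends Module {
      val io = new Bundle {
        val in  = UInt(INPUT, width)
        val out = UInt(OUTPUT, width)
      }
      // A Scala fold builds the chain; the elaborated hardware is just
      // 'depth' registers back to back, nothing inferred.
      io.out := (0 until depth).foldLeft(io.in) { (sig, _) => RegNext(sig) }
    }

One source file covers every width/depth combination you'll ever need, without a generate block in sight.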


Complete nonsense. First, a synthesizable MyHDL description is at exactly the same abstraction level as a synthesizable Verilog/VHDL description. This paradigm is the basis for the standard, successful industrial flow. This is what real hardware designers do and it is not HLS.

In sharp contrast to what you suggest, logic generation is not the main hardware design problem. And the fact that you think you can beat RTL synthesis by "generating" or "constructing" the logic itself proves one thing: you don't have the slightest idea of the real power of RTL synthesis.

The main problem with hardware design is verification. VERIFICATION. And for that reason, HDLs should not be limited to a "fully synthesizable" subset, but should support powerful modeling concepts in the first place. And so far, nothing beats the event-driven paradigm for that purpose.


I've been skimming through MyHDL stuff. It does support RTL like Chisel supposedly.

http://docs.myhdl.org/en/latest/manual/rtl.html

So it should simulate RTL and give you cycle accurate waves.

No gate level sim for either tool means you're dead meat whichever way you go, so I am not sure why I am even bothering to look at this stuff.

Debating the respective merits of each approach is somewhat pointless when crucial steps in the HW process are completely ignored.

And now you are saying you don't test on the generated Verilog from Chisel? That's no good. No one should just run tests in Chisel. You're delivering HW, not Chisel code, as far as I am concerned. That's a few huge steps more in the flow that need rigorous testing. Where is the quality control?

So much more logic is added to chips during/after synthesis these days, and you have no way to get any of your tests running on that netlist.


Of course we do all of our real tests on the generated Verilog. (Note: there is a big difference between generated Verilog, in the Chisel sense, and synthesized Verilog, in the MyHDL sense.) We use Chisel to write our RTL and do quick and dirty testing against our gold model, and if that matches up, we generate the Verilog and do the "real" testing from that point. This whole flow is an iterative process that can happen several times a day, or once a week, depending on how much development is going on. This of course leaves out the other steps once you synthesize the Verilog for either an FPGA or VLSI tools, which we also do.
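
The quick and dirty stage is basically peek/poke style tests run against the C++ simulator; something roughly like this (sketch from memory, Chisel 2-era Tester API, details may differ):

    import Chisel._

    // Hypothetical little block, just to have something to poke at.
    class AddBlock extends Module {
      val io = new Bundle {
        val a   = UInt(INPUT, 8)
        val b   = UInt(INPUT, 8)
        val sum = UInt(OUTPUT, 8)
      }
      io.sum := io.a + io.b
    }

    // Poke inputs, step the clock, check outputs against expected values.
    class AddBlockTests(c: AddBlock) extends Tester(c) {
      poke(c.io.a, 3)
      poke(c.io.b, 4)
      step(1)
      expect(c.io.sum, 7)
    }

Once that looks sane against the gold model, the heavier verification happens on the generated Verilog.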

What you linked to there seems to be a "mode" (Can't think of a better name to describe it at the moment), similar to how you can embed verilog or C++ into Chisel.


This is useless without any data: "big difference between generated Verilog". You would need an actual case study to confidently state this, and as pointed out with programming languages, this is difficult to do.


Ok. If you do the majority of your testing on the generated Verilog, I am not sure what the point of the Chisel C++ cycle accurate simulator is. I thought that ran on the Chisel RTL.

What simulator do you use to test the generated Verilog? And what language do you write the tests in?


Actually, this is the area that I am most interested in. I am responsible for writing SystemC models of our designs so that firmware/software guys are able to work in parallel with RTL development, but it takes me some time to create the models and get the timing reasonably equivalent (AT in SystemC speak) to the RTL. If I had a flow where the same source could generate cycle accurate models AND Verilog, this might be an acceptable answer and allow us to develop SW/FW much more quickly.

In my world, even though hardware takes an extraordinarily long time, the FW/SW is still the long pole in developing complex systems, and anything that shortens the time to $$$ in the door we should probably be looking into.


That does sound like a good aspect of Chisel. How do they test that the Chisel RTL and the generated Verilog are equivalent though?

To me I would want a good verification environment that tests one against the other. Testing just via loading self-checking code into FW/SW is not enough. That's almost like SW verification. It assumes way too much is working correctly.

I am fantasizing about whether it's possible to also write the verification IP in Chisel AND convert that into a separate C++ library; then you could reuse it in a commercial Verilog RTL and gate-level simulator via the PLI.

That would serve many purposes and make Chisel useable in a generic flow.


I have some of the same concerns and would like to see someone from industry who has actually done it before I spend much more time on it. My additional question is then how did the academics do it? There are some tools that negate some/most of the need for gate sims but at my company we still do not ship without SOME gate sims.


"how did the academics do it?"

Tape out using these tools?

My guess is they skipped gate level and other stuff that I consider good quality (BISTs, DFT logic, power estimation etc). You either skip it or rewrite your tests in Verilog/VHDL/etc.

Static Timing Analysis etc. can only do a little of what's needed. You have to check your constraints are correct.

Academics don't know what they are doing. They probably never had to do a design flow for a real product or even touched a commercial tool. It's too expensive. That's why they make these other tools. And that's why these tools kinda suck at getting to a finished product.


@gluggymug

You do realize you can do all of that simulation after you generate your Verilog and put it through your RTL compiler?


@gluggymug (2nd... don't know why the 'reply' button doesn't appear below your comment)

Yep. Check out this https://github.com/ucb-bar/rocket-chip ... there are plenty of testing options you can do, including waveform vcd testing.

EDIT: Also, relevant paper: http://www-inst.eecs.berkeley.edu/~cs250/fa11/handouts/lab3-... ... check out the part regarding test harness


(Yeah you can't reply straight away)

Looking at a lot of stuff you referenced.

I think my definition of tests is a lot more generic than yours. Taking a look at your links, it seems tests are written in RISC-V machine code? The code will execute some functionality of the CPU and check for an expected response, then possibly send a pass/fail message to the test harness through a logging IO from the chip, or update a status register that is polled.

They run Verilog simulation on RTL and gate-level using Synopsys VCS. (At least they didn't skip it! Which is good!)

This is a very specific way of doing stuff that really only applies to this particular core with very little I/O.

We don't do it this way in industry. (This is where I throw a drink in the face of whichever Berkeley academic put this crap together.) The last time I saw something like this was maybe 1999-2000. It is terrible. Usually testing is done at the interfaces.

A real-world design has lots of I/O. For me, a test generically should stimulate the inputs to the design as well as have code for the CPU if it exists. Checkers check the outputs of the design for correct behavior. These stimuli and checkers are written in the verification language of your choosing. This is the standard approach. Usually more checkers peek into signals inside the design itself as well.

If that stuff is written in Chisel so it can be simulated by the C++, then you are screwed for the RTL and gate-level. This is what I am saying.

The RISC-V test harness pretty much only has a clock and reset for inputs, plus some sort of host bus. That's it! And the tests are assembly code or whatever.

If your design is not a CPU, what would your test be? Say it's a HW cryptography encoder thing - no code to load into a memory.


How do I get my test stimulus into my simulator? And where is my checking?

Hypothetically I wrote my tests and checkers in Chisel. I now want to run a gate level simulation of my tests. I asked the same thing to the Clash guys. Can it do that?


It seems like you would need some sort of formal equivalence checking if that is the flow. Is there such a thing for Chisel?


It is very dangerous not to verify, by some means, that each of the formats is functionally identical. One could skip the co-simulation in MyHDL as well. This has nothing to do with the so-called abstraction level of Chisel.


I've been playing with this for the past few weeks. It's a lot of fun compared to Verilog or VHDL. The implementation is still young, though, and it's pretty easy to trigger an undecipherable crash. But I haven't had anything miscompile yet.


I'm trying to get the Amulet1 (Async/unclocked ARM from Manchester, back in the day) working as a test project. This, of course, requires Muller C-gate stuff. I'm having a bit of trouble getting it all to work out.

Have you seen any obvious techniques/methods for unclocked circuits that I am probably looking right past?


They claim this language is highly parameterized. I wonder how much.

For example can you design a generic processor in such a way that everything will be parameterized?


They use this language for RISC-V; if you look at the sources you can see the kinds of things that can be parameterized.



Hey, CEO of REX here... would be happy to answer questions.

P.S. We're hiring Chisel developers! If you don't know Chisel, but want to learn and have RTL experience, we'd love to have you learn on the job! Check out our website: http://rexcomputing.com


Chisel is potentially a revolution in hardware design and I am following it intently but I have not heard of anyone creating an actual chip from a standard fab as of yet. I am trying to make an argument for trying this at my work but I think until we get some more feedback from people in industry it may be too risky of an endeavour. It would be great if you would be willing to share your experiences or know of some papers that would help me build an argument for trying it out.


UC Berkeley has taped out ~10-12 chips entirely designed using Chisel through the standard flow and fabbed at TSMC (as low as 28nm), and all have functioned. My startup has had minimal problems with using Chisel and going through both Cadence and Synopsys tools (most if not all of the problems were user error :P)

Once we get closer to having silicon in hand, I'd love to publish our experience as both a startup making a new processor in this day and age, along with using Chisel and other new tools.


So is the idea that the compiler would try and optimize it so that both code and data would be kept local in the scratchpad memory and if there was a scratchpad "miss" the cores would DMA the needed memory locations from DRAM to the scratchpad?


DRAM, or preferably a closer core. The memory on chip is all physically addressed, and part of a flat global address space. The first 128KB of the address space is core 0's memory, then the next 128KB is core 1's, and so on to core 255. When a core accesses a memory region not in its own local scratchpad, it hops along the network on chip (with one cycle per hop) to get to the core which has the needed memory address. The compiler would try to keep the data needed by a core in that core's local scratchpad, or if it can't, as close as possible. Even in the worst case scenario where a core needs to access the memory in the opposite corner (core 0 accessing core 255), it is still only 32 cycles to access it (less than the ~40 cycles it takes to access L3 cache on an Intel chip).
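
In plain Scala terms, the address arithmetic works out to something like this (an illustrative sketch of the split described above, not the real decode logic):

    // 256 cores x 128KB of scratchpad, laid out consecutively in one flat
    // physical address space (sketch only; real decode details will differ).
    object NeoAddressMap {
      val ScratchpadBytes = 128 * 1024    // 128KB per core
      val NumCores        = 256

      // Which core's scratchpad owns an on-chip physical address,
      // and the offset within that scratchpad.
      def ownerCore(addr: Long): Int = (addr / ScratchpadBytes).toInt  // 0..255 for the 32MB on-chip range
      def offset(addr: Long): Long   = addr % ScratchpadBytes

      // Base address of a given core's 128KB region.
      def localBase(core: Int): Long = {
        require(core >= 0 && core < NumCores)
        core.toLong * ScratchpadBytes
      }
    }

The compiler's job is then to keep ownerCore(addr), for the data a core actually touches, as close to that core as possible.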

The NoC is also entirely non-blocking... a router is able to read/write to its core's scratchpad and do a passthrough in the same cycle.


I'm a layman, but the information on your website reminds me both of what GreenArrays are doing and of the 80's INMOS transputers. People might want to know how those compare with what you're working on.


What differentiates your company's Neo Chip vs Adapteva's Epiphany co-processor?


A number of things... The first thing is that Neo is not a coprocessor, it is a fully independent many core processor. To quickly go over the basics:

1. Neo has a 64 bit core, and conforms to the IEEE 754-2008 floating point standard... Epiphany is 32 bit, and is not fully IEEE compliant (along with only being capable of single precision FP).

2. The existing Epiphany chips cap out at 32KB of local memory per core (with the Epiphany IV having a total of 2MB of on chip memory), while the planned Neo chip will have 128KB of local memory per core (32MB of on chip memory).

3. Epiphany is limited to using its 4 eLink (based on ARM's AXI interface) connectors to access the outside world, and would typically be connected to either other Epiphany chips or to its host processor. Each eLink port only supports 1.6GB/s bidirectional traffic, giving a total of 6.4GB/s of aggregate chip bandwidth. For Neo, we have developed a new 96GB/s (bidirectional, 48GB/s each way) interface with either 3 or 4 interfaces per chip, giving an aggregate chip-to-chip bandwidth of 288-384GB/s.

4. Neo can directly address DRAM attached to it, instead of having to go through a host processor.

5. Neo is a Quad issue VLIW core (capable of a 64 bit ALU op, 1 64 bit FPU op/2 32 bit FPU ops, and 2 load/store ops every cycle) compared to Epiphany's standard superscalar core (Capable of 1 32 bit ALU op, 1 32 bit FPU op, and 1 load/store op per cycle).

All of this adds up to actually being a commercially viable (for industry, not hobbyists) processor. Above all, memory bandwidth has been what kills Epiphany and completely prevents it from reaching their advertised performance.



