
Honest question: how well do HLS tools actually work in practice?

Maybe this is completely wrong, but I have this gut instinct that they must be extremely brittle. That's based on being a CPU/GPU compiler person and knowing, for example, how many difficult problems are still out there for automatic GPU code generation. I find it really hard to imagine that, with all the additional challenges you get with FPGAs, you'd be able to make this really work for anything beyond the most bare-bones examples. (But I'll admit that I don't really know what I'm talking about.)

Anyway, it's good that it's open source. Hopefully more of the magic will be open to public inspection now.



For many things I would take HLS over writing SystemVerilog. My background is software, but I've written a modest amount of SystemVerilog professionally in recent years. While there are specific cases where I'd still pick SystemVerilog (building an AXI peripheral with a specific address space layout or unique semantics for that address space), I'd tend to prefer HLS for anything actually dealing with complex data processing. Think packet pipelines, sorts, database joins, index processing, etc. If you are already in the mindset of thinking about streaming data through a pipeline (as you might in SystemVerilog), you can write fairly boring-looking C++ with ordinary unit tests, and the HLS tooling will do a reasonable job of turning it into equivalent pipelined Verilog.
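
To make "boring-looking C++ with ordinary unit tests" a bit more concrete, here is a minimal sketch of the style, not taken from any particular tool's documentation. The kernel (`moving_sum`), the constants `kTaps` and `kLen`, and the self-test helper are all hypothetical names made up for illustration. The assumptions are the usual ones for HLS-friendly code: fixed-size arrays, loops with compile-time bounds, and no dynamic allocation. Vendor-specific pipelining directives are left out on purpose, since they differ between tools.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example kernel: a 4-tap moving sum over a fixed-length stream.
// Sizes are compile-time constants so every loop has a static trip count.
constexpr std::size_t kTaps = 4;
constexpr std::size_t kLen = 8;

void moving_sum(const std::array<std::int32_t, kLen>& in,
                std::array<std::int32_t, kLen>& out) {
    // Small fixed array that behaves like a shift register.
    std::int32_t window[kTaps] = {0, 0, 0, 0};

    for (std::size_t i = 0; i < kLen; ++i) {
        // Shift the window by one sample and insert the newest input.
        for (std::size_t t = kTaps - 1; t > 0; --t) {
            window[t] = window[t - 1];
        }
        window[0] = in[i];

        // Sum the taps; a tool would typically be asked to unroll this loop.
        std::int32_t acc = 0;
        for (std::size_t t = 0; t < kTaps; ++t) {
            acc += window[t];
        }
        out[i] = acc;
    }
}

// Ordinary software unit test: compare against hand-computed values.
bool moving_sum_self_test() {
    const std::array<std::int32_t, kLen> in = {1, 2, 3, 4, 5, 6, 7, 8};
    const std::array<std::int32_t, kLen> expected = {1, 3, 6, 10, 14, 18, 22, 26};
    std::array<std::int32_t, kLen> out{};
    moving_sum(in, out);
    return out == expected;
}
```

The point of the sketch is that the same function compiles and runs under a normal C++ compiler, so the test can live in a regular software test suite before the code ever goes near synthesis. Whether a given HLS tool accepts `std::array` or produces a one-sample-per-cycle pipeline from this is tool-dependent and not something this example demonstrates.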


Totally agree. Certain signal processing pipelines that FPGAs are often targeted at map very well to HLS, often with 10x or more code reduction. Production SoCs sometimes use similar tech for less power/performance-critical blocks. Mentor Graphics' Catapult HLS marketing material contains a note that NVENC on the Maxwell generation of Nvidia GPUs was coded in HLS.


Lookup tables are the big thing that breaks CPU and GPU programs, since at relatively small sizes you'll hit the memory wall. A lot of algorithms like media codecs (audio, video) and pretrained neural nets are trivial to design with HLS and have a major performance advantage over using a CPU or GPU.

Add in the fact that CPUs and GPUs are expensive as fuck, in short supply, and demand high power, while FPGAs are pretty cheap and efficient since they don't do much, and you're going to find a lot of designs that call for them over a beefy and expensive chip.

That said, Arm and RISC-V cores are dirt cheap. If your algorithm isn't hitting the memory wall, it probably isn't going to be faster via HLS.


I do not agree that FPGAs are cheap. There is nothing really efficient under 100€ unless you are Subaru buying millions of them. One can get lots of interesting assembled SoC modules for 100€.


I've seen decent FPGAs for 500€. That's "cheap" when the competitors are 1000€/month GPUs.


There are domains where they work pretty well (those that tend to look like DSP pipelines). But they're for sure nothing like a panacea that opens the world of reconfigurable hardware to software developers on par with HDLs.


For DSP-ish, dataflow-y applications HLS can help boost productivity. But if you were, say, designing a processor, it would only get in your way.


The company I'm working for has been using only HLS for quite some time now, with no directly written Verilog etc. anymore. I myself am a software developer in this company.

I know it's working perfectly fine for us, and even the strongest advocate of using Verilog directly has by now been converted to using only HLS.

It was explained to me like this: HLS has its own class of problems, and you definitely need some time to get used to how you need to write code so that it is actually synthesized the way you intended. However, once you've gotten used to it, development using HLS is way faster than writing Verilog directly, and our FPGA guys basically said that we would not be where we are if it weren't for HLS because of this.

So it seems that it's actually working quite fine in practice, but obviously not without its own problems.


See this somewhat recent thread on XLS:

https://news.ycombinator.com/item?id=24354083


I know it's replaced some VHDL I wrote a few years back for shuffling data onto an FTDI USB chip.

Same performance, but the HLS version had a smaller footprint.



