more pornel's comments | Hacker News

Hacker News new | past | comments | ask | show | jobs | submit | more pornel's comments

login

pornel 12 days ago | parent | context | [–] | on: A flowing WebGL gradient, deconstructed

Mixing of colors in an "objective" way like blur (lens focus) is a physical phenomenon, and should be done in linear color space.

Subjective things, like color similarity and perception of brightness should be evaluated in perceptual color spaces. This includes sRGB (it's not very good at it, but it's trying).

Gradients are weirdly in the middle. Smoothness and matching of colors are very subjective, but color interpolation is mathematically dubious in most perceptual color spaces, because √(avg(a+b)) ≠ avg(√(a) + √(b))

pornel 13 days ago | parent | context | | [–] | on: Hacktical C: practical hacker's guide to the C pro...

Less controversially, when you write C, you write for a virtual machine described by the C spec, not your actual hardware.

Your C optimizer is emulating that VM when performing symbolic execution, and the compiler backend is cross-compiling from it. It's an abstract hardware that doesn't have signed overflow, has a hidden extra bit for every byte of memory that says whether it's initialized or not, etc.

Assembly-level languages let you write your own calling conventions, arrange the stack how you want, and don't make padding bytes in structs cursed.

bmandale 13 days ago | | [–]

These are all such nonsensical misinterpretations of what people mean when they say C is "low level". You absolutely don't write C for the C abstract machine, because the C spec says nothing about performance, whereas performance is one of the primary reasons people write C.

The existence of undefined behaviour isn't proof that there is a C "virtual machine" that code is being run on. Undefined behaviour is a relaxation of requirements on the compiler. The C abstract machine doesn't not have signed overflow, rather it allows the compiler to do what it likes when signed overflow is encountered. This is originally a concession to portability, since the common saying is not that C is close to assembly, but rather that it is "portable" assembler. It is kept around because it benefits performance, which is again one of the primary reasons people write C.

pornel 13 days ago | | | [–]

I'm not trying to prove a novel concept, just explain how the C spec thinks about C:

> The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

This belief that C targets the hardware directly makes C devs frustrated that UB seems like an intentional trap added by compilers that refuse to "just" do what the target CPU does.

The reality is that front-end/back-end split in compilers gave us the machine from the C spec as its own optimization target with its own semantics.

Before C got formalised in this form, it wasn't very portable beyond PDP. C was too opinionated and bloated for 8-bit computers. It wouldn't assume 8-bit bytes (because PDP-11 didn't have them), but it did assume linear memory (even though most 16-bit CPUs didn't have it). All those "checking wetness of water... wet" checks in ./configure used to have a purpose!

Originally C didn't count as an assembly any more than asm.js does today. C was too abstract to let programmers choose addressing modes and use flags back when these mattered (e.g. you could mark a variable as `register`, but not specifically as an A register on 68K). C was too high level for tricks like self-modifying code (pretty standard practice where performance mattered until I-cache and OoO killed it).

C is now a portable assembly more because CPUs that didn't fit C's model have died out (VLIW) or remained non-standard specialized targets (SIMT).

pjmlp 13 days ago | | | | [–]

C performance exists thanks to UB, and the value optimising compilers extract out of it, during the 8 and 16 bit home computers days any average Assembly developer could write better code than C compiler were able to spit out.

codr7 13 days ago | | | [–]

And also because it doesn't get in your way of doing exactly what you want to do.

OCASMv2 13 days ago | | | [–]

If that was true then the optimizers wouldn't need to exist in the first place.

codr7 13 days ago | | | [–]

Compared to the alternatives.

It gets very frustrating to communicate at this level.

pjmlp 13 days ago | | | [–]

The alternatives outside Bell Labs were just as capable.

OCASMv2 13 days ago | | | | [–]

I don't think compilers allowing trash through is a good thing.

codr7 12 days ago | | | [–]

That's an opinion, another one would be that the flexibility allowed by undefined behavior is one of C's strengths at the same time. Strength/weakness are often two sides of the same coin. Which is why these discussions get a bit circular.

greenavocado 12 days ago | | | | [–]

Have you ever seen the error steamroller? https://github.com/mattdiamond/fuckitjs

dwattttt 12 days ago | | | [–]

ON ERROR RESUME NEXT rears its ugly head again

aninteger 13 days ago | | | [–]

> Less controversially, when you write C, you write for a virtual machine described by the C spec, not your actual hardware.

Isn't this true for most higher level languages as well? C++ for instance builds on top of C and many languages call into and out of C based libraries. Go might be slightly different as it is interacting with slightly less C code (especially if you avoid CGO).

pornel 14 days ago | parent | context | | [–] | on: Problems with Go channels (2016)

Channels are only problematic if they're the only tool you have in your toolbox, and you end up using them where they don't belong.

BTW, you can create a deadlock equivalent with channels if you write "wait for A, reply with B" and "wait for B, send A" logic somewhere. It's the same problem as ordering of nested locks.

pornel 14 days ago | parent | context | | [–] | on: Tesla Releases Stripped RWD Cybertruck: So Much Wo...

The point of the design wasn't to be pretty, but to be different, edgy, and look rugged.

But they made a mistake by announcing the design long before they were able to produce it. By the time they had built it, the hype cycle was over, and the design was already old news.

It was also supposed to be bulletproof (presumably looking like a tank), but in reality, it turned out to be a brittle underbody with glued-on panels that were peeling off.

At the $35K starting price that Elon hyped, it could have been excused as a utilitarian design. But Tesla instead released a beta-quality product at a luxury price.

The revolutionary new cheap Tesla batteries that were supposed to make that price point possible turned out to be as real as all the other stuff Elon promised.

asadotzler 14 days ago | | [–]

Not really. The point of the design was a single sheet of heavy steel folded origami style into an exoskeleton that the powertrain, suspension and other bits all hanged off of.

That completely novel design would have been neat to see, and perhaps worth the required aesthetic. But that was not to be.

They dropped the origami single steel panel, then they abandoned the whole idea of an exoskeleton design and opted not for a body on frame like all good pickup trucks, but a unibody that puts them in the Ford Maverick or Hyundai Santa Cruz category, light duty, but at 4 times the price with a pretty awful aesthetic, and from a company whose owner has become a pariah among decent people.

disqard 14 days ago | | | [–]

> a company whose owner has become a pariah among decent people

That's the "cherry on top" of this dung-heap.

throwaway48476 14 days ago | | | [–]

In 50 years cybertrucks are going to be worth something, just because they're different enough to be recognizable. Same as the DeLorean.

agildehaus 14 days ago | | | [–]

The DMC-12 has a classic movie attached to it. If it didn't, most every single one of them that exists today would be in the scrap heap.

throwaway48476 14 days ago | | | [–]

Instead of a movie the cyberpunk is attached to a highly polarizing political conflict, which ignoring the nuance, is highly memorable.

woodruffw 14 days ago | | | [–]

I hazard you that there's little to no market for things that are solely "highly memorable," especially politically polarizing ones. Most people don't want to own a Hitler Youth knife, for example.

throwaway48476 14 days ago | | | [–]

There is actually a market for HJ knives.

woodruffw 14 days ago | | | [–]

Sure; emphasis on "most people."

Gud 14 days ago | | | [–]

But in this case, what’s interesting is if there is a market, not what “most people” will do.

I predict that in 50 years, the cyber trucks in mint condition will be worth a decent amount of money. It is the first of its kind(electric SUV).

It’s still a crappy car.

beloch 14 days ago | | | | [–]

A better comparison would be the AMC Gremlin.

The DeLorean was not a particularly good car, but it featured prominently in a movie series that remains a cultural touchstone.

The AMC Gremlin was just endearingly ugly. Today, it has a bit of a cult following, but that has not made it a particularly valuable car.

pornel 14 days ago | | | [–]

This style is coming back — Rivian R3X looks eerily similar, just with more even proportions.

rchaud 14 days ago | | | | [–]

Teslas are more locked down software-wise than iOS, so their worth in half a century should be around whatever scrap metal is going for then.

asadotzler 14 days ago | | | [–]

I hear ya, but I think 50 years is wildly generous. I give it 20 years before they're entirely gone except for a few novelty chasers. It's the Yugo of the 21st century, except at luxury prices, actually, then maybe like combining a Yugo with the Cadillac Cimarron or the Pontiac Aztec. Garbage in every way that won't survive a human generation.

pengaru 14 days ago | | | | [–]

> In 50 years cybertrucks are going to be worth something, just because they're different enough to be recognizable. Same as the DeLorean.

Nobody would know anything about the Delorean if it weren't for its starring role in one of the most famous movie franchises ever.

Ekaros 14 days ago | | | | [–]

I wonder will they run or be fixable in 50 years. Not just the big battery, but every other such component too. In the end DeLorean is relatively simple car and you could even install different engine in it if you have to.

loloquwowndueo 14 days ago | | | | [–]

Hardly will be worth more than just plonking that $ into an investment account and letting it accrue interest for 50 years.

throwaway48476 14 days ago | | | [–]

Of course, most collectible categories underperform the market. But people like to collect things.

slashdave 14 days ago | | | | [–]

DeLoreans are rare (only 9,000 were built).

pornel 15 days ago | parent | context | | [–] | on: Shadertoys Ported to Rust GPU

Lifetime tracking isn't just for safety or resource management. It also helps write correct code, especially parallel one where shared vs mutable matters.

Unit testing of shaders is usually a pain. Traditionally they're a black box without any assert() equivalent, and you can at best try to propagate NaN and generate magenta pixels on failure. Sharing Rust code lets you unit-test parts of it on the CPU.

Sharing of data structures between CPU and GPU is nice too. WGSL is superficially similar to Rust, but using plain WGSL requires maintaining bindings and struct layouts by hand, which is a chore.

For CUDA, the alternative is C++. On the upside that's the official first-class API for CUDA, but the downside is that it's C++. With Rust you don't have the legacy language cruft, nor busywork like header files or makefiles. You get working dependency management, and don't get caught in the unix vs Windows schism.

exDM69 15 days ago | | [–]

> Traditionally they're a black box without any assert() equivalent

Thankfully these days we have printf in shaders that you can use for "asserts". You can detect if the shader printed anything and consider it a failure.

You can even add a conditional print in your pixel shader, run your app in renderdoc and find the pixel(s) that printed something. Once you find one, you can step through it in the shader debugger.

This seemingly simple feature is a huge time saver.

Const-me 15 days ago | | | [–]

> Sharing of data structures between CPU and GPU is nice too

How they did it? Hard to do because GPU hardware can convert data types on the fly, e.g. you can store bytes in VRAM, and convert them to 32-bit floats in [ 0 .. +1 ] in the shader. GPUs can do that for both inputs (loaded texture texels, loaded buffer elements, vertex attributes) and outputs (rendered pixels, stored UAV elements).

exDM69 15 days ago | | | [–]

If you are using plain buffers the GPU and the CPU access data pretty much exactly the same way. With scalar block layout all the alignments are pretty much the same too.

To get the format conversion stuff you talk about, you need to use images, vertex input or texel buffers and configure the format conversion explicitly.

It's a good question how much of these conversions are actually done by GPU hardware and how much of it is just software (which you could write yourself in a shader and get same perf). I have not seen an apples to apples benchmark about these format conversions.

Const-me 15 days ago | | | [–]

> If you are using plain buffers the GPU and the CPU access data pretty much exactly the same way

Yeah, that will work fine for byte address buffers, to lesser extent constant buffers (they don’t convert data types but the access sematic and alignment are a bit tricky), but not much else. Vertex buffers, textures, and texel buffers / typed buffers in D3D are all widely used in real-time graphics.

> which you could write yourself in a shader and get same perf

Pretty sure it’s hardware. Emulating anisotropic texture sampler with HLSL codes would need hundreds of instructions, prohibitively expensive. Even simpler trilinear sampling is surprisingly tricky to emulate due to these screen-space partial derivatives on input.

> I have not seen an apples to apples benchmark about these format conversions.

Here’s a benchmark for vertex buffers https://wickedengine.net/2017/06/should-we-get-rid-of-vertex... As you see, on AMD GCN4 he indeed measured pretty much the same perf, however on nVidia Maxwell vertex buffers were 2-4 times faster.

exDM69 14 days ago | | | [–]

> Yeah, that will work fine for byte address buffers, to lesser extent constant buffers (they don’t convert data types but the access sematic and alignment are a bit tricky), but not much else.

This is where sharing the CPU and GPU side struct declaration is helpful. With scalar block layout (VK_EXT_scalar_block_layout in Vulkan, not sure how about d3d land) you don't even need to worry about alignment rules because they're the same for GPU and CPU (just make sure your binding base address/offset is aligned).

> Vertex buffers, textures, and texel buffers / typed buffers in D3D are all widely used in real-time graphics.

Of course. You don't get to share "structs" here between CPU and GPU here transparently because you need to program the GPU hardware (vertex input, texture samplers) to match.

There are some reflection based trickery that can help here but rust-gpu afaik doesn't do that. I've seen some projects use proc macros to generate vertex input layout config for GL/Vulkan from Rust structs with some custom #[attribute] annotations.

> Pretty sure it’s hardware.

Now this is just guessing.

> Emulating anisotropic texture sampler with HLSL codes would need hundreds of instructions...

Texture sampling / interpolation is certainly hardware.

But the conversion from rgba8_unorm to rgba32f, for example? Or r10g10b10a2?

I've not seen any conclusive benchmark results that suggest whether it's faster to just grab these from a storage buffer in a shader and do the few arithmetic instructions or whether it's faster to use an texel buffer. Images are a different beast entirely due to tiling formats (you can't really memory map them so the point of sharing struct declarations is irrelevant).

> Here’s a benchmark for vertex buffers

I am familiar with this benchmark from 8 years ago, which is highly specific to vertex buffers (and post transform cache etc). It's a nicely done benchmark but it has two small flaws in it: the hw tested is quite old by now and it doesn't take into account the benefit of improved batching / reduced draw calls that can only be done with custom vertex fetch (so you don't need BindVertex/IndexBuffer calls). It would be great if this benchmark could be re-run with some newer hw.

But this benchmark doesn't answer the question whether the typed buffer format conversions are faster than doing it in a shader (outside of vertex input).

> however on nVidia Maxwell vertex buffers were 2-4 times faster.

The relevant hardware got revamped in Turing series to facilitate mesh shaders, so can't extrapolate the results to present day hardware.

Fwiw. I've been using custom vertex fetch with buffer device address in my projects for a few years now and I haven't noticed adverse performance implications on any hw I've used (Intel, NV and AMD). But I haven't done rigorous benchmarking that would compare to using vertex input stage.

I'm not using rust-gpu for shaders at the moment, but if I was, it would be helpful to just use the same struct declarations. All my vertex data, instance data, constant buffers and compute buffers are an 1:1 translation from Rust to GLSL struct declarations which is just redundant work.

Const-me 14 days ago | | | [–]

> This is where sharing the CPU and GPU side struct declaration is helpful

Indeed, but sharing code between CPU and shaders is not the only way to solve the problem. I wrote a simple design-time tool which loads compiled shaders with shader reflection API, and generates a source file with C++ header (or for other projects C# structures) with these constant buffers. At least with D3D11, compiled shaders have sufficient type and memory layout info to generate these structures, matching memory layout by generating padding fields when necessary.

> not sure how about d3d land

Pretty sure D3D11 doesn’t have an equivalent of that Vulkan extension. Not sure about D3D12 though, only used the 12 briefly.

> I've been using custom vertex fetch with buffer device address in my projects

In my projects, I sometimes use a lot of non-trivial input layout features. Sometimes I need multiple vertex buffers, e.g. to generate normals on GPU with a compute shader. Sometimes I need instancing. Often I need FP16 or SNORM/UNORM vertex attributes, like RG16_UNORM for octahedron-encoded normals.

pjmlp 15 days ago | | | [–]

Just as an example the way Vulkan lifetime work depending on the resource group and associated semaphores, doesn't fit neither with Rust affine types, nor with RAII, hence so many don't make use of the C++ RAII handles on the Vulkan SDK.

pornel 15 days ago | parent | context | | [–] | on: Shadertoys Ported to Rust GPU

https://www.scratchapixel.com

https://www.youtube.com/c/acerola_t

https://www.shadertoy.com

https://google.github.io/tour-of-wgsl/

pornel 15 days ago | parent | context | | [–] | on: Air pollution fell substantially as Paris restrict...

But this also means that almost all of the wear is from trucks.

This whole meme comes from junk science (https://www.emissionsanalytics.com/news/pollution-tyre-wear-...)

> we found that the car emitted 5.8 grams per kilometer of particles. Compared with regulated exhaust emission limits of 4.5 milligrams per kilometer, the completely unregulated tyre wear emission is higher by a factor of over 1,000.

They took plastic shedded by a gas car on non-EV tires, and compared it by weight to safety limits for gaseous emissions. This makes as much sense as saying that a lump of coal has 1,000 times more carbon than the safety limits for carbon monoxide.

jodrellblank 15 days ago | | [–]

That doesn't pass a sniff test; emitting 5 grams of tyre rubber per kilometer, a 12Kg tyre would be completely vanished in 3000km but really they can last 60,000km with only the tread worn down beyond safe levels and the bulk of the tyre still there.

hugh-avherald 15 days ago | | | [–]

Looks like mg was changed to a g, though I'd suggest even then the estimate is too high (but probably to the correct order of magnitude).

rowanG077 15 days ago | | | [–]

What meme? The article you linked talks about tire wear. Not road wear. I didn't even touch on tire wear. Road wear is well studied. https://en.wikipedia.org/wiki/Fourth_power_law

I really can't place your comment, you simply start talking about something completely unrelated to what I was talking about.

pornel 15 days ago | parent | context | | [–] | on: Air pollution fell substantially as Paris restrict...

Heavy particles and gaseous emissions are not comparable in such a simplistic way. If you take a dump on the street it doesn't mean you caused 50 million times more emissions than the EPA limits for ICE car exhaust.

For example, iron from brakes is heavy but ecologically pretty harmless. OTOH NO₂ weighs almost nothing, but is toxic. You can eat 30mg of iron per day to stay healthy (just don't lick it off the asphalt directly), but a similar amount of NO₂ would be lethal.

Heavy particles don't stay in the air for long, and don't get easily absorbed into organisms. OTOH gaseous emissions and small particulates from combustion can linger in the air, and can get absorbed into the lungs and the bloodsteam.

jijijijij 15 days ago | | [–]

Yeah, but brakes are not not made from pure iron and you won't have atomic erosion. Silly argument, really. Notoriously, you could still find brake pads with asbestos not too long ago. Pretty much any fine dust is very unhealthy to inhale, but brakes and tires are made from material mixes you really don't want to breath in. Even the "inert" fraction we find as microplastics in everything, the rain, fish and newborn, and we're only beginning to understand their biological reactivity and long term health consequences.

Ericson2314 15 days ago | | | [–]

the break pad and tire particles in question are not so large they precipitate immediately. They aren't iron but rather real/synthetic rubber and other organics. There is research on them being bad for human health.

pornel 15 days ago | parent | context | | [–] | on: The Bitter Prediction

AI will be cheap to run.

The hardware for AI is getting cheaper and more efficient, and the models are getting less wasteful too.

Just a few years ago GPT-3.5 used to be a secret sauce running on the most expensive GPU racks, and now models beating it are available with open weights and run on high end consumer hardware. Few iterations down the line good-enough models will run on average hardware.

When that Xcom game came out, filmmaking, 3D graphics, and machine learning required super expensive hardware out of reach of most people. Now you can find objectively better hardware literally in the trash.

cardanome 15 days ago | | [–]

I wouldn't be so optimistic.

Moore's law is withering away due to physical limitations. Energy prices go up because of the end of fossil fuels and rising climate change costs. Furthermore the global supply chain is under attack by rising geopolitical tension.

Depending on US tariffs and how the Taiwan situation plays out and many other risks, it might be that compute will get MORE expensive in the future.

While there is room for optimization on the generative AI front we are still have not even reached the point were generative AI is actually good at programming. We have promising toys but for real productivity we need orders of magnitude bigger models. Just look how ChatGPT 4.5 is barely economically viable already with its price per token.

Sure if humanity survives long enough to widely employ fusion energy, it might become practical and cheap again but that will be a long and rocky road.

pornel 15 days ago | | | [–]

LLMs on GPUs have a lot of computational inefficiencies and untapped parallelism. GPUs have been designed for more diverse workloads with much smaller working sets. LLM inference is ridiculously DRAM-bound. We currently have 10×-200× too much compute available compared to the DRAM bandwidth required. Even without improvements in transistors we can get more efficient hardware for LLMs.

The way we use LLMs is also primitive and inefficient. RAG is a hack, and in most LLM architectures the RAM cost grows quadratically with the context length, in a workload that is already DRAM-bound, on a hardware that already doesn't have enough RAM.

> Depending on US tariffs […] end of fossil fuels […] global supply chain

It does look pretty bleak for the US.

OTOH China is rolling out more than a gigawatt of renewables a day, has the largest and fastest growing HVDC grid, a dominant position in battery and solar production, and all the supply chains. With the US going back to mercantilism and isolationism, China is going to have Taiwan too.

joshjob42 15 days ago | | | | [–]

Costs for a given amount of intelligence as measured by various benchmarks etc has been falling by 4-8x per year for a couple years, largely from smarter models from better training at a given size. I think there's still a decent amount of headroom there, and as others have mentioned dedicated inference chips are likely to be significantly cheaper than running inference on GPUs. I would expect to see Gemini Pro 2.5 levels of capability in models that cost <$1/Mtok by late next year or plausibly sooner.

jamil7 15 days ago | | | | [–]

I think there’s a huge amount of inefficiency all the way through the software stack due to decades of cheap energy and rapidly improving hardware. I would expect with hardware and energy constraints that we will need to look for deeper optimisations in software.

pornel 15 days ago | parent | context | | [–] | on: Rust to C compiler – 95.9% test pass rate, odd pla...

It could fail if the generated C code triggered Undefined Behavior.

For example, signed overflow is UB in C, but defined in Rust. Generated code can't simply use the + operator.

C has type-based alias analysis that makes some type casts illegal. Rust handles alias analysis through borrowing, so it's more forgiving about type casts.

Rust has an UnsafeCell wrapper type for hacks that break the safe memory model and would be UB otherwise. C doesn't have such thing, so only uses of UnsafeCell that are already allowed by C are safe.

FractalFir 15 days ago | | [–]

I have workarounds for all "simple" cases of UB in C(this is partially what the talk is about). The test code is running with `-fsantize=undefined`, and triggers no UB checks.

There are also escape hatches for strict aliasing in the C standard - mainly using memcpy for all memory operations.

cryptonector 15 days ago | | | [–]

> It could fail if the generated C code triggered Undefined Behavior.

> For example, signed overflow is UB in C, but defined in Rust. Generated code can't simply use the + operator.

Obviously, yes, but it could generate overflow checks.

bregma 15 days ago | | | [–]

Wait until you find out how unsafe software written in the machine language that Rust usually transpiles to is.

adwn 15 days ago | | | [–]

That's not the same, and not what pornel is talking about. The x86 ADD instruction has a well-defined behavior on overflow, and i32 + i32 in Rust will usually be translated to an ADD instruction, same as int + int in C. But a C compiler is allowed to assume that a signed addition operation will never overflow (the dreaded Undefined Behavior), while a Rust compiler must not make that assumption. This means that i32 + i32 must not be translated to int + int.

For example, a C compiler is allowed to optimize the expression a+1<a to false (if a is signed), but a Rust compiler isn't allowed to do this.

Join us for AI Startup School this June 16-17 in San Francisco!
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact