
There's a lot of rumours flying around that HL3 might be coming soonish, and when it arrives it'll almost certainly support RTX. It's also possible that it'll be built on an evolution of the HL2 engine. If so, it might have been seen as a useful development and marketing exercise to backport the RTX changes to the older HL2 engine version.


This is from 2017; errors.Join did not exist at the time. But yes, today you'd do it differently.
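For anyone curious, a minimal sketch of what the modern approach might look like (assuming the original pattern was about aggregating multiple errors into one; the check names here are purely illustrative):

    package main

    import (
        "errors"
        "fmt"
    )

    // Hypothetical errors we want to aggregate into a single return value.
    var errDisk = errors.New("disk check failed")
    var errNet = errors.New("network check failed")

    func runChecks() error {
        // errors.Join (Go 1.20+) wraps all non-nil arguments into one error
        // and returns nil if every argument is nil.
        return errors.Join(errDisk, nil, errNet)
    }

    func main() {
        err := runChecks()
        fmt.Println(err)                    // prints both messages, one per line
        fmt.Println(errors.Is(err, errNet)) // true: Is unwraps joined errors
    }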


As long as that very description is spelled out clearly near the top of the job ad, it doesn't matter terribly much (within reason) what you call it - job searchers will try various different strings to find it. Personally, I'd call it an ops role, however.

In general, most of the issues I've seen with these sorts of roles aren't the naming of them but rather the third bullet in your list. Having an ops role that's responsible for all the problems with software they didn't build and weren't allowed to have meaningful input to the design/implementation of isn't healthy. It sucks up their time and energy fighting fires they didn't create and likely aren't really empowered to fix. This is the problem with this role (as often implemented) that Rachel talks about in her first paragraph.

If you really care about having a good and reliable product you need people involved with the design and implementation that are deeply invested in making it reliable and maintainable in the long term - which means either making your dev roles shoulder some of the oncall burden or having people that straddle the ops and dev teams. Or both. If you're having difficulty filling this role, perhaps this is the real problem?


I haven’t started trying to hire for this position yet, though it’s been something I’ve been thinking about for a while. Right now a small number of developers, myself included, are on call, and I’d like to reduce that a bit or just share the burden.

My goal isn’t to throw crap over the fence and say “make it work” but rather to empower someone to make maintaining and growing our platform their main goal. The developers (again, myself included) are not great at the ops side of things and can rarely focus on the infrastructure itself due to other priorities (yes, we can talk about how that itself is an issue). If I could clone myself and have one of us specialize in ops and the other in programming for the platform, I would in a heartbeat.

Infra/Ops and programming are two different mindsets (much like managing people or QA differs from writing code). Switching between them is hard and you pay a penalty to do so. Not to mention there are skills (networking is high on that list) that I’m not good at. I can scrape by, but that’s not where my skills lie. That’s why I’d like to hire someone who is good at it, who _does_ enjoy it, and who can push for changes from an ops perspective that I can’t, due to time or skill.


> Infra/Ops and programming are two different mindsets [...]

HARD disagree on this. I've done both. Most of the really excellent programmers I've worked with have too, at least a little. You can't write highly reliable networking software without a deep understanding of how networking actually works. You can't write high-performance software without a deep understanding of the infrastructure and hardware it runs on. And so on.

I'm not trying to besmirch your or your team's abilities here - if you're writing in a high level language and most of your challenges are implementing biz logic, then not knowing very much about the underlying infrastructure and hardware is fine: you aren't trying to write a distributed RDBMS, so you probably don't need to know this stuff.

But do remember that there are lots of people for whom the hardware, the infrastructure and the application they're writing are inextricably linked. It's not a different mindset, it's just people with additional skills you haven't needed to learn yet.


Note that I said mindset not skillset.

I have zero doubt that I could do an Ops job well; what I can’t do is switch between writing business logic and maintaining servers at the drop of a hat. It’s similar to how I can’t go from QA to engineer without a context switch/penalty.

As I said elsewhere in this thread, if I could clone myself and do both roles I would in an instant. But if I have to pick one, I’d pick writing code (as my primary thing). That’s not to say that Ops doesn’t write code, just not the same type of code.


For me it's very similar: I was in dev, now I'm in ops, and I can easily switch back. But to think that all good programmers can do infra is a falsehood.

I've met so many developers who don't even know how computers really work. They're good at a particular tech stack and do their job very well, but they can't do much else. Let alone infra.

Personally, I agree that it's two different mindsets, but sometimes they can overlap.


The months of R&D to create a workaround could simply be because the subset of motherboards which trigger this issue are doing something borderline/unexpected with their voltage management, and finding a workaround for that behaviour in CPU microcode is non-trivial. Not all motherboard models appear to trigger the fault, which suggests that motherboard behaviour is at least a contributing factor to the problem.


I think this issue was sort of cracked-open and popularized recently by this particular video from Level1Techs: https://www.youtube.com/watch?v=QzHcrbT5D_Y

Towards the middle of the video it brings up some very interesting evidence from online game server farms that use 13900 and 14900 variants for their high single-core performance per cost, but with server-grade motherboards and chipsets that do no overclocking and would be considered "conservative". Even so, these environments show a very high statistical failure rate for these particular CPU models. This suggests that a high percentage of the CPUs produced are affected and that the problem develops over long run times, rather than being caused solely by enthusiast/gamer motherboards pushing high power levels.


It's definitely not working. Several of the channels I'm currently in (which are not private channels, for the record) aren't listed at all, and it's not sorted by user count either. There are channels with hundreds of users listed, but according to the "sorting" the largest channel has 20 users.


Memory de-dup is computationally expensive, and the KSM hit rate is generally much worse than people tend to expect - not to mention that it comes with its own security issues. I agree that the security tradeoffs need to be taken seriously, but the real-world performance/efficiency considerations are definitely not negligible at scale.

There are also significant operational concerns. With containers you can just have your CI/CD system spit out a new signed image every N days and do fairly seamless A/B rollouts. With VMs that's a lot harder. You may be able to emulate some of this by building some sort of static microvm, but there's a LOT of complexity you'll need to handle (e.g. networking config, OS updates, debugging access) that is going to be some combination of flaky and hard to manage.

I by no means disagree with the security points but people are overstating the case for replacing containers with VMs in these replies.


Generics in Go, as they're implemented today, sadly have a fair bit of performance overhead. I haven't tried, but my assumption would be that Go generics are not fast enough to make an effective arena allocator. I'd be thrilled if someone could prove me wrong, though!


The performance overhead comes when you're calling a method on the generic type--Go has to look up the specific implementation in a dictionary. Pretty sure that doesn't apply to straight-up container use cases like this one.
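To make the distinction concrete, here's a minimal sketch (illustrative names, not a benchmark): a generic container that only stores and returns values of T and never calls methods on T, so there's no dictionary-based method dispatch on the hot path:

    package main

    import "fmt"

    // Stack is a plain generic container: it only moves values of T around
    // and never invokes methods on T.
    type Stack[T any] struct {
        items []T
    }

    func (s *Stack[T]) Push(v T) {
        s.items = append(s.items, v)
    }

    func (s *Stack[T]) Pop() (T, bool) {
        var zero T
        if len(s.items) == 0 {
            return zero, false
        }
        v := s.items[len(s.items)-1]
        s.items = s.items[:len(s.items)-1]
        return v, true
    }

    func main() {
        var s Stack[int]
        s.Push(1)
        s.Push(2)
        if v, ok := s.Pop(); ok {
            fmt.Println(v) // 2
        }
    }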


High barriers to entry on the interview process don't mean as much as you may think. Even with the best interview process in the world, you're only going to have a small number of hours to try to evaluate a lot of complex factors about a human you know nothing about. You're going to hire people you shouldn't - and lots of them. You're also going to miss hiring people you should. It sucks, but that's life.

With that in mind I do think your conclusion's a little suspect - there really will be a good number of underperforming people you really do want to part ways with. Maybe not 6% - I don't work in HR, so I don't see those sorts of metrics - but I definitely have encountered lots of people who got through the interview process but nevertheless had no ability to do the job adequately.

I'm sure a bunch of people will jump on this to then complain about the arduous interview process - but NO interview process is perfect. Having a tough process is a reasonable way to reduce the number of people you end up not keeping on, and expecting any process involving humans to be anything close to perfect is wildly unrealistic.


And for roles with very quantifiable outputs--like sales, for example--the approach at a lot of companies is not to sweat the hiring process too much and to just let go of people who don't make their numbers (whether it's really their fault or not). A shorthand someone I knew used for this was that sales managers have no trouble firing people.


Right. FAANG interviews generally aren't even trying to figure out if someone will be good at their job. Leetcode tests for IQ and being willing to sink tons of hours into bs to get the job. FAANG companies have decided those are important qualities needed to succeed, but they clearly aren't the only ones.


Clock speed isn't a particularly meaningful measurement anymore, and hasn't been for years. For example, an AMD Genoa chip, depending on SKU, may have base/boost clock speeds fairly comparable to an Intel Sapphire Rapids part - but in practice the single-core performance of the Intel is going to be substantially better for most code.

Your cache question doesn't really have a simple answer either. E.g. an AMD CPU is split into different CCXs: to simplify somewhat, the CPU is broken up into several smaller compute complexes, each with its own L3 cache, attached to a separate I/O die that holds the memory controllers. Intel uses a completely different interconnect approach (a ring or mesh, depending on the part) that's harder to summarise in one sentence.

Overall though, for the sort of work you're describing the limiting factor is often memory bandwidth, not raw compute. Different platforms have very different membw/core figures, and I suspect that if you started measuring that, you'd find it easier to predict your code's performance.
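If you want a rough feel for this, a crude single-core copy loop is often enough to show whether you're bandwidth-bound (just a sketch; for real numbers use something like STREAM, pin threads, and mind NUMA placement):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        const n = 1 << 26 // 64 Mi float64s, i.e. 512 MiB per slice
        src := make([]float64, n)
        dst := make([]float64, n)

        copy(dst, src) // warm-up pass so page faults don't skew the timing

        start := time.Now()
        copy(dst, src)
        elapsed := time.Since(start)

        // The copy reads src and writes dst, so traffic is roughly 2*n*8 bytes.
        gb := float64(2*n*8) / 1e9
        fmt.Printf("single-core copy bandwidth: ~%.1f GB/s\n", gb/elapsed.Seconds())
    }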


While at equal clock frequency the Intel CPUs are a little faster in single-thread applications, their main weakness is that at equal power consumption their clock frequencies are much lower in multi-threaded applications, which leads to much lower multi-threaded performance.

This can be easily noticed when comparing the base clock frequencies, which are more or less proportional to the actual clock frequencies that will be reached in multi-threaded applications. For instance, a 7950X has 4.5 GHz versus the 3.2 GHz of a 14900K. Similar differences exist between Epyc and Xeon, and between Threadripper and Xeon W.

In desktop CPUs Intel can hide their very poor multi-threaded performance by allowing a much higher power consumption. However this method does not work for server and workstation CPUs, because these already have the highest TDP that is possible with the current cooling solutions, so in servers and workstations the bad Intel MT performance is much more visible. Intel hopes that this will change in 2024, when they will launch server and workstation CPUs made with the new Intel 3 CMOS process.

In the absence of actual benchmarks, a good proxy for the multi-threaded performance of a CPU is the product of the base clock frequency and the number of cores. For Intel hybrid CPUs, an E-core should be counted as 0.6 cores. For example, a Threadripper 7960X should be expected to be (24 cores x 4.2 GHz) / (16 cores x 4.5 GHz) = 1.4 times faster than a 7950X in multi-threaded applications that are limited by the CPU cores (but twice as fast in applications that are limited by memory throughput).
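That rule of thumb is easy to put into code; note that the 0.6 E-core weighting is the estimate suggested above, not an official figure:

    package main

    import "fmt"

    // mtProxy estimates relative multi-threaded throughput as
    // base clock (GHz) * effective core count, counting an E-core as 0.6 cores.
    func mtProxy(baseGHz float64, pCores, eCores int) float64 {
        return baseGHz * (float64(pCores) + 0.6*float64(eCores))
    }

    func main() {
        tr7960x := mtProxy(4.2, 24, 0) // Threadripper 7960X: 24 cores @ 4.2 GHz
        r7950x := mtProxy(4.5, 16, 0)  // Ryzen 7950X: 16 cores @ 4.5 GHz
        fmt.Printf("7960X / 7950X ~= %.1f\n", tr7960x/r7950x) // ~1.4
    }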


> However this method does not work for server and workstation CPUs, because these already have the highest TDP that is possible with the current cooling solutions, so in servers and workstations the bad Intel MT performance is much more visible

I disagree on this point. I would say this problem is much more critical on Intel's desktop platform than on their workstation platform. Xeon Sapphire Rapids is actually very easy to cool, even on air, thanks to the CPU having a much larger surface to dissipate heat than its desktop equivalent.

I have a Xeon w9-3495X, and while power consumption is one of its weakest points, it stays under 60°C with water cooling while I pump 500W into it (25°C ambient), which gives me a +30% to +50% gain in multithreaded performance over the default power limit. (Golden Cove needs around ~10W per core, so the default 350W/56c = 6.25W is way below its performance curve.) Noctua has also shown that they're able to achieve ~700W on the U12S DX-4677[1] on this platform.

[1]: https://www.youtube.com/watch?v=dCACHpLzapc


Not only is this complete and utter rubbish, it should have been obvious from context that we're not talking about desktop CPUs. 96-core desktop CPUs are not a thing, and neither of the product families I mentioned are desktop CPUs either. I neither know nor care what the difference between those desktop CPUs is, and I doubt GP does either.

Your metric about clock speed is, I'm afraid to say, so horribly oversimplified as to be flat out wrong. You can't just multiply core count by clock speed like that, as you're failing to take into account all sorts of other scaling factors such as memory bandwidth, cache size, AVX support and so on, which matter as much or more than raw instructions per second.


In the UK, where this is happening, that's simply not true. There were 1,695 road fatalities in total in 2022. That's very different from "thousands a day".

I do not believe it is desirable (or practically possible) to eliminate all risk of death or injury from life, and increasingly Orwellian surveillance measures to clamp down on every misdeed and ill-advised risk do not, I think, make for a happier or freer society.


The UK makes up less than 1% of the world's population, so it does make sense that there are many more driving deaths globally than in the UK.

I can agree that complete safety is not practical or desirable.

But in our current society, simply walking down the street can get you killed by a driver. That is not freedom in my mind.


> That's very different from "thousands a day"

So you had us check. We hope it is now clear that you misread the poster: «thousands of people are being killed by drivers every day» does not imply "here".

But according to data,

( Road deaths over the long-term, 1900 to 2021 // Annual number of reported deaths resulting from any type of road accident. This includes vehicles, pedestrians and cyclists. - https://ourworldindata.org/grapher/road-deaths-over-the-long... )

taking 2019 (just to be sure of having more data), you are partially right: apparently not even half a thousand per day, but still hundreds per day.

Top by country (deaths per day, 2019): China: 172 ; USA: 99 ; Russia: 47 ; Turkey: 15 ; Japan: 11 ; S.Korea: 9 ; France: 9 ; Italy: 9 ; Germany: 8 ; Mexico: 8 ; Poland: 8 ; Uzbekistan: 6 ; Chile: 5 ; Kazakhstan: 5 ; Romania: 5 ; UK: 5 ; Canada: 5 ; Spain: 5 ; Australia: 3 ; Azerbaijan: 2 ...

Conclusions from those numbers, that is another matter.


Putting an "AI" camera on a road in Cornwall, UK has no ability to affect how many people are killed on China's roads, so it's deeply silly to use those numbers to try to justify doing so. There are far less invasive things that could be done to dramatically improve road safety in many of the countries you list.


Things like what? Traffic violence and risky behavior are a rising problem, and it seems we don't have a solution for it despite many attempts. My view is that most of it comes from a small group of drivers. It's not "everyone speeds and uses their phone sometimes" but "there are people who speed recklessly and are on their phone most of the time". If you share my view, then you want the camera in areas with the most traffic, as the goal is to fish out and eliminate those dangerous drivers, not punish sensible drivers who make mistakes sometimes.

Putting it on a major "safe" motorway is perfect, as you will have a chance to check most cars there. Being able to stick to the speed limit where it seems safe (from inside the car, anyway) to go faster is also the most important psychological trait for safe driving. Catching people who aren't up for that will improve safety in other places as well.


Never said I support the activity.

But let us keep the logic terse. There are deaths because of X, and if some practice P reduces X there will be advocacy to have it adopted widely. The numbers are relative, so they are largely irrelevant to the advocates.

But it is best to get the numbers right, so that the cruder appeals summoned by "Millions!" and "One per million!" - which mean little out of context, unless a cost-risk-benefit analysis with quantification of all its parts is fully constructed - can be avoided.


>Last year there were 48 road deaths and 738 serious injuries on roads in the two counties.

Please let me know which of these deaths you are happy with.

