This may be an unpopular opinion, but I find it completely fine and reasonable that CPUs are optimized for games and weakly optimized for crypto, because games are what people want.
Sometimes I can't help but wonder what a world with no need to spend endless billions on "cybersecurity" and "infosec" would look like. Perhaps these billions would be used to create more value for people. I find it insane that so much money and manpower is spent on scrambling data to "secure" it from vandal-ish script kiddies (sometimes hired by governments); there is definitely something unhealthy about it.
Bernstein agrees with you. His point isn't that it's dumb that CPUs are optimized for games. It's that cipher designers should have enough awareness of trends in CPU development to design ciphers that take advantage of the same features that games do. That's what he did with Salsa/ChaCha. His subtext is that over the medium term he believes his ciphers will outperform AES, despite AES having AES-NI hardware support.
Besides, CPUs are only optimized for games if you write your code to take advantage of it.
I worked on AAA engines that completely disregarded caches and branch prediction, and while that worked great 10 years ago, the same architecture became crippled on modern CPUs. It's so very easy to trash CPU resources that at this point I'm convinced most programs easily waste 90% of their CPU time.
For the last AAA title I shipped, we spent weeks optimizing thread scheduling, priorities, and affinities, along with profiling and whatnot; it was still an incredible challenge to use the main core above 80% and the other cores above 60%. If your architecture doesn't take the hardware into account from the ground up, you're not going to fully use the hardware :)
There is a potential counter-argument that this is because multicore is an example of a CPU feature that was not designed for videogames. :)
I don't know if that's strictly true, but I do feel games aren't as easy to split across threads/cores as many other types of software. Ultimately, a game boils down to one big giant blob of highly mutable and intertwined state (the world) that is modified very frequently with very low latency and where all reads should show a consistent view.
In other domains, avoiding interaction between different parts of your application state is a good thing that leads to easier maintenance and often better behavior. In a game, the whole point is having a rich world where entities in it interact with each other in interesting ways.
I don't think any CPU features were designed specifically for video games. But video games often have workloads that fit nicely on top of the CPU.
You don't need interactions between entities at every point in time. There are very specific sync points during a frame, and there aren't many of them (maybe 5 or 6). Your units can also be organized in a table where each row can be updated in parallel before moving on to the next.
Many types of software can get away using only concurrency or parallelism. Game engines have to master both. You want parallelism to crunch your unit updates and you want concurrency to overlap your render pass on top of the next frame's update pass.
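To make that shape concrete, here's a minimal sketch (mine, not any shipping engine's code) of the per-frame parallel unit update described above, using OpenMP as a stand-in for a real task system; the type and field names are invented for the example:

#include <stddef.h>

typedef struct { float x, y, vx, vy; } unit_t;

/* Each row of the unit "table" is independent within a pass, so the whole
   batch fans out across cores. Compile with -fopenmp; without it the pragma
   is ignored and the loop just runs serially. */
void update_units(unit_t *units, long count, float dt) {
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < count; i++) {
        units[i].x += units[i].vx * dt;
        units[i].y += units[i].vy * dt;
    }
    /* implicit barrier here: one of the frame's few sync points */
}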
To me the whole view of the game world being a "giant blob of highly mutable and intertwined state" stems from the fact that for over a decade all games had to do was have an update and a render pass in the main loop. So you could have a big mess because all of the world updates were serial, and so the developers didn't really experience the problems arising from such a design.
Now it takes a change in perspective on how to structure code and data to properly exploit multiple cores to their full potential. It is certainly possible to have interaction in the game world and do it in a multi-core multi-threaded way, it just needs smarter structures and better organisation of data.
It's not nearly as easy as it sounds. I think game engine developers now understand multi-core a LOT better than programmers from most other domains.
Games are still just an update and render pass in a loop. Nothing will change that; they might be decoupled and overlapped, but they're still present. They're just much more complex than they used to be. Some engines will fork/join at every step (sometimes hundreds of times per frame), others will build task graphs and reduce them, and a third camp will use fibers to build implicit graphs. No matter what you do, you're still executing code with very clean sequential dependencies.
Game engines started going multi-core at least a decade ago. First by using one thread per system, then moving towards tasks. Today I don't know of any AAA game engine not already parallelizing its render path and unit updates massively.
> There is a potential counter-argument that this is because multicore is an example of a CPU feature that was not designed for videogames.
That's a common trope; it mostly applies to the rendering path. AI / physics engines scale nicely with the number of cores (AI in particular, because a) it's often agent-based, which naturally works concurrently, and b) with AI, slightly relaxed synchronization constraints sometimes generate some desirable degrees of fuzziness and unpredictability ... although this is probably a very slippery slope ...)
> I'm convinced most programs easily waste 90% of their CPU time.
Hah, in my experience that's hopelessly optimistic. The C I write is probably 10-100x slower than it should be (leaving SSE and threading aside entirely), but the Python I write most days is a hundred times slower still.
This is optimizing the same numeric algorithm (word2vec) using different languages and tricks, from naive pure-Python down to compiled Cython with optimized CPU-specific assembly/Fortran (BLAS) for the hotspots [0].
Eh, doubtful. Sure, it'll be faster than unoptimized C. But when you observe strict aliasing rules in C, a modern compiler will generally produce similar-quality assembly for both languages. The reason BLAS is so fast is that every single line of code in a hot loop has been hand-optimized, and tons of CPU-specific optimizations exist.
Fortran yields some of the fastest code out of the box, yes, but C/C++ can yield equivalent performance by adhering to certain basic rules.
Even with strict aliasing, the only way the C/C++ standard effectively lets you reinterpret one type as another is by incurring a copy. Tell me what can be done better for performance in C that can't be done in Fortran or "another static language without pointer indirection".
You can get a lot of performance back just by taking branch prediction and cache misses into account in your designs.
I did just that in C# last year and got at least two orders of magnitude of performance gain from it. Still, I constantly wished I was working in C instead :)
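For anyone wondering what "taking cache misses into account in your designs" looks like in practice, here's a toy sketch (field names invented for the example) of the usual array-of-structures vs. structure-of-arrays change; the second layout streams one hot field through the cache instead of dragging every entity's cold data along:

#include <stddef.h>

/* AoS: each health read also pulls in ~60 bytes we don't care about */
typedef struct { float pos[3], vel[3]; int health; char name[40]; } entity_aos;

/* SoA: all the health values sit back to back in memory */
typedef struct {
    float *pos_x, *pos_y, *pos_z;
    int   *health;
    size_t count;
} entity_soa;

int count_alive(const entity_soa *e) {
    int alive = 0;
    for (size_t i = 0; i < e->count; i++)   /* tight, prefetch-friendly scan */
        alive += (e->health[i] > 0);
    return alive;
}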
On consoles we get access to the hardware specs and profilers for everything - even if behind huge paywalls and confidentiality contracts. Yet with all that information available, it's still insanely hard to optimize the codebase.
There's absolutely no incentive to push these tools/specs to hobbyists because even professionals have a hard time getting them. It's also worth mentioning that having access to the tools and specs doesn't mean you'll understand how to properly use them.
It is in retrospect a pretty banal observation. If cipher design is about achieving a certain security level at the lowest cost per byte, it sort of stands to reason that you'd want to design them around those features of a CPU on which the market is going to put the most pressure to improve.
In fairness, some of this design philosophy --- which Bernstein has held for a very long time! --- gets a lot clearer with the advent of modern ARX ciphers, which dispense with a lot of black magic in earlier block cipher designs.
A really good paper to read here is the original Salsa20 paper, which explains operation-by-operation why Bernstein chose each component of the cipher.
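For a flavour of what "ARX" means in code, here's the Salsa20 quarter-round as given in that spec: nothing but 32-bit adds, rotates, and XORs, all of which are cheap, constant-time, and SIMD-friendly on commodity CPUs (no S-boxes, no table lookups):

#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* Salsa20 quarter-round: add, rotate, xor on four 32-bit words */
static void quarterround(uint32_t *y0, uint32_t *y1, uint32_t *y2, uint32_t *y3) {
    *y1 ^= ROTL32(*y0 + *y3, 7);
    *y2 ^= ROTL32(*y1 + *y0, 9);
    *y3 ^= ROTL32(*y2 + *y1, 13);
    *y0 ^= ROTL32(*y3 + *y2, 18);
}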
Eh. Remember that a lot of these guys think about hardware implementations, so they're fixated on ASICs, not GPUs.
At some point the inequality gets too far out of whack and it's time to reconsider your values. Same reason we keep switching back and forth between dominant network architectures every 8 years. Local storage gets too fast or too slow relative to network overhead and everyone wants to move the data. Then we catch up and they move it back.
There can often be a strong disconnect between theory and implementation. I don't believe you can be truly effective unless you can get your hands dirty with both. Although if you have to choose, I'd rather know a bit of theory and a whole lot about reality than vice versa. Constant factors matter in big O.
People spend a lot of money on physical security as well. They put locks on their homes and cars, install safes in banks, drive money around in armored cars, hire armed guards for events, and pay for a police force in every municipality. The simple fact is that if your money is easy to get, someone will eventually take it without your permission. That is reality, but calling it "unhealthy" implies that the current state of things is somehow wrong. I agree with that premise, but it carries with it a lot of philosophical implications.
> That is reality, but calling it "unhealthy" implies that the current state of things is somehow wrong.
I don't spend a lot of money on physical security. I leave my car and front door unlocked usually, and don't bother with security systems.
If you find yourself having to lock and bolt everything under the sun lest it get damaged/stolen, then yes, I think it is an indication that the current state of things is wrong. There is something wrong with the economy/community/etc. in your area.
I realize that "the internet" doesn't really have boundaries like physical communities do, but I too wish for a world where security was not an endless abyss sucking money into it and requiring security updates until the end of time. In other words - a world where you could leave the front door unlocked online without having to worry about malicious actors. It will never happen, of course (at least not until the Second Coming ;)
You're only able to be lax about physical security because institutions take care of it for you. Your taxes go to police, FBI, military. You keep your money in a bank, and the bank spends money on security. If people stopped keeping their money in banks and started storing it in their homes, the incentive for burglary would go up in your area.
I also assume you don't live in a dense city. Even in a utopia you couldn't have perfect safety without investing in security: some people are going to commit crimes and violate property just for fun, and when there's a lot of people in one area, that becomes a concern.
There has never in history been a time when people could be secure without investing in security. To claim that investing in security is an indication of something "wrong" is therefore at best a sentiment far ahead of its time.
I live in Japan, where there seems to be a much stronger respect for other people's property. It's a common story for people who forget their wallet on a park bench or something to come back hours later and find it either still there or at the local police box. There's also a lot less vandalism - I don't usually see things like trashed bus stops that were a daily sight back home. Most people seem to lock their bikes with a simple lock between the frame and back tyre - those would have all been hauled away by a truck and disappeared by the next morning back where I used to live. (edit: another anecdote - I was very surprised when my wife thought nothing of leaving a brand new MacBook Pro visible in the back seat of the car when going into a restaurant for lunch)
Japan also has some of the densest cities on earth.
Of course, it's far from perfect (there's plenty of crime, people still lock their doors, and sexual violence seems to be off the charts), but it shows that paranoia doesn't have to be the default state. If they made it this far, how much further is possible within human nature?
I live in Zürich. The joke here is that if you left a pile of money on the pavement in the city centre, it would stay there for a week and then you'd be fined for littering.
People here are basically honest. There aren't any turnstiles on the transit system --- they just assume you'll have bought a ticket. Children roam the streets on their own and think nothing of talking to random adults.
Of course, they back up this trust with enforcement; there are occasional spot checks on the transit system with big fines; and if you don't pay a driving ticket the police will come to your house and remove your license plates...
In Paris, I used to leave my bag on the street in front of my high school for whole days. It was just leaning on the front door, for hours. That was before.
Now I guess someone would call a bomb disposal team and cut the traffic 3 blocks around it. (and I would have been very sad, because to this day I still love that bag, leather, light, durable, practical, it still fits a modern 13" laptop perfectly)
That's true. And crime statistics aren't much more useful because willingness to report and how well the police handle filing differ a lot between cultures.
Most places also have data from crime surveys, so it's possible to make decent predictions about how much crime is reported or not.
The crime statistics themselves are very useful, there's no need to chuck them about because some level of crime goes unreported. Statisticians have long ago realised that we can measure this too.
I moved to Rome, Italy last year and was quite surprised how little respect they seem to have for other people's property.
All the apartments have thick security doors[1], lower windows of buildings have iron bars and all gardens have tall fences. If you leave anything that looks even remotely valuable in your car, someone will break a window and grab it. There is tonnes of vandalism, and even if you leave a motorbike locked with a secure lock it's somewhat likely it won't be there the next day.
It sounds the complete opposite of Japan!
[1] A few weeks after I arrived, I was at a friend's house and there was a disturbance of some sort in the apartment next door. The police couldn't get in, so they called the fire service, who had to work for 20 minutes to break the door down. They considered going in through the outside, but it was the third floor and all the windows had metal shutters.
Breaking the rules is fun. The thrill in taking such risks is exhilarating, and if you get good at it — if you can repeatedly break the rules and get away with it — you feel powerful. Unless you're suggesting a transhuman approach, there's no way to get around that. You can eliminate any incentives for crime, and eliminate all sources of childhood strife, and still have people who commit crime because it's a great hobby.
It certainly is. Personally, I've grown beyond purely breaking rules for the sake of it, without concern for the repercussions. I still break the rules, all the time, but only when I think the rules are wrong and no true harm will come from my breaking them. It's gotten much less fun in my old age, just seems rather boring and rational.
Sexual desire was fixed at a cultural level until it wasn't. The narrative that people only steal because they're poor and desperate is just that, a narrative. Anyone can feel excitement at taking another's property. Your perfect society free of crime is begging for a "disruption."
> Anyone can feel excitement at taking another's property.
My upbringing has had this "golden rule of ethics" (don't do to others what you don't want done upon yourself) ingrained in me pretty well, so I would never feel excitement about this.
(Hypothesis: This ridiculous objectivism has had people forget how much cooperation is ingrained into humans by nature.)
That said, I am aware that such thoughts exist in my subconsciousness, because the subc is always exploring all possible paths, but I have never experienced that as an even remotely acceptable (in terms of my own moral) path of action.
> don't do to others what you don't want done upon yourself
This only works for people who sufficiently appreciate their own possessions. Others might develop a very relaxed "stuff comes, stuff goes, who cares, it's just materialistic crap anyways" attitude, and that equips them to have surprisingly little remorse in regard to theft. This is probably exemplified best in the extremely high rate of theft affecting near-zero-value bikes in many Dutch cities. That kind of thief might even rationalize by inverting guilt: "if he feels bad about the loss, it's his own fault that he is not as cool about shitty old piles of rust as I am"
Have you ever watched a heist film? If so, you can't claim not to understand how it could be exciting to break into someone's property, risk imprisonment or death, and take their stuff. Go watch one some time, the writers make sure to paint the victims as bad men and the thieves as anti-heroes so that you can enjoy the fantasy without guilt.
You've missed my point for the opportunity to moralize and posture. Consider this: You know cheating is wrong, you probably consider yourself to be someone that would never do it, yet you can still acknowledge that it would most likely be enjoyable for at least a very short while, right? That's all I'm saying, not (lmao) some kind of faux-objectivist philosophy on the right to take other people's property or whatever it is you seem to be imagining.
> ... instead of trying to change it at its roots?
Because eugenics went out of fashion a long time ago. The "roots" is free will, so you can't really change that without horrific consequences. Using birth control (voluntary and involuntary) to bias the population was a popular idea in the 1920s. China's latest gamification of "social credit" is another biasing attempt. I'm not aware of any "roots" solution that doesn't require total surveillance or head measurements...
Now I have to apologize for playing both sides of this conversation, but we've been working on this tribalism for a few hundred years, maybe if you were a better student of human history than I am you could make a case for a few thousand.
We have a way to go, but we have made at least some progress. Maybe in another few hundred years humans won't be so 'Us and Them'. Maybe materialism will be the next thing we work on. Who knows, maybe by the time we reach the next galaxy we'll have it sorted out.
It's human nature to crave sex, maybe not with every single person you meet, but certainly with more than a single partner in your entire life, right? For thousands of years, Christianity attempted to "work on this" aspect of human nature and make non-monogamous sexuality taboo. It was fairly successful at it. Yet (depending on your perspective) it only took somewhere between a few years and a few decades post-sexual revolution for it all to come crashing down spectacularly.
If the most powerful institution in the western world couldn't "fix" this aspect of human nature over millennia, it is sheer arrogance to think we can do better today.
Feminists have selected against male aggressive behavior in our education systems for decades. They have "successfully" ruined the lives of countless western boys and laid the foundation for immigrants from non-castrated cultures to take their stead with redoubled aggression. Turns out that it's much easier to beat boys into submission than it is to subvert female attraction to dominance.
Human nature is the least productive concept ever. It is only ever brought up to defend the status quo and declare it as inevitable. It is a mystical concept impervious to examination. Whether it's "God" or "human nature" it's just a term for an abstract master that keeps us all in bondage.
>Human nature is the least productive concept ever. It is only ever brought up to defend the status quo and declare it as inevitable. It is a mystical concept impervious to examination.
Actually it's the exact opposite: very pragmatic and empirically verified.
It's exactly what people, in statistical quantities, tend to do over what they tend not to do.
Plus, most of it is shared with our animal siblings.
Except if you think that an elephant or a cat doesn't have certain natural characteristics (instincts, behaviors, traits) based on its species.
Do you see a specific claim backed up with empirical research in the above usage of human nature? I don't. Do people ever? Any time someone brings up human nature they could omit the term and just talk about the research itself. They don't. Human nature is a rhetorical shortcut. You can't assail human nature, you can't examine it like scientific evidence. It is intellectual laziness to make any argument from human nature.
>Do you see a specific claim backed up with empirical research in the above usage of human nature? I don't.
That's because it's a casual online conversation. I don't stuff my responses with citations when I'm not writing a paper.
>Do people ever?
Yes. There are tons of research on human psychology, cognition, physiology, evolutionary traits, instincts and other aspects of what colloquially is called "human nature".
Do you think animals have a "nature"? If so, why are humans any different? Sure, we can reflect on our behavior and try and change it, but it's undeniable there are innate drivers that can be hard to overcome.
I take issue with the way "human nature" is used, not the idea that we have a nature. It is almost always unproductive. Even so, most who make an argument from human nature always convince me that they know nothing about other human beings within a sentence or two.
Yeah, I wish we could all live in anarchy without rules or police or armies etc. But there is always an asshole who wants more, or thinks he deserves more. I mean, we reward these people today with jobs on Wall Street etc., but these people would still exist in an anarchy, and would do their best to "solve" the world to their liking.
Since the discussion is about criminality, and it was suggested that this arises due to some religious notions rather than e.g. (a) people being defective or (b) socio-economic disadvantage, I thought that ought to be challenged. From an evolutionary perspective you could say that there must be an advantage in such behavior to the individual, even if the society suffers.
Also, thanks for the unjustified rudeness, you've made a great contribution today.
Idealistically and for the fun of philosophically considering the matter, by creating a world composed of societies in which the inhabitants are free from want.
Do you mean chemical lobotomies or a post scarcity economy? Because there are some wants that are impossible to fulfill, for example: there are a bunch of murderers in Syria motivated by the desire to fly a winged horse to a sky mansion where 72 infinity virgins await. I don't see that happening, even post scarcity.
More than just for fun. In a global society so encapsulated by money people are always going to tend toward the path of least resistance to maximise that goal. In many cases, despite the moral factors, risk, etc; this is going to be crime. It seems to simply be human/animal nature and something which society itself won't easily be able to change.
I live in a place where anything not locked up will get stolen. I agree with you that something is wrong here, and I am doing my best to fix it. But it's tightly connected with intergenerational poverty. I would love it if you would come help me.
If you're suggesting I should move, then sorry. I would love to live in the woods on a lake and live out my days. I could do that easily. But I think I have a responsibility to help solve the problems the economy that pays my rent is causing. I choose to live in the ghetto where I can see what's going on so that I can help.
> I choose to live in the ghetto where I can see what's going on so that I can help.
If that is true, I very much respect that ...
Anyway, I suppose you know that people usually adapt to their environment. So if you see not-so-nice changes in your ethics and lifestyle, you should probably reconsider your home ...
It doesn't matter whether or not you buy or use locks.
You passively pay for security at a bar when you buy a drink - the price of the bouncer is part of how they derive the cost of your beer. Similarly you pay taxes (local, state, federal) that pay for a massive amount of physical security that you utilize unknowingly on a day to day basis.
Just because you don't find it necessary to lock up your stuff in your driveway doesn't mean that you can claim that you don't use or pay for physical security.
The boundaries are key here. You just found a community without thieves, and our physical world has natural boundaries which prevent other people from entering your community. An analog is some kind of LAN with trusted peers. There's often little to no security in such networks; the only security is a firewall preventing any connections from the outside. But the Internet requires completely different thinking. You are always living in the worst nightmare: everyone's attacking you constantly, checking your doors and locks. And there are such places on Earth, too.
In my city you can't use a wooden door, because a thief can easily break it with a good kick, so everyone uses heavy iron doors with good locks. You can't leave your automobile with only the factory locks; it'll be stolen sooner or later. Everyone uses additional security systems. You can't even leave your bag in a locked car if your windows aren't tinted. Someone will see it, break the window, and steal your bag. You must use heavily tinted windows so no one from the street can see anything inside. Even the thought that I could leave my car or home unlocked is foreign to me. It's my property, so it's my job to defend it.
An endless abyss sucking money? Most online services, even huge ones, manage to thrive for years with terrible security practices. OpenSSL, the cornerstone of online communication security, had until recently a single full-time developer.
I think it's amazing we manage to spend so little on security.
Wow, what country do you live in that you feel comfortable keeping your door and car unlocked? I've lived in the US and Germany and wouldn't have felt comfortable doing that in either place.
Countries are not granular enough to use for this sort of thing. I've lived in several places in the US where unlocked doors were the norm, and then in several places where it would be a bad idea if you want to keep your things.
This is an embarrassing story, but one time I left my car running outside a coffee shop while I had coffee with a prospective employer for about an hour. When I got to my car it was no longer running but it was blisteringly hot because I had left the heat blasting, and there was a note on the dash that said "I turned your car off, I hope that's ok, your keys are in the console". So yeah, sometimes and in some places stuff doesn't get stolen when it could.
I have relatives in the rural Dakotas who don't really need to lock anything up.
I went to college in Atlanta, where leaving anything slightly valuable inside your vehicle would lead to a smashed window. So I've developed a lifelong paranoia about never leaving valuable items in sight in a locked car.
Also, HN crew: if someone is going to rob your house, they don't just try to enter through a locked door. The robber is going to knock; if you answer, they make up some story about how they are looking for Alice & Bob, and go on their way. If nobody answers, then they try to see if your doors are locked. I used to frequently get strangers "looking for someone" at my door in Atlanta, too. Few things are scarier than hearing a knock at the door when you are in bed, ignoring it, then hearing them try to open your door.
May I ask where you lived in Atlanta? I'm attending Tech right now and chose to live outside the city for safety reasons - in Dunwoody to be precise. I never knew it was that bad though.
I went to Tech too. This was all > 8 years ago though. Staying on/near campus you are much better off, in the sense that access to the building is more restricted and people are going to notice someone sketchy in the hallway. But even on campus cars were frequently robbed, especially at night when I was there. I remember seeing a guy with a backpack full of LCD monitors getting arrested outside the College of Computing once. And bikes were stolen left and right, I lost one.
I once did a Habitat for Humanity build out on the Bankhead (not Buckhead, big difference) highway. They had to have Atlanta PD watch the lot DURING THE DAY because multiple cars had been stolen from Habitat volunteers. I still think it was crazy that cars were stolen that brazenly in broad daylight.
I don't have a pulse how things are these days. But don't get scared off by these stories, there are lots of good things about living in the city too. Go Jackets. :)
Yeah, I heard that things are much better than they were back then. I guess I'll reconsider once my lease is finished. The good thing is that MARTA is pretty good and on-time usually from what I saw.
Upper-middle-class low-density areas are great for this. Your neighbors have too much to lose to steal anything, and it's too far to bother for out-of-towners.
The epitome of this is most of Switzerland. I went on a Tinder date with a girl recently whose apartment door did not even have a lock.
I wouldn't feel exactly comfortable doing that anywhere I've lived since I began urban living, but here is a fun story (in the U.S., in a high-crime city, but a lower-crime area):
Before I moved into my second college apartment, the previous tenants warned that they got robbed a few times, and then learned never to open the blinds, and didn't have a problem after that. We therefore never opened the blinds. We pretty quickly broke off a key in the lock and ended up leaving it unlocked except when everyone was gone for multiple days (I won't bother explaining how this worked, or why we didn't get it fixed for a year.)
So our door was always unlocked, but since you couldn't see inside, theft was never the goal of the folks who wandered in. It was always those who were friends and customers of other people in the building who were told they could hang out with us until their friends got back from class or those who had to use the bathroom late at night.
One of my roommates was up until 4 AM in the living room every night, and one time someone wandered in while no one was there and left a note thanking us for letting them use the bathroom. Nothing was ever taken except Xbox controllers, and we thought we knew the culprit.
This is an entirely normal thing to do if you live in an area that doesn't have a semi-permanent criminal underclass.
It's not even really a poverty issue - there are places where people are poor as churchmice, but the social stigma around theft is strong enough to keep people in line.
I spent a semester abroad in New Zealand and there were people there that left their doors unlocked. Similarly, in college a lot of my classmates left their doors unlocked even though the college administration repeatedly told us not to.
I think it's a density thing. In a small town, there's nobody to steal from you, and even if there was you'd eventually figure out who it was since everybody knows everyone. In a big city, someone can be in and out without you ever knowing, and you would never find them again. Indeed, most of the city residents I met in NZ still locked their doors (the people mentioned above were in smaller villages), while when my family went on vacation to a rural vacation home (in the US) my mom felt comfortable enough to leave the back door unlocked.
I lived in Atherton, CA for a while (granted, one of the richest and safest [1] ZIP codes in the US) and we never locked our house. In fact, I did not even have keys!
I was once walking through Atherton and a police officer stopped me to make sure I wasn't doing anything suspicious. I suppose the number of "walking while white" infractions a community tallies might be a good (if unfortunate) indication of how wealthy they really are.
The same police officer later bought me a coffee for no particular reason. What a country.
The wealthy people may send their kids to private schools. The public schools may have students from working class families in adjacent areas in the same school district.
Part of Atherton is in the Menlo Park School District and part of it is in the Redwood City School District. The former is considered to be much better than the latter.
There are a lot of places where you can get away with it, even in dense areas of the US and Germany.
There is some social engineering going on there. If you live in an area where everybody locks their door and that doesn't have a huge problem (like bands of bored teens trying every house, or lots of junkies, ...), the chances are very small that anyone would try to open your door, and if they do, they most likely come prepared. I remember that in my previous flat in the center of London, we left the kitchen window open for years without even realising it. It would have been trivial to break into our place, but nobody cared. They entered my neighbour's place twice over the same period, likely because they saw something they wanted to steal through the window.
I come from a small town, small enough to yell across and we're perfectly safe leaving our doors unlocked. Home invasions and break ins are unheard of entirely, in the last 30 years at least. Similar behavior in surrounding towns with slightly larger populations.
I lived in a small, ostensibly safe town in Sweden, but people would come in from outside to violate the trust the people had. You need to be good friends with your neighbor so they can confront or report any suspicious outsiders.
my parents live in an affluent southern california suburb and still do this all the time. when i visit i just walk in the front door.
in most suburban places you can also leave the garage door open all day long just fine.
it's primarily dense cities that are the issue. just like online, when there's transience and anonymity, there's a high incentive for people to behave poorly.
>when there's transience and anonymity, there's a high incentive for people to behave poorly
Sometimes people tend to behave even more poorly if there is no anonymity. This happens when they think that they are morally right in an important area although they are totally wrong. Many racist "this needs to be said" comments are written using real names.
I don't lock up unless I'll be away for several days, I know many of my neighbors don't either; I live in the outskirts of Copenhagen and it's not even in one of the "fancy" areas.
>I don't spend a lot of money on physical security. I leave my car and front door unlocked usually, and don't bother with security systems.
Try living in a bad neighborhood then.
If you already live somewhere nice and safe, the spending and effort you don't put into security has been covered by the state (police), by the price of buying or renting the house, etc.
Most of your physical security comes from two things:
1) the fact that the rest of us lock our doors, so most ne'er-do-wells have accepted the notion that closed doors tend to be locked. It's a form of herd immunity that protects you so long as only a small number of people behave like you.
2) the shift from cash to electronic money, meaning that physical theft is less lucrative than it once was (and meaning that most money is protected by entities that invest meaningfully in security.)
You're a free rider, benefiting from effort made by others, while being a (small) net negative to society. You're welcome for the security we've provided for you.
"People spend a lot of money on physical security as well."
You would have spent more money on personal entertainment than on your personal security. Your TV, tablet and books would have cost you a lot more than your locks.
Add in the cost of tickets to concerts and plays and your entertainment expenses dwarf any security expenses that you have had.
Anti-virus scanners are viruses. There's nothing like having some yahoo decide that Windows Defender isn't good enough, and so you need multiple, heavily intrusive real-time virus scanners running, fighting with each other, constantly thrashing your hard drive, and making a beefy development workstation waddle along - I already have Visual Studio doing that, but at least that's helping me do my job, not obstructing it...
Given that games tend to be deployed on a wide range of computers, it's hard for them to use newer x86 extensions (e.g. AVX), while most scientific computing applications tend to be compiled for a specific machine. To get to see the larger improvements, you must use the newer instructions and registers, like you can see in BLAS benchmarks[1].
Yep, at my old job we still had to support Windows XP 32-bit for the game client (internal tools were Windows 7 64-bit) because so many people were still playing on those systems. I don't remember if we could use SSE2 or what the minimum was in that regard.
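The usual workaround when you can't assume newer extensions on players' machines is runtime dispatch. A rough sketch (function names invented; __builtin_cpu_supports is a GCC/Clang builtin that checks CPUID):

#include <stddef.h>

static void scale_baseline(float *v, size_t n, float s) {
    for (size_t i = 0; i < n; i++) v[i] *= s;   /* SSE2-era path */
}

static void scale_avx2(float *v, size_t n, float s) {
    /* a real AVX2 path would use 256-bit intrinsics; stubbed here to stay portable */
    for (size_t i = 0; i < n; i++) v[i] *= s;
}

void scale(float *v, size_t n, float s) {
    if (__builtin_cpu_supports("avx2"))   /* runtime feature check */
        scale_avx2(v, n, s);
    else
        scale_baseline(v, n, s);
}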
> Seems like they are being optimized to be better at vector math, and games just happen to highly use these pieces of HW.
It may in fact be that desktop/mobile CPUs are being optimized for their contemporary benchmark suites, thus targeting the ensuing benefits in marketing. The benchmarks themselves were, for a fair amount of their existence, focused on games-related performance, at least from what I recall.
I don't mind the downvotes, but I'm more interested to know why, if anyone cares to comment. I should have specified that by desktop/mobile, I do mean desktop/laptop essentially (and not smartphone CPUs). And I am talking about Intel/AMD largely, and how I perceive their evolution over the past two and a half decades.
In short, I think "optimizing for games" means nothing to a CPU designer at say Intel, and anyways qualitative differences (expanding the ISA, or integrating new features on the IC) are more expensive to develop and sometimes tricky to market. Instead, marketing quantitative differences is much easier (hence optimize for benchmarks) - though arguably no easier to develop. Witness Intel's first rocky attempt at targeting the benchmarks in the early 2000s: Netburst [0]. Of course, in the past 5 years, things have changed (the rise of smartphones, meteoric rise in GPU performance with expanding markets & new software, CPU "per core" performance stagnation), so Intel is in the process of re-positioning itself.
My experience is that double precision is very common in demanding game areas, notably physics. I recall older versions of the ODE physics engine recommending and defaulting to double precision, and lots of newbies on the forums being surprised there wasn't much performance difference. That was all some time ago.
My experience is that it's extremely uncommon, including in physics systems. While I no longer work in games, I did for several years, and I can probably count the times I used double precision on one hand (if you exclude the JavaScript work that plagued the end of my game development career).
The reason for this is simple -- it's twice as slow. The vector width is half the size, and so you can do half as many operations at a time.
Probably because double precision floats are just as fast as single precision if you don't vectorise. I bet you didn't.
This would also explain for instance why many programming languages drop single precision floats altogether: they don't plan to vectorize in the first place.
A pretty solid rule of thumb is that 32 bits has enough precision for rendering, physics needs doubles if it's going to use a lot of iterations. Simplified models like Super Mario jumps or the tractor beam in "Thrust" (single rigid joint) can get away with 16-bit fixed point precision.
OTOH rendering of geometry only needs about as much precision as the display offers, which often means 8 bits on older hardware.
Which is funny, because games would have benefited more from half-precision (16-bit float) HW support. There is almost no support on CPUs (the X360's CPU had conversion to and from half, IIRC, and ARM in handhelds supposedly supports full half-precision arithmetic), and even less on x86/x64 CPUs specifically.
The comedy here is that AMD and NVIDIA are adding it to CPU and GPU designs for the benefit of DNN training workloads; something that took off because of hardware designed for video games ;)
> Sometimes I can't help but wonder how the world where there is no need to spend endless billions on "cybersecurity", "infosec" would look like.
A goofy example, but I suspect that the aliens in the Independence Day film lived in such a world; their system didn't seem to have too much security - and why would you need any, in a telepathic society?
Additionally, I believe they functioned as a hive mind. It would be like an individual trying to keep their password secret from another part of their body.
That brings up an interesting hypothetical... Is it possible to have a password that you don't even know? Sure biometrics is one method, but are there any password schemes that work on things like word associations or unconscious behaviors found during the typing process, like statistical analysis of the time between key presses?
I remember doing an online course and they had me type several paragraphs in order to determine my typing "signature" but it seems doubtful to me that it would be precise enough to be used for authentication.
Cryptographic systems often rely on the secrecy of cryptographic keys given to users. Many schemes, however, cannot resist coercion attacks where the user is forcibly asked by an attacker to reveal the key. These attacks, known as rubber hose cryptanalysis, are often the easiest way to defeat cryptography. We present a defense against coercion attacks using the concept of implicit learning from cognitive psychology. Implicit learning refers to learning of patterns without any conscious knowledge of the learned pattern. We use a carefully crafted computer game to plant a secret password in the participant’s brain without the participant having any conscious knowledge of the trained password. While the planted secret can be used for authentication, the participant cannot be coerced into revealing it since he or she has no conscious knowledge of it. We performed a number of user studies using Amazon’s Mechanical Turk to verify that participants can successfully re-authenticate over time and that they are unable to reconstruct or even recognize short fragments of the planted secret.
Another possibly unpopular opinion: having some bad people makes us safer on the whole. If the world were perfectly safe, and we didn't have to study and invest in security, redundancy, weapons, emergency response, and so on - we become more vulnerable as a race to potential future badness. Phrased differently, if badness is physically/theoretically possible, we're better off having to practice defending against it, than be caught off guard when it does manifest later on (by natural events, aliens, whatever).
Similarly with war, there is likely some optimal level above zero that increases our overall safety as humans by honing our abilities in force.
Games use floats and pointers, and I suspect the only floating point bits they use is the 52-bit integer multiply that was added specifically to optimize crypto algorithms.
The integer vector extensions they're almost certainly using I'd argue are primarily designed to optimize various codecs not games; unsurprisingly, crypto algorithms tend to have similar CPU loads to compression algorithms.
I didn't read this as lamenting the state of CPU development. Instead, the argument showed how ChaCha and similar software ciphers benefit more than hardware ciphers (such as AES) from the current state of CPU development.
It was a really big deal for a long time that a better 2d graphics card meant your spreadsheet software ran faster.
People would justify getting a better machine for playing games by saying that it would make their work more productive too.
In more recent times, look at all of the video games that made sure they could run well enough for casual players on a reasonably recent vintage laptop. I know I bought a new laptop at least once specifically so that a video game would be playable. My code compiled a lot faster on it (SSD) but I accept that I really bought it for playing games.
A long, long time ago, I read a blog post in which a developer chided people for making expensive-to-develop games that could only run on 5% of PCs, then complaining about low sales. His point was, "Why would you drive up your development costs just to drive down your potential sales?"
We lived in a world where we didn't spend anything on security, and it was awesome. That's the environment that unix, the internet, web browsers, IRC, and all the other really cools stuff we use every day was developed in. Hacking, exploring, and sharing was encouraged, easy, and expected.
Now what do we have? SSL protecting the packets that we broadcast to everyone on ad delivery platforms.
Now what do we have? Now we're spending more money, time, and effort, getting poor and limited solutions, trying to patch up email sender verification, shove SELinux around things, wrap every browser request in "are you sure?" modal dialogs and "Request refused for your protection" trip-ups, replace IRC-plain-text-chat with "text-chat in a browser" (Slack, Discord) or walled gardens (iMessage, Google, Skype), fight hard to replace C with languages where buffer overflows aren't one mistake away and everything isn't de facto built around rudimentary types and string concatenation, and do it all without compromising backwards compatibility or established user experience too hard.
Plus, cryptography's speed/strength levels aren't really the weak-link in security anyway.
The real problem is, for lack of a better term, the mentality of individuals and businesses. It doesn't matter how uncrackable and fast your encryption is if nobody uses it because it's fundamentally inconvenient or hard to justify on a balance-sheet.
I feel you. Plus, being a security expert means that you are competing in an arms race which you have no chance of winning. It always bothers me that there are a lot of people who want to break things, and I have to think about security concerns just because of them.
Games can usually tolerate some cheating because it's just a game. But they're hardly immune. Spam and abuse make everything suck, definitely including games.
Now, put real money on the line and you get the gambling industry which is much more serious about security.
Interested in understanding why spending billions to literally "play games" is okay, but it's illogical for governments to secure digital assets and, if needed, attack digital assets.
Yes, I get it makes sense that there's more of a market for fun stuff than serious stuff, but putting aside security theatre, false flag operations, etc. - the idea that there aren't really threats in the world seems like it needs a bit more explanation.
Ehmm. That's how the world has functioned since before humans existed; it's inbuilt into our evolutionary system. Competing for resources is what we have evolved to do, whether this competition is against other humans or other animals is irrelevant.
Nature doesn't have any inherent ethical system, so unless you can fault the universe for existing incorrectly, I don't buy your argument.
Yeah, but the games the majority of people want to play don't need special hardware. I mean, how much horsepower do Solitaire, Tetris, Minecraft or Angry Birds take?
Don't disregard the amount of computation Minecraft does. It is not in the same league as the other games you mention. Even more so if you place blocks that require frequent updates (such as redstone logic).
I'm sure Minecraft does a lot of impressive things on a technical level.
However, I remember running it just fine on an old XP machine with a 32-bit, single-core Pentium III and 2GB of RAM. In addition, this game has been available on smartphones since quite a few phone-model generations ago.
The smartphone version isn't the same as the desktop version - it was literally rewritten from scratch, and as such is much more performant, although it lost support for mods of the original version as a result.
Yeah, because gamers have the deepest pockets of any consumer-grade hardware segment that isn't a tiny niche. Business users aren't tricking out desktops to the tune of thousands of dollars to push the envelope of performance.
Well, aside from the cult of Mac... They spend the same amount of money as gamers, but for bog-standard commodity hardware in a pretty case.
I agree with the question though maybe not the assumption that goes with it.
I imagine professional workstations for industries such as software development, visual arts, industrial design, film production, music production etc. are large users of high end CPUs.
My gut says that all of these fields are much smaller worlds than you assume, especially when compared to the 7-11 million concurrent Steam users and countless more PC gamers in countries like China and Korea.
"I imagine professional workstations for industries such as software development, visual arts, industrial design, film production, music production etc. are large users of high end CPUs.
None of them uses overclocked CPUs (like Intel's K-series) that's fairly standard for high-end gaming
Most frequently, good security is the absence of vulnerabilities.
When vulnerabilities are bugs, security is software quality.
A lot of other techniques are just mitigation, and the "endless billions on cybersecurity" is often security theater and optics.
Games are also representative of the apps that actually squeeze the performance out of CPUs. When you look at most desktop apps and Web servers, you see enormous wastes of CPU cycles. This is because development velocity, ease of development, and language ecosystems (Ruby on Rails, node.js, PHP, etc.) take priority over using the hardware efficiently in those domains. I don't think this is necessarily a huge problem; however, it does mean that CPU vendors are disincentivized to optimize for e.g. your startup's Ruby on Rails app, since the problem (if there is one) is that Ruby isn't using the functionality that already exists, not that the hardware doesn't have the right functionality available.
Interestingly, the one thing that typical web frameworks do do very frequently is copy, concatenate, and compare strings. And savvy platform developers will optimize that heavily. I remember poking around in Google's codebase and finding replacements for memcmp/memcpy/STL + string utilities that were all nicely vectorized, comparing/copying the bulk of the string with SIMD instructions and then using a Duff's Device-like technique to handle the residual. (Written by Jeff Dean, go figure.)
No idea whether mainstream platforms like Ruby or Python do this...it wouldn't surprise me if there's relatively low hanging fruit for speeding up almost every webapp on the planet.
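To illustrate the shape of that trick (this is my own simplification, not Google's code): compare the bulk of the buffers a machine word at a time, then mop up the residual bytes. Real versions use SIMD registers and an unrolled tail instead of the plain loops here:

#include <stdint.h>
#include <string.h>

static int fast_equal(const char *a, const char *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {       /* bulk: compare 8 bytes at a time */
        uint64_t wa, wb;
        memcpy(&wa, a + i, 8);         /* memcpy sidesteps alignment traps */
        memcpy(&wb, b + i, 8);
        if (wa != wb) return 0;
    }
    for (; i < n; i++)                 /* residual: byte by byte */
        if (a[i] != b[i]) return 0;
    return 1;
}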
Why is this even a thing? Copying and the like is such a common operation. Why don't chip providers offer a single instruction that gets decoded into the absolute fastest copy the chip can do? That'd even allow them to, maybe, do some behind-the-scenes optimization, bypassing caches or something. It's painful that such a common operation needs highly specialized code. I know you can just REP an operation, but apparently CPUs don't optimize this the same way.
This is too obvious an issue, so there must be a solid reason. What is it?
The CPU has a limited amount of silicon to spend on instruction decoding. That silicon is also part of every single instruction's issue latency. Machines like the Cray with complex internal operations paid the cost in issue delays, and most workloads won't earn them back.
It's really something the stdlib should do, which, when I've seen it implemented, is usually what happens.
The compiler should just provide good inlining support, so that if, e.g., you include the short-string optimization in your stdlib, the compiler can optimize it down to a couple of bit operations, a test, and a word copy. If the test fails and your string is more than 7 bytes, it's perfectly fine to call a function - the function call overhead is usually dwarfed by the copy loop for large strings. And then if new hardware comes out and you vectorize it differently, you can get away with replacing that one function in the stdlib instead of recompiling every single program in existence.
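A rough sketch of the short-string idea being referred to (the struct layout and the 16-byte threshold are made up for illustration): short strings live inline in the struct, so a copy inlines down to a test plus a couple of word moves, and only long strings take the out-of-line path:

#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t len;
    union {
        char  buf[16];   /* used when len < 16 */
        char *heap;      /* used otherwise */
    } d;
} sso_string;

static inline void sso_copy(sso_string *dst, const sso_string *src) {
    dst->len = src->len;
    if (src->len < 16) {
        memcpy(dst->d.buf, src->d.buf, 16);       /* inlines to two word moves */
    } else {
        dst->d.heap = malloc(src->len + 1);       /* long path: real copy loop */
        if (dst->d.heap)                          /* (fuller error handling omitted) */
            memcpy(dst->d.heap, src->d.heap, src->len + 1);
    }
}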
Why? It's a common op that requires internal knowledge of every microarchitecture, isn't it? Seems like something that should be totally offloaded to the CPU so you're guaranteed best performance.
The message you were referring to was talking about code for copying strings. If you wanted an instruction to copy lots of strings, the CPU would need to know what a character is (which could be 7 bits, 8 bits, 16 or 32), what a string is, how it's terminated, what ASCII and Unicode are, be able to allow new character encoding standards, etc. Then you would need other instructions for other high-level datatypes. That's not what CPUs do, because you're limited by how much more logic/latency you can add to an architecture, how many distinct instructions you can implement with the bits available per instruction, how many addressing modes you want, etc.
So instead, this information/knowledge about high level data types is encapsulated by standard libraries and then the compiler below that. Most CPUs have single instructions to copy a chunk of data from somewhere to somewhere else and a nice basic way to repeat this process efficiently, and it's up to the compiler to use this.
In the olden days of x86: "REPNZ SCASB" to get the length of a zero-terminated string and "REP MOVSB" to copy bytes from place to place. But I think more modern CPUs actually work faster with the RISCier equivalents.
Summary: 'Why doesn't "hardware support" automatically translate to "low cost"/"efficiency"? The short answer is, hardware is an electric circuit and you can't do magic with that, there are rules.'
I remember way back in the DOS days there were a lot of clever things you could do with setting up DMA transfers which would then run in the background... With CPU/memory latency and bandwidth being so much more of an issue these days I can't see why this isn't still the case. Maybe it's all done automatically in the background now?
DMA is done automatically by modern hardware, PCIe(/Thunderbolt) and SATA just do it.
Somewhat related is e.g. the sendfile() syscall that's used by web servers/frameworks to pass a file directly from your fcgi application to the outgoing socket.
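For reference, a minimal sketch of what that looks like on Linux (error handling kept short); sendfile(2) lets the kernel push file pages straight to the socket without a round trip through userspace buffers:

#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy a whole file out to an already-connected socket. */
static int send_whole_file(int sock_fd, const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    off_t off = 0;
    while (off < st.st_size) {
        ssize_t n = sendfile(sock_fd, fd, &off, (size_t)(st.st_size - off));
        if (n <= 0) { close(fd); return -1; }
    }
    close(fd);
    return 0;
}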
Sorry, I wasn't clear. That's exactly what they do do. These aren't separate APIs; they are modified versions of libc and the STL that speed up the implementation. Facebook apparently has similar STL replacements that they've open-sourced as part of Folly.
(The string utilities I was referring to actually are separate APIs, focused around manipulating string pieces that are backed by buffers owned by other objects. It's like the slice concept in Go or Rust. With the growth in Google's engineering department, it got very difficult to ensure that everybody knew about them and used them correctly; this is probably easier if they're part of the stdlib. Indeed, they're in boost as string_ref, but most of Google's codebase predates boost - indeed, they were added by a Googler.)
If your allocator is fast, the IO-list approach used by Erlang is a lot faster at copying and concatenating strings than anything that involves copying the characters around. I used this to good effect in Ur-Scheme. But then processing the contents of the strings becomes potentially expensive, and you may have aliasing bugs if your strings are mutable.
Python 2.7.3 concatenates strings in string_concat using Py_MEMCPY, which is a macro defined at Include/pyport.h:292. That invokes memcpy, except for very short strings, where it just uses a loop, because on some platforms memcpying three bytes is a lot slower than just copying them. In http://canonical.org/~kragen/sw/dev3/propfont.c I got a substantial speedup from writing a short_memcpy function that does this kind of nonsense:
if (nbytes == 4) {
  memcpy(dest, src, 4);
} else if (nbytes < 4) {
  if (nbytes == 2) {
    memcpy(dest, src, 2);
  } else if (nbytes < 2) {
    if (nbytes) *dest = *src;  /* 1 byte (or nothing) */
  } else {
    memcpy(dest, src, 3);
  }
} /* ...and so on for the larger sizes */
The main case in eglibc 2.13 memcpy, which is what Python is invoking on my machine, is as follows:
/* Copy just a few bytes to make DSTP aligned. */
len -= (-dstp) % OPSIZ;
BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ);
/* Copy whole pages from SRCP to DSTP by virtual address manipulation,
as much as possible. */
PAGE_COPY_FWD_MAYBE (dstp, srcp, len, len);
/* Copy from SRCP to DSTP taking advantage of the known alignment of
DSTP. Number of bytes remaining is put in the third argument,
i.e. in LEN. This number may vary from machine to machine. */
WORD_COPY_FWD (dstp, srcp, len, len);
/* Fall out and copy the tail. */
}
/* There are just a few bytes to copy. Use byte memory operations. */
BYTE_COPY_FWD (dstp, srcp, len);
In sysdeps/i386/i586/memcopy.h, WORD_COPY_FWD uses inline assembly to copy 32 bytes per loop iteration, but using %eax and %edx, not using SIMD instructions. It explains:
/* Written like this, the Pentium pipeline can execute the loop at a
sustained rate of 2 instructions/clock, or asymptotically 480
Mbytes/second at 60Mhz. */
This is presumably what Jeff's code was a replacement for. Too bad he didn't contribute it to glibc, but he presumably wrote it at a time in Google's lifetime when the Google paranoia was at its absolute peak.
PAGE_COPY_FWD sounds awesome but it's only defined on Mach. Elsewhere PAGE_COPY_FWD_MAYBE just invokes WORD_COPY_FWD.
My tentative conclusion is that Ulrich scared everyone else away from wanting to work on memcpy so effectively that it's been unmaintained since sometime in the previous millennium.
Interesting - the last time I looked seriously at Erlang (~2005), its string handling was a mess. Strings were lists of characters, which ate 8 bytes of memory per character, couldn't be addressed in O(1), and were often slow to traverse because of cache misses. IOLists seem to be a big improvement on that. They're functionally equivalent to a Rope, right? Ropes are used all over the place at Google (where they're called Cords); I worked on a templating engine while I was there that would just assemble pieces into one for later flushing out to the network.
Erlang has always used IO lists for I/O, and they can include both strings and "binaries", i.e. byte arrays. IO lists are very similar to ropes, but because they don't even store the length in each node, they're even faster than ropes to concatenate, but linear-time to index into.
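For illustration, here's a rough C sketch of that idea (invented types and names, error handling omitted; not Erlang's actual representation): concatenation just links nodes in O(1), and the bytes get copied exactly once, when the structure is flattened for output:
#include <stdlib.h>
#include <string.h>
typedef struct iolist {
    const char    *bytes;            /* leaf data, or NULL for a concat node */
    size_t         len;
    struct iolist *left, *right;     /* children of a concat node */
} iolist;
static iolist *io_leaf(const char *bytes, size_t len) {
    iolist *n = calloc(1, sizeof *n);
    n->bytes = bytes;
    n->len = len;
    return n;
}
static iolist *io_concat(iolist *a, iolist *b) {   /* O(1): no copying at all */
    iolist *n = calloc(1, sizeof *n);
    n->left = a;
    n->right = b;
    return n;
}
static size_t io_flatten(const iolist *n, char *out) {  /* the single final copy */
    if (!n) return 0;
    size_t w = 0;
    if (n->bytes) { memcpy(out, n->bytes, n->len); w = n->len; }
    w += io_flatten(n->left, out + w);
    w += io_flatten(n->right, out + w);
    return w;
}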
I don't necessarily agree that it's a problem of the language ecosystems. In my experience games are one of the few areas where it is likely to be CPU bottlenecked and not I/O bottlenecked.
My sample size is only the set of projects that I've personally worked on, but in the projects where I have intentionally profiled where the time is being spent, I have only ever run into UX-impacting CPU bottlenecks when working on games. This is spanning code I've written from assembly all the way up to Python. Not counting JS because the DOM is a bit of a black box to me. In almost every other case, the bottleneck was not being able to pull data out of physical storage or some network store fast enough. In some cases it was not being able to pull data from RAM fast enough. In a few cases it was not being able to pull data from a database fast enough due to needing to cover too big of a table, which I lump into the "waiting for I/O" category under the lightly investigated hypothesis that it's not because the CPU is having trouble iterating over the indices, but because the database can't keep the whole index in memory.
> My sample size is only the set of projects that I've personally worked on, but in the projects where I have intentionally profiled where the time is being spent, I have only ever run into UX-impacting CPU bottlenecks when working on games.
Try out any "enterprise" app that thinks it's a good idea to load 25,000 widgets on one page. That'll show you the meaning of "CPU bottleneck".
I've seen a page that just displayed a grid of images. It took seconds of CPU time on an i7, and the page doubled its load time if you had CPU throttling. I'm not sure what the ultimate reason was, but the end result was that, as far as the user could see, the page displayed a fancier-looking <table> and required billions of cycles to do so.
While that technically may be a CPU bottleneck, I like to look at it as an implementation bottleneck. I can write a Pong game that will be bottlenecked by any CPU if I make it calculate a million digits of pi at the start of each game loop, but that doesn't mean my game is more hardcore than Crysis.
It's obviously an implementation issue. The question is how do otherwise sane-appearing individuals end up writing code like that? How does it even pass a preliminary usage test? And so on. This wasn't some big enterprise, either.
That question haunts all of us... For instance: let's say you need to query a DB to get 4 known sets of things aggregated by other things. That should scream "IN clause with the known params and GROUP BY" at you. Or, if you take an approach I saw recently, you spawn 4 worker threads each querying the same tables, each with one of those 4 params, and then combine the records... I'm afraid this board may constitute the remaining sane individuals...
Edit: those known params were all used as a parameter for the same column.
It's pretty crazy how video games can display thousands of objects and millions of vertices at 100+fps, yet web browsers and applications can struggle to display more than a handful.
It's not crazy when you consider typical video game economics (hundreds of developers / long dev time / high budgets / high price per unit) vs. typical website economics (a handful of developers at best / short dev time / low budget / microscopic price per "unit").
Video games are often intimately tied to the underlying hardware/software stack allowing further optimization whereas the same webpage can often operate unaltered on a multitude of devices.
And video games are often judged by how many "objects and vertices" they can show and how fast they can show them, whereas websites have other criteria.
That's because video game engines don't have to be hardened against malicious asset files. All the code in a video game runs at the same permission level, which you definitely can't say about a random page on cnn.com.
Browser content has to support all the features and they add up pretty quickly. For example a large scrolling list with thousands of items can be solved by a gamedev by mandating all content be the same size and applying a tiling rule. A list in a webapp is likely to be held to a higher standard and so needs to consider heuristics to find the position of content, dynamic updates, etc.
True, but in cases like that it seems like the CPU is hardly the problem in the big picture. ;)
Sounds like you and I mostly agree. Just that when you do a root cause analysis, "more efficient CPU usage" seems to rarely be the best place to optimize, compared to "do less I/O" or "stop putting 25,000 widgets on one page", or "smaller RAM footprint", and so on.
I have seen plenty of 10x speedups in most apps from trivial changes. If you fail to use the CPU cache correctly, you can be getting ~1% utilization or less per logical CPU, which is still plenty fast for the core logic of many applications.
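A classic illustration (a hypothetical micro-benchmark of my own, not something from the thread): summing the same matrix by rows versus by columns. The row-major loop walks memory with unit stride and stays in cache; the column-major loop strides a whole row's worth of bytes per access and can easily be several times slower for a large N:
#include <stddef.h>
#define N 2048
static double m[N][N];                  /* 32 MB, far larger than any cache */
double sum_row_major(void) {            /* cache-friendly: unit stride */
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}
double sum_col_major(void) {            /* cache-hostile: stride of N doubles */
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}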
I was just picking at a nuance of what you said about wasted CPU cycles. Yes, some languages and toolchains these days are less efficient to trade for ease of development, but from my experience it seems like I/O to RAM/disk/network is still 80% of the problem and "less work per CPU instruction" is 20% of the problem.
Depends on the game and what's going on, but even with GPU heavy games the CPU can do a ton of work just getting all the data ready and packaged up for the GPU to chew on.
Causality points both ways, though. Games use the GPU because that's where the compute power is, which is why graphics are so emphasized over other features; and GPUs are so powerful because they meet the needs of gamers. There's nothing inherently faster or better about a GPU, because most tasks don't parallelize that easily. It just so happens that the tasks that DO parallelize easily are the ones that get emphasized.
A CPU for games would have very fast cores, a larger cache, faster (lower-latency) branch prediction, a fast APU, and double-precision floating point.
Few games care about multicore; many game "rules" are completely serial, and more cores don't help.
Also, gigantic SIMD is nice, but most games never use it unless it is ancient, because compatibility with old machines is important for reaching a wide market.
And again, many CPU-demanding games run serial algorithms over serial data; matrix math is usually only essential for the stuff the GPU is doing anyway.
To me, CPUs are instead optimized for Intel's biggest clients (server and office machines).
I disagree. As a gamedev writing game logic you are right.
But as an engine programmer, I agree with the linked author. I'll take your points one at a time.
Most engines are multi-core, but we do different things on each core (and this is where Intel's hyper-threading, where portions are shared between the virtual cores for cheaper than entire new cores, is a solid win). Typically a game will have at least a game logic thread (what you are used to programming on) and a "system" thread which is responsible for getting input out of the OS and pushing the rendering commands to the card, along with some other things. Then we typically have a pool of threads (n - 1, where n is the number of logical cores on the machine: -2 for the two main threads, +1 to keep things saturated) which pull work off of an asynchronous task list: load files from disk, wait for servers to get back to us, render UI, path-finding, AI decisions, physics and rendering optimization/pre-processing, etc.
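As a bare-bones sketch of that task-list arrangement (invented names, plain pthreads, no error handling or shutdown logic; real engines use far more sophisticated schedulers):
#include <pthread.h>
#include <stdlib.h>
typedef struct task { void (*run)(void *); void *arg; struct task *next; } task;
static task           *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;
void push_task(void (*run)(void *), void *arg) {
    task *t = malloc(sizeof *t);
    t->run = run;
    t->arg = arg;
    pthread_mutex_lock(&queue_lock);
    t->next = queue_head;               /* LIFO is good enough for a sketch */
    queue_head = t;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}
static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (!queue_head)
            pthread_cond_wait(&queue_cond, &queue_lock);
        task *t = queue_head;
        queue_head = t->next;
        pthread_mutex_unlock(&queue_lock);
        t->run(t->arg);                 /* file load, pathfinding, AI, ... */
        free(t);
    }
    return NULL;
}
void start_worker_pool(int nworkers) {  /* typically n_logical_cores - 1 */
    for (int i = 0; i < nworkers; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
        pthread_detach(tid);
    }
}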
AAA game studios will use up to 4 core threads by carefully orchestrating data between physics, networking, game logic, systems, and rendering tasks (e.g. thread A may do some networking (33%) and then rendering (66%), while thread B does scene traversal (66%) and then input (33%); see the 33% overlap?), and they also do this to better optimize for consoles. They have better control over their game devs and can break game logic into different sections to be better parallelized, whereas consumer game engines have to maintain the single-thread perception.
SIMD is used everywhere, physics uses it, rendering uses it, UI drawing can use it, AI algorithms can use it. Many engines (your physics or rendering library included) will compile the same function 3 or 4 different ways so that we can use the latest available on load. It's not great for game logic because it's expensive to load into and out of, but for some key stuff it's amazing for performance.
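A minimal sketch of the "compile the same function several ways and pick one at load time" pattern (the names are mine; a real engine would also build AVX/AVX2 variants with the appropriate target attributes and check for those):
#include <immintrin.h>
#include <stddef.h>
static void add_vec4_scalar(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n * 4; i++)
        dst[i] = a[i] + b[i];
}
static void add_vec4_sse(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++) {    /* one 4-wide vector per iteration */
        __m128 va = _mm_loadu_ps(a + 4 * i);
        __m128 vb = _mm_loadu_ps(b + 4 * i);
        _mm_storeu_ps(dst + 4 * i, _mm_add_ps(va, vb));
    }
}
/* Function pointer chosen once at startup. */
static void (*add_vec4)(float *, const float *, const float *, size_t) = add_vec4_scalar;
void init_math_dispatch(void) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))  /* always true on x86-64, but shows the idea */
        add_vec4 = add_vec4_sse;
#endif
}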
That stuff the GPU is doing eats up a whole core or more of CPU time. So what if we are generally running serial algorithms? We need to run 6 different serial algorithms at once, and that's what general-purpose CPUs were built for.
This is all stuff you often don't have to deal with when you're coddled by your game engine, the same way that webdevs don't have to worry about how the web browser is optimizing their web pages.
To be fair I'm more of a hobbyist - who writes game-engine-esque code (I never said what kind of engine programmer I am did I) for my day job (pays better) - that just builds game engines for fun (like the last 10 years now... but no games). So some details are likely wrong, I'm kinda super curious as to your nits.
Sounds like what I did for the past 10 years before joining the gamedev world about 3 years ago. It is cool to work on your own tech and to learn a lot of different things, but it's also scary how much can get done with a whole team working at it.
>A CPU for games would have very fast cores, larger cache, faster (less latency) branch prediction,
The CPU industry stayed on this path for as long as it was physically possible, even long after it hit diminishing returns on single-thread performance divided by (area*power). The Pentium 4 was the last CPU of this single-core era.
If you look closely at the microarchitecture of a modern desktop CPU, the out-of-order execution, caches and branch prediction are already maximized (to the point that more than two-thirds of the die area is cache). Multi-core became mainstream only after all other paths were exhausted.
The reason Moore's law failed to keep pace is that as you pack transistors closer together you get a lot more heat and current leakage, and the amount of work spent on error correction increases quickly and becomes prohibitive.
> Also, gigantic simd is nice, but most games never use it, unless it is ancient, because compatibility with old machines is important to have wide market.
In my experience SIMD actually becomes more important when you want compatibility with old machines, because your ability to use the compute power of the GPU becomes more limited the further back you go in graphics library versions. For example, OpenGL has no compute shaders before 4.3, no tessellation shaders before 4.0, and no transform feedback before 3.0. When you can't make the GPU do what you want, SIMD becomes your best bet…
I don't understand how you can say such a blatantly false thing. Modern consoles have more than 4 cores available, do you really think we would let the other 3 go to waste?
Large games will use all cores, our game uses 32 cores if you have them, we solved a bug because 1 guy had such a machine and reported an issue. 1 guy, from the public, and we fixed it the next update!
But no, developers don't care, we don't even like games! In fact, we hate games! Please don't enjoy our games!
This is changing very rapidly with the introduction of the low-level APIs like Vulkan and DX12 that were designed to scale on multicore systems. On those titles we've been seeing much better use of multi core resources than we saw on previous generations [0][1]
> Do CPU designers spend area on niche operations such as _binary-field_ multiplication? Sometimes, yes, but not much area. Given how CPUs are actually used, CPU designers see vastly more benefit to spending area on, e.g., vectorized floating-point multipliers.
So CPUs are not "optimized for video games"; they are optimized for "vectorized floating-point multipliers", something video games (and many other workloads) benefit from.
Why are they optimized for vectorized floating-point multipliers? Does the CEO of Intel just tell all the engineers to do this because he likes multiplication?
They are optimized for that because a lot of algorithms can make use of them, from quicksort/mergesort through image rendering and encryption. It is an easy optimization from a hardware perspective -- simple repetitive hardware structure. This is why GPUs are so powerful and games are not the only thing that benefits from this type of optimization. Matrix multiplication is also used in signal processing. The CEO asked, how can we optimize the use of our hardware for the most benefit? And SIMD with wide pipes is at the top of the list. Most of the post is about all the new algorithms that can take advantage of the hardware push. The hardware push is there because it is an easy use of hardware resources.
This is also an optimization that compilers can readily take advantage of on a small scale (similar to pipelining) so the combination of benefit + ability to use + simplicity/low resource use makes it an inevitability.
Because CPUs are designed around benchmarks representative of real workloads people are running on their computers. Naturally, multimedia and games are a large part of these workloads, this is what drives SIMD adoption.
That's ignoring the history. The definition of 'general purpose stuff' includes things which used to be considered specialized. FPUs used to be physically separate optional add-ons. They were quickly absorbed into the main CPU around the time that the spreadsheet became the killer app. Then SIMD was added when 'multimedia' became a requirement. Now pretty good GPUs are integrated too. Of course all these capabilities are multi-purpose, but they were not core functionality originally.
> They were quickly absorbed into the main CPU around the time that the spreadsheet became the killer app.
For very loose definitions of "quickly" and/or "around the time". Spreadsheets became the "killer app" with VisiCalc in 1979 -- before the IBM PC was even a thing; but Intel processors through the 386 didn't have an integrated FPU, and the 486 (around 1990) came in both integrated-FPU (486DX) and no-integrated-FPU (486SX) models. It wasn't until the Pentium that integrated-FPU was universal in the Intel line.
(AFAIK, most spreadsheets of the early PC era didn't even support using the FPU, and the main widely-used application category that leveraged FPUs in the optional-FPU era was CAD.)
Yep, my spreadsheet example was not great. Your example of FPU-for-CAD is perhaps better at making the case that the FPU was once a specialized co-processor, but of course is now considered an essential everyday component. The modern CPU evolved to support the set of workloads people found for them over time. They are now heterogeneous parallel systems that are really good at actual workstation and server workloads. They are quite specialized in that sense.
The FPU achieving mass adoption maps pretty well to the onset of "multimedia hype" (CDs, digital audio and video, and of course 3D game rendering). That content doesn't strictly need floating point, but it became an affordance along with 32-bit desktop architectures and their increased memory and storage.
I agree. Nonetheless, modern CPUs are designed to be acceptably good at many different things (versus being extremely efficient at solving one specific task).
That's a good point. I don't know the answer. Optional PC FPUs and the rise of spreadsheets were contemporaneous in the late 1980s but perhaps not related.
edit: I checked. Microsoft was at least still using floating point in Excel much later, until at least 2013, with the expected limitations. https://support.microsoft.com/en-ca/kb/78113
They absolutely were related (as were databases and FPUs). Had these mainstream business applications not driven them, it would have likely been years later before they became standard equipment. The FPU was likely originally added with 'serious' scientific and engineering computing tasks (including CAD) in mind, but it was the more mundane and common spreadsheet and database applications that drove demand.
Here's what I remember from back then: early PCs (8-bit era: Apple // etc) had no socket for a FPU (i.e. not even an expansion option), every IBM-compatible PC had a co-processor socket starting with the 8086 (fun fact: the 8087 wasn't even shipping when the PC was designed, and boy were they expensive for what they did when they shipped) but almost no one bought one until the late-286/early-386 era for general-purpose computing, by the 386 era the FPU was pretty much standard equipment on any 'real' business PC. So the software evolution was: no FPU support, optional FPU support, FPU required. (i.e. by the last generation of DOS applications, several major apps had dropped their floating point emulation libraries and would crap out with an error along the lines of 'coprocessor not detected/installed')
The evolution of the FPU was very similar to how the GPU has played out in terms of becoming a standard, expected component. As with the FPU, CAD and all sorts of other scientific/business applications were the initial drivers of the technology, but gaming is what caused unit volumes to explode and is the reason it is now standard equipment.
They're optimized for single-precision linear algebra. If you need double-precision all those optimizations go out the window and you're left on your own.
> They're optimized for single-precision linear algebra. If you need double-precision all those optimizations go out the window and you're left on your own.
Not true at all on modern X86. You just get half the FLOPS by using double precision instead of single precision -- 16 double precision FLOPS per core per cycle. Which is very good considering there's also twice as much data to process.
You could rather say X86 CPUs are highly optimized for double precision performance.
TL;DR
To please the gaming market, CPU vendors add large SIMD units.
ChaCha uses SIMD so it gets faster. AES needs array lookups (for its S-Box) and gets stuck.
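For reference, the ChaCha quarter-round really is nothing but 32-bit additions, XORs and fixed rotations (shown below in plain C), which is why it vectorizes naturally across the state or across blocks with ordinary SIMD and needs no secret-dependent table lookups:
#include <stdint.h>
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
static void chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
    *a += *b; *d ^= *a; *d = ROTL32(*d, 8);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 7);
}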
Maybe a better headline would be something like "How software crypto can be as fast as hardware crypto". I was curious about this after the WireGuard announcement so thanks to DJB for the explanation.
It's more that it's relatively easy to make some instruction useful to a variety of video game problems, but difficult to do the same for encryption or compression. You tend to end up with hardware support for specific standards.
Did you read the post? This is specifically addressed. The AES hardware support requires a bunch of die area specifically for that purpose and still isn't that performant. Smaller-area CPUs don't spend the area and perform abysmally on AES, and even in CPUs that do include AES-NI, Chacha achieves comparable performance for the same security margin without any custom hardware support, just using the general vector instructions added to improve game performance. DJB expects that because vector math continues to improve while AES hardware does not, Chacha will soon outperform AES even on devices with hardware support.
Thank you for pointlessly regurgitating much of his post?
The fact that Intel put an encryption feature in their chip, which does indeed make that algorithm faster, would tend to indicate they wanted faster encryption wouldn't it? That some other algorithm could be faster still isn't really contradicting that.
I'd wager that the goal wasn't so much speed (which is very rarely the issue) but security. It was way too hard to program a constant time AES implementation without AES-NI.
One important aspect DJB ignores is power efficiency. ChaCha achieves its high speed by using the CPU's vector units, which consume huge amounts of power when running at peak load. Dedicated AES-GCM hardware can achieve the same performance at a fraction of the power consumption, which is an important consideration for both mobile and datacenter applications.
Gamers generally don't care about power consumption. When you've spent $1000 on the hardware an extra dollar or two on your electricity bill is no big deal.
> CPU's vector units consume huge amounts of power when running at peak load. Dedicated AES-GCM hardware can achieve the same performance at a fraction of the power consumption
Citation needed. Where did you get that idea? Please show how djb's vector code spends more power vs the built-in AES "dedicated hardware" instruction when, as he measures:
"* Both ciphers are ~1.7 cycles/byte on Westmere (introduced 2010).
* Both ciphers are ~1.5 cycles/byte on Ivy Bridge (introduced 2012).
* Both ciphers are ~0.8 cycles/byte on Skylake (introduced 2015)."
"even though AES-192 has "hardware support", a
smaller key, a smaller block size, and smaller data limits" (his code is 256 bits and 12 rounds).
AVX is so hot that Intel CPUs may have to clock down ~200 MHz when executing heavy AVX code to stay within their power/thermal limits. I have no idea if this hits DJB's code in reality.
Thanks for the link. I can only find "when the processor detect AVX instruction additional voltage is applied, the processor can run hotter which can require the frequency to be reduced", but I don't see it mentioned anywhere that the reduction is 200 MHz. If you mean 200 MHz lower than the TDP-marked frequency, but processing twice as much data, it doesn't sound so bad: it's still roughly 1.7 times more power-efficient than the narrower instructions spending twice as much time at the marked TDP frequency. And I'd be surprised if AES magically didn't need serious processing too; otherwise it would already be implemented to be much faster than it is now.
It really depends on your instruction mix. If only one in twenty instructions uses AVX, the rest of your instructions are running slower due to the lower clock, and they aren't getting double the throughput. On top of that, it could be some other thread using AVX, clocking down the entire core and harming a thread that isn't using AVX at all.
Intel has done a lot of things to try to balance this. One of them is that they don't even bother turning half the vector unit on unless you use it a lot. If you only seldom issue ops with 512-bit operands, the CPU will actually dispatch them as multiple 256-bit operations, in which case you won't incur the drop in clock, but you also don't get the supposed benefit of double throughput. Furthermore, performance may be much worse if the CPU decides to power up the rest of the vector unit, because the clock drops dramatically while those units are charging up.
So you can see that for someone trying to wring out every last bit of performance on a recent Intel CPU using all the advertised vector capabilities, optimization can become quite complicated.
AES-NI uses the XMM register bank, but not necessarily any vector execution unit. Furthermore, it only uses the lower 128 bits of vector registers, whereas the linked document is referring to instructions that use the upper lanes as well, i.e. YMM or ZMM registers.
The parts of the CPU that are not in use can be turned off, so making full use of the vector units can raise the temperature of the CPU a fair bit compared to less demanding code.
He's also comparing 12 rounds of ChaCha with 12-round AES, which may be fair, but realistically no one uses ChaCha12, they use ChaCha20.
I thought modern video games are predominantly limited by GPU performance? Maybe the argument is that while usually CPU performance isn't the most important part of the equation, video gamers base their purchasing decision on misguided benchmarks that expose it.
The big CPU hog and prime candidate for these vector operations nowadays seems to be video encoding.
Actually, GPUs are bottlenecked by CPUs these days (low-level APIs will help a bit); GTX 1080s, not to mention the Titan X, are a really good example of hitting a CPU bottleneck like a brick wall at higher resolutions and higher core frequencies.
Because of single-core performance, unless you are playing a really CPU-intensive game that has been optimized for multiple cores (more than 4), or running an insane 3-4-way SLI setup and needing the extra PCIe lanes, a 6700K with as high an overclock as possible (4.6-4.7 GHz) is pretty much the bare minimum for a GTX 1080 or better GPU, and even that isn't enough.
Physics are pretty cheap unless your game is a physics simulator.
CPU performance is important when there is a lot going on (most of the frame latency is on the CPU side), and when the CPUs are really busy dealing with other stuff like heavy AI or tons of NPCs you get poor frame rates and poor scaling with GPU processing power.
Good examples of real CPU hogs would be the latest Civ games as well as games like Assassin's Creed Unity.
ACU is especially a CPU killer; if your CPU overclock is stable playing ACU, it's really stable. I've seen that game cause CPUs that are not technically overclocked to crash on their normal boost clock when the memory XMP profile was loaded.
I'm not saying they aren't :) It's just not every type of physics.
Both of them are more or less indie titles, so don't expect great optimization from them either.
Minecraft is also pretty much CPU-bottlenecked, but that's because, well, the game was built in Java :)
Even though the JVM probably gets in the way of the kind of low-level optimizations we are talking about here, Java is not the main factor for the CPU usage.
All those voxels take up a good chunk (heh) of processing time. They are not all static, you know. Not sure how much of that could be shifted to the GPU.
No part of a Unity game runs in a .NET VM or any other VM.
They chose C# as the scripting language because C# is one of the most popular programming languages; it's extremely popular in the non-game-dev development community, and it's probably the only non-Web language that most code academies teach for traditional development, maybe other than Java.
Its syntax is also pretty close to C and C++, which means developers with a game dev background will feel at home, as most game development is done in C++.
Unreal Engine uses Unreal Script which is now pretty much C++ but it is also not compiled directly (although with Unreal Engine 4 and onwards it's much closer to direct compile than any other scripting language).
The Unity engine has its own interpreter which then builds highly optimized C++ code and compiles it when you build the game.
Unity is a pretty decent engine with kickass performance when optimized; without fine optimization, any general-purpose engine, including Unreal 4, acts like utter crap.
I'm alpha/beta testing a few UE4 games atm, and you can see just how bad performance can get even on a solid de facto industry standard like UE4, e.g. when dynamic shadows tank a GTX Titan X (Maxwell) SLI setup to below 20 fps any time there are light sources that are not properly fenced and culled, such as explosions.
Your last paragraph ticks me off so much about the current non sequitur of an "industry standard". The most recent example I can give of a game that doesn't really care is EDF 4.1: it takes carpet-bombing an entire city to make its FPS dip, with hundreds if not thousands of giant insect gibs (and four players) being flung across the map.
Do they really need a bazillion shaders and dynamic shadows on everything?
You sure? I was playing with Unity back in 2009 and you scripted both game and IDE atop mono. The .NET flavour of JS was being pushed, with code examples additionally in C# and Boo (!). I preferred to use F#.
I'm surprised by how many people think Unity games are "basically C#". That's like saying Unreal Engine games are coded in Lua. Like Lua, C# is nothing but the scripting language. The Unity game engine that does all of the heavy lifting is coded in C++.
Occasionally you can, but it comes with caveats, even for PhysX:
• Only works on Windows
• Only works with Nvidia graphics cards (and only them; a cheap Nvidia card for PhysX plus an AMD card for graphics will disable PhysX)
• Not guaranteed to be faster for all cases, needs to be evaluated on a case-by-case basis
So even if the physics engine can, not all game engines use it. The Unity devs e.g. stated that they won't bother with it, as the limitations make it unattractive to pour effort into.
You can. The problem is that there is a multi-frame (33 ms * 2 or so) delay in getting any results back to the CPU. The GPU is set up for streaming: you compile command lists dynamically and feed them to it, which means it usually has at the very least one command list in execution and one being built on the CPU (the GPU is always kept busy; stalls are cycles going to waste). Hence the delay in getting results back.
And you will need some of those results on the CPU.
Physics for particles is not uncommon to be done on the GPU though. There is no feed-back to the CPU required so latency becomes a non-issue.
Sometimes, yes. Other times, no. If you're using an explicit integrator like a Runge–Kutta method such as RK4, then the answer is yes and the algorithms map pretty well onto a GPU. In that way, GPUs have been a huge boon to the scientific computing world, though, personally, I find them a pain to program. However, if you use an implicit integrator like a backward Euler method, the answer is no, because we need to solve a linear system.
Yes, if we have a well-defined problem it may be possible to design a custom preconditioner based on the physics of the problem that maps really well to a GPU. Those preconditioners take a huge amount of work to develop and, honestly, if you look at the last step of something like a multilevel method, it comes down to a direct factorization. Basically, direct dense and sparse factorizations do not currently map well to a GPU, and these factorizations are extremely important to many scientific problems. Specifically, implicit time integrators depend on them to solve the linear systems involved, and we need implicit integrators for stiff systems.
Outside of integration, there are other situations where GPUs don't work well. In large-scale optimization with equality constraints, there are linear systems that need to be solved and, most of the time, we need a direct factorization to solve them. That's why reduced-space methods for things like parameter estimation are so popular. Basically, they're null-space algorithms that eliminate the equality constraints by doing an implicit linear system solve using an explicit time integrator, which can map to something like a GPU. This leads to a host of other problems, but at least we can scale, sort of, sometimes.
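To make the contrast concrete, here is a generic textbook RK4 step in plain C (my own sketch, not tied to any library mentioned here): every stage is independent per-element arithmetic, exactly the shape of work GPUs and SIMD units like, whereas a backward Euler step would instead have to solve y_{n+1} - h*f(t_{n+1}, y_{n+1}) = y_n, i.e. a (possibly nonlinear) system, at every step:
#include <stddef.h>
typedef void (*rhs_fn)(double t, const double *y, double *dydt, size_t n);
/* One classical RK4 step for y' = f(t, y); k1..k4 and tmp are caller-provided
   scratch arrays of length n. */
void rk4_step(rhs_fn f, double t, double h, double *y, size_t n,
              double *k1, double *k2, double *k3, double *k4, double *tmp) {
    f(t, y, k1, n);
    for (size_t i = 0; i < n; i++) tmp[i] = y[i] + 0.5 * h * k1[i];
    f(t + 0.5 * h, tmp, k2, n);
    for (size_t i = 0; i < n; i++) tmp[i] = y[i] + 0.5 * h * k2[i];
    f(t + 0.5 * h, tmp, k3, n);
    for (size_t i = 0; i < n; i++) tmp[i] = y[i] + h * k3[i];
    f(t + h, tmp, k4, n);
    for (size_t i = 0; i < n; i++)      /* every element updates independently */
        y[i] += (h / 6.0) * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
}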
At the end of the day, most scientific problems based on continuum mechanics need really fast level 1, 2, and 3 BLAS operations. That's mostly enough for simulating physics based on explicit time integrators. For elliptic problems or problems that require implicit time integrators, we need factorizations. Most of the time, we can get away with LU, Cholesky, QR, and SVD. Both dense and sparse are required. For optimization with equality constraints, factorizations are also required.
By the way, if anyone wants to figure out how to do faster factorizations, dense and sparse, on whatever new hardware is coming out, that'd have an enormous impact on the scientific community. There are people working on it. There's not a lot of them.
Collision detection and response are the hard parts and they are poorly suited for GPU implementation. Some specific cases (fluids, particles, maybe cloth) that don't require these in full work quite nicely and are used often enough (see common PhysX usage)
I haven't been CPU performance locked in a PC game for over 6 years - I have upgraded my GPU every 2 years and I am finally approaching the point to which I'd gain a framerate benefit from upgrading my Sandy Bridge i7 920 (I have a GTX 980 ti GPU now).
There are several outliers to this: some games are VERY CPU-dependent (Supreme Commander, MMOs like PlanetSide 2), and some are EXTREMELY inefficient games (Minecraft) which require an order of magnitude more CPU power than they really need due to architecture problems.
Big modern games are CPU limited mostly. GPU work can be scaled down very easily (even dynamically!) by reducing the rendering resolution and as such can be made to fit the limits, up to a point.
Scaling the CPU side of things is a lot trickier. And although we get an increasing amount of cores in new CPUs, single core performance has not increased in the same way. Some things just don't want to be multi-threaded or must happen in-order. This then becomes the most limiting factor.
Well it is, but if your CPU is getting its job done 1ms sooner, then the GPU now has an additional 1ms to do the rendering while delivering the same deadline. Which probably means it'll render more frames.
Bottlenecking just means that improving the bottlenecked thing will result in the biggest improvement. The second-biggest performance limiter can still be a significant performance limiter.
Aside from concatenating & storing strings, the prime uses of these big clusters are machine learning and data analytics, both of which have very similar instruction patterns to games. Like another commenter on this article pointed out, if you make single-precision linear algebra fast, you'll cover the vast majority of cases where people actually need a faster CPU.
That's fine, but it is not the same thing as saying gamers drive the hardware. Gamers are benefiting because their games need the same things the big clusters do.
Not really, as a sibling pointed out. The core microarchitecture is the same. The difference is in ancillary on-chip features like memory and bus controllers.
...at enormously different price/flop, basically because it restricts RAM size and disables ECC in the Core chips. It's why we need AMD's Zen to be competitive again, so that this price gouging ends. Same for Tesla/Geforce at Nvidia.
They do if you take into account total numbers. It's a market worth many billions of USD with millions of users. Sure, each datacenter individually spends a lot, but we're talking about a difference in the number of individual users of probably five orders of magnitude here.
What do you mean, "in numbers"? As I pointed out elsewhere, the number of distinct customers is irrelevant; the number of chips sold is relevant, and I do not believe gamers buy more chips than compute farms.
There has been no need to upgrade yearly for games since about the Wolfdale era, so those types of people are getting more and more rare, and are motivated mostly by spec sheets rather than real-world performance.
Many data centers go through a process of constant expansion and renewal, as they are competing with others and there is a larger financial incentive for them to do so.
VR has been causing a small resurgence in upgrades this year, since you need a fairly high-end graphics card for it. I.e. you need to have spent the price of a cheap computer on your GPU if you bought it last-gen, or somewhere above $250 if you're buying right now.
Similarly, 4k displays require a lot of GPU power, though I think it's viewed as even more frivolous / excessive than building for VR.
The number of gamers who upgrade yearly is fairly small, and the number who upgrade their base platform (CPU + mobo) yearly is even smaller. I would be very surprised if there were more than a million in the latter category.
As for datacenters and compute farms, the large ones are upgrading portions of their systems yearly, if not expanding on top of that.
And because CPUs are optimized for both gamers and Windows, the world has access to lots of cheap, powerful hardware. I'm not a Microsoft fan, but I'm very appreciative to them for making this ecosystem possible.
Not sure about Power8 as I wasn't able to find anything conclusive. But if you believe Oracle's marketing efforts, the SPARC chips do much better than Power8 and Intel on that front.
I remember one x86 from VIA (2002 or so) that had a crypto accelerator unit, but, like with AVX, you have to write code specifically for it. It's the same as with SPARC or POWER (or xSeries) - you shouldn't expect them to be designed to be fast on specific crypto algorithms when using generic instructions.
Isn't this exactly why HSMs exist: to provide optimised hardware crypto functionality?
Honestly I would treat this the same as eg Ethernet - high end cards have hardware offload capabilities that the software stack can utilise to get better performance.
I really find it hard to believe that people with such an interest in security at the CPU level would buy "retail" processors like the ones you and I have access to. I am no expert in the field, but it just seems weird that there isn't a market for, and a producer of, specialized processors that are more militarized or something. Why does everyone have access to the same Intel chips? I doubt that's actually the case. Am I wrong?
> I really find it hard to believe that people with such an interest in security at the CPU level would buy "retail" processors like the ones you and I have access to.
DJB's interest here is specifically in creating algorithms that work well on general-purpose popular CPUs.
ARMA III could be a good example of a CPU bottleneck. Or maybe it is just badly optimized... Then we hit the hot topic of multicore vs. single-core performance.
In Arma 3 the most critical part, the AI logic, runs in a single thread, which means AMD CPUs with low single-core performance often struggle to reach 60 fps.
One of the major problems with Arma 3 is that due to its simulation nature it has to simulate everything, rather than just a bubble around the player while cutting corners everywhere else. This means it inherently must use a whole lot more CPU than your average videogame.
It's quite possibly badly optimised too, but even if it weren't, Arma would eat CPU like crazy.
The form factor of laptop screens is built for media consumption, even though a squarer form factor is superior for productivity (I found an old Sony Vaio and its screen form factor felt very pleasant). It seems the general consumption of media has dominated CPU design, in addition to everything else in our computers.
Perhaps that was true in the mid 90s, but today Intel optimizes x86_64 for its highest margin core business: server/datacenter workloads. Any resulting benefit to desktop PC gaming is appreciated, but it's a side effect rather than a primary design goal.
Some stories from back around 2000, when I was designing CPUs at Intel. Some people did bemoan the fact that little software actually needed the performance of the processors we were building. One of the benchmarks where the performance actually was needed was ripping DVDs. That led to the unofficial saying, "The future of CPU performance is in copyright infringement."
(Not seriously, mind you)
However, here is a case where the CPUs were actually modified to improve one certain program.
"We ran these simulation models on either interactive workstations or compute servers – initially, these were legacy IBM RS6Ks running AIX, but over the course of the project we transitioned to using mostly Pentium® III based systems running Linux. The full-chip model ran at speeds ranging from 05-0.6 Hz on the oldest RS6K machines to 3-5 Hz on the Pentium® III based systems (we have recently started to deploy Pentium® 4 based systems into our computing pool and are seeing full-chip SRTL model simulation speeds of around 15 Hz on these machines)"
You can see that the P6-based processors (PIII) were a lot faster than the RS6Ks, and the Wmt version (P4) was faster still. That program is csim, and it does a really dumb translation of the SRTL model of the chip (think Verilog) into C code that then gets compiled with GCC (the Intel compiler choked). That code was huge and had loops with 2M basic blocks. It totally didn't fit in any processor's instruction cache. Most processors assume they are running from the instruction cache and stall when reading from memory. Since running csim was one of the test cases we used when evaluating performance, the frontend was designed to execute directly from memory: it would pipeline cacheline fetches from memory, which the decoders would unpack in parallel, so it could execute at the memory read bandwidth. This was improved further on Wmt. This behavior probably helps some other real programs now, but at the time this was the only case we saw where it really mattered.
The end of the section is unrelated but fun:
"By tapeout we were averaging 5-6 billion cycles per week and had
accumulated over 200 billion (to be precise, 2.384 * 1011) SRTL
simulation cycles of all types. This may sound like a lot, but to
put it into perspective, it is roughly equivalent to 2 minutes on a
single 1 GHz CPU!"
Games were important but at the time most of the performance came from the graphics card.
In recent years Intel has improved the on-chip graphics and offloaded some of the 3D work to the processor using these vector extensions. That is to reclaim the money going to the graphics card companies.
tl;dr: AES relies on table lookups (for its S-box) and is not designed for vectorization. Other (newer) algorithms are designed with branchless, table-free vectorization in mind, which makes specialized hardware instructions unnecessary.
And what if games are better (or worse) optimised for certain types of hardware, so that you have to buy a new Intel CPU every 3 years? The point is, what if some games are badly optimised and run badly on certain hardware on purpose? Maybe it sounds like a conspiracy theory, but look: CPUs are stalling, Intel wants to sell its things every year, so what if they came to developers and said, "Make your game run 10% better on our latest hardware and we'll give you money"?