Hacker News — agapon's comments

Hamas terrorists boarding commercial airplanes? With their secret pagers on them?

Somehow I don't think so.


Well, if growing or breeding bigger oxen were as feasible as building bigger (more powerful) computers was/is, perhaps people would take a different path? In other words, perhaps the analogy is flawed?


At some point, a bigger computer either doesn't exist or is prohibitively expensive. But getting lots of "small" computers is somewhat easier.


Sure, but it's important to notice that the biggest off the shelf computers keep getting bigger.

Dual socket Epyc is pretty big these days.

If you can fit your job on one box (+ spares, as needed), you can save a whole lot of complexity vs spreading it over several.

It's always worth considering what you can fit on one box with 192-256 cores, 12 TB of RAM, and whatever storage you can attach to 256 lanes of PCIe 5.0 (minus however many lanes you need for network I/O).

You can probably go bigger with exotic computers, but if you hit bottlenecks with the biggest off-the-shelf computer you can get, you might be better off scaling horizontally. Assuming you aren't growing 4x a year, though, you should have plenty of notice that you're coming to the end of easy vertical scaling. And sometimes you get lucky and AMD or Intel makes a nicely timed release that buys you some more room.


Exactly, that's what happens as we hit the end of Moore's law (we won't hit the end of Moore's law itself, but we will hit the end of feature-size shrinkage)... one of the optimizations they will do is rote, trivial process optimization. So if the chip failure rate on the assembly line is 40%, they drop it to 10%. Costs drop accordingly, because there are x-fold more transistors per dollar, thus sustaining Moore's law.


How to organize them is a hard problem. For general-purpose use, we have only three architectures today - shared memory multiprocessors, GPUs, and clusters. When Hopper gave that talk, people were proposing all sorts of non-shared memory multiprocessor setups. The ILLIAC IV and the BBN Butterfly predate that talk, while the NCube, the Transputer, and the Connection Machine followed it by a year or two. This was a hot topic at the time.

All of those were duds. Other than the PS3 Cell, also a dud, none of those architectures were built in quantity. They're really hard to program. You have to organize your program around the data transfer between neighbor units. It really works only for programs that have a spatial structure, such as finite element analysis, weather prediction, or fluid dynamics calculations for nuclear weapons design. Those were a big part of government computing when Hopper was active. They aren't a big part of computing today.

It's interesting that GPUs became generally useful beyond graphics. But that's another story.


(Some) TPUs look more like those non-shared memory systems. The TPU has compute tiles with local memory and the program needs to deal with data transfer. However, the heavy lifting is left to the compiler, rather than the programmer.

Some TPUs are also structured around fixed dataflow (systolic arrays for matrix multiplication).
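As a rough software sketch of that fixed-dataflow idea (a plain-NumPy analogy only, not how a real TPU or its compiler works; the function name is made up): a tiled matrix multiply accumulates block-sized partial products, loosely the way a systolic array streams partial sums through a fixed grid of multiply-accumulate units.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Block matrix multiply: accumulate tile-sized partial products,
    loosely analogous to a systolic array streaming partial sums
    through a fixed grid of multiply-accumulate units."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):          # rows of output tiles
        for j in range(0, m, tile):      # columns of output tiles
            for p in range(0, k, tile):  # accumulate along the inner dim
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C
```

The compiler's job on real hardware is roughly choosing those tile sizes and scheduling the data movement so the fixed grid stays busy.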


> non-shared memory multiprocessor setups

Tandem's NonStop was this, starting in ~1976 or 1977: loosely coupled, shared-nothing processors, communicating with messages over a pair of high-speed inter-processor buses. It's not really comparable to the other examples you cite (nCube/Transputer/Connection Machine) in that it was programmed conventionally, without requiring a special parallel language. It was certainly a niche product, but not a dud. It's still around, having been ported from a proprietary stack machine to MIPS to Itanium to x86.

EDIT: I suppose it can be compared to a Single System Image cluster.


Sure, but, as I was (rather unpopularly) pointing out in another comment, that point was pretty hard to reach in 1982. Specifically the point where you've met both criteria: bigger computer is too cost prohibitive to get, and lots of smaller computers is easier. At the time of this lecture, parallel computers had a nasty tendency to achieve poorer real-world performance on practical applications than their sequential contemporaries, despite greater theoretical performance.

It's still kind of hard even now. To date in my career I've had more successes with improving existing systems' throughput by removing parallelism than I have by adding it. Amdahl's Law plus the memory hierarchy is one heck of a one-two punch.


In 1982 you still had "supercomputers" like

https://en.wikipedia.org/wiki/Cray_X-MP

because you could still make bipolar electronics that beat out mass-produced consumer electronics. By the mid 1990s even IBM abandoned bipolar mainframes and had to introduce parallelism so a cluster of (still slower) CMOS mainframes could replace a bipolar mainframe. This great book was written by someone who worked on this project

https://campi.cab.cnea.gov.ar/tocs/17291.pdf

and of course for large scale scientific computing it was clear that "clusters of rather ordinary nodes" like the

https://www.cscamm.umd.edu/facilities/computing/sp2/index.ht...

we had at Cornell were going to win (ours was way bigger) because they were scalable. (As Cray himself saw it, a conventional supercomputer had to live within a small enough space that the cycle time was not unduly limited by the speed of light, so that kind of supercomputer had to become physically smaller, not larger, to get faster.)

Now for very specialized tasks like codebreaking, ASICs are a good answer, and you'd probably stuff a large number of them into expansion cards in rather ordinary computers. Clusters today possibly also have some ASICs for glue and communications, such as

https://blogs.nvidia.com/blog/whats-a-dpu-data-processing-un...

----

The problem I see with people who attempt parallelism for the first time is that the task size has to be larger than the overhead to transfer tasks between cores or nodes. That is, if you are processing most CSV files, you can't round-robin assign rows to threads, but 10,000-row chunks are probably fine. You usually get good results over a large range of chunk sizes, but chunking is essential if you want most parallel jobs to really get a speedup. I find it frustrating as hell to see so many blog posts pushing the idea that some programming scheme like Actors is going to solve your problems, and to meet people who treat chunking as a mere optimization you'll apply after the fact. My inclination is that you can get the project done faster (in human time) if you build in chunking right away, but I've learned you just have to let people learn that lesson for themselves.
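A minimal sketch of that chunking pattern (threads here for brevity; for CPU-bound per-row work you'd use a process pool instead, but the shape is the same — all the names below are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def process_row(row):
    return row * 2  # stand-in for the real per-row work

def process_chunk(chunk):
    # One task per chunk, not per row, so the work inside each
    # hand-off dwarfs the cost of the hand-off itself.
    return [process_row(row) for row in chunk]

def parallel_map_chunked(rows, chunk_size=10_000, workers=4):
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so results line up with rows.
        return [result for chunk in pool.map(process_chunk, chunks)
                for result in chunk]
```

Round-robin would be the degenerate case chunk_size=1, where the scheduling overhead swamps the per-row work.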


To your last point, it's been interesting to watch people struggle to effectively use technologies like Hadoop and Spark now that we've all moved to the cloud.

Originally, the whole point of the Hadoop architecture was that the data were pre-chunked and already sitting on the local storage of your compute nodes, so that the overhead to transfer at least that first map task was effectively zero, and your big data transfer cost was collecting all the (hopefully much smaller than your input data) results of that into one place in the reduce step.

Now we're in the cloud and the original data's all sitting in object storage. So shoving all your raw data through a small, slow network interface is an essential first step of any job, and it's not nearly so easy to get speedups as impressive as what people were doing 15 years ago.

That said I wouldn't want to go back. HDFS clusters were such a PITA to work with and I'm not the one paying the monthly AWS bill.


> The problem I see with people who attempt parallelism for the first time is that the task size has to be larger than the overhead to transfer tasks between cores or nodes.

My big sticking point is that for some key classes of tasks, it's not clear that this is even possible. I've seen no credible reason to think that throwing more processors at the problem will ever build that one tool-generated template-heavy C++ file (IYKYK) in under a minute, or accurately simulate an old game console with a useful "fast forward" button, or fit an FPGA design before I decide to take a long coffee-and-HN break.

To be fair, some things that do parallelize well (e.g. large-scale finite element analysis, web servers) are extremely important. It's not as though these techniques and architectures and research projects are simply a waste of time. It's just that, like so many others before it, parallelism has been hyped for the past decade as "the" new computing paradigm that we've got to shove absolutely everything into, and I don't believe it.


It isn't for a great many tasks. Basically, whenever you're computing f(g(x)), you can't execute f and g concurrently.

What you can do is run g and h concurrently in something that looks like f(g(), h()). And you can vectorize.
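For example (a toy sketch; g and h stand in for independent slow computations):

```python
from concurrent.futures import ThreadPoolExecutor

def g():
    return 2  # pretend this is expensive

def h():
    return 3  # independent of g, so the two can overlap

def f(a, b):
    return a + b

# f(g(x)) is inherently sequential: f can't start until g finishes.
# But in f(g(), h()) there is no data dependency between g and h:
with ThreadPoolExecutor(max_workers=2) as pool:
    future_g = pool.submit(g)  # starts g in the background
    future_h = pool.submit(h)  # starts h at the same time
    result = f(future_g.result(), future_h.result())
print(result)  # 5
```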

A lot of early multiprocessor computers only gave you that last option. They had a special mode where you'd send exactly the same instructions to all of the CPUs, with the CPUs mapped to different memory. So in many respects it was more like a primitive version of SSE instructions than like what modern multiprocessor computers do.


I was getting into 3D around the time the Pentium came out, and I took a lot of time comparing the price of a single Pentium computer against multiple used 486s. The logic was that a mini render farm would still be faster than a single Pentium. I never pulled the trigger on either option.


Telegram blah blah, according to Telegram.


Pavel Durov is a social media?

I thought he was a human being.



Go work for a defense contractor then, Mr. Righteous Slaughter


How much would you value a soldier riding one of those?


E.g., to press Ctrl-(Shift)-V and replace the selection with something you copied (to the buffer / clipboard) earlier?


That doesn't work in terminal emulators, because the terminal emulator doesn't know how to delete the selected text in whatever program drew it there (and that program may no longer even exist).


There have been several Latin based scripts proposed for Ukrainian over many decades. So, if you haven't heard, you just haven't been interested.

https://en.wikipedia.org/wiki/Ukrainian_Latin_alphabet


Hope you find comfort in the fact that smart guys in China don't have such reservations and smart guys in North Korea don't have any options.


Does Taiwan claim sovereignty over mainland China?


Technically speaking yes. It's complicated.


It did historically, and has not formally renounced its claim (which included Mongolia).

