Yep, skilled people have options. When you treat them like circus animals that constantly have to prove they're worth the cage they'll be kept in and the food they'll be fed, they just leave the circus.
Smart professional people want a non-hostile space where they can build a life. When a Russian scientist or an Iranian doctor left their country for London or Paris, they weren't calculating a net income increase; they were running away from an environment that showed no promise of letting them realize themselves. Lots of white-collar people are paid well below what they would make if they learned a JS library or did construction work, because of their desire to fulfill themselves in a peaceful life and be respected. It is similar to game devs being paid very little relative to the complexity of the programming they do. If you break that magic, they aren't going to stay.
Some of it reminds me of the CCP, which I think is openly considered a model by some neo-authoritarians. Ubiquitous mass surveillance, social credit, and state capitalism with heavy control through regulatory pressure. I assume we will eventually see party men installed on the boards of major companies, especially in media, tech, and entertainment.
The “tech right” is a major player here and a lot of those folks idolize China right now.
I think the US has been spiraling toward authoritarianism since 9/11, personally. This did not start yesterday or with the most recent election, nor is it exclusively the result of the right or the Republican Party. A lot of people on the left have also abandoned liberalism and ideas like free speech. There's been a broad-based shift away from liberalism and individualism and toward collectivism, which always leads toward totalitarianism.
Right-wing collectivism comes in the form of racism and nationalism, while for the contemporary left it's identity-grievance politics and a resurgence of Marxism.
“Why did everyone across the entire political spectrum abandon individualism in the 20-teens?” is one of the questions I keep asking.
I've thought a good bit about the "too large" side (author here under a different name). The magic rule for me has been to consider whether each member actually has a function within the community, in the sense that they contribute something unique to the group which is not contributed by anyone else, and for which the community would be hurt if they weren't there. The idea was originally put into my head by C.S. Lewis, in his work on Membership: "If you subtract any one member, you have not simply reduced the family in number; you have inflicted an injury on its structure. Its unity is a unity of unlikes, almost of incommensurables." E.g. if you were to remove a random poster from HN, it wouldn't affect anything much at all, because they're effectively just a number. However, if you were to remove dang, their presence would be missed because they contribute to the uniqueness of HN's community in some way. IMO, if the group doesn't pass this test, you haven't actually found the real community yet.
Of course, all of this is quickly-written thoughts for a HN post. Maybe at some point I'll edit them down and post them properly, but I need to discuss it with more people so I'm sure my thoughts actually strike reality.
Mathematics pedagogy today is in a pretty sorry state due to bad actors and willful blindness at all levels that require public trust.
A dominant majority of public schools, starting in the late 1970s, seem to follow the "Lying to Children" approach, which is often mistaken for by-rote teaching but is based on Paulo Freire's works, themselves based on Mao's torture discoveries from the 1950s.
This approach, contrary to classical approaches, leverages a torturous process that seems purposefully built to fracture and weed intelligent individuals out of useful fields, imposing levels of stress sufficient to induce PTSD or psychosis, and selecting for and filtering in favor of those who can flexibly/willfully blind or corrupt themselves.
Such sequences include Algebra->Geometry->Trigonometry, where gimmicks in undisclosed changes to grading cause circular trauma loops and the abandonment of math-dependent careers thereafter. Similar structures are also found at university, in Economics, Business, and Physics, which use similar fail-scenarios that burn bridges: you can't go back when the failure lagged from the first sequence and you already passed the second, unrelated sequence. No help is offered, inducing confusion and frustration to PTSD levels, before the teacher offers the Alice in Wonderland technique: "If you aren't able to do these things, perhaps you shouldn't go into a field that uses them." (ref. the KUBARK report, a declassified CIA manual)
Have you been able to discern whether these "patterns" as you've called them aren't just the practical reversion to the classical approach (Trivium/Quadrivium)? Also known as the first-principles approach after all the filtering has been done.
To compare: Classical approaches start with nothing but a useful real system and observations which don't entrench false assumptions as truth, which are then reduced to components and relationships to form a model. The model is then checked for accuracy against current data to separate truth from false in those relationships/assertions in an iterative process with the end goal being to predict future events in similar systems accurately. The approach uses both a priori and a posteriori components to reasoning.
Lying to Children reverses and bastardizes this process. It starts with a single useless system which contains equal parts true and false principles (as misleading assumptions), which are tested and must be learned to competency (growing those neurons close together). Upon the next iteration one must unlearn the false parts while relearning the true parts (but we can't really unlearn, we can only strengthen or weaken), which in turn creates inconsistent mental states imposing stress (torture). This is repeated on an ongoing basis, often circular in nature (structuring), leveraging psychological blindspots (clustering), with several purposefully structured failings (elements) to gatekeep math, the basis of science and other risky subject matter, through a torturous process. As the student progresses towards mastery (gnosis), the systems become increasingly more useful. One must repeatedly struggle in their sessions to learn, the premise being that if you aren't struggling you aren't learning. This mostly uses a faux a priori reasoning without properties of metaphysical objectivity (tied to objective measure, at least not until the very end).
If you don't recognize this, an example would be the electrical water-pipe pressure analogy. Diffusion of charge in like materials, with intensity (current) towards the outermost layer, was the first-principled approach pre-1978 (I=V/R). The water analogy fails when the naive student tries to relate the behavior to pressure equations that end up being contradictory at a number of points in the system, introducing stumbling blocks that must be unlearned.
Torture being the purposefully directed imposition of psychological stress beyond an individual's capacity to cope, towards physiological stages of heightened suggestibility and mental breakdown (where rational thought is reduced or non-existent in the intelligent).
It is often recognized by its characteristic subgroups of Elements (cognitive dissonance, a lack of agency to remove oneself and coercion/compulsion with real or perceived loss or the threat thereof), Structuring (circular patterns of strictness followed by leniency in a loop, fractionation), and Clustering (psychological blindspots).
> Does "career development" just mean "more money"? If so, why not just say "there are opportunities to make more money"? If not, what is "career development" that is not just becoming more deeply buried in an organization with the various dysfunctions described in the rest of the post?
In life, everyone that thinks a lot is eventually confronted with the reality that we're all just minor players within much bigger systems. When you follow this thread, pretty deep questions start to fall out like "how can I be just in an unjust society?". Or "what's the best way that I, as an individual, can have a positive impact on my community?". Or "Is there any point in trying to change systems given my small role within them?".
To these types of questions there are various responses and consequences. Some people dive in feet first and engage heavily with the mechanisms they have to enact change (such as local politics, grass-roots political movements, activism etc). Some people, overwhelmed by the weight of the system, disengage entirely.
Now to answer your question, I believe in the work that we're doing (or else I probably wouldn't have joined). Career development at the company isn't just more money (though that's obviously a component), it's being given more responsibilities alongside the capacity to enact more and more change.
Faced with a dysfunctional organisation that you're a part of, what do you do? The options as I see it are roughly:
- Change companies, and acknowledge that the dysfunction is insurmountable.
- Do your job and stay at the position you're in.
- Embed deeper into the dysfunctional organisation, with the view that you can be an agent for positive change.
>Is it still satisfying if that software is bad, or harms many of those people?
To some people, yes. There are people out there that take satisfaction in doing harm. Not me, nor do I believe the work I do is harmful. I didn't think I had to be so granular as to say "It's satisfying to write software I believe is a net positive to society used by millions".
> that disconnect exists because people are mixing up terminology that took engineers years to define properly.
This is one of the larger trends I've observed in about 10 years of the software industry. A lot of these terms are really the crystallization of discussions at the water cooler, expositions in books or articles, or on technical fora like these, that span months if not years and thousands upon thousands of words. A veteran utters the word and immediately all the past conversations he's had regarding this topic come to mind.
Newer cohorts come in, and, not having been privy to those discussions, latch on to the jargon in a mimetic attempt to stochastically parrot the experts, but don't have the substance underlying the word - they only have the word itself. Now it gets thrown around as an ill-defined, ill-specified buzzword that means multiple different things to multiple people, none of whom can clarify what exactly the definition of that word is, what it means to them, because they were never part of the discourse, the oral or written tradition, in the first place, and don't understand the meaning of that word in context, its usage.
"Agile." "Technical debt." "DevOps." And now, "vibe coding." There was an article here on HN [0] [1] discussing semantic drift of the term "vibe coding" and how it now means something different from what was originally intended; I will merely point out that this is par for the course in software.
For other, more technical, examples of linguistic sloppiness: see JavaScript's conflation of objects, JSON, dictionaries, and hashmaps; to the computer scientist, you have the compositional primitive from object-oriented programming, the JavaScript Object Notation for serialization, the abstract data type, and the concrete data structure, respectively. To the JavaScript programmer, you just have "objects," and the fidelity of your linguistic and conceptual space has been reduced to a single pixel instead of something with more resolution and nuance.
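To make that concrete, here is a minimal sketch of the four different things the computer scientist would distinguish; it's in Python rather than JavaScript, and the names are purely illustrative:

    import json
    from dataclasses import dataclass

    # 1. The compositional primitive from object-oriented programming:
    #    state plus behavior, defined by a class.
    @dataclass
    class User:
        name: str
        karma: int

        def upvote(self) -> None:
            self.karma += 1

    # 2. The abstract data type: a dictionary, i.e. a mapping from keys
    #    to values. How it is stored is not part of the concept.
    user_index: dict[str, User] = {"dang": User("dang", 1)}

    # 3. The concrete data structure: CPython happens to implement dict
    #    as a hash table, but that is an implementation detail.

    # 4. The serialization format: JSON is text on the wire, not an object.
    wire = json.dumps({"name": "dang", "karma": 1})
    print(type(wire))  # <class 'str'> -- just a string until parsed

In JavaScript all four of these tend to get called "objects," which is exactly the loss of resolution described above.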
Something interesting is happening. A false narrative is spreading online, pushed by people who know little about engineering, and others who should know better.
They claim junior devs are now 10x more productive, and project managers are shipping code themselves. Now, close your eyes for five seconds and try to picture what that code looks like. It's 100% legacy, disposable code.
The problem isn't AI, or PMs turning Figma into code, or junior devs prompting like mad. The real problem is the disconnect between expectations and outcomes. And that disconnect exists because people are mixing up terminology that took engineers years to define properly.
- A lean prototype is not the same as a disposable prototype
- An MVP is not the same as a lean prototype
- And a product is not the same as an MVP
A lean prototype is a starting point, a rough model used to test and refine an idea. If it works, it might evolve into an MVP. An MVP becomes a product once it proves the core assumptions and shows there's a real need in the market. And a disposable prototype is exactly that, something you throw away after initial use.
Vibing tools are great for building disposable prototypes, and LLM-assisted IDEs are better for creating actual products. Right now, only engineers are able to create lean prototypes using LLM prompts outside the IDE. Everyone else is just building simple (and working?) software on top of disposable code.
“The fallacy in these versions of the same idea is perhaps the most pervasive of all fallacies in philosophy. So common is it that one questions whether it might not be called the philosophical fallacy. It consists in the supposition that whatever is found true under certain conditions may forthwith be asserted universally or without limits and conditions. Because a thirsty man gets satisfaction in drinking water, bliss consists in being drowned. Because the success of any particular struggle is measured by reaching a point of frictionless action, therefore there is such a thing as an all-inclusive end of effortless smooth activity endlessly maintained.
It is forgotten that success is success of a specific effort, and satisfaction the fulfillment of a specific demand, so that success and satisfaction become meaningless when severed from the wants and struggles whose consummations they are, or when taken universally.”
Because it takes 2-3 decades to build up the battery industry and supply chain.
Copying BYD is what Tesla is doing to some extent (BYD also copied aspects of Tesla). They are trying to design their own batteries, build their own plants and so on. But Tesla started going in that direction around 2017, and they were still a pretty small company then. It's crazy that Tesla dared go in that direction at all.
BYD came from the other direction: it started as a battery manufacturing company, and specifically LFP-type batteries, ironically a technology pioneered in the West. Tesla was already committed to NMC batteries; that was their first major bet. But it turns out that, for the great mass of cars, LFP batteries are what you want, and China was already dominating in LFP. In China lower-range cars had more of a market, so China adopted LFP aggressively while the West was still focusing on longer-range models. Tesla is currently setting up its own manufacturing plant for LFP.
But BYD is part of China's larger strategy for its automotive export industry. They realized that alternative-fuel vehicles (they weren't sure what exactly) would eventually replace current technology and, they hoped, the current industry. Battery electric turned out to be the clear winner in this, and they pushed the industry forward, everything from the mines to all the complex refining steps. Lots of long-term investments from China's investment banks, led by the state but with lots of private investment as well. Initially they gave Chinese automakers subsidies and then eventually reduced them, making it clear that they wanted a competitive industry and the car makers had to sink or swim. Out of that, China now has two of the largest battery producers on the planet, CATL and BYD, completely dominating the battery market. This is not something you just copy; this is 20-30 years of work.
Now, why did other companies like Toyota or VW or GM not do this? Toyota, and Japan in general, was completely obsessed with hydrogen and didn't really believe in battery electric, and arguably still doesn't. VW and GM were large assembly companies; they looked at batteries like steel, or seats, or something. They outsourced all that stuff, and by the time they actually realized battery electric was the future, about 15 years after China, it was clear that these companies were simply not set up to do vertical integration; it was literally the opposite of what they had been doing for 30 years. They also had massive software issues that they needed to figure out at the same time. And when they tried, they mostly just invested in battery cell assembly plants with partners. They had the finances, but instead of spending 15 years investing in the battery supply chain, they gave it to shareholders or invested in the wrong things.
Now the other, maybe more interesting, question is: why did other battery companies not do the same as BYD? Panasonic is from Japan, and despite building the first true automotive-scale factory with Tesla, they only believed in the technology to a limited extent and didn't go after that market as aggressively as they should have, and thus have lost market share every year. LG came to batteries from the electronics side and focused on being a major supplier, which in their estimation made more money; also, Korea's car industry was already dominated by Hyundai, so they would have been competing with their own country's champion.
This really is just an example of China's long-term strategy paying off. They focused on 'New Energy' vehicles and, instead of focusing on one thing like Japan did, they reinforced success continuously when batteries turned out to be the best. Of course they were also lucky: their market happened to be much more open to smaller, lower-range cars, making LFP batteries an option, while US LFP companies failed to get investment.
In the West, emission regulations mostly targeted improving efficiency of ICE engines and no long term supply chain investment at all was done, especially for mining.
So simply 'copying' that isn't that easy. As somebody who has watched this for a long time, it mostly just seems like Western companies and Western politics are consistently 10-20 years behind the curve. Tesla is the only company that was really on the ball, but it takes 10+ years to scale to the level of the old legacy carmakers no matter how correct your ideas are, and the danger is that Tesla also pushed China into competing with Tesla even more aggressively.
The best advice I heard about the switch from regular engineer to mgmt role was this: A software engineer works to build harmonious productive stable software systems. An engineering manager works to build harmonious productive stable people systems. The product is different.
So many people who move into management don't make this mental shift and keep their hands on the steering wheel because that's how they got there in the first place. But if you don't let others drive, you won't be able to keep the car on the road for long. If you go into that role your job is to mostly get your hands off the nuts and bolts of the software system and create people systems of motivation, trust, and organization to get your team members working well on them instead.
I've never made the transition myself but have increasingly thought about it.
It's been my experience that I need to take full responsibility for the effectiveness of my communications.
A few years ago, I threw together a PowerPoint show, based on Randall Munroe's Communication comic[0]. I did it for an organization I participate in, that is full of some of the worst communicators I've ever encountered.
It astounds me how people who get paid to communicate don't understand the fundamentals.
>Import substitution is a policy by which the state aims to increase the consumption of goods that are made domestically by levying high tariffs on foreign goods. This gives an advantage to the domestic manufacturers as their goods will be cheaper and preferable in the market compared to foreign products. India adopted this model post-independence, and it continued till the 1991 reforms. Due to import substitution, the domestic producers captured the entire Indian market, but there was slow progress in technological advancements, and the quality of Indian products was inferior to the foreign manufactured ones. But after the reforms, the Indian market was opened to everyone, and the consumer got the best value for the price he paid. The Make in India policy of the present government is reminiscent of the pre-1991 inward-looking Indian state.
In the US it will be even worse. The US is already a high-tech economy, outsourcing low value-adding manufacturing to foreign countries while its industries move towards higher value-adding products. After the tariffs, the US manufacturing sector will shift to lower value-added, lower-complexity products.
What's interesting is the speaker in the poem is predicting their own inevitable ("with a sigh") reality distortion field, i.e. is being cynical about themself. So it is a poem about self-serving bias, despite the fact that we are perfectly capable of checking ourselves. Our biases aren't entirely subconscious. We can be willful about it, willfully look the other way so our biases remain "subconscious" and thus not our responsibility, making it easier for our conscious side to convince itself that its self-esteem is legit.
And then there is a meta-confirmation of this bias: The way people commonly interpret this poem. If Frost had that in mind all along, then he/the poem is genius.
There are a few different approaches. Meta documents at least one approach quite well in one of their llama papers.
The general gist is that you have some kind of adapter layers/model that can take an image and encode it into tokens. You then train the model on a dataset that has interleaved text and images. Could be webpages, where images occur in-between blocks of text, chat logs where people send text messages and images back and forth, etc.
The LLM gets trained more-or-less like normal, predicting next token probabilities with minor adjustments for the image tokens depending on the exact architecture. Some approaches have the image generation be a separate "path" through the LLM, where a lot of weights are shared but some image token specific weights are activated. Some approaches do just next token prediction, others have the LLM predict the entire image at once.
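As a toy sketch of that "trained more-or-less like normal" idea, here is what next-token prediction over an interleaved text/image token sequence looks like in PyTorch; the vocabulary sizes, shapes, and the random tensors standing in for the model's output are all made up for illustration:

    import torch
    import torch.nn.functional as F

    # Vocabulary is split: ids below TEXT_VOCAB are ordinary text tokens,
    # the rest are image tokens produced by the adapter/encoder.
    TEXT_VOCAB, IMAGE_VOCAB = 32000, 8192
    VOCAB = TEXT_VOCAB + IMAGE_VOCAB

    seq = torch.randint(0, VOCAB, (1, 128))   # interleaved token ids (text + image)
    logits = torch.randn(1, 128, VOCAB)       # stand-in for what the LLM outputs

    # Standard causal LM loss: predict token t+1 from the prefix up to t,
    # regardless of whether the target token is text or image.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, VOCAB),
        seq[:, 1:].reshape(-1),
    )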
As for encoding-decoding, some research has used things as simple as Stable Diffusion's VAE to encode the image, split up the output, and do a simple projection into token space. Others have used raw pixels. But I think the more common approach is to have a dedicated model trained at the same time that learns to encode and decode images to and from token space.
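Here's a rough PyTorch sketch of that "encode, split up the output, and project into token space" adapter idea; the shapes, module choices, and names are illustrative only, not any lab's actual architecture:

    import torch
    import torch.nn as nn

    class ImageToTokens(nn.Module):
        """Illustrative adapter: VAE-style latent grid -> LLM token embeddings."""
        def __init__(self, latent_dim=4, patch=2, d_model=4096):
            super().__init__()
            # stand-in for a VAE-style encoder producing a latent grid, e.g. (B, 4, 32, 32)
            self.encoder = nn.Conv2d(3, latent_dim, kernel_size=8, stride=8)
            # linear projection from a flattened latent patch into the LLM's embedding space
            self.proj = nn.Linear(latent_dim * patch * patch, d_model)
            self.patch = patch

        def forward(self, img: torch.Tensor) -> torch.Tensor:
            z = self.encoder(img)                    # (B, C, H, W) latent grid
            B, C, H, W = z.shape
            p = self.patch
            z = z.unfold(2, p, p).unfold(3, p, p)    # split the grid into p x p patches
            z = z.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
            return self.proj(z)                      # (B, num_image_tokens, d_model)

    tokens = ImageToTokens()(torch.randn(1, 3, 256, 256))
    print(tokens.shape)  # torch.Size([1, 256, 4096]) with these toy settings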
For the latter approach, this can be a simple model, or it can be a diffusion model. For encoding you do something like a ViT. For decoding you train a diffusion model conditioned on the tokens, throughout the training of the LLM.
For the diffusion approach, you'd usually do post-training on the diffusion decoder to shrink down the number of diffusion steps needed.
The real crux of these models is the dataset. Pretraining on the internet is not bad, since there's often good correlation between the text and the images. But there's not really good instruction datasets for this. Like, "here's an image, draw it like a comic book" type stuff. Given OpenAI's approach in the past, they may have just bruteforced the dataset using lots of human workers. That seems to be the most likely approach anyway, since no public vision models are quite good enough to do extensive RL against.
And as for OpenAI's architecture here, we can only speculate. The "loading from the top, starting from a blurry image" behavior is either a direct result of their architecture or a gimmick to slow down requests. If the former, it means they are able to get a low resolution version of the image quickly, and then slowly generate the higher resolution "in order." Since it's top-to-bottom, that implies token-by-token decoding. My _guess_ is that the LLM's image token predictions are only "good enough." So they have a small, quick decoder take those and generate a very low resolution base image. Then they run a stronger decoding model, likely a token-by-token diffusion model. It takes as condition the image tokens and the low resolution image, and diffuses the first patch of the image. Then it takes as condition the same plus the decoded patch, and diffuses the next patch. And so forth.
A mixture of approaches like that allows the LLM to be truly multi-modal without the image tokens being too expensive, and the token-by-token diffusion approach helps offset memory cost of diffusing the whole image.
I don't recall if I've seen token-by-token diffusion in a published paper, but it's feasible and is the best guess I have given the information we can see.
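To be concrete about that guess, the decoding loop I'm imagining would look roughly like this; this is just the shape of the idea, and diffusion_decode is a stand-in callable, not a real API:

    # Sketch of the guessed patch-by-patch decoding: each patch is diffused
    # conditioned on the LLM's image tokens, the quick low-res base image,
    # and everything decoded so far, top-to-bottom.
    def decode_image(image_tokens, low_res_base, diffusion_decode, num_patches):
        patches = []
        for i in range(num_patches):
            patch = diffusion_decode(
                cond_tokens=image_tokens,   # LLM's "good enough" image tokens
                cond_image=low_res_base,    # fast low-resolution preview
                cond_patches=patches,       # patches already decoded
                target_index=i,             # which patch to produce next
            )
            patches.append(patch)
        return patches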
EDIT: I should note, I've been "fooled" in the past by OpenAI's API. When o* models first came out, they all behaved as if the output were generated "all at once." There was no streaming, and in the chat client the response would just show up once reasoning was done. This led me to believe they were doing an approach where the reasoning model would generate a response and refine it as it reasoned. But that's clearly not the case, since they enabled streaming :P So take my guesses with a huge grain of salt.
I'm unlikely to write a book, but here are a few more tidbits that come to mind.
Re the above -- I don't mean to imply that any of this is malicious or even conscious on anyone's behalf. I suspect it is for a few people, but I bet most people could pass a lie detector test that they care about their OKRs and the OKRs of their reports. They really, really believe it. But they don't act it. Our brains are really good at fooling us! I used to think that corporate politics is a consequence of malevolent actors. That might be true to some degree, but mostly politics just arises. People overtly profess whatever they need to overtly profess, and then go on to covertly follow emergent incentives. Lots of misunderstandings happen that way -- if you confront them about a violation of an agreement (say, during performance reviews), they'll be genuinely surprised and will invent really good reasons for everything (other than the obvious one, of course). It's basically watching Elephant In The Brain[1] play out right in front of your eyes.
Every manager wants to grow their team so they can split it into multiple teams so they can say they ran a group.
When there is a lot of money involved, people self-select into your company who view their jobs as basically to extract as much money as possible. This is especially true at the higher rungs. VP of marketing? Nope, professional money extractor. VP of engineering? Nope, professional money extractor too. You might think -- don't hire them. You can't! It doesn't matter how good the founders are, these people have spent their entire lifetimes perfecting their veneer. At that level they're the best in the world at it. Doesn't matter how good the founders are, they'll self-select some of these people who will slip past their psychology. You might think -- fire them. Not so easy! They're good at embedding themselves into the org, they're good at slipping past the founders' radars, and they're high up so half their job is recruiting. They'll have dozens of cronies running around your company within a month or two.
From the founders' perspective the org is basically an overactive genie. It will do what you say, but not what you mean. Want to increase sales in two quarters? No problem, sales increased. Oh, and we also subtly destroyed our customers' trust. Once the stakes are high, founders basically have to treat their org as an adversarial agent. You might think -- but a good founder will notice! Doesn't matter how good you are -- you've selected world class politicians that are good at getting past your exact psychological makeup. Anthropic principle!
There's lots of stuff like this that you'd never think of in a million years, but is super-obvious once you've experienced it. And amazingly, in spite of all of this (or maybe because of it?) everything still works!
> I bet that WhatsApp is one of the rare services you use which actually deployed servers to Australia. To me, 200ms is a telltale sign of intercontinental traffic.
So, I used to work at WhatsApp. And we got this kind of praise when we only had servers in Reston, Virginia (not at aws us-east1, but in the same neighborhood). Nowadays, Facebook is most likely terminating connections in Australia, but messaging most likely goes through another continent. Calling within Australia should stay local though (either p2p or through a nearby relay).
There's lots of things WhatsApp does to improve experience on low quality networks that other services don't (even when we worked in the same buildings and told them they should consider things!)
In no particular order:
0) offline first, phone is the source of truth, although there's multi-device now. You don't need to be online to read messages you have, or to write messages to be sent whenever you're online. Email used to work like this for everyone; and it was no big deal to grab mail once in a while, read it and reply, and then send in a batch. Online messaging is great, if you can, but for things like being on a commuter train where connectivity ebbs and flows, it's nice to pick up messages when you can.
a) hardcode fallback IPs for when DNS doesn't work (not if); a sketch of this follows after the list
b) set up "0-RTT" fast resume, so you can start getting messages on the second round trip. This is part of Noise Pipes (or whatever they're called) and TLS 1.3
c) do reasonable-ish things to work with MTU. In the old days, FreeBSD reflected the client MSS back to it, which helps when there's a tunnel like PPPoE and it only modifies outgoing SYNs and not incoming SYN+ACKs. Linux never did that, and AFAIK FreeBSD took it out. Behind Facebook infrastructure, they just hardcode the MSS for, I think, a 1480 MTU (you can/should check with tcpdump). I did some limited testing, and really the best results come from monitoring for /24s with bad behavior (it's pretty easy, if you look for it --- you never got any large packets and packet gaps are a multiple of MSS minus space for TCP timestamps) and then sending back client MSS minus 20 to those; you could also just always send back client MSS minus 20. I think Android finally started doing PMTUD blackhole detection stuff a couple years back; Apple has been doing it really well for longer. Path MTU Discovery is still an issue, and anything you can do to make it happier is good.
d) connect in the background to exchange messages when possible. Don't post notifications unless the message content is on the device. Don't be one of those apps that can only load messages from the network when the app is in the foreground, because the user might not have connectivity then
e) prioritize messages over telemetry. Don't measure everything, only measure things when you know what you'll do with the numbers. Everybody hates telemetry, but it can be super useful as a developer. But if you've got giant telemetry packs to upload, that's bad by itself, and if you do them before you get messages in and out, you're failing the user.
f) pay attention to how big things are on the wire. Not everything needs to get shrunk as much as possible, but login needs to be very tight, and message sending should be too. IMHO, http and json and xml are too bulky for those, but are ok for multimedia because the payload is big so framing doesn't matter as much, and they're ok for low volume services because they're low volume.
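For (a) above, the fallback-IP idea is roughly this; a minimal Python sketch where the hostname and addresses are made up, and real deployments would rotate and verify the hardcoded list rather than ship it as a constant:

    import socket

    # Hypothetical fallback list shipped with the client (documentation-range IPs).
    FALLBACK_IPS = ["203.0.113.10", "203.0.113.11"]
    HOST, PORT, TIMEOUT = "chat.example.com", 443, 5.0

    def connect_with_fallback() -> socket.socket:
        candidates = []
        try:
            # Normal path: resolve and prefer DNS answers.
            infos = socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP)
            candidates = [info[4][0] for info in infos]
        except socket.gaierror:
            pass  # DNS is down or broken; fall through to hardcoded addresses.
        for ip in candidates + FALLBACK_IPS:
            try:
                return socket.create_connection((ip, PORT), timeout=TIMEOUT)
            except OSError:
                continue  # try the next address
        raise ConnectionError("all candidate addresses failed")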
I had the same issue, and I just caught up over the weekend. Three books I can recommend to get up to speed:
- NumPy basics pdf - first 2-3 chapters
- Deep Learning with PyTorch by Voigt Godoy [2] - first 2-3 chapters if you have experience with neural networks, or the whole of it if you don't.
With the above, you will get the basics needed to understand the architecture of the models and everything else from this book about transformers:
> does anyone know of ML directions that could add any kind of factual confidence level to ChatGPT and similar?
Yes. It's a very active area of research. For example:
Discovering Latent Knowledge in Language Models Without Supervision (https://arxiv.org/abs/2212.03827) shows an unsupervised approach for probing an LLM to discover things it thinks are facts
Language Models as Knowledge Bases? (https://aclanthology.org/D19-1250.pdf) is some slightly older work exploring how well LLMs store factual information itself.
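For a flavor of what the first paper's unsupervised probing (CCS) does, here is a toy sketch; random tensors stand in for the model's hidden states on a statement and its negation, and the normalization details from the paper are omitted:

    import torch
    import torch.nn as nn

    # Learn a probe whose probabilities for a statement and its negation are
    # consistent (p_pos ~ 1 - p_neg) and confident, with no labels involved.
    d = 768
    h_pos, h_neg = torch.randn(256, d), torch.randn(256, d)  # stand-in hidden states

    probe = nn.Sequential(nn.Linear(d, 1), nn.Sigmoid())
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

    for _ in range(100):
        p_pos, p_neg = probe(h_pos), probe(h_neg)
        consistency = ((p_pos - (1 - p_neg)) ** 2).mean()   # answers should agree
        confidence = (torch.min(p_pos, p_neg) ** 2).mean()  # and not sit at 0.5
        loss = consistency + confidence
        opt.zero_grad()
        loss.backward()
        opt.step()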