troelsSteegin's comments | Hacker News


A big assumption with this change is that the "Modular Open Systems Approach" (MOSA) [0] [1] will be adequate for integrating new systems developed and acquired under this "fast track". MOSA appears to be about six years old as a mandate [2] and is something that big contractors - SAIC, BAI, Palantir [3] - talk about. But six years seems brand new in this sector. I'd be curious to see whether LLMs have leverage for MOSA software system integrations.

[0] https://breakingdefense.com/tag/modular-open-systems-archite...

[1] https://www.dsp.dla.mil/Programs/MOSA/

[2] https://www.govinfo.gov/app/details/USCODE-2016-title10/USCO...

[3] https://blog.palantir.com/implementing-mosa-with-software-de...


This was a good read. I was struck by the quantity of nuanced and applied know-how it took to build SmolLM3. I am curious about the rough cost it took to engineer and train SmolLM3 - at ~400 GPUs for at least a month and, based on the set of book co-authors, 12 engineers for at least three months. Is $3-5M a fair ballpark number? The complementary question is how much experience, on average, the team members had doing ML and LLM training at scale before SmolLM3. The book is "up" on recent research, so I surmise a PhD-centric team, each member with multiple systems built. This is not commodity skill. What the book suggests to me is that an LLM applications startup would do best to focus on understanding the scope and know-how needed to start from post-training.
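
As a back-of-envelope check on that ballpark (all unit costs below are my own invented assumptions, not numbers from the book):

    # rough cost sketch -- hypothetical rates: ~$2 per GPU-hour rented,
    # ~$50k/month fully loaded per engineer
    gpu_cost = 400 * 24 * 30 * 2.0     # 400 GPUs for one month    ~ $576k
    people_cost = 12 * 3 * 50_000      # 12 engineers for 3 months ~ $1.8M
    print(f"${(gpu_cost + people_cost) / 1e6:.1f}M")  # ~ $2.4M baseline

With failed runs, ablations, and overhead on top of that baseline, $3-5M looks plausible.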


One could look at feed-forward decision trees as representing the idea that preferences are latent and immutable, and that the optimal branch is the truest expression of innate preferences. And one could look at backpropagation as adjusting preferences to accommodate situational constraints -- or as learning to want what is good for you, where what is good for you is defined by some external or imposed metric. Tragically, Plath was unable to "backpropagate". Was attention all she needed?


> Arbitrary signals are essentially primate dominance tools.

What should I read to better understand this claim?

> LLMs are the equivalent of circles act using language.

Circled apes?


Basil Bernstein's 1973 studies comparing English and math comprehension differences in class, plus:

- Halliday, Language and Society (Collected Works Vol. 10)

- Maestripieri, Primate Psychology

- Tuttle, Apes and Evolution

- Deacon, The Symbolic Species

- MacNeilage, The Origin of Speech

That's the tip of the iceberg

edit: Because CS idealizes language and doesn't understand its parasitic or viral aspects, it can't access them. Language is more of a black box than the code that models it. I can't understand how CS assumed this would ever work. It makes no sense to exclude the very thing that language is and then automate it.


They should at least try to sell it. Fine line, though, between leverage and extortion.


Are "adversaries" broadly used in algorithm design? I've not seen that before. I'm used to edge cases and trying to break things, but an "adversary", especially white box, seems different.


Yes. There is a whole sector of algorithm design called online algorithms, dedicated to studying algorithms that must make decisions without complete information. A common analysis technique proves the "competitive ratio" of an algorithm by analyzing its worst-case performance against an adversary. In fact, this article is an analysis of one particular online problem. For a simple introduction, check out "the ski rental problem." More complex applications include things like task scheduling and gradient descent.
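
A minimal sketch of ski rental, assuming rent costs 1/day and buying costs B (both numbers illustrative): the break-even strategy rents until it has nearly paid the purchase price, then buys, and no adversary's choice of season length can make it pay more than about twice the offline optimum.

    def online_cost(ski_days, B):
        """Break-even strategy: rent for the first B-1 days, buy on day B."""
        if ski_days < B:
            return ski_days          # season ended early; we only ever rented
        return (B - 1) + B           # B-1 days of rent, then one purchase

    def offline_cost(ski_days, B):
        """Clairvoyant optimum: rent for a short season, buy day 1 for a long one."""
        return min(ski_days, B)

    B = 10
    ratios = [online_cost(d, B) / offline_cost(d, B) for d in range(1, 1000)]
    print(max(ratios))  # (2B - 1) / B = 1.9, i.e. < 2 whatever the adversary picks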

Adjacent to this topic is algorithms for two-player games, like minimax, which depend on imagining an adversary that plays perfect counter moves.
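
As a toy illustration of that framing (a made-up subtraction game, not anything from the article): minimax assumes the opponent answers every move perfectly, which here makes every multiple of 4 a lost position for the player to move.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def value(stones, maximizing):
        """Minimax value of a toy subtraction game: players alternately take
        1-3 stones; whoever takes the last stone wins (+1 for max, -1 for min)."""
        if stones == 0:
            return -1 if maximizing else 1   # the side to move has already lost
        outcomes = [value(stones - take, not maximizing)
                    for take in (1, 2, 3) if take <= stones]
        return max(outcomes) if maximizing else min(outcomes)

    for n in range(1, 13):
        print(n, value(n, True))   # multiples of 4 lose against perfect play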

In a similar vein, in ML there is a model called the generative adversarial network (GAN), in which two networks (a generator and a discriminator) play a minimax game against each other, improving the capability of both models at once.
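
A minimal GAN sketch in PyTorch to make the minimax framing concrete (the architecture, target distribution, and hyperparameters are all illustrative choices, not anything canonical): the discriminator minimizes its classification loss while the generator maximizes the discriminator's error on fakes.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1),
                      nn.Sigmoid())                                   # discriminator
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(3000):
        real = 1.25 * torch.randn(64, 1) + 4.0   # target distribution: N(4, 1.25^2)
        fake = G(torch.randn(64, 8))

        # discriminator's half of the minimax game: D(real) -> 1, D(fake) -> 0
        opt_d.zero_grad()
        d_loss = (bce(D(real), torch.ones(64, 1)) +
                  bce(D(fake.detach()), torch.zeros(64, 1)))
        d_loss.backward()
        opt_d.step()

        # generator's half: make the just-updated discriminator call fakes real
        opt_g.zero_grad()
        g_loss = bce(D(fake), torch.ones(64, 1))
        g_loss.backward()
        opt_g.step()

    with torch.no_grad():
        out = G(torch.randn(10000, 8))
    print(out.mean().item(), out.std().item())  # should drift toward ~4 and ~1.25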


They are certainly used in anything cryptographic.

Here is a 2011 article about DoS attacks against web apps enabled by hash-table-based dicts: https://www.securityweek.com/hash-table-collision-attacks-co...

djb has long advocated “crit-bit trees”, i.e. tries: https://cr.yp.to/critbit.html
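
To see that attack class in miniature, here is a toy sketch (deliberately naive table and hash, invented key counts; the real attacks targeted the language runtimes' own hash functions): once every key lands in one bucket, n inserts degrade from roughly O(n) total to roughly O(n^2).

    import time
    from itertools import combinations, islice

    class ChainedDict:
        """Naive chained hash table with a fixed, publicly known hash."""
        def __init__(self, nbuckets=1024):
            self.buckets = [[] for _ in range(nbuckets)]

        def _hash(self, key):
            return sum(map(ord, key)) % len(self.buckets)  # trivially attackable

        def put(self, key, value):
            bucket = self.buckets[self._hash(key)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))

    def insert_all(keys):
        table, t0 = ChainedDict(), time.perf_counter()
        for k in keys:
            table.put(k, 0)
        return time.perf_counter() - t0

    n = 2000
    honest = [f"key{i}" for i in range(n)]
    # any two strings with the same character multiset collide under a
    # sum-of-ords hash: fix ten 'a's and ten 'c's and vary only their order
    evil = []
    for pos in islice(combinations(range(20), 10), n):
        chars = ["a"] * 20
        for p in pos:
            chars[p] = "c"
        evil.append("".join(chars))

    print(f"honest: {insert_all(honest):.3f}s, colliding: {insert_all(evil):.3f}s")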


It really depends on the particular group of algorithms. I'm only considering non-cryptographic algorithms here.

As a general rule, any algorithm that involves a hash or a random/arbitrary choice has historically been based on "assume no adversary" and even now it has only advanced to "assume an incompetent adversary".

By contrast, most tree-adjacent algorithms have always been vigilant against competent adversaries.


Really??

Quicksort, mergesort, and heapsort are commonly analyzed in terms of worst-case / adversarial inputs.

I know that binary trees (especially red-black trees, AVL trees, and other self-balancing trees) have been studied extensively against adversaries picking the worst-case scenario.

And finally, error-correcting coding schemes / Hamming distances and other data-reliability mechanisms (e.g., CRC32 checks) have proofs based on worst-case adversarial bounds.

-------

If anything, I'm struggling to think of a case where adversarial / worst-case performance is NOT analyzed. In many cases, worst-case bounds are easier to prove than average-case bounds... so I'd assume most people start with worst-case analysis before moving to average-case analysis.
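
For a concrete version of the quicksort point, a toy sketch (my own illustration, with a deliberately naive first-element pivot): an already-sorted input makes every partition maximally lopsided, so comparisons grow roughly as n^2/2 instead of n log n.

    import random
    import sys

    def quicksort(xs, counter):
        """Naive quicksort, first element as pivot; counter[0] tallies comparisons."""
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        lo, hi = [], []
        for x in rest:
            counter[0] += 1                       # one pivot comparison per element
            (lo if x < pivot else hi).append(x)
        return quicksort(lo, counter) + [pivot] + quicksort(hi, counter)

    sys.setrecursionlimit(10000)                  # the adversarial case recurses ~n deep
    n = 2000
    for label, data in [("random", random.sample(range(n), n)),
                        ("sorted (adversarial)", list(range(n)))]:
        counter = [0]
        quicksort(data, counter)
        print(f"{label}: {counter[0]} comparisons")   # ~n log2 n vs ~n^2/2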


I think there's a distinction between worst-case and adversarial behavior.

For some types of problems, identifying worst-case behavior is straightforward. For example, in a hash table lookup the worst case is when all keys hash to the same value. To me, it seems like overkill to think in terms of an intelligent adversary in that case.

But in the problem described here, the worst case is harder to construct, especially while exploring the solution space, given that slight tweaks to the solution can significantly change the nature of the worst case. Thinking of it as adversarial implies thinking in terms of algorithms that dynamically produce the worst-case rather than trying to just identify a static worst-case that is specific to one solution. I can imagine that approach significantly speeding up the search for more optimal solutions.
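
That dynamic view has a classic instantiation: McIlroy's "A Killer Adversary for Quicksort" (1999) decides comparison results lazily, producing the worst case on the fly for whatever pivots the algorithm happens to choose. A simplified sketch of the idea (my own condensation; against this naive first-element pivot a sorted input would already suffice, but the lazy construction is what also defeats cleverer pivot rules, since values are invented only as the algorithm looks at them):

    import sys

    class KillerAdversary:
        """Decides comparison results lazily: every element starts as 'gas'
        (indistinguishably large) and is frozen to a concrete small value only
        when the sort forces a decision, keeping partitions lopsided
        (after M. D. McIlroy, "A Killer Adversary for Quicksort", 1999)."""
        def __init__(self, n):
            self.gas = n                # sentinel bigger than any frozen value
            self.val = [self.gas] * n   # current value of element id i
            self.nsolid = 0             # next concrete value to hand out
            self.candidate = 0          # gas element last compared with a solid
            self.ncmp = 0

        def compare(self, x, y):
            self.ncmp += 1
            if self.val[x] == self.gas and self.val[y] == self.gas:
                frozen = x if x == self.candidate else y
                self.val[frozen] = self.nsolid   # freeze one of the pair
                self.nsolid += 1
            if self.val[x] == self.gas:
                self.candidate = x
            elif self.val[y] == self.gas:
                self.candidate = y
            return self.val[x] - self.val[y]

    def quicksort(ids, cmp):
        """First-element-pivot quicksort over opaque element ids."""
        if len(ids) <= 1:
            return ids
        pivot, rest = ids[0], ids[1:]
        lo, hi = [], []
        for i in rest:
            (lo if cmp(i, pivot) < 0 else hi).append(i)
        return quicksort(lo, cmp) + [pivot] + quicksort(hi, cmp)

    sys.setrecursionlimit(10000)
    n = 1000
    adversary = KillerAdversary(n)
    quicksort(list(range(n)), adversary.compare)
    print(f"{adversary.ncmp} comparisons for n={n} (n^2/2 = {n * n // 2})")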


> Thinking of it as adversarial implies thinking in terms of algorithms that dynamically produce the worst-case rather than trying to just identify a static worst-case that is specific to one solution.

I think your statement makes sense for, say, Quicksort or simple binary trees, where the worst-case scenario is a "simple" reversed list (e.g., sorting [5 4 3 2 1] into [1 2 3 4 5]).

The worst-case insertion into an AVL-balanced tree, however, is a "Fibonacci Tree". AVL trees have a strange property where sorted lists like [1 2 3 4 5 6 7] or [7 6 5 4 3 2 1] actually lead to optimal balancing. The worst-case insertion sequence for an AVL tree is something like [1 2 3 4 5 1.5 6] (1.5 to prevent the far-left subtree from being perfectly balanced, and then 6 to further unbalance the far-right branches).

Some algorithms have very non-intuitive worst-case scenarios.
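
A quick sketch of why those sparsest "Fibonacci trees" are the AVL worst case (my own illustration, using the standard recurrence rather than the parent's insertion sequence): the minimal AVL tree of height h is a root over the minimal trees of heights h-1 and h-2, which pushes the height toward the classic ~1.44 * log2(n) bound.

    import math
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def min_avl_nodes(h):
        """Fewest nodes in an AVL tree of height h: a root whose subtrees are
        the sparsest AVL trees of heights h-1 and h-2 (a 'Fibonacci tree')."""
        if h < 0:
            return 0   # empty tree (height -1 by convention)
        if h == 0:
            return 1   # single node
        return 1 + min_avl_nodes(h - 1) + min_avl_nodes(h - 2)

    for h in (5, 10, 20, 30):
        n = min_avl_nodes(h)
        print(f"height {h:>2} needs only {n:>8} nodes; "
              f"h / log2(n) = {h / math.log2(n):.3f}")
    # the ratio climbs toward ~1.44, the AVL worst-case height bound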


> I think there's a distinction between worst-case and adversarial behavior.

I think _technically_ there is no difference - it does not matter whether the worst-case behavior is triggered by an "adversary" or by chance.

It _does_ give a different mental model though.


I enjoyed a fast read of this and will re-read it. Following from the idea of making system-scoped information available to individual agents, I think a general model needs to include some sense of the information-processing capacity of agents - like an attention or analysis budget. That includes the ability to recognize higher-order signal as first-order relevant. Information asymmetries are not so much about information as about interpretation.


Thanks for reading, and thanks for the feedback!


What's missing from this is the "before and after" - how this quarter's class experience was different from previous quarters without the AI tool emphasis.


The very first thing you have to learn about original research is the basis of the experimental scientific method, of the idea of empiricism and improvement through reason, observation, and iterative comparative testing. It is a little bit shocking when you encounter the broad swath of the population that has not internalized this.


Local to south Jersey it's "ruckers".


I’m from south Jersey and have never heard of this “ruckers”. Is it near Ouaisné?


I was about to call fake on this -- Americans from south Jersey are largely unfamiliar with the present perfect and would not say "[I] have never heard of" but "[I] never heard of" instead.

But it turns out this grammatical cue is an effective way to discover that the comment is not about an American south Jersey but a British one.


It’s an Albany expression.

