This paper shows that polynomials exhibit most features of deep neural nets, including double descent and the ability to memorize an entire dataset.
It connects the dots there: the polynomials are regularized to be as simple as possible, and the author argues that the hundreds of billions of parameters in modern neural networks act as regularizers too, attenuating decisions that are "too risky."
I really enjoyed that paper, a gem that sheds light everywhere.
If I understand correctly, they approximate the language of inputs of a function to discover minimal (in some sense, e.g. "shortest description length") inputs that violate the relations between the inputs and outputs of the function under scrutiny.
Since you mentioned "improvement of existing language," I'd like to point out that Haskell has green threads that are most probably lighter (1 KB stack) than goroutines (minimum stack size 2 KB).
Haskell also has software transactional memory, in which one can implement one's own channels (they are already implemented [1]) and atomically synchronize arbitrarily complex reading/sending patterns.
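To make that concrete, here's a minimal sketch of a channel built on STM. The names (`Chan`, `send`, `recv`, `recvBoth`) are my own; GHC's real `TChan` is more sophisticated, this just illustrates how STM composes:

```haskell
import Control.Concurrent.STM

-- A toy channel built from a single TVar holding a list.
-- (Illustrative only; the stm package's TChan is the real thing.)
newtype Chan a = Chan (TVar [a])

newChan' :: STM (Chan a)
newChan' = Chan <$> newTVar []

send :: Chan a -> a -> STM ()
send (Chan v) x = modifyTVar' v (++ [x])

recv :: Chan a -> STM a
recv (Chan v) = do
  xs <- readTVar v
  case xs of
    []      -> retry                     -- block until a value arrives
    (x : t) -> writeTVar v t >> pure x

-- The payoff: 'atomically' composes reads across channels into one
-- transaction, so you receive from both channels or from neither.
recvBoth :: Chan a -> Chan b -> STM (a, b)
recvBoth c d = (,) <$> recv c <*> recv d
```

The `recvBoth` transaction is the "arbitrarily complex patterns" point: with locks you'd have to worry about ordering and deadlock; with STM the retry/commit machinery handles it.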
> Symbolic AI like SAT solvers and planners is not trying to learn from data and there's no context in which it has to "scale with huge data".
Actually, they do. Conflict-Driven Clause Learning (CDCL) learns new clauses from the conflicts it encounters while working on the data. The space of inputs they deal with is oftentimes on the order of the number of atoms in the Universe, and that is huge.
I'm pretty sure most "industrial scale" SAT solvers involve both deduction and heuristics to decide which deductions to make and which to keep. At a certain scale, the heuristics have to be adaptive, and then you have "induction".
I don't agree. The derivation of new clauses by resolution is well understood as deductive, and the choice of which clauses to keep doesn't change that.
Resolution can be used inductively, and also for abduction, but that's going into the weeds a bit - it's the subject of my PhD thesis. Let me know if you're in the mood for a proper diatribe :)
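For anyone following along, the deductive step both of you are referring to is tiny. Here's a toy propositional resolution step in Haskell, using DIMACS-style literals (my own sketch, not taken from any solver):

```haskell
import Data.List (delete, nub)

-- Literals as nonzero Ints, DIMACS style: 3 means x3, -3 means ¬x3.
type Lit    = Int
type Clause = [Lit]

-- Resolve two clauses on variable v: drop v from one clause and ¬v
-- from the other, then union the remaining literals. A CDCL solver's
-- learned clauses are derived by chains of exactly such steps along
-- the implication graph of a conflict.
resolveOn :: Int -> Clause -> Clause -> Clause
resolveOn v c1 c2 = nub (delete v c1 ++ delete (-v) c2)
```

The step itself is pure deduction; the "learning" debate above is about how the solver chooses which of these derived clauses to generate and keep.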
You know, this seems like yet another reason to allow HN users to direct message each other, or at least receive reply notifications. Dang, why can't we have nice things?
In my view, the "exactly" is crucial here. They do implicitly tell the model what to do by encoding it in the reward function:
> In Minecraft, the team used a protocol that gave Dreamer a 'plus one' reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.
This is also why I think the title of the article is slightly misleading.
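The milestone protocol described in that quote amounts to something like the following sketch (hypothetical names and types; I have no idea what DeepMind's actual reward code looks like):

```haskell
import qualified Data.Set as Set

-- Hypothetical sketch of a milestone reward: +1 the first time each
-- of the progressive steps is completed, 0 on repeats. The agent is
-- never told *how* to reach a milestone, but the curriculum is
-- clearly encoded in which milestones exist.
type Milestone = String

milestoneReward :: Set.Set Milestone -> Milestone -> (Double, Set.Set Milestone)
milestoneReward done m
  | m `Set.member` done = (0, done)                -- already rewarded once
  | otherwise           = (1, Set.insert m done)   -- first completion: +1
```

Which is the point: "no teaching" is true at the level of actions, but the reward shape is doing a lot of quiet teaching.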
But they don't learn that way at all; my 7yo learns by watching YouTubers. There's a whole network of people teaching each other the game - that's almost more fun than playing it alone.
> there is no probabilistic link between the words of a text and the gist of the content
Using an n-gram/skip-gram model over the long text, you can predict the probabilities of word pairs and/or word triples (effectively collocations [1]) in the summary.
Then, using (beam search and) an n-gram/skip-gram model of summaries, you can generate the text of a summary, guided by a preference for the word pairs/triples predicted in the first step.
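The first step can be sketched in a few lines. This is a toy bigram version (function names are mine, and a real skip-gram model would also count non-adjacent pairs): count word pairs in the source, then score a candidate summary by how often its pairs occur in the source.

```haskell
import qualified Data.Map.Strict as Map

-- Count adjacent word pairs (bigrams) in a text. A toy stand-in for
-- the n-gram/skip-gram model described above.
bigramCounts :: String -> Map.Map (String, String) Int
bigramCounts text =
  let ws = words text
  in Map.fromListWith (+) [ ((a, b), 1) | (a, b) <- zip ws (drop 1 ws) ]

-- Score a candidate summary by how many of its bigrams occur in the
-- source; a beam search over candidate summaries would prefer
-- higher-scoring ones.
scoreSummary :: Map.Map (String, String) Int -> String -> Int
scoreSummary counts summary =
  let ws = words summary
  in sum [ Map.findWithDefault 0 (a, b) counts | (a, b) <- zip ws (drop 1 ws) ]
```

That probabilistic link between source collocations and summary wording is exactly what the quoted claim denies.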
As if matter existed and then light was created at some moment. Redshift is explained by the interaction of light with the gravitational field: the more distant the source of light, the longer the light travels under the influence of gravity and the more redshifted it becomes.