similar to RWKV7’s new (sub-quadratic) attention mechanism, which models key–value pairs as v ≈ kS’ and performs an in-context descent on ||v - kS’||^2/2 (where the state matrix S is one attention head’s state), explained in more detail by the author here: https://raw.githubusercontent.com/BlinkDL/RWKV-LM/main/RWKV-...
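A toy sketch of the idea (not the actual RWKV7 code; dimensions and the step size eta are invented for illustration, and the state is written as a plain matrix S): the gradient of ||v - kS||^2/2 with respect to S is the rank-1 matrix kᵀ(kS - v), so each descent step nudges S so that the key k retrieves the value v.

```python
# Toy in-context descent on ||v - kS||^2 / 2 for one head's state S.
# All sizes and the learning rate are made up for illustration.

def matvec_left(k, S):
    # row-vector times matrix: (kS)_j = sum_i k_i * S[i][j]
    return [sum(k[i] * S[i][j] for i in range(len(k)))
            for j in range(len(S[0]))]

def descent_step(S, k, v, eta=0.1):
    # gradient of ||v - kS||^2 / 2 w.r.t. S is k^T (kS - v): a rank-1 update
    err = [p - t for p, t in zip(matvec_left(k, S), v)]
    return [[S[i][j] - eta * k[i] * err[j] for j in range(len(err))]
            for i in range(len(S))]

# repeated steps drive kS toward v
S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.5]
v = [2.0, -1.0]
for _ in range(200):
    S = descent_step(S, k, v)
print(matvec_left(k, S))  # approaches [2.0, -1.0]
```

Each step contracts the error by a factor of (1 - eta·k·k), so with a small enough step the state converges to a matrix for which k exactly retrieves v.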
looks like a nice overview. i’ve implemented neural ODEs in JAX for low-dimensional problems and it works well, but I keep looking for a good, fast, CPU-first implementation suited to models that fit in cache and don’t require a GPU or the big Torch/TF machinery.
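For what a dependency-free, CPU-first core might look like, here is a minimal fixed-step classical RK4 integrator in plain Python (a sketch, not a full neural ODE library; the function names and step count are made up):

```python
import math

def rk4_step(f, t, y, h):
    # one classical Runge-Kutta 4 step for dy/dt = f(t, y)
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def solve(f, t0, t1, y0, n=100):
    # integrate from t0 to t1 in n equal steps
    h = (t1 - t0) / n
    t, y = t0, list(y0)
    for _ in range(n):
        y = rk4_step(f, t, y, h)
        t += h
    return y

# exponential decay dy/dt = -y, exact solution y(t) = e^{-t}
y = solve(lambda t, y: [-yi for yi in y], 0.0, 1.0, [1.0])
print(y[0])  # close to math.exp(-1)
```

A real solver adds adaptive step sizes and error control, but for small, cache-resident models this fixed-step loop is already fast on a CPU.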
Anecdotally, I used diffrax (and equinox) throughout last year after jumping between a few differential equation solvers in Python, for a project based on Dynamic Field Theory [1]. I only scratched the surface, but so far, it's been a pleasure to use, and it's quite fast. It also introduced me to equinox [2], by the same author, which I'm using to get the JAX-friendly equivalent of dataclasses.
`vmap`-able differential equation solving is really cool.
A classic NN takes a vector of data through layers to make a prediction. Backprop adjusts the network weights until the predictions are right. These weights form a vector, and training changes this vector until it hits values that mean "trained network".
A Neural ODE reframes this: instead of focusing on the weights themselves, focus on how they change. It treats training as finding a path from the untrained to the trained state. At each step it uses an ODE solver to compute the next state, continuing for N steps until it reaches values matching the training data. That gives you the solution for the trained network.
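One way to read that framing as a concrete toy (everything here, the model, data, and step size, is invented for illustration): treat training as gradient flow, an ODE dw/dt = -dL/dw on the weight vector, and integrate it with Euler steps until the weights reach the trained values.

```python
# Toy "training as an ODE on the weights": a 1-parameter model y = w * x
# with squared loss. Gradient flow dw/dt = -dL/dw is integrated with
# Euler steps; w flows from its untrained value to the solution w = 2.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x

def grad(w):
    # dL/dw for L(w) = 0.5 * sum (w*x - y)^2
    return sum((w * x - y) * x for x, y in data)

w, dt = 0.0, 0.01  # start untrained; dt is the Euler step size
for _ in range(1000):
    w -= dt * grad(w)  # one Euler step along dw/dt = -grad(w)
print(w)  # approaches 2.0
```

Swapping the Euler step for a higher-order ODE solver is exactly where the "use ODE solvers to compute the next state" idea comes in.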
that’s not what the story says. in any case, the point is to explain, in terms of dualistic if-then logic, that the "if" (you practice now) and the "then" (you will wake up) are a single non-dual thing. but to communicate it in terms which make sense to the dual, if-then mind, one needs to use dualistic language.
Very far. I'm a fullstack web developer. Independent game dev has been my hobby for 10 years, and game dev is what got me interested in tech when I was a kid. I started teaching myself at a young age with qbasic.
Building interfaces and menu systems etc feels very similar to frontend web development. Much of the domain knowledge transfers when it comes to programming. Mainly my skills transfer to gameplay development and debugging. I make small games and only just recently released my first commercial game, which has only sold two copies.
i wonder if there are any semi automated approaches to finding outliers or “things worth investigating” in these traces, or is it just eyeballs all the way down?
This is possible with semi-automatic detection of anomalies over time for some preset of fields used for grouping the events (aka dimensions) and another preset of fields used in stats calculations (aka metrics). In the general case this is a hard task to solve, since it is impossible to check for anomalies across all the possible combinations of dimensions and metrics for wide events with hundreds of fields.
It is further complicated by the possibility of applying various filters to the events before and after the stats are calculated.
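A minimal sketch of the preset approach described above (field names like "service" and "latency_ms" and the 3-sigma threshold are invented): pick one dimension and one metric, group the events, and flag groups whose metric mean deviates far from the overall mean.

```python
# Semi-automatic outlier detection for one (dimension, metric) preset.
# Events are plain dicts; all field names here are made up.

from collections import defaultdict
from statistics import mean, pstdev

events = (
    [{"service": "api", "latency_ms": 100 + i % 5} for i in range(50)]
    + [{"service": "db", "latency_ms": 20 + i % 3} for i in range(50)]
    + [{"service": "auth", "latency_ms": 900} for _ in range(5)]  # outlier group
)

def anomalous_groups(events, dimension, metric, threshold=3.0):
    # group the metric values by the chosen dimension
    groups = defaultdict(list)
    for e in events:
        groups[e[dimension]].append(e[metric])
    # flag groups whose mean is more than `threshold` stddevs from the overall mean
    overall = [e[metric] for e in events]
    mu, sigma = mean(overall), pstdev(overall)
    return [g for g, vals in groups.items()
            if abs(mean(vals) - mu) > threshold * sigma]

print(anomalous_groups(events, "service", "latency_ms"))  # ['auth']
```

The combinatorial problem mentioned above is visible even here: with hundreds of fields, running this for every (dimension, metric) pair, let alone with pre- and post-aggregation filters, blows up quickly, which is why the dimensions and metrics are usually preset.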
having worked on whole-brain modeling for the last 15 years, and on european infra for supporting this kind of research, this is a terrible buzzword salad. the pdf is on par with a typical master's project.
yeah, i did my undergrad research on biological neuron emulation, and most of the research in the area is hilariously moronic: things done just to pump "research" out and get students their pieces of paper.
The pigeon experiment is a great one to learn from not just about programming or software, but about life in general. Where are you getting your next dopamine hit? Is it random? Maybe that’s where our idiosyncrasies come from.
and i tried to unpack it a bit here https://wdmn.fr/rank-1-take-on-rwkv7s-in-context-learning/