Hacker News | marmaduke's comments

similar to RWKV7’s new (sub-quadratic) attention mechanism, which models values from keys as v ≈ kS’ and does an in-context descent on ||v - kS’||^2/2 (where the state matrix S is one attentional head), explained more by the author here https://raw.githubusercontent.com/BlinkDL/RWKV-LM/main/RWKV-...

and i tried to unpack it a bit here https://wdmn.fr/rank-1-take-on-rwkv7s-in-context-learning/
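the gist, per head, is roughly this (my own toy sketch of the idea, not BlinkDL’s code; it uses column vectors so the readout is v ≈ Sk, and the learning rate, dimensions, and the absence of decay/normalization are all simplifications):

    import jax.numpy as jnp

    def state_update(S, k, v, lr=0.5):
        # one in-context gradient step on L(S) = ||v - S k||^2 / 2
        # dL/dS = (S k - v) k^T, a rank-1 correction to the state matrix
        err = S @ k - v
        return S - lr * jnp.outer(err, k)

    d = 4
    S = jnp.zeros((d, d))
    k = jnp.array([1.0, 0.0, 0.0, 0.0])   # key
    v = jnp.array([0.0, 2.0, 0.0, 0.0])   # value to be stored
    for _ in range(8):
        S = state_update(S, k, v)
    print(S @ k)  # approaches v, i.e. the head has "learned" v ≈ Sk in context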


looks like a nice overview. i’ve implemented neural ODEs in JAX for low-dimensional problems and it works well, but I keep looking for a fast, CPU-first implementation suited to models that fit in cache and don’t require a GPU or the big Torch/TF machinery.



Anecdotally, I used diffrax (and equinox) throughout last year after jumping between a few differential equation solvers in Python, for a project based on Dynamic Field Theory [1]. I only scratched the surface, but so far, it's been a pleasure to use, and it's quite fast. It also introduced me to equinox [2], by the same author, which I'm using to get the JAX-friendly equivalent of dataclasses.

`vmap`-able differential equation solving is really cool (rough sketch below).

[1]: https://dynamicfieldtheory.org/ [2]: https://github.com/patrick-kidger/equinox
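Rough sketch of what that looks like, with a toy damped-oscillator vector field standing in for the actual model (solver choice, step size, and save points here are arbitrary, not a recommendation):

    import jax
    import jax.numpy as jnp
    import diffrax

    def vector_field(t, y, args):
        # damped harmonic oscillator: y = (position, velocity)
        pos, vel = y
        return jnp.array([vel, -pos - 0.1 * vel])

    def solve(y0):
        term = diffrax.ODETerm(vector_field)
        saveat = diffrax.SaveAt(ts=jnp.linspace(0.0, 10.0, 100))
        sol = diffrax.diffeqsolve(term, diffrax.Tsit5(),
                                  t0=0.0, t1=10.0, dt0=0.01,
                                  y0=y0, saveat=saveat)
        return sol.ys

    # one jax.vmap and the same solver runs a whole batch of initial conditions
    y0_batch = jnp.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
    trajectories = jax.vmap(solve)(y0_batch)   # shape (3, 100, 2)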


Thanks, that looks neat.

Kidger's thesis is wonderful https://arxiv.org/abs/2202.02435


no, wrote it by hand to go with my own Heun implementation, since it’s for stochastic delayed systems (rough sketch below).

jax is fun but as effective as i’d like for CPU
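(the Heun-plus-delay bit, sketched very roughly; the delayed-relaxation drift, additive noise, and buffer-based delay handling are placeholders for illustration, not the actual code:)

    import numpy as np

    def heun_sdde(drift, y0, dt, n_steps, delay_steps, sigma, rng):
        # buffer of past states so the drift can read y(t - tau)
        buf = [np.asarray(y0, dtype=float)] * (delay_steps + 1)
        out = [buf[-1]]
        for _ in range(n_steps):
            y, y_lag = buf[-1], buf[0]
            dW = rng.normal(scale=np.sqrt(dt), size=y.shape)
            f0 = drift(y, y_lag)
            y_pred = y + f0 * dt + sigma * dW              # Euler-Maruyama predictor
            f1 = drift(y_pred, y_lag)
            y_new = y + 0.5 * (f0 + f1) * dt + sigma * dW  # Heun correction of the drift
            buf = buf[1:] + [y_new]
            out.append(y_new)
        return np.array(out)

    # toy delayed, noisy relaxation: dy = (-y(t) + tanh(y(t - tau))) dt + sigma dW
    rng = np.random.default_rng(0)
    traj = heun_sdde(lambda y, y_lag: -y + np.tanh(y_lag),
                     y0=np.array([0.1]), dt=0.01, n_steps=1000,
                     delay_steps=50, sigma=0.05, rng=rng)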


Not as effective as I'd like?


ha, yeah, thanks.


How would you describe what a neural ODE is in the simplest possible terms? Let's say I know what an NN and a DE are :).


classic NN takes a vector of data through a stack of layers to make a prediction; each layer takes the current hidden state and nudges it a bit, and backprop adjusts the network weights till the predictions are right. You can picture the data tracing a path through the layers from input to output.

Neural ODE takes that picture to the limit: instead of a fixed number of discrete layers, the hidden state evolves continuously, dh/dt = f(h, t, θ), where f is a small neural network. To make a prediction you hand that ODE to a solver and integrate from the input state to the output state; the solver’s steps play the role of layers, and it can take as many or as few as accuracy demands. Training still adjusts θ by backpropagating through the solver (or via the adjoint method) till the integrated output matches the training data.
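In code it ends up surprisingly small. A minimal sketch with jax + diffrax (the tiny MLP vector field, the fixed [0, 1] time span, and the sizes are arbitrary choices for illustration, not anyone’s actual model):

    import jax
    import jax.numpy as jnp
    import diffrax

    def vf(params, t, h):
        # tiny MLP playing the role of f(h, t, theta)
        w1, b1, w2, b2 = params
        return w2 @ jnp.tanh(w1 @ h + b1) + b2

    def neural_ode(params, h0):
        term = diffrax.ODETerm(lambda t, h, args: vf(params, t, h))
        sol = diffrax.diffeqsolve(term, diffrax.Tsit5(),
                                  t0=0.0, t1=1.0, dt0=0.1, y0=h0)
        return sol.ys[-1]   # state at t1 plays the role of the last layer's output

    dim, hidden = 3, 16
    k1, k2 = jax.random.split(jax.random.PRNGKey(0))
    params = (0.1 * jax.random.normal(k1, (hidden, dim)), jnp.zeros(hidden),
              0.1 * jax.random.normal(k2, (dim, hidden)), jnp.zeros(dim))

    # gradients flow through the solver, so an ordinary training loop applies
    loss = lambda p, h0, target: jnp.sum((neural_ode(p, h0) - target) ** 2)
    grads = jax.grad(loss)(params, jnp.ones(dim), jnp.zeros(dim))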


Pretty cool approach, looking more into it, thank you!


i like how the contraction it’s and abbreviation T’is are anagrams


It is not T'is, it is Tis. No apostrophe.



I've only ever seen it spelled 'tis


It's a contraction of "it is" so " 'tis " is correct.


The apostrophe is in the wrong spot, but 'tis is the correct spelling, and the only one I've ever seen.


I didn’t look it up, but at first glance, it reminded me of discussions like this one

https://youtu.be/qnT48wO0UL0


that’s not what the story says. in any case, the point is to explain, in terms of dualistic if-then logic, that the if (you practice now) and the then (you will wake up) are a single non-dual thing. but to communicate it in terms which make sense to the dual, if-then mind, one needs to use dualistic language.


how far from your previous experience is the game work?


Very far. I'm a fullstack web developer. Independent game dev has been my hobby for 10 years, and game dev is what got me interested in tech when I was a kid. I started teaching myself at a young age with qbasic.

Building interfaces and menu systems etc feels very similar to frontend web development. Much of the domain knowledge transfers when it comes to programming. Mainly my skills transfer to gameplay development and debugging. I make small games and only just recently released my first commercial game, which has only sold two copies.


do you have a link for your game? how was the release process?


i wonder if there are any semi-automated approaches to finding outliers or “things worth investigating” in these traces, or is it just eyeballs all the way down?


This is possible via semi-automatic detection of anomalies over time for some preset of fields used for grouping the events (aka dimensions) and another preset of fields used in stats calculations (aka metrics). In the general case this is a hard task to solve, since it is impossible to check for anomalies across all the possible combinations of dimensions and metrics for wide events with hundreds of fields.

This is also complicated by the possibility of applying various filters to the events before and after the stats calculations.
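A toy version of the dimensions/metrics idea, to make it concrete (the field names, bucket size, and z-score threshold are made up, and real tools are much smarter about choosing which dimensions to scan):

    from collections import defaultdict
    from statistics import mean, stdev

    def bubble_up(events, dimension, metric, bucket_s=60, z_thresh=3.0):
        # group events by one dimension, aggregate one metric per time bucket
        series = defaultdict(lambda: defaultdict(list))
        for e in events:
            series[e[dimension]][int(e["ts"] // bucket_s)].append(e[metric])

        anomalies = []
        for group, buckets in series.items():
            means = {b: mean(vals) for b, vals in buckets.items()}
            if len(means) < 3:
                continue  # not enough history to call anything an outlier
            mu, sd = mean(means.values()), stdev(means.values())
            for b, m in means.items():
                if sd > 0 and abs(m - mu) / sd > z_thresh:
                    anomalies.append((group, b, m))
        return anomalies

    # e.g. flag (service, minute) pairs whose mean latency is a 3-sigma outlier:
    # bubble_up(events, dimension="service", metric="latency_ms")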


honeycomb "bubble up"


That seems like a good use case for AI: it's trivial to have it suggest some queries and test whether they give interesting results.


having worked on whole brain modeling for the last 15 years, and on european infra for supporting this kind of research: this is a terrible buzzword salad. the pdf is on par with a typical master’s project.



https://www.biorxiv.org/content/10.1101/2024.10.25.620245v1....

covers some of the recent perspectives on this modeling approach if you’re interested.


Awesome, made this worth it from my pov


yeah, i did my undergrad research on biological neuron emulation, and most of the research in the area is hilariously moronic stuff done just for pumping "research" out and getting students their pieces of paper.


HH is kinda the opposite of LIF on the abstraction spectrum.


I mean HH is an elaboration of the LIF with the addition of several equations for the various ion channels, but yeah I see what you mean.
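For concreteness, the LIF end of that spectrum fits in a few lines, one voltage equation plus a reset rule, versus Hodgkin-Huxley's voltage equation coupled to the m/h/n gating equations for the ion channels (parameters here are just illustrative):

    import numpy as np

    def lif(I, dt=0.1, tau=10.0, v_rest=-65.0, v_thresh=-50.0, v_reset=-65.0):
        v, spikes, trace = v_rest, [], []
        for t, i_t in enumerate(I):
            v += dt * (-(v - v_rest) + i_t) / tau   # leaky integration
            if v >= v_thresh:                        # fire and reset
                spikes.append(t * dt)
                v = v_reset
            trace.append(v)
        return np.array(trace), spikes

    trace, spikes = lif(I=np.full(1000, 20.0))   # constant drive -> regular spiking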


The pigeon experiment is a great one to learn from not just about programming or software, but about life in general. Where are you getting your next dopamine hit? Is it random? Maybe that’s where our idiosyncrasies come from.

