
This paper seems rather unfocused, explaining their architecture three times with slight variations while managing to omit crucial details like how exactly they compute gradients for their "External Retrieval Memory."

Also, the section on DeepSeek is really weird: "While the precise architectural details of DeepSeek LLM are still emerging, early discussions suggest that it relies on an extended Transformer backbone or a "hybrid" approach that likely incorporates some form of attention-based mechanism, potentially at specific layers or across chunk boundaries, to facilitate information flow across large contexts." It makes it sound like a mystery, even though multiple papers have been published on it (they cite the R1 one), so there's really no need to guess whether attention is involved.

Overall I'm not convinced the authors know what they're doing.



Would you say they aren’t paying attention?


I think it's fair to say they are explicitly avoiding attention.


Hate to be that guy, but this screams LLM-generated to me. Between the titles, the vague explanations, the vague concepts, and the overall ratio of fluff to data, I'd bet good money that this was generated with an LLM.

It's not inherently bad to use an LLM for consistency, language and overall sprucing up, but this is taking it a bit too far. It seems like they've prompted it to explain some notes, but it's unclear how well it did, since the notes themselves (i.e. data, experiments, etc.) are missing. And it seems poorly prompted, in that it consists of lots of fluff paragraphs, devoid of core knowledge, going round and round explaining the same concepts with different words.

In the end, the responsibility for the end product is always on the submitter. This whole paper could have been a prompt, and it's worrying that this is accepted at such a prestigious school.



