Show HN: Mamba-Chat – A Chat LLM Based on State Space Models (github.com/havenhq)
9 points by justusmattern on Dec 7, 2023
Hey everyone! Many of you might have come across the Mamba paper a few days ago, which introduced an LLM based on a state space model architecture. Mamba is quite appealing because its compute scales subquadratically (in fact, linearly) with input length, which makes it way more efficient than transformer models on long sequences: https://github.com/state-spaces/mamba
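To give a feel for why the cost is linear in sequence length: a linear state space layer processes the sequence with a single recurrent pass, one constant-cost update per token, instead of attending over all previous tokens. Here's a toy NumPy sketch of that recurrence (purely didactic, with made-up matrices, not Mamba's selective, hardware-aware kernel):

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Toy linear state space recurrence, one O(L) pass over the sequence:

        h_t = A @ h_{t-1} + B @ x_t
        y_t = C @ h_t

    Each step costs a fixed amount, so total work grows linearly with
    sequence length L (vs. quadratic for full attention).
    """
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for x_t in x:              # L steps, constant work per step
        h = A @ h + B @ x_t    # update hidden state
        ys.append(C @ h)       # read out an output per token
    return np.array(ys)

# Example: 1-dim input, 2-dim hidden state, sequence length L = 16
A = np.array([[0.9, 0.0], [0.1, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
x = np.random.randn(16, 1)
y = ssm_scan(A, B, C, x)
print(y.shape)  # (16, 1) — one output per input token
```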

I got really excited about the paper, so I decided to fine-tune the model on a chat dataset. It turns out that this actually worked quite well! The model holds up surprisingly well in casual chat, which honestly surprised me given that it only has 2.8B parameters and the base model was only trained on the Pile. It's exciting that we might have a serious candidate for an architecture that could dethrone transformers.
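For anyone curious what chat fine-tuning of a base causal LM involves at the data level: each conversation gets flattened into a single training string with role markers, and the model learns to continue after the assistant tag. The tags below are illustrative placeholders, not necessarily the template mamba-chat uses — check the repo for the real format:

```python
# Hypothetical sketch of turning a chat transcript into one training
# string for causal-LM fine-tuning. Role tags here are made up for
# illustration; the actual mamba-chat template may differ.

def format_chat(messages):
    """Flatten [(role, text), ...] into a single prompt string."""
    parts = []
    for role, text in messages:
        parts.append(f"<|{role}|>\n{text}")
    parts.append("<|assistant|>\n")  # open tag the model completes
    return "\n".join(parts)

example = [("user", "What is a state space model?")]
print(format_chat(example))
```

At fine-tuning time you'd tokenize strings like this and train with the usual next-token loss; at inference time you'd feed the same format and sample the assistant's continuation.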

You can find both my fine-tuning and inference code here: https://github.com/havenhq/mamba-chat



