lmstudio is using a dark pattern I really hate. Don't put a GitHub logo on your webpage if your software is not source available. The link just takes you to some random config repos they have on GitHub. This is a poor choice, in my opinion.
Since I couldn't find it in your list, I'd like to plug my own macOS (and iOS) app: Private LLM. Unlike almost every other app in the space, it isn't based on llama.cpp (we use mlc-llm), and it doesn't use naively RTN-quantized models (we use OmniQuant). The app also has deep integrations with macOS and iOS (Shortcuts, Siri, macOS Services, etc.).
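For anyone wondering what "naive RTN" means here: round-to-nearest quantization scales each weight tensor onto an integer grid and rounds every weight independently, with no calibration data. A minimal sketch of per-tensor RTN, just to illustrate the baseline technique (not Private LLM's or OmniQuant's actual pipeline):

    import numpy as np

    def rtn_quantize(weights: np.ndarray, bits: int = 4):
        """Naive round-to-nearest (RTN) quantization of a weight tensor.

        Each weight is scaled onto the integer grid and rounded
        independently; no calibration data is used.
        """
        qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed
        scale = np.abs(weights).max() / qmax
        q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
        return q.astype(np.int8), scale     # dequantize with q * scale

    w = np.random.randn(4, 8).astype(np.float32)
    q, s = rtn_quantize(w)
    print(np.abs(w - q * s).max())          # per-tensor quantization error

OmniQuant improves on this baseline by learning clipping and scaling parameters from a small calibration set, which reduces the rounding error that plain RTN leaves behind.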
Incidentally, it currently runs the Mixtral 8x7B Instruct[2] and Mistral[3] models faster than any other macOS app. The comparison videos are against Ollama, but the result generalizes to almost every other macOS app I've seen, since they all use llama.cpp for inference. :)
nb: Mixtral 8x7B Instruct requires an Apple Silicon Mac with at least 32GB of RAM.
You can see ms/token in a tiny font at the top of the screen once the text generation completes, in both of the videos I'd linked to. Performance will vary by machine. On my 64GB M2 Max Mac Studio, I get ~47 tokens/s (21.06 ms/token) with Mistral Instruct v0.2 and ~33 tokens/s (30.14 ms/token) with Mixtral Instruct v0.1.
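If you want to convert between the two units, throughput is just the reciprocal of per-token latency:

    def tokens_per_second(ms_per_token: float) -> float:
        """Convert per-token latency (ms/token) into throughput (tokens/s)."""
        return 1000.0 / ms_per_token

    print(tokens_per_second(21.06))  # ~47.5 tokens/s (Mistral Instruct v0.2)
    print(tokens_per_second(30.14))  # ~33.2 tokens/s (Mixtral Instruct v0.1)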
I haven't run any specific low-level benchmarks lately, but chunked prefilling and the TVM auto-tuned Metal kernels from mlc-llm seemed to make a big difference the last time I checked. Also, compared to stock mlc-llm, I use a newer version of Metal (3.0) and have a few modifications that give models a slightly smaller memory and disk footprint, plus slightly faster execution. I can do that because, unlike the mlc-llm folks, I only care about compatibility with Apple platforms; their upstream project supports much more than that.
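To make "chunked prefilling" concrete: instead of pushing the whole prompt through the model in one giant batch, you feed it in fixed-size chunks, which bounds peak activation memory while still building the full KV cache before decoding starts. A toy sketch with a stand-in model (DummyModel, prefill, and CHUNK are hypothetical names for illustration, not the mlc-llm API):

    from typing import List

    CHUNK = 512  # tokens per prefill chunk; real systems tune this

    class DummyModel:
        """Stand-in for a real decoder stack; not the mlc-llm API."""
        def prefill(self, chunk: List[int], kv_cache: List[int]) -> int:
            kv_cache.extend(chunk)  # a real impl appends this chunk's K/V tensors
            return chunk[-1]        # stand-in for the last token's logits

    def chunked_prefill(model: DummyModel, prompt: List[int]):
        kv_cache: List[int] = []
        last = None
        for start in range(0, len(prompt), CHUNK):
            # Peak activation memory scales with CHUNK, not len(prompt);
            # each chunk attends over the cache built by earlier chunks.
            last = model.prefill(prompt[start:start + CHUNK], kv_cache)
        return last, kv_cache

    _, cache = chunked_prefill(DummyModel(), list(range(2000)))
    print(len(cache))  # 2000 -> the full prompt is cached, ready for decoding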
I am the author of the Msty app mentioned here. So humbled to see an app that is just about a month old, which I mostly wrote for my wife and some friends to begin with (who were getting overwhelmed by everything going on in the LLM world), at the top of your list. Thank you!
One bit of feedback: there's nowhere to put system messages. These can be much more influential than user prompts when it comes to shaping the tone and style of the response.
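For context, in the OpenAI-style chat format most local apps mirror, the system message is a separate role that frames every turn of the conversation. A minimal illustration of the payload shape (the model name and wording here are just placeholders):

    # The "system" role sets the tone and style once, for the whole chat,
    # independent of whatever the user types in each turn.
    payload = {
        "model": "mistral-7b-instruct",  # placeholder model name
        "messages": [
            {"role": "system",
             "content": "You are a terse assistant. Answer in plain prose, "
                        "no lists, no preamble."},
            {"role": "user", "content": "Explain quantization in two sentences."},
        ],
    }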
That's at the top of our list. It got pushed back because we want to support creating a character/profile (basically, select a model and apply some defaults, including a system prompt), but I feel like it was a mistake to wait for it. Regardless, it is getting added in the next release (the one after a release that is dropping in a day or two, which is a big release in itself).
1) What are the Mac system requirements? Does it need a specific OS version?
2) If you're privacy-first, many would feel a lot more comfortable if this were released as an app in the App Store so it would be sandboxed. This is important because it's not open source, so we have no idea what is happening in the background. Alternatively, open source it, which many here have requested.
It loads the LLM in the browser using WebGPU, so it works offline after the first load; it's also a PWA you can install. It should work on Chrome > 113 on desktop and Chrome > 121 on mobile.
Oh thanks! I didn't know there were quite a few local ChatGPT alternatives. I was wondering which users they are targeting: engineers or average users? I guess average users will likely choose ChatGPT and Perplexity over local apps for more recent knowledge of the world.
Hi. I'm the author of Msty, the 2nd app on the list above. You are right about average users likely choosing ChatGPT over local models. My wife was the first and the biggest user of my app. She is a software engineer by profession and training, but she prefers not to worry about the LLM world and just use it as a tool that makes her more productive.

As soon as she took Msty for a ride, I realized that some users, regardless of their background, care about online models. This actually led me to add support for online models right away. She really likes the parallel chat feature: she gives the same prompt to both Mistral and ChatGPT models, compares the outputs, and picks the best answer (or sometimes makes a hybrid of the two). She says that being able to compare multiple outputs like that is tremendously helpful. But that's the extent of local LLMs for her.

So far my effort has been to target a bit above the average user while keeping the app approachable for more advanced users as well.
Looks great, though the fact that you have to ignore your antivirus warning during installation, and the fact that it phones home (to insights.msty.app) right after launch despite the FAQ's line about not collecting any data, make me a little skittish.
I’m looking for a ChatGPT client alternative, i.e., some other client where I can use my own OpenAI API key.
Offline isn’t important to me; it’s just that $20 a month is a lot of money when I’d wager most months my usage would cost a lot less. However, I’d still want access to completion, DALL-E, etc.
Give it a try and see how you feel. "Yes, it will" would be a dishonest answer, to be completely honest, at least at this point. The app has been out for just about a month and I am still working on it. I would love for a user like you to give it a try and give me some feedback (please!). I am very active on our Discord if you want to get in touch (just mention your HN username and I will wave).
Thanks for the list. I tried Jan just now as it is both easy to use and open source. It is a bit buggy, I think, but the concept is ace: quick install, it tells you which models work on your machine, one-click download, and then a ChatGPT-style interface. Mistral 7B running on my low-spec laptop at 6 tokens/s while making some damn sense is amazing. The bugs show up at inference time; could be hardware issues, though, I'm not sure. YMMV.
Author of Msty here. Not yet, but I am already working on the design for it, to be added in the very near future. I am happy to chat more with you to understand your needs and what you are looking for in such apps. Please hop on the Discord if you don't mind :)
Some of my use cases would be summarizing a PDF report, analyzing JSON/CSV data, uploading a dev project so it can write a function or feature or build a UI, renaming image files, categorizing images, etc.
Yes, GPT4All has RAG-like features. Basically, you configure some directories and then have it load docs from whatever folders you have enabled for the model you're currently using. I haven't used it a ton, but I have used it to review long documents and it has worked well, depending on the model.
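Under the hood, that kind of directory-based RAG boils down to: chunk the local docs, score chunks against the question, and prepend the best ones to the prompt. A toy sketch of the retrieval step using plain word overlap (GPT4All's actual implementation uses real embeddings; the folder name and chunk size here are made up):

    from collections import Counter
    from pathlib import Path

    def score(query: str, chunk: str) -> int:
        """Toy relevance score: bag-of-words overlap (real RAG uses embeddings)."""
        q, c = Counter(query.lower().split()), Counter(chunk.lower().split())
        return sum(min(q[w], c[w]) for w in q)

    def retrieve(query: str, folder: str, k: int = 3, chunk_size: int = 500):
        """Split every .txt file under `folder` into chunks, return the top k."""
        chunks = []
        for path in Path(folder).glob("**/*.txt"):
            text = path.read_text(errors="ignore")
            chunks += [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

    question = "What were the key findings of the report?"
    context = "\n---\n".join(retrieve(question, "docs"))
    prompt = f"Use this context:\n{context}\n\nQuestion: {question}"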
I was testing apps like this, if anyone is interested:
Best / Easy to use:
- https://lmstudio.ai
- https://msty.app
- https://jan.ai
More complex / Unpolished UI:
- https://gpt4all.io
- https://pinokio.computer
- https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generat...
- https://github.com/LostRuins/koboldcpp
Misc:
- https://faraday.dev (AI Characters)
No UI / Command line (not for me):
- https://ollama.com
- https://privategpt.dev
- https://serge.chat
- https://github.com/Mozilla-Ocho/llamafile
Pending to check:
- https://recurse.chat
Feel free to recommend more!