Hacker News | pickettd's comments

Reddit and GitHub are littered with people launching new projects that appear to be way more feature-rich than new tool/app launches from previous years. I think it is a lot harder to get noticed with a new tool/app now because of this increase in the volume of launches.

Also, weekend hackathon events have drastically changed as an experience in the last 2-3 years, both in expectations and in the feature set/polish of working code by the end of the weekend.

And as another example, you see individuals (with AI) producing CUDA kernels and MLX ports way more these days compared to 1-2 years ago, like this: https://huggingface.co/blog/custom-cuda-kernels-agent-skills


I have no way of verifying any of those. Something I can easily verify: new games launched on Steam.

January numbers are out and there were fewer games launched this January than last.


I’d be interested in where you’re getting your data. SteamDB shows an accelerating trend of game releases over time, though comparing January 2026 to January 2025 directly shows a marginal gain [0].

This chart from a16z (scroll down to “App Store, Engage”) plots monthly iOS App Store releases and shows significant growth [1].

> After basically zero growth for the past three years, new app releases surged 60% yoy in December (and 24% on a trailing twelve month basis).

It’s completely anecdotal evidence, but in my own personal experience various subreddits are now flooded with AI-assisted projects, so much so that some have started to implement bans or limits on AI-related posts (r/selfhosted just did this).

As far as _amazing software_ goes, that’s all a bit subjective. But there is definitely an increase happening.

[0] https://steamdb.info/stats/releases/

[1] https://www.a16z.news/p/charts-of-the-week-the-almighty-cons...


I got the numbers swapped. Turns out there was an increase of about 40 games between last January and this. Which is exactly what you wouldn’t expect if the 5-10x claims are true.

Also the accelerating trend dates back to 2018 if you remove the early COVID dip. Which is exactly my point. You can look at the graph and there is no noticeable impact correlated to any major AI advancements.

The iOS data is interesting. But it’s an outlier, because the Play Store and Steam show nothing similar. And the iOS App Store is weird because it has had numerous periods of negative growth followed by huge positive growth over the years. My guess is that it probably has more to do with all of the VC money flowing into AI startups, and all the small teams following the hype by building wrappers and post-training existing models. If you look at a random sample of the new iOS apps, that looks likely.

Seriously, go to the App Store, search “AI”, and scroll until you get bored. There are literally thousands of AI API wrappers.


Specifically about custom CUDA kernels: I’ve implemented them with AI and they significantly sped up a project I worked on. I didn’t know how to write these kernels at all, but I implemented and tested a couple of variations and got it running fast in just two days. Basically impossible for me before AI coding (well, not impossible, but it would have taken me many weeks, so I wouldn’t have tried it).


Or just don't publish them, because they don't want to deal with users.

I wrote a Python DHCP server that connects to a Proxmox server to hand out stable IPs for as long as the VM / container exists in Proxmox.

Not via MAC, but basically via VM ID (or name).
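The core trick is just a deterministic mapping from VM ID to an address in the subnet. A minimal sketch of that piece (the function name and subnet are made up; the real server would also query the Proxmox API to confirm the VM still exists before issuing a lease):

```python
import ipaddress

def vmid_to_ip(vmid: int, subnet: str = "10.0.0.0/16") -> str:
    """Map a Proxmox VM ID to a stable IP inside the subnet.

    Hypothetical helper: the same VM ID always yields the same
    address, so leases stay stable across reboots without tracking
    MAC addresses.
    """
    net = ipaddress.ip_network(subnet)
    # Offset into the subnet by VM ID; +1 skips the network address.
    offset = (vmid % (net.num_addresses - 2)) + 1
    return str(net.network_address + offset)

# Proxmox VM IDs start at 100, so VM 100 gets 10.0.0.101:
print(vmid_to_ip(100))  # 10.0.0.101
```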


The one thing AI is consistently better at than humans is shipping quickly. It will give you as much slop as you want right away, and if you push on it for a short period of time the code will compile, and if you run it a program will appear that has a button for each of the requested features.

Then you start asking questions like, does the button for each of the features actually do the thing? Are there any race conditions? Are there inputs that cause it to segfault or deadlock? Are the libraries it uses being maintained by anyone or are they full of security vulnerabilities? Is the code itself full of security vulnerabilities? What happens if you have more than 100 users at once? If the user sets some preferences, does it actually save them somewhere, and then load them back properly on the next run? If the preferences are sensitive, where is it saving them and who has access to it?

It's way easier to get code that runs than code that works.

Or to put it another way, AI is pretty good at writing the first 90% of the code:

    "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time." — Tom Cargill, Bell Labs


Was the on-device local LLM stack that you tried llama.cpp or something like MLC? I've seen better performance with MLC than llama.cpp in the past, but it has probably been at least a year since I tested iPhones and Android phones for local inference.


I looked into using https://github.com/mybigday/llama.rn. Ultimately, it was too slow to be conversational. The demands of rendering the WebGL likely didn't help.

It was a while ago. If I was to do it over again I might try https://github.com/tirthajyoti-ghosh/expo-llm-mediapipe. Maybe newer models will help.


I also want to add on that I really appreciate the benchmarks.

When I was working with RAG via llama.cpp through React Native early last year I had pretty acceptable tok/sec results up through 7-8B quantized models (on phones like the S24+ and iPhone 15 Pro). MLC was definitely higher tok/sec, but it is really tough to beat the community support and model availability in the GGUF ecosystem.


I get what you mean about wanting a visual app to experience yourself and be able to point others to. I recently followed this MLX tutorial for making a small model act well for home speaker automation/tool-use that I think could potentially be used to make a good all-in-one demo: https://www.strathweb.com/2025/01/fine-tuning-phi-models-wit... (it was fast and easy to do on a MacBook Pro)


Nice to see a clear example of doing this entirely locally on a MBP. It ran >2x faster on my M2 MBP compared to the numbers they showed for an M1. Only 23/25 of the test cases passed for me on the fine-tuned model following the README 1:1, but the speedup from fine-tuned versus off-the-shelf was clear. Thanks for sharing.



I downloaded the 8 bit quant last night but haven't had a chance to play with it yet.


Depends on what benchmarks/reports you trust, I guess (and how much hardware you have for local models, either in-person or in-cloud). https://aider.chat/docs/leaderboards/ has DeepSeek V3 scoring higher than most closed LLMs on coding (but it is a huge local model). And https://livebench.ai has QwQ scoring quite high in the reasoning category (and that is relatively easy to run locally, but it doesn't score super high in other categories).


My gut feeling is that there may be optimizations you can make for faster performance (but I could be wrong since I don't know your setup or requirements). In general, on a 4090 running Q6-Q8 quants my tokens/sec have been similar to what I see on cloud providers (for open/local models). The fastest local configuration I've tested is Exllama/TabbyAPI with speculative decoding (and a quantized cache to be able to fit more context).
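For reference, that setup boils down to a couple of lines in TabbyAPI's config.yml. This is a hypothetical excerpt from memory (the model names are placeholders and the exact key names vary by version, so check the sample config shipped with your install):

```yaml
# Hypothetical TabbyAPI config.yml excerpt -- verify keys against
# the sample config in your TabbyAPI version.
model:
  model_name: Some-Coder-32B-exl2-6bpw      # main EXL2-quantized model
  cache_mode: Q4                            # quantized KV cache frees VRAM for context

draft_model:
  draft_model_name: Some-Coder-1.5B-exl2-4bpw  # small draft model enables speculative decoding
```

The draft model proposes tokens cheaply and the big model verifies them in a batch, which is where the speedup comes from.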


I love the idea of OpenRouter. I hadn't realized until recently, though, that you don't necessarily know what quantization a given provider is running. And of course context size can vary widely from provider to provider for the same model. This blog post had great food for thought: https://aider.chat/2024/11/21/quantization.html
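If the quantization matters for your use case, my understanding is that OpenRouter lets you constrain it via provider-routing preferences in the request body. A sketch of what the payload looks like (the `provider.quantizations` and `allow_fallbacks` fields are my recollection of the docs, so verify against the current API reference):

```python
import json

# Sketch of an OpenRouter chat-completions payload that restricts
# which quantizations a provider may serve this request with.
payload = {
    "model": "deepseek/deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
        "quantizations": ["fp8", "bf16"],  # reject e.g. int4/int8 serving
        "allow_fallbacks": False,          # fail instead of silently rerouting
    },
}

print(json.dumps(payload, indent=2))
```

You'd POST that to the chat completions endpoint with your API key as usual; the routing object only narrows which providers are eligible.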


To expand a little, some providers may apply more aggressive optimization in periods of high load.


I experimented with both Exo and llama.cpp in RPC-server mode this week. Using an M3 Max and an M1 Ultra in Exo specifically, I was able to get around 13 tok/s on DeepSeek 2.5 236B (using MLX and a 4-bit quant with a very small test prompt, so maybe 140 GB total of model+cache). It definitely took some trial and error, but the Exo community folks were super helpful/responsive with debugging/advice.


The Android apk for MLC is updated frequently with recent models built-in. And a Samsung S24+ can comfortably run 7-8B models at reasonable speeds (10ish tokens/sec).

https://llm.mlc.ai/docs/deploy/android.html

