
1) people can run a 1.6B model for free on consumer hardware (a rough sketch of what this looks like in practice follows this list)

2) any model run on computational resources you own or lease will have more privacy than an explicit cloud offering, and running entirely on your own local hardware is fully private. this means you don't have to think twice about asking the LLM about the proprietary code or information you're working on.

3) smaller models pick up the performance gains from all the other improvements in inference runtimes and quantization, allowing for even more consumer-friendly offline use

4) oh yeah, offline use. this could expand use cases to having LLMs baked directly into operating systems, including flagship phones

5) showing what's possible: pushing toward the benchmarks of the best available models while using far less compute. this also makes the hosts of the best models realize that they could either A) use fewer computational resources and free up capacity for their users, or B) further improve their own model because of the competition. Basically, if GPT-4 had applied similar efficiency improvements across the board, there never would have been a rate limit on it.

6) more demand spread across other computational resources. Nvidia is backordered until maybe Q2 2024 right now. If people realize AMD or even the ARM chips they already own can offer the same performance with the right combination of hardware and software, it alleviates pressure on other ventures that need compute.
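
to make points 1 and 3 concrete, here is a minimal sketch of fully offline inference with llama-cpp-python, assuming you have already downloaded some ~1.6B model exported to GGUF and quantized to 4 bits; the file path, thread count, and memory numbers below are illustrative assumptions, not measurements of any particular model.

    # minimal local-inference sketch (pip install llama-cpp-python)
    # rough arithmetic: 1.6e9 params * ~0.5 bytes/param at 4-bit quantization
    # is on the order of 0.8-1.0 GB of weights, which fits in plain laptop RAM
    # with no GPU at all.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/example-1.6b.Q4_K_M.gguf",  # placeholder filename
        n_ctx=2048,   # context window
        n_threads=4,  # a few CPU cores go a long way at this size
    )

    # nothing here touches the network, so proprietary code or data in the
    # prompt never leaves the machine (point 2)
    out = llm(
        "Explain what this internal function does:\n"
        "def rotate_key(cfg): ...",
        max_tokens=128,
    )
    print(out["choices"][0]["text"])

the same GGUF file also runs through llama.cpp's own CLI or a wrapper like Ollama; the point is just that the entire round trip happens on hardware you control.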


