
It depends on what you're doing... Just for reference, here is a small showcase of the capabilities I've trained into a 13-billion-parameter Llama 2 fine-tune (done with QLoRA).

https://old.reddit.com/r/LocalLLaMA/comments/186qq92/comment...

Edit: Embed some of the content instead.

Inkbot can create knowledge graphs. The structure returned is proper YAML, and I got much better results with my fine-tune than with GPT-4.

https://huggingface.co/Tostino/Inkbot-13B-8k-0.2

Simple prompt: https://gist.github.com/Tostino/c3541f3a01d420e771f66c62014e...

Complex prompt: https://gist.github.com/Tostino/44bbc6a6321df5df23ba5b400a01...

It also does chunked summarization.

Here is an example of chunking:

Part 1: chunked summarization - https://gist.github.com/Tostino/cacb1cecdf2eb7386baf565d157f...

Part 2: summary-of-summaries - https://gist.github.com/Tostino/81eeee9781e519044950332b4e64...

Here is an example of a single-shot document that fits entirely within context: https://gist.github.com/Tostino/4ba4e7e7988348134a7256fd1cbb...


Back at my old job in ~2016, we built a cheap homegrown data warehouse out of Postgres, SQLite, and Lambda.

Basically, it worked like this:

- All of our data lived in compressed SQLite DBs on S3.

- Upon receiving a query, Postgres would route it through a custom foreign data wrapper (FDW) we built.

- This FDW would forward the query to a web service.

- This web service would start one Lambda per SQLite file. Each Lambda would fetch its file, query it, and return the result to the web service.

- This web service would re-issue Lambdas as needed and return the results to the FDW.

- Postgres (hosted on a memory-optimized EC2 instance) would aggregate.
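The scatter/gather step above can be sketched in miniature. This is a hypothetical reconstruction, not the original code: local SQLite files and threads stand in for S3 objects and Lambdas, and the function names are made up.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def query_partition(path, sql):
    """Worker: roughly what each Lambda did -- open one SQLite
    partition file, run the pushed-down query, return the rows."""
    con = sqlite3.connect(path)
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()

def scatter_gather(paths, sql, aggregate):
    """Web-service side: one worker per partition file, then a
    final aggregation (handled by Postgres in the real system)."""
    with ThreadPoolExecutor(max_workers=max(len(paths), 1)) as pool:
        partials = list(pool.map(lambda p: query_partition(p, sql), paths))
    return aggregate(partials)
```

For an aggregate like SUM, each worker returns a partial sum and the coordinator adds them up; the same decomposition works for COUNT, MIN/MAX, and (with a sum/count pair) AVG.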

It was straight magic. Separated compute and storage with basically zero cost, and better performance than Redshift and Vertica. All of our data was time-series data, so it was extraordinarily easy to partition.
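Why time-series data makes this easy: if each day's data lives in its own object, a query with a time filter only needs to touch the files its range covers. A minimal sketch, assuming a hypothetical one-file-per-day key scheme (the actual layout wasn't described):

```python
from datetime import date, timedelta

def partitions_for_range(start: date, end: date, prefix: str = "events/"):
    """Yield the S3 keys a time-bounded query must scan, under a
    hypothetical dt=YYYY-MM-DD naming convention. Every other
    partition file can be skipped without ever being fetched."""
    d = start
    while d <= end:
        yield f"{prefix}dt={d.isoformat()}.sqlite.gz"
        d += timedelta(days=1)
```

The list this produces is exactly the set of Lambdas the web service needs to launch for that query.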

It was also considerably cheaper than Athena. On Athena, our queries would cost us ~$5/TB scanned (a price that hasn't changed to this day!), so most queries easily ran >$100, and we were issuing thousands of queries per hour.
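The back-of-the-envelope math behind that comparison (the per-query scan size is illustrative; the comment only implies it exceeded 20 TB):

```python
ATHENA_PRICE_PER_TB = 5.00  # $ per TB scanned

def athena_query_cost(tb_scanned: float) -> float:
    """Athena bills by data scanned, so cost scales linearly."""
    return tb_scanned * ATHENA_PRICE_PER_TB

def athena_hourly_cost(queries_per_hour: int, tb_scanned: float) -> float:
    """At thousands of queries per hour, the per-query cost compounds fast."""
    return queries_per_hour * athena_query_cost(tb_scanned)
```

A single query scanning 20 TB already hits the $100 mark, and 1,000 such queries in an hour would run $100,000; the Lambda scheme pays only for the compute actually used per invocation.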

I still think, to this day, that the inevitable open-source solution for DWs might look like this: drop your data into a bucket as SQLite or DuckDB files, pop in a Postgres extension, create an FDW, and `terraform apply` the lambdas + API gateway. It'll be harder for non-time-series data, but you could probably make something that partitions along other keys.

