It depends on what you're doing... Just for reference, here is a small showcase of the capabilities that I've trained on a 13-billion-parameter Llama 2 fine-tune (done with QLoRA).
Back at my old job in ~2016, we built a cheap homegrown data warehouse via Postgres, SQLite and Lambda.
Basically, it worked like this:
- All of our data lived in compressed SQLite DBs on S3.
- Upon receiving a query, Postgres would use a custom foreign data wrapper (FDW) we built.
- This FDW would forward the query to a web service.
- This web service would start one lambda per SQLite file. Each lambda would fetch the file, query it, and return the result to the web service.
- This web service would re-issue lambdas as needed and return the results to the FDW.
- Postgres (hosted on a memory-optimized EC2 instance) would aggregate.
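A minimal local sketch of that fan-out pattern, with threads standing in for Lambdas and made-up file/table names (the real thing fetched each SQLite file from S3 inside a Lambda):

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

def query_partition(path, sql):
    """What each Lambda did: open its SQLite partition and run the query."""
    with sqlite3.connect(path) as conn:
        return conn.execute(sql).fetchall()

def fan_out(paths, sql):
    """What the web service did: one worker per partition, then concatenate
    the partial results for Postgres to aggregate."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        partials = pool.map(lambda p: query_partition(p, sql), paths)
    return [row for part in partials for row in part]

# Build two tiny daily partitions, mirroring the time-series layout.
tmp = tempfile.mkdtemp()
paths = []
for day, values in [("2016-01-01", [1, 2]), ("2016-01-02", [3, 4])]:
    path = os.path.join(tmp, f"events-{day}.db")
    with sqlite3.connect(path) as conn:
        conn.execute("CREATE TABLE events (day TEXT, value INT)")
        conn.executemany("INSERT INTO events VALUES (?, ?)",
                         [(day, v) for v in values])
    paths.append(path)

rows = fan_out(paths, "SELECT day, SUM(value) FROM events GROUP BY day")
# Postgres (via the FDW) would do the final aggregation over these rows.
```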
It was straight magic. Separated compute + storage with basically zero fixed cost and better performance than Redshift or Vertica. All of our data was time-series data, so it was extraordinarily easy to partition.
It was also considerably cheaper than Athena. Athena charges ~$5/TB scanned (a price that hasn't changed to this day), so most of our queries would have easily cost >$100 there, and we were running thousands of queries per hour.
I still think, to this day, that the inevitable open-source solution for DWs might look like this: insert your data as SQLite or DuckDB files into a bucket, pop in a Postgres extension, create an FDW, and `terraform apply` the lambdas + API gateway. It'll be harder for non-time-series data, but you could probably build something that handles other partitioning schemes.
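The time-series part is what makes the pruning trivial: a query's time range maps directly to the list of partition files to fan out to. A sketch, assuming a made-up one-file-per-day key layout like `dw/events/2016-01-02.db`:

```python
from datetime import date, timedelta

def partition_keys(start, end, prefix="dw/events"):
    """Bucket keys for every daily partition overlapping [start, end].
    Only these files need a Lambda; everything else is skipped."""
    day, keys = start, []
    while day <= end:
        keys.append(f"{prefix}/{day.isoformat()}.db")
        day += timedelta(days=1)
    return keys

keys = partition_keys(date(2016, 1, 1), date(2016, 1, 3))
# -> ['dw/events/2016-01-01.db', 'dw/events/2016-01-02.db', 'dw/events/2016-01-03.db']
```

For non-time-series data you'd need some other routing function from query predicates to partition keys, which is the harder part.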
https://old.reddit.com/r/LocalLLaMA/comments/186qq92/comment...
Edit: Embed some of the content instead.
Inkbot can create knowledge graphs. The structure returned is proper YAML, and I got much better results with my fine-tune than with GPT-4.
https://huggingface.co/Tostino/Inkbot-13B-8k-0.2
Simple prompt: https://gist.github.com/Tostino/c3541f3a01d420e771f66c62014e...
Complex prompt: https://gist.github.com/Tostino/44bbc6a6321df5df23ba5b400a01...
It also does chunked summarization.
Here is an example of chunking:
Part 1: chunked summarization - https://gist.github.com/Tostino/cacb1cecdf2eb7386baf565d157f...
Part 2: summary-of-summaries - https://gist.github.com/Tostino/81eeee9781e519044950332b4e64...
Here is an example of a single-shot document that fits entirely within context: https://gist.github.com/Tostino/4ba4e7e7988348134a7256fd1cbb...