Is this something that I, as a tech enthusiast but no expert, could easily fine-tune or run?
My use case would be fine-tuning on technical docs: specific news, two years of blog posts, primary source material, and Twitter explainer threads. I want to gather all the niche information on a topic from the last two years, dump it into this, and have an LLM that is a subject-matter expert.
Fine-tuning doesn't quite work that way. You have to format the training dataset as request/response pairs. The idea of fine-tuning is to get the model to output things in a specific format, style, or structure.
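To make the request/response shape concrete, here's a minimal sketch of what the training data usually looks like; the field names and examples are hypothetical, but JSONL files of prompt/completion pairs are the common input for most fine-tuning services:

```python
import json

# Hypothetical examples: each record pairs a request with the desired
# response. Fine-tuning teaches the model this style of answering,
# not the facts themselves.
examples = [
    {"prompt": "Summarize the release notes for v2.1.",
     "completion": "v2.1 adds streaming support and fixes the auth bug."},
    {"prompt": "Explain what the /ingest endpoint does.",
     "completion": "It accepts raw documents and queues them for indexing."},
]

# Write one JSON object per line (JSONL), the format most trainers expect.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Turning a pile of blog posts and threads into thousands of pairs like this is the hard part, and it still mostly shapes style rather than reliably injecting knowledge.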
Your use case is better suited to RAG. This is where you retrieve data from a large dataset and inject it into the user's request so the AI model has the context it needs to answer accurately.
But that's not a silver bullet: you would need to spend significant time on chunking strategy and ranking of results to get decent response accuracy.
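A toy sketch of the retrieve-then-inject flow described above. Real systems use vector embeddings and a reranker instead of the word-overlap scoring here, and the sample text and query are made up, but the shape of the pipeline is the same:

```python
def chunk(text, size=8):
    # Naive fixed-size chunking by word count. In practice, chunk
    # boundaries (headings, paragraphs) matter a lot for accuracy.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, passage):
    # Crude relevance: count of shared words. Stand-in for embedding
    # similarity plus reranking in a real RAG setup.
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query, chunks, k=2):
    # Rank all chunks against the query and keep the top k.
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

# Hypothetical corpus standing in for two years of docs and posts.
docs = ("Auth uses signed tokens issued by the /login endpoint. "
        "Deployments run on Kubernetes with blue-green rollouts.")

question = "how does auth work"
context = "\n".join(retrieve(question, chunk(docs)))
# The retrieved context is injected into the model's prompt.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

The two knobs the comment warns about are visible here: how `chunk` splits the corpus, and how `score`/`retrieve` rank the candidates. Get either wrong and the model answers from the wrong context.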
Here is an example of the Predibase platform, referred to in the article for the Solar model, which can also fine-tune Llama-3, Phi-3, and Mistral: https://www.youtube.com/watch?v=R2JQhzfaOFw&themeRefresh=1 I think you can judge for yourself whether it's easy enough for you. (Predibase founder here)