If the engineering side is of secondary importance to you, then at least keep the dataset in mind. All of a model's skills originate in the composition and quality of its training data.
Most of our discussions are about model size, and few are about the dataset. Yet the scaling laws all point to the great value of more data, and even a small amount of data can have a large impact during fine-tuning. In the end, it is the training data that transforms a random initialization into the model.