dvilasuero's comments | Hacker News

Hey! I'm part of the AI Sheets team.

Looking forward to seeing what you think; feedback and ideas are super welcome!

We also wrote a blog post with more details:


Love it!


Hey!

At Argilla, we've been using the previous version of distilabel to build open preference datasets used by hundreds of models, including top-performing ones like zephyr-141b.

Today we're releasing distilabel 1.0.0. We've totally revamped it to make creating complex synthetic data pipelines easier, more robust, and more community-friendly.
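
To give a taste of the new API, here's a minimal two-step pipeline sketch. It's written from memory of the 1.0 docs, so treat the exact import paths, step names, and runtime parameters as assumptions that may differ across versions:

    # Minimal distilabel 1.0-style pipeline sketch: load prompts from the
    # Hub, generate one completion per instruction with an OpenAI model.
    # Import paths and parameter names are assumptions based on the docs.
    from distilabel.llms import OpenAILLM
    from distilabel.pipeline import Pipeline
    from distilabel.steps import LoadDataFromHub
    from distilabel.steps.tasks import TextGeneration

    with Pipeline(name="demo") as pipeline:
        # Expects a dataset with an "instruction" column.
        load = LoadDataFromHub(name="load_dataset")
        generate = TextGeneration(
            name="generate",
            llm=OpenAILLM(model="gpt-3.5-turbo"),
        )
        load >> generate  # connect the steps into a DAG

    if __name__ == "__main__":  # distilabel runs steps in subprocesses
        distiset = pipeline.run(
            parameters={
                "load_dataset": {"repo_id": "some/dataset", "split": "train"},
            }
        )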

We'd love to hear your thoughts!



Hey there!

We've just released a new open-source project for AI feedback to build datasets for RLHF-related methods (like DPO).

Recent projects like Zephyr and Tulu from AllenAI have shown it's possible to build powerful open-source models with DPO and AI Feedback (AIF) datasets.

There's a lot of exciting research in the AIF space, such as UltraFeedback (the dataset leveraged by Zephyr and Tulu), JudgeLM, and Prometheus.

However, going beyond research efforts and applying AIF at scale is a different story. For enterprise and production use, we need a framework that implements key AIF methods in a robust, efficient, and scalable way. Such a framework should enable AI engineers to build custom datasets and scale them to their own use cases.

This, combined with humans in the loop to improve dataset quality, is the next big leap for OSS LLMs.

distilabel aims to bridge this gap.
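
For context, the core AIF loop looks roughly like this hand-rolled sketch: sample two candidate responses, ask a judge model which is better, and keep the result as a DPO-style preference pair. This is the pattern, not distilabel's API; the model choices and judge prompt are illustrative assumptions:

    # Hand-rolled AI-feedback sketch (illustrative, not distilabel's API).
    from openai import OpenAI

    client = OpenAI()

    def complete(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    def preference_pair(prompt: str) -> dict:
        # Two candidates from different models (an assumption; you could
        # also sample the same model twice at higher temperature).
        a = complete("gpt-3.5-turbo", prompt)
        b = complete("gpt-4", prompt)
        # Judge model picks the better response.
        verdict = complete(
            "gpt-4",
            "Which response answers the prompt better? Reply with A or B.\n\n"
            f"Prompt: {prompt}\n\nA: {a}\n\nB: {b}",
        )
        chosen, rejected = (a, b) if verdict.strip().startswith("A") else (b, a)
        return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

Doing this naively breaks down fast at scale (batching, retries, caching, position bias in the judge), which is exactly the robustness work a framework should absorb.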

We'd love your feedback!


It's really interesting to see open-source tools like Argilla pushing the field forward so that open-source models can be trained the way OpenAI's models are.


I'm Dani, CEO and co-founder of Argilla.

Happy to answer any questions you might have and excited to hear your thoughts!

More about Argilla

GitHub: https://github.com/argilla-io/argilla

Docs: https://docs.argilla.io


Please change the logo. No offense, but most people would agree it deserves a better one. It may be more apparent when you look at the favicon.


Does this support versioning?


We're very much looking forward to seeing Falcon-40B support on llama.cpp. For production use cases, this is also highly relevant: https://huggingface.co/blog/sagemaker-huggingface-llm


Thanks, Anakin! We want to bring the data-centric approach to how LLMs are built and fine-tuned, too.


Thanks! The main difference is that Argilla is built as an open-source component to be integrated into the wider MLOps/LLMOps stack. The focus is on continuous data collection, monitoring, and fine-tuning with open-source and commercial LLMs, as opposed to outsourced training data collection and one-off labeling projects. In the blog post we put it in other words:

Domain Expertise vs Outsourcing. In Argilla, the process of data labeling and curation is not a single event but an iterative component of the ML lifecycle, setting it apart from traditional data labeling platforms. Argilla integrates into the MLOps stack, using feedback loops for continuous data and model refinement. Given the current complexity of LLM feedback, organizations are increasingly leveraging their own internal knowledge and expertise instead of outsourcing training sets to data labeling services. Argilla supports this shift effectively.

I'd love to hear your thoughts on this!


The OSS approach makes sense!

