Wanted to share some learnings from optimizing and deploying Qwen-Image-Edit at scale to replace Nano-Banana. The goal was to generate a product catalogue of 1.2M images, which would have cost $46k with Nano-Banana or GPT-Image-Edit.
Because Qwen-Image-Edit is Apache 2.0, you can fine-tune it and apply a few tricks like compilation, a Lightning LoRA, and quantization to cut costs.
The base model takes ~15s to generate an image, which works out to 1,200,000 × 15 / 3600 = 5,000 compute hours.
Compiling the PyTorch graph and applying a Lightning LoRA cut inference down to ~4s per image, or 1,200,000 × 4 / 3600 ≈ 1,333 compute hours.
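For the curious, here's roughly what that looks like in diffusers. This is a minimal sketch, assuming diffusers can resolve the Qwen/Qwen-Image-Edit pipeline and assuming a Lightning LoRA repo id; treat the hub ids and step count as placeholders for your own setup:

```python
import torch
from PIL import Image
from diffusers import DiffusionPipeline

# Load the edit pipeline in bf16 (hub id assumed).
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

# The Lightning LoRA distills sampling down to a handful of steps (repo id assumed).
pipe.load_lora_weights("lightx2v/Qwen-Image-Lightning")
pipe.fuse_lora()

# Compile the denoising transformer; the first call pays the compile cost,
# every call after that gets the speedup.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

src = Image.open("product.jpg").convert("RGB")
out = pipe(
    image=src,
    prompt="place the product on a plain white studio background",
    num_inference_steps=8,  # Lightning LoRAs typically target 4-8 steps
).images[0]
out.save("edited.jpg")
```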
I'm a big fan of open source models, so I wanted to share the details in case it inspires you to own your own weights in the future.
One of the maintainers of the open source project "Oxen" here. Our VCS scales better than git for binary data, and was built to solve some of the problems with git-lfs and git-annex.
We've had a few requests to integrate with music production workflows, but haven't taken it on yet. If anyone wants to collaborate to integrate Oxen with their DAW or workflow, let us know! Here's the project:
We're working on `oxen` to solve a lot of the problems we ran into with git or git-lfs.
We have an open source CLI and server that mirrors git, but handles large files and monorepos with millions of files in a much more performant manner. Would love feedback if you want to check it out!
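For a taste, here's a rough sketch of the basic flow through the Python bindings (pip install oxenai). Treat the class and method names as approximate and check the current docs, since the API has shifted across versions:

```python
from oxen import Repo  # assumed class name; verify against the oxenai docs

repo = Repo("my-dataset")      # hypothetical local path
repo.init()                    # mirrors `git init`
repo.add("images/")            # stage large binaries / millions of files
repo.commit("Add raw images")  # mirrors `git commit -m ...`
```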
I'm not working with datasets. My binary files aren't that large, and these tools are generally a poor fit for my use case, because datasets aren't my concern.
I need to track changes in binary files of very reasonable size; the total repo size is <1GB. But even at this small size, it makes much more sense to self-host with LFS. I have written this up too many times on the internet to go into great detail about how LFS isn't perfect and how I wish there were something better, but in practice it has worked extremely well for tracking a small set of binary files. Kudos to the devs.
FLUX.1-dev is one of the most widely fine-tuned models out there, but I couldn't find a single clean, end-to-end example that actually worked. So I wrote one. Enjoy!
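To give a flavor, the inference side after fine-tuning looks roughly like this in diffusers (the LoRA repo id is a placeholder; FLUX.1-dev is gated on the Hub, so you need to accept the license and authenticate first):

```python
import torch
from diffusers import FluxPipeline

# Load the base model (gated on the Hub; log in with your HF token first).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Attach your fine-tuned LoRA (placeholder repo id).
pipe.load_lora_weights("your-username/your-flux-lora")

image = pipe(
    "a photo in the style the LoRA was trained on",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("sample.png")
```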
Over the past ~1.5 years I've been running a research paper club where we dive into interesting/foundational papers in AI/ML, so we've naturally come across a lot of the papers that led up to DeepSeek-R1. While diving into the DeepSeek papers this week, I decided to compile a list of papers that we've already covered, or that I think would be good background reading, to get a bigger picture of what's going on under the hood of DeepSeek.
We're working on Oxen.ai, which is an open source CLI and server with Python bindings as well. It's optimized for ML/AI workloads but works with any type of data; we see usage from game companies, bio, aerospace, etc.
There's also a hub you can host data on (we have public and private repos, or private VPC deployments):
https://oxen.ai
The CLI mirrors git, so it's easy to learn. It also has some interesting built-in tooling for diffing datasets and working on them remotely without downloading a full copy of the data.
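For example, grabbing a single file through the Python bindings looks roughly like this (RemoteRepo, the repo slug, and the path are assumptions; check the docs for the current interface):

```python
from oxen import RemoteRepo  # assumed class name

repo = RemoteRepo("my-org/my-dataset")  # hypothetical hub repo slug
repo.download("annotations/train.csv")  # fetch one file, not the whole repo
```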
Right now the UI is only available through a VPC deployment. We're thinking about making the data grid / query interface embeddable or available through a library, which would make it easy to self-host.
If you haven't seen the Oxen project yet, we have been building an open source version control tool for unstructured data.
We were inspired by the idea of making large machine learning datasets living & breathing assets that people can collaborate on, rather than the static ones of the past. Lately we have been working hard on optimizing the underlying Merkle trees and data structures within Oxen.ai, and just released v0.19.4, which provides a bunch of performance upgrades and stability improvements to the internal APIs.
To put it all to the test, we decided to benchmark the tool on the 1 million+ images in the classic ImageNet dataset.
The TL;DR: Oxen.ai is faster than raw uploads to S3, 13x faster than git-lfs, and 5x faster than DVC. The full breakdown can be found here.
If you're in the ML/AI community, or a Rust aficionado, we would love your feedback on both the tool and the codebase. We would also love community contributions when it comes to different storage backends and integrations with other data tools.
I'm still not convinced by Mamba's performance on natural language tasks, but maybe that's just because they haven't trained a large enough model on enough data yet.