In agreement. I think maybe the best way to get this data set is to subsidize a ...

seveibar · 2025-03-29T05:40:34 1743226834

Very fun idea. I had not considered training on existing work (IP is so sensitive that I just couldn't think of a way to get enough of it).

My approach to building the dataset is slightly different. I think we should bootstrap an absolutely massive synthetic dataset of heuristically autorouted PCBs so the AI can learn the visual token system and basic DRC compliance. We then use RL to reward improvements on the existing designs. Over time the datasets will get better, similar to how each new LLM release is used to produce synthetic datasets that make training subsequent LLMs easier.
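To make the "reward improvements on existing designs" step concrete, here is a rough sketch of what such a reward could look like. Everything in it is an assumption for illustration: the `RoutingMetrics` fields, the cost weights, and the function names are hypothetical, not part of any existing autorouter or training setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingMetrics:
    """Hypothetical summary of one routed board (fields are illustrative)."""
    drc_violations: int
    total_trace_length_mm: float
    via_count: int


def routing_cost(m: RoutingMetrics,
                 via_penalty_mm: float = 5.0,
                 drc_penalty_mm: float = 1000.0) -> float:
    """Collapse the metrics into one scalar cost; lower is better.

    The weights are arbitrary placeholders. The DRC penalty is large so
    that any violation outweighs savings in trace length or vias.
    """
    return (m.total_trace_length_mm
            + via_penalty_mm * m.via_count
            + drc_penalty_mm * m.drc_violations)


def improvement_reward(baseline: RoutingMetrics,
                       candidate: RoutingMetrics) -> float:
    """Relative cost reduction of the candidate versus the baseline routing.

    Positive when the candidate beats the heuristic baseline, negative
    when it is worse, zero when the two costs are equal.
    """
    base_cost = routing_cost(baseline)
    if base_cost <= 0:
        return 0.0  # degenerate baseline, nothing to improve on
    return (base_cost - routing_cost(candidate)) / base_cost


# The baseline comes from the heuristic autorouter, the candidate from the model.
heuristic = RoutingMetrics(drc_violations=0, total_trace_length_mm=100.0, via_count=4)
model_output = RoutingMetrics(drc_violations=0, total_trace_length_mm=90.0, via_count=2)
reward = improvement_reward(heuristic, model_output)
```

Scoring relative to the heuristic baseline means the synthetic boards double as the RL starting point: a candidate only earns reward by beating the routing it was bootstrapped from.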

I think people are underestimating the number of PCBs needed to train a system like this. My guess is that it is well over 10 million PCBs with perfect fidelity, which is why a large synthetic data strategy makes sense.

rasz · 2025-03-30T03:30:35 1743305435

Before you splurge on hardware to extract data, it would be much cheaper and faster to just buy it in Shenzhen. All the Apple stuff has been reverse engineered; this is how apps like ZXW have scans of all PCB layers. Random Google search: https://www.diyfixtool.com/blogs/news/iphone-circuit-diagram...