
In agreement.

I think the best way to get this dataset may be to pay a few dozen electronics recycling centers for every unique microCT scan they send you. Lease them the tomography equipment. They improve their bottom line, and you get a huge dataset of good-to-excellent quality commercial PCB designs.



Very fun idea. I had not considered training on existing work (IP is so sensitive that I just couldn't think of a way to get enough).

My approach to building the dataset is slightly different. I think we should bootstrap an absolutely massive synthetic dataset of heuristically autorouted PCBs so the AI can learn the visual token system and basic DRC compliance. We then use RL to reward improvements on the existing designs. Over time the datasets will get better, similar to how each newly released LLM is used to produce synthetic datasets that make training subsequent LLMs easier.
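To make the bootstrap step concrete, here is a rough sketch of what one synthetic sample plus a reward could look like. Everything in it is an assumption for illustration: a single-layer grid, two-pad nets, a Lee-style BFS standing in for the "heuristic autorouter", a toy DRC, and made-up names (`route_net`, `make_board`, `drc_ok`, `reward`) and weights. A real pipeline would need multiple layers, vias, clearances, and a real DRC engine.

```python
import random
from collections import deque

def route_net(grid, start, goal):
    """Lee-style BFS on a single-layer grid. Returns the list of cells on a
    shortest path, or None if the goal is unreachable. 0 means a free cell."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and nxt not in prev and (grid[nr][nc] == 0 or nxt == goal):
                prev[nxt] = cell
                queue.append(nxt)
    return None

def make_board(rows, cols, n_nets, seed):
    """Place random two-pad nets, then route them one by one (hypothetical generator)."""
    rng = random.Random(seed)
    all_cells = [(r, c) for r in range(rows) for c in range(cols)]
    cells = rng.sample(all_cells, 2 * n_nets)
    grid = [[0] * cols for _ in range(rows)]
    pads = {}
    for i in range(n_nets):
        a, b = cells[2 * i], cells[2 * i + 1]
        pads[i + 1] = (a, b)
        grid[a[0]][a[1]] = grid[b[0]][b[1]] = i + 1  # pads block other nets
    routes = {}
    for net, (a, b) in pads.items():
        path = route_net(grid, a, b)
        if path is None:
            continue  # leave the net unrouted
        routes[net] = path
        for r, c in path:
            grid[r][c] = net
    return grid, pads, routes

def drc_ok(pads, routes):
    """Toy DRC: each route joins its own pads, is contiguous, and never
    touches a cell owned by another net (including other nets' pads)."""
    owner = {cell: net for net, pair in pads.items() for cell in pair}
    for net, path in routes.items():
        if (path[0], path[-1]) != pads[net]:
            return False
        for (r1, c1), (r2, c2) in zip(path, path[1:]):
            if abs(r1 - r2) + abs(c1 - c2) != 1:
                return False
        for cell in path:
            if owner.get(cell, net) != net:
                return False
            owner[cell] = net
    return True

def reward(pads, routes, length_weight=0.001):
    """Hypothetical RL reward: fail hard on DRC errors, otherwise reward
    completed nets and lightly penalize total trace length."""
    if not drc_ok(pads, routes):
        return -1.0
    completion = len(routes) / len(pads)
    length = sum(len(p) - 1 for p in routes.values())
    return completion - length_weight * length

if __name__ == "__main__":
    grid, pads, routes = make_board(16, 16, 6, seed=0)
    print(len(routes), "of", len(pads), "nets routed, reward", reward(pads, routes))
```

The point of the reward shape is that an RL policy only scores higher than the heuristic baseline if it routes more nets or shortens traces without breaking the DRC, which is the "reward improvements on the existing designs" part.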

I think people are underestimating the number of PCBs needed to train a system like this. My guess is well over 10 million PCBs with perfect fidelity, so a large synthetic data strategy makes sense.


Before you splurge on hardware to extract the data, it would be much cheaper and faster to just buy it in Shenzhen. All the Apple stuff has been reverse engineered; this is how apps like ZXW have scans of all PCB layers. From a random Google search: https://www.diyfixtool.com/blogs/news/iphone-circuit-diagram...



