
Label availability is a big problem. There are academic collaborations for some diseases that are making this better, but datasets in general are minuscule compared to what is available for more general computer vision applications.

I am hopeful though. Few-shot learning and self-supervision are very active research areas right now, and there are a lot of papers in the medical AI field getting published on these topics.
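For concreteness, here's a rough sketch of the self-supervised idea (a SimCLR-style contrastive loss in PyTorch); the encoder outputs are faked with random tensors, and the batch size and temperature are just illustrative, nothing specific to medical imaging:

    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, temperature=0.5):
        # z1, z2: embeddings of two augmented views of the same N scans.
        z = F.normalize(torch.cat([z1, z2]), dim=1)
        sim = z @ z.t() / temperature        # (2N, 2N) cosine similarities
        sim.fill_diagonal_(float('-inf'))    # exclude self-similarity
        n = z1.size(0)
        # Row i's positive is the other view of the same scan.
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
        return F.cross_entropy(sim, targets)

    # Stand-ins for encoder outputs on 8 unlabeled scans:
    print(nt_xent(torch.randn(8, 128), torch.randn(8, 128)).item())

The appeal is that you can burn through piles of unlabeled scans like this before spending any of the few labels you actually have.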

I'm personally interested in liver cancer, which does not have a large, well-curated, shared database of cases.

Sharing data gets tricky, especially in the US. The labels I'm working with and creating are (at least at this point) for my own research, which won't be shared publicly any time soon.



> there are a lot of papers in the medical AI field getting published on these topics

Most of what I've seen isn't very promising. The energy in these research areas is there because it would be so much cheaper than the "right" way, far more than because of the likelihood of success. And also, perversely, because it's hard for academic researchers to get enough data to do other studies :) NB: I'm not saying there is nothing useful coming out of the learning literature in the last few years, just that it a) isn't a silver bullet and b) is often being misapplied in these areas anyway.

Label quality and availability aren't the only big problems though. Many data sets exhibit problematic sampling bias, as well as being order(s) of magnitude too small, because of the way they are gathered and how access is granted.
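As a toy illustration of the sampling-bias point, even a crude check comparing cohort makeup against the target population catches a lot; all numbers here are made up:

    cohort     = {"age_65_plus": 0.62, "female": 0.31, "scanner_site_A": 0.88}
    population = {"age_65_plus": 0.48, "female": 0.50, "scanner_site_A": 0.20}

    for k in cohort:
        gap = cohort[k] - population[k]
        flag = "  <-- check" if abs(gap) > 0.10 else ""
        print(f"{k:15s} cohort={cohort[k]:.2f} pop={population[k]:.2f} gap={gap:+.2f}{flag}")

Data sets gathered from one site, one scanner, or one referral pattern fail this kind of check constantly.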


The problem, though, is that for most of these diseases there just aren't enough samples available, period, to do it the "right way". HCC, for example, has around 50K new cases/year in the US. Even if every single case went into a repository with perfect labels, it would still take a long time to collect that info (rough arithmetic at the end of this comment). Not to mention you need either a radiologist (4 years of medical school + 6 years of post-school training) or a very skilled and experienced technician to label the data.

Not to mention imaging protocols are not standardized, and the imaging technology is also evolving, so the scans we do today may not be "correct" or standard in 5-10 years.
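Some rough arithmetic on that 50K/year figure (target dataset sizes are illustrative; full ImageNet is ~14M images for comparison):

    cases_per_year = 50_000
    for target in (100_000, 1_000_000, 14_000_000):
        print(f"{target:>10,} labeled cases: ~{target / cases_per_year:,.0f} years")

Even the optimistic "every case, perfect labels" scenario takes decades to approach general computer vision scale.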


Different diseases definitely have different challenges, with breast cancer screening being a notable outlier as far as data availability goes. For some diseases ML is probably always going to be problematic, although it may help in diagnosis mostly by ruling out other possibilities.

I suspect we have similar overall views of the problem, but I'm pretty strongly in the camp that recent advances in ML/AI are mostly driven by data & label availability, not algorithmic advances; this colors where I think the easiest wins in medical ML can happen. Either way, though, the non-technical barriers still seem clearly higher than the technical ones.


Really enjoyed your contributions to this thread, thanks.

I’m a first-year radiology registrar (PGY3) in Australia looking to find others doing interesting work in this domain. If you think I could help with your efforts, feel free to DM.


Second follow-up.

Results are not impressive until they are :)

It's certainly not a solved problem, and it's easy to have a pessimistic view right now, but I'm generally bullish on where things will be 10 years from now.


> Results are not impressive until they are :)

True! I certainly wouldn't discourage anyone from trying.

On the other hand, I think it would be a huge mistake to trust that fancy learning approaches will solve everything and so stop trying to improve access and labeling. Getting better there is still by far the highest-probability path to impact, imo.



