It's an interesting subject, with a long history; I think many of the biggest challenges are not technical.
The first commercially available AI/ML approach to breast cancer screening was available (US) in the late 90s. There have been many iterations and some improvements since, none of which really knock it out of the park but most clinical radiologists see the value. Perhaps the more interesting question then is why are people getting value out of uploading their own scans, i.e. why does their standard care path not already include this?
The reason I made this project 100% free and available to the general public is to help patients, especially those in remote areas with limited access to experienced radiologists, to at least get a second opinion on their mammogram. I think that has real value, and that is why I'm doing this project.
The question I have for you is that one of the biggest problems with cancer diagnosis is false positives: "Yes, there is something on your scan, we're not sure what it is, so we'll biopsy it." A biopsy is not a zero-risk procedure, and it can cause a lot of worry and pain, so it's not something to be taken lightly. There are also many cases of "OK, it's probably cancer, but the cure may be worse than the disease." This is the classic problem with detecting prostate cancer at an advanced age: it's very likely that something else will kill you before the cancer does.
How does your software deal with this issue? I'd be worried if, as you put it, people in a remote area with limited access to experienced radiologists were given access to this and it came back with "fairly decent chance of cancer" - what do they do then?
Great question. The sensitivity/specificity balance is definitely a crucial topic in AI-assisted diagnosis. I have to admit this model and website for mammography were done in 2018 and might not be the leading solution out there. If I wanted to improve the results of my earlier work today, I would add an additional radiomics-based stage for false positive reduction, and at the same time lower the threshold in the first model to increase sensitivity. Combining the feature extraction of deep learning with the past 20 years of medical imaging knowledge captured in radiomics might give better overall performance in terms of sensitivity/specificity.
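To make the idea concrete, here is a minimal sketch of what such a cascade could look like. Everything here is hypothetical (the thresholds, the `detector`, `radiomics_model`, and `extract_features` helpers are illustrative, not the ones used on the site):

    # Hypothetical two-stage cascade: a deep-learning detector run at a
    # deliberately low threshold (high sensitivity), followed by a radiomics
    # classifier that tries to reject the extra false positives.

    DETECTOR_THRESHOLD = 0.2   # lowered so more true lesions are kept as candidates
    RADIOMICS_THRESHOLD = 0.5  # second stage filters the resulting candidates

    def two_stage_predict(image, detector, radiomics_model, extract_features):
        """Return (is_suspicious, kept_regions) for one mammogram."""
        # Stage 1: keep every region the detector scores above the low threshold.
        candidates = [r for r in detector(image) if r.score >= DETECTOR_THRESHOLD]
        kept = []
        for region in candidates:
            # Stage 2: hand-crafted radiomics features (shape, texture, intensity
            # statistics) give the second model information the CNN score alone
            # does not carry.
            features = extract_features(image, region)
            if radiomics_model.predict_proba([features])[0, 1] >= RADIOMICS_THRESHOLD:
                kept.append(region)
        return len(kept) > 0, kept

The point of the split is that the first stage buys sensitivity cheaply, and the second stage spends effort only on the small number of candidate regions, where the extra false positives actually live.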
You're operating in a really serious area of medicine and I encourage you to take the comments about false positives (and false negatives) more seriously. It's not really just a matter of making adjustments to a model; it has to be pervasive in the entire process of making reproducible models that are used for making decisions about humans.
The website is clearly marked as a breast health awareness tool and not for diagnosis. We are not doing any decision making for humans. However, I would like to point out that there are quite a few FDA-approved mammography AI products on the market at the moment.
Even if people use it for diagnosis, so what? Take the info: if it's positive, confirm with a doctor; if it's negative but you have reservations, go see a doctor.
I think this comment is perhaps a bit uncharitable.
Even if you have done everything you are supposed to do in the process, at the end of the day you are looking at ROC curves (or equivalent) and trying to understand sensitivity vs. specificity trade-offs, and often you have some sort of (indirect) parameters that can move you to different points on that curve.
This is quite critical in deployment: if you are screening, you usually want something different than in diagnosis. As alluded to elsewhere, if you raise the work-up rate too much you definitely risk killing more people from biopsy complications than you help with higher sensitivity (it's more complicated than that in practice).
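For instance, with a labelled validation set in hand, picking an operating point for a target sensitivity might look roughly like this (a scikit-learn sketch; the 0.95 target and the variable names are made up for illustration):

    import numpy as np
    from sklearn.metrics import roc_curve

    def threshold_for_sensitivity(y_true, y_score, target_sensitivity):
        """Pick the score threshold that reaches the target sensitivity on a
        validation set, and report the specificity you pay for it."""
        fpr, tpr, thresholds = roc_curve(y_true, y_score)
        # First index where the target is met; if it is never met this falls
        # back to index 0, so check the returned sensitivity before using it.
        idx = int(np.argmax(tpr >= target_sensitivity))
        return thresholds[idx], tpr[idx], 1.0 - fpr[idx]

    # Screening: favour sensitivity and accept more work-ups.
    # thr, sens, spec = threshold_for_sensitivity(y_val, scores_val, 0.95)
    # Diagnosis: a stricter threshold (lower work-up rate) is usually acceptable.

The same model can sit at very different points on that curve depending on which of these thresholds you ship, which is exactly why "the model" is not the whole story.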
Deploying ML medical diagnostics is like everything else in ML: the ML part is 1% of a much larger "thing", which involves business, legal, and many other concerns far beyond the data analytics.
Nothing about what I'm saying is uncharitable. In a sense, it's charitable, because I'm helping warn a person who is going down a dangerous path to consult more with experts in this field.
The charitable response would have been to assume they do have some understanding of the broader context, and perhaps raise specific concerns or points of interest.
What you did was assume that they were ignorant in potentially dangerous ways, and assert they should do something different.
I don't think a reading of the comments/responses (at least at your time of posting) really supported that assumption, especially considering the limitations of the medium. Hence my reply to you, while also detailing the trade offs a tiny bit.
I read the entire article and I didn't see anything in there that would convince me this author is anything other than an amateur programmer (I can't parse the section about Bruker: is the "amateur programmer" also a director at Bruker who develops medical devices full-time?).
Please be assured that I put a fair amount of thought into this - for example, I used to do due diligence for a VC firm evaluating proposals like this all the time and we had to reject most of them because the founders didn't understand the basic rules of deploying medical technology in highly regulated environments like the US.
Based on my interactions with the author in various parts of this post, I continue to conclude this individual is lacking the core knowledge and wisdom required to execute a project like this successfully at scale.
The article was pretty fluffy, but it was about them, not by them. If the article was accurate about the role at Siemens, they have for certain been exposed to RA/QA work and know what a DHF is, etc.
Anyway, at least at the time you posted (since then there has been more interaction), I didn't find that same information nearly enough to dismiss their competence out of hand.
I went back and determined that the article was wrong. He wasn't a "director" at Bruker, he was a "detector imaging scientist". There's nothing about Siemens.
So this isn't an amateur programmer; it's a person who got a PhD in nuclear engineering and radiological sciences, was a scientist at Bruker, has some experience with health systems, and then became a serial entrepreneur with a small company that has some funding. BTW, people who have the job title "Director" are normally fairly senior (old), as well.
You will save even more lives if you over-communicate about the need to use your awareness tools (with better false positive reduction) to drive more effective diagnosis.
Inconsistent care is a really good point. I wasn't trying to be negative (hope it didn't come across that way). I was hoping to point out that systemic issues in health care management, at least in a lot of countries, seem to be more of a problem than tech for things like this.
Out of curiosity, how are you handling the data access and labelling issues here? I suspect that's the key issue that has limited the performance of the commercial offerings (hardly limited to this problem or this space).
OTOH in terms of real impact, properly leveraging a more modestly successful algorithm will probably help more people than getting a few more %. With the (strong) caveat that in a space like this you really have to look at work-up rate and balance risks.
There is a history of breast cancer in my family and anything that can be done to improve outcomes has to be highly commended. I did, however, have the same question, and this
> to at least get a second opinion on their mammogram.
for me makes a lot of sense, even in developed countries where you get a result but want extra assurances.
It would be interesting to know (assuming you have the data, even anecdotally) whether second opinions using this overturned professional ones, and of those, how many corrected an original false negative.
Not speaking for coolwulf obviously but I can perhaps shed some light.
Screening breast mammo has an occurrence rate problem. Something like fewer than 10 in 1000 studies will require further review; this means that in practice, as a radiologist, you look at a lot of negative films before seeing a TP. It also means a typical read is done fast: seconds to a few minutes.
This results in a couple of things: reader variability based on experience/throughput, and false negatives. There were some double-reader studies that caught something like 15% (going from memory here) of FNs, but nobody can afford to have two radiologists read everything.
So the profession is already conceptually used to the idea of using an algorithm as a "second read" and reconsidering. Typically this won't "overturn" anything, but rather say "hey, have another look"; the decision to proceed or not is still the clinician's. Having a positive from the algorithm makes them review carefully, but you have to watch the FP rate here or nobody would get anything else done.
I have heard of health systems using algorithms as a first pass too (i.e. radiologists only see films that have been flagged positive by a tuned-to-be-sensitive version), but that has its own set of issues.
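In rough pseudocode, the difference between the two deployment modes is something like this (a sketch only; the report attributes, thresholds, and function names are illustrative, not any specific vendor's workflow):

    # Second-read mode: every study is read by a radiologist; the algorithm
    # only adds a "have another look" flag, so its false positives mostly
    # cost reading time rather than directly driving biopsies.
    def second_read(study, radiologist_read, algo_score, flag_threshold=0.5):
        report = radiologist_read(study)
        if algo_score(study) >= flag_threshold and report.negative:
            report.needs_second_look = True  # the clinician still decides
        return report

    # First-pass triage mode: the radiologist only sees studies the algorithm
    # flags, using a tuned-to-be-sensitive threshold. A false negative here is
    # never seen by a human, which is why this mode carries very different risk.
    def first_pass_triage(studies, algo_score, sensitive_threshold=0.1):
        return [s for s in studies if algo_score(s) >= sensitive_threshold]

The second-read flag threshold and the triage threshold are deliberately different: the first trades radiologist time against missed cancers, the second trades workload against the chance that nobody ever looks at a positive film.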