I agree with the sentiment of this paper (AF can enable drug discovery), but in this specific instance, the authors had a real opportunity to contribute a general finding to the scientific community and instead put in the lowest amount of effort (to the point where they're almost saying nothing at all).
The target had dozens of related structures in the Protein Data Bank, including relatives with ~40% sequence identity. This target family has a very similar structure and conserved active site residues. It's relevant that this target has approved cross-CDK family inhibitors (and thousands of data points of CDK family binders on ChEMBL). The conventional way to enable structure-based design is to build a homology model using a similar structure (see here: https://swissmodel.expasy.org/repository/uniprot/Q8IZL9?temp...), and in this case, there is very low deviation between the AF2 model and this "old fashioned" approach.
To recap, this target had a decent model that would have likely sufficed for drug discovery. The community already knows that "homology models" can be used for structure-based drug design, so any methodological hypotheses of this paper are not supported by evidence.
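For readers who want to see what "~40% sequence identity" and "very low deviation" mean in practice, here is a minimal sketch of the two measurements. It assumes the two sequences are already aligned (gaps written as `-`) and that the two structures are already superposed, which real tools would do for you. The function names, toy sequences and toy coordinates are made up for illustration and are not taken from the paper or from SWISS-MODEL.

```python
import math

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: identity is counted over non-gap columns only; other
    conventions (e.g. dividing by the shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumption: the coordinates are matched atom by atom (e.g. C-alpha atoms)
    and already superposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy inputs, purely illustrative
identity = percent_identity("MKV-LAGT", "MRVQLA-T")
deviation = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
```

A low RMSD between a homology model and the AF2 model over the binding-site residues is the kind of evidence the comment above is pointing at.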
Although I agree the authors could have done homology modelling, in this case AlphaFold is already doing that. It knows all the related sequences (through the sequence database similarity graph that it embeds) and has a very sophisticated modelling system. My guess (I'd have to check with my old friends to be sure) is that it does as well as, if not better than, a typical homology modeller at producing atomically accurate structural predictions.
This paper is mainly a flag planted so they can claim they landed on Mars first and fastest.
There are quite a few things missing from this paper that would make it a good drug discovery paper.
First, they didn't discover a drug; they found a hit. They went from target to hit in 30 days, where conventional high-throughput biochemical screening would take 2-4 months. So this is about 3x faster, but that's not the rate-limiting step. Validation and in vivo studies will take >4 months and 1-12 months respectively.
Second, if we take this as a "we found a hit" paper, I want to know how specific your hit is. This would be one of the major advantages of using AF2 - screen against related proteins with some structural or functional similarity. This is the time intensive and oft overlooked part of good in vitro screening campaigns. Potency is nice (although 9 μM isn't impressive), but ultimately selectivity is paramount when targeting a class of proteins with well conserved binding sites, like kinases. If they found a promiscuous CDK inhibitor that happens to hit CDK20, then I bet there are tons of previously reported promiscuous CDK inhibitors that will hit CDK20 too.
Third, this paper is surprising because it exploits none of the cool new things AF2 could enable. In addition to what you mention above, the authors could have tried to counter screen (much faster in silico!), find an allosteric inhibitor, identify a PPI/complex inhibitor, or take a leap by generating a SAR series in silico and validating a few selected compounds in vitro.
Overall, this paper seems both incremental and misdirected. Saving 2 months in the discovery phase, pre-IP, is worth ~0. Not sure anyone there has much experience developing drugs. Hits are nice, but rarely the hard part. However, a hit on a protein from a structurally divergent class would be a major accomplishment.
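The timeline argument above can be put as back-of-the-envelope arithmetic. This is only a sketch using the figures quoted in this comment (30 days vs 2-4 months for hit finding, >4 months validation, 1-12 months in vivo). The 30-day month and the function names are assumptions for illustration, and the later clinical phases are left out entirely.

```python
DAYS_PER_MONTH = 30  # rough convention, an assumption for this sketch

def speedup(conventional_months: float, ai_days: float = 30) -> float:
    """How many times faster the 30-day hit finding is than the conventional route."""
    return conventional_months * DAYS_PER_MONTH / ai_days

def fraction_saved(conventional_months: float, downstream_months: float,
                   ai_days: float = 30) -> float:
    """Share of the (hit finding + downstream) timeline removed by faster hit finding."""
    conventional_days = conventional_months * DAYS_PER_MONTH
    downstream_days = downstream_months * DAYS_PER_MONTH
    return (conventional_days - ai_days) / (conventional_days + downstream_days)

low, high = speedup(2), speedup(4)          # speedup for the 2-4 month range
best_case = fraction_saved(3, 4 + 1)        # 4 mo validation + 1 mo in vivo
worst_case = fraction_saved(3, 4 + 12)      # 4 mo validation + 12 mo in vivo
```

With a 3-month conventional screen, the saving is between roughly a tenth and a quarter of the pre-clinical timeline considered here, and a much smaller share once years of clinical trials are added.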
Another important point to note is affinity. While 8 µM looks impressive, it is not that hard to achieve such potency, since the compounds are likely aimed at the ATP-binding pocket. It is big and deep, and offers many hydrogen bond donors in the hinge region. What is important for such compounds is selectivity, since you want to inhibit only a specific kinase, not all of them.
To me it looks like an advertisement for their platform, not an actual scientific achievement.
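To give a feel for the affinity and selectivity points above, here is a small sketch. It converts a dissociation constant into a binding free energy via ΔG = RT ln(Kd) and computes a fold-selectivity ratio. Big caveat: it treats the reported micromolar value as if it were a Kd, which is an assumption (the thread does not say how it was measured), and the off-target value in the example is invented for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from Kd (1 M reference state)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)

def fold_selectivity(potency_target: float, potency_off_target: float) -> float:
    """Ratio of off-target to on-target potency (same units); >1 favours the target."""
    if potency_target <= 0 or potency_off_target <= 0:
        raise ValueError("potencies must be positive")
    return potency_off_target / potency_target

dg_micromolar = binding_free_energy(8e-6)   # the 8 uM hit, treated as a Kd
dg_nanomolar = binding_free_energy(8e-9)    # a hypothetical 1000x tighter binder
gap = dg_micromolar - dg_nanomolar          # free energy still to be gained

# Hypothetical panel entry: 8 uM on the target, 80 uM on a related kinase
selectivity = fold_selectivity(8.0, 80.0)
```

The gap between a micromolar hit and a nanomolar lead is about 4 kcal/mol at room temperature, which is the optimization work that still lies ahead, and none of it says anything about selectivity until related kinases are actually measured.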
And this, dear HN community, is the difference between an expert reading a paper pertaining to their field and the casual reader, or even scientists in unrelated fields reading this paper. I am just not equipped to judge the quality of research in the field.
It's hard to read the tea leaves of the author affiliation list, but it sure looks like this work was led by a Hong Kong-based startup founded by Russians, with limited assistance from a couple North America-based researchers.
I'm curious why those Russians may have chosen to found in HK rather than, say, the Bay Area. We may want to contemplate what this means for America's competitive stance (not that I have any problem with the rest of the world doing awesome cutting edge research).
Besides Aspuru-Guzik, this author seems to be the most well-known in the field.
My read is that they are in HK for the diverse CS talent (i.e. other Russians) and, second, better funding options compared to being in China proper.
Knowing Russians, they have been considering the Far East as more culturally accommodating than the SV corporate style (combining the poorer aspects of left and right) for some time now. I would say if Texas had an in silico pharma industry, that might be comparable to HK.
Neither are US and China, but we still have tons of Chinese immigrants powering our tech economy. Maybe they're coming more through student visas, which may be of less interest to the Russians.
Yes! During the days of the Russian empire, especially during the Civil War. They were staunch supporters of the Union although their own position on their serfs made that support a bit awkward.
Can someone explain for people who aren't good at biology what are the implications of this "novel CDK inhibitor"? Is this news only because of the discovery method (i.e. AI-based), or is it significant news in and of itself (i.e. this novel CDK20 inhibitor is/could be a big deal)?
I think this is explained pretty well in the paper:
"...hepatocellular carcinoma (HCC) was nominated as the indication of interest due to its high prevalence in liver cancers and lack of effective treatments. In general, by analysis of text and OMICs data from 10 database for hepatocellular carcinoma, PandaOmics provides a top list of 20 targets after multiple dimensions filtration, including novelty, accessibility by biologics, safety, small molecule accessibility, and tissue specificity. CDK20 was finally selected as our initial target to work on due to its strong disease association, limited experimental structure information and with no publicly small molecule inhibitor. [...] To the best of our knowledge, this molecule is the first reported CDK20 inhibitor and moreover, this work is also the first reported example which successfully utilized AlphaFold predicted protein structures to identify a confirmed hit for a novel target in early drug discovery"
The researchers focused on a particular type of liver cancer (HCC) because of the "lack of effective treatments".
They focused on a particular protein molecule called CDK20 which appears to be important for development of HCC. You can think of it this way... "If CDK20 goes 'haywire', it can contribute to the development of HCC."
The idea is that if you can stop CDK20 from going haywire, perhaps you can slow/stop/prevent/reverse development of HCC.
Along those lines, if you can find a "small molecule" that stops CDK20 from going haywire, that small molecule could potentially serve as a medicine for treating HCC. "Small molecule" is a common pharma term for molecule that is smaller than most biological "macromolecules" like proteins. "Small molecule" is often used (roughly) interchangeably for a molecule that can be developed into an ingestible pill (or injected). "Small molecule" medicines are often relatively stable (i.e., can be stored for a long time without many restrictive storage conditions), cheap to manufacture (not always), etc.
These people used computational methods to create several candidate small molecules that might stop CDK20 from going haywire (the technical term here is that the small molecule "inhibits" CDK20). Some of this was done in conjunction with AlphaFold's predicted 3-D structure for CDK20.
But just because the computer says that your small molecule might inhibit CDK20, that doesn't mean that your small molecule will _actually_ inhibit CDK20 in the real world.
The first _true_ test is to make the small molecule and experimentally assess whether it inhibits CDK20.
One of their candidate small molecule compounds appears to inhibit CDK20.
That's one of the punch lines of the paper.
====================================
I don't like it when people are too negative about this stuff, but I'm going to be a little negative here.
Other than employing AlphaFold, this seems like pretty standard work for pharmaceutical development. I worked for a company doing structure-based drug design, and the general concepts employed in this paper are not different from what we (and others) have been doing for a while.
They have a particular platform for finding these types of molecules and they would like to argue that their platform is unique and distinguishes itself from everybody else's platform.
I'm not saying that the candidate small molecule is "bad" or anything like that. It's definitely a promising lead and it needs to be pursued. But it's just a lead.
A lot of biotech / pharma involves hyping/selling your particular drug discovery platform and trying to convince people that your platform is the better/more efficient way of finding valuable drugs.
- One of the strategies in drug development is to find a protein that is more present in people who have a disease than in those who don't (called "overexpressed" proteins), and attempt to stop this protein from working correctly (called "inhibition") by having a small-molecule drug bind to a specific site on the protein (such as the active site or an "allosteric site") in order to make it mechanically unable to execute its function.
- Cyclin-dependent protein kinases (CDKs) are a family of proteins that seem to play important roles in controlling cell division. CDK20 is overexpressed in a number of cancers.
Regarding the novelty:
- Discovering new inhibitors of proteins based on AI is definitely less novel than it was 5 years ago - while it's definitely still not the norm, AI is making big waves in the pharmaceutical industry. However, I think this might be the first publication validating the use of AlphaFold for small molecule drug development, which is a major step forward.
- While it's interesting to see that it's possible to design a small-molecule CDK20 inhibitor, it's currently still very uncertain whether this is a promising drug: i. the compound could be insufficiently specific to CDK20 and could bind to other important proteins and cause unwanted and potentially serious side-effects, ii. the compound could have bad "drug-like" properties (e.g. bioaccumulate in the liver) or be toxic in some way, iii. the compound could interact badly with other drugs that cancer patients receive, iv. the compound could induce resistance (a common problem with small molecule drugs in oncology), and finally, and most importantly, v. the drug might just not be effective at treating cancer or any other disease: just because a protein is over-expressed doesn't mean it's the cause of a cancer; it may instead be a symptom of another biological dysregulation.
Still, it's definitely an achievement, and I applaud the efforts of the team and hope they'll find successful treatments.
Finding a molecule that inhibits the use of the gene CDK20 could potentially help to stop certain cancers/tumors from growing and maybe even stop them from circumventing certain parts of the immune system.
I have zero understanding of biology but apparently "Diseases associated with CDK20 include Obsessive-Compulsive Disorder and Attention Deficit-Hyperactivity Disorder".
You say they play a role in cancer therapy, yet the paper states, "this molecule is the first reported CDK20 inhibitor".
How could they have played a role before they existed?
Would a greater amount of nonspecific cancer funding have allowed this specific discovery to be found faster since it is limited by a machine and human evaluation of the results?
AlphaFold has a clear, easily stated application and problem it solves.
Watson, like the AI craze in general, was mostly buzzwords and hype covering the fact that it was actually three application specialists in a trench coat trying to pass themselves off as machine intelligence.
Yeah, it was a pretty clever if misleading marketing ploy IMO. I don't think many people ever realized that "Watson" as a singular thing does not exist and is basically just the brand name of IBM's ML consulting and cloud services.
I'm kind of surprised more companies didn't try to follow suit; it would have been very easy to create the public impression that e.g. AlphaGo and AlphaFold were actually the same AI and available on GCP, or something like that. But I guess the deception breaks down faster when you're not hiring consultants.
AlphaFold doesn't sort out molecules. This paper took the 3D model of a protein predicted by the original AlphaFold paper and used that to test random small molecules against it, first virtually then physically.
For me, what AlphaFold showed is that this is probably one of the best and brightest teams technology-wise, with Google's financial and other resources behind it.
When I think about IBM I think about a business company.
Google has already added AlphaFold to its GCP Vertex AI service.
They make stuff very, very end-user-friendly and have been doing this with AI for a while.
The IBM health news was very disappointing to me and I would love to have more insight on it.
Until I have that, I will just believe that IBM is either not tech-savvy enough or was too early technology-wise.
GCP also offers audit-proof operation and meets industry and compliance standards.
They go so far as to have contracts with the Pentagon and co. They also have a cooperation with the biggest (or one of the biggest) healthcare providers in the USA, thanks to their compliance support on GCP.
After all, GCP provides encryption at rest and in memory, with Google-provided keys or customer-provided keys.
They also have gVisor.
With the push of big companies like SAP into the cloud, GCP is already aligning itself with enterprise companies globally.
It does. They used Chemistry42 which is Insilico's ML based molecule generation platform(?)[0]. Sure, they didn't train any models specifically for this case, but they used a protein structure predicted by one ML algorithm, then used another set of ML algorithms to find and rank molecules that might match the protein, and then tested the most likely ones in reality.
Offtopic: More and more I'm noticing the ever-increasing number of Chinese names in machine learning papers. Can anyone explain that? Are these researchers mainly choosing this field out of their own interest, or is China somehow pushing/sponsoring a lot of researchers to pursue this field?
I realise that China is pursuing AI dominance; is that just what we're seeing? Note this is purely curiosity, I have no problems with the Chinese people (CCP is a different story).
Well, first of all, China is a huge country with four times the population of the US or three times that of the EU. The greatest "trick" that they have is that they count as one country. It would look different if they were a continent made up of a dozen (still pretty big) countries.
I used to work at a Chinese research institute (in a non-ML field) and the team leader took some time and frequently encouraged his students to learn ML, to try to apply it to their work or just for the sake of it. He was very open about the fact that it would benefit the nation, but also that ML researchers were in good demand and it would be great for the students' individual careers.
So yeah it is mainly just a very popular topic over there, too. But in addition, I feel that China (the gov/party/research community, whoever) thinks more strategically than for example the EU.
I don't think there is any big mystery. AI is a hot area, and a reasonable percentage of smart people have Chinese ancestry (regardless of where they reside now).
China's population is twice that of Europe + US. With the % of people going into science fields in China being a lot higher, they are probably producing 4-5 people for every 1 person in Europe and the US. This is not just in ML or AI but in many other fields.
You should see it more as China finally starting to pull its weight in scientific research.
Their population is 4x that of the US, yet their scientific impact is AFAICT less than that of the US.
How far away is something like this from an actual treatment? I assume it's pretty far, but is it the sort of thing where you basically have to test to see if it works and is safe-ish? Or are a lot more steps involved?
What does promising in this context mean? Like is it the sort of thing that has a 50% chance of eventually being useful, or is it more like 1% chance?
Very, very, very far away. This paper is about target selection and hit identification, the very first steps of the pipeline. What follows is backbone optimization (tweaking the chemistry, and figuring out how to actually produce it), pre-clinical trials (in cells and later in animals), and should those prove successful (fairly unlikely in general), then we'll have clinical phases 1-3. Each of those will take several years. Most candidate drugs that even manage to get to clinical trials will fail there (I forget the exact numbers, but far over 95%). In general: this research is at the stage of someone coming to a software engineer with the words "I have an idea about an app". The actual work hasn't really started yet, and most likely this won't work at all.
Should this molecule be successful (which is highly unlikely, as most hits aren't), it will probably take a decade until this becomes an experimental treatment, and likely two or more until it becomes a standard treatment.
All in all, this isn't really big news at all; this is a very unexciting thing, and it only gets views because it has AlphaFold in the title.
(Caveat: it's been a few years since I worked in a remotely related area)
This is about as far from actual treatment as you can be. This paper is entirely about the very first step, finding candidate molecules that bind to a specific protein.
For context, nearly all medical biology research touts itself as a solution to the problem, but is really just a tiny component of a far larger ecosystem of research, development, and deployment.
To see how messed up things can get, take a look at Vioxx. I saw the entire life cycle of Vioxx during my earlier career: from "we crystallized COX-1 and COX-2 and now we can make differential inhibitors that don't cause stomach bleeding" to clinical trials to the drug being taken off the market because the clinical trials failed to report serious problems.
So these days, every time a person waves their hand and claims they solved a problem, I ask what the direct and immediate effect on the actual problems will be, and it always elicits an answer of "well, it's complicated... next we have to do <soandso> and <suchandsuch> before we can even put it in a human". Genomics, with all its claims for human health, went through the same hype cycle. As did nanomedical diagnostics.
Personally I think AlphaFold proved its value from day one, by firmly establishing the knowledge (which had already been speculated) that everything you need to predict protein structures is a large enough collection of protein structures, a much larger collection of protein sequence relationships, and a collection of very savvy machine learning techniques to extract the maximum information from that to produce the most physically plausible model. It also showed that you could do all this without explicitly modelling the folding process itself, which is such a huge timesaver.
Very far away, and there are competing experimental technologies which could have yielded the same result in less time. A twist on the same experiment used to do the binding assay called DNA encoded libraries (DEL) can screen millions of molecules against a target. In these screens it's routine to find hundreds of micromolar binders. It's not routine to publish micromolar binders.
This paper isn't really even quantitative scientific evidence that the methodology of the paper is a good hit filtering method. The authors don't say much about the ranking of the non-hits.
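One common way to get the kind of quantitative evidence this comment asks for is an enrichment factor: how much more often do true actives show up near the top of the ranking than you would expect by chance? The sketch below assumes that higher scores mean "predicted better binder" and that every ranked compound has actually been assayed (which is exactly the non-hit data the commenter says is missing). The function name and the toy numbers are made up for illustration.

```python
def enrichment_factor(scores, is_active, top_fraction=0.1):
    """Enrichment of actives in the top-scoring fraction versus the whole set.

    EF = (actives in top / size of top) / (all actives / all compounds).
    EF around 1 means the ranking is no better than random picking.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n_top = max(1, round(len(scores) * top_fraction))
    # Sort compounds from best to worst predicted score
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(1 for _, active in ranked[:n_top] if active)
    total_actives = sum(1 for active in is_active if active)
    if total_actives == 0:
        return 0.0
    return (actives_in_top / n_top) / (total_actives / len(scores))

# Toy example: 10 compounds, 2 confirmed actives, one of them ranked first
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [True, False, True, False, False, False, False, False, False, False]
ef_top20 = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
```

Reporting a number like this for the compounds that were synthesized and tested, hits and non-hits alike, would say far more about the method than a single micromolar binder.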
It's a valuation booster more so than a scientific milestone.
I'd say this paper forms the basis for pre-clinical trials. First in-vitro with cell cultures, then in model organisms...
So maybe a few years until first human trials if they are fast? Maybe up to a decade for full approval? As to the chance, nobody can say. Too many factors go into that.
There isn't data in this paper to support going to human clinical trials. Micromolar inhibitors are a starting point for further molecular development, not in vivo cell work.
I was sketching the path this research has to go for the benefit of a lay reader, provided nothing show-stopping occurs. The small molecule they propose may actually go all the way to human trials, even though there is a high probability it will not turn out to do something useful.
They already used it on some kind of CDK20 activity assay. Maybe they modify it somewhat, but the next step would be to find out what it does in some kind of cell culture? I'm not an expert in drug development or this specific cancer.
This could be an interesting addition to other large scale drug discovery methods. Not totally sure how orthogonal it is to other methods, or about the price point when used at scale - but nevertheless.
I suppose the critical issue will be still that going from molecule to treatment, the success rate is annoyingly low. (And, yes, there are lots of attempts to increase the odds here.)
Success rate is really low from an end-to-end perspective, but I would not call this a failure (and I expect the same from this approach). It reflects the difficulty of the task.
It would be great if we could find a way to use AlphaFold to kind of reverse engineer proteins - like to specify the shape you want and then run AlphaFold 'backwards' so that you're going from shape -> DNA instead of DNA -> shape.
Early evidence seems to show that AlphaFold has trouble with single point mutations that change a protein's shape, so completely de-novo proteins will be a challenge:
I would not expect any program to reliably predict the effects of single mutations that massively change the protein's shape unless there was enough high quality structural data and sequence data for both substates and enough signal to predict which substate the protein would adopt after mutation.
Fortunately, evolution already encoded robustness against this sort of problem into proteins and the vast majority of single point mutations are tolerated (the resulting enzymes are often nearly as active and stable as the originals).
The point of AlphaFold not being able to predict the impact of small changes is probably more about having only seen the "right" way those protein sequences usually look. Many point mutations will result in a fitness reduction and therefore the difference would not make it into the training set.
No, this is absolutely not true. For example scientists have made hundreds of point mutations and probed them, even if the "fitness" was lower (not clear what that even means in this context).
Don't forget that you can replace the three residues in the serine protease catalytic triad and still see significant proteolytic activity. Everything we know about protein activity is wrong.
Yes, I do not know how to feel about this. On the one hand it is great that they found this and made the results public before trying any follow-up microbiological research. On the other hand, they seem to say: we, engineers, did in two years what you, scientists, failed to do in the last 20 years. They could have just written a clean scientific paper and mentioned AlphaFold in their methodology section.