[Disclaimer: I used to work at D. E. Shaw Research from 2011-2016]
The early Anton 1 numbers of 17us/day on 100K atoms were a huge leap forward at the time. Back then, GPU-based simulations (e.g. GROMACS/Desmond on GPU) were doing single-digit ns/day. Remember, even for 'fast-folding' proteins, the relaxation time is on the order of us, and you need 100s of samples before you can converge statistical properties, like folding rates [0]. Anton 2 got a 50-100x speed-up [1], which made it much easier to look at druggable pathways. Anton was also used for studying other condensed matter systems, such as supercooled liquids [2].
Your question of why this is so slow or small is prescient. One of the reasons we have to integrate the dynamical equations (e.g. Newtonian or Hamiltonian mechanics) at small, femtosecond timesteps (1 fs = 1e-15 s) is that the vibrational periods of bonds are on the order of femtoseconds to picoseconds (1 ps = 1e-12 s). Given that you also have to compute Omega(n^2) pairwise interactions between n particles, you end up with a large runtime to get to ns and us while respecting bond frequencies. The hard part for atomistic/all-atom simulation is that n is on the order of 1e5-1e6 for a single protein solvated in water. The water molecules are extremely important to simulate explicitly, since you need to get polar phenomena, such as hydrogen bonding, correct in order to get the folded structure and druggable sites right to angstrom precision (1e-10 meters). If you don't do atomistic simulations (e.g. n is much smaller and you ignore complex physical interactions, including semi-quantum interactions), you have a much harder time matching precision experiments.
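A rough back-of-envelope in Python of what this implies; the 2 fs timestep and the 1e12 pair-interactions/s throughput are illustrative assumptions (production codes use cutoffs, neighbor lists, and Ewald/PME rather than a naive all-pairs loop):

```python
# Back-of-envelope cost of reaching 1 us of simulated time for ~100K atoms.
# The 2 fs timestep and 1e12 pair-interactions/s throughput are illustrative
# assumptions, not Anton's actual figures.
timestep_fs = 2.0                       # typical constrained-bond timestep
target_fs = 1.0e9                       # 1 us expressed in femtoseconds
n_atoms = 100_000

steps = target_fs / timestep_fs         # ~5e8 integration steps
pair_ops_per_step = n_atoms ** 2        # naive all-pairs forces, Omega(n^2)
total_pair_ops = steps * pair_ops_per_step

print(f"integration steps        : {steps:.1e}")
print(f"pair interactions / step : {pair_ops_per_step:.1e}")
print(f"total pair interactions  : {total_pair_ops:.1e}")
print(f"wall time @ 1e12 pairs/s : {total_pair_ops / 1e12 / 86400:.0f} days")
```

Even with that (generous) naive throughput you are looking at roughly two months per microsecond, which is why cutoff schemes, FFT-based electrostatics, and specialized hardware matter so much.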
[0] https://science.sciencemag.org/content/334/6055/517
[1] https://ieeexplore.ieee.org/abstract/document/7012191/ [the variance comes from the fact that different physics models and densities cause very different run times -> evaluating 1/r^6 vs. 1/r^12 in fixed precision is very different w.r.t. communication complexity and Ewald times and FFTs and ...]
[2] https://pubs.acs.org/doi/abs/10.1021/jp402102w
This explanation is interesting. Thanks for sharing it. While reading it, I got the impression that the simulation is not fully quantum mechanical, but rather classical with select quantum mechanical effects.
Which parts of quantum mechanics are idealised away and how do we know that not including them won't significantly reduce the quality of the result?
Are you possibly using stochastic noise in the simulations and repeating them multiple times, in the hope that whatever disturbance is caused by the idealisation of the model is covered by the noise?
That's a good question, and there are a number of ways to try to tackle it. One of the main reasons you cannot do QM simulations directly is that the high-quality methods can cost Omega(n^6/eps) to get eps relative accuracy (you can do better with DFT, but then you're making your life hard in other ways). At a high level (and I mean, 50,000 ft. level), here are the simplest ways:
1) Do quantum mechanics simulations of interactions of a small number of atoms — two amino acids, two ethanol molecules. Then fit a classical function to the surface E[energy(radius between molecules, angles)], where this expectation operator is the quantum one (over some separable Hilbert space). Now use the approximation for E[energy(r, a)] to act as your classical potential. (A toy fitting sketch follows after this list.)
- Upshot: You use quantum mechanics to decide a classical potential for you (e.g. you choose the classical potential that factors into pairs such that each pair energy is 'closest' in the Hilbert space metric to the quantum surface).
- Downside: You're doing this for small N — this ignores triplet and higher-order interactions. You're also missing the variance and other higher moments (which is usually fine for biology, FWIW, but not for, say, the Aharonov-Bohm effect).
2) Path integral methods: This involves running a classical simulation for T timesteps, then sampling the 'quantum-sensitive pieces' (e.g. highly polar parts) in a stochastic way. This works because Wick rotation lets you go from the path-integral weight e^{i L}, for a Lagrangian density L, to e^{-L} [0]. You can sample the latter density via stochastic methods to add an SDE-like correction to your classical simulation. This way, you simulate the classical trajectory and have the quantum portions 'randomly' kick that trajectory based on a real Lagrangian.
3) DFT-augmented potentials: A little more annoying to describe, but think of this as a combination of the first two methods. A lot of the "Neural Network for MD" stuff falls into this category [1].
[0] Yes, assume L is absolutely continuous with respect to whatever metric-measure space and base measure you're defined over :) Physics is more flexible than math, so you can make such assumptions and avoid thinking about nuclear spaces and atomic measures until really needed.
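To make (1) concrete, here is a minimal fitting sketch; the 'QM' energies below are synthetic stand-ins and the 12-6 form is just the simplest classical target (real force-field fitting also constrains charges, angles, and torsions, over far richer functional forms):

```python
# Minimal sketch of approach (1): fit a classical pair form to quantum pair
# energies. The "QM" data below is synthetic stand-in data; in reality
# E[energy(r)] would come from an electronic-structure code over many geometries.
import numpy as np
from scipy.optimize import curve_fit

def lennard_jones(r, epsilon, sigma):
    """Classical 12-6 pair potential we fit to the quantum surface."""
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

r_grid = np.linspace(3.0, 8.0, 40)                       # separations in angstroms
e_qm = lennard_jones(r_grid, 0.2, 3.5)                   # pretend these came from QM
e_qm += np.random.default_rng(0).normal(0.0, 0.005, r_grid.size)

# Least squares: pick the classical parameters 'closest' to the quantum surface.
params, _ = curve_fit(lennard_jones, r_grid, e_qm, p0=[0.1, 3.0])
print(f"fitted epsilon = {params[0]:.3f}, sigma = {params[1]:.3f} A")
```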
> Upshot: You use quantum mechanics to decide a classical potential for you (e.g. you choose the classical potential that factors into pairs such that each pair energy is 'closest' in the Hilbert space metric to the quantum surface) - Downside: You're missing the variance.
Couldn't the quantum mechanical state become multimodal such that the classical approximation picks a state that is far away from the physical reality?
And couldn't this multimodality become exacerbated during the actual physical process, possibly arriving at a number of probable outcomes which are never predicted by the simulation? Is there more than hope that that doesn't happen?
Yes, for sure. In practice (and not at the 50,000 ft. level), you do try to include the multimodalities — you don't _really_ just use E[quantum_energy(r)]. But you ARE still reliant on some computable/smooth/Lipschitz moment and/or expectation from the quantum surface. The argument for why you get away with this in biological simulation is somewhat heuristic, but of the following form:
- Most quantum field theories are described by a Lagrangian of the form L(E), where E is an energy scale ["effective field theory"], and the Lagrangian changes as E changes.
- When E is low, L(E) is classical mechanics & EM
- When E is around 1GeV, L(E) is the aforementioned plus QED
- When E is around 100GeV, L(E) has the aforementioned plus some QCD
- When E is at 1 TeV, L(E) has the aforementioned plus Higgs-like stuff
Now biology is on the lowest end of that scale, so you mainly have to deal with QED and perturbative electronic expansion. These electronic expansions are the most important part — you need them to get hydrogen bonding + electrodynamic molecular interactions correct — BUT they are highly local.
This locality is what you take advantage of when you normalize — you find from QM that the potentials only matter when the two charged/polar molecules are close, so you try to make a classical potential that has quantum 'jumps' when these things are close.
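To make the 'local jumps' idea concrete, here is a toy sketch; every functional form and number is hypothetical rather than taken from any production force field:

```python
# Toy illustration: a classical pair term everywhere, plus a QM-motivated
# short-range correction that is smoothly switched on only when two polar
# sites are close. All forms and numbers are hypothetical.
import numpy as np

def switch(r, r_on=3.0, r_off=4.0):
    """Smoothstep: 1 for r <= r_on, 0 for r >= r_off."""
    x = np.clip((r_off - r) / (r_off - r_on), 0.0, 1.0)
    return x * x * (3.0 - 2.0 * x)

def lennard_jones(r, epsilon=0.2, sigma=3.5):
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

def pair_energy(r, hbond_depth=1.5):
    """Classical LJ plus a hydrogen-bond-like well active only at close range."""
    return lennard_jones(r) - hbond_depth * switch(r)

for r in (2.8, 3.5, 5.0):
    print(f"r = {r:.1f} A -> E = {pair_energy(r):+.3f} (arbitrary units)")
```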
Do you miss the purely quantum stuff? Aharonov-Bohm, Chern classes, and the like? Of course. But from a practical standpoint, you do get the structures that you measure from experiment to be correct, because the 'cool' quantum stuff with 'tons' of states is less important for pedestrian things at low energy scales.
It is still hard to get right though! There's a lot of entropy you need to localize correctly and in some sense, you have to make sure you get the modes as a function of local particle positions correct.
The final thing to point out is that the Wick-rotated path integral stuff works much better for biology than for real HEP-type stuff because molecules are confined to low energies — those tunneling probabilities are O(h E), and log(E) is still dwarfed by -log(h), so you _can_ safely ignore them.
This is not true for things like circuits, however, because the lithography at EUV scales (3nm -_-) does have tunneling issues at high field strengths.
tl;dr: Biology has some saving graces that give you good approximations. Are they perfect? No, but if you find a time that I have to compute a vanishing first Chern class in a noisy, ugly biological system, then you deserve a Nobel Prize!
I think it's still an open question whether you truly need an ensemble of long simulations (versus many shorter ones that can be generated in an embarrassingly parallel fashion on GPUs).
Nonetheless, I'm thankful for both DESMOND and Anton, which helped push the MD community out of its moribund state in the early 2000s. I still don't think MD simulations produce anything that exceeds the opportunity value of the power they consume, and it seems unclear that they ever will, although I would still love to know the answer to the question: "could sophisticated classical forcefields with reasonable O(n log n) scaling ever be truly valuable as a virtual screening feature generator/evaluation mechanism?"
I agree that it is still unsolved, from a practical perspective. I think both sides (the MD vs. ensemble sampling) have incorporated techniques that the other side uses to improve sampling efficiency and accuracy. At the same time, I suspect that the sampling methodology only works with some form of structured guidance, whether it be MD or embedding (a la AlphaFold, which has a few former DESRES people working on it). The raw Monte Carlo methods that people use for 'embarrassing' parallelism often have terrible scaling — the spectral gaps of the Markov chains are abysmal and you only realize that after 1000s of core-hours.
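To put a number on the spectral-gap point, here is a toy two-state chain (the hop probability is made up): the relaxation time scales like 1/gap, so short independent runs rarely cross between metastable basins.

```python
# Toy two-state chain showing why a tiny spectral gap ruins naive
# 'embarrassingly parallel' sampling: the relaxation time scales like 1/gap.
# The hop probability below is made up for illustration.
import numpy as np

p_hop = 1e-4                                   # chance per step of crossing the barrier
P = np.array([[1 - p_hop, p_hop],
              [p_hop, 1 - p_hop]])             # reversible two-state transition matrix

second_eig = np.sort(np.linalg.eigvals(P).real)[0]   # eigenvalues are 1 and 1 - 2*p_hop
gap = 1.0 - second_eig
print(f"spectral gap ~ {gap:.1e}, relaxation time ~ {1.0 / gap:.0f} steps")
# ~5,000 steps per effectively independent sample: short uncoupled runs
# started in one basin almost never visit the other within their budget.
```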
On the other hand, DESRES had been focusing on a lot of acceleration methods that involved hardware-optimized HMC-esque methodologies with reasonably good parallelism. AFAICT the only public description of this work is in the appendices of this paper [0].
At the end of the day, you probably need both techniques because the pure sampling approaches lose fine structure (e.g. binding pockets opening up with anomalous frequency due to water clusters) whereas the standalone MD model has too long of a decorrelation time to get averages to converge.