[Disclaimer: I used to work at D. E. Shaw Research from 2011-2016]
The early Anton 1 numbers of 17us/day on 100K atoms were a huge leap forward at the time. Back then, GPU-based simulations (e.g. GROMACS/Desmond on GPU) were doing single-digit ns/day. Remember, even for 'fast-folding' proteins, the relaxation time is on the order of us, and you need 100s of samples before you can converge statistical properties, like folding rates [0]. Anton 2 got a 50-100x speed-up [1], which made it much easier to look at druggable pathways. Anton was also used for studying other condensed matter systems, such as supercooled liquids [2].
Your question of why this is so slow or small is prescient. One of the reasons we have to integrate the dynamical equations (e.g. Newtonian or Hamiltonian mechanics) at small, femtosecond timesteps (1 fs = 1e-15 s) is that the vibrational periods of bonds are on the order of femtoseconds to picoseconds (1 ps = 1e-12 s). Given that you also have to compute Omega(n^2) pairwise interactions between n particles, you end up with a large runtime to get to ns and us while respecting bond frequencies. The hard part for atomistic/all-atom simulation is that n is on the order of 1e5-1e6 for a single protein solvated in water. The water molecules are extremely important to simulate explicitly, since you need to get polar phenomena, such as hydrogen bonding, correct in order to get the folded structure and druggable sites right to angstrom precision (1e-10 meters). If you don't do atomistic simulations (e.g. n is much smaller and you ignore complex physical interactions, including semi-quantum interactions), you have a much harder time matching precision experiments.
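A rough back-of-envelope in Python of what this implies; the 2 fs timestep and the 1e12 pair-interactions/s throughput are illustrative assumptions (production codes use cutoffs, neighbor lists, and Ewald/PME rather than a naive all-pairs loop):

```python
# Back-of-envelope cost of reaching 1 us of simulated time for ~100K atoms.
# The 2 fs timestep and 1e12 pair-interactions/s throughput are illustrative
# assumptions, not Anton's actual figures.
timestep_fs = 2.0                       # typical constrained-bond timestep
target_fs = 1.0e9                       # 1 us expressed in femtoseconds
n_atoms = 100_000

steps = target_fs / timestep_fs         # ~5e8 integration steps
pair_ops_per_step = n_atoms ** 2        # naive all-pairs forces, Omega(n^2)
total_pair_ops = steps * pair_ops_per_step

print(f"integration steps        : {steps:.1e}")
print(f"pair interactions / step : {pair_ops_per_step:.1e}")
print(f"total pair interactions  : {total_pair_ops:.1e}")
print(f"wall time @ 1e12 pairs/s : {total_pair_ops / 1e12 / 86400:.0f} days")
```

Even with that (generous) naive throughput you are looking at roughly two months per microsecond, which is why cutoff schemes, FFT-based electrostatics, and specialized hardware matter so much.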
[0] https://science.sciencemag.org/content/334/6055/517
[1] https://ieeexplore.ieee.org/abstract/document/7012191/ [the variance comes from the fact that different physics models and densities cause very different run times -> evaluating 1/r^6 vs. 1/r^12 in fixed precision is very different w.r.t. communication complexity and Ewald times and FFTs and ...]
[2] https://pubs.acs.org/doi/abs/10.1021/jp402102w
This explanation is interesting. Thanks for sharing it. While reading it, I got the impression that the simulation is not fully quantum mechanical, but rather classical with select quantum mechanical effects.
Which parts of quantum mechanics are idealised away and how do we know that not including them won't significantly reduce the quality of the result?
Are you possibly using stochastic noise in the simulations and repeating them multiple times, in the hope that whatever disturbance is caused by the idealisation of the model is covered by the noise?
That's a good question, and there are a number of ways to try to tackle it. One of the main reasons you cannot do QM simulations directly is that the high-quality methods can cost Omega(n^6/eps) to get eps relative accuracy (you can do better with DFT, but then you're making your life hard in other ways). At a high level (and I mean, 50,000 ft. level), here are the simplest ways:
1) Do quantum mechanics simulations of interactions of a small number of atoms — two amino acids, two ethanol molecules. Then fit a classical function to the surface E[energy(radius between molecules, angles)], where this expectation operator is the quantum one (over some separable Hilbert space). Now use the approximation for E[energy(r, a)] to act as your classical potential. (A toy fitting sketch follows after this list.)
- Upshot: You use quantum mechanics to decide a classical potential for you (e.g. you choose the classical potential that factors into pairs such that each pair energy is 'closest' in the Hilbert space metric to the quantum surface).
- Downside: You're doing this for small N — this ignores triplet and higher-order interactions. You're also missing the variance and other higher moments (which is usually fine for biology, FWIW, but not for, say, the Aharonov-Bohm effect).
2) Path integral methods: This involves running a classical simulation for T timesteps, then sampling the 'quantum-sensitive pieces' (e.g. highly polar parts) in a stochastic way. This works because Wick rotation lets you go from the path-integral weight e^{i L}, for a Lagrangian density L, to e^{-L} [0]. You can sample the latter density via stochastic methods to add an SDE-like correction to your classical simulation. This way, you simulate the classical trajectory and have the quantum portions 'randomly' kick that trajectory based on a real Lagrangian.
3) DFT-augmented potentials: A little more annoying to describe, but think of this as a combination of the first two methods. A lot of the "Neural Network for MD" stuff falls into this category [1].
[0] Yes, assume L is absolutely continuous with respect to whatever metric-measure space and base measure you're defined over :) Physics is more flexible than math, so you can make such assumptions and avoid thinking about nuclear spaces and atomic measures until really needed.
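To make (1) concrete, here is a minimal fitting sketch; the 'QM' energies below are synthetic stand-ins and the 12-6 form is just the simplest classical target (real force-field fitting also constrains charges, angles, and torsions, over far richer functional forms):

```python
# Minimal sketch of approach (1): fit a classical pair form to quantum pair
# energies. The "QM" data below is synthetic stand-in data; in reality
# E[energy(r)] would come from an electronic-structure code over many geometries.
import numpy as np
from scipy.optimize import curve_fit

def lennard_jones(r, epsilon, sigma):
    """Classical 12-6 pair potential we fit to the quantum surface."""
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

r_grid = np.linspace(3.0, 8.0, 40)                       # separations in angstroms
e_qm = lennard_jones(r_grid, 0.2, 3.5)                   # pretend these came from QM
e_qm += np.random.default_rng(0).normal(0.0, 0.005, r_grid.size)

# Least squares: pick the classical parameters 'closest' to the quantum surface.
params, _ = curve_fit(lennard_jones, r_grid, e_qm, p0=[0.1, 3.0])
print(f"fitted epsilon = {params[0]:.3f}, sigma = {params[1]:.3f} A")
```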
> Upshot: You use quantum mechanics to decide a classical potential for you (e.g. you choose the classical potential that factors into pairs such that each pair energy is 'closest' in the Hilbert space metric to the quantum surface) - Downside: You're missing the variance.
Couldn't the quantum mechanical state become multimodal such that the classical approximation picks a state that is far away from the physical reality?
And couldn't this multimodality become exacerbated during the actual physical process, possibly arriving at a number of probable outcomes which are never predicted by the simulation? Is there more than hope that that doesn't happen?
Yes, for sure. In practice (and not at the 50,000 ft. level), you do try to include the multimodalities — you don't _really_ just use E[quantum_energy(r)]. But you ARE still reliant on some computable/smooth/Lipschitz moment and/or expectation from the quantum surface. The argument for why you get away with this in biological simulation is somewhat heuristic, but of the following form:
- Most quantum field theories are described by a Lagrangian of the form L(E), where E is an energy scale ["effective field theory"], and the Lagrangian changes as E changes.
- When E is low, L(E) is classical mechanics & EM
- When E is around 1GeV, L(E) is the aforementioned plus QED
- When E is around 100GeV, L(E) has the aforementioned plus some QCD
- When E is at 1 TeV, L(E) has the aforementioned plus Higgs-like stuff
Now biology is on the lowest end of that scale, so you mainly have to deal with QED and perturbative electronic expansion. These electronic expansions are the most important part — you need them to get hydrogen bonding + electrodynamic molecular interactions correct — BUT they are highly local.
This locality is what you take advantage of when you normalize — you find from QM that the potentials only matter when the two charged/polar molecules are close, so you try to make a classical potential that has quantum 'jumps' when these things are close.
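To make the 'local jumps' idea concrete, here is a toy sketch; every functional form and number is hypothetical rather than taken from any production force field:

```python
# Toy illustration: a classical pair term everywhere, plus a QM-motivated
# short-range correction that is smoothly switched on only when two polar
# sites are close. All forms and numbers are hypothetical.
import numpy as np

def switch(r, r_on=3.0, r_off=4.0):
    """Smoothstep: 1 for r <= r_on, 0 for r >= r_off."""
    x = np.clip((r_off - r) / (r_off - r_on), 0.0, 1.0)
    return x * x * (3.0 - 2.0 * x)

def lennard_jones(r, epsilon=0.2, sigma=3.5):
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

def pair_energy(r, hbond_depth=1.5):
    """Classical LJ plus a hydrogen-bond-like well active only at close range."""
    return lennard_jones(r) - hbond_depth * switch(r)

for r in (2.8, 3.5, 5.0):
    print(f"r = {r:.1f} A -> E = {pair_energy(r):+.3f} (arbitrary units)")
```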
Do you miss the purely quantum stuff? Aharonov-Bohm, Chern classes, and the like? Of course. But from a practical standpoint, you do get the structures that you measure from experiment to be correct, because the 'cool' quantum stuff with 'tons' of states is less important for pedestrian things at low energy scales.
It is still hard to get right though! There's a lot of entropy you need to localize correctly and in some sense, you have to make sure you get the modes as a function of local particle positions correct.
The final thing to point out is that the Wick-rotated path integral stuff works much better for biology than for real HEP-type stuff because molecules are confined to low energies — those tunneling probabilities are O(h E), and log(E) is still dwarfed by -log(h), so you _can_ safely ignore them.
This is not true for things like circuits, however, because the lithography at EUV scales (3nm -_-) does have tunneling issues at high field strengths.
tl;dr: Biology has some saving graces that give you good approximations. Are they perfect? No, but if you find a time that I have to compute a vanishing first Chern class in a noisy, ugly biological system, then you deserve a Nobel Prize!
I think it's still an open question whether you truly need an ensemble of long simulations (versus many shorter ones that can be generated in an embarrassingly parallel fashion on GPUs).
Nonetheless, I'm thankful for both DESMOND and Anton, which helped push the MD community out of its moribund state in the early 2000s. I still don't think MD simulations produce anything that exceeds the opportunity value of the power they consume, and it seems unclear that they ever will, although I would still love to know the answer to the question: "could sophisticated classical forcefields with reasonable O(n log n) scaling ever be truly valuable as a virtual screening feature generator/evaluation mechanism?"
I agree that it is still unsolved, from a practical perspective. I think both sides (the MD vs. ensemble sampling) have incorporated techniques that the other side uses to improve sampling efficiency and accuracy. At the same time, I suspect that the sampling methodology only works with some form of structured guidance, whether it be MD or embedding (a la AlphaFold, which has a few former DESRES people working on it). The raw Monte Carlo methods that people use for 'embarrassing' parallelism often have terrible scaling — the spectral gaps of the Markov chains are abysmal and you only realize that after 1000s of core-hours.
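To put a number on the spectral-gap point, here is a toy two-state chain (the hop probability is made up): the relaxation time scales like 1/gap, so short independent runs rarely cross between metastable basins.

```python
# Toy two-state chain showing why a tiny spectral gap ruins naive
# 'embarrassingly parallel' sampling: the relaxation time scales like 1/gap.
# The hop probability below is made up for illustration.
import numpy as np

p_hop = 1e-4                                   # chance per step of crossing the barrier
P = np.array([[1 - p_hop, p_hop],
              [p_hop, 1 - p_hop]])             # reversible two-state transition matrix

second_eig = np.sort(np.linalg.eigvals(P).real)[0]   # eigenvalues are 1 and 1 - 2*p_hop
gap = 1.0 - second_eig
print(f"spectral gap ~ {gap:.1e}, relaxation time ~ {1.0 / gap:.0f} steps")
# ~5,000 steps per effectively independent sample: short uncoupled runs
# started in one basin almost never visit the other within their budget.
```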
On the other hand, DESRES had been focusing on a lot of acceleration methods that involved hardware-optimized HMC-esque methodologies with reasonably good parallelism. AFAICT the only public description of this work is in the appendices of this paper [0].
At the end of the day, you probably need both techniques because the pure sampling approaches lose fine structure (e.g. binding pockets opening up with anomalous frequency due to water clusters) whereas the standalone MD model has too long of a decorrelation time to get averages to converge.