[WIP] Free energy fitting #54

Draft: wants to merge 49 commits into main
Conversation

maxentile (Member)

Translating numerical demonstrations from https://github.com/openforcefield/bayes-implicit-solvent#differentiable-atom-typing-experiments , upgrading to use message-passing rather than fingerprints + feedforward model.

Hiccup: porting the autodiff-friendly implementation of GBSA OBC energy from Jax to PyTorch wasn't as simple as replacing np. with torch. -- I need to track down a likely unit bug I introduced during the conversion, and pass an OpenMM consistency assertion, before merging.

maxentile and others added 22 commits October 22, 2020 15:03
… to 79 character lines

oops, my IDE had been on 120 character lines this whole time -- switching to 79 character lines so future black passes don't turn things into quadruply nested messes
…with a graph-net in the loop

it's fishy that the initial hydration free energy prediction is so poor

I suspect I may have made a unit mistake in my numpy/jax --> pytorch port
…another function that converts to espaloma unit system
…ssertions

Thanks to @yuanqing-wang for carefully stepping through this with me

Co-Authored-By: Yuanqing Wang <[email protected]>
the line to compute the reduced work was written as if "solv_energies" was "valence_energies + gbsa_energies" but of course it was just "gbsa_energies"...
in column `quick_xyz` -- will shortly replace this with a column `xyz` with more thorough parsley 1.2 vacuum sampling
# pairwise effective interaction distance f_ij (Still et al. GB equation)
f = torch.sqrt(r ** 2 + torch.ger(B, B) * torch.exp(
    -r ** 2 / (4 * torch.ger(B, B))))
charge_products = torch.ger(charges, charges)
assert f.shape == (N, N)
assert charge_products.shape == (N, N)

ixns = - (
Member

Isn't this missing a -138.935485 conversion from nm/(proton_charge**2) to kJ/mol? The docstring says "everything is in OpenMM native units".

Member

It looks like you might pre-multiply the charges by sqrt(138.935485)? If so, you should probably document that in the docstring.

maxentile (Member Author)

Gahh -- you're right -- I had dropped this in the current conversion! Thank you for catching this. Charges are not assumed to be premultiplied by sqrt(138.935485); will clarify the docstring...

(This conversion was present but poorly labeled in the numpy/jax implementation in bayes-implicit-solvent.)
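For reference, a minimal sketch of where the conversion belongs, assuming charges in units of the proton charge and distances in nm (variable names follow the snippet above; the solvent-dielectric prefactor of the full GB expression is omitted here):

```python
# Coulomb constant in kJ/mol * nm / e^2, the value used throughout this thread.
COULOMB_CONSTANT = 138.935485

# With charges NOT premultiplied by sqrt(138.935485), the pairwise interaction
# term must carry the conversion explicitly to yield energies in kJ/mol:
ixns = -COULOMB_CONSTANT * charge_products / f
```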

maxentile and others added 4 commits October 30, 2020 11:55
* confirmed that it can overfit to a small subset of FreeSolv! 🎉
* RMSE on the whole of FreeSolv hasn't yet matched the quality of OBC2 🙏
…ze: validation set RMSE 1.8 kcal/mol

* increased step size: 1e-3 rather than 1e-4
* decreased layer and node dimensions from 128 to 32
@maxentile (Member Author)

Incorporating the missing unit conversion John identified, this now appears to be passing integration checks in the demo notebook. A graph-net is used to emit (n_atoms, 2) per-particle GBSA parameters given an input molecular graph. These per-particle parameters are then passed into a PyTorch GBSA implementation (along with cached vacuum samples) to produce one-sided EXP estimates of hydration free energy. A loss function is defined in terms of the estimated vs. experimental hydration free energies, differentiated w.r.t. the graph-net parameters, and optimized using Adam.
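For concreteness, a minimal sketch of one training step under this description. All names here (`graph_net`, `gbsa_energy`, the cached snapshots and vacuum energies) are hypothetical stand-ins for the notebook's actual objects, and reduced (kT) units are assumed:

```python
import torch

def exp_hydration_free_energy(u_solv, u_vac):
    """One-sided EXP (Zwanzig) estimate: -ln < exp(-(u_solv - u_vac)) >_vacuum."""
    delta_u = u_solv - u_vac
    n_snapshots = torch.tensor(float(len(delta_u)))
    return -(torch.logsumexp(-delta_u, dim=0) - torch.log(n_snapshots))

optimizer = torch.optim.Adam(graph_net.parameters(), lr=1e-3)

def training_step(mols, snapshots, vacuum_energies, expt_delta_gs):
    """MSE loss between EXP-estimated and experimental hydration free energies,
    differentiated w.r.t. the graph-net parameters and stepped with Adam."""
    loss = 0.0
    for mol, xyz, u_vac, expt in zip(mols, snapshots, vacuum_energies, expt_delta_gs):
        params = graph_net(mol)                    # (n_atoms, 2) GBSA parameters
        u_solv = u_vac + gbsa_energy(params, xyz)  # add the GB solvation term
        pred = exp_hydration_free_energy(u_solv, u_vac)
        loss = loss + (pred - expt) ** 2
    loss = loss / len(mols)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```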

Can this procedure overfit a GBSA-parameter-emitting graph-net to a small random subsample of FreeSolv (N=10)? Yes:
[figure: predicted vs. experimental hydration free energies, N=10 overfitting run]

Can this procedure fit a graph-net to a random half of FreeSolv (N=321) and generalize to the other half (N=321)? Tentatively yes:
[figure: training vs. validation performance for the 50/50 FreeSolv split]

A few more important refinements and unit tests are needed before this is ready for final review and merge, but this is now significantly less fishy than it was yesterday.

+ a few formatting and documentation enhancements
* define random seed once at top of file rather than before each step
* remove verbose flag
* use same learning rate, n_iterations, n_mols_per_batch, n_snapshots_per_mol for both trajectories
* add training / validation curves for early-stopping
* add bootstrapped rmses to final scatterplots
@maxentile (Member Author)

To address the concern about elements that appear only a handful of times in FreeSolv, see this notebook, which counts the number of molecules in FreeSolv containing each element.

element    # of molecules containing it
C          639
H          629
O          344
N          169
Cl         114
S          40
F          35
Br         25
P          14
I          12

A related question is: if we filter the molecules to retain only certain subsets of elements, how many molecules do we retain?

Enumerating one sequence of element subsets (including elements in descending order of "popularity"):

elements                                 # molecules     coverage
{C, H}                                   103             16.0%
{C, H, O}                                300             46.7%
{C, H, N, O}                             431             67.1%
{C, Cl, H, N, O}                         529             82.4%
{C, Cl, H, N, O, S}                      559             87.1%
{C, Cl, F, H, N, O, S}                   591             92.1%
{Br, C, Cl, F, H, N, O, S}               616             96.0%
{Br, C, Cl, F, H, N, O, P, S}            630             98.1%
{Br, C, Cl, F, H, I, N, O, P, S}         642             100.0%
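For reference, one way a table like this could be generated (a sketch, not the notebook's actual code; `freesolv_smiles` is a hypothetical list of FreeSolv SMILES strings):

```python
from rdkit import Chem

def element_set(smiles):
    """Set of element symbols in a molecule, including implicit hydrogens."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    return {atom.GetSymbol() for atom in mol.GetAtoms()}

def coverage(smiles_list, allowed_elements):
    """Count and fraction of molecules containing only the allowed elements."""
    kept = [s for s in smiles_list if element_set(s) <= set(allowed_elements)]
    return len(kept), len(kept) / len(smiles_list)

# e.g. n_kept, frac = coverage(freesolv_smiles, {"C", "H", "O"})
```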

A not-so-challenging subset of FreeSolv -- that should be free of the infrequently-occurring-element concern -- is the collection of molecules containing only {C, H, O}. This demo notebook fits a GB-parameter-emitting graph-net on this set in about 40 CPU minutes.

Training and validation RMSE are reported every epoch for this "mini-FreeSolv" subset:
[figure: training/validation RMSE per epoch on the {C, H, O} subset]

The same plot, zoomed in on the y range 0.5-2.5 kcal/mol:
[figure: zoomed training/validation RMSE curves]

In this run, the lowest validation-set RMSE happened to be encountered at the very last epoch, but that wouldn't be expected in general, due to noise in the gradient estimates (and especially if run longer).

Plotting predicted vs. reference scatter plots for training and validation sets at that last epoch (labeled with RMSE +/- 95% bootstrapped CI):
[figure: predicted vs. reference scatter plots for training and validation sets, labeled with RMSE +/- 95% bootstrapped CI]
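A minimal sketch of the bootstrapped-CI computation behind labels like these, assuming a simple percentile bootstrap (not necessarily the notebook's exact procedure):

```python
import numpy as np

def bootstrapped_rmse_ci(predicted, reference, n_bootstrap=1000, ci=0.95):
    """RMSE with a percentile-bootstrap confidence interval."""
    predicted, reference = np.asarray(predicted), np.asarray(reference)
    n = len(predicted)
    rmses = []
    for _ in range(n_bootstrap):
        idx = np.random.randint(0, n, size=n)  # resample with replacement
        rmses.append(np.sqrt(np.mean((predicted[idx] - reference[idx]) ** 2)))
    lower, upper = np.percentile(rmses, [100 * (1 - ci) / 2, 100 * (1 + ci) / 2])
    rmse = np.sqrt(np.mean((predicted - reference) ** 2))
    return rmse, (lower, upper)
```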

Similar plots could easily be generated for every other "mini-FreeSolv" subset enumerated above. If there's an apparent difference between the more restricted vs. the more complete subsets, that might be suggestive of difficulty arising from sparsely sampled elements / chemical environments.

run 10 ns of MD per molecule (rather than the measly 0.01 ns per molecule in 5866029)
@jchodera (Member)

Looks great!

How about we run with this for the next bioRxiv update (and thesis) and revisit compound splitting on a larger set (maybe including N and Cl) in the next iteration (after thesis submission)?

@maxentile (Member Author)

> Looks great!

Thanks!

> How about we run with this for the next bioRxiv update (and thesis) and revisit compound splitting on a larger set (maybe including N and Cl) in the next iteration (after thesis submission)?

Sounds good -- compound splitting is subtle and not the primary focus of this demonstration.

Because it was convenient (change one line, wait 30 minutes), I re-ran the notebook on the {C, H, O, N, Cl} FreeSolv subset (n=529) to get a preview.

Noting one observation for when we return to this:

In this run, the validation loss increased for ~10 epochs before decreasing again. Early-stopping requires the user to pre-specify a "patience" parameter (how many iterations without improvement to tolerate before stopping), and this example suggests it might be better to choose a "patience" >= 10 epochs. Will sync with @yuanqing-wang about how this patience parameter is currently selected.
[figure: training/validation RMSE per epoch on the {C, H, O, N, Cl} subset, with validation loss rising for ~10 epochs before decreasing]
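A minimal sketch of early stopping with a patience parameter, to make the trade-off concrete (`epoch_fn`, `validate_fn`, and `snapshot_parameters` are hypothetical stand-ins for the notebook's training and validation steps):

```python
def train_with_early_stopping(n_epochs, patience=10):
    """Stop once validation RMSE hasn't improved for `patience` epochs in a row."""
    best_rmse, best_state, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(n_epochs):
        epoch_fn()                    # one epoch of Adam updates
        val_rmse = validate_fn()      # RMSE on the held-out split
        if val_rmse < best_rmse:
            best_rmse, epochs_without_improvement = val_rmse, 0
            best_state = snapshot_parameters()  # keep the best model so far
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                 # tolerate at most `patience` stale epochs
    return best_state, best_rmse
```

With patience < 10, a run like the one above would have stopped during the transient rise in validation loss, before the later improvement.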

using more thorough vacuum MD, specified here 8e50eec
anecdotally, this appears to increase the training-set vs. validation-set error gap, suggesting that insufficient equilibrium sampling might have made the validation-set performance reported in #54 (comment) look more favorable than it should!
@maxentile (Member Author)

To home in on the version of these results that will be reported in the bioRxiv update (and thesis), I re-ran the notebook from #54 (comment) on the updated equilibrium snapshots cached from more thorough vacuum MD.

These updated results should supersede the earlier results.

Anecdotally (based on one run with snapshots from short MD vs. one run with snapshots from thorough MD), this update appears to have increased the training-set vs. validation-set error gap, suggesting that insufficient equilibrium sampling might have made the validation-set performance reported in #54 (comment) look more favorable than it should.

[figure: updated training/validation results using snapshots from thorough vacuum MD]

@jchodera (Member)

Interesting finding! But I agree that this behavior is much closer to what I would have expected from training vs. validation error.
Let's run with this for the bioRxiv/thesis update!

@maxentile (Member Author)

Repeated this, but with 10x longer optimization trajectories, and with KFold 90%/10% splitting rather than a single 50%/50% split.

Reporting final training and validation set performance for each of the 10 splits.

[figures: final training- and validation-set performance for each of the 10 folds]
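A minimal sketch of this protocol using scikit-learn's KFold (`molecules` and `train_and_evaluate` are hypothetical stand-ins for the notebook's dataset and fit-and-report step):

```python
from sklearn.model_selection import KFold

kfold = KFold(n_splits=10, shuffle=True, random_state=0)  # 90%/10% splits
fold_results = []
for train_idx, val_idx in kfold.split(molecules):
    train_rmse, val_rmse = train_and_evaluate(
        [molecules[i] for i in train_idx],
        [molecules[i] for i in val_idx],
    )
    fold_results.append((train_rmse, val_rmse))  # one point per fold
```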

@jchodera (Member) commented Nov 2, 2020

Is this train/validate/test with early stopping, or is it just train/validate with 10% of the dataset split out and no early stopping (with cross-validation over the 10% held-out sets intended to be representative of the test set error)?

Are we concerned at all with the experimental strategies being vastly different between the different experiments in the paper for no particular reason?

@maxentile (Member Author)

> Is this train/validate/test with early stopping

No early stopping.

> is it just train/validate with 10% of the dataset split out and no early stopping (with cross-validation over the 10% held-out sets intended to be representative of the test set error)?

Correct. I'm not aiming to do any hyperparameter selection informed by this experiment, just aiming to report on the repeatability / variability of the training procedure if the dataset were slightly different, and to report an estimate of the generalization error of this specific procedure on the chemistry represented by this specific dataset.

In the previous plot, I showed just a single 50/50 split. Would that plot look different if the random seed were different? The way to measure that is to repeat the procedure multiple times with different random splits and report all of the results. The ideal would be to approach leave-one-out (run the procedure 300 times, once on each of the 300 subsets of size n - 1 = 299). K-fold is a common compromise.

> Are we concerned at all with the experimental strategies being vastly different between the different experiments in the paper for no particular reason?

John, I think the different approaches in progress partly reflect differing goals -- here I'm picking a single hyperparameter choice, and aiming to report on the variability / repeatability of the training procedure.

The valence-fitting experiments, I think, are still highly sensitive to various hyperparameter choices, and the goal of those ongoing experiments is still to select good hyperparameters.

Experiments constructed to simultaneously select hyperparameters and estimate the generalization error once hyperparameters are selected must take care to do nested cross-validation or use a held-out test set that is only ever consulted once.

@jchodera (Member) commented Nov 2, 2020

Thanks for the clear explanations! Let's make sure the experimental section describes the motivation and conception of this design, both in the presentation of results and Detailed Methods! Those subtleties will be lost on the reader unless we make them explicit.

…alculations vs. experiment on the {C, H, O} subset
@maxentile (Member Author)

Noting here a few more to-dos (of undecided priority level) that I think would help shore up and contextualize these results:

  • Compare to the baseline of optimizing only the 6 continuous parameters in this model with a fixed (elemental) atom-typing scheme, to measure how much improvement is due to learned "chemical perception" vs. just learned parameters. So far, the results in this PR do not address this question, but only demonstrate feasibility of performing the optimization with learned "chemical perception" in the loop.
  • Compare to the baseline of using the graph model to predict the hydration free energy directly from the chemical graph (rather than predicting physical simulation parameters that in turn imply a hydration free energy), as @yuanqing-wang suggested to me on Monday. So far, the results in this PR do not address the question of whether incorporating a physical model in this task improves predictive performance relative to the fully "black-box" version of the approach.
  • Incorporate MBAR reweighting rather than forward-EXP reweighting (as in https://gist.github.com/maxentile/1568531f2f39b5a84e263a1ab8d963b5#file-sample_parameters_using_autograd_and_pymbar-py-L197-L260, neutromeratio, and openff-evaluator) to reduce concern about reweighting estimator reliability. Further, store and report asymptotic uncertainty of the reweighting estimator to confirm its reliability. So far, the results in this PR assume (plausibly, based on prior experience) that the forward-EXP estimator is reliable for this specific task, but this assumption should either be avoided or its validity quantified. (Additionally, using MBAR in place of EXP should reduce the gradient estimator variance, which may have an impact on the behavior of the stochastic optimizer.) A sketch of the EXP estimate with its asymptotic uncertainty follows this list.
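A minimal sketch of the one-sided EXP estimate with a first-order (delta-method) asymptotic standard error, in reduced units (pymbar provides equivalent estimators; this standalone version just illustrates the quantity to store and report):

```python
import numpy as np

def exp_estimate_with_uncertainty(delta_u):
    """One-sided EXP estimate of a reduced free energy difference, with a
    first-order (delta-method) asymptotic standard error.
    delta_u: reduced potential differences (u_solv - u_vac) over N snapshots."""
    delta_u = np.asarray(delta_u)
    boltzmann = np.exp(-delta_u)
    mean_b = boltzmann.mean()
    delta_f = -np.log(mean_b)
    # Var(-ln x_bar) ~= Var(x) / (N * x_bar^2) to first order
    stderr = boltzmann.std(ddof=1) / (np.sqrt(len(boltzmann)) * mean_b)
    return delta_f, stderr
```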

Observations:
* n=10 overfitting seems to achieve a lower error than previously
* 50/50 train/validate seems to initialize and optimize at a higher error than in the first version