Really slow performance with some macrocycles #118

Open
bjonnh-work opened this issue Nov 29, 2022 · 8 comments

Comments

@bjonnh-work

bjonnh-work commented Nov 29, 2022

We found an issue when using coordgen with RDKit. With coordgen enabled, working with chiral macrocycles such as the one below is extremely slow (roughly 100 times slower for this molecule). Removing the chirality from the atoms makes it fast again.

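import time

from rdkit import Chem
from rdkit.Chem import rdDepictor

smiles = "C[C@@H]1CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN[C@H](C)CCCCCCCCC(=O)OCCN1"

### This is slow (coordgen)
rdDepictor.SetPreferCoordGen(True)
start = time.perf_counter()
molblock = Chem.MolToMolBlock(Chem.MolFromSmiles(smiles))  # 2D coordinates are generated here, since the molecule has none yet
print("coordgen:", time.perf_counter() - start)

### This is fast (RDKit's own depictor)
rdDepictor.SetPreferCoordGen(False)
start = time.perf_counter()
molblock = Chem.MolToMolBlock(Chem.MolFromSmiles(smiles))
print("rdkit:", time.perf_counter() - start)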

This issue is cross-posted with RDKit: rdkit/rdkit#5813

@d-b-w
Collaborator

d-b-w commented Nov 29, 2022

Thanks for the report! This is very interesting.

@bjonnh-work
Author

I'm still trying to figure out the impact of chirality; in some cases it seems to be even slower without it.

@bjonnh-work
Author

I ran some better benchmarks: for that molecule at least, chirality seems to have no impact.
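A more reliable chiral-vs-achiral comparison repeats each measurement and compares medians, with the two inputs otherwise identical. The sketch below is one way to set that up; `strip_chirality`, `median_time`, and `coordgen_layout` are hypothetical helper names, and only the last one needs RDKit (imported lazily, so the other two run without it). `strip_chirality` only handles the plain `@`/`@@` tags used in this thread; on a parsed molecule, RDKit's `Chem.RemoveStereochemistry` does the same job.

```python
import statistics
import time
from typing import Callable


def strip_chirality(smiles: str) -> str:
    """Drop '@'/'@@' tetrahedral tags, e.g. '[C@@H]' -> '[CH]'.

    Only covers the simple '@'/'@@' forms, not extended tags such as @TH1 or @AL1.
    """
    return smiles.replace("@", "")


def median_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Call `fn` `repeats` times and return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def coordgen_layout(smiles: str) -> None:
    """Parse `smiles` and compute 2D coordinates with coordgen preferred (requires RDKit)."""
    from rdkit import Chem
    from rdkit.Chem import rdDepictor

    rdDepictor.SetPreferCoordGen(True)
    mol = Chem.MolFromSmiles(smiles)
    rdDepictor.Compute2DCoords(mol)
```

Usage: `median_time(lambda: coordgen_layout(s))` versus `median_time(lambda: coordgen_layout(strip_chirality(s)))` for the same SMILES `s`. Parsing is included in each timing but is negligible next to the layout; for the multi-minute cases, keep `repeats` small.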

@bjonnh-work bjonnh-work changed the title Really slow performance with chiral macrocycles Really slow performance with some macrocycles Nov 29, 2022
@bjonnh-work
Author

However, the more flexible the molecule is, the more time it takes.
This one takes twice as long as the previous molecule: CC1CCCCCCCCCOCCNCCCCCCCCCCOCCNC(C)CCCCCCCCC(=O)OCCN1

@bjonnh-work
Author

CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1 takes 2 min on my machine.
CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1 takes 2 s.

I think I'll throw a profiler at it.
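To see how the time grows with ring size, one can generate the same methyl-substituted all-carbon ring at a series of sizes and time the 2D layout for each. This is a minimal sketch: `methyl_macrocycle_smiles`, `ring_size_of`, and `time_coordgen` are hypothetical helpers, and `time_coordgen` assumes RDKit is installed (it is imported lazily, so the SMILES helpers work without it).

```python
import time


def methyl_macrocycle_smiles(ring_size: int) -> str:
    """SMILES of a methyl-substituted all-carbon ring, e.g. 4 -> 'CC1CCC1'."""
    if ring_size < 3:
        raise ValueError("ring_size must be at least 3")
    # methyl 'C' + ring-opening 'C1' + (ring_size - 2) ring carbons + ring-closing 'C1'
    return "CC1" + "C" * (ring_size - 2) + "C1"


def ring_size_of(smiles: str) -> int:
    """Ring size for strings of the 'CC1...C1' form: every carbon except the methyl.

    Not a general SMILES parser; it only works for this one pattern.
    """
    return smiles.count("C") - 1


def time_coordgen(smiles: str) -> float:
    """Seconds spent in 2D layout with coordgen preferred (requires RDKit)."""
    from rdkit import Chem
    from rdkit.Chem import rdDepictor

    rdDepictor.SetPreferCoordGen(True)
    mol = Chem.MolFromSmiles(smiles)
    start = time.perf_counter()  # time only the layout, not the parsing
    rdDepictor.Compute2DCoords(mol)
    return time.perf_counter() - start
```

Usage: `for n in range(20, 50, 5): print(n, time_coordgen(methyl_macrocycle_smiles(n)))`. The two SMILES quoted above follow the same `CC1...C1` pattern, so `ring_size_of` gives their ring sizes directly; expect the largest sizes to take minutes.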

@bjonnh-work
Author

Minimal example that doesn't require RDKit, as a coordgen Boost test case:

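// Reproduces the slowdown with coordgen alone. Assumes it lives in coordgen's
// Boost test suite, which is expected to provide the `_smiles` literal used below.
BOOST_AUTO_TEST_CASE(SlowMacrocycle)
{
    // Methyl-substituted all-carbon macrocycle (about 2 min via RDKit on the reporter's machine)
    auto mol = "CC1CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC1"_smiles;
    BOOST_TEST(mol->getBonds()[0]->getBondOrder() == 1);  // sanity check on the parsed input
    sketcherMinimizer minimizer;
    minimizer.initialize(mol.get());
    minimizer.runGenerateCoordinates();  // 2D coordinate generation happens here
    const auto& atoms = minimizer.getAtoms();
    sketcherMinimizerAtom* center = atoms.at(0);
    BOOST_REQUIRE_EQUAL(center->getAtomicNumber(), 6);
}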

@tadhurst-cdd

The slowdown occurs in .../coordgen/CoordgenMacrocycleBuilder.cpp, at line 682:

if (checkedMacrocycles > MAX_MACROCYCLES) {
    break;
}

MAX_MACROCYCLES is set to 40, and it takes a long time to reach that limit for the problematic molecule.
Alternatively, the acceptableScore that is calculated and checked just above that line could be the issue.
It is calculated as numberOfAtoms * SUBSTITUTED_ATOM_RESTRAINT / 2, where SUBSTITUTED_ATOM_RESTRAINT is 10.

Just FYI.
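The two stopping conditions described in the comment above can be modeled with a small loop. This is a hypothetical, simplified Python sketch, not coordgen's actual code: `search_shapes`, the candidate iterable, and the `score` callback are placeholders, and "lower score is better" plus the strict `<` comparison are assumptions. Only `MAX_MACROCYCLES`, `SUBSTITUTED_ATOM_RESTRAINT`, and the acceptableScore formula come from the comment. The point it illustrates: if no candidate ever scores below the threshold, the loop only stops when the counter exceeds `MAX_MACROCYCLES`, so the total cost is roughly 41 candidate evaluations times whatever each evaluation costs for a large ring.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

MAX_MACROCYCLES = 40             # value quoted in the comment above
SUBSTITUTED_ATOM_RESTRAINT = 10  # value quoted in the comment above


def acceptable_score(number_of_atoms: int) -> int:
    # numberOfAtoms * SUBSTITUTED_ATOM_RESTRAINT / 2; the product is always even, so // is exact
    return number_of_atoms * SUBSTITUTED_ATOM_RESTRAINT // 2


def search_shapes(
    candidates: Iterable[T], score: Callable[[T], float], number_of_atoms: int
) -> Tuple[Optional[T], int]:
    """Return (best candidate, number of candidates checked) under the two stop rules."""
    threshold = acceptable_score(number_of_atoms)
    best: Optional[T] = None
    best_score = float("inf")
    checked = 0
    for shape in candidates:
        s = score(shape)  # assumed to be the expensive step for large rings
        if s < best_score:
            best, best_score = shape, s
        if best_score < threshold:
            break  # good enough: stop early (assumed strict comparison)
        checked += 1
        if checked > MAX_MACROCYCLES:
            break  # cap reached: give up and keep the best shape so far
    return best, checked
```

For example, with 43 atoms the threshold is 43 * 10 / 2 = 215. When every candidate scores above it, `search_shapes` evaluates 41 candidates before the cap ends the loop; when an early candidate scores below it, the loop exits almost immediately.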

@d-b-w
Collaborator

d-b-w commented Dec 2, 2022

OK, the general slowness on macrocycles is a known issue. We're also tracking performance issues in #39 and in Schrödinger's internal bug tracker. We have efforts underway to sidestep this, and we'll probably incorporate these molecules as test cases. That project is somewhat long-term, though.

Your team is welcome to submit a patch if you have suggestions, of course!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants