GitHub - msmdev/gpcca: GPCCA: Generalized Perron Cluster Cluster Analysis program to coarse-grain reversible and non-reversible Markov State Models.

msmdev / gpcca Public

Notifications You must be signed in to change notification settings
Fork 1
Star 3

GPCCA: Generalized Perron Cluster Cluster Analysis program to coarse-grain reversible and non-reversible Markov State Models.

LGPL-3.0 license

3 stars 1 fork Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
.gitignore		.gitignore
Algorithm.png		Algorithm.png
LICENSE.txt		LICENSE.txt
README.txt		README.txt
SRSchur_num_t.m		SRSchur_num_t.m
cluster_by_isa.m		cluster_by_isa.m
count.txt		count.txt
do_schur.m		do_schur.m
fillA.m		fillA.m
find_twoblocks.m		find_twoblocks.m
getopt.m		getopt.m
gpcca.m		gpcca.m
gram_schmidt_mod.m		gram_schmidt_mod.m
indexsearch.m		indexsearch.m
initialize_A.m		initialize_A.m
initialize_optional.m		initialize_optional.m
initialize_workspace.m		initialize_workspace.m
inputT.m		inputT.m
load_t.m		load_t.m
main.m		main.m
main_nlscon.m		main_nlscon.m
nearly_equal.m		nearly_equal.m
nlscon.m		nlscon.m
numeric_t.m		numeric_t.m
objective.m		objective.m
opt_soft.m		opt_soft.m
optimization_loop.m		optimization_loop.m
plotmatrix.m		plotmatrix.m
postprocess.m		postprocess.m
problem_pcca_nlscon.m		problem_pcca_nlscon.m
regressionTest.m		regressionTest.m
save_t.m		save_t.m
unitTests.m		unitTests.m
use_minChi.m		use_minChi.m

Repository files navigation

This file is part of GPCCA.

If you use this code or parts of it, cite the following reference:

Reuter, B., Weber, M., Fackeldey, K., Röblitz, S., & Garcia, M. E. (2018). Generalized
Markov State Modeling Method for Nonequilibrium Biomolecular Dynamics: Exemplified on
Amyloid β Conformational Dynamics Driven by an Oscillating Electric Field. Journal of
Chemical Theory and Computation, 14(7), 3579–3594. https://doi.org/10.1021/acs.jctc.8b00079

GPCCA is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
------------------------------------------------------------------------------------------
For questions or support contact

Bernhard Reuter
[email protected]

or fill a pull request.
------------------------------------------------------------------------------------------

Download the code at: https://github.com/msmdev/gpcca

------------------------------------------------------------------------------------------

Please consider using the further developed Python version of GPCCA - *pyGPCCA* -
especially, if you are working with large transition matrices with more than a few
thousand rows/columns. pyGPCCA is significantly more performant, if one uses the
optional support for sorted partial Schur decompositions utilizing the SPLEPc library.
Further, GPCCA is not designed to work with sparse matrices. If a sparse matrix is passed
to GPCCA, it is simply densified and thus the RAM consumption may increase dramatically.
On the contrary, pyGPCCA was designed to work equally well with dense and sparse matrices
and makes use of sparse data structures internally whenever possible, if supplied with a
sparse transition matrix.

*pyGPCCA* can be found here:
https://github.com/msmdev/pyGPCCA

Documentation is available here:
https://pygpcca.readthedocs.io/en/latest/

------------------------------------------------------------------------------------------
How to use GPCCA:

Firstly, you will need an installation of MATLAB. Since GPCCA was developed and tested
with MATLAB version 2015b, it is advised to use this version but later versions should
work as well.
To execute GPCCA, start MATLAB and run the script main.m. This script contains the
relevant input parameters, which are preset for the example of butane (the count matrix of
butane is stored in the file count.txt). The script executes the main program gpcca.m,
which will ask you to supply the name of a input file containing the count matrix
characterizing your system/process. This count matrix needs to be generated in advance
from your data (e.g. molecular dynamics time series data) as described in detail in the
above reference. There you can also find details about using GPCCA on protein
conformational dynamics data gained from molecular dynamics simulations. You might also
take a look at the flowchart supplied in the file Algorithm.png.

The program is pleasant to use interactively and automatically stores a whole range of
interesting quantities and plots. The following procedure is advised:

1. If interactively selected: Use the minChi criterion for an interval of cluster numbers
(e.g., 2 to 10) specified in the main.m script to obtain a graph of the minChi's.

2. Based on the minChi graph the interval of cluster numbers (e.g., 2 to 5), within which
optimization is performed for each cluster number using Gauss-Newton, Nelder-Mead, or
Levenberg-Marquardt, is selected and inputted. Here it is advisable to use Gauss-Newton,
since even if it does not converge to the minimum it will come very close to it and is
extremely fast. A graph of the crispness is outputted, which serves to identify the
optimal cluster number for a final optimization, if one is desired.
The optimization loop can be executed in parallel (via parfor, by setting the option
wk.parallel = 1) or serial (wk.parallel = 0). This saves hours and days if you have
multiple cores and MATLABs Parallel Programming Toolbox.

3. If specified in the main.m script: Interactively, an predetermined optimal cluster
number (or an interval of predetermined cluster numbers) is entered and the final
optimization takes place - Nelder-Mead with a ( possibly very) large number of iterations
has proven to be a good choice here.
The optimization loop can be executed in parallel (via parfor, by setting the option
iopt.parallel = 1) or serial (iopt.parallel = 0).

There are two test suites that are very easy to use:

The unitTests Testsuite checks the functionality of each individual function, the
regressionTest Testsuite checks the correct functioning of the entire program (i.e. of
gpcca.m, which calls all other functions) based on known results for the butane count
matrix. Extensive test results and reports are stored in folders created by the tests -
these do not have to be checked manually and are only for archiving (can be deleted).

The test suites are called by run(unitTests) and run(regressionTest) respectively. Then
the desired precision is chosen in which the test is performed: 'mp' (multiprecision) is
only selectable if you have installed the Multiprecision Toolbox of Advanpix, 'double' is
the Matlab standard.

Also for the overall program the precision can and must be determined. This happens in the
main.m script and is by default set to 'double' there.
WARNING: Don't set a precision bellow 'double'!!! NEVER use single precision!!!