The testsystems Module: pymbar.testsystems

The pymbar.testsystems module contains a number of test systems with analytically or numerically computable expectations or free energies, which are used to validate the pymbar implementation. These test systems are also convenient when you want to generate synthetic data to experiment with the capabilities of ``pymbar``.
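
For example, synthetic data from one of the test systems can be passed straight to an MBAR estimator and compared against the analytical reference. The sketch below assumes the pymbar 4.x API (MBAR.compute_free_energy_differences returning a dictionary with a 'Delta_f' entry; older releases expose getFreeEnergyDifferences instead), and absolute free energies may use different normalization conventions, so it is safest to compare differences between states.

>>> from pymbar import MBAR
>>> from pymbar.testsystems.harmonic_oscillators import HarmonicOscillatorsTestCase
>>> testcase = HarmonicOscillatorsTestCase(O_k=[0, 1, 2], K_k=[1, 2, 4])
>>> x_n, u_kn, N_k, s_n = testcase.sample(N_k=[100, 100, 100], mode='u_kn')
>>> mbar = MBAR(u_kn, N_k)
>>> results = mbar.compute_free_energy_differences()  # pymbar >= 4.x method name (assumed)
>>> estimated = results['Delta_f'][0, :]               # estimated f_k - f_0
>>> analytical = testcase.analytical_free_energies()   # analytical reference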

class pymbar.testsystems.harmonic_oscillators.HarmonicOscillatorsTestCase(O_k=(0, 1, 2, 3, 4), K_k=(1, 2, 4, 8, 16), beta=1.0)

Test cases using harmonic oscillators.

Examples

Generate energy samples with default parameters.

>>> testcase = HarmonicOscillatorsTestCase()
>>> [x_kn, u_kln, N_k, s_n] = testcase.sample()

Retrieve analytical properties.

>>> analytical_means = testcase.analytical_means()
>>> analytical_variances = testcase.analytical_variances()
>>> analytical_standard_deviations = testcase.analytical_standard_deviations()
>>> analytical_free_energies = testcase.analytical_free_energies()
>>> analytical_x_squared = testcase.analytical_observable('position^2')

Generate energy samples with default parameters in one line.

>>> (x_kn, u_kln, N_k, s_n) = HarmonicOscillatorsTestCase().sample()

Generate energy samples with specified parameters.

>>> testcase = HarmonicOscillatorsTestCase(O_k=[0, 1, 2, 3, 4], K_k=[1, 2, 4, 8, 16])
>>> (x_kn, u_kln, N_k, s_n) = testcase.sample(N_k=[10, 20, 30, 40, 50])

Test sampling in different output modes.

>>> (x_kn, u_kln, N_k) = testcase.sample(N_k=[10, 20, 30, 40, 50], mode='u_kln')
>>> (x_n, u_kn, N_k, s_n) = testcase.sample(N_k=[10, 20, 30, 40, 50], mode='u_kn')

Generate a test case with harmonic oscillators.

Parameters:

O_k : np.ndarray, float, shape=(n_states)

Offset parameters for each state.

K_k : np.ndarray, float, shape=(n_states)

Force constants for each state.

beta : float, optional, default=1.0

Inverse temperature.

Notes

We assume potentials of the form U(x) = (K / 2) * (x - O)^2, where K and O are the corresponding entries of K_k and O_k. The equilibrium distribution is given analytically by

p(x; beta, K) = sqrt[(beta K) / (2 pi)] exp[-beta K (x - O)^2 / 2]

and the dimensionless free energy is therefore

f(beta, K) = - (1/2) * ln[(2 pi) / (beta K)]
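
As a quick numerical check of the free energy expression above (a plain-numpy sketch with arbitrary parameter values, not library code), one can integrate the Boltzmann factor on a grid and compare -ln(Z) with the closed form:

>>> import numpy as np
>>> beta, K, O = 1.0, 4.0, 2.0
>>> x = np.linspace(O - 20.0, O + 20.0, 200001)
>>> dx = x[1] - x[0]
>>> Z = np.sum(np.exp(-beta * 0.5 * K * (x - O) ** 2)) * dx  # grid approximation of the partition function
>>> f_numeric = -np.log(Z)
>>> f_analytic = -0.5 * np.log(2.0 * np.pi / (beta * K))
>>> print(f_numeric, f_analytic)  # the two should agree to many decimal places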

classmethod evenly_spaced_oscillators(n_states, n_samples_per_state, lower_O_k=1.0, upper_O_k=5.0, lower_k_k=1.0, upper_k_k=3.0)

Generate samples from evenly spaced harmonic oscillators.

Parameters:

n_states : int

Number of states

n_samples_per_state : int

Number of samples per state. The total number of samples n_samples will be equal to n_states * n_samples_per_state

lower_O_k : float, optional, default=1.0

Lower bound of O_k values

upper_O_k : float, optional, default=5.0

Upper bound of O_k values

lower_k_k : float, optional, default=1.0

Lower bound of K_k values

upper_k_k : float, optional, default=3.0

Upper bound of K_k values

Returns:

name : str

Name of testsystem

testsystem : TestSystem

The testsystem object

x_n : np.ndarray, shape=(n_samples)

Coordinates of the samples

u_kn : np.ndarray, shape=(n_states, n_samples)

Reduced potential energies

N_k : np.ndarray, shape=(n_states)

Number of samples drawn from each state

s_n : np.ndarray, shape=(n_samples)

State of origin of each sample
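
A brief usage sketch of this classmethod (the argument values are arbitrary illustrations; the unpacking order follows the Returns list above):

>>> from pymbar.testsystems.harmonic_oscillators import HarmonicOscillatorsTestCase
>>> name, testsystem, x_n, u_kn, N_k, s_n = HarmonicOscillatorsTestCase.evenly_spaced_oscillators(5, 100)
>>> print(u_kn.shape)  # (5, 500): n_states x (n_states * n_samples_per_state)
>>> print(N_k)         # 100 samples drawn from each of the 5 states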

sample(N_k=[10, 20, 30, 40, 50], mode='u_kn', seed=None)

Draw samples from the distribution.

Parameters:

N_k : np.ndarray, int

number of samples per state

mode : str, optional, default='u_kn'

If 'u_kln', return K x K x N_max matrix where u_kln[k,l,n] is the reduced potential of sample n from state k evaluated at state l. If 'u_kn', return K x N_tot matrix where u_kn[k,n] is the reduced potential of sample n (in concatenated indexing) evaluated at state k.

seed : int, optional, default=None

Provides control over the random seed for replicability.

Returns:

if mode == 'u_kn':

x_n : np.ndarray, shape=(n_states*n_samples), dtype=float

x_n[n] is sample n (in concatenated indexing)

u_kn : np.ndarray, shape=(n_states, n_states*n_samples), dtype=float

u_kn[k,n] is the reduced potential of sample n (in concatenated indexing) evaluated at state k.

N_k : np.ndarray, shape=(n_states), dtype=int32

N_k[k] is the number of samples generated from state k

s_n : np.ndarray, shape=(n_samples), dtype=int

s_n[n] is the state of origin of sample x_n[n]

if mode == 'u_kln':

x_kn : np.ndarray, shape=(n_states, n_samples), dtype=float

1D harmonic oscillator positions

u_kln : np.ndarray, shape=(n_states, n_states, n_samples), dtype=float

u_kln[k,l,n] is the reduced potential of sample n from state k evaluated at state l.

N_k : np.ndarray, shape=(n_states), dtype=int32

N_k[k] is the number of samples generated from state k
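
To relate the two layouts described above: in 'u_kln' mode the samples drawn from state k are assumed to occupy the first N_k[k] entries of the last axis (the usual pymbar padding convention), so they can be concatenated into the 'u_kn' layout. A minimal plain-numpy sketch follows; pymbar also provides a converter for this (kln_to_kn in pymbar.utils), which should be preferred where available.

>>> import numpy as np
>>> from pymbar.testsystems.harmonic_oscillators import HarmonicOscillatorsTestCase
>>> testcase = HarmonicOscillatorsTestCase()
>>> x_kn, u_kln, N_k = testcase.sample(N_k=[10, 20, 30, 40, 50], mode='u_kln')
>>> u_kn = np.concatenate([u_kln[k, :, :N_k[k]] for k in range(len(N_k))], axis=1)
>>> print(u_kn.shape)  # (5, 150): n_states x N_tot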

pymbar.testsystems.timeseries.correlated_timeseries_example(N=10000, tau=5.0, seed=None)

Generate synthetic timeseries data with known correlation time.

Parameters:

N : int, optional

length (in number of samples) of timeseries to generate

tau : float, optional

correlation time (in number of samples) for timeseries

seed : int, optional

If not None, specify the numpy random number seed.

Returns:

A_t : np.ndarray, shape=(N), dtype=float

Timeseries data with an exponential autocorrelation time of tau.

Notes

Synthetic timeseries generated using bivariate Gaussian process described by Janke (Eq. 41 of Ref. [1]).

As noted in Eqs. 45-46 of Ref. [1], the true integrated autocorrelation time is

tau_int = (1/2) coth(1 / (2 tau)) = (1/2) (1 + rho) / (1 - rho), with rho = exp(-1/tau)

which, for tau >> 1, is approximated by

tau_int = tau + 1/(12 tau) + O(1/tau^3)

so for tau >> 1, tau_int is approximately equal to the specified exponential correlation time tau.
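
The relation quoted above is easy to check numerically; the following is a plain-numpy sketch comparing the exact expression with the large-tau expansion:

>>> import numpy as np
>>> for tau in [1.0, 5.0, 50.0]:
...     rho = np.exp(-1.0 / tau)                          # lag-1 autocorrelation of the process
...     tau_int = 0.5 * (1.0 + rho) / (1.0 - rho)         # exact expression
...     print(tau, tau_int, tau + 1.0 / (12.0 * tau))     # exact vs. large-tau expansion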

References

[1] Janke W. Statistical analysis of simulations: Data correlations and error estimation. In 'Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms'. NIC Series, Vol. 10, pages 423-445, 2002.

Examples

Generate a timeseries of length 10000 with correlation time of 10.

>>> A_t = correlated_timeseries_example(N=10000, tau=10.0)

Generate an uncorrelated timeseries of length 1000.

>>> A_t = correlated_timeseries_example(N=1000, tau=1.0)

Generate a correlated timeseries with correlation time longer than the length.

>>> A_t = correlated_timeseries_example(N=1000, tau=2000.0)

class pymbar.testsystems.exponential_distributions.ExponentialTestCase(rates=[1, 2, 3, 4, 5], beta=1.0)

Test cases using exponential distributions.

Examples

Generate energy samples with default parameters.

>>> testcase = ExponentialTestCase()
>>> [x_kn, u_kln, N_k] = testcase.sample()

Retrieve analytical properties.

>>> analytical_means = testcase.analytical_means()
>>> analytical_variances = testcase.analytical_variances()
>>> analytical_standard_deviations = testcase.analytical_standard_deviations()
>>> analytical_free_energies = testcase.analytical_free_energies()
>>> analytical_x_squared = testcase.analytical_x_squared()

Generate energy samples with default parameters in one line.

>>> [x_kn, u_kln, N_k] = ExponentialTestCase().sample()

Generate energy samples with specified parameters.

>>> testcase = ExponentialTestCase(rates=[1., 2., 3., 4., 5.])
>>> [x_kn, u_kln, N_k] = testcase.sample(N_k=[10, 20, 30, 40, 50])

Test sampling in different output modes.

>>> [x_kn, u_kln, N_k] = testcase.sample(N_k=[10, 20, 30, 40, 50], mode='u_kln')
>>> [x_n, u_kn, N_k, s_n] = testcase.sample(N_k=[10, 20, 30, 40, 50], mode='u_kn')

Generate test case with exponential distributions.

Parameters:

rates : np.ndarray, float, shape=(n_states)

Rate parameters (e.g. lambda) for each state.

beta : float, optional, default=1.0

Inverse temperature.

Notes

We assume potentials of the form U(x) = lambda x.

analytical_free_energies()

Return the dimensionless free energies: f = -ln(Z).
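
Assuming the samples are restricted to x >= 0 (the support of a standard exponential distribution), the partition function is Z = int_0^inf exp(-beta lambda x) dx = 1 / (beta lambda), so f = -ln(Z) = ln(beta lambda). A small numpy sketch follows; since absolute free energies may be normalized differently by the test case, compare differences between states.

>>> import numpy as np
>>> beta = 1.0
>>> rates = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
>>> f_k = np.log(beta * rates)   # f = -ln(Z) = ln(beta * lambda)
>>> print(f_k - f_k[0])          # free energy differences relative to the first state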

classmethod evenly_spaced_exponentials(n_states, n_samples_per_state, lower_rate=1.0, upper_rate=3.0)

Generate samples from evenly spaced exponential distributions.

Parameters:

n_states : int

Number of states

n_samples_per_state : int

Number of samples per state. The total number of samples n_samples will be equal to n_states * n_samples_per_state

lower_rate : float, optional, default=1.0

Lower bound of rate values

upper_rate : float, optional, default=3.0

Upper bound of rate values

Returns:

name : str

Name of testsystem

testsystem : TestSystem

The testsystem object

x_n : np.ndarray, shape=(n_samples)

Coordinates of the samples

u_kn : np.ndarray, shape=(n_states, n_samples)

Reduced potential energies

N_k : np.ndarray, shape=(n_states)

Number of samples drawn from each state

s_n : np.ndarray, shape=(n_samples)

State of origin of each sample

sample(N_k=(10, 20, 30, 40, 50), mode='u_kln', seed=None)

Draw samples from the distribution.

Parameters:

N_k : np.ndarray, int

number of samples per state

mode : str, optional, default='u_kln'

If 'u_kln', return K x K x N_max matrix where u_kln[k,l,n] is the reduced potential of sample n from state k evaluated at state l. If 'u_kn', return K x N_tot matrix where u_kn[k,n] is the reduced potential of sample n (in concatenated indexing) evaluated at state k.

seed : int, optional, default=None

Provides control over the random seed for replicability.

Returns:

if mode == 'u_kn':

x_n : np.ndarray, shape=(n_states*n_samples), dtype=float

x_n[n] is sample n (in concatenated indexing)

u_kn : np.ndarray, shape=(n_states, n_states*n_samples), dtype=float

u_kn[k,n] is the reduced potential of sample n (in concatenated indexing) evaluated at state k.

N_k : np.ndarray, shape=(n_states), dtype=int32

N_k[k] is the number of samples generated from state k

s_n : np.ndarray, shape=(n_samples), dtype=int

s_n[n] is the state of origin of sample x_n[n]

if mode == 'u_kln':

x_kn : np.ndarray, shape=(n_states, n_samples), dtype=float

Samples drawn from each exponential distribution

u_kln : np.ndarray, shape=(n_states, n_states, n_samples), dtype=float

u_kln[k,l,n] is the reduced potential of sample n from state k evaluated at state l.

N_k : np.ndarray, shape=(n_states), dtype=int32

N_k[k] is the number of samples generated from state k

pymbar.testsystems.gaussian_work.gaussian_work_example(N_F=200, N_R=200, mu_F=2.0, DeltaF=None, sigma_F=1.0, seed=None)

Generate samples from forward and reverse Gaussian work distributions.

Parameters:

N_F : int, optional

number of forward measurements (default: 200)

N_R : int, optional

number of reverse measurements (default: 200)

mu_F : float, optional

mean of forward work distribution (default: 2.0)

DeltaF : float, optional

the free energy difference, which can be specified instead of mu_F (default: None)

sigma_F : float, optional

standard deviation of the forward work distribution (default: 1.0)

seed : int, optional

If not None, specify the numpy random number seed. Old state is restored after completion.

Returns:

w_F : np.ndarray, dtype=float

forward work values

w_R : np.ndarray, dtype=float

reverse work values

Notes

By the Crooks fluctuation theorem (CFT), the forward and backward work distributions are related by

P_R(-w) = P_F(w) exp[DeltaF - w]

If the forward distribution is Gaussian with mean mu_F and std dev sigma_F, then

P_F(w) = (2 pi)^{-1/2} sigma_F^{-1} exp[-(w - mu_F)^2 / (2 sigma_F^2)]

With some algebra, we then find the corresponding mean and std dev of the reverse distribution are

mu_R = - mu_F + sigma_F^2

sigma_R = sigma_F

where all quantities are in reduced units (e.g. divided by kT).

Note that mu_F and Delta F are not independent! By the Zwanzig relation,

E_F[exp(-w)] = int dw exp(-w) P_F(w) = exp[-Delta F]

which, with some integration, gives

Delta F = mu_F - sigma_F^2 / 2

which can be used to determine either mu_F or DeltaF.
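
The sketch below checks the relation Delta F = mu_F - sigma_F^2 / 2 on synthetic forward work values using exponential averaging (the Zwanzig/EXP estimator). It is a plain-numpy illustration, not part of the library's own examples; since the forward work values are Gaussian with mean mu_F = 2 and std dev sigma_F = 1, the estimate should fluctuate around 1.5.

>>> import numpy as np
>>> from pymbar.testsystems.gaussian_work import gaussian_work_example
>>> mu_F, sigma_F = 2.0, 1.0
>>> w_F, w_R = gaussian_work_example(N_F=5000, N_R=5000, mu_F=mu_F, sigma_F=sigma_F, seed=0)
>>> DeltaF_exp = -np.log(np.mean(np.exp(-w_F)))  # Zwanzig / EXP estimate from forward work
>>> print(DeltaF_exp, mu_F - sigma_F**2 / 2)     # estimate vs. analytical value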

Examples

Generate work values with default parameters.

>>> [w_F, w_R] = gaussian_work_example()

Generate 50 forward work values and 70 reverse work values.

>>> [w_F, w_R] = gaussian_work_example(N_F=50, N_R=70)

Generate work values specifying the work distribution parameters.

>>> [w_F, w_R] = gaussian_work_example(mu_F=3.0, sigma_F=2.0)

Generate work values specifying the work distribution parameters, specifying free energy difference instead of mu_F.

>>> [w_F, w_R] = gaussian_work_example(mu_F=None, DeltaF=3.0, sigma_F=2.0)

Generate work values with known seed to ensure reproducibility for testing.

>>> [w_F, w_R] = gaussian_work_example(seed=0)