2.17. Analysis Notebook#

A copy of this notebook is run to analyse the molecular dynamics simulations. The type of MD simulation is specified in the Snakemake rule as a parameter, such that it is accessible via: snakemake.params.method.

There are various additional analysis steps, that are included in the notebook, but are not part of the paper. To turn these on, set the beta_run parameter to True.

There are also some commented out lines in the notebook. These are mainly for the purpose of debugging. Some of them are for interactively exploring the 3d structure of the system. These don’t work as part of the automated snakemake workflow, but can be enabled when running a notebook interactively.

#Check if we should use shortened trajectories for analysis.
if snakemake.config["shortened"]:
    print("Using shortened trajectories and dihedrals. This only works if these were created previously!")
    if not (os.path.exists(snakemake.params.traj_short) 
            and os.path.exists(snakemake.params.dihedrals_short)
            and os.path.exists(snakemake.params.dPCA_weights_MC_short)
            and os.path.exists(snakemake.params.weights_short)
           ):
        raise FileNotFoundError("Shortened trajectories and dihedrals files do not exist, but config value is set to use shortened files! Switch off the use of shortenend files and first analyse this simulation using the full trajectory!")
    else:
        use_shortened = True
else:
    use_shortened = False

# Imports
import matplotlib
import mdtraj as md
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri
import matplotlib.image as mpimg

# set matplotlib font sizes
SMALL_SIZE = 9
MEDIUM_SIZE = 11
BIGGER_SIZE = 13

plt.rc('font', size=MEDIUM_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=BIGGER_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title

DPI = 600

import scipy.cluster.hierarchy
from scipy.spatial.distance import squareform
import pandas as pd

sys.path.append(os.getcwd())
import src.dihedrals
import src.pca
import src.noe
import src.Ring_Analysis
import src.stats
from src.pyreweight import reweight
from src.utils import json_load, pickle_dump
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import PCA
import nglview as nv
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
import py_rdl
import seaborn as sns

IPythonConsole.molSize = (900, 300)  # (450, 150)
IPythonConsole.drawOptions.addStereoAnnotation = True
IPythonConsole.drawOptions.annotationFontScale = 1.5
import tempfile
import io
import svgutils.transform as sg
import svgutils.compose as sc
import scipy.stats as stats
from IPython.display import display, Markdown

# Can set a stride to make prelim. analysis faster. for production, use 1 (use all MD frames)
stride = int(snakemake.config["stride"])
print(f"Using stride {stride} to analyse MD simulations.")
# Perform additional analysis steps (e.g. compute structural digits)
beta_run = False

# Analysing compound
compound_index = int(snakemake.wildcards.compound_dir)
simtime = float(snakemake.wildcards.time)

# Storage for overview figure
final_figure_axs = []

Using stride 1 to analyse MD simulations.

2.17.1. Compound details#

display(Markdown(f"This notebook refers to compound {compound_index}."))

compound = json_load(snakemake.input.parm)
multi = compound.multi
if multi:
    display(Markdown(
        "According to the literature reference, there are two distinct structures in solution."
    ))
else:
    display(Markdown(
        "According to the literature reference, there is only one distinct structure in solution."
    ))
display(Markdown(f"""The sequence of the compound is **{compound.sequence}**. \n
A 2d structure of the compound is shown below."""))

This notebook refers to compound 55.

According to the literature reference, there is only one distinct structure in solution.

The sequence of the compound is cyclo-[Ser-Tyr-Ser-Met-Glu-His-Phe-Arg-Trp-Gly].

A 2d structure of the compound is shown below.

2.17.2. Simulation details#

# TODO: change notebook that it supports use of a shortened trajectory file

# only load protein topology
topo = md.load_frame(snakemake.input.traj, 0, top=snakemake.input.top)
protein = topo.topology.select("protein or resname ASH")
display(Markdown(f"The following atom numbers are part of the protein: {protein}"))

The following atom numbers are part of the protein: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166]

# Stereo check 1-frame trajectory to tmp-pdb file
t_stereo_check = topo.restrict_atoms(topo.topology.select("protein or resname ASH"))
tf = tempfile.NamedTemporaryFile(delete=False)
# tf.name
t_stereo_check.save_pdb(tf.name)

# Get reference mol
mol_ref = Chem.MolFromMol2File(
    snakemake.input.ref_mol,
    removeHs=False,
)

# Get 1st frame pdb from tempfile
post_eq_mol = Chem.MolFromPDBFile(
    tf.name,
    removeHs=False,
    sanitize=False,
)

# could compare smiles to automate the stereo-check. Problem: mol2 reference file has wrong bond orders
# (amber does not write those correctly). The ref-pdb file cannot be read b/c geometry is not optimized.
# This leads to funky valences in rdkit. The post-eq pdb file reads fine but then charges etc. dont match
# with the reference (b/c of wrong bond orders). But can manually check that all stereocentres are correct (below)
Chem.CanonSmiles(Chem.MolToSmiles(post_eq_mol)) == Chem.CanonSmiles(
    Chem.MolToSmiles(mol_ref)
)
display(Markdown("""Following we compare an annotated 2d structure of the compound's starting topology, with the 
                 topology post equilibration"""))

Following we compare an annotated 2d structure of the compound's starting topology, with the topology post equilibration

post_eq_mol.RemoveAllConformers()
display(Markdown("2d structure of the compound post equilibration:"))
post_eq_mol

2d structure of the compound post equilibration:

../_images/28159d44aa267024_GaMD_processed_12_1.png

mol_ref.RemoveAllConformers()
display(Markdown("2d structure of the compound reference topology:"))
mol_ref

2d structure of the compound reference topology:

../_images/28159d44aa267024_GaMD_processed_13_1.png

# load trajectory
display(Markdown("Now we load the MD trajectory."))
if not use_shortened:
    t = md.load(
        snakemake.input.traj, top=snakemake.input.top, atom_indices=protein, stride=stride
    )  # added strideint for GaMD 2k
    print(t)
    # Remove solvent from trajectory
    t = t.restrict_atoms(t.topology.select("protein or resname ASH"))
    t = t.superpose(t, 0)

    # for GaMD, skip equlibration...
    if snakemake.params.method == "GaMD":
        weight_lengths = np.loadtxt(snakemake.input.weights)
        weight_lengths = int(len(weight_lengths))
        frames_start = int(t.n_frames - weight_lengths)
        t = t[int(frames_start / stride) :]  # added 13000 instead of 26000 for 2k
    else:
        frames_start = 0
    print(t)
else:
    stride = 1  # set stride to 1 for shortened files!
    t = md.load(
        snakemake.params.traj_short, top=snakemake.input.top, atom_indices=protein, stride=1
    )  # added strideint for GaMD 2k
    t = t.restrict_atoms(t.topology.select("protein or resname ASH"))
    t = t.superpose(t, 0)

Now we load the MD trajectory.

<mdtraj.Trajectory with 513000 frames, 167 atoms, 10 residues, and unitcells>
<mdtraj.Trajectory with 500000 frames, 167 atoms, 10 residues, and unitcells>

The simulation type is GaMD, 2000 ns. The simulation was performed in H2O.

There are a total of 500000 frames available to analyse.

# Create a short trajectory & weights if working with the full trajectory
if not use_shortened:
    # determine stride to get 10k frames:
    stride_short = int(t.n_frames / 10000)
    if stride_short == 0:
        stride_short = 1

    # save short trajectory to file
    t[::stride_short].save_netcdf(snakemake.params.traj_short)
    
    # load weights for GaMD
    if snakemake.params.method != "cMD":
        weight_data = np.loadtxt(snakemake.input.weights)
        weight_data = weight_data[::stride]
        #create shortened weights
        np.savetxt(snakemake.params.weights_short, weight_data[::stride_short])
else:
    # load shortened weights for GaMD
    if snakemake.params.method != "cMD":
        weight_data = np.loadtxt(snakemake.params.weights_short)

# this determines a cutoff for when we consider cis/trans conformers separately.
# only relevant if 2 sets of NOE values present.
# t.n_frames / 1000 -> 0.1% of frames need to be cis/trans to consider both forms.
CIS_TRANS_CUTOFF = int(t.n_frames / 1000)

However, for some of the analysis steps below, only 1% of these frames have been used to ensure better rendering in the browser.

Loading BokehJS ...

2.17.3. Convergence of the simulation#

2.17.3.1. RMSD#

To check for convergence of the simulation, we can look at the root mean squared deviation of the atomic positions over the course of the simulation.

# compute rmsd for different atom types
rmsds = md.rmsd(t, t, 0) * 10
bo = topo.topology.select("protein and (backbone and name O)")
ca = topo.topology.select("name CA")
rmsds_ca = md.rmsd(t, t, 0, atom_indices=ca) * 10  # Convert to Angstrom!
rmsds_bo = md.rmsd(t, t, 0, atom_indices=bo) * 10  # Convert to Angstrom!

rmsds = rmsds[::100]
rmsds_ca = rmsds_ca[::100]
rmsds_bo = rmsds_bo[::100]

# Create x data (simulation time)
x = [x / len(rmsds_ca) * simtime for x in range(0, len(rmsds_ca))]

# Make plot
fig = figure(
    plot_width=600,
    plot_height=400,
    title="RMSD of different atom types",
    x_axis_label="Simulation time in ns",
    y_axis_label="RMSD in angstrom, relative to first frame",
    sizing_mode="stretch_width",
    toolbar_location=None,
)
fig.line(
    x,
    rmsds,
    line_width=2,
    line_alpha=0.6,
    legend_label="all atoms",
    color="black",
    muted_alpha=0.1,
)
fig.line(
    x,
    rmsds_ca,
    line_width=2,
    line_alpha=0.6,
    legend_label="C-alpha atoms",
    color="blue",
    muted_alpha=0.1,
)
fig.line(
    x,
    rmsds_bo,
    line_width=2,
    line_alpha=0.6,
    legend_label="backbone O atoms",
    color="orange",
    muted_alpha=0.1,
)
fig.legend.click_policy = "mute"  #'hide'
show(fig)
# TODO: save rmsds as png, instead of manual screenshot https://docs.bokeh.org/en/latest/docs/user_guide/export.html

2.17.3.2. Dihedral angles#

if multi is not None:
    multi = {v: k for k, v in multi.items()}
    multiple = True
    distinction = compound.distinction
    print("Multiple compounds detected")
else:
    multiple = False
    pickle_dump(snakemake.output.multiple, multiple)
if multiple:  # if Compound.cistrans:
    ca_c = t.top.select(f"resid {distinction[0]} and name CA C")
    n_ca_next = t.top.select(f"resid {distinction[1]} and name N CA")
    omega = np.append(ca_c, n_ca_next)
    t_omega_rad = md.compute_dihedrals(t, [omega])
    t_omega_deg = np.abs(np.degrees(t_omega_rad))
    plt.plot(t_omega_deg)
    plt.hlines(90, 0, t.n_frames, color="red")
    plt.xlabel("Frames")
    plt.ylabel("Omega 0-1 [°]")
    plt.title(f"Dihedral angle over time. Compound {compound_index}")
    cis = np.where(t_omega_deg <= 90)[0]
    trans = np.where(t_omega_deg > 90)[0]
    pickle_dump(snakemake.output.multiple, (cis, trans))
    # t[trans]
# TODO: save dihedrals as png

resnames = []
for i in range(0, t.n_residues):
    resnames.append(t.topology.residue(i))

*_, omega = src.dihedrals.getDihedrals(t)
omega_deg = np.abs(np.degrees(omega))

omega_deg = omega_deg[::100]

simtime = float(snakemake.wildcards.time)

colors = src.utils.color_cycle()

# Create x data (simulation time)
x = [x / len(omega_deg) * simtime for x in range(0, len(omega_deg))]

# Make plot
fig = figure(
    plot_width=600,
    plot_height=400,
    title="Omega dihedral angles over time",
    x_axis_label="Simulation time in ns",
    y_axis_label="Dihedral angle in ˚",
    sizing_mode="stretch_width",
    toolbar_location=None,
)

for res, i, col in zip(resnames, range(len(resnames)), colors):
    fig.line(
        x,
        omega_deg[:, i],
        line_width=2,
        line_alpha=0.6,
        legend_label=str(res),
        color=col,
        muted_alpha=0.1,
    )

fig.legend.click_policy = "mute"  #'hide'
show(fig)

# Compute dihedral angles [Phi] [Psi] [Omega]
phi, psi, omega = src.dihedrals.getDihedrals(t)
if beta_run:
    # Print mean of dihedral angles [Phi] [Psi] [Omega]
    print(
        np.degrees(src.dihedrals.angle_mean(phi)),
        np.degrees(src.dihedrals.angle_mean(psi)),
        np.degrees(src.dihedrals.angle_mean(omega)),
    )

# Plot ramachandran plot for each amino acid
if beta_run:
    fig, axs = plt.subplots(int(np.ceil(len(phi.T) / 5)), 5, sharex="all", sharey="all")
    fig.set_size_inches(16, 4)
    motives = []
    i = 0
    for phi_i, psi_i in zip(np.degrees(phi.T), np.degrees(psi.T)):
        weights_phi_psi = reweight(
            np.column_stack((phi_i, psi_i)),
            None,
            "amdweight_MC",
            weight_data,
        )
        axs.flatten()[i].scatter(
            phi_i, psi_i, s=0.5, c=weights_phi_psi, vmin=0, vmax=8, cmap="Spectral_r"
        )
        axs.flatten()[i].set_title(i)
        motives.append(src.dihedrals.miao_ramachandran(phi_i, psi_i))
        i += 1
    fig.show()

if beta_run:
    # compute most common motives
    combined_motives = np.column_stack((motives))
    combined_motives = ["".join(test) for test in combined_motives]
    from collections import Counter

    c = Counter(combined_motives)
    motive_percentage = [
        (i, c[i] / len(combined_motives) * 100.0) for i, count in c.most_common()
    ]
    # 10 most common motives and percentage of frames
    print(motive_percentage[:10])

if beta_run:
    # Get indicies of most common motives
    combined_motives = np.array(combined_motives)
    idxs = []
    values = [i[0] for i in c.most_common(10)]
    for i, v in enumerate(values):
        idxs.append(np.where(combined_motives == v)[0])

2.17.4. Dimensionality Reductions#

The simulation trajectories contain the positions of all atoms. This high dimensional data (3*N_atoms) is too complicated to analyse by itself. To get a feeling of the potential energy landscape we need to apply some kind of dimensionality reduction. Here, we apply the PCA (Principal Component Analysis) method.

2.17.4.1. Cartesian PCA#

Details about cartesian PCA

c_pca, reduced_cartesian = src.pca.make_PCA(t, "cartesian")

# reweighting:
if snakemake.params.method == "cMD":
    c_weights = reweight(reduced_cartesian, None, "noweight")
else:
    c_weights = reweight(
        reduced_cartesian, None, "amdweight_MC", weight_data
    )

if multiple:
    fig, axs = plt.subplots(1, 2, sharex="all", sharey="all", figsize=(6.7323, 3.2677))
    axs[0] = src.pca.plot_PCA(
        reduced_cartesian,
        "cartesian",
        compound_index,
        c_weights,
        "Energy [kcal/mol]",
        fig,
        axs[0],
        explained_variance=c_pca.explained_variance_ratio_[:2],
    )
    axs[1] = src.pca.plot_PCA_citra(
        reduced_cartesian[cis],
        reduced_cartesian[trans],
        "cartesian",
        compound_index,
        [multi["cis"] + " (cis)", multi["trans"] + " (trans)"],
        fig,
        axs[1],
    )

else:
    fig, ax = plt.subplots(figsize=(3.2677, 3.2677))
    ax = src.pca.plot_PCA(
        reduced_cartesian,
        "cartesian",
        compound_index,
        c_weights,
        "Energy [kcal/mol]",
        fig,
        ax,
        explained_variance=c_pca.explained_variance_ratio_[:2],
    )

../_images/28159d44aa267024_GaMD_processed_33_0.png

2.17.4.2. Pairwise distances PCA#

pd_pca, reduced_pd = src.pca.make_PCA(t, "pairwise_N_O")

# reweighting:
if snakemake.params.method == "cMD":
    p_weights = reweight(reduced_pd, None, "noweight")
else:
    p_weights = reweight(
        reduced_pd, None, "amdweight_MC", weight_data
    )

if multiple:
    fig, axs = plt.subplots(1, 2, sharex="all", sharey="all", figsize=(6.7323, 3.2677))
    axs[0] = src.pca.plot_PCA(
        reduced_pd,
        "pairwise",
        compound_index,
        p_weights,
        "Energy [kcal/mol]",
        fig,
        axs[0],
        explained_variance=pd_pca.explained_variance_ratio_[:2],
    )
    axs[1] = src.pca.plot_PCA_citra(
        reduced_pd[cis],
        reduced_pd[trans],
        "pairwise",
        compound_index,
        [multi["cis"] + " (cis)", multi["trans"] + " (trans)"],
        fig,
        axs[1],
    )
else:
    fig, ax = plt.subplots(figsize=(3.2677, 3.2677))
    ax = src.pca.plot_PCA(
        reduced_pd,
        "pairwise",
        compound_index,
        p_weights,
        "Energy [kcal/mol]",
        fig,
        ax,
        explained_variance=pd_pca.explained_variance_ratio_[:2],
    )

../_images/28159d44aa267024_GaMD_processed_35_0.png

2.17.4.3. Dihedral PCA#

pca_d, reduced_dihedrals = src.pca.make_PCA(t, "dihedral")
reduced_dihedrals_full = src.dihedrals.getReducedDihedrals(t)

# save pca object & reduced dihedrals
pickle_dump(snakemake.output.dPCA, pca_d)
pickle_dump(snakemake.output.dihedrals, reduced_dihedrals_full)
if not use_shortened:
    pickle_dump(snakemake.params.dihedrals_short, reduced_dihedrals_full[::stride_short])

# reweighting:
if snakemake.params.method == "cMD":
    d_weights = reweight(reduced_dihedrals, None, "noweight")
else:
    d_weights = reweight(
        reduced_dihedrals, None, "amdweight_MC", weight_data
    )
if multiple:
    fig, axs = plt.subplots(1, 2, sharex="all", sharey="all", figsize=(6.7323, 3.2677))
    axs[0] = src.pca.plot_PCA(
        reduced_dihedrals,
        "dihedral",
        compound_index,
        d_weights,
        "Energy [kcal/mol]",
        fig,
        axs[0],
        explained_variance=pca_d.explained_variance_ratio_[:2],
    )
    axs[1] = src.pca.plot_PCA_citra(
        reduced_dihedrals[cis],
        reduced_dihedrals[trans],
        "dihedral",
        compound_index,
        [multi["cis"] + " (cis)", multi["trans"] + " (trans)"],
        fig,
        axs[1],
    )
    fig.savefig(snakemake.output.pca_dihe, dpi=DPI)
else:
    fig, ax = plt.subplots(figsize=(3.2677, 3.2677))
    ax = src.pca.plot_PCA(
        reduced_dihedrals,
        "dihedral",
        compound_index,
        d_weights,
        "Energy [kcal/mol]",
        fig,
        ax,
        explained_variance=pca_d.explained_variance_ratio_[:2],
    )
    fig.tight_layout()
    fig.savefig(snakemake.output.pca_dihe, dpi=DPI)
final_figure_axs.append(sg.from_mpl(fig, savefig_kw={"dpi": DPI}))
pickle_dump(snakemake.output.dPCA_weights_MC, d_weights)
if not use_shortened:
    pickle_dump(snakemake.params.dPCA_weights_MC_short, d_weights[::stride_short])

../_images/28159d44aa267024_GaMD_processed_37_0.png

if beta_run:
    # Plot structural digits on top of dPCA
    fig, axs = plt.subplots(2, 5, sharex="all", sharey="all")
    fig.set_size_inches(12, 8)
    for i in range(10):
        axs.flatten()[i] = src.pca.plot_PCA(
            reduced_dihedrals,
            "dihedral",
            compound_index,
            d_weights,
            "Energy [kcal/mol]",
            fig,
            axs.flatten()[i],
            cbar_plot="nocbar",
            explained_variance=pca_d.explained_variance_ratio_[:2],
        )
        axs.flatten()[i].scatter(
            reduced_dihedrals[idxs[i]][:, 0],
            reduced_dihedrals[idxs[i]][:, 1],
            label=values[i],
            s=0.2,
            marker=".",
            color="black",
        )
        axs.flatten()[i].set_title(f"{values[i]}: {motive_percentage[i][1]:.2f}%")
    fig.tight_layout()

2.17.4.4. TSNE#

# TSNE dimensionality reduction
# TSNE
if not use_shortened:
    plot_stride = 100
else:
    plot_stride = 1
cluster_stride = plot_stride  # 125 previously
dihe = src.dihedrals.getReducedDihedrals(t)
tsne = TSNE(n_components=2, verbose=0, perplexity=50, n_iter=2000, random_state=42)
tsne_results = tsne.fit_transform(dihe[::cluster_stride, :])  # 250
plt.scatter(tsne_results[:, 0], tsne_results[:, 1])
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.show()

/biggin/b147/univ4859/research/snakemake_conda/b998fbb8f687250126238eb7f5e2e52c/lib/python3.7/site-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/biggin/b147/univ4859/research/snakemake_conda/b998fbb8f687250126238eb7f5e2e52c/lib/python3.7/site-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,

../_images/28159d44aa267024_GaMD_processed_41_1.png

2.17.4.5. Shape analysis - principal moments of inertia#

inertia_tensor = md.compute_inertia_tensor(t)
principal_moments = np.linalg.eigvalsh(inertia_tensor)

# Compute normalized principal moments of inertia
npr1 = principal_moments[:, 0] / principal_moments[:, 2]
npr2 = principal_moments[:, 1] / principal_moments[:, 2]
mol_shape = np.stack((npr1, npr2), axis=1)

# Reweighting
if snakemake.params.method == "cMD":
    mol_shape_weights = reweight(mol_shape, None, "noweight")
else:
    mol_shape_weights = reweight(
        mol_shape, None, "amdweight_MC", weight_data
    )

# save
pickle_dump(snakemake.output.NPR_shape_data, mol_shape)
pickle_dump(snakemake.output.NPR_shape_weights, mol_shape_weights)

# Plot
x = mol_shape[:, 0]
y = mol_shape[:, 1]
v = mol_shape_weights
# create a triangulation out of these points
T = tri.Triangulation(x, y)

fig, ax = plt.subplots(figsize=(3.2677, 3.2677))

# plot the contour
# plt.tricontourf(x,y,T.triangles,v)
scat = ax.scatter(
    mol_shape[:, 0],
    mol_shape[:, 1],
    s=0.5,
    c=mol_shape_weights,
    cmap="Spectral_r",
    vmin=0,
    vmax=8,
    rasterized=True,
)

# create the grid
corners = np.array([[1, 1], [0.5, 0.5], [0, 1]])
triangle = tri.Triangulation(corners[:, 0], corners[:, 1])

# creating the outline
refiner = tri.UniformTriRefiner(triangle)
outline = refiner.refine_triangulation(subdiv=0)

# creating the outline
refiner = tri.UniformTriRefiner(triangle)
trimesh = refiner.refine_triangulation(subdiv=2)

# plotting the mesh
ax.triplot(trimesh, "--", color="grey")
ax.triplot(outline, "k-")
ax.set_xlabel(r"$I_{1}/I_{3}$")
ax.set_ylabel("$I_{2}/I_{3}$")
ax.text(0, 1.01, "rod")
ax.text(0.75, 1.01, "sphere")
ax.text(0.52, 0.48, "disk")
ax.set_ylim(0.45, 1.05)  # 0.6
ax.set_xlim(-0.05, 1.08) # 1.13
ax.set_aspect(1.88)  # 1.13 / 0.6
ax.set_title('Shape analysis')

colorbar = fig.colorbar(scat, label="Energy [kcal/mol]", fraction=0.046, pad=0.04)

fig.tight_layout()
fig.savefig(snakemake.output.NPR_shape_plot, dpi=DPI)
# final_figure_axs.append(sg.from_mpl(fig, savefig_kw={"dpi": DPI}))

../_images/28159d44aa267024_GaMD_processed_44_0.png

2.17.4.6. Cremer pople analysis#

# load topology reference
# mol_ref = Chem.MolFromPDBFile(pdb_amber, removeHs=False, proximityBonding=True) #removeHs=True, proximityBonding=True)
mol_ref = Chem.MolFromMol2File(
    snakemake.input.ref_mol,
    removeHs=False,
)
mol_ref.RemoveAllConformers()
display(Markdown("Topology Reference:"))
mol_ref

Topology Reference:

../_images/28159d44aa267024_GaMD_processed_50_1.png

mol_ref.GetNumAtoms() == t.n_atoms

True

# Get Bond Set
bonds = []
for bond in mol_ref.GetBonds():
    bonds.append((bond.GetBeginAtom().GetIdx(), bond.GetEndAtom().GetIdx()))

cremerpople_store = []

data = py_rdl.Calculator.get_calculated_result(bonds)

ring_length = []
for urf in data.urfs:
    rcs = data.get_relevant_cycles_for_urf(urf)
    for rc in rcs:
        ring_length.append(
            len(src.Ring_Analysis.Rearrangement(mol_ref, list(rc.nodes)))
        )
max_ring = ring_length.index(max(ring_length))

# for urf in data.urfs:
urf = data.urfs[max_ring]
rcs = data.get_relevant_cycles_for_urf(urf)
for rc in rcs:
    ringloop = src.Ring_Analysis.Rearrangement(
        mol_ref, list(rc.nodes)
    )  # rearrange the ring atom order
    # src.Ring_Analysis.CTPOrder(mol_ref, list(rc.nodes), n_res=t.n_residues) ## this does not work...
    coord = t.xyz[:, ringloop]
    for i in range(t.n_frames):
        ccoord = src.Ring_Analysis.Translate(coord[i])
        qs, angle = src.Ring_Analysis.GetRingPuckerCoords(
            ccoord
        )  # get cremer-pople parameters
        qs.extend([abs(x) for x in angle])
        cremerpople_store.append(qs)  # flatten tuple/list to just 1d list...
        # coord = np.array([mol0.GetConformer(1).GetAtomPosition(atom) for atom in ringloop]) # get current ring atom coordinates
        # ccoord = RA.Translate(coord) # translate ring with origin as cetner
        # cremerpople = RA.GetRingPuckerCoords(ccoord) # get cremer-pople parameters

cremerpople_store = np.array(cremerpople_store)

# from sklearn.preprocessing import normalize

pca = PCA(n_components=2)
pca_input = cremerpople_store.reshape(t.n_frames, len(qs))
# normalize(cremerpople_store.reshape(t.n_frames, len(qs)))

cp_reduced_output = pca.fit_transform(pca_input)

if snakemake.params.method == "cMD":
    cp_weights = reweight(cp_reduced_output, None, "noweight")
else:
    cp_weights = reweight(
        cp_reduced_output, None, "amdweight_MC", weight_data
    )

ax = src.pca.plot_PCA(
    cp_reduced_output,
    "CP",
    compound_index,
    cp_weights,
    explained_variance=pca.explained_variance_ratio_[:2],
)

if multiple:
    src.pca.plot_PCA_citra(
        cp_reduced_output[cis],
        cp_reduced_output[trans],
        "CP",
        compound_index,
        label=None,
        fig=None,
        ax=None,
    )

../_images/28159d44aa267024_GaMD_processed_54_0.png

2.17.4.7. Comparison#

# produce a shared datasource with shared labels
if not use_shortened:
    plot_stride = 100
else:
    plot_stride = 1
reduced_dihedrals_t = reduced_dihedrals[::plot_stride]
reduced_pd_t = reduced_pd[::plot_stride]
mol_shape_t = mol_shape[::plot_stride]

# Either show cremer pople, or show shapes
show_cremer_pople = False

if show_cremer_pople:
    crepop_t = cp_reduced_output[::plot_stride]
    tmp_dict = {
        "dh_pc1": reduced_dihedrals_t[:, 0],
        "dh_pc2": reduced_dihedrals_t[:, 1],
        "pd_pc1": reduced_pd_t[:, 0],
        "pd_pc2": reduced_pd_t[:, 1],
        "tsne1": tsne_results[:, 0],
        "tsne2": tsne_results[:, 1],
        "cp1": crepop_t[:, 0],
        "cp2": crepop_t[:, 1],
    }
else:
    tmp_dict = {
        "dh_pc1": reduced_dihedrals_t[:, 0],
        "dh_pc2": reduced_dihedrals_t[:, 1],
        "pd_pc1": reduced_pd_t[:, 0],
        "pd_pc2": reduced_pd_t[:, 1],
        "tsne1": tsne_results[:, 0],
        "tsne2": tsne_results[:, 1],
        "npr1": mol_shape_t[:, 0],
        "npr2": mol_shape_t[:, 1],
    }
df = pd.DataFrame(tmp_dict)
source = ColumnDataSource(data=df)

# Linked plots in different representations
from bokeh.io import output_file, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, Label, LabelSet
from bokeh.plotting import figure
from bokeh.models import BooleanFilter, CDSView

TOOLS = "box_select,lasso_select,reset"

# create a new plot and add a renderer
left = figure(tools=TOOLS, plot_width=300, plot_height=300, title="Dihedral PCA")
left.dot("dh_pc1", "dh_pc2", source=source, selection_color="firebrick")

# create another new plot, add a renderer that uses the view of the data source
right = figure(
    tools=TOOLS, plot_width=300, plot_height=300, title="Pairwise NO distances"
)
right.dot("pd_pc1", "pd_pc2", source=source, selection_color="firebrick")

rightr = figure(
    tools=TOOLS, plot_width=300, plot_height=300, title="TSNE (of dihedral angles)"
)
rightr.dot("tsne1", "tsne2", source=source, selection_color="firebrick")

if show_cremer_pople:
    rightrr = figure(tools=TOOLS, plot_width=300, plot_height=300, title="Cremer-Pople")
    rightrr.dot("cp1", "cp2", source=source, selection_color="firebrick")
else:
    rightrr = figure(tools=TOOLS, plot_width=300, plot_height=300, title="PMI")
    rightrr.dot("npr1", "npr2", source=source, selection_color="firebrick")
    rightrr.line([0.5, 0, 1, 0.5], [0.5, 1, 1, 0.5], line_width=2, color="black")
    rightrr.line(
        [0.45, -0.05, 1.05, 0.45],
        [0.4, 1.1, 1.1, 0.4],
        line_width=2,
        color="white",
        line_alpha=0,
    )

    triangle = ColumnDataSource(
        data=dict(x=[0, 0.83, 0.44], y=[1, 1, 0.45], names=["rod", "sphere", "disk"])
    )

    labels = LabelSet(
        x="x",
        y="y",
        text="names",
        x_offset=0,
        y_offset=0,
        source=triangle,
        render_mode="canvas",
    )

    rightrr.add_layout(labels)

p = gridplot([[left, right, rightr, rightrr]], sizing_mode="stretch_width")
show(p)

2.17.5. DBSCAN-Clustering#

The following section provides details about the performed DBSCAN clustering. Detailed plots about parameter derivation for the clustering are hidden, but can be revealed.

# Derive epsilon for DBSCAN-clustering from data: epsilon = max distance between nearest neighbors
nbrs = NearestNeighbors(n_neighbors=2).fit(tsne_results)
distances, indices = nbrs.kneighbors(tsne_results)
epsilon = distances.max()
distances = np.sort(distances, axis=0)
distances = distances[:, 1]
plt.plot(distances)
plt.title("NN-distances in tsne plot")
plt.ylabel("NN-distance")
plt.show()

../_images/28159d44aa267024_GaMD_processed_61_0.png

# Perform DBSCAN-clustering with varying min_samples parameter
num_clusters = []
num_noise = []
for i in range(0, 200, 1):
    clustering = DBSCAN(eps=epsilon, min_samples=i).fit(tsne_results)
    labels = clustering.labels_
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = list(labels).count(-1)
    num_clusters.append(n_clusters)
    num_noise.append(n_noise)

# Drop all points following the first detection of 0 clusters
num_clusters = np.array(num_clusters)
cutoff = np.argmin(num_clusters > 0)
num_clusters = num_clusters[:cutoff]
# print(num_clusters)

x = np.arange(0, len(num_clusters))

# Fit polynomial to detect right-most plateau
if x.size > 0:
    mymodel = np.poly1d(np.polyfit(x, num_clusters, 8))

    deriv = mymodel.deriv()
    roots = deriv.roots

    # discard complex roots
    r_roots = roots[np.isreal(roots)].real

    # discard negative values
    r_roots = r_roots[r_roots >= 0]

    # discard values greater than x.max()
    r_roots = r_roots[r_roots <= x.max() - 3]

    # Take largest root
    if r_roots != []:
        min_samples = int(r_roots.max())
        print(f"min_samples = {min_samples} was selected as parameter for clustering")
    else:
        min_samples = 15
        print(
            "Caution! min samples parameter was selected as fixed value b/c automatic determination failed. specify the parameter manually in the config!"
        )
else:
    min_samples = 15
# If config overrides, use config value:
if snakemake.wildcards.index in snakemake.config["cluster_conf"]:
    min_samples = int(snakemake.config["cluster_conf"][snakemake.wildcards.index])
    print(
        f"Override: Use min_samples={min_samples} instead of the above determined parameter"
    )

min_samples = 50 was selected as parameter for clustering

/biggin/b147/univ4859/research/snakemake_conda/b998fbb8f687250126238eb7f5e2e52c/lib/python3.7/site-packages/ipykernel_launcher.py:18: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.

# Alternative: determine min_samples parameter by deciding for a fix fraction of points to classify as noise
# set cutoff as threshold percent of noise
threshhold = 0.05
num_noise = np.array(num_noise)
min_samples = np.argmin(num_noise < len(tsne_results) * threshhold)
# noise_cutoff
# min_samples = 8
display(f"Override: Use min_samples={min_samples} instead of the above determined parameter")

'Override: Use min_samples=47 instead of the above determined parameter'

plt.scatter(x, num_clusters, label="clustering with different min_sample parm.")
if x.size > 0:
    plt.plot(x, mymodel(x), label="poly-fit")
    plt.vlines(
        min_samples,
        2,
        num_clusters.max(),
        color="red",
        label="selected min_sample paramter",
    )
    plt.plot(x, deriv(x), label="derivative of poly-fit")
plt.legend(loc="lower left")
plt.title("Determining min_samples parameter for clustering")
plt.xlabel("min_samples parameter")
plt.ylabel("Number of clusters observed")
plt.savefig(snakemake.output.cluster_min_samp, dpi=DPI)

../_images/28159d44aa267024_GaMD_processed_65_0.png

plt.plot(num_noise, label="Number of points classified as noise")
plt.xlabel("min_samples parameter")
plt.ylabel("Number of points classified as noise")
plt.title("Number of points classified as noise")
plt.show()

../_images/28159d44aa267024_GaMD_processed_66_0.png

# Perform clustering for selected min_samples parameter
clustering = DBSCAN(eps=epsilon, min_samples=min_samples).fit(tsne_results)

threshhold = 0.01  # 0.05

n_clusters = len(set(clustering.labels_)) - (1 if -1 in clustering.labels_ else 0)
print(f"There are {n_clusters} clusters")

cluster_points = []
cluster_label_filter = []
cluster_percentage = []

cluster_labels_sorted_by_population = list(dict.fromkeys(sorted(clustering.labels_, key=list(clustering.labels_).count, reverse=True)))

plt.figure(figsize=(3.2677, 3.2677))
for cluster in cluster_labels_sorted_by_population:
    if cluster != -1:
        if len(clustering.labels_[clustering.labels_ == cluster]) >= threshhold * len(
            clustering.labels_
        ):
            clus_points = tsne_results[clustering.labels_ == cluster]
            plt.plot(
                clus_points[:, 0],
                clus_points[:, 1],
                marker=".",
                linewidth=0,
                label=f"Cluster {cluster}",
            )
            percentage = len(clustering.labels_[clustering.labels_ == cluster]) / len(
                clustering.labels_
            )
            plt.text(
                x=np.mean(clus_points[:, 0]),
                y=np.mean(clus_points[:, 1]),
                s=f"{cluster}: {percentage*100:0.2f}%",
                verticalalignment="center",
                horizontalalignment="center",
            )
            print(
                f"Cluster {cluster} makes up more than {threshhold * 100}% of points. ({percentage * 100:0.2f} % of total points)"
            )
            cluster_percentage.append(percentage)
            cluster_points.append(clustering.labels_ == cluster)
            cluster_label_filter.append(cluster)
        else:
            clus_points = tsne_results[clustering.labels_ == cluster]
            plt.plot(
                clus_points[:, 0],
                clus_points[:, 1],
                marker=".",
                linewidth=0,
                label=f"Cluster {cluster}",
                alpha=0.1,
            )
            percentage = len(clustering.labels_[clustering.labels_ == cluster]) / len(
                clustering.labels_
            )
            print(
                f"Exlude Cluster {cluster} is less than {threshhold*100}% of points. ({percentage * 100:0.2f} % of total points)"
            )
            plt.plot
    else:
        clus_points = tsne_results[clustering.labels_ == cluster]
        plt.plot(
            clus_points[:, 0],
            clus_points[:, 1],
            marker=".",
            linewidth=0,
            label=f"Noise",
            alpha=0.1,
            color="grey",
        )

        percentage = len(clustering.labels_[clustering.labels_ == cluster]) / len(
            clustering.labels_
        )
        print(f"Noise makes up {percentage * 100:0.2f} % of total points.")

# Shrink current axis by 20%

# plt.legend(loc="center right", bbox_to_anchor=(1.3,0.25))
plt.title(f"Clusters in t-SNE. \n Label: Cluster no. : % of total")
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.tight_layout()
plt.savefig(snakemake.output.cluster_plot, dpi=DPI)

There are 30 clusters
Cluster 24 makes up more than 1.0% of points. (8.44 % of total points)
Noise makes up 6.32 % of total points.
Cluster 5 makes up more than 1.0% of points. (6.22 % of total points)
Cluster 4 makes up more than 1.0% of points. (5.76 % of total points)
Cluster 6 makes up more than 1.0% of points. (5.46 % of total points)
Cluster 1 makes up more than 1.0% of points. (4.96 % of total points)
Cluster 16 makes up more than 1.0% of points. (4.26 % of total points)
Cluster 8 makes up more than 1.0% of points. (4.10 % of total points)
Cluster 7 makes up more than 1.0% of points. (4.06 % of total points)
Cluster 13 makes up more than 1.0% of points. (4.06 % of total points)
Cluster 12 makes up more than 1.0% of points. (3.92 % of total points)
Cluster 11 makes up more than 1.0% of points. (3.58 % of total points)
Cluster 28 makes up more than 1.0% of points. (3.34 % of total points)
Cluster 15 makes up more than 1.0% of points. (3.28 % of total points)
Cluster 3 makes up more than 1.0% of points. (3.20 % of total points)
Cluster 0 makes up more than 1.0% of points. (3.18 % of total points)
Cluster 19 makes up more than 1.0% of points. (2.92 % of total points)
Cluster 29 makes up more than 1.0% of points. (2.70 % of total points)
Cluster 27 makes up more than 1.0% of points. (2.52 % of total points)
Cluster 17 makes up more than 1.0% of points. (2.08 % of total points)
Cluster 21 makes up more than 1.0% of points. (2.04 % of total points)
Cluster 18 makes up more than 1.0% of points. (2.02 % of total points)
Cluster 2 makes up more than 1.0% of points. (1.88 % of total points)
Cluster 25 makes up more than 1.0% of points. (1.58 % of total points)
Cluster 10 makes up more than 1.0% of points. (1.52 % of total points)
Cluster 20 makes up more than 1.0% of points. (1.20 % of total points)
Cluster 22 makes up more than 1.0% of points. (1.14 % of total points)
Cluster 9 makes up more than 1.0% of points. (1.12 % of total points)
Cluster 14 makes up more than 1.0% of points. (1.08 % of total points)
Cluster 23 makes up more than 1.0% of points. (1.08 % of total points)
Exlude Cluster 26 is less than 1.0% of points. (0.98 % of total points)

../_images/28159d44aa267024_GaMD_processed_67_1.png

plt.figure(figsize=(3.2677, 3.2677))
plt.plot(clustering.labels_, marker=1, linewidth=0)
plt.title("Clusters over time (-1: noise)")
plt.xlabel("Snapshot")
plt.ylabel("Cluster no.")
plt.savefig(snakemake.output.cluster_time, dpi=DPI)

../_images/28159d44aa267024_GaMD_processed_68_0.png

# Find cluster points in original trajectory, compute average structure,
# then find closest (min-rmsd) cluster structure to this
reduced_ind = np.arange(0, len(dihe), cluster_stride)
reduced_g_dihe = dihe[reduced_ind, :]
cluster_min_pca = []
cluster_index = []
mol_shape_cluster = []

t0 = t[0].time
dt = t.timestep

for i, cluster_name in zip(cluster_points, cluster_label_filter):

    # cluster points in original trajectory
    indices = reduced_ind[i]
    avg_struct = np.mean(t[indices].xyz, axis=0)
    avg_t = md.Trajectory(xyz=avg_struct, topology=None)

    # compute average dihedral angles for each cluster:
    phi, psi, omega = src.dihedrals.getDihedrals(t[indices])
    print(
        np.degrees(src.dihedrals.angle_mean(phi)),
        np.degrees(src.dihedrals.angle_mean(psi)),
        np.degrees(src.dihedrals.angle_mean(omega)),
    )

    # find min-RMSD structure to the average
    rmsd = md.rmsd(t[indices], avg_t, 0)
    min_rmsd_idx = np.where(rmsd == rmsd.min())
    cluster_min = t[indices][min_rmsd_idx]
    cluster_index.append(int((cluster_min.time - t0) / dt))
    print(
        f"Cluster {cluster_name}: Closest min structure is frame {int((cluster_min.time - t0) / dt)} (time: {float(cluster_min.time)})"
    )
    # Compute dihedrals of min-RMSD cluster structure, and transform to PCA
    cluster_min = src.dihedrals.getReducedDihedrals(cluster_min)
    cluster_min_pca.append(pca_d.transform(cluster_min))
    
    # Compute shape
    inertia_tensor_cluster = md.compute_inertia_tensor(t[indices][min_rmsd_idx])
    principal_moments_cluster = np.linalg.eigvalsh(inertia_tensor_cluster)

    # Compute normalized principal moments of inertia
    npr1_cluster = principal_moments_cluster[:, 0] / principal_moments_cluster[:, 2]
    npr2_cluster = principal_moments_cluster[:, 1] / principal_moments_cluster[:, 2]
    mol_shape_cluster.append(np.stack((npr1_cluster, npr2_cluster), axis=1))

[ -73.14119  -75.71519 -122.07714  -68.31255  -76.18559 -103.20332
  -38.91143  -64.52563 -104.8221  -161.47319] [ -3.8667078 -25.605879  139.0887    -30.2004    -31.365112    2.2837873
 -32.23091   -30.895123  -21.641815  128.4833   ] [ 177.12692  177.25035  177.7997   175.81432 -177.6895  -167.81197
  176.99008  177.34323  177.13531 -168.99701]
Cluster 24: Closest min structure is frame 291700 (time: 1218804.0)
[ -73.80773   -99.3768     18.5107   -118.155876  -76.89821  -117.802505
  -59.949238  -76.18479   -99.8898    132.10962 ] [ -12.764251    23.994259   -13.6840315  152.19179     17.534044
  139.04498    -27.276167   -34.840557   -11.766883  -148.65732  ] [ 178.1282  -174.5051   175.26662 -179.2083   176.99008 -175.16928
  174.98679 -176.21016  175.97    -176.60329]
Cluster 5: Closest min structure is frame 140000 (time: 612004.0)
[ -67.11648   -89.919525 -112.6645    -78.36043   -89.02935  -133.07469
  -62.908997  -74.67566   -96.85155   -44.021362] [ -25.637003  -22.07053   110.154854  -22.361904  -36.36295   142.72577
  -27.263166  -33.932148   17.29991  -146.99536 ] [ 175.09988  176.78389  177.7684  -178.70024  176.36182 -177.79982
  177.50238 -176.19244  178.30621 -178.1161 ]
Cluster 4: Closest min structure is frame 391100 (time: 1616404.0)
[-111.12001   -69.79914   -87.51579  -104.744606 -138.04433  -131.1221
  -58.7457    -68.14246  -100.56033   142.59311 ] [127.40683   -32.433884  -28.7495    -40.20102   -88.961266  149.36691
 -29.386683  -32.75964    -5.2446313 -75.62164  ] [ 177.4782   173.89902  177.10777 -177.63503 -176.43765  179.60672
  174.2115   179.59956  177.19641  179.94499]
Cluster 6: Closest min structure is frame 133100 (time: 584404.0)
[-104.819664  -81.77574   -56.17303   -75.71229  -107.02371   -66.68666
  -91.630264  -69.32767  -120.90248  -176.34087 ] [ -20.136818   141.86324    -34.8096     -21.44156    -17.797695
  131.042      -24.370846   -39.033348    -6.9844317 -154.10513  ] [ 175.00539   -167.49084    175.94794   -178.08101    174.35732
    1.6001023 -167.86342   -174.73056   -175.86769   -174.22609  ]
Cluster 1: Closest min structure is frame 21400 (time: 137604.0)
[ -22.074615  -68.25397  -117.85649   -64.1723    -80.64086   -93.36235
  -88.98432   -46.31416   -86.23672  -131.91359 ] [ 121.20602   -23.33006   140.88223   -28.362732  -17.802954 -150.43721
  133.12033   -29.14707   -17.14093   -41.92365 ] [ 175.93265  174.64899  179.88004  176.37885  179.57205 -176.92194
 -172.49701  177.27126  176.74452 -179.33069]
Cluster 16: Closest min structure is frame 410400 (time: 1693604.0)
[-115.065895  -60.685177  -61.214016  -99.840675   60.63179  -118.33484
  -59.297234  -69.46425   -96.337326  133.761   ] [135.60559   -28.865057  -24.231491    7.7126746   2.369428  130.15906
 -30.293135  -33.901955   -8.620592  -42.185787 ] [ 176.43884  174.53424  173.36562 -179.01018 -177.31113 -177.49608
  175.14296 -177.98062  177.26897 -176.06506]
Cluster 8: Closest min structure is frame 149500 (time: 650004.0)
[-113.00905   -57.70171   -73.783966 -109.05545   -74.064575  -93.891945
  -59.699265  -67.836624  -96.22425    83.7582  ] [149.51071  -24.907572 -14.351663 135.83751  -14.615072 133.1023
 -25.289358 -29.911566 -18.211458  12.649303] [-179.61824  178.21927  179.84685 -173.45755  177.33456 -174.65285
  176.73238 -179.5915   177.00642 -176.37537]
Cluster 7: Closest min structure is frame 92400 (time: 421604.0)
[ -76.36029  -103.21789   -80.08157   -70.035736  -61.952934  -73.18245
 -106.225525   54.284332 -106.241165   35.36924 ] [-2.16704712e+01  1.02310646e+02  7.51095191e-02  1.38854126e+02
 -2.97305832e+01 -2.15370502e+01 -4.95312500e+01  2.53781147e+01
  1.39104828e+02 -1.00260048e+02] [ 179.95876 -175.90005  177.06885 -163.4224   176.0032  -179.68343
  178.60527 -179.9664   178.61993  176.88043]
Cluster 13: Closest min structure is frame 452000 (time: 1860004.0)
[ -87.94294   -87.73517   -66.170235 -103.839615  -65.94145   -79.32024
 -110.90969   -87.126366  -67.28514    66.941216] [  14.571175  134.99205    19.562002  120.0309    -29.279528  -25.115416
   99.29118   104.996254   60.976734 -133.46878 ] [-179.59178 -177.61124  175.27979 -170.36089  176.27672 -178.25467
 -177.29602  178.97772 -179.41202  179.05989]
Cluster 12: Closest min structure is frame 449300 (time: 1849204.0)
[ -74.19399   -79.918045  -93.68088   -64.86083   -74.94685  -115.56253
 -118.452126  -92.31028  -127.7849     81.70461 ] [  30.412235  144.1757    124.73195   -29.786613  -30.728394  -47.50895
  124.90518    -5.978231  144.47603  -132.9193  ] [ 179.37122  170.72595 -174.41434  176.36928 -179.57304 -178.97548
 -179.1894  -177.89113  178.42892  177.39642]
Cluster 11: Closest min structure is frame 415600 (time: 1714404.0)
[ -79.56136  -111.13097   -86.661835  -80.95717   -60.67742   -73.21431
 -107.129875  -68.24898   -95.78636   150.02245 ] [ -14.517522  148.58488   -10.017265  137.58939   -34.895702  -20.444504
  -14.276252  143.52643   150.50568  -143.78688 ] [-178.40454    177.41814    178.02647   -168.14027    177.39871
 -178.12744    179.3407       1.9924214 -179.17421   -179.53854  ]
Cluster 28: Closest min structure is frame 475900 (time: 1955604.0)
[ -82.11863   -80.71265  -125.19814  -134.91216   -80.72413   -71.68381
  -83.42819   -77.195755 -103.74565    89.14588 ] [  69.75157   123.01525   -37.684593  -18.565002  151.8591    -29.155811
  114.45532    12.008615  109.001625 -129.69337 ] [-176.39857  170.85564 -178.2195  -172.53853  172.35353  174.44743
 -171.04393 -179.00928  179.06258  176.45398]
Cluster 15: Closest min structure is frame 428100 (time: 1764404.0)
[ -65.494446 -105.00088    35.82634   -98.41767   -73.02687    50.472504
   52.84945  -136.50056  -108.30372    85.91149 ] [ -36.42529    26.35544    50.315155  -34.944813  122.428375   26.698341
   25.062243  -39.13706   159.1124   -178.701   ] [ 170.99142 -173.45917 -171.42316  178.9827  -166.65839 -168.02498
 -173.2153  -165.69594 -176.55745 -174.28705]
Cluster 3: Closest min structure is frame 51500 (time: 258004.0)
[ -83.59447   -68.6274    -92.51053   -79.6184   -109.40358   -70.7761
  -88.60169   -60.523087 -112.53234   138.48322 ] [ 1.3563107e+02  1.3728575e+02 -2.3669405e+01 -3.3105946e+01
  3.9539374e-02  1.3665724e+02 -3.0625378e+01 -4.1764790e+01
 -1.0363207e+01 -9.1099739e+01] [ 175.93654       0.18000373 -164.12321    -176.1162      173.16608
   -0.6832617  -165.01482    -177.05345     175.22005    -167.06224   ]
Cluster 0: Closest min structure is frame 9800 (time: 91204.0)
[  63.416214 -133.2089    -74.25613   -95.86605   -58.67264   -86.80105
   56.41544  -103.569695 -118.71122  -162.84224 ] [-32.662292 161.1003   122.65686  154.05766  -35.509575  49.245975
  14.762199 157.14734  143.31679   86.8892  ] [ 168.24342  173.3695   178.81706 -174.61841  174.68579  175.09592
  176.01562  173.1692   170.83258 -163.39995]
Cluster 19: Closest min structure is frame 235800 (time: 995204.0)
[ -67.98594   -99.800575   51.97364  -100.25509   -65.36817   -70.83611
  -96.74428    36.64749  -135.14018    75.22832 ] [ -30.179152     0.9908711   47.552498   134.7282     -31.252922
  -27.063063   -19.326101    74.67202    -33.338398  -166.23627  ] [ 178.23839   179.07118  -179.05403  -171.81488   174.08916   177.9159
 -178.69707    12.031762 -178.27972  -177.28339 ]
Cluster 29: Closest min structure is frame 486600 (time: 1998404.0)
[-106.79672    51.121338  -86.88942   -67.102     -76.92858  -127.807
   42.206837  -76.935326  -55.50268    79.565346] [-25.406656   28.456036  138.51805   -19.570057  -28.820852  -17.66464
  22.829906  146.02405   135.08243   -14.2330885] [ 178.97209  179.10852 -175.62334  173.85725 -174.91415  178.54639
  178.91168  178.38646 -177.96301  178.28267]
Cluster 27: Closest min structure is frame 319500 (time: 1330004.0)
[ -74.660805 -100.82674  -102.07034   -69.093605  -84.04957   -94.49347
  -64.75948   -88.95976   -56.193764   99.128944] [ -34.006733 -147.4991    157.05487   -16.77013    -8.868133  154.64781
  -31.945002   51.3073    146.33989   -64.90253 ] [ 166.84505  175.969    179.55896  174.46107  179.85155  174.42052
  170.92867 -173.62871 -177.65417  172.0065 ]
Cluster 17: Closest min structure is frame 213100 (time: 904404.0)
[-65.04011 -85.97982 -82.81019 -89.17985  65.80587 -97.74805  55.94
 -87.41446 -15.89842  90.99306] [-16.330019  26.43144   72.92046  142.63823  -38.271393 -26.149357
  32.9364   143.66905   86.93973  -22.274418] [ 178.10419 -179.45137  177.07487  177.26402  177.40163  174.7866
  173.64928 -179.15794  179.37715 -178.8927 ]
Cluster 21: Closest min structure is frame 256200 (time: 1076804.0)
[ -77.33032  -106.11962   -76.44742  -110.3696    -70.30864  -106.18249
  -67.25462   -92.47304   -69.846436 -163.45808 ] [ -8.163844 147.98541   58.826485 147.27191   38.36042  147.68309
 -17.329315 126.50593  129.60782  176.03342 ] [ 175.01926  179.20602  176.7721   174.18561 -179.25963 -178.69704
  175.57559  178.38054  175.14    -179.82922]
Cluster 18: Closest min structure is frame 334900 (time: 1391604.0)
[ -73.01028   -84.610275  -78.405914  -49.332233  -84.734215  -58.209534
   56.060093  -72.193855 -117.151764  102.014565] [-2.27205429e+01  4.56074295e+01  1.22424690e+02 -2.19296227e+01
  1.24019997e+02  1.23423836e+02 -9.51037025e+00 -4.45499115e+01
 -8.73466879e-02 -1.53050186e+02] [ 174.15274 -176.43445  174.1129   170.94926 -173.35657 -178.218
 -173.92647 -177.95604 -176.5106  -176.36765]
Cluster 2: Closest min structure is frame 63200 (time: 304804.0)
[ -66.43722   -62.067196  -82.07866    57.093933  -95.213684 -124.623184
  -60.956295  -80.6458    -94.677376  169.404   ] [-26.46358  -29.936264 -16.48979    3.978875 -39.377365  94.51909
 -30.260565 -35.424183 -22.88561   94.97822 ] [ 173.68794  178.18211 -169.848   -177.48535  177.82301 -170.52643
  175.72314 -179.92409  177.29147 -172.22078]
Cluster 25: Closest min structure is frame 299700 (time: 1250804.0)
[ -80.599594   59.158512  -99.56493   -96.6478    -97.08281  -138.09264
  -63.529526  -84.26043   -99.781715   96.1161  ] [  93.94718    29.44612   126.99696    -5.852943  -34.67204   146.49432
  -24.404568  -31.27329   100.05625  -165.96663 ] [-170.86385  177.89374  179.28915 -178.32115  176.04544 -178.36761
  178.37563 -176.26784 -179.94592  173.72609]
Cluster 10: Closest min structure is frame 276500 (time: 1158004.0)
[ -71.82231  -126.22693   -76.537964 -102.67373   -61.760857  -81.942116
   54.723293 -116.330536 -104.843895 -146.57668 ] [ -29.661812  154.50917   122.46703   149.05374   -29.967676   37.17925
   33.18099   147.16809   139.23462  -131.6291  ] [ 167.14006  175.0349   178.9097  -175.02226  174.37602  175.99321
  174.3505   178.99881  171.20187  165.71155]
Cluster 20: Closest min structure is frame 248200 (time: 1044804.0)
[ -63.52467   -88.04496  -102.956375 -105.797226  -94.98016   -74.50699
  -60.367516  -63.821243  -96.94724   177.45424 ] [-29.464909  -25.75046   -39.58464   150.78957   130.31735   147.92755
  -4.6181836 -33.880665  -20.789118  125.75041  ] [ 173.40855   -170.72417   -178.55566    169.46704     -3.0077996
 -175.98929    174.64824    179.01833    178.50714   -168.45859  ]
Cluster 22: Closest min structure is frame 268200 (time: 1124804.0)
[ -61.596916  -98.939156  -91.185    -125.012115   59.096947 -115.69951
  -57.229652  -80.43496  -101.043076  155.87898 ] [-29.437937 -29.742857 -37.823627   8.212163  10.91077  131.93274
 -33.98437  -40.241226 -25.724401 100.65348 ] [ 176.32925 -178.66551 -177.10922 -177.77924 -176.98767 -173.58412
  177.66562 -168.53271  173.80356 -170.10184]
Cluster 9: Closest min structure is frame 116100 (time: 516404.0)
[ -71.24418   -98.36395  -105.97933   -66.18685   -75.7175    -95.57603
 -137.25433   -55.592648  -89.53706  -109.48377 ] [-31.269508 -26.375946  45.291893 -25.893044 -30.835024 -43.354084
 110.07629  -34.52576   14.963694 143.0529  ] [ 176.52809 -179.07436 -170.94447  176.93536  177.19875  179.3897
 -170.42014 -177.92021  177.00357 -174.42664]
Cluster 14: Closest min structure is frame 209700 (time: 890804.0)
[ -77.04937   -13.498522  -91.284836  -69.57975  -123.64579  -125.624275
  -57.914795  -47.789143  -96.35018   130.08398 ] [133.95592     -0.45819178 146.53719    -29.599638    96.690315
 144.75835    -31.613491   -40.3904     -19.202011    51.88331   ] [ 174.58167  -178.87512   174.64297   175.46562    -4.840775 -179.38965
  176.76324   178.8947    178.21829  -179.2095  ]
Cluster 23: Closest min structure is frame 274700 (time: 1150804.0)

# Plot cluster mins in original d-PCA plot
fig, ax = plt.subplots(figsize=(3.2677, 3.2677))
ax = src.pca.plot_PCA(
    reduced_dihedrals,
    "dihedral",
    compound_index,
    d_weights,
    "Energy [kcal/mol]",
    fig,
    ax,
    explained_variance=pca_d.explained_variance_ratio_[:2],
)
# ADD LEGEND ENTRY FOR MD
ax.plot(
    np.array(cluster_min_pca)[:, 0, 0],
    np.array(cluster_min_pca)[:, 0, 1],
    label="Clusters",
    linewidth=0,
    marker="x",
    c='black',
)

ax.legend(["MD","MD Clusters"], framealpha=0.5)

fig.tight_layout()

ax.set_title("Dihedral PCA")
fig.savefig(snakemake.output.cluster_pca, dpi=DPI)
final_figure_axs.append(sg.from_mpl(fig, savefig_kw={"dpi": DPI}))

/biggin/b147/univ4859/research/snakemake_conda/b998fbb8f687250126238eb7f5e2e52c/lib/python3.7/site-packages/ipykernel_launcher.py:25: UserWarning: Creating legend with loc="best" can be slow with large amounts of data.
/biggin/b147/univ4859/research/snakemake_conda/b998fbb8f687250126238eb7f5e2e52c/lib/python3.7/site-packages/ipykernel_launcher.py:28: UserWarning: Creating legend with loc="best" can be slow with large amounts of data.
/biggin/b147/univ4859/research/snakemake_conda/b998fbb8f687250126238eb7f5e2e52c/lib/python3.7/site-packages/svgutils/transform.py:425: UserWarning: Creating legend with loc="best" can be slow with large amounts of data.
  fig.savefig(fid, format="svg", **savefig_kw)
/biggin/b147/univ4859/research/snakemake_conda/b998fbb8f687250126238eb7f5e2e52c/lib/python3.7/site-packages/IPython/core/pylabtools.py:151: UserWarning: Creating legend with loc="best" can be slow with large amounts of data.
  fig.canvas.print_figure(bytes_io, **kw)

../_images/28159d44aa267024_GaMD_processed_70_1.png

# Plot cluster mins in shape plot
# Plot
x = mol_shape[:, 0]
y = mol_shape[:, 1]
v = mol_shape_weights
# create a triangulation out of these points
T = tri.Triangulation(x, y)

fig, ax = plt.subplots(figsize=(3.2677, 3.2677))

# plot the contour
# plt.tricontourf(x,y,T.triangles,v)
scat = ax.scatter(
    mol_shape[:, 0],
    mol_shape[:, 1],
    s=0.5,
    c=mol_shape_weights,
    cmap="Spectral_r",
    vmin=0,
    vmax=8,
    rasterized=True,
)

# create the grid
corners = np.array([[1, 1], [0.5, 0.5], [0, 1]])
triangle = tri.Triangulation(corners[:, 0], corners[:, 1])

# creating the outline
refiner = tri.UniformTriRefiner(triangle)
outline = refiner.refine_triangulation(subdiv=0)

# creating the outline
refiner = tri.UniformTriRefiner(triangle)
trimesh = refiner.refine_triangulation(subdiv=2)

# plotting the mesh
ax.triplot(trimesh, "--", color="grey")
ax.triplot(outline, "k-")
ax.set_xlabel(r"$I_{1}/I_{3}$")
ax.set_ylabel("$I_{2}/I_{3}$")
ax.text(0, 1.01, "rod")
ax.text(0.75, 1.01, "sphere")
ax.text(0.52, 0.48, "disk")
ax.set_ylim(0.45, 1.05)  # 0.6
ax.set_xlim(-0.05, 1.08) # 1.13
ax.set_aspect(1.88)  # 1.13 / 0.6
ax.set_title('Shape analysis')

ax.plot(
    np.array(mol_shape_cluster)[:, 0, 0],
    np.array(mol_shape_cluster)[:, 0, 1],
    label="Clusters",
    linewidth=0,
    marker="x",
    c='black',
)

# ax.legend(["MD","MD Clusters"], framealpha=0.5)

colorbar = fig.colorbar(scat, label="Energy [kcal/mol]", fraction=0.046, pad=0.04)

fig.tight_layout()
fig.savefig(snakemake.output.NPR_shape_plot, dpi=DPI)
final_figure_axs.append(sg.from_mpl(fig, savefig_kw={"dpi": DPI}))

../_images/28159d44aa267024_GaMD_processed_71_0.png

display(Markdown("Show most extreme structures."))
most_spherical = (x + y).argmax()
most_disk = (x - 0.5 + y - 0.5).argmin()
most_rod = (1 - y + x).argmin()
most_occupied_cluster = cluster_index[0]

fig, axs = plt.subplots(1,4, figsize=(6.7323, 3.2677 / 1.8))
for ax, frame_i, title_i in zip(axs.flatten(), [most_spherical, most_disk, most_rod, most_occupied_cluster], ["most spherical", "most disk-like", "most rod-like", "most pop. cluster"]):
    img = src.utils.pymol_image(t[frame_i], t[most_occupied_cluster])
    ax.imshow(img)
    ax.set_title(title_i)
    ax.axis('off')
fig.tight_layout()
# fig.subplots_adjust(wspace=0, hspace=0)
final_figure_axs.append(sg.from_mpl(fig, savefig_kw={"dpi": DPI}))

Show most extreme structures.

../_images/28159d44aa267024_GaMD_processed_72_1.png

# create a new plot and add a renderer
plot_stride = 10
from bokeh.models import HoverTool

x = np.array(cluster_min_pca)[:, 0, 0]
y = np.array(cluster_min_pca)[:, 0, 1]
percentage = cluster_percentage

data = dict(x=x, y=y, percentage=percentage)


p = figure(
    plot_width=400,
    plot_height=400,
    title="Average cluster structures in PCA space",
    tools="reset",
)
p.dot(
    reduced_dihedrals[::plot_stride, 0],
    reduced_dihedrals[::plot_stride, 1],
    selection_color="firebrick",
    legend_label="simulation frames",
)

hoverable = p.triangle(
    x="x", y="y", source=data, color="firebrick", size=8, legend_label="Clusters"
)
p.add_tools(
    HoverTool(
        tooltips=[("cluster #", "$index"), ("% of total frames", "@percentage{0.0%}")],
        renderers=[hoverable],
    )
)
show(p)

# Interactive viewing of clusters in 3d (needs running jupyter notebook)
cluster_traj = t[cluster_index]
cluster_traj.superpose(
    cluster_traj, 0, atom_indices=cluster_traj.top.select("backbone")
)
view = nv.show_mdtraj(cluster_traj)
view

# save rst files from clusters, account for GaMD equilibration. Does not work with stride.
cluster_full_store = md.load_frame(snakemake.input.traj, cluster_index[0] + frames_start, top=snakemake.input.top)
for idx in cluster_index:
    cluster_full_t = md.load_frame(snakemake.input.traj, idx, top=snakemake.input.top)
    cluster_full_t.save_netcdfrst(
        f"{snakemake.params.rst_dir}rst_{idx}.rst"
    )
    cluster_full_store = cluster_full_store.join(cluster_full_t, discard_overlapping_frames=True)
cluster_full_store.superpose(
    cluster_full_store, 0, atom_indices=cluster_full_t.top.select("backbone")
)
cluster_full_store.save_pdb(snakemake.output.cluster_solvated)

# compute rmsd between clusters
from itertools import combinations

indices = list(combinations(range(cluster_traj.n_frames), 2))

rmsd_backbone = np.zeros((cluster_traj.n_frames, cluster_traj.n_frames))
rmsd = np.zeros((cluster_traj.n_frames, cluster_traj.n_frames))
for i, j in indices:
    rmsd_backbone[i, j] = (
        md.rmsd(
            cluster_traj[i],
            cluster_traj[j],
            atom_indices=cluster_traj.top.select("backbone"),
        )
        * 10
    )
    rmsd[i, j] = md.rmsd(cluster_traj[i], cluster_traj[j]) * 10



sns.set_style("ticks")
fig, axs = plt.subplots(1, 2, figsize=(6.7323, 3.2677))
titles = ["RMSD", "Backbone RMSD"]  # between different clusters
for i, X in enumerate([rmsd, rmsd_backbone]):
    X = X + X.T - np.diag(np.diag(X))
    # get lower diagonal matrix
    X = np.tril(X)
    df = pd.DataFrame(X)
    axs[i] = sns.heatmap(
        df, annot=False, cmap="Greys", ax=axs[i], cbar_kws={"label": r"RMSD in $\AA$"}
    )
    axs[i].set_title(titles[i])
    axs[i].set_xlabel("Cluster no.")
    axs[i].set_ylabel("Cluster no.")
    # ax.invert_xaxis()
fig.tight_layout()
plt.show()

../_images/28159d44aa267024_GaMD_processed_77_0.png

# compute dihedral angles
*_, omega = src.dihedrals.getDihedrals(cluster_traj)
omega_deg = np.abs(np.degrees(omega))
plt.plot(omega_deg)
plt.title(f"Omega angles of different clusters. Compound {compound_index}")
plt.xlabel("Cluster id")
plt.ylabel("Omega dihedral angle [°]")
plt.show()

../_images/28159d44aa267024_GaMD_processed_78_0.png

pymol_script = f"""load {snakemake.output.cluster_pdb}
# inspired by: https://gist.github.com/bobbypaton/1cdc4784f3fc8374467bae5eb410edef
cmd.bg_color("white")
cmd.set("ray_opaque_background", "off")
cmd.set("orthoscopic", 0)
cmd.set("transparency", 0.1)
cmd.set("dash_gap", 0)
cmd.set("ray_trace_mode", 1)
cmd.set("ray_texture", 2)
cmd.set("antialias", 3)
cmd.set("ambient", 0.5)
cmd.set("spec_count", 5)
cmd.set("shininess", 50)
cmd.set("specular", 1)
cmd.set("reflect", .1)
cmd.space("cmyk")

#cmd.cartoon("oval")
cmd.show("sticks")
cmd.show("spheres")
cmd.color("gray85","elem C")
cmd.color("gray98","elem H")
cmd.color("slate","elem N")
cmd.color("red","elem O")
cmd.set("stick_radius",0.07)
cmd.set("sphere_scale",0.18)
cmd.set("sphere_scale",0.13, "elem H")
cmd.set("dash_gap",0.01)
cmd.set("dash_radius",0.07)
cmd.set("stick_color","black")
cmd.set("dash_gap",0.01)
cmd.set("dash_radius",0.035)
cmd.hide("nonbonded")
cmd.hide("cartoon")
cmd.hide("lines")
cmd.orient()
cmd.zoom()
cmd.hide("labels")

cmd.mpng("{snakemake.params.cluster_dir}test_", width=1000, height=1000)

"""
pymol_script_file = f"{snakemake.params.cluster_dir}pym.pml"
with open(pymol_script_file, "w") as f:
    f.write(pymol_script)

# Run pymol to plot clusters
!pymol -c $pymol_script_file

 PyMOL(TM) Molecular Graphics System, Version 2.5.0.
 Copyright (c) Schrodinger, LLC.
 All Rights Reserved.
 
    Created by Warren L. DeLano, Ph.D. 
 
    PyMOL is user-supported open-source software.  Although some versions
    are freely available, PyMOL is not in the public domain.
 
    If PyMOL is helpful in your work or study, then please volunteer 
    support for our ongoing efforts to create open and affordable scientific
    software by purchasing a PyMOL Maintenance and/or Support subscription.

    More information can be found at "http://www.pymol.org".
 
    Enter "help" for a list of commands.
    Enter "help <command-name>" for information on a specific command.

 Hit ESC anytime to toggle between text and graphics.

 Detected 24 CPU cores.  Enabled multithreaded rendering.
PyMOL>load data/processed/refactor-test/results/55/H2O/GaMD/2000/0/28159d44aa267024_clusters/clusters.pdb
 ObjectMolecule: Read crystal symmetry information.
 ObjectMoleculeReadPDBStr: read MODEL 1
 ObjectMoleculeReadPDBStr: read MODEL 2
 ObjectMoleculeReadPDBStr: read MODEL 3
 ObjectMoleculeReadPDBStr: read MODEL 4
 ObjectMoleculeReadPDBStr: read MODEL 5
 ObjectMoleculeReadPDBStr: read MODEL 6
 ObjectMoleculeReadPDBStr: read MODEL 7
 ObjectMoleculeReadPDBStr: read MODEL 8
 ObjectMoleculeReadPDBStr: read MODEL 9
 ObjectMoleculeReadPDBStr: read MODEL 10
 ObjectMoleculeReadPDBStr: read MODEL 11
 ObjectMoleculeReadPDBStr: read MODEL 12
 ObjectMoleculeReadPDBStr: read MODEL 13
 ObjectMoleculeReadPDBStr: read MODEL 14
 ObjectMoleculeReadPDBStr: read MODEL 15
 ObjectMoleculeReadPDBStr: read MODEL 16
 ObjectMoleculeReadPDBStr: read MODEL 17
 ObjectMoleculeReadPDBStr: read MODEL 18
 ObjectMoleculeReadPDBStr: read MODEL 19
 ObjectMoleculeReadPDBStr: read MODEL 20
 ObjectMoleculeReadPDBStr: read MODEL 21
 ObjectMoleculeReadPDBStr: read MODEL 22
 ObjectMoleculeReadPDBStr: read MODEL 23
 ObjectMoleculeReadPDBStr: read MODEL 24
 ObjectMoleculeReadPDBStr: read MODEL 25
 ObjectMoleculeReadPDBStr: read MODEL 26
 ObjectMoleculeReadPDBStr: read MODEL 27
 ObjectMoleculeReadPDBStr: read MODEL 28
 ObjectMoleculeReadPDBStr: read MODEL 29
 CmdLoad: "" loaded as "clusters".
PyMOL>cmd.bg_color("white")
PyMOL>cmd.set("ray_opaque_background", "off")
PyMOL>cmd.set("orthoscopic", 0)
PyMOL>cmd.set("transparency", 0.1)
PyMOL>cmd.set("dash_gap", 0)
PyMOL>cmd.set("ray_trace_mode", 1)
PyMOL>cmd.set("ray_texture", 2)
PyMOL>cmd.set("antialias", 3)
PyMOL>cmd.set("ambient", 0.5)
PyMOL>cmd.set("spec_count", 5)
PyMOL>cmd.set("shininess", 50)
PyMOL>cmd.set("specular", 1)
PyMOL>cmd.set("reflect", .1)
PyMOL>cmd.space("cmyk")
 Color: loaded table '/biggin/b147/univ4859/research/snakemake_conda/b998fbb8f687250126238eb7f5e2e52c/lib/python3.7/site-packages/pymol/pymol_path/data/pymol/cmyk.png'.
PyMOL>cmd.show("sticks")
PyMOL>cmd.show("spheres")
PyMOL>cmd.color("gray85","elem C")
PyMOL>cmd.color("gray98","elem H")
PyMOL>cmd.color("slate","elem N")
PyMOL>cmd.color("red","elem O")
PyMOL>cmd.set("stick_radius",0.07)
PyMOL>cmd.set("sphere_scale",0.18)
PyMOL>cmd.set("sphere_scale",0.13, "elem H")
PyMOL>cmd.set("dash_gap",0.01)
PyMOL>cmd.set("dash_radius",0.07)
PyMOL>cmd.set("stick_color","black")
PyMOL>cmd.set("dash_gap",0.01)
PyMOL>cmd.set("dash_radius",0.035)
PyMOL>cmd.hide("nonbonded")
PyMOL>cmd.hide("cartoon")
PyMOL>cmd.hide("lines")
PyMOL>cmd.orient()
PyMOL>cmd.zoom()
PyMOL>cmd.hide("labels")
PyMOL>cmd.mpng("data/processed/refactor-test/results/55/H2O/GaMD/2000/0/28159d44aa267024_clusters/test_", width=1000, height=1000)
 Movie: frame    1 of   29, 0.90 sec. (0:00:26 - 0:00:26 to go).
 Movie: frame    2 of   29, 0.91 sec. (0:00:25 - 0:00:25 to go).
 Movie: frame    3 of   29, 0.90 sec. (0:00:24 - 0:00:24 to go).
 Movie: frame    4 of   29, 0.90 sec. (0:00:23 - 0:00:23 to go).
 Movie: frame    5 of   29, 0.89 sec. (0:00:22 - 0:00:22 to go).
 Movie: frame    6 of   29, 0.90 sec. (0:00:21 - 0:00:21 to go).
 Movie: frame    7 of   29, 0.89 sec. (0:00:20 - 0:00:20 to go).
 Movie: frame    8 of   29, 0.91 sec. (0:00:19 - 0:00:19 to go).
 Movie: frame    9 of   29, 0.92 sec. (0:00:19 - 0:00:18 to go).
 Movie: frame   10 of   29, 0.90 sec. (0:00:18 - 0:00:18 to go).
 Movie: frame   11 of   29, 0.92 sec. (0:00:17 - 0:00:17 to go).
 Movie: frame   12 of   29, 0.90 sec. (0:00:16 - 0:00:16 to go).
 Movie: frame   13 of   29, 0.89 sec. (0:00:15 - 0:00:15 to go).
 Movie: frame   14 of   29, 0.90 sec. (0:00:14 - 0:00:14 to go).
 Movie: frame   15 of   29, 0.90 sec. (0:00:13 - 0:00:13 to go).
 Movie: frame   16 of   29, 0.94 sec. (0:00:13 - 0:00:12 to go).
 Movie: frame   17 of   29, 0.90 sec. (0:00:11 - 0:00:11 to go).
 Movie: frame   18 of   29, 0.90 sec. (0:00:10 - 0:00:10 to go).
 Movie: frame   19 of   29, 0.90 sec. (0:00:09 - 0:00:09 to go).
 Movie: frame   20 of   29, 0.89 sec. (0:00:08 - 0:00:09 to go).
 Movie: frame   21 of   29, 0.94 sec. (0:00:08 - 0:00:08 to go).
 Movie: frame   22 of   29, 0.90 sec. (0:00:07 - 0:00:07 to go).
 Movie: frame   23 of   29, 0.95 sec. (0:00:06 - 0:00:06 to go).
 Movie: frame   24 of   29, 0.93 sec. (0:00:05 - 0:00:05 to go).
 Movie: frame   25 of   29, 0.90 sec. (0:00:04 - 0:00:04 to go).
 Movie: frame   26 of   29, 0.90 sec. (0:00:03 - 0:00:03 to go).
 Movie: frame   27 of   29, 0.91 sec. (0:00:02 - 0:00:02 to go).
 Movie: frame   28 of   29, 0.90 sec. (0:00:01 - 0:00:01 to go).
 Movie: frame   29 of   29, 0.90 sec. (0:00:00 - 0:00:00 to go).

data = []
cluster_imgs = [
    f"{snakemake.params.cluster_dir}test_{str(i+1).zfill(4)}.png"
    for i in range(cluster_traj.n_frames)
]

[data.append(mpimg.imread(img)) for img in cluster_imgs]
display("Pymol images read")

'Pymol images read'

# get default colors
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
# make colors longer if more clusters than colors...
while len(cluster_label_filter) > len(colors):
    colors.extend(colors)
    print("Colors appended..")

fig, axs = plt.subplots(len(cluster_label_filter), 3, sharex="col", squeeze=False)
fig.set_size_inches(12, 3 * len(cluster_label_filter))
# plot cluster images
for i in range(cluster_traj.n_frames):
    # print(f"final {i}")
    axs[i, 0].imshow(data[i])
    axs[i, 0].tick_params(
        axis="both",
        which="both",
        bottom=False,
        top=False,
        left=False,
        labelleft=False,
        labelbottom=False,
    )
    # axs[i,0].tick_params(axis='y', which='both', bottom=False, top=False, labelbottom=False)
# plot corresponding pca's:
for i in range(cluster_traj.n_frames):
    axs[i, 1].scatter(
        reduced_dihedrals[:, 0],
        reduced_dihedrals[:, 1],
        marker=".",
        s=0.5,
        alpha=1,
        c="black",
    )
# add cluster representations
for ii, iii, iiii in zip(
    cluster_min_pca, cluster_label_filter, range(len(cluster_label_filter))
):
    (clus,) = axs[iiii, 1].plot(
        ii[:, 0],
        ii[:, 1],
        marker="^",
        label=f"Cluster {iii}",
        linewidth=0,
        c=colors[iiii],
    )
    # clus.get_color()


# add noe plots
for i, j, k in zip(range(cluster_traj.n_frames), cluster_index, cluster_label_filter):
    NOE = src.noe.read_NOE(snakemake.input.noe)
    if multiple:
        NOE_trans, NOE_cis = NOE
        NOE_cis_dict = NOE_cis.to_dict(orient="index")
        NOE_trans_dict = NOE_trans.to_dict(orient="index")
    else:
        NOE_dict = NOE.to_dict(orient="index")

    current_cluster = cluster_traj[i]
    # print(j)
    if multiple:
        if j in cis:
            # print("cis")
            NOE_dict = NOE_cis_dict
            NOE = NOE_cis
            axs[i, 2].set_title(f"Cluster {k} (cis)")
        else:
            # print("trans!")
            NOE_dict = NOE_trans_dict
            NOE = NOE_trans
            axs[i, 2].set_title(f"Cluster {k} (trans)")
    else:
        axs[i, 2].set_title(f"Cluster {k}")
    NOE["md"], _, _2, NOE_dist, _3 = src.noe.compute_NOE_mdtraj(
        NOE_dict, current_cluster
    )
    # Deal with ambigous NOEs
    NOE = NOE.explode("md")
    # and ambigous/multiple values
    NOE = NOE.explode("NMR exp")
    fig, axs[i, 2] = src.noe.plot_NOE(NOE, fig, axs[i, 2])
fig.tight_layout()
fig.savefig(snakemake.output.cluster_structs)

Colors appended..
Colors appended..
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN

../_images/28159d44aa267024_GaMD_processed_83_1.png

# TODO? cluster NOE statistics....

2.17.6. NOEs#

In the following section, we compute the NOE values for the simulation.

NOE = src.noe.read_NOE(snakemake.input.noe)
NOE_output = {}

2.17.6.1. NOE without reweighting.#

The following NOE plot is computed via \(r^{-6}\) averaging. No reweighting is performed. (so unless the simulation is a conventional MD simulation, the following plot is not a valid comparison to experiment.)

if multiple:
    fig, axs = plt.subplots(2, 1, figsize=(6.7323, 3.2677))
    NOE_trans, NOE_cis = NOE
    NOE_cis_dict = NOE_cis.to_dict(orient="index")
    NOE_trans_dict = NOE_trans.to_dict(orient="index")
    if len(cis) > CIS_TRANS_CUTOFF:
        NOE_cis["md"], _, _2, NOE_dist_cis, _3 = src.noe.compute_NOE_mdtraj(
            NOE_cis_dict, t[cis]
        )

        NOE_output[f"{multi['cis']}"] = NOE_cis.to_dict(orient="index")
        # Deal with ambigous NOEs
        NOE_cis = NOE_cis.explode("md")
        # and ambigous/multiple values
        NOE_cis = NOE_cis.explode("NMR exp")
        fig, axs[1] = src.noe.plot_NOE(NOE_cis, fig, axs[1])
        axs[1].set_title(f"Compound {multi['cis']} (cis)")
    else:
        print("Cis skipped because no frames are cis.")
    if len(trans) > CIS_TRANS_CUTOFF:
        NOE_trans["md"], _, _2, NOE_dist_trans, _3 = src.noe.compute_NOE_mdtraj(
            NOE_trans_dict, t[trans]
        )

        NOE_output[f"{multi['trans']}"] = NOE_trans.to_dict(orient="index")
        # Deal with ambigous NOEs
        NOE_trans = NOE_trans.explode("md")
        # and ambigous/multiple values
        NOE_trans = NOE_trans.explode("NMR exp")

        fig, axs[0] = src.noe.plot_NOE(NOE_trans, fig, axs[0])
        axs[0].set_title(f"Compound {multi['trans']} (trans)")
    else:
        print("Trans skipped because no frames are cis")
else:
    NOE_dict = NOE.to_dict(orient="index")
    NOE["md"], _, _2, NOE_dist, _3 = src.noe.compute_NOE_mdtraj(NOE_dict, t)

    # Save NOE dict
    NOE_output = {f"{compound_index}": NOE.to_dict(orient="index")}
    # Deal with ambigous NOEs
    NOE = NOE.explode("md")
    # and ambigous/multiple values
    NOE = NOE.explode("NMR exp")
    fig, ax = src.noe.plot_NOE(NOE)
    ax.set_title(f"Compound {compound_index}. NOE without reweighting.", y=1.2)
fig.tight_layout()
fig.savefig(snakemake.output.noe_plot, dpi=DPI)
# save as .json file
src.utils.json_dump(snakemake.output.noe_result, NOE_output)

NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN

../_images/28159d44aa267024_GaMD_processed_88_1.png

2.17.6.2. Reweighted NOEs#

The following NOE plot was reweighted via a 1d PMF method.

# 1d PMF reweighted NOEs

NOE_output = {}

if snakemake.params.method != "cMD":
    if multiple:
        fig, axs = plt.subplots(2, 1, figsize=(6.7323, 6.7323))
        NOE_trans, NOE_cis = NOE
        NOE_cis_dict = NOE_cis.to_dict(orient="index")
        NOE_trans_dict = NOE_trans.to_dict(orient="index")
        if len(cis) > CIS_TRANS_CUTOFF:
            (
                NOE_cis["md"],
                NOE_cis["lower"],
                NOE_cis["upper"],
                NOE_dist_cis,
                pmf_plot_cis,
            ) = src.noe.compute_NOE_mdtraj(
                NOE_cis_dict, t[cis],
                reweigh_type=1, slicer=cis, weight_data=weight_data,
            )
            # TODO: this should not give an error!

            NOE_output[f"{multi['cis']}"] = NOE_cis.to_dict(orient="index")

            # Deal with ambigous NOEs
            NOE_cis = NOE_cis.explode(["md", "lower", "upper"])
            # and ambigous/multiple values
            NOE_cis = NOE_cis.explode("NMR exp")
            fig, axs[1] = src.noe.plot_NOE(NOE_cis, fig, axs[1])
            axs[1].set_title(f"Compound {multi['cis']} (cis)")
        else:
            print("Cis skipped because no frames are cis.")
        if len(trans) > CIS_TRANS_CUTOFF:
            (
                NOE_trans["md"],
                NOE_trans["lower"],
                NOE_trans["upper"],
                NOE_dist_trans,
                pmf_plot_trans,
            ) = src.noe.compute_NOE_mdtraj(
                NOE_trans_dict, t[trans],
                reweigh_type=1, slicer=trans, weight_data=weight_data
            )

            NOE_output[f"{multi['trans']}"] = NOE_trans.to_dict(orient="index")
            # Deal with ambigous NOEs
            NOE_trans = NOE_trans.explode(["md", "lower", "upper"])
            # and ambigous/multiple values
            NOE_trans = NOE_trans.explode("NMR exp")
            fig, axs[0] = src.noe.plot_NOE(NOE_trans, fig, axs[0])
            axs[0].set_title(f"Compound {multi['trans']} (trans)")
        else:
            print("Trans skipped because no frames are cis")
        src.utils.json_dump(snakemake.output.noe_result, NOE_output)
        fig.tight_layout()
        fig.savefig(snakemake.output.noe_plot)
    else:
        NOE = src.noe.read_NOE(snakemake.input.noe)
        NOE_dict = NOE.to_dict(orient="index")
        NOE["md"], NOE["lower"], NOE["upper"], _, pmf_plot = src.noe.compute_NOE_mdtraj(
            NOE_dict, t, reweigh_type=1, weight_data=weight_data
        )
        plt.close()
        # Save NOE dict
        NOE_output = {f"{compound_index}": NOE.to_dict(orient="index")}
        # save as .json file
        src.utils.json_dump(snakemake.output.noe_result, NOE_output)

        # Deal with ambigous NOEs
        NOE = NOE.explode(["md", "lower", "upper"])
        # and ambigous/multiple values
        NOE = NOE.explode("NMR exp")
        fig, ax = src.noe.plot_NOE(NOE)
#         ax.set_title(f"Compound {compound_index}. NOE", y=1.5, pad=0)
        fig.tight_layout()
        fig.savefig(snakemake.output.noe_plot, dpi=DPI)
else:
    print("cMD - no reweighted NOEs performed.")
final_figure_axs.append(sg.from_mpl(fig))
pickle_dump(snakemake.output.noe_dist, NOE_dist)

NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN

../_images/28159d44aa267024_GaMD_processed_90_1.png

display(NOE)

	Atom 1	Atom 2	NMR exp	lower bound	upper bound	md	lower	upper
0	(12,)	(33,)	3.00	2.8	3.2	2.649483	2.265108	4.043308
1	(33,)	(44,)	3.60	3.3	3.9	2.452681	2.096983	3.856414
2	(1,)	(163, 164)	2.55	1.9	3.2	2.665066	2.289398	3.322858
2	(1,)	(163, 164)	2.55	1.9	3.2	1.761048	1.716541	2.895134
3	(12,)	(3,)	2.65	2.5	2.8	3.123338	2.360239	3.510066
4	(12,)	(14,)	2.75	2.6	2.9	2.74706	2.373828	2.932866
5	(33,)	(14,)	2.15	2.0	2.3	1.914194	1.849463	3.120772
6	(33,)	(35,)	2.55	2.4	2.7	1.923385	1.884846	2.831929
7	(61,)	(46,)	2.45	2.3	2.6	2.116119	2.010509	3.239585
8	(61,)	(63,)	2.65	2.5	2.8	2.718624	2.348762	2.919661
9	(113,)	(95,)	2.20	1.4	3.0	2.856988	2.277256	3.493369
10	(113,)	(115,)	2.75	2.6	2.9	2.156535	2.086211	2.860166
11	(137,)	(115,)	2.35	2.2	2.5	2.800421	2.281094	3.507687
12	(161,)	(163, 164)	2.70	2.0	3.4	1.944637	1.914197	2.677009
12	(161,)	(163, 164)	2.70	2.0	3.4	2.081861	2.028697	2.789421
13	(12,)	(5, 6)	3.65	3.1	4.2	3.366097	2.764019	4.125228
13	(12,)	(5, 6)	3.65	3.1	4.2	1.778446	1.704837	3.991289
14	(12,)	()	3.85	2.4	5.3	NaN	NaN	NaN
15	(12,)	(16, 17)	3.05	2.5	3.6	2.59479	2.337149	3.347708
15	(12,)	(16, 17)	3.05	2.5	3.6	3.203238	2.519536	3.705234
16	(33,)	(16, 17)	3.35	2.8	3.9	1.768858	1.711582	3.964627
16	(33,)	(16, 17)	3.35	2.8	3.9	3.483828	2.737477	4.239136
17	(33,)	(37, 38)	3.50	3.0	4.0	2.363137	2.203345	3.351527
17	(33,)	(37, 38)	3.50	3.0	4.0	3.454694	2.713928	3.800401
18	(44,)	(37, 38)	3.45	2.9	4.0	4.973459	3.709995	5.026678
18	(44,)	(37, 38)	3.45	2.9	4.0	4.992541	3.831568	5.025888
19	(44,)	(48, 49)	3.55	3.0	4.1	1.750549	1.70783	2.945588
19	(44,)	(48, 49)	3.55	3.0	4.1	2.145843	2.048025	3.324382
20	(44,)	(51, 52)	4.30	3.7	4.9	2.822456	2.337072	4.185563
20	(44,)	(51, 52)	4.30	3.7	4.9	2.647745	2.275963	4.055106
21	(61,)	(65, 66)	3.20	2.7	3.7	2.11739	2.011979	3.039163
21	(61,)	(65, 66)	3.20	2.7	3.7	2.455496	2.278085	3.414909
22	(76,)	(65, 66)	3.55	3.0	4.1	3.474018	2.799152	4.208027
22	(76,)	(65, 66)	3.55	3.0	4.1	2.991009	2.504713	4.074276
23	(76,)	(68, 69)	3.65	1.3	6.0	3.198315	2.631972	4.812148
23	(76,)	(68, 69)	3.65	1.3	6.0	3.491207	2.854136	4.900594
24	(76,)	(80, 81)	2.95	2.4	3.5	3.818314	2.801633	3.980013
24	(76,)	(80, 81)	2.95	2.4	3.5	4.260282	2.850439	4.285403
25	(93,)	(97, 98)	2.95	2.4	3.5	2.459508	2.227164	3.319413
25	(93,)	(97, 98)	2.95	2.4	3.5	3.075716	2.465233	3.723797
26	(113,)	(97, 98)	3.65	1.3	6.0	4.978145	3.787286	5.023971
26	(113,)	(97, 98)	3.65	1.3	6.0	2.077395	1.941941	3.885782
27	(113,)	(117, 118)	3.05	2.5	3.6	3.716046	2.589803	3.915484
27	(113,)	(117, 118)	3.05	2.5	3.6	2.162939	2.067299	3.296566
28	(113,)	(120, 121)	4.15	3.6	4.7	3.359688	2.582107	4.371507
28	(113,)	(120, 121)	4.15	3.6	4.7	2.027542	1.905555	3.911524
29	(137,)	(117, 118)	3.80	3.3	4.3	3.868877	3.057241	4.37445
29	(137,)	(117, 118)	3.80	3.3	4.3	3.663032	2.824351	4.287708
30	(137,)	(141, 142)	3.00	2.5	3.5	2.632384	2.388871	3.242584
30	(137,)	(141, 142)	3.00	2.5	3.5	3.03661	2.541021	3.691158
31	()	(14,)	2.90	0.8	5.0	NaN	NaN	NaN
32	()	(16, 17)	3.25	1.2	5.3	NaN	NaN	NaN
33	()	(97, 98)	3.45	1.5	5.4	NaN	NaN	NaN
34	()	(95,)	3.05	0.8	5.3	NaN	NaN	NaN
35	(156,)	(139,)	3.05	2.9	3.2	2.927862	2.424466	4.613149
36	(145,)	(139,)	3.20	3.0	3.4	3.090609	2.588747	4.391723
37	(156,)	(141, 142)	3.40	2.9	3.9	3.296746	2.65866	4.243307
37	(156,)	(141, 142)	3.40	2.9	3.9	2.782111	2.562516	3.725645
38	(145,)	(141, 142)	3.45	2.9	4.0	2.66025	2.469729	3.59002
38	(145,)	(141, 142)	3.45	2.9	4.0	3.265185	2.699097	3.687947

# matplotlib.rcParams.update(matplotlib.rcParamsDefault)

if snakemake.params.method != "cMD":
    if not multiple:
        pmf_plot.suptitle("NOE PMF plots")
        pmf_plot.tight_layout()
        pmf_plot.savefig(snakemake.output.noe_pmf)
        fig = pmf_plot
    else:
        # save to image data
        io_cis = io.BytesIO()
        io_trans = io.BytesIO()
        if len(cis) > CIS_TRANS_CUTOFF:
            pmf_plot_cis.savefig(io_cis, format="raw", dpi=pmf_plot_cis.dpi)
        if len(trans) > CIS_TRANS_CUTOFF:
            pmf_plot_trans.savefig(io_trans, format="raw", dpi=pmf_plot_trans.dpi)

        if len(cis) > CIS_TRANS_CUTOFF:
            io_cis.seek(0)
            img_cis = np.reshape(
                np.frombuffer(io_cis.getvalue(), dtype=np.uint8),
                newshape=(
                    int(pmf_plot_cis.bbox.bounds[3]),
                    int(pmf_plot_cis.bbox.bounds[2]),
                    -1,
                ),
            )
            io_cis.close()

        if len(trans) > CIS_TRANS_CUTOFF:
            io_trans.seek(0)
            img_trans = np.reshape(
                np.frombuffer(io_trans.getvalue(), dtype=np.uint8),
                newshape=(
                    int(pmf_plot_trans.bbox.bounds[3]),
                    int(pmf_plot_trans.bbox.bounds[2]),
                    -1,
                ),
            )
            io_trans.close()

        fig, axs = plt.subplots(2, 1)
        fig.set_size_inches(16, 30)
        if len(cis) > CIS_TRANS_CUTOFF:
            axs[0].imshow(img_cis)
            axs[0].axis("off")
            axs[0].set_title("cis")
        if len(trans) > CIS_TRANS_CUTOFF:
            axs[1].imshow(img_trans)
            axs[1].set_title("trans")
            axs[1].axis("off")
        # fig.suptitle('PMF plots. PMF vs. distance')
        fig.tight_layout()
        fig.savefig(snakemake.output.noe_pmf, dpi=DPI)
else:
    fig, ax = plt.subplots()
    ax.text(0.5, 0.5, "not applicable.")
    fig.savefig(snakemake.output.noe_pmf, dpi=DPI)
display(fig)

../_images/28159d44aa267024_GaMD_processed_92_0.png

2.17.7. NOE-Statistics#

Following, we compute various statistical metrics to evaluate how the simulated NOEs compare to the experimental ones.

# Compute deviations of experimental NOE values to the MD computed ones
NOE_stats_keys = []
NOE_i = []
NOE_dev = {}

if multiple:
    if len(cis) > CIS_TRANS_CUTOFF:
        NOE_stats_keys.append("cis")
        NOE_i.append(NOE_cis)
    if len(trans) > CIS_TRANS_CUTOFF:
        NOE_stats_keys.append("trans")
        NOE_i.append(NOE_trans)
else:
    NOE_stats_keys.append("single")
    NOE_i.append(NOE)

for k, NOE_d in zip(NOE_stats_keys, NOE_i):
    if (NOE_d["NMR exp"].to_numpy() == 0).all():
        # if all exp values are 0: take middle between upper / lower bound as reference value
        NOE_d["NMR exp"] = (NOE_d["upper bound"] + NOE_d["lower bound"]) * 0.5

    # Remove duplicate values (keep value closest to experimental value)
    NOE_d["dev"] = NOE_d["md"] - np.abs(NOE_d["NMR exp"])
    NOE_d["abs_dev"] = np.abs(NOE_d["md"] - np.abs(NOE_d["NMR exp"]))

    NOE_d = NOE_d.sort_values("abs_dev", ascending=True)
    NOE_d.index = NOE_d.index.astype(int)
    NOE_d = NOE_d[~NOE_d.index.duplicated(keep="first")].sort_index(kind="mergesort")

    NOE_d = NOE_d.dropna()
    NOE_dev[k] = NOE_d

# Compute NOE statistics
NOE_stats = {}

for k in NOE_stats_keys:
    NOE_d = NOE_dev[k]
    NOE_stats_k = pd.DataFrame(columns=["stat", "value", "up", "low"])

    MAE, upper, lower = src.stats.compute_MAE(NOE_d["NMR exp"], NOE_d["md"])
    append = {"stat": "MAE", "value": MAE, "up": upper, "low": lower}
    NOE_stats_k = NOE_stats_k.append(append, ignore_index=True)

    MSE, upper, lower = src.stats.compute_MSE(NOE_d["dev"])
    append = {"stat": "MSE", "value": MSE, "up": upper, "low": lower}
    NOE_stats_k = NOE_stats_k.append(append, ignore_index=True)

    RMSD, upper, lower = src.stats.compute_RMSD(NOE_d["NMR exp"], NOE_d["md"])
    append = {"stat": "RMSD", "value": RMSD, "up": upper, "low": lower}
    NOE_stats_k = NOE_stats_k.append(append, ignore_index=True)

    pearsonr, upper, lower = src.stats.compute_pearsonr(NOE_d["NMR exp"], NOE_d["md"])
    append = {"stat": "pearsonr", "value": pearsonr, "up": upper, "low": lower}
    NOE_stats_k = NOE_stats_k.append(append, ignore_index=True)

    kendalltau, upper, lower = src.stats.compute_kendalltau(
        NOE_d["NMR exp"], NOE_d["md"]
    )
    append = {"stat": "kendalltau", "value": kendalltau, "up": upper, "low": lower}
    NOE_stats_k = NOE_stats_k.append(append, ignore_index=True)

    chisq, upper, lower = src.stats.compute_chisquared(NOE_d["NMR exp"], NOE_d["md"])
    append = {"stat": "chisq", "value": chisq, "up": upper, "low": lower}
    NOE_stats_k = NOE_stats_k.append(append, ignore_index=True)

    fulfilled = src.stats.compute_fulfilled_percentage(NOE_d)
    append = {"stat": "percentage_fulfilled", "value": fulfilled, "up": 0, "low": 0}
    NOE_stats_k = NOE_stats_k.append(append, ignore_index=True)

    NOE_stats[k] = NOE_stats_k

# Compute statistics for most populated cluster
if multiple:
    NOE_stats_keys = ["cis", "trans"]
    differentiation = {"cis": cis, "trans": trans}
else:
    NOE_stats_keys = ["single"]

n_cluster_traj = {}
n_cluster_percentage = {}
n_cluster_index = {}
remover = []
for k in NOE_stats_keys:
    if multiple:
        cluster_in_x = np.in1d(cluster_index, differentiation[k])
        print(cluster_in_x)
        if np.all(cluster_in_x == False):
            # No clusters found for specific cis/trans/other
            remover.append(k)
    else:
        cluster_in_x = np.ones((len(cluster_index)), dtype=bool)
    cluster_in_x = np.arange(0, len(cluster_index))[cluster_in_x]
    n_cluster_traj[k] = cluster_traj[cluster_in_x]
    n_cluster_percentage[k] = np.array(cluster_percentage)[cluster_in_x]
    n_cluster_index[k] = np.array(cluster_index)[cluster_in_x]
cluster_traj = n_cluster_traj
cluster_percentage = n_cluster_percentage
cluster_index = n_cluster_index
[NOE_stats_keys.remove(k) for k in remover]

[]

# Compute statistics for most populated cluster
NOE_dict = {}
NOE = src.noe.read_NOE(snakemake.input.noe)
NOE_n = {}
if multiple:
    NOE_trans, NOE_cis = NOE
    NOE_n["cis"] = NOE_cis
    NOE_n["trans"] = NOE_trans
    NOE_dict["cis"] = NOE_cis.to_dict(orient="index")
    NOE_dict["trans"] = NOE_trans.to_dict(orient="index")
else:
    NOE_dict["single"] = NOE.to_dict(orient="index")
    NOE_n["single"] = NOE


for k in NOE_stats_keys:
    # max. populated cluster
    # NOE = NOE_n.copy()
    max_populated_cluster_idx = np.argmax(cluster_percentage[k])
    max_populated_cluster = cluster_traj[k][max_populated_cluster_idx]
    NOE_n[k]["md"], *_ = src.noe.compute_NOE_mdtraj(NOE_dict[k], max_populated_cluster)
    # Deal with ambigous NOEs
    NOE_n[k] = NOE_n[k].explode("md")
    # and ambigous/multiple values
    NOE_n[k] = NOE_n[k].explode("NMR exp")

    # Remove duplicate values (keep value closest to experimental value)
    NOE_test = NOE_n[k]
    if (NOE_test["NMR exp"].to_numpy() == 0).all():
        # if all exp values are 0: take middle between upper / lower bound as reference value
        NOE_test["NMR exp"] = (NOE_test["upper bound"] + NOE_test["lower bound"]) * 0.5
    NOE_test["dev"] = NOE_test["md"] - np.abs(NOE_test["NMR exp"])
    NOE_test["abs_dev"] = np.abs(NOE_test["md"] - np.abs(NOE_test["NMR exp"]))

    NOE_test = NOE_test.sort_values("abs_dev", ascending=True)
    NOE_test.index = NOE_test.index.astype(int)
    NOE_test = NOE_test[~NOE_test.index.duplicated(keep="first")].sort_index(
        kind="mergesort"
    )

    # drop NaN values:
    NOE_test = NOE_test.dropna()
    # Compute metrics now
    # Compute NOE statistics, since no bootstrap necessary, do a single iteration.. TODO: could clean this up further to pass 0, then just return the value...
    RMSD, *_ = src.stats.compute_RMSD(
        NOE_test["NMR exp"], NOE_test["md"], n_bootstrap=1
    )
    MAE, *_ = src.stats.compute_MAE(NOE_test["NMR exp"], NOE_test["md"], n_bootstrap=1)
    MSE, *_ = src.stats.compute_MSE(NOE_test["dev"], n_bootstrap=1)
    fulfil = src.stats.compute_fulfilled_percentage(NOE_test)
    # insert values
    values = [MAE, MSE, RMSD, None, None, None, fulfil]
    NOE_stats[k].insert(4, "most-populated-1", values)

# If there are no cis/trans clusters, still write a column 'most-populated-1', but fill with NaN
for k in remover:
    values = [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]
    NOE_stats[k].insert(4, "most-populated-1", values)

NOE 14 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 31 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 32 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 33 error: at least 1 of the             atoms is undefined! Set value to NaN
NOE 34 error: at least 1 of the             atoms is undefined! Set value to NaN

for k in NOE_stats.keys():
    display(NOE_stats[k])
    # convert df to dict for export
    NOE_stats[k] = NOE_stats[k].to_dict()
# Save
src.utils.json_dump(snakemake.output.noe_stats, NOE_stats)

	stat	value	up	low	most-populated-1
0	MAE	0.472866	0.630383	0.327913	0.585169
1	MSE	-0.080592	0.137013	0.000000	0.239544
2	RMSD	0.652473	0.812288	0.469192	0.760445
3	pearsonr	0.493513	0.707600	0.261873	NaN
4	kendalltau	0.409033	0.612470	0.180314	NaN
5	chisq	4.292417	6.469079	2.414905	NaN
6	percentage_fulfilled	0.558824	0.000000	0.000000	0.529412

plt.rc('font', size=MEDIUM_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=BIGGER_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title


if multiple:
    fig, axs = plt.subplots(2, 1)
    if len(cis) > CIS_TRANS_CUTOFF:
        # cis
        axs[0].scatter(NOE_dev["cis"]["NMR exp"], NOE_dev["cis"]["md"])
        axs[0].set_ylabel("MD")
        axs[0].set_xlabel("Experimental NOE value")
        axs[0].axline((1.5, 1.5), slope=1, color="black")
        axs[0].set_title("Experimental vs MD derived NOE values - cis")

    if len(trans) > CIS_TRANS_CUTOFF:
        # trans
        axs[1].scatter(NOE_dev["trans"]["NMR exp"], NOE_dev["trans"]["md"])
        axs[1].set_ylabel("MD")
        axs[1].set_xlabel("Experimental NOE value")
        axs[1].axline((1.5, 1.5), slope=1, color="black")
        axs[1].set_title("Experimental vs MD derived NOE values - trans")
    fig.tight_layout()
    fig.savefig(snakemake.output.noe_stat_plot)
else:
    plt.scatter(NOE_dev["single"]["NMR exp"], NOE_dev["single"]["md"])
    if snakemake.params.method != "cMD":
        plt.scatter(
            NOE_dev["single"]["NMR exp"], NOE_dev["single"]["upper"], marker="_"
        )
        plt.scatter(
            NOE_dev["single"]["NMR exp"], NOE_dev["single"]["lower"], marker="_"
        )
    plt.ylabel("MD")
    plt.xlabel("Experimental NOE value")
    plt.axline((1.5, 1.5), slope=1, color="black")
    plt.title("Experimental vs MD derived NOE values")
    plt.tight_layout()
    plt.savefig(snakemake.output.noe_stat_plot)
    

../_images/28159d44aa267024_GaMD_processed_99_0.png

# is the mean deviation significantly different than 0? if pvalue < 5% -> yes! We want: no! (does not deviate from exp. values)
if multiple:
    if len(cis) > CIS_TRANS_CUTOFF:
        print(stats.ttest_1samp(NOE_dev["cis"]["dev"], 0.0))
    if len(trans) > CIS_TRANS_CUTOFF:
        print(stats.ttest_1samp(NOE_dev["trans"]["dev"], 0.0))
else:
    print(stats.ttest_1samp(NOE_dev["single"]["dev"], 0.0))

Ttest_1sampResult(statistic=-0.7150261526332978, pvalue=0.4796212744884518)

if multiple:
    if len(cis) > CIS_TRANS_CUTOFF:
        print(stats.describe(NOE_dev["cis"]["dev"]))
    if len(trans) > CIS_TRANS_CUTOFF:
        print(stats.describe(NOE_dev["trans"]["dev"]))
else:
    print(stats.describe(NOE_dev["single"]["dev"]))

DescribeResult(nobs=34, minmax=(-1.4775437333962005, 1.523458928926428), mean=-0.08059150996572174, variance=0.4319302783161461, skewness=0.16801845701400994, kurtosis=0.5198239565606388)

# Make overview figure
plot1 = final_figure_axs[0].getroot()
plot2 = final_figure_axs[1].getroot()
plot3 = final_figure_axs[2].getroot()
plot4 = final_figure_axs[3].getroot()
plot5 = final_figure_axs[4].getroot()

if multiple:
#     # TODO: fix this!
#         sc.Figure(
#         "4039",
#         "5048",
#         sc.Panel(plot3, sc.Text("A", 0, 0, size=16, weight='bold')),
#         sc.Panel(
#             plot2,
#             sc.Text("B", "8.5cm", "0cm", size=16),
#         ).move(0,200),
#         sc.Panel(plot4, sc.Text("C", 25, 20, size=16, weight='bold')),
#         # sc.Panel(plot4, sc.Text("D", 25, 20, size=20, weight='bold')).scale(0.8).move(-200,0),
#     ).tile(2, 2).save(snakemake.output.overview_plot)
    sc.Figure(
        "4039",
        "7068", # 5048
        sc.Panel(plot2, sc.Text("A", 25, 20, size=16, weight='bold').move(-12,0)).scale(8),
        sc.Panel(plot3, sc.Text("B", 25, 20, size=16, weight='bold').move(-12,0)).scale(8),
        sc.Panel(plot4, sc.Text("C", 25, 20, size=16, weight='bold').move(-12,-24)).scale(8).move(0, -300),
        sc.Panel(sc.Text("2")),
        sc.Panel(plot5, sc.Text("D", 25, 20, size=16, weight='bold').move(-12,0)).scale(8).move(0, -1550),
        # sc.Panel(plot1, sc.Text("D", 25, 0, size=16, weight='bold')),
    ).tile(2, 3).save(snakemake.output.overview_plot)
else:
    sc.Figure(
        "4039",
        "5048", # 4039
        sc.Panel(plot2, sc.Text("A", 25, 20, size=16, weight='bold').move(-12,0)).scale(8),
        sc.Panel(plot3, sc.Text("B", 25, 20, size=16, weight='bold').move(-12,0)).scale(8),
        sc.Panel(plot4, sc.Text("C", 25, 20, size=16, weight='bold').move(-12,-24)).scale(8).move(0, 350),
        sc.Panel(sc.Text("2")),
        sc.Panel(plot5, sc.Text("D", 25, 20, size=16, weight='bold').move(-12,0)).scale(8).move(0, -250),
        # sc.Panel(plot1, sc.Text("D", 25, 0, size=16, weight='bold')),
    ).tile(2, 3).save(snakemake.output.overview_plot)

src.utils.show_svg(snakemake.output.overview_plot)

../_images/28159d44aa267024_GaMD_processed_104_0.svg

print("Done")

Done

MacroConf

Analysis Notebook

Contents

2.17. Analysis Notebook#

2.17.1. Compound details#

2.17.2. Simulation details#

2.17.3. Convergence of the simulation#

2.17.3.1. RMSD#

2.17.3.2. Dihedral angles#

2.17.4. Dimensionality Reductions#

2.17.4.1. Cartesian PCA#

2.17.4.2. Pairwise distances PCA#

2.17.4.3. Dihedral PCA#

2.17.4.4. TSNE#

2.17.4.5. Shape analysis - principal moments of inertia#

2.17.4.6. Cremer pople analysis#

2.17.4.7. Comparison#

2.17.5. DBSCAN-Clustering#

2.17.6. NOEs#

2.17.6.1. NOE without reweighting.#

2.17.6.2. Reweighted NOEs#

2.17.7. NOE-Statistics#