Source code for lipyphilic.lib.registration

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
# lipyphilic --- lipyphilic.readthedocs.io
#
# Released under the GNU Public Licence, v2 or any higher version
#

r"""Registration --- :mod:`lipyphilic.lib.registration`
======================================================

:Author: Paul Smith
:Year: 2021
:Copyright: GNU Public License v2

This module provides methods for determining registration of leaflets in a bilayer.

The degree of registration is calculated as the pearson correlation coefficient of
densities in the upper and lower leaflets. First, the 2D density of each leaflet,
:math:`L`, is calculated:

.. math::

  \rho(x, y)_{L} = \displaystyle \int\limits_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sigma} \exp \Bigg({-}\frac{1}{2} \bigg(\frac{x' - x}{\sigma} \bigg)^2 \Bigg) \,dx' dy'


where the :math:`(x, y)` positions of lipid atoms in leaflet :math:`L` are binned
into two-dimensional histograms, then convolved with a circular Gaussian density
of standard deviation :math:`\sigma`. :math:`L` is either the upper (:math:`u`) or
lower (:math:`l`) leaflet.

The correlation between the two leaflets, :math:`r_{u/l}`, is then calculated as
the pearson correlation coefficient between :math:`\rho(x, y)_{u}` and
:math:`\rho(x, y)_{l}`, where values of:

* :math:`1` correspond to perfectly registered
* :math:`-1` correspond to perfectly anti-registered

For more information on interleaflet registration in bilayers
see `Thallmair et al. (2018) <https://pubs.acs.org/doi/abs/10.1021/acs.jpclett.8b01877>`__.

Input
-----

Required:
  - *universe* : an MDAnalysis Universe object.
  - *upper_sel* : atom selection for lipids in the upper leaflet to use in the registration calculation
  - *lower_sel* : atom selection for lipids in the lower leaflet to use in the registration calculation
  - *leaflets* : leaflet membership (-1: lower leaflet, 0: midplane, 1: upper leaflet) of each lipid in the membrane.

Optional:
  - *filter_by* : boolean mask for determining which lipids to include in the registration calculation
  - *n_bins* : the number of bins to use in *x* and *y* for the 2D histogram
  - *gaussian_sd* : the standard deviation of the circular Gaussian to convole with the grid densities
  
Output
------

  - *registration* : the degree of interleaglet registration at each frame

The data are stored in the :attr:`registration.registration` attribute, containing the pearson
correlation coefficient of the two-dimensional leaflet densities at each frame.

Example usage of :class:`Registration`
--------------------------------------

An MDAnalysis Universe must first be created before using :class:`Registration`::

  import MDAnalysis as mda
  from lipyphilic.lib.registration import Registration

  u = mda.Universe(tpr, trajectory)


Then we need to know which leaflet each lipid is in at each frame. This may be done using
:class:`lipyphilic.lib.assign_leaflets.AssignLeaflets`::

  leaflets = AssignLeaflets(
    universe=u,
    lipid_sel="name GL1 GL2 ROH" # assuming we are using the MARTINI forcefield
  )
  leaflets.run()


The leaflets data are stored in the :attr:`leaflets.leaflets` attribute. We can now create our
Registration object by passing our :class:`lipyphilic.lib.assign_leaflets.AssignLeaflets`
object to :class:`Registration` along with atom selections for the lipids::

  registration = Registration(
    upper_sel="resname CHOL and name ROH",
    lower_sel="resname CHOL and name ROH",
    leaflets=leaflets.filter_leaflets("resname CHOL and name ROH")
  )
  
To calculate the interleaflet correlation of cholesterol molecules using their ROH
beads we then need to use the :func:`run()` method. We select which frames of the trajectory
to analyse (`None` will use every frame) and choose to display a progress bar (`verbose=True`)::
  
  registration.run(
    start=None,
    stop=None,
    step=None,
    verbose=True
  )

The results are then available in the :attr:`registration.registration` attribute as a
:class:`numpy.ndarray`. Again, :math:`1` corresponds to the leaflets being perfectly in register
and :math:`-1` corresponds to the leaflets being perfectly anti-registered.

Selecting a subset of lipids for the registration analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The previous example will compute the registration of cholesterol across the upper
and lower leaflets. In, for example, simulations of phase-separation domains, it is useful
to know the registration of liquid-ordered domains (regardless of the species in the domain)
rather than the registrtion of specific lipid species.

If we have a 2D array, 'lipid_order_data', that contains information on whether each lipid is in
the liquid-disordered phase or the liquid-ordered phase at each frame, we can used this to
calculate the registration of ordered domains. The array must take the shape
'(n_residues, n_frames)', and in the below example 'lipid_order_data[i, j]' would be equal to `-1`
if lipid 'i' is liquid-disordered at frame 'j' and equal to `1` if it is liquid-ordered::

  registration = Registration(
    upper_sel="name PO4 ROH",
    lower_sel="name PO4 ROH",
    leaflets=leaflets.leaflets,
    filter_by=lipid_order_data == 1
  )


If we have a ternary mixture of DPPC/DOPC/Cholesterol, we can also specifcy that we wish to
consider only DPPC and cholesterol in the liquid-ordered phase::

  registration = Registration(
    upper_sel="(resname CHOL and name ROH) or (resname DPPC and name PO4)",
    lower_sel="(resname CHOL and name ROH) or (resname DPPC and name PO4)",
    leaflets=leaflets.filter_leaflets("resname CHOL DPPC"),
    filter_by=lipid_order_data == 1
  )
  

Changing the resolution of the 2D grid
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, the lipid positions of each leaflet are binned into a two-dimensional
histogram using :math:`n\_bins_x = \lceil{x}\rceil`, where :math:`n\_bins_x` is the
numer of bins in :math:`x` and :math:`\lceil{x}\rceil` is the size of system in :math:`x`
rounded up to the nearest integer. This gives a grid resolution of  *1* Å.

It is also possible to specify the number of bins to use for binning the data::

  registration = Registration(
    upper_sel="resname CHOL and name ROH",
    lower_sel="resname CHOL and name ROH",
    leaflets=leaflets.filter_leaflets("resname CHOL"),
    n_bins=100
  )


This will use *100* bins for creating the two-dimensional histogram. Fewer bins
will result in a performance increase but at the cost of spatial resolution. For
all but the largest systems, the default of *1* Å is appropriate. If your system
is larger than a few hundred nm in one dimension, you will likely want to set
:attr:`n_bins` to 2000 or less.

Changing the standard deviation of the circular Gaussian density
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The defualt value of :math:`\sigma` is *15*, which is the value used by
`Thallmair et al. (2018) <https://pubs.acs.org/doi/abs/10.1021/acs.jpclett.8b01877>`__
for determining interleaflet cholesterol correlations. This deault value can be
changed using the :attr:`gaussian_sd` parameter::

  registration = Registration(
    upper_sel="resname CHOL and name ROH",
    lower_sel="resname CHOL and name ROH",
    leaflets=leaflets.filter_leaflets("resname CHOL"),
    gaussian_sd=12
  )


Figure 2d of `Thallmair et al. (2018)
<https://pubs.acs.org/doi/abs/10.1021/acs.jpclett.8b01877>`__ shows how correlation
tends to increase with increasing :attr:`gaussian_sd`. This is because the density of
atomic positions is more diffuse and thus more likely to overlap between the two
leaflets. Increasing :attr:`gaussian_sd` also incurs a performance cost.

The class and its methods
-------------------------

.. autoclass:: Registration
    :members:

"""

import numpy as np
import scipy.stats
import scipy.ndimage

from lipyphilic.lib import base


[docs]class Registration(base.AnalysisBase): """Calculate interleaflet registration in a bilayer. """ def __init__(self, universe, upper_sel, lower_sel, leaflets, filter_by=None, n_bins=None, gaussian_sd=15 ): """Set up parameters for the registration calculation. Parameters ---------- universe : Universe MDAnalysis Universe object upper_sel : str Selection string for lipids in the upper leaflet of the bilayer to be used for determining registration. lower_sel : str Selection string for lipids in the lower leaflet of the bilayer to be used for determining registration. leaflets : numpy.ndarray An array of leaflet membership in which: -1 corresponds to the lower leaflet; 1 corresponds to the upper leaflet; and 0 corresponds to the midplane. If the array is 1D and of shape (n_lipids), each lipid is taken to remain in the same leaflet over the trajectory. If the array is 2D and of shape (n_lipids, n_frames), the leaflet to which each lipid is assisgned at each frame will be taken into account when calculating the area per lipid. filter_by : numpy.ndarray, optional A boolean array indicating whether or not to include each lipid in the registration analysis. If the array is 1D and of shape (n_lipids), the same lipids will be used in the registration analysis at every frame. If the array is 2D and of shape (n_lipids, n_frames), the boolean value of each lipid at each frame will be taken into account. The default is `None`, in which case no filtering is performed. n_bins : int, optional The number of bins to use in each dimension for the two-dimensional density calculations. The default is `None`, in which case the number of bins will be given by the size of the system in the 'x' dimension rounded up to the nearest integer. gaussian_sd : float, optional The standard deviation of the circular Gaussian density to convolve with the two-dimensional densities. The spreads out the data to better represent the size of the lipids. The default is 15. """ super(Registration, self).__init__(universe.trajectory) self.u = universe self.upper_sel = upper_sel self.lower_sel = lower_sel self.membrane = self.u.select_atoms(f"({self.upper_sel}) or ({self.lower_sel})") if not np.allclose(self.u.dimensions[3:], 90.0): raise ValueError("Registration requires an orthorhombic box. Please use the on-the-fly " "transformation :class:`lipyphilic.transformations.triclinic_to_orthorhombic` " "before calling Registration" ) if np.array(leaflets).ndim not in [1, 2]: raise ValueError("'leaflets' must either be a 1D array containing non-changing " "leaflet ids of each lipid, or a 2D array of shape (n_residues, n_frames)" " containing the leaflet id of each lipid at each frame." ) if len(leaflets) != self.membrane.n_residues: raise ValueError("The shape of 'leaflets' must be (n_residues,), but 'lipid_sel' " f"generates an AtomGroup with {self.membrane.n_residues} residues" f" and 'leaflets' has shape {leaflets.shape}." ) self.leaflets = leaflets if filter_by is not None and np.array(filter_by).ndim not in [1, 2]: raise ValueError("'filter_by' must either be a 1D array containing non-changing boolean" "values for each lipid, or a 2D array of shape (n_residues, n_frames)" " containing a boolean value for each lipid at each frame." ) elif filter_by is not None and len(filter_by) != self.membrane.n_residues: raise ValueError("The shape of 'filter_by' must be (n_residues,)") # determine which lipids to use in the analysis at each frame if filter_by is None: self.filter_by = np.full_like( self.leaflets, fill_value=True, dtype=bool ) elif filter_by.ndim == 1: self.filter_by = np.full_like( self.leaflets, fill_value=filter_by[:, np.newaxis], dtype=bool ) else: self.filter_by = filter_by.astype(bool) self.n_bins = n_bins self.gaussian_sd = gaussian_sd self.registration = None def _prepare(self): # Output array self.registration = np.full(self.n_frames, fill_value=np.NaN) def _single_frame(self): # Atoms must be inside the primary unit cell self.membrane.residues.atoms.wrap(inplace=True) # get the bins for the 2d histograms x_length = int(np.ceil(self._ts.dimensions)[0]) if self.n_bins is not None: bins = np.linspace(0, x_length, self.n_bins + 1) else: bins = np.linspace(0, x_length, x_length + 1) # Upper leaflet 2d histogram upper_ordered_res = self.membrane.residues[np.logical_and( self.leaflets[:, self._frame_index] == 1, self.filter_by[:, self._frame_index], )] upper_ordered_atoms = upper_ordered_res.atoms.select_atoms(self.upper_sel) upper_hist, *_ = np.histogram2d( upper_ordered_atoms.positions[:, 0], upper_ordered_atoms.positions[:, 1], bins=bins ) # convolve with gaussian kernel upper_hist = scipy.ndimage.gaussian_filter( upper_hist, sigma=self.gaussian_sd, mode="wrap" ) # Find lower ordered atoms lower_ordered_res = self.membrane.residues[np.logical_and( self.leaflets[:, self._frame_index] == -1, self.filter_by[:, self._frame_index], )] lower_ordered_atoms = lower_ordered_res.atoms.select_atoms(self.lower_sel) lower_hist, *_ = np.histogram2d( lower_ordered_atoms.positions[:, 0], lower_ordered_atoms.positions[:, 1], bins=bins ) # convolve with gaussian kernel lower_hist = scipy.ndimage.gaussian_filter( lower_hist, sigma=self.gaussian_sd, mode="wrap" ) # correlation correlation = scipy.stats.pearsonr( upper_hist.flatten(), lower_hist.flatten() ) self.registration[self._frame_index] = correlation[0]