Source code for lipyphilic.lib.assign_leaflets

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
# lipyphilic --- lipyphilic.readthedocs.io
#
# Released under the GNU Public Licence, v2 or any higher version
#

"""Assign leaflets --- :mod:`lipyphilic.lib.assign_leaflets`
============================================================

:Author: Paul Smith
:Year: 2021
:Copyright: GNU Public License v2

This module provides methods for assigning lipids to leaflets in a bilayer.

Assigning leaflets in planar bilayers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The class :class:`lipyphilic.lib.assign_leaflets.AssignLeaflets` assigns
each lipid to a leaflet based on the distance in *z* to the midpoint of
the bilayer. Lipids may be assigned to the upper leaflet (indicated by `1`),
the lower leaflet (`-1`) or the bilayer midplane (`0`).

Input
------

Required:
  - *universe* : an MDAnalysis Universe object
  - *lipid_sel* : atom selection for *all* lipids in the bilayer, including e.g. sterols

Options:
  - *midplane_sel* : atom selection for lipid that may occupy the midplane
  - *midplane_cutoff* : atoms within this distance from the midpoint are considered to be the midplane
  - *n_bins* : split the membrane into *n_bins \\* n_bins* patches, and calculate local membrane midpoints for each patch

Output
------

  - *leaflets* : leaflet to which each lipid is assigned at each frame
  
Leaflet data are returned in a :class:`numpy.ndarray`, where each row corresponds
to an individual lipid and each column corresponds to an individual frame, i.e.
leaflets[i, j] refers to the leaflet of lipid *i* at frame *j*. The results are
accessible via the `AssignLeaflets.leaflets` attribute.


Example usage of :class:`AssignLeaflets`
----------------------------------------

An MDAnalysis Universe must first be created before using AssignLeaflets::

  import MDAnalysis as mda
  from lipyphilic.lib.assign_leaflets import AssignLeaflets

  u = mda.Universe(tpr, trajectory)

If we have used the MARTINI forcefield to study a phospholipid/cholesterol mixture,
we can assign lipids and cholesterol to the upper and lower as follows::

  leaflets = AssignLeaflets(
    universe=u,
    lipid_sel="name GL1 GL2 ROH"
  )
  
We then select which frames of the trajectory to analyse (`None` will use every
frame) and choose to display a progress bar (`verbose=True`)::
  
  leaflets.run(
    start=None,
    stop=None,
    step=None,
    verbose=True
  )
  
The results are then available in the :attr:`leaflets.leaflets` attribute as a
:class:`numpy.ndarray`. Each row corresponds to an individual lipid and each column
to an individual frame, i.e `leaflets.leaflets[i, j]` contains the leaflet
membership of lipid *i* at frame *j*. Lipid *i*, at frame *j*, is in the upper
leaflet if `leaflets.leaflets[i, j]==1` and in the lower leaflet if
`leaflets.leaflets[i, j]==-1`.

Allowing lipids in the midplane
-------------------------------

The above example will assign every lipid (including sterols) to either the upper
or lower leaflet. To allow cholesterol to be in the midplane, we can provide
a :attr:`midplane_sel` and :attr:`midplane_cutoff` to :class:`AssignLeaflets`::

  leaflets = AssignLeaflets(
    universe=u,
    lipid_sel="name GL1 GL2 ROH",
    midplane_sel="resname CHOL and name ROH C2",
    midplane_cutoff=12.0
  )
  
A cholesterol molecule that has both its *ROH* and *C2* atoms within *12* Å of
membrane midpoint will be assigned to the midplane, i.e. for cholesterol *i*
at frame *j* that is in the midplane, `leaflets.leaflets[i, j]==0`.

Changing the resolution of the membrane grid
--------------------------------------------

The first two examples compute a global membrane midpoint based on all the atoms
of the lipids in the membrane. Lipids are then assigned a leaflet based on their distance
in :math:`z` to this midpoint. This is okay for planar bilayers, but can lead to incorrect
leaflet classification in membranes with undulations. If your bilayer has undulations,
`AssignLeaflets` can account for this by creating a grid in :math:`xy`
of your membrane, calculating the local membrane midpoint in each patch,
then assigning leaflet membership based on distance in :math:`z` to the local membrane
midpoint. This is done through use of `n_bins`::

  leaflets = AssignLeaflets(
    universe=u,
    lipid_sel="name GL1 GL2 ROH",
    midplane_sel="resname CHOL and name ROH C2",
    midplane_cutoff=12.0,
    n_bins=10
  )
  
In this example, the membrane will be split into a *10 x 10* grid and a lipid
assigned a leaflet based on the distance to the midpoint of the patch the lipid
is in.


Assigning leaflets in membranes with high curvature
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If your membrane is a vesicle or bilayer with *very* large undulations, such as in a
`buckled membrane <https://aip.scitation.org/doi/pdf/10.1063/1.4808077>`__,
:class:`lipyphilic.lib.assign_leaflets.AssignLeaflets` will assign lipids to the wrong
leaflet

The class :class:`lipyphilic.lib.assign_leaflets.AssignCurvedLeaflets` can be used in these
scenaries to assign each lipid to a leaflet using `MDAnalysis' Leaflet Finder
<https://docs.mdanalysis.org/1.0.0/documentation_pages/analysis/leaflet.html>`__.
Lipids may still be assigned to the upper/outer leaflet (indicated by `1`), the lower/inner leaflet
(`-1`) or the membrane midplane (`0`).

Input
------

Required:
  - *universe* : an MDAnalysis Universe object
  - *lipid_sel* : atom selection for *all* lipids in the bilayer, including e.g. sterols
  - lf_cutoff : distance cutoff below which two neighbouring atoms will be considered to be in the same leaflet.
Options:
  - *midplane_sel* : atom selection for lipid that may occupy the midplane
  - *midplane_cutoff* : atoms further than this distance from the either leaflet are considered to be the midplane
  - pbc : bool, specifying whether or not to take periodic boundaries into account

Output
------

  - *leaflets* : leaflet to which each lipid is assigned at each frame
  
Leaflet data are returned in a :class:`numpy.ndarray`, where each row corresponds
to an individual lipid and each column corresponds to an individual frame, i.e.
leaflets[i, j] refers to the leaflet of lipid *i* at frame *j*. The results are
accessible via the `AssignLeaflets.leaflets` attribute.

Example usage of :class:`AssignCurvedLeaflets`
----------------------------------------------

An MDAnalysis Universe must first be created before using AssignCurvedLeaflets::

  import MDAnalysis as mda
  from lipyphilic.lib.assign_leaflets import AssignLeaflets

  u = mda.Universe(tpr, trajectory)

If we have used the MARTINI forcefield to study a phospholipid/cholesterol mixture,
we can assign lipids and cholesterol to the upper and lower as follows::

  leaflets = AssignCurvedLeaflets(
    universe=u,
    lipid_sel="name GL1 GL2 ROH",
    lf_cutoff=12.0,
    midplane_sel="name ROH",
    midplane_cutoff=10.0
  )
  
We then select which frames of the trajectory to analyse (`None` will use every
frame) and choose to display a progress bar (`verbose=True`)::
  
  leaflets.run(
    start=None,
    stop=None,
    step=None,
    verbose=True
  )
  
This will first use `MDAnalysis' Leaflet Finder
<https://docs.mdanalysis.org/1.0.0/documentation_pages/analysis/leaflet.html>`__ to assign
all lipids, excluding those in :attr:`midplane_sel`, to either the upper or lower leaflet. The
:class:`LeafletFinder` will consider two lipids to be in the same leaflet if they have `GL1` or
`GL2` atoms within :math:`12` Å of one another. From this, we find the two largest leaflets,
then assign the remaining phospholipids to a leaflet based on whichever leaflet they are closest
to.

The phospholipids do not change leaflets throughtout the trajectory, only cholesterol --- as specified
with :attr:`midplane_sel` and :attr:`midplane_cutoff`. Thus, at each frame, each cholesterol is
assinged a leaflet based on it's minimum distance to the leaflet. In the above example, if a cholesterol
is within :math:`10` Å of one leaflet it is assigned to that leaflet. If it is within :math:`10` Å of
*neither* or *both* leaflets then it is assigned to the midplane.
    
The results are then available in the :attr:`leaflets.leaflets` attribute as a
:class:`numpy.ndarray`. Each row corresponds to an individual lipid and each column
to an individual frame, i.e `leaflets.leaflets[i, j]` contains the leaflet
membership of lipid *i* at frame *j*. Lipid *i*, at frame *j*, is in the upper
leaflet if `leaflets.leaflets[i, j]==1` and in the lower leaflet if
`leaflets.leaflets[i, j]==-1`.

The classes and their methods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: AssignLeaflets
    :members:

.. autoclass:: AssignCurvedLeaflets
    :members:
    :inherited-members:
    :exclude-members: run
"""

import numpy as np
import scipy.stats
import MDAnalysis.analysis.leaflet
import MDAnalysis.analysis.distances

from lipyphilic.lib import base


class AssignLeafletsBase(base.AnalysisBase):
    """Abstract base class for leaflet identification.
    """
    
    def __init__(self, universe,
                 lipid_sel,
                 midplane_sel=None, midplane_cutoff=None):
        
        super(AssignLeafletsBase, self).__init__(universe.trajectory)
        
        self.u = universe
        self.membrane = self.u.select_atoms(lipid_sel, updating=False)

        if (midplane_sel is not None) ^ (midplane_cutoff is not None):
            raise ValueError(f"midplane_sel is '{midplane_sel}' and midplane_cutoff "
                             f"is {midplane_cutoff}. To assign molecules to the midplane, "
                             "midplane_sel must be provided and midplane_cutoff must be "
                             "greater than 0."
                             )
        
        if (midplane_cutoff is not None) and (midplane_cutoff <= 0):
            raise ValueError("To assign molecules to the midplane, midplane_cutoff must"
                             "be greater than 0."
                             )
        
        self.potential_midplane = self.u.select_atoms(midplane_sel, updating=False) if midplane_sel else None
        self.midplane_cutoff = midplane_cutoff if midplane_cutoff else 0.0
        
        if self.potential_midplane and ((self.potential_midplane - self.membrane.residues.atoms).n_atoms > 0):
            raise ValueError("midplane_sel contains atoms that are not present in molecules selected "
                             "in lipid_sel. lipid_sel must cover *all* residues in the membrane."
                             )
        
    def _assign_leaflets(self):
        """Assign lipids to the upper (1) or lower (-1) leaflet.
        """
        pass  # pragma: no cover
    
    def _find_midplane(self):
        """Determine which residues are in the midplane
        """
        pass  # pragma: no cover
    
    def filter_leaflets(self, lipid_sel=None, start=None, stop=None, step=None):
        """Create a subset of the leaflets results array.
    
        Filter either by lipid species or by the trajectory frames, or both.

        Parameters
        ----------
        lipid_sel : str, optional
            MDAnalysis selection string that will be used to select a subset of lipids present
            in the leaflets results array. The default is `None`, in which case data for all lipids
            will be returned.
        start : int, optional
            Start frame for filtering. The default is `None`, in which case the first frame is used
            as the start.
        stop : int, optional
            Stop frame for filtering. The default is `None`, in which case the final frame is used
            as the stop.
        step : int, optional
            Number of frames to skip when filtering frames. The deafult is `None`, in which case
            all frames between `start` and `stop` are used.
        
        """
        
        lipid_sel = "all" if lipid_sel is None else lipid_sel
        lipids = self.membrane.residues.atoms.select_atoms(lipid_sel)
        keep_lipids = np.in1d(self.membrane.residues.resindices, lipids.residues.resindices)
        
        start, stop, step = self.u.trajectory.check_slice_indices(start, stop, step)
        frames = np.arange(start, stop, step)
        keep_frames = np.in1d(self.frames, frames)
        
        return self.leaflets[keep_lipids][:, keep_frames]


[docs]class AssignLeaflets(AssignLeafletsBase): """Assign lipids in a bilayer to the upper leaflet, lower leaflet, or midplane. """ def __init__(self, universe, lipid_sel, midplane_sel=None, midplane_cutoff=None, n_bins=1): """Set up parameters for assigning lipids to a leaflet. Parameters ---------- universe : Universe MDAnalysis Universe object lipid_sel : str Selection string for the lipids in a membrane. The selection should cover **all** residues in the membrane, including cholesterol. midplane_sel : str, optional Selection string for residues that may be midplane. Any residues not in this selection will be assigned to a leaflet regardless of its proximity to the midplane. The default is `None`, in which case all lipids will be assigned to either the upper or lower leaflet. midplane_cutoff : float, optional Minimum distance in *z* an atom must be from the midplane to be assigned to a leaflet rather than the midplane. The default is `0`, in which case all lipids will be assigned to either the upper or lower leaflet. Must be non-negative. n_bins : int, optional Number of bins in *x* and *y* to use to create a grid of membrane patches. Local membrane midpoints are computed for each patch, and lipids assigned a leaflet based on the distance to their local membrane midpoint. The default is `1`, which is equivalent to computing a single global midpoint. Note ---- Typically, :attr:`midplane_sel` should select only sterols. Other lipids have flip-flop rates that are currently unaccessible with MD simulations, and thus should always occupy either the upper or lower leaflet. """ super(AssignLeaflets, self).__init__( universe=universe, lipid_sel=lipid_sel, midplane_sel=midplane_sel, midplane_cutoff=midplane_cutoff ) if not np.allclose(self.u.dimensions[3:], 90.0): raise ValueError("AssignLeaflets requires an orthorhombic box. Please use the on-the-fly " "transformation :class:`lipyphilic.transformations.triclinic_to_orthorhombic` " "before calling AssignLeaflets" ) self.n_bins = n_bins self.leaflets = None def _prepare(self): # Output array self.leaflets = np.full( (self.membrane.n_residues, self.n_frames), fill_value=0, dtype=np.int8 # smallest sized `np.int` is 1 byte, still 8 times smaller than using `int` ) def _single_frame(self): # Atoms must be wrapped before creating a lateral grid of the membrane self.membrane.wrap(inplace=True) # Find the midpoint of the bilayer as a function of (x,y), using # `n_bins` grid points in each dimensions # Use all atoms in the membrane to get better statistics if self.n_bins > 1: bins = np.linspace(0.0, self._ts.dimensions[0], self.n_bins + 1) else: # scipy.stats.binned_statistics raises Value error if there is only one bin bins = [0.0, self._ts.dimensions[0] + 1, self._ts.dimensions[0] + 2] memb_midpoint_xy = scipy.stats.binned_statistic_2d( x=self.membrane.positions[:, 0], y=self.membrane.positions[:, 1], values=self.membrane.positions[:, 2], statistic="mean", bins=bins, expand_binnumbers=True ) # Assign leaflets self._assign_leaflets(memb_midpoint_xy) # if necessary, find midplane residues if (self.potential_midplane is not None) and self.midplane_cutoff > 0.0: self._find_midplane(memb_midpoint_xy=memb_midpoint_xy) def _assign_leaflets(self, memb_midpoint_xy): # lgtm [py/inheritance/signature-mismatch] """Assign lipids to the upper (1) or lower (-1) leaflet. Parameters ---------- memb_midpoint_xy : BinnedStatistic2dResult Membrane grid created with stats.binned_statistic_2d. Contains the midpoint of each membrane patch. """ # x and y have the same number of bins bins = memb_midpoint_xy.x_edge # get the binnumbers for each lipid lipid_x_bins, lipid_y_bins = scipy.stats.binned_statistic_2d( x=self.membrane.positions[:, 0], y=self.membrane.positions[:, 1], values=self.membrane.positions[:, 2], statistic="mean", bins=bins, expand_binnumbers=True ).binnumber -1 # These were bin numbers, now bin indices # noqa: E225 upper_leaflet = self.membrane[ self.membrane.positions[:, 2] > (memb_midpoint_xy.statistic[lipid_x_bins, lipid_y_bins]) # we don't to consider midplane_cutoff here ] self.leaflets[ np.in1d(self.membrane.residues.resindices, upper_leaflet.residues.resindices), self._frame_index ] = 1 lower_leaflet = self.membrane[ self.membrane.positions[:, 2] < (memb_midpoint_xy.statistic[lipid_x_bins, lipid_y_bins]) # we don't to consider midplane_cutoff here ] self.leaflets[ np.in1d(self.membrane.residues.resindices, lower_leaflet.residues.resindices), self._frame_index ] = -1 return None def _find_midplane(self, memb_midpoint_xy): # lgtm [py/inheritance/signature-mismatch] """Determine which residues are in the midplane Parameters ---------- memb_midpoint_xy : BinnedStatistic2dResult Membrane grid created with stats.binned_statistic_2d. Contains the midpoint of each membrane patch. """ # Atoms must be wrapped before so we can assign lipids to grid patches self.potential_midplane.wrap(inplace=True) # x and y have the same number of bins bins = memb_midpoint_xy.x_edge midplane_x_bins, midplane_y_bins = scipy.stats.binned_statistic_2d( x=self.potential_midplane.positions[:, 0], y=self.potential_midplane.positions[:, 1], values=self.potential_midplane.positions[:, 2], statistic="mean", bins=bins, expand_binnumbers=True ).binnumber -1 # These were bin numbers, now bin indices # noqa: E225 # First assume they're all midplane # Then find residues that have at least one atom further than # `midplane_cutoff` from the local midplane midplane_mask = np.full(self.potential_midplane.n_residues, fill_value=True, dtype=bool) not_midplane = np.abs( self.potential_midplane.positions[:, 2] - memb_midpoint_xy.statistic[midplane_x_bins, midplane_y_bins] ) > self.midplane_cutoff # These residues have at least one atom in `potential_midplane` # that is more the `midplane_cutoff` from the local midplane midplane_mask[ np.in1d(self.potential_midplane.residues.resindices, self.potential_midplane[not_midplane].resindices), ] = False midplane_residues = self.potential_midplane.residues[midplane_mask] # Assign midplane self.leaflets[ np.in1d(self.membrane.residues.resindices, midplane_residues.resindices), self._frame_index ] = 0 return None
[docs]class AssignCurvedLeaflets(AssignLeafletsBase): """Assign lipids in a membrane to the upper leaflet, lower leaflet, or midplane. """ def __init__(self, universe, lipid_sel, lf_cutoff=15, midplane_sel=None, midplane_cutoff=None, pbc=True): """Set up parameters for assigning lipids to a leaflet. Parameters ---------- universe : Universe MDAnalysis Universe object lipid_sel : str Selection string for the lipids in a membrane. The selection should cover **all** residues in the membrane, including cholesterol. lf_cutoff : float, optional Cutoff to pass to `MDAnalysis.analysis.leaflet.LeafletFinder`. Lipids closer than this cutoff distance apart will be considered to be in the same leaflet. The default is 15.0 midplane_sel : str, optional Selection string for residues that may be midplane. Any residues not in this selection will be assigned to a leaflet at ever frame. The default is `None`, in which case no molecules will be considered to be in the midplane. midplane_cutoff : float, optional Lipids with atoms selected in `midplane_sel` that are within this distance of a leaflet will be to that leaflet. If a molecule is within this distance of *neither* or *both* leaflets, it will be assigned to the midplane. The default is `None`. pbc : bool, optional Take periodic boundary conditions into account. The default is `True`. Note ---- Typically, :attr:`midplane_sel` should select only sterols. Other lipids have flip-flop rates that are currently unaccessible with MD simulations, and thus should always occupy either the upper or lower leaflet. """ super(AssignCurvedLeaflets, self).__init__( universe=universe, lipid_sel=lipid_sel, midplane_sel=midplane_sel, midplane_cutoff=midplane_cutoff ) self.lf_cutoff = lf_cutoff self._pbc = pbc self.upper = None self.lower = None def _prepare(self): # Output array self.leaflets = np.full( (self.membrane.n_residues, self.n_frames), fill_value=0, dtype=np.int8 # smallest sized `np.int` is 1 byte, still 8 times smaller than using `int` ) self._assign_leaflets() def _single_frame(self): # if necessary, find midplane residues if self.potential_midplane is not None: self._find_midplane() def _assign_leaflets(self): """Assign lipids to the upper (1) or lower (-1) leaflet. """ # Assign non-translocating lipids to their leaflets static = self.membrane - self.potential_midplane if self.potential_midplane is not None else self.membrane static_sel = "index " + " ".join(static.indices.astype(str)) leaflets_static = MDAnalysis.analysis.leaflet.LeafletFinder( universe=self.u, select=static_sel, cutoff=self.lf_cutoff, pbc=self._pbc ) # make sure the two largest leaflets are in the first two indices sorter = np.argsort([g.n_atoms for g in leaflets_static.groups_iter()])[::-1] atom_groups = np.array(leaflets_static.groups(), dtype=object)[sorter] # If each group has the same number of atoms the above line will # cause each atom to be listed, rather than each AtomGroup, # thus increasing the dimension of the array to 2 if atom_groups.ndim == 2: atom_groups = np.sum(atom_groups, axis=1) # find the center of geometry of each atom group cogs = np.array([ag.center_of_geometry() for ag in atom_groups]) # Use this to define upper and lower leaflets lower, upper = atom_groups[np.argsort(cogs[:2, 2])] # And assign remaining groups to a leaflet for ag in atom_groups[2:]: upper_dists = MDAnalysis.lib.distances.distance_array( ag.positions, upper.positions ) lower_dists = MDAnalysis.lib.distances.distance_array( ag.positions, lower.positions ) if np.min(upper_dists) < np.min(lower_dists): upper += ag else: lower += ag upper = upper[np.argsort(upper.indices)] lower = lower[np.argsort(lower.indices)] upper_mask = np.in1d(self.membrane.residues.resindices, upper.residues.resindices) lower_mask = np.in1d(self.membrane.residues.resindices, lower.residues.resindices) self._upper = upper self._lower = lower self.leaflets[upper_mask] = 1 self.leaflets[lower_mask] = -1 return None def _find_midplane(self): """Determine which residues are in the midplane. Note, the below method of assigning molecules to the midplane is about two times faster than selecting dynamic atoms around upper/lower leaflet atoms, e.g: potential_upper = potential_midplane.select_atoms("around 10 global group upper", upper=upper).residues """ upper_pairs = MDAnalysis.lib.distances.capped_distance( self.potential_midplane.positions, self._upper.positions, max_cutoff=self.midplane_cutoff, return_distances=False ) lower_pairs = MDAnalysis.lib.distances.capped_distance( self.potential_midplane.positions, self._lower.positions, max_cutoff=self.midplane_cutoff, return_distances=False ) upper_resindices = self.potential_midplane[upper_pairs[:, 0]].resindices lower_resindices = self.potential_midplane[lower_pairs[:, 0]].resindices # In upper but not lower = Upper Leaflet # And in lower but not upper = Lower Leaflet # Those in both or neither = midplane in_upper = np.in1d(upper_resindices, lower_resindices, invert=True) add_to_upper = np.in1d(self.membrane.residues.resindices, upper_resindices[in_upper]) self.leaflets[add_to_upper, self._frame_index] = 1 in_lower = np.in1d(lower_resindices, upper_resindices, invert=True) add_to_lower = np.in1d(self.membrane.residues.resindices, lower_resindices[in_lower]) self.leaflets[add_to_lower, self._frame_index] = -1