Neighbours — lipyphilic.lib.neighbours

Author

Paul Smith

Year

2021

Copyright

GNU Public License v2

This module provides methods for finding neighbouring lipids in a bilayer, calculating local lipid compositions and lipid enrichment, and finding the largest cluster of specific species of lipids over time.

Two lipids are considered neighbours if they have any atoms within a given cutoff distance of one another.

Input

Required:
  • universe : an MDAnalysis Universe object.

  • lipid_sel : atom selection for lipids in the bilayer

Optional:
  • cutoff : lipids are considered to be neighbouring if they have at least one pair of atoms less than this distance apart (in Å)

Output

  • neighbours : a sparse matrix of binary variables, equal to 1 if two lipids are in contact, and 0 otherwise

For efficient use of memory, an adjacency matrix of neighbouring lipids is stored in a scipy.sparse.csr_matrix sparse matrix for each frame of the analysis. The data are stored in the neighbours.neighbours attribute as a NumPy array of sparse matrices. Each matrix has shape (n_residues, n_residues)

Tip

The resultant sparse matrix can be used to calculate the local lipid composition of each individual lipid at each frame using lipyphilic.lib.neighbours.count_neighbours(), or to find the largest cluster of lipids at each frame using lipyphilic.lib.neighbours.largest_cluster().

Example usage of Neighbours

An MDAnalysis Universe must first be created before using Neighbours:

import MDAnalysis as mda
from lipyphilic.lib.neighbours import Neighbours

u = mda.Universe(tpr, trajectory)

We can now create our Neighbours object:

neighbours = Neighbours(
  universe=u,
  lipid_sel="name GL1 GL2 ROH",  # assuming we're using the MARTINI forcefield
  cutoff=12.0
)

A lipid will be considered to be neighbouring a cholesterol molecule if either its GL1 or GL2 bead is within 12 Å of the ROH bead of the cholesterol. For neighbouring lipids, the distances between there respective GL1 and “GL2* beads will be considered.

We then select which frames of the trajectory to analyse (None will use every frame) and select to display a progress bar (verbose=True):

neighbours.run(
  start=None,
  stop=None,
  step=None,
  verbose=True
)

The results are then available in the neighbours.Neighbours attribute as a numpy.ndarray of Compressed Sparse Row matrices.

Counting the number of neighbours: by lipid species

In order to compute the number of each lipid species around each lipid at each frame, after generating the neighbour matrix we can use the count_neighbours() method:

counts = neighbours.count_neighbours()

Counts is a pandas.DataFrame in which each row contains the following information (if there are N distinct species in the membrane):

[
    <lipid identifier>,  # by default, the lipid resname
    <lipid resindex>,
    <frame>,
    <num species_1 neighbours>,
    ...
    <num species_N neighbours>,
    <total num neighbours>
]

Counting the number of neighbours: by user-defined labels

Instead of using the lipid resname to identify neighbouring lipids, any ordinal data may be used for counting lipid neighbours. This is done through use of the count_by and count_by_labels parameters:

counts = neighbours.count_neighbours(
  count_by=lipid_order_data,
  count_by_labels={'Ld': 0, 'Lo': 1}
)

Here we assume that ‘lipid_order_data’ contains information on whether each lipid is in the liquid-disordered phase or the liquid-ordered phase at each frame. It must take the shape ‘(n_residues, n_frames)’, and in this example ‘lipid_order_data[i, j]’ would be equal to ‘0’ if lipid ‘i’ is liquid-disordered at frame ‘j’ and equal to ‘1’ if it is liquid-ordered. ‘count_by_labels’ is used to signify that the value ‘0’ corresponds to the liquid-disordered (Ld) phase and the value ‘1’ to the liquid-ordered (Lo) phase. In this example, the returned pandas.DataFrame would contain the following information in each row:

[
    <Ld or Lo>,
    <lipid resindex>,
    <frame>,
    <num Ld neighbours>,
    <num Lo neighbours>,
    <total num neighbours>
]

Calculate the enrichment index of lipid species

The count_neighbours() method will, by default, return the number of neighbouring lipids around each individual lipid.

However, a clearer picture of aggregation of certain lipid species can be gained by instead considering the enrichment/depletion index of each lipid species, defined in Ingólfsson et al. (2014). In this instance, the number of each neighbour species B around a given reference species A is normalized by the average number of species B around any lipid.

To calculate the enrichment/depletion index of each species at each frame, as well as the raw neighbour counts, we can set the return_enrichment keyword to true:

counts, enrichment = neighbours.count_neighbours(return_enrichment=True)

This will return two pandas DataFrames, one containing the neighbour counts and the other the enrichment/depletion index of each species at each frame. The benefit of having the enrichment index at each frame is that you can plot its time-evolution to see whether particular species form aggregates over time.

Find the largest cluster

To find the largest cluster of a set of lipid species we can use the largest_cluster() method:

largest_cluster = neighbours.largest_cluster(
  cluster_sel="resname CHOL DPPC"
)

The results are returned in a numpy.ndarray and contain the number of lipids in the largest cluster at each frame.

Find the largest cluster in a given leaflet

The previous example will compute the largest cluster formed by cholesterol and DPPC molecules at each frame. In large coarse-grained systems where there is substantial flip-flop of sterols, this cluster may span both leaflets. In order to find the largest cluster at each frame within a given leaflet, we can tell largest_cluster() to consider only lipids in the upper leaflet by using the filter_by parameter.

First, though, we need to know which leaflet each lipid is in at each frame. This may be done using lipyphilic.lib.assign_leaflets.AssignLeaflets:

leaflets = AssignLeaflets(
  universe=u,
  lipid_sel="name GL1 GL2 ROH"  # pass the same selection that was passed to Neighbours
)
leaflets.run()  # run the analysis on the same frames as Neighbours.run()

The leaflets data are stored in the leaflets.leaflets attribute, will be equal to ‘1’ if the lipid is in the upper leaflet at a given frame and equal to ‘-1’ if it is in the lower leaflet. See lipyphilic.lib.assign_leaflets.AssignLeaflets for more information. We can now find the largest cluster over time in the upper (1) leaflet.

The filter_by parameter takes as input a 2D numpy.ndarray of shape (n_residues, n_frames). The array should be a boolean mask, where True indicates that we should include this lipid in the neighbour calculation:

upper_leaflet_mask = leaflet.leaflets == 1

largest_cluster_upper_leaflet = neighbours.largest_cluster(
  cluster_sel="resname CHOL DPPC",
  filter_by=upper_leaflet_mask
)

Now, lipids either in the lower leaflet (-1) or the midplane (0) will not be included when determining the largest cluster.

Get residue indices of lipids in the largest cluster

If we want to know not just the cluster size but also which lipids are in the largest cluster at each frame, we can set the return_indices parameter to True:

largest_cluster, largest_cluster_indices = neighbours.largest_cluster(
  cluster_sel="resname CHOL DPPC",
  return_indices=True
)

The residue indices will be returned as list of numpy.ndarray arrays - one per frame of the analysis. Each array contains the residue indices of the lipids in the largest cluster at that frame

The class and its methods

class lipyphilic.lib.neighbours.Neighbours(universe, lipid_sel, cutoff=10.0)[source]

Find neighbouring lipids in a bilayer.

Set up parameters for finding neighbouring lipids.

Parameters
  • universe (Universe) – MDAnalysis Universe object

  • lipid_sel (str) – Selection string for lipids in the bilayer.

  • cutoff (float, optional) – To be considered neighbours, two lipids must have at least one pair of atoms within this cutoff distance (in Å). The default is 10.0.

count_neighbours(count_by=None, count_by_labels=None, return_enrichment=False)[source]

Count the number of each neighbour type at each frame.

Parameters
  • count_by (numpy.ndarray, optional) – An array containing ordinal data describing each lipid at each frame. For example, it may be an array containing information on the ordered state or each lipid. Defaults to None, in which case the lipid species (resnames) are used for counting neighbours.

  • count_by_labels (dict, optional) – A dictionary of labels describing what each unique value in count_by refers to, e.g if count_by contains information on the ordered state of each lipid at each frame, whereby 0 corresponds to disordered and 1 corresponds to ordered, then count_by_labels = {‘Ld’: 0, ‘Lo’: 1}. There must be precisely one label for each unique value in ‘count_by’. If count_by is given but count_by_labels is left as None, the values in count_by will be used as the labels.

  • return_enrichment (bool, optional) – If True, a second DataFrame containing the fractional enrichment of each lipid species at each frame is also returned. The default is False, in which case the fractional enrichment if not returned.

Returns

  • counts (pandas.DataFrame) – A DataFrame containing the following data for each lipid at each frame: lipid identifier (default is resname), lipid residue index, frame number, number of neighbours of each species (or of each type in ‘count_by’ if this is provided), as well as the total number of neighbours.

  • enrichment (pandas.DataFrame) – A DataFrame containing the following data enrichment/depletion data for each lipid species at each frame.

largest_cluster(cluster_sel=None, filter_by=None, return_indices=False)[source]

Find the largest cluster of lipids at each frame.

Parameters
  • cluster_sel (str, optional) – Selection string for lipids to include in the cluster analysis. The default is None, in which case all lipid used in identiying neighbouring lipids will be used for finding the largest cluster.

  • filter_by (numpy.ndarray, optional) – A boolean array indicating whether or not to include each lipid in the cluster analysis. If the array is 1D and of shape (n_lipids), the same lipids will be used in the cluster analysis at every frame. If the array is 2D and of shape (n_lipids, n_frames), the boolean value of each lipid at each frame will be taken into account. The default is None, in which case all lipids used in identiying neighbours will be used for finding the largest cluster.

  • return_indices (bool, optional) – If True, a list of NumPy arrays will also be returned, on for each frame. Each NumPy array will contain the residue indices of the lipids in the largest cluster at that frame. Note, if there are two largest clusters of equal size, only the residue indices of lipids in one cluster will be returned (the cluster that has the lipid with the smallest residue index). The default is False, in which case no reidue indices are returned.

Returns

  • largest_cluster (numpy.ndarray) – An array containing the number of lipids in the largest cluster at each frame.

  • indices (list) – A list of 1D NumPy arrays, where each array corresponds to a single frame and contains the residue indices of lipids in the largest cluster at that frame.

Note

Neighbours must be found by using Neighbours.run() before calling either Neighbours.count_neighbours() or Neighbours.largest_cluster().