%matplotlib inline

Analyze seqFISH data

This tutorial shows how to apply Squidpy for the analysis of seqFISH data.

The data used here was obtained from [Lohoff et al., 2020]. We provide a pre-processed subset of the data, in anndata.AnnData format. For details on how it was pre-processed, please refer to the original paper.

Import packages & data

To run the notebook locally, create a conda environment with conda env create -f environment.yml, using this environment.yml (https://github.com/scverse/squidpy_notebooks/blob/main/environment.yml).

import numpy as np

import scanpy as sc
import squidpy as sq


# print the versions of the main packages
sc.logging.print_header()

# load the pre-processed dataset
adata = sq.datasets.seqfish()
scanpy==1.9.2 anndata==0.8.0 umap==0.5.3 numpy==1.23.5 scipy==1.9.3 pandas==1.5.1 scikit-learn==1.1.3 statsmodels==0.13.2 python-igraph==0.10.2 pynndescent==0.5.7

First, let’s visualize the cluster annotation in spatial context with squidpy.pl.spatial_scatter().

sq.pl.spatial_scatter(
    adata, color="celltype_mapped_refined", shape=None, figsize=(10, 10)
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

Neighborhood enrichment analysis

Similar to other spatial data, we can investigate the spatial organization of clusters quantitatively by computing a neighborhood enrichment score with squidpy.gr.nhood_enrichment(). In short, it is an enrichment score on the spatial proximity of clusters. If cells belonging to two different clusters are often close to each other, they will have a high score and can be defined as enriched. If they are far apart, the score will be low and they can be defined as depleted. The score is based on a permutation test, and you can set the number of permutations with the n_perms argument (default 1000).
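To make the permutation idea concrete, here is a minimal numpy sketch on a toy graph. It is not Squidpy's implementation. The helper name nhood_enrichment_zscore, the dense 0/1 adjacency input, and the z-score normalization are all assumptions chosen for illustration. The z-score is computed as observed edge counts between clusters, minus the mean over label permutations, divided by the permutation standard deviation.

```python
import numpy as np


def nhood_enrichment_zscore(adj, labels, n_perms=1000, seed=0):
    """Simplified permutation-based neighborhood enrichment (illustrative sketch).

    adj    : (n, n) symmetric 0/1 adjacency matrix (dense, for clarity)
    labels : (n,) integer cluster labels in 0..k-1
    Returns a (k, k) matrix of z-scores.
    """
    rng = np.random.default_rng(seed)
    k = labels.max() + 1

    def pair_counts(lab):
        # onehot[i, a] == 1 if cell i belongs to cluster a
        onehot = np.eye(k)[lab]
        # counts[a, b] = number of (ordered) edges from cluster a cells to cluster b cells
        return onehot.T @ adj @ onehot

    observed = pair_counts(labels)
    # null distribution: shuffle the labels while keeping the graph fixed
    perms = np.stack([pair_counts(rng.permutation(labels)) for _ in range(n_perms)])
    mean, std = perms.mean(axis=0), perms.std(axis=0)
    # avoid dividing by zero when a cluster pair never varies under permutation
    return (observed - mean) / np.where(std > 0, std, 1.0)


# toy example: two groups of 5 cells, with edges only inside each group
labels = np.array([0] * 5 + [1] * 5)
adj = np.kron(np.eye(2, dtype=int), np.ones((5, 5), dtype=int))
np.fill_diagonal(adj, 0)
z = nhood_enrichment_zscore(adj, labels, n_perms=200)
```

Because edges only link cells of the same group, z[0, 0] comes out positive (enriched) and z[0, 1] negative (depleted).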

Since the function works on a connectivity matrix, we need to compute that first. This can be done with squidpy.gr.spatial_neighbors(). Please see the Building spatial neighbors graph tutorial for more details on how this function works.
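For intuition about what the connectivity matrix contains, here is a minimal numpy sketch of a k-nearest-neighbour graph on toy coordinates. It assumes that, for generic coordinates without a radius, each cell is connected to its n_neighs closest cells. To our understanding, squidpy.gr.spatial_neighbors() uses 6 neighbours by default and stores its sparse result in adata.obsp['spatial_connectivities']. The helper knn_connectivity and the choice to symmetrize the graph are illustrative assumptions, not Squidpy's code.

```python
import numpy as np


def knn_connectivity(coords, n_neighs=6):
    """Dense, symmetric 0/1 k-nearest-neighbour adjacency (illustrative sketch)."""
    # pairwise Euclidean distances, shape (n_cells, n_cells)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbour
    # column indices of the n_neighs closest cells, one row per cell
    nn = np.argsort(dist, axis=1)[:, :n_neighs]
    adj = np.zeros(dist.shape, dtype=int)
    adj[np.arange(len(coords))[:, None], nn] = 1
    # symmetrize: i and j are connected if either lists the other as a neighbour
    return np.maximum(adj, adj.T)


# toy example: four cells on a line, one neighbour each
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
adj = knn_connectivity(coords, n_neighs=1)
```

With one neighbour per cell, the graph becomes the path 0-1-2-3. Cell 3 is connected to cell 2 even though they are 7 units apart, because a kNN graph ignores absolute distance.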

Finally, we’ll visualize the results directly with squidpy.pl.nhood_enrichment(), adding a dendrogram to the heatmap computed with the ward linkage method.

sq.gr.spatial_neighbors(adata, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="celltype_mapped_refined")
sq.pl.nhood_enrichment(adata, cluster_key="celltype_mapped_refined", method="ward")
100%|██████████| 1000/1000 [00:07<00:00, 125.61/s]

A similar analysis was performed in the original publication [Lohoff et al., 2020], so we can assess to what extent the results overlap. For instance, there seems to be an enrichment between the Lateral plate mesoderm and the Intermediate mesoderm, and a milder enrichment for Allantois cells. As in the original publication, there also seems to be an association between the Endothelium and the Haematoendothelial progenitors. Of course, the results do not overlap perfectly, which could be due to several factors:

  • the construction of the neighbors graph (which in our case is not informed by the radius, as we did not have access to this information).

  • the number of permutations of the neighborhood enrichment test (500 in the original publication versus the default 1000 in our implementation).
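The first factor can be illustrated with a radius-based graph. Instead of a fixed number of neighbours, it connects every pair of cells closer than a fixed distance. Below is a minimal numpy sketch. The helper radius_connectivity and the radius value are illustrative assumptions. In Squidpy, a radius can be passed to squidpy.gr.spatial_neighbors() through its radius argument.

```python
import numpy as np


def radius_connectivity(coords, radius):
    """Dense 0/1 adjacency linking all cell pairs within `radius` (illustrative sketch)."""
    # pairwise Euclidean distances, shape (n_cells, n_cells)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= radius).astype(int)
    np.fill_diagonal(adj, 0)  # drop self-loops
    return adj


# three nearby cells and one isolated cell
coords = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [10.0, 10.0]])
adj = radius_connectivity(coords, radius=1.0)
degrees = adj.sum(axis=1)  # number of neighbours per cell
```

In a kNN graph every cell has at least n_neighs neighbours. Here, by contrast, the degree depends on local density, and the isolated cell ends up with no neighbours at all (degrees [2, 2, 2, 0]). Because the enrichment test counts edges between clusters, the two constructions can yield different scores on the same data.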

We can also visualize the spatial organization of cells again and appreciate the proximity of specific cell clusters. For this, we’ll use squidpy.pl.spatial_scatter() again, restricting the plot to a few groups.

sq.pl.spatial_scatter(
    adata,
    color="celltype_mapped_refined",
    groups=[
        "Haematoendothelial progenitors",
        "Lateral plate mesoderm",
        "Intermediate mesoderm",
        "Presomitic mesoderm",
    ],
    shape=None,
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

Co-occurrence across spatial dimensions

In addition to the neighborhood enrichment score, we can visualize cluster co-occurrence across spatial dimensions. This analysis is similar to the one presented above, but it operates on the original spatial coordinates rather than on the connectivity matrix. The co-occurrence score is defined as:

\[ \frac{p(exp|cond)}{p(exp)} \]

where \(p(exp|cond)\) is the conditional probability of observing a cluster \(exp\) given the presence of a cluster \(cond\), whereas \(p(exp)\) is the probability of observing \(exp\) within the radius of interest. The score is computed across increasing radii around each cell in the tissue.
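Our reading of this definition can be sketched in numpy as follows. For each radius r, we count ordered pairs of distinct cells closer than r. We then divide the conditional frequency of each cluster given cond by its overall frequency among those pairs. This is a simplified illustration, not Squidpy's implementation. The helper co_occurrence_score and the cumulative "within r" counting are assumptions.

```python
import numpy as np


def co_occurrence_score(coords, labels, cond, radii):
    """p(exp | cond) / p(exp) for every cluster exp at each radius (illustrative sketch).

    Returns an array of shape (len(radii), n_clusters).
    """
    n_clusters = labels.max() + 1
    onehot = np.eye(n_clusters)[labels]  # onehot[i, a] = 1 if cell i is in cluster a
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-pairs
    scores = []
    for r in radii:
        close = (dist <= r).astype(float)
        # counts[a, b] = ordered pairs (i, j) within r with i in cluster a, j in cluster b
        counts = onehot.T @ close @ onehot
        # silence 0/0 warnings when no pair falls within a small radius
        with np.errstate(divide="ignore", invalid="ignore"):
            p_exp_given_cond = counts[cond] / counts[cond].sum()
            p_exp = counts.sum(axis=0) / counts.sum()
            scores.append(p_exp_given_cond / p_exp)
    return np.array(scores)


# toy tissue on a line: clusters 0 and 1 touch, cluster 2 is far away
x = np.array([0.0, 1.0, 1.5, 2.5, 20.0, 21.0])
coords = np.column_stack([x, np.zeros_like(x)])
labels = np.array([0, 0, 1, 1, 2, 2])
scores = co_occurrence_score(coords, labels, cond=0, radii=[1.2, 100.0])
# scores[0] ≈ [1.78, 0.89, 0.0]; scores[1] ≈ [0.6, 1.2, 1.2]
```

At the small radius, cluster 0 co-occurs with itself more than expected (score above 1) and never with the distant cluster 2. Once the radius covers the whole toy tissue, the spatial signal disappears and the scores depend only on cluster sizes.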

We can compute this score with squidpy.gr.co_occurrence() and visualize the results with squidpy.pl.co_occurrence(), using the clusters argument of the plotting function to choose the cluster annotation to condition on.

sq.gr.co_occurrence(adata, cluster_key="celltype_mapped_refined")
sq.pl.co_occurrence(
    adata,
    cluster_key="celltype_mapped_refined",
    clusters="Lateral plate mesoderm",
    figsize=(10, 5),
)
100%|██████████| 1/1 [00:37<00:00, 37.70s/]

This seems to recapitulate a previous observation: there is a co-occurrence between the conditioning cell type annotation Lateral plate mesoderm and the clusters Intermediate mesoderm and Allantois. At longer distances, cells belonging to the Presomitic mesoderm cluster also seem to co-occur. By visualizing the full tissue as before, we can indeed appreciate that these cell types seem to form defined clusters relatively close to the Lateral plate mesoderm cells. Note that the distance units correspond to the spatial coordinates saved in adata.obsm['spatial'].

Ligand-receptor interaction analysis

The analysis shown above has provided us with quantitative information on cellular organization and communication at the tissue level. We might also be interested in a list of candidate molecules that could be driving such cellular communication, which naturally translates into a ligand-receptor interaction analysis. In Squidpy, we provide a fast re-implementation of the popular method CellPhoneDB [Efremova et al., 2020] (code: https://github.com/Teichlab/cellphonedb) and extend its database of annotated ligand-receptor interaction pairs with the popular database Omnipath [Türei et al., 2016]. You can run the analysis for all cluster pairs and all genes (in seconds, without leaving this notebook) with squidpy.gr.ligrec().
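The core of a CellPhoneDB-style test can be sketched for a single ligand-receptor pair. The interaction score is the mean of two averages: ligand expression in the source cluster and receptor expression in the target cluster. The p-value is the fraction of label permutations whose score is at least as high. This simplified sketch is ours, not Squidpy's code. Among other things, it ignores multi-subunit complexes, the minimum expression-fraction threshold and multiple-testing correction. The helper name lr_permutation_test is hypothetical.

```python
import numpy as np


def lr_permutation_test(ligand, receptor, labels, source, target, n_perms=1000, seed=0):
    """Score and empirical p-value for one ligand-receptor pair (illustrative sketch).

    ligand, receptor : per-cell expression of the ligand and receptor genes
    labels           : per-cell cluster labels
    """
    rng = np.random.default_rng(seed)

    def score(lab):
        # mean of (average ligand in source, average receptor in target)
        return 0.5 * (ligand[lab == source].mean() + receptor[lab == target].mean())

    observed = score(labels)
    # null distribution: shuffle cluster labels, keep expression fixed
    null = np.array([score(rng.permutation(labels)) for _ in range(n_perms)])
    pvalue = (null >= observed).mean()
    return observed, pvalue


# toy data: the ligand is expressed only in cluster "A", the receptor only in "B"
labels = np.array(["A"] * 10 + ["B"] * 10)
ligand = np.array([5.0] * 10 + [0.0] * 10)
receptor = np.array([0.0] * 10 + [5.0] * 10)
obs_ab, p_ab = lr_permutation_test(ligand, receptor, labels, "A", "B", n_perms=200)
obs_ba, p_ba = lr_permutation_test(ligand, receptor, labels, "B", "A", n_perms=200)
```

The pair from A to B gets the maximal score 5.0 and a p-value close to 0. The reverse direction scores 0.0 with a p-value of 1.0, since no permutation can score lower.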

Let’s perform the analysis and visualize the result for three clusters of interest: Lateral plate mesoderm, Intermediate mesoderm and Allantois. For the visualization, we will filter out interactions with lowly expressed genes (with the means_range argument) and decrease the threshold for the adjusted p-value (with the alpha argument).

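sq.gr.ligrec(adata, n_perms=100, cluster_key="celltype_mapped_refined")
sq.pl.ligrec(
    adata,
    cluster_key="celltype_mapped_refined",
    source_groups="Lateral plate mesoderm",
    target_groups=["Intermediate mesoderm", "Allantois"],
    means_range=(0.3, np.inf),
)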
9.39MB [00:00, 26.9MB/s]
1.52MB [00:00, 17.4MB/s]
3.80MB [00:00, 25.3MB/s]
100%|██████████| 100/100 [00:11<00:00,  8.81permutation/s]

The dotplot visualization provides an interesting set of candidate interactions that could be involved in the tissue organization of the cell types of interest. It should be noted that this method is a pure re-implementation of the original permutation-based test, and therefore retains all its caveats and should be interpreted accordingly.