%matplotlib inline
Analyze seqFISH data
This tutorial shows how to apply Squidpy for the analysis of seqFISH data.
The data used here was obtained from [Lohoff et al., 2020].
We provide a pre-processed subset of the data, in anndata.AnnData
format.
For details on how it was pre-processed, please refer to the original paper.
Import packages & data
To run the notebook locally, create a conda environment as conda env create -f environment.yml using this
environment.yml <https://github.com/scverse/squidpy_notebooks/blob/main/environment.yml>
_.
import numpy as np
import scanpy as sc
import squidpy as sq
sc.logging.print_header()
print(f"squidpy=={sq.__version__}")
# load the pre-processed dataset
adata = sq.datasets.seqfish()
scanpy==1.9.2 anndata==0.8.0 umap==0.5.3 numpy==1.23.5 scipy==1.9.3 pandas==1.5.1 scikit-learn==1.1.3 statsmodels==0.13.2 python-igraph==0.10.2 pynndescent==0.5.7
squidpy==1.2.3
First, let’s visualize cluster annotation in spatial context
with squidpy.pl.spatial_scatter()
.
sq.pl.spatial_scatter(
adata, color="celltype_mapped_refined", shape=None, figsize=(10, 10)
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
Neighborhood enrichment analysis
Similar to other spatial data, we can investigate spatial organization of clusters
in a quantitative way, by computing a neighborhood enrichment score.
You can compute such score with the following function: squidpy.gr.nhood_enrichment()
.
In short, it’s an enrichment score on spatial proximity of clusters:
if spots belonging to two different clusters are often close to each other,
then they will have a high score and can be defined as being enriched.
On the other hand, if they are far apart, the score will be low
and they can be defined as depleted.
This score is based on a permutation-based test, and you can set
the number of permutations with the n_perms
argument (default is 1000).
Since the function works on a connectivity matrix, we need to compute that as well.
This can be done with squidpy.gr.spatial_neighbors()
.
Please see Building spatial neighbors graph for more details
of how this function works.
Finally, we’ll directly visualize the results with squidpy.pl.nhood_enrichment()
.
We’ll add a dendrogram to the heatmap computed with linkage method ward.
sq.gr.spatial_neighbors(adata, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="celltype_mapped_refined")
sq.pl.nhood_enrichment(adata, cluster_key="celltype_mapped_refined", method="ward")
100%|██████████| 1000/1000 [00:07<00:00, 125.61/s]
A similar analysis was performed in the original publication [Lohoff et al., 2020], and we can appreciate to what extent results overlap. For instance, there seems to be an enrichment between the Lateral plate mesoderm, the Intermediate mesoderm and a milder enrichment for Allantois cells. As in the original publication, there also seems to be an association between the Endothelium and the Haematoendothelial progenitors. Of course, results do not perfectly overlap, and this could be due to several factors:
the construction of the neighbors graph (which in our case is not informed by the radius, as we did not have access to this information).
the number of permutation of the neighborhood enrichment (500 in the original publication against the default 1000 in our implementation).
We can also visualize the spatial organization of cells again,
and appreciate the proximity of specific cell clusters.
For this, we’ll use squidpy.pl.spatial_scatter()
again.
sq.pl.spatial_scatter(
adata,
color="celltype_mapped_refined",
groups=[
"Endothelium",
"Haematoendothelial progenitors",
"Allantois",
"Lateral plate mesoderm",
"Intermediate mesoderm",
"Presomitic mesoderm",
],
shape=None,
size=2,
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
Co-occurrence across spatial dimensions
In addition to the neighbor enrichment score, we can visualize cluster co-occurrence in spatial dimensions. This is a similar analysis of the one presented above, yet it does not operate on the connectivity matrix, but on the original spatial coordinates. The co-occurrence score is defined as:
.. math::
\frac{p(exp|cond)}{p(exp)}
where \(p(exp|cond)\) is the conditional probability of observing a cluster \(exp\) conditioned on the presence of a cluster \(cond\), whereas \(p(exp)\) is the probability of observing \(exp\) in the radius size of interest. The score is computed across increasing radii size around each cell in the tissue.
We can compute this score with squidpy.gr.co_occurrence()
and set the cluster annotation for the conditional probability with
the argument clusters
. Then, we visualize the results with
squidpy.pl.co_occurrence()
.
sq.gr.co_occurrence(adata, cluster_key="celltype_mapped_refined")
sq.pl.co_occurrence(
adata,
cluster_key="celltype_mapped_refined",
clusters="Lateral plate mesoderm",
figsize=(10, 5),
)
100%|██████████| 1/1 [00:37<00:00, 37.70s/]
It seems to recapitulate a previous observation, that there is a co-occurrence between the
conditional cell type annotation Lateral plate mesoderm and the clusters
Intermediate mesoderm and Allantois.
It also seems that at longer distances, there is a co-occurrence of cells belonging to
the Presomitic mesoderm cluster. By visualizing the full tissue as before we can indeed
appreciate that these cell types seems to form a defined clusters relatively close
to the Lateral plate mesoderm cells.
It should be noted that the distance units corresponds to
the spatial coordinates saved in adata.obsm['spatial']
.
Ligand-receptor interaction analysis
The analysis showed above has provided us with quantitative information on
cellular organization and communication at the tissue level.
We might be interested in getting a list of potential candidates that might be driving
such cellular communication.
This naturally translates in doing a ligand-receptor interaction analysis.
In Squidpy, we provide a fast re-implementation the popular method CellPhoneDB [Efremova et al., 2020]
(code <https://github.com/Teichlab/cellphonedb>
_)
and extended its database of annotated ligand-receptor interaction pairs with
the popular database Omnipath [Türei et al., 2016].
You can run the analysis for all clusters pairs, and all genes (in seconds,
without leaving this notebook), with squidpy.gr.ligrec()
.
Let’s perform the analysis and visualize the result for three clusters of
interest: Lateral plate mesoderm,
Intermediate mesoderm and Allantois. For the visualization, we will
filter out annotations
with low-expressed genes (with the means_range
argument)
and decreasing the threshold for the adjusted p-value (with the alpha
argument).
sq.gr.ligrec(
adata,
n_perms=100,
cluster_key="celltype_mapped_refined",
)
sq.pl.ligrec(
adata,
cluster_key="celltype_mapped_refined",
source_groups="Lateral plate mesoderm",
target_groups=["Intermediate mesoderm", "Allantois"],
means_range=(0.3, np.inf),
alpha=1e-4,
swap_axes=True,
)
9.39MB [00:00, 26.9MB/s]
1.52MB [00:00, 17.4MB/s]
3.80MB [00:00, 25.3MB/s]
100%|██████████| 100/100 [00:11<00:00, 8.81permutation/s]
The dotplot visualization provides an interesting set of candidate interactions that could be involved in the tissue organization of the cell types of interest. It should be noted that this method is a pure re-implementation of the original permutation-based test, and therefore retains all its caveats and should be interpreted accordingly.