Launch binder

Analyze Merfish data

This tutorial shows how to apply Squidpy for the analysis of Merfish data.

The data used here was obtained from [Moffitt et al., 2018]. We provide a pre-processed subset of the data, in anndata.AnnData format. For details on how it was pre-processed, please refer to the original paper.

See also

See Analyze Slide-seqV2 data and Analyze seqFISH data for additional analysis examples.

Import packages & data

To run the notebook locally, create a conda environment as conda env create -f environment.yml using this environment.yml.

import scanpy as sc
import squidpy as sq

sc.logging.print_header()
print(f"squidpy=={sq.__version__}")

# load the pre-processed dataset
adata = sq.datasets.merfish()
adata

Out:

scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.22.4 scipy==1.8.1 pandas==1.4.2 scikit-learn==1.1.1 statsmodels==0.13.2 python-igraph==0.9.11 pynndescent==0.5.7
squidpy==1.2.2

AnnData object with n_obs × n_vars = 73655 × 161
    obs: 'Cell_ID', 'Animal_ID', 'Animal_sex', 'Behavior', 'Bregma', 'Centroid_X', 'Centroid_Y', 'Cell_class', 'Neuron_cluster_ID', 'batch'
    uns: 'Cell_class_colors'
    obsm: 'spatial', 'spatial3d'

This datasets consists of consecutive slices from the mouse hypothalamic preoptic region. It represents an interesting example of how to work with 3D spatial data in Squidpy. Let’s start with visualization: we can either visualize the 3D stack of slides using scanpy.pl.embedding():

sc.pl.embedding(adata, basis="spatial3d", projection="3d", color="Cell_class")
Cell_class

Or visualize a single slide with squidpy.pl.spatial_scatter(). Here the slide identifier is stored in adata.obs[“Bregma”], see original paper for definition.

sq.pl.spatial_scatter(adata[adata.obs.Bregma == -9], shape=None, color="Cell_class", size=1)
Cell_class

Out:

/Users/giovanni.palla/Projects/squidpy_notebooks/.tox/docs/lib/python3.9/site-packages/anndata/compat/_overloaded_dict.py:106: ImplicitModificationWarning: Trying to modify attribute `._uns` of view, initializing view as actual.
  self.data[key] = value

Neighborhood enrichment analysis in 3D

It is important to consider whether the analysis should be performed on the 3D spatial coordinates or the 2D coordinates for a single slice. Functions that make use of the spatial graph can already support 3D coordinates, but it is important to consider that the z-stack coordinate is in the same unit metrics as the x, y coordinates. Let’s start with the neighborhood enrichment score. You can read more on the function in the docs at Building spatial neighbors graph. First, we need to compute a neighbor graph with squidpy.gr.spatial_neighbors(). If we want to compute the neighbor graph on the 3D coordinate space, we need to specify spatial_key = "spatial3d". Then we can use squidpy.gr.nhood_enrichment() to compute the score, and visualize it with squidpy.gr.nhood_enrichment().

sq.gr.spatial_neighbors(adata, coord_type="generic", spatial_key="spatial3d")
sq.gr.nhood_enrichment(adata, cluster_key="Cell_class")
sq.pl.nhood_enrichment(adata, cluster_key="Cell_class", method="single", cmap="inferno", vmin=-50, vmax=100)
Neighborhood enrichment

Out:

  0%|          | 0/1000 [00:00<?, ?/s]
  0%|          | 1/1000 [00:04<1:15:35,  4.54s/]
  2%|2         | 24/1000 [00:04<02:15,  7.22/s]
  5%|4         | 47/1000 [00:04<00:57, 16.50/s]
  7%|7         | 70/1000 [00:04<00:32, 28.46/s]
  9%|9         | 92/1000 [00:04<00:21, 42.64/s]
 12%|#1        | 115/1000 [00:05<00:14, 60.36/s]
 14%|#3        | 138/1000 [00:05<00:10, 80.34/s]
 16%|#6        | 161/1000 [00:05<00:08, 101.66/s]
 18%|#8        | 183/1000 [00:05<00:06, 121.94/s]
 21%|##        | 206/1000 [00:05<00:05, 142.35/s]
 23%|##2       | 229/1000 [00:05<00:04, 160.04/s]
 25%|##5       | 252/1000 [00:05<00:04, 175.14/s]
 28%|##7       | 275/1000 [00:05<00:03, 187.69/s]
 30%|##9       | 298/1000 [00:05<00:03, 197.26/s]
 32%|###2      | 321/1000 [00:05<00:03, 204.08/s]
 34%|###4      | 344/1000 [00:06<00:03, 208.58/s]
 37%|###6      | 367/1000 [00:06<00:02, 212.35/s]
 39%|###9      | 390/1000 [00:06<00:02, 215.26/s]
 41%|####1     | 413/1000 [00:06<00:02, 216.99/s]
 44%|####3     | 436/1000 [00:06<00:02, 216.70/s]
 46%|####5     | 459/1000 [00:06<00:02, 218.87/s]
 48%|####8     | 482/1000 [00:06<00:02, 218.53/s]
 50%|#####     | 505/1000 [00:06<00:02, 219.94/s]
 53%|#####2    | 528/1000 [00:06<00:02, 220.60/s]
 55%|#####5    | 551/1000 [00:07<00:02, 210.94/s]
 57%|#####7    | 573/1000 [00:07<00:02, 196.24/s]
 59%|#####9    | 593/1000 [00:07<00:02, 186.22/s]
 61%|######1   | 613/1000 [00:07<00:02, 188.45/s]
 64%|######3   | 636/1000 [00:07<00:01, 198.95/s]
 66%|######5   | 658/1000 [00:07<00:01, 204.34/s]
 68%|######8   | 681/1000 [00:07<00:01, 210.25/s]
 70%|#######   | 704/1000 [00:07<00:01, 213.87/s]
 73%|#######2  | 727/1000 [00:07<00:01, 216.35/s]
 75%|#######5  | 750/1000 [00:08<00:01, 219.04/s]
 77%|#######7  | 773/1000 [00:08<00:01, 221.24/s]
 80%|#######9  | 796/1000 [00:08<00:00, 220.94/s]
 82%|########1 | 819/1000 [00:08<00:00, 221.59/s]
 84%|########4 | 842/1000 [00:08<00:00, 221.76/s]
 86%|########6 | 865/1000 [00:08<00:00, 220.98/s]
 89%|########8 | 888/1000 [00:08<00:00, 220.73/s]
 91%|#########1| 911/1000 [00:08<00:00, 221.11/s]
 93%|#########3| 934/1000 [00:08<00:00, 221.22/s]
 96%|#########5| 957/1000 [00:08<00:00, 220.74/s]
 98%|#########8| 980/1000 [00:09<00:00, 221.65/s]
100%|##########| 1000/1000 [00:09<00:00, 109.51/s]

We can visualize some of the co-enriched clusters with scanpy.pl.embedding(). We will set na_colors=(1,1,1,0) to make transparent the other observations, in order to better visualize the clusters of interests across z-stacks.

sc.pl.embedding(
    adata,
    basis="spatial3d",
    groups=["OD Mature 1", "OD Mature 2", "OD Mature 4"],
    na_color=(1, 1, 1, 0),
    projection="3d",
    color="Cell_class",
)
Cell_class

Out:

/Users/giovanni.palla/Projects/squidpy_notebooks/.tox/docs/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:1171: FutureWarning: Categorical.replace is deprecated and will be removed in a future version. Use Series.replace directly instead.
  values = values.replace(values.categories.difference(groups), np.nan)

We can also visualize gene expression in 3D coordinates. Let’s perform differential expression testing with scanpy.tl.rank_genes_groups() and visualize the results

sc.tl.rank_genes_groups(adata, groupby="Cell_class")
sc.pl.rank_genes_groups(adata, groupby="Cell_class")
Ambiguous vs. rest, Astrocyte vs. rest, Endothelial 1 vs. rest, Endothelial 2 vs. rest, Endothelial 3 vs. rest, Ependymal vs. rest, Excitatory vs. rest, Inhibitory vs. rest, Microglia vs. rest, OD Immature 1 vs. rest, OD Immature 2 vs. rest, OD Mature 1 vs. rest, OD Mature 2 vs. rest, OD Mature 3 vs. rest, OD Mature 4 vs. rest, Pericytes vs. rest

and the expression in 3D.

sc.pl.embedding(adata, basis="spatial3d", projection="3d", color=["Gad1", "Mlc1"])
Gad1, Mlc1

If the same analysis should be performed on a single slice, then it is advisable to copy the sample of interest in a new anndata.AnnData and use it as a standard 2D spatial data object.

adata_slice = adata[adata.obs.Bregma == -9].copy()
sq.gr.spatial_neighbors(adata_slice, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="Cell_class")
sq.pl.spatial_scatter(
    adata_slice,
    color="Cell_class",
    shape=None,
    groups=[
        "Ependymal",
        "Pericytes",
        "Endothelial 2",
    ],
    size=10,
)
Cell_class

Out:

  0%|          | 0/1000 [00:00<?, ?/s]
  0%|          | 1/1000 [00:04<1:14:23,  4.47s/]
  2%|2         | 23/1000 [00:04<02:18,  7.03/s]
  5%|4         | 47/1000 [00:04<00:56, 16.84/s]
  7%|7         | 70/1000 [00:04<00:32, 28.92/s]
  9%|9         | 93/1000 [00:04<00:20, 43.94/s]
 12%|#1        | 116/1000 [00:04<00:14, 61.78/s]
 14%|#3        | 138/1000 [00:05<00:10, 80.85/s]
 16%|#6        | 160/1000 [00:05<00:08, 101.19/s]
 18%|#8        | 183/1000 [00:05<00:06, 122.70/s]
 21%|##        | 206/1000 [00:05<00:05, 143.27/s]
 23%|##2       | 229/1000 [00:05<00:04, 160.62/s]
 25%|##5       | 251/1000 [00:05<00:04, 174.45/s]
 27%|##7       | 274/1000 [00:05<00:03, 187.15/s]
 30%|##9       | 297/1000 [00:05<00:03, 196.62/s]
 32%|###2      | 320/1000 [00:05<00:03, 204.81/s]
 34%|###4      | 343/1000 [00:06<00:03, 210.68/s]
 37%|###6      | 366/1000 [00:06<00:02, 213.50/s]
 39%|###8      | 389/1000 [00:06<00:02, 216.81/s]
 41%|####1     | 412/1000 [00:06<00:02, 218.99/s]
 44%|####3     | 435/1000 [00:06<00:02, 220.45/s]
 46%|####5     | 458/1000 [00:06<00:02, 220.83/s]
 48%|####8     | 481/1000 [00:06<00:02, 221.96/s]
 50%|#####     | 504/1000 [00:06<00:02, 223.00/s]
 53%|#####2    | 527/1000 [00:06<00:02, 223.23/s]
 55%|#####5    | 550/1000 [00:06<00:02, 222.66/s]
 57%|#####7    | 573/1000 [00:07<00:01, 222.44/s]
 60%|#####9    | 596/1000 [00:07<00:01, 222.85/s]
 62%|######1   | 619/1000 [00:07<00:01, 223.24/s]
 64%|######4   | 642/1000 [00:07<00:01, 224.09/s]
 66%|######6   | 665/1000 [00:07<00:01, 224.08/s]
 69%|######8   | 688/1000 [00:07<00:01, 223.08/s]
 71%|#######1  | 711/1000 [00:07<00:01, 222.35/s]
 73%|#######3  | 734/1000 [00:07<00:01, 223.23/s]
 76%|#######5  | 757/1000 [00:07<00:01, 223.60/s]
 78%|#######8  | 780/1000 [00:07<00:00, 223.84/s]
 80%|########  | 803/1000 [00:08<00:00, 223.93/s]
 83%|########2 | 826/1000 [00:08<00:00, 223.77/s]
 85%|########4 | 849/1000 [00:08<00:00, 223.64/s]
 87%|########7 | 872/1000 [00:08<00:00, 224.07/s]
 90%|########9 | 895/1000 [00:08<00:00, 222.25/s]
 92%|#########1| 918/1000 [00:08<00:00, 221.79/s]
 94%|#########4| 941/1000 [00:08<00:00, 222.59/s]
 96%|#########6| 964/1000 [00:08<00:00, 223.44/s]
 99%|#########8| 987/1000 [00:08<00:00, 223.80/s]
100%|##########| 1000/1000 [00:08<00:00, 111.78/s]

Spatially variable genes with spatial autocorrelation statistics

With Squidpy we can investigate spatial variability of gene expression. This is an example of a function that only supports 2D data. squidpy.gr.spatial_autocorr() conveniently wraps two spatial autocorrelation statistics: Moran’s I and Geary’s C. They provide a score on the degree of spatial variability of gene expression. The statistic as well as the p-value are computed for each gene, and FDR correction is performed. For the purpose of this tutorial, let’s compute the Moran’s I score. The results are stored in adata.uns[‘moranI’] and we can visualize selected genes with squidpy.pl.spatial_scatter().

sq.gr.spatial_autocorr(adata_slice, mode="moran")
adata_slice.uns["moranI"].head()
sq.pl.spatial_scatter(adata_slice, shape=None, color=["Cd24a", "Necab1", "Mlc1"], size=3)
Cd24a, Necab1, Mlc1

Out:

/Users/giovanni.palla/Projects/squidpy_notebooks/.tox/docs/lib/python3.9/site-packages/scanpy/metrics/_gearys_c.py:293: UserWarning: 1 variables were constant, will return nan for these.
  warnings.warn(

Total running time of the script: ( 0 minutes 57.250 seconds)

Estimated memory usage: 204 MB