%matplotlib inline

Analyze Slide-seqV2 data

This tutorial shows how to apply Squidpy for the analysis of Slide-seqV2 data.

The data used here was obtained from [Stickels et al., 2020]. We provide a pre-processed subset of the data, in anndata.AnnData format. We would like to thank @tudaga for providing cell-type level annotation. For details on how it was pre-processed, please refer to the original paper.

Import packages & data

To run the notebook locally, create a conda environment as conda env create -f environment.yml using this environment.yml <https://github.com/scverse/squidpy_notebooks/blob/main/environment.yml>_.

import squidpy as sq

print(f"squidpy=={sq.__version__}")

# load the pre-processed dataset
adata = sq.datasets.slideseqv2()
adata

squidpy==1.2.3

100%|██████████| 251M/251M [00:36<00:00, 7.15MB/s]

AnnData object with n_obs × n_vars = 41786 × 4000
    obs: 'barcode', 'x', 'y', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_MT', 'log1p_total_counts_MT', 'pct_counts_MT', 'n_counts', 'leiden', 'cluster'
    var: 'MT', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_umap', 'deconvolution_results', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

First, let’s visualize cluster annotation in spatial context with squidpy.pl.spatial_scatter().

sq.pl.spatial_scatter(adata, color="cluster", size=1, shape=None)

WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

../../_images/d82a33ea32c400fadee2abe758fcced9d39500d260e2f2f21787ab3c34f92a4b.png

Neighborhood enrichment analysis

Similar to other spatial data, we can investigate spatial organization of clusters in a quantitative way, by computing a neighborhood enrichment score. You can compute such score with the following function: squidpy.gr.nhood_enrichment(). In short, it’s an enrichment score on spatial proximity of clusters: if spots belonging to two different clusters are often close to each other, then they will have a high score and can be defined as being enriched. On the other hand, if they are far apart, the score will be low and they can be defined as depleted. This score is based on a permutation-based test, and you can set the number of permutations with the n_perms argument (default is 1000).

Since the function works on a connectivity matrix, we need to compute that as well. This can be done with squidpy.gr.spatial_neighbors(). Please see Building spatial neighbors graph and Neighbors enrichment analysis for more details of how these functions works.

Finally, we’ll directly visualize the results with squidpy.pl.nhood_enrichment(). We’ll add a dendrogram to the heatmap computed with linkage method ward.

sq.gr.spatial_neighbors(adata, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(
    adata, cluster_key="cluster", method="single", cmap="inferno", vmin=-50, vmax=100
)

100%|██████████| 1000/1000 [00:09<00:00, 110.64/s]

../../_images/765da6f1c502325e3e95e7afb2cc95962a95752074d038a33cbfa5498f2f1ce7.png

Interestingly, there seems to be an enrichment between the Endothelial_Tip, the Ependymal cells. Another putative enrichment is between the Oligodendrocytes and Polydendrocytes cells. We can visualize the spatial organization of such clusters. For this, we’ll use squidpy.pl.spatial_scatter() again.

sq.pl.spatial_scatter(
    adata,
    shape=None,
    color="cluster",
    groups=["Endothelial_Tip", "Ependymal", "Oligodendrocytes", "Polydendrocytes"],
    size=3,
)

WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

../../_images/c6daccff008671f14a0e0ab147bcbe4d60d002f79b2c559260040b252cdc1293.png

Ripley’s statistics

In addition to the neighbor enrichment score, we can further investigate spatial organization of cell types in tissue by means of the Ripley’s statistics. Ripley’s statistics allow analyst to evaluate whether a discrete annotation (e.g. cell-type) appears to be clustered, dispersed or randomly distributed on the area of interest. In Squidpy, we implement three closely related Ripley’s statistics, that can be easily computed with squidpy.gr.ripley(). Here, we’ll showcase the Ripley’s L statistic, which is a variance-stabilized version of the Ripley’s K statistics. We’ll visualize the results with squidpy.pl.ripley(). Check Compute Ripley’s statistics for more details.

mode = "L"
sq.gr.ripley(adata, cluster_key="cluster", mode=mode, max_dist=500)
sq.pl.ripley(adata, cluster_key="cluster", mode=mode)

../../_images/2fe08b241d6aace18f3bf740340006a341a2fe8bbb359cefb8d3f9ccc7603176.png

The plot highlight how some cell-types have a more clustered pattern, like Astrocytes and CA11_CA2_CA3_Subiculum cells, whereas other have a more dispersed pattern, like Mural cells. To confirm such interpretation, we can selectively visualize again their spatial organization.

sq.pl.spatial_scatter(
    adata,
    color="cluster",
    groups=["Mural", "CA1_CA2_CA3_Subiculum"],
    size=3,
    shape=None,
)

WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

../../_images/0dd7573890506cebab959a4b3956e6c3a55a239d76bf90803ac751a2c3cd334d.png

Ligand-receptor interaction analysis

The analysis showed above has provided us with quantitative information on cellular organization and communication at the tissue level. We might be interested in getting a list of potential candidates that might be driving such cellular communication. This naturally translates in doing a ligand-receptor interaction analysis. In Squidpy, we provide a fast re-implementation the popular method CellPhoneDB [Efremova et al., 2020] (code <https://github.com/Teichlab/cellphonedb>_ ) and extended its database of annotated ligand-receptor interaction pairs with the popular database Omnipath [Türei et al., 2016]. You can run the analysis for all clusters pairs, and all genes (in seconds, without leaving this notebook), with squidpy.gr.ligrec().

Let’s perform the analysis and visualize the result for three clusters of interest: Polydendrocytes and Oligodendrocytes. For the visualization, we will filter out annotations with low-expressed genes (with the means_range argument) and decreasing the threshold for the adjusted p-value (with the alpha argument) Check Receptor-ligand analysis for more details.

sq.gr.ligrec(
    adata,
    n_perms=100,
    cluster_key="cluster",
    clusters=["Polydendrocytes", "Oligodendrocytes"],
)
sq.pl.ligrec(
    adata,
    cluster_key="cluster",
    source_groups="Oligodendrocytes",
    target_groups=["Polydendrocytes"],
    pvalue_threshold=0.05,
    swap_axes=True,
)

100%|██████████| 100/100 [00:07<00:00, 13.48permutation/s]

../../_images/16327fa0e737bf7e34d99bb501dc6fc503fc670908553e05dce6e1e3355c9bf7.png

The dotplot visualization provides an interesting set of candidate interactions that could be involved in the tissue organization of the cell types of interest. It should be noted that this method is a pure re-implementation of the original permutation-based test, and therefore retains all its caveats and should be interpreted accordingly.

Spatially variable genes with spatial autocorrelation statistics

Lastly, with Squidpy we can investigate spatial variability of gene expression. squidpy.gr.spatial_autocorr() conveniently wraps two spatial autocorrelation statistics: Moran’s I and Geary’s C*. They provide a score on the degree of spatial variability of gene expression. The statistic as well as the p-value are computed for each gene, and FDR correction is performed. For the purpose of this tutorial, let’s compute the Moran’s I score. See Compute Moran’s I score for more details.

sq.gr.spatial_autocorr(adata, mode="moran")
adata.uns["moranI"].head(10)

	I	var_norm
Ttr	0.703289	0.000008
Plp1	0.531680	0.000008
Mbp	0.495970	0.000008
Hpca	0.490302	0.000008
Enpp2	0.455090	0.000008
1500015O10Rik	0.453225	0.000008
Pcp4	0.428500	0.000008
Sst	0.398053	0.000008
Ptgds	0.385718	0.000008
Nrgn	0.368533	0.000008

The results are stored in adata.uns["moranI"] and we can visualize selected genes with squidpy.pl.spatial_scatter().

sq.pl.spatial_scatter(
    adata,
    shape=None,
    color=["Ttr", "Plp1", "Mbp", "Hpca", "Enpp2"],
    size=0.1,
)

WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

../../_images/e35714a6f9ecb045aa17a619849b5e22cd066c1254672db2aadff9bb0ecacfee.png