%matplotlib inline

Analyze Slide-seqV2 data

This tutorial shows how to apply Squidpy for the analysis of Slide-seqV2 data.

The data used here was obtained from [Stickels et al., 2020]. We provide a pre-processed subset of the data, in anndata.AnnData format. We would like to thank @tudaga for providing cell-type level annotation. For details on how it was pre-processed, please refer to the original paper.

Import packages & data

To run the notebook locally, create a conda environment as conda env create -f environment.yml using this environment.yml <https://github.com/scverse/squidpy_notebooks/blob/main/environment.yml>_.

import squidpy as sq


# load the pre-processed dataset
adata = sq.datasets.slideseqv2()
100%|██████████| 251M/251M [00:36<00:00, 7.15MB/s] 
AnnData object with n_obs × n_vars = 41786 × 4000
    obs: 'barcode', 'x', 'y', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_MT', 'log1p_total_counts_MT', 'pct_counts_MT', 'n_counts', 'leiden', 'cluster'
    var: 'MT', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
    obsm: 'X_pca', 'X_umap', 'deconvolution_results', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

First, let’s visualize cluster annotation in spatial context with squidpy.pl.spatial_scatter().

sq.pl.spatial_scatter(adata, color="cluster", size=1, shape=None)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

Neighborhood enrichment analysis

Similar to other spatial data, we can investigate spatial organization of clusters in a quantitative way, by computing a neighborhood enrichment score. You can compute such score with the following function: squidpy.gr.nhood_enrichment(). In short, it’s an enrichment score on spatial proximity of clusters: if spots belonging to two different clusters are often close to each other, then they will have a high score and can be defined as being enriched. On the other hand, if they are far apart, the score will be low and they can be defined as depleted. This score is based on a permutation-based test, and you can set the number of permutations with the n_perms argument (default is 1000).

Since the function works on a connectivity matrix, we need to compute that as well. This can be done with squidpy.gr.spatial_neighbors(). Please see Building spatial neighbors graph and Neighbors enrichment analysis for more details of how these functions works.

Finally, we’ll directly visualize the results with squidpy.pl.nhood_enrichment(). We’ll add a dendrogram to the heatmap computed with linkage method ward.

sq.gr.spatial_neighbors(adata, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
    adata, cluster_key="cluster", method="single", cmap="inferno", vmin=-50, vmax=100
100%|██████████| 1000/1000 [00:09<00:00, 110.64/s]

Interestingly, there seems to be an enrichment between the Endothelial_Tip, the Ependymal cells. Another putative enrichment is between the Oligodendrocytes and Polydendrocytes cells. We can visualize the spatial organization of such clusters. For this, we’ll use squidpy.pl.spatial_scatter() again.

    groups=["Endothelial_Tip", "Ependymal", "Oligodendrocytes", "Polydendrocytes"],
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

Ripley’s statistics

In addition to the neighbor enrichment score, we can further investigate spatial organization of cell types in tissue by means of the Ripley’s statistics. Ripley’s statistics allow analyst to evaluate whether a discrete annotation (e.g. cell-type) appears to be clustered, dispersed or randomly distributed on the area of interest. In Squidpy, we implement three closely related Ripley’s statistics, that can be easily computed with squidpy.gr.ripley(). Here, we’ll showcase the Ripley’s L statistic, which is a variance-stabilized version of the Ripley’s K statistics. We’ll visualize the results with squidpy.pl.ripley(). Check Compute Ripley’s statistics for more details.

mode = "L"
sq.gr.ripley(adata, cluster_key="cluster", mode=mode, max_dist=500)
sq.pl.ripley(adata, cluster_key="cluster", mode=mode)

The plot highlight how some cell-types have a more clustered pattern, like Astrocytes and CA11_CA2_CA3_Subiculum cells, whereas other have a more dispersed pattern, like Mural cells. To confirm such interpretation, we can selectively visualize again their spatial organization.

    groups=["Mural", "CA1_CA2_CA3_Subiculum"],
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`

Ligand-receptor interaction analysis

The analysis showed above has provided us with quantitative information on cellular organization and communication at the tissue level. We might be interested in getting a list of potential candidates that might be driving such cellular communication. This naturally translates in doing a ligand-receptor interaction analysis. In Squidpy, we provide a fast re-implementation the popular method CellPhoneDB [Efremova et al., 2020] (code <https://github.com/Teichlab/cellphonedb>_ ) and extended its database of annotated ligand-receptor interaction pairs with the popular database Omnipath [Türei et al., 2016]. You can run the analysis for all clusters pairs, and all genes (in seconds, without leaving this notebook), with squidpy.gr.ligrec().

Let’s perform the analysis and visualize the result for three clusters of interest: Polydendrocytes and Oligodendrocytes. For the visualization, we will filter out annotations with low-expressed genes (with the means_range argument) and decreasing the threshold for the adjusted p-value (with the alpha argument) Check Receptor-ligand analysis for more details.

    clusters=["Polydendrocytes", "Oligodendrocytes"],
100%|██████████| 100/100 [00:07<00:00, 13.48permutation/s]

The dotplot visualization provides an interesting set of candidate interactions that could be involved in the tissue organization of the cell types of interest. It should be noted that this method is a pure re-implementation of the original permutation-based test, and therefore retains all its caveats and should be interpreted accordingly.

Spatially variable genes with spatial autocorrelation statistics

Lastly, with Squidpy we can investigate spatial variability of gene expression. squidpy.gr.spatial_autocorr() conveniently wraps two spatial autocorrelation statistics: Moran’s I and Geary’s C*. They provide a score on the degree of spatial variability of gene expression. The statistic as well as the p-value are computed for each gene, and FDR correction is performed. For the purpose of this tutorial, let’s compute the Moran’s I score. See Compute Moran’s I score for more details.

sq.gr.spatial_autocorr(adata, mode="moran")
I pval_norm var_norm pval_norm_fdr_bh
Ttr 0.703289 0.0 0.000008 0.0
Plp1 0.531680 0.0 0.000008 0.0
Mbp 0.495970 0.0 0.000008 0.0
Hpca 0.490302 0.0 0.000008 0.0
Enpp2 0.455090 0.0 0.000008 0.0
1500015O10Rik 0.453225 0.0 0.000008 0.0
Pcp4 0.428500 0.0 0.000008 0.0
Sst 0.398053 0.0 0.000008 0.0
Ptgds 0.385718 0.0 0.000008 0.0
Nrgn 0.368533 0.0 0.000008 0.0

The results are stored in adata.uns["moranI"] and we can visualize selected genes with squidpy.pl.spatial_scatter().

    color=["Ttr", "Plp1", "Mbp", "Hpca", "Enpp2"],
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`