%matplotlib inline
Analyze Slide-seqV2 data
This tutorial shows how to apply Squidpy for the analysis of Slide-seqV2 data.
The data used here was obtained from [Stickels et al., 2020].
We provide a pre-processed subset of the data, in anndata.AnnData
format.
We would like to thank @tudaga for providing cell-type level annotation.
For details on how it was pre-processed, please refer to the original paper.
Import packages & data
To run the notebook locally, create a conda environment as conda env create -f environment.yml using this
environment.yml <https://github.com/scverse/squidpy_notebooks/blob/main/environment.yml>
_.
import squidpy as sq
print(f"squidpy=={sq.__version__}")
# load the pre-processed dataset
adata = sq.datasets.slideseqv2()
adata
squidpy==1.2.3
100%|██████████| 251M/251M [00:36<00:00, 7.15MB/s]
AnnData object with n_obs × n_vars = 41786 × 4000
obs: 'barcode', 'x', 'y', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_MT', 'log1p_total_counts_MT', 'pct_counts_MT', 'n_counts', 'leiden', 'cluster'
var: 'MT', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'spatial_neighbors', 'umap'
obsm: 'X_pca', 'X_umap', 'deconvolution_results', 'spatial'
varm: 'PCs'
obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
First, let’s visualize cluster annotation in spatial context
with squidpy.pl.spatial_scatter()
.
sq.pl.spatial_scatter(adata, color="cluster", size=1, shape=None)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
Neighborhood enrichment analysis
Similar to other spatial data, we can investigate spatial organization of clusters
in a quantitative way, by computing a neighborhood enrichment score.
You can compute such score with the following function: squidpy.gr.nhood_enrichment()
.
In short, it’s an enrichment score on spatial proximity of clusters:
if spots belonging to two different clusters are often close to each other,
then they will have a high score and can be defined as being enriched.
On the other hand, if they are far apart, the score will be low
and they can be defined as depleted.
This score is based on a permutation-based test, and you can set
the number of permutations with the n_perms
argument (default is 1000).
Since the function works on a connectivity matrix, we need to compute that as well.
This can be done with squidpy.gr.spatial_neighbors()
.
Please see Building spatial neighbors graph and
Neighbors enrichment analysis for more details
of how these functions works.
Finally, we’ll directly visualize the results with squidpy.pl.nhood_enrichment()
.
We’ll add a dendrogram to the heatmap computed with linkage method ward.
sq.gr.spatial_neighbors(adata, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(
adata, cluster_key="cluster", method="single", cmap="inferno", vmin=-50, vmax=100
)
100%|██████████| 1000/1000 [00:09<00:00, 110.64/s]
Interestingly, there seems to be an enrichment between the Endothelial_Tip,
the Ependymal cells. Another putative enrichment is between the Oligodendrocytes
and Polydendrocytes cells. We can visualize the spatial organization of such clusters.
For this, we’ll use squidpy.pl.spatial_scatter()
again.
sq.pl.spatial_scatter(
adata,
shape=None,
color="cluster",
groups=["Endothelial_Tip", "Ependymal", "Oligodendrocytes", "Polydendrocytes"],
size=3,
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
Ripley’s statistics
In addition to the neighbor enrichment score, we can further investigate spatial
organization of cell types in tissue by means of the Ripley’s statistics.
Ripley’s statistics allow analyst to evaluate whether a discrete annotation (e.g. cell-type)
appears to be clustered, dispersed or randomly distributed on the area of interest.
In Squidpy, we implement three closely related Ripley’s statistics, that can be
easily computed with squidpy.gr.ripley()
. Here, we’ll showcase the Ripley’s L statistic,
which is a variance-stabilized version of the Ripley’s K statistics.
We’ll visualize the results with squidpy.pl.ripley()
.
Check Compute Ripley’s statistics for more details.
mode = "L"
sq.gr.ripley(adata, cluster_key="cluster", mode=mode, max_dist=500)
sq.pl.ripley(adata, cluster_key="cluster", mode=mode)
The plot highlight how some cell-types have a more clustered pattern, like Astrocytes and CA11_CA2_CA3_Subiculum cells, whereas other have a more dispersed pattern, like Mural cells. To confirm such interpretation, we can selectively visualize again their spatial organization.
sq.pl.spatial_scatter(
adata,
color="cluster",
groups=["Mural", "CA1_CA2_CA3_Subiculum"],
size=3,
shape=None,
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
Ligand-receptor interaction analysis
The analysis showed above has provided us with quantitative information on
cellular organization and communication at the tissue level.
We might be interested in getting a list of potential candidates that might be driving
such cellular communication.
This naturally translates in doing a ligand-receptor interaction analysis.
In Squidpy, we provide a fast re-implementation the popular method CellPhoneDB [Efremova et al., 2020]
(code <https://github.com/Teichlab/cellphonedb>
_ )
and extended its database of annotated ligand-receptor interaction pairs with
the popular database Omnipath [Türei et al., 2016].
You can run the analysis for all clusters pairs, and all genes (in seconds,
without leaving this notebook), with squidpy.gr.ligrec()
.
Let’s perform the analysis and visualize the result for three clusters of
interest: Polydendrocytes and Oligodendrocytes.
For the visualization, we will filter out annotations
with low-expressed genes (with the means_range
argument)
and decreasing the threshold
for the adjusted p-value (with the alpha
argument)
Check Receptor-ligand analysis for more details.
sq.gr.ligrec(
adata,
n_perms=100,
cluster_key="cluster",
clusters=["Polydendrocytes", "Oligodendrocytes"],
)
sq.pl.ligrec(
adata,
cluster_key="cluster",
source_groups="Oligodendrocytes",
target_groups=["Polydendrocytes"],
pvalue_threshold=0.05,
swap_axes=True,
)
100%|██████████| 100/100 [00:07<00:00, 13.48permutation/s]
The dotplot visualization provides an interesting set of candidate interactions that could be involved in the tissue organization of the cell types of interest. It should be noted that this method is a pure re-implementation of the original permutation-based test, and therefore retains all its caveats and should be interpreted accordingly.
Spatially variable genes with spatial autocorrelation statistics
Lastly, with Squidpy we can investigate spatial variability of gene expression.
squidpy.gr.spatial_autocorr()
conveniently wraps two
spatial autocorrelation statistics: Moran’s I and Geary’s C*.
They provide a score on the degree of spatial variability of gene expression.
The statistic as well as the p-value are computed for each gene, and FDR correction
is performed. For the purpose of this tutorial, let’s compute the Moran’s I score.
See Compute Moran’s I score for more details.
sq.gr.spatial_autocorr(adata, mode="moran")
adata.uns["moranI"].head(10)
I | pval_norm | var_norm | pval_norm_fdr_bh | |
---|---|---|---|---|
Ttr | 0.703289 | 0.0 | 0.000008 | 0.0 |
Plp1 | 0.531680 | 0.0 | 0.000008 | 0.0 |
Mbp | 0.495970 | 0.0 | 0.000008 | 0.0 |
Hpca | 0.490302 | 0.0 | 0.000008 | 0.0 |
Enpp2 | 0.455090 | 0.0 | 0.000008 | 0.0 |
1500015O10Rik | 0.453225 | 0.0 | 0.000008 | 0.0 |
Pcp4 | 0.428500 | 0.0 | 0.000008 | 0.0 |
Sst | 0.398053 | 0.0 | 0.000008 | 0.0 |
Ptgds | 0.385718 | 0.0 | 0.000008 | 0.0 |
Nrgn | 0.368533 | 0.0 | 0.000008 | 0.0 |
The results are stored in adata.uns["moranI"]
and we can visualize selected genes
with squidpy.pl.spatial_scatter()
.
sq.pl.spatial_scatter(
adata,
shape=None,
color=["Ttr", "Plp1", "Mbp", "Hpca", "Enpp2"],
size=0.1,
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`