Compute Sepal score
This example shows how to compute the Sepal score for spatially variable genes identification.
The Sepal score is a method that simulates a diffusion process to quantify spatial structure in tissue. See [Anderson and Lundeberg, 2021] for reference.
See also
See Compute co-occurrence probability and Compute Moran’s I score for other scores to identify spatially variable genes.
See Building spatial neighbors graph for general usage of
squidpy.gr.spatial_neighbors()
.
import squidpy as sq
adata = sq.datasets.visium_hne_adata()
adata
Out:
AnnData object with n_obs × n_vars = 2688 × 18078
obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts', 'leiden', 'cluster'
var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'spatial', 'umap'
obsm: 'X_pca', 'X_umap', 'spatial'
varm: 'PCs'
obsp: 'connectivities', 'distances'
We can compute the Sepal score with squidpy.gr.sepal()
.
there are 2 important aspects to consider when computing sepal:
The function only accepts grid-like spatial graphs. Make sure to specify the maximum number of neighbors in your data (6 for an hexagonal grid like Visium) with
max_neighs = 6
.It is useful to filter out genes that are expressed in very few observations and might be wrongly identified as being spatially variable. If you are performing pre-processing with Scanpy, there is a convenient function that can be used BEFORE normalization
scanpy.pp.calculate_qc_metrics()
. It computes several useful summary statistics on both observation and feature axis. We will be using then_cells
columns in adata.var to filter out genes that are expressed in less than 100 observations.
Before computing the Sepal score, we first need to compute a spatial graph with squidpy.gr.spatial_neighbors()
.
We will also subset the number of genes to evaluate for efficiency purposes.
sq.gr.spatial_neighbors(adata)
genes = adata.var_names[(adata.var.n_cells > 100) & adata.var.highly_variable][0:100]
sq.gr.sepal(adata, max_neighs=6, genes=genes, n_jobs=1)
adata.uns["sepal_score"].head(10)
Out:
0%| | 0/100 [00:00<?, ?/s]
1%|1 | 1/100 [00:07<11:46, 7.14s/]
2%|2 | 2/100 [00:08<05:39, 3.47s/]
3%|3 | 3/100 [00:08<03:18, 2.04s/]
4%|4 | 4/100 [00:08<02:17, 1.43s/]
5%|5 | 5/100 [00:09<01:39, 1.05s/]
6%|6 | 6/100 [00:09<01:27, 1.08/s]
7%|7 | 7/100 [00:10<01:13, 1.27/s]
8%|8 | 8/100 [00:10<00:56, 1.62/s]
9%|9 | 9/100 [00:11<00:52, 1.75/s]
10%|# | 10/100 [00:11<00:45, 1.96/s]
11%|#1 | 11/100 [00:11<00:36, 2.46/s]
12%|#2 | 12/100 [00:12<00:37, 2.34/s]
13%|#3 | 13/100 [00:12<00:34, 2.56/s]
14%|#4 | 14/100 [00:12<00:34, 2.48/s]
15%|#5 | 15/100 [00:13<00:28, 2.98/s]
16%|#6 | 16/100 [00:13<00:29, 2.85/s]
17%|#7 | 17/100 [00:13<00:30, 2.75/s]
18%|#8 | 18/100 [00:14<00:28, 2.87/s]
19%|#9 | 19/100 [00:14<00:36, 2.24/s]
20%|## | 20/100 [00:15<00:50, 1.60/s]
21%|##1 | 21/100 [00:16<00:38, 2.04/s]
22%|##2 | 22/100 [00:16<00:43, 1.80/s]
23%|##3 | 23/100 [00:19<01:31, 1.19s/]
24%|##4 | 24/100 [00:20<01:15, 1.00/s]
25%|##5 | 25/100 [00:20<01:05, 1.15/s]
26%|##6 | 26/100 [00:20<00:49, 1.49/s]
27%|##7 | 27/100 [00:21<00:39, 1.85/s]
28%|##8 | 28/100 [00:21<00:31, 2.32/s]
29%|##9 | 29/100 [00:21<00:31, 2.28/s]
30%|### | 30/100 [00:22<00:33, 2.11/s]
31%|###1 | 31/100 [00:22<00:34, 2.02/s]
32%|###2 | 32/100 [00:23<00:31, 2.13/s]
33%|###3 | 33/100 [00:23<00:27, 2.43/s]
34%|###4 | 34/100 [00:24<00:34, 1.91/s]
35%|###5 | 35/100 [00:24<00:30, 2.15/s]
36%|###6 | 36/100 [00:25<00:29, 2.15/s]
37%|###7 | 37/100 [00:25<00:28, 2.25/s]
38%|###8 | 38/100 [00:27<00:48, 1.27/s]
39%|###9 | 39/100 [00:27<00:44, 1.36/s]
40%|#### | 40/100 [00:27<00:33, 1.78/s]
41%|####1 | 41/100 [00:28<00:28, 2.08/s]
42%|####2 | 42/100 [00:28<00:27, 2.13/s]
43%|####3 | 43/100 [00:28<00:23, 2.45/s]
44%|####4 | 44/100 [00:28<00:18, 3.06/s]
45%|####5 | 45/100 [00:29<00:21, 2.53/s]
46%|####6 | 46/100 [00:30<00:28, 1.92/s]
47%|####6 | 47/100 [00:30<00:28, 1.84/s]
48%|####8 | 48/100 [00:32<00:41, 1.25/s]
49%|####9 | 49/100 [00:32<00:32, 1.56/s]
50%|##### | 50/100 [00:33<00:35, 1.41/s]
51%|#####1 | 51/100 [00:33<00:25, 1.88/s]
52%|#####2 | 52/100 [00:33<00:20, 2.39/s]
53%|#####3 | 53/100 [00:33<00:16, 2.92/s]
54%|#####4 | 54/100 [00:34<00:23, 1.92/s]
55%|#####5 | 55/100 [00:35<00:20, 2.21/s]
56%|#####6 | 56/100 [00:35<00:23, 1.86/s]
57%|#####6 | 57/100 [00:36<00:19, 2.18/s]
58%|#####8 | 58/100 [00:36<00:18, 2.33/s]
59%|#####8 | 59/100 [00:36<00:15, 2.61/s]
60%|###### | 60/100 [00:37<00:21, 1.88/s]
61%|######1 | 61/100 [00:37<00:17, 2.25/s]
62%|######2 | 62/100 [00:38<00:21, 1.80/s]
63%|######3 | 63/100 [00:39<00:18, 1.99/s]
64%|######4 | 64/100 [00:39<00:16, 2.18/s]
65%|######5 | 65/100 [00:39<00:13, 2.53/s]
66%|######6 | 66/100 [00:40<00:16, 2.01/s]
68%|######8 | 68/100 [00:40<00:12, 2.56/s]
69%|######9 | 69/100 [00:41<00:12, 2.46/s]
70%|####### | 70/100 [00:43<00:24, 1.25/s]
71%|#######1 | 71/100 [00:43<00:22, 1.30/s]
73%|#######3 | 73/100 [00:44<00:13, 2.00/s]
74%|#######4 | 74/100 [00:44<00:11, 2.20/s]
75%|#######5 | 75/100 [00:45<00:11, 2.23/s]
77%|#######7 | 77/100 [00:45<00:08, 2.87/s]
78%|#######8 | 78/100 [00:45<00:08, 2.75/s]
79%|#######9 | 79/100 [00:46<00:07, 2.87/s]
80%|######## | 80/100 [00:46<00:09, 2.21/s]
81%|########1 | 81/100 [00:47<00:07, 2.69/s]
82%|########2 | 82/100 [00:47<00:06, 2.61/s]
83%|########2 | 83/100 [00:47<00:05, 2.98/s]
84%|########4 | 84/100 [00:48<00:06, 2.42/s]
85%|########5 | 85/100 [00:49<00:08, 1.70/s]
86%|########6 | 86/100 [00:49<00:08, 1.75/s]
87%|########7 | 87/100 [00:50<00:06, 2.09/s]
88%|########8 | 88/100 [00:50<00:06, 1.86/s]
90%|######### | 90/100 [00:51<00:05, 1.77/s]
91%|#########1| 91/100 [00:52<00:04, 2.03/s]
92%|#########2| 92/100 [00:52<00:03, 2.40/s]
93%|#########3| 93/100 [00:52<00:02, 2.88/s]
94%|#########3| 94/100 [00:52<00:02, 2.82/s]
95%|#########5| 95/100 [00:53<00:01, 2.72/s]
96%|#########6| 96/100 [00:56<00:04, 1.05s/]
97%|#########7| 97/100 [00:56<00:02, 1.14/s]
98%|#########8| 98/100 [00:57<00:01, 1.28/s]
99%|#########9| 99/100 [00:57<00:00, 1.54/s]
100%|##########| 100/100 [00:57<00:00, 1.74/s]
100%|##########| 100/100 [00:57<00:00, 1.73/s]
We can visualize some of those genes with squidpy.pl.spatial_scatter()
.
sq.pl.spatial_scatter(adata, color=["Lct", "Ecel1", "Cfap65"])

Total running time of the script: ( 1 minutes 12.066 seconds)
Estimated memory usage: 475 MB