Launch binder

Compute Sepal score

This example shows how to compute the Sepal score for spatially variable genes identification.

The Sepal score is a method that simulates a diffusion process to quantify spatial structure in tissue. See [Anderson and Lundeberg, 2021] for reference.

See also

import scanpy as sc
import squidpy as sq

adata = sq.datasets.visium_hne_adata()
adata

Out:

  0%|          | 0.00/314M [00:00<?, ?B/s]
  0%|          | 56.0k/314M [00:00<13:34, 404kB/s]
  0%|          | 176k/314M [00:00<08:09, 672kB/s]
  0%|          | 600k/314M [00:00<03:07, 1.75MB/s]
  1%|          | 2.30M/314M [00:00<00:54, 5.98MB/s]
  2%|2         | 7.41M/314M [00:00<00:18, 17.3MB/s]
  4%|4         | 13.3M/314M [00:00<00:12, 26.0MB/s]
  6%|6         | 19.2M/314M [00:01<00:09, 31.6MB/s]
  8%|7         | 25.1M/314M [00:01<00:08, 35.2MB/s]
 10%|9         | 30.8M/314M [00:01<00:07, 37.3MB/s]
 12%|#1        | 36.7M/314M [00:01<00:07, 39.1MB/s]
 14%|#3        | 42.6M/314M [00:01<00:07, 40.5MB/s]
 15%|#5        | 48.4M/314M [00:01<00:06, 41.2MB/s]
 17%|#7        | 54.2M/314M [00:01<00:06, 41.5MB/s]
 19%|#9        | 59.9M/314M [00:02<00:06, 41.4MB/s]
 21%|##        | 65.7M/314M [00:02<00:06, 41.7MB/s]
 23%|##2       | 71.5M/314M [00:02<00:06, 42.0MB/s]
 25%|##4       | 77.3M/314M [00:02<00:05, 42.1MB/s]
 26%|##6       | 83.1M/314M [00:02<00:05, 42.3MB/s]
 28%|##8       | 89.0M/314M [00:02<00:05, 42.4MB/s]
 30%|###       | 94.6M/314M [00:02<00:05, 41.8MB/s]
 32%|###1      | 100M/314M [00:03<00:05, 41.6MB/s]
 34%|###3      | 106M/314M [00:03<00:05, 41.7MB/s]
 35%|###5      | 111M/314M [00:03<00:05, 41.5MB/s]
 37%|###7      | 117M/314M [00:03<00:04, 41.7MB/s]
 39%|###9      | 123M/314M [00:03<00:04, 41.8MB/s]
 41%|####      | 129M/314M [00:03<00:04, 41.6MB/s]
 43%|####2     | 134M/314M [00:03<00:04, 41.6MB/s]
 45%|####4     | 140M/314M [00:04<00:04, 41.9MB/s]
 46%|####6     | 146M/314M [00:04<00:04, 41.9MB/s]
 48%|####8     | 152M/314M [00:04<00:04, 42.3MB/s]
 50%|#####     | 157M/314M [00:04<00:03, 41.9MB/s]
 52%|#####1    | 163M/314M [00:04<00:03, 42.1MB/s]
 54%|#####3    | 169M/314M [00:04<00:03, 41.4MB/s]
 56%|#####5    | 174M/314M [00:04<00:03, 41.9MB/s]
 57%|#####7    | 180M/314M [00:05<00:03, 42.2MB/s]
 59%|#####9    | 186M/314M [00:05<00:03, 42.4MB/s]
 61%|######1   | 192M/314M [00:05<00:03, 42.3MB/s]
 63%|######2   | 198M/314M [00:05<00:02, 42.1MB/s]
 65%|######4   | 203M/314M [00:05<00:02, 42.4MB/s]
 67%|######6   | 209M/314M [00:05<00:02, 42.2MB/s]
 68%|######8   | 215M/314M [00:05<00:02, 42.0MB/s]
 70%|#######   | 220M/314M [00:06<00:02, 41.4MB/s]
 72%|#######1  | 226M/314M [00:06<00:02, 41.7MB/s]
 74%|#######3  | 232M/314M [00:06<00:02, 41.9MB/s]
 76%|#######5  | 238M/314M [00:06<00:01, 42.3MB/s]
 77%|#######7  | 243M/314M [00:06<00:01, 41.8MB/s]
 79%|#######9  | 249M/314M [00:06<00:01, 42.0MB/s]
 81%|########1 | 255M/314M [00:06<00:01, 42.0MB/s]
 83%|########2 | 260M/314M [00:07<00:01, 41.7MB/s]
 85%|########4 | 266M/314M [00:07<00:01, 41.6MB/s]
 87%|########6 | 272M/314M [00:07<00:01, 41.9MB/s]
 88%|########8 | 278M/314M [00:07<00:00, 41.9MB/s]
 90%|######### | 283M/314M [00:07<00:00, 41.7MB/s]
 92%|#########2| 289M/314M [00:07<00:00, 41.9MB/s]
 94%|#########3| 295M/314M [00:07<00:00, 41.8MB/s]
 96%|#########5| 300M/314M [00:08<00:00, 41.1MB/s]
 97%|#########7| 306M/314M [00:08<00:00, 41.2MB/s]
 99%|#########9| 312M/314M [00:08<00:00, 41.3MB/s]
100%|##########| 314M/314M [00:08<00:00, 39.5MB/s]

AnnData object with n_obs × n_vars = 2688 × 18078
    obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts', 'leiden', 'cluster'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'spatial', 'umap'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances'

We can compute the Sepal score with squidpy.gr.sepal(). there are 2 important aspects to consider when computing sepal:

  • The function only accepts grid-like spatial graphs. Make sure to specify the maximum number of neighbors in your data (6 for an hexagonal grid like Visium) with max_neighs = 6.

  • It is useful to filter out genes that are expressed in very few observations and might be wrongly identified as being spatially variable. If you are performing pre-processing with Scanpy, there is a convenient function that can be used BEFORE normalization scanpy.pp.calculate_qc_metrics(). It computes several useful summary statistics on both observation and feature axis. We will be using the n_cells columns in adata.var to filter out genes that are expressed in less than 100 observations.

Before computing the Sepal score, we first need to compute a spatial graph with squidpy.gr.spatial_neighbors(). We will also subset the number of genes to evaluate for efficiency purposes.

sq.gr.spatial_neighbors(adata)
genes = adata.var_names[(adata.var.n_cells > 100) & adata.var.highly_variable][0:100]
sq.gr.sepal(adata, max_neighs=6, genes=genes, n_jobs=1)
adata.uns["sepal_score"].head(10)

Out:

  0%|          | 0/100 [00:00<?, ?/s]
  1%|1         | 1/100 [00:03<05:44,  3.48s/]
  3%|3         | 3/100 [00:03<01:36,  1.01/s]
  4%|4         | 4/100 [00:04<01:14,  1.30/s]
  5%|5         | 5/100 [00:04<00:57,  1.65/s]
  6%|6         | 6/100 [00:04<00:53,  1.77/s]
  7%|7         | 7/100 [00:05<00:45,  2.03/s]
  8%|8         | 8/100 [00:05<00:36,  2.53/s]
  9%|9         | 9/100 [00:05<00:34,  2.64/s]
 10%|#         | 10/100 [00:05<00:31,  2.89/s]
 11%|#1        | 11/100 [00:06<00:24,  3.60/s]
 12%|#2        | 12/100 [00:06<00:25,  3.42/s]
 13%|#3        | 13/100 [00:06<00:22,  3.83/s]
 14%|#4        | 14/100 [00:06<00:23,  3.70/s]
 15%|#5        | 15/100 [00:06<00:19,  4.46/s]
 16%|#6        | 16/100 [00:07<00:18,  4.45/s]
 17%|#7        | 17/100 [00:07<00:18,  4.38/s]
 18%|#8        | 18/100 [00:07<00:18,  4.47/s]
 19%|#9        | 19/100 [00:08<00:24,  3.32/s]
 20%|##        | 20/100 [00:08<00:34,  2.31/s]
 21%|##1       | 21/100 [00:09<00:26,  2.95/s]
 22%|##2       | 22/100 [00:09<00:29,  2.61/s]
 23%|##3       | 23/100 [00:11<01:01,  1.25/s]
 24%|##4       | 24/100 [00:11<00:50,  1.51/s]
 25%|##5       | 25/100 [00:12<00:43,  1.72/s]
 26%|##6       | 26/100 [00:12<00:33,  2.20/s]
 27%|##7       | 27/100 [00:12<00:26,  2.71/s]
 28%|##8       | 28/100 [00:12<00:21,  3.36/s]
 29%|##9       | 29/100 [00:12<00:21,  3.24/s]
 30%|###       | 30/100 [00:13<00:23,  3.01/s]
 31%|###1      | 31/100 [00:13<00:24,  2.83/s]
 32%|###2      | 32/100 [00:13<00:22,  3.00/s]
 33%|###3      | 33/100 [00:14<00:19,  3.40/s]
 34%|###4      | 34/100 [00:14<00:24,  2.65/s]
 35%|###5      | 35/100 [00:14<00:21,  2.98/s]
 36%|###6      | 36/100 [00:15<00:21,  2.97/s]
 37%|###7      | 37/100 [00:15<00:20,  3.09/s]
 38%|###8      | 38/100 [00:16<00:35,  1.73/s]
 39%|###9      | 39/100 [00:17<00:32,  1.86/s]
 40%|####      | 40/100 [00:17<00:24,  2.43/s]
 41%|####1     | 41/100 [00:17<00:20,  2.84/s]
 42%|####2     | 42/100 [00:17<00:19,  2.91/s]
 43%|####3     | 43/100 [00:17<00:17,  3.34/s]
 44%|####4     | 44/100 [00:18<00:13,  4.17/s]
 45%|####5     | 45/100 [00:18<00:15,  3.46/s]
 46%|####6     | 46/100 [00:19<00:20,  2.63/s]
 47%|####6     | 47/100 [00:19<00:21,  2.52/s]
 48%|####8     | 48/100 [00:20<00:30,  1.71/s]
 49%|####9     | 49/100 [00:20<00:24,  2.12/s]
 50%|#####     | 50/100 [00:21<00:24,  2.00/s]
 52%|#####2    | 52/100 [00:21<00:14,  3.24/s]
 53%|#####3    | 53/100 [00:21<00:12,  3.82/s]
 54%|#####4    | 54/100 [00:22<00:16,  2.81/s]
 55%|#####5    | 55/100 [00:22<00:14,  3.20/s]
 56%|#####6    | 56/100 [00:22<00:15,  2.78/s]
 57%|#####6    | 57/100 [00:23<00:13,  3.30/s]
 58%|#####8    | 58/100 [00:23<00:11,  3.59/s]
 59%|#####8    | 59/100 [00:23<00:10,  4.07/s]
 60%|######    | 60/100 [00:23<00:13,  3.03/s]
 61%|######1   | 61/100 [00:24<00:10,  3.64/s]
 62%|######2   | 62/100 [00:24<00:12,  3.12/s]
 63%|######3   | 63/100 [00:24<00:11,  3.35/s]
 64%|######4   | 64/100 [00:25<00:09,  3.68/s]
 65%|######5   | 65/100 [00:25<00:08,  4.34/s]
 66%|######6   | 66/100 [00:25<00:09,  3.45/s]
 68%|######8   | 68/100 [00:25<00:07,  4.26/s]
 69%|######9   | 69/100 [00:26<00:07,  4.10/s]
 70%|#######   | 70/100 [00:27<00:15,  1.91/s]
 71%|#######1  | 71/100 [00:28<00:14,  1.94/s]
 73%|#######3  | 73/100 [00:28<00:08,  3.05/s]
 74%|#######4  | 74/100 [00:28<00:07,  3.37/s]
 75%|#######5  | 75/100 [00:28<00:07,  3.52/s]
 77%|#######7  | 77/100 [00:28<00:05,  4.35/s]
 78%|#######8  | 78/100 [00:29<00:05,  4.08/s]
 79%|#######9  | 79/100 [00:29<00:04,  4.36/s]
 80%|########  | 80/100 [00:29<00:05,  3.50/s]
 81%|########1 | 81/100 [00:29<00:04,  4.18/s]
 82%|########2 | 82/100 [00:30<00:04,  3.89/s]
 83%|########2 | 83/100 [00:30<00:03,  4.36/s]
 84%|########4 | 84/100 [00:30<00:04,  3.44/s]
 85%|########5 | 85/100 [00:31<00:06,  2.40/s]
 86%|########6 | 86/100 [00:31<00:05,  2.46/s]
 87%|########7 | 87/100 [00:32<00:04,  2.94/s]
 88%|########8 | 88/100 [00:32<00:04,  2.61/s]
 90%|######### | 90/100 [00:33<00:03,  2.51/s]
 91%|#########1| 91/100 [00:33<00:03,  2.86/s]
 92%|#########2| 92/100 [00:33<00:02,  3.35/s]
 93%|#########3| 93/100 [00:33<00:01,  4.00/s]
 94%|#########3| 94/100 [00:34<00:01,  3.93/s]
 95%|#########5| 95/100 [00:34<00:01,  3.79/s]
 96%|#########6| 96/100 [00:36<00:03,  1.32/s]
 97%|#########7| 97/100 [00:36<00:01,  1.59/s]
 98%|#########8| 98/100 [00:37<00:01,  1.78/s]
 99%|#########9| 99/100 [00:37<00:00,  2.14/s]
100%|##########| 100/100 [00:37<00:00,  2.43/s]
100%|##########| 100/100 [00:37<00:00,  2.65/s]
sepal_score
Lct 7.868
1500015O10Rik 7.085
Ecel1 5.274
Fzd5 4.694
Cfap65 4.095
C1ql2 3.144
Slc9a2 2.947
Gm17634 2.904
St18 2.568
Des 2.494


We can visualize some of those genes with scanpy.pl.spatial().

sc.pl.spatial(adata, color=["Lct", "Ecel1", "Cfap65"])
Lct, Ecel1, Cfap65

Total running time of the script: ( 1 minutes 0.315 seconds)

Estimated memory usage: 395 MB