Launch binder

Compute Sepal score

This example shows how to compute the Sepal score for spatially variable genes identification.

The Sepal score is a method that simulates a diffusion process to quantify spatial structure in tissue. See [Anderson and Lundeberg, 2021] for reference.

See also

import squidpy as sq

adata = sq.datasets.visium_hne_adata()
adata

Out:

AnnData object with n_obs × n_vars = 2688 × 18078
    obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts', 'leiden', 'cluster'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cluster_colors', 'hvg', 'leiden', 'leiden_colors', 'neighbors', 'pca', 'rank_genes_groups', 'spatial', 'umap'
    obsm: 'X_pca', 'X_umap', 'spatial'
    varm: 'PCs'
    obsp: 'connectivities', 'distances'

We can compute the Sepal score with squidpy.gr.sepal(). there are 2 important aspects to consider when computing sepal:

  • The function only accepts grid-like spatial graphs. Make sure to specify the maximum number of neighbors in your data (6 for an hexagonal grid like Visium) with max_neighs = 6.

  • It is useful to filter out genes that are expressed in very few observations and might be wrongly identified as being spatially variable. If you are performing pre-processing with Scanpy, there is a convenient function that can be used BEFORE normalization scanpy.pp.calculate_qc_metrics(). It computes several useful summary statistics on both observation and feature axis. We will be using the n_cells columns in adata.var to filter out genes that are expressed in less than 100 observations.

Before computing the Sepal score, we first need to compute a spatial graph with squidpy.gr.spatial_neighbors(). We will also subset the number of genes to evaluate for efficiency purposes.

sq.gr.spatial_neighbors(adata)
genes = adata.var_names[(adata.var.n_cells > 100) & adata.var.highly_variable][0:100]
sq.gr.sepal(adata, max_neighs=6, genes=genes, n_jobs=1)
adata.uns["sepal_score"].head(10)

Out:

  0%|          | 0/100 [00:00<?, ?/s]
  1%|1         | 1/100 [00:07<11:46,  7.14s/]
  2%|2         | 2/100 [00:08<05:39,  3.47s/]
  3%|3         | 3/100 [00:08<03:18,  2.04s/]
  4%|4         | 4/100 [00:08<02:17,  1.43s/]
  5%|5         | 5/100 [00:09<01:39,  1.05s/]
  6%|6         | 6/100 [00:09<01:27,  1.08/s]
  7%|7         | 7/100 [00:10<01:13,  1.27/s]
  8%|8         | 8/100 [00:10<00:56,  1.62/s]
  9%|9         | 9/100 [00:11<00:52,  1.75/s]
 10%|#         | 10/100 [00:11<00:45,  1.96/s]
 11%|#1        | 11/100 [00:11<00:36,  2.46/s]
 12%|#2        | 12/100 [00:12<00:37,  2.34/s]
 13%|#3        | 13/100 [00:12<00:34,  2.56/s]
 14%|#4        | 14/100 [00:12<00:34,  2.48/s]
 15%|#5        | 15/100 [00:13<00:28,  2.98/s]
 16%|#6        | 16/100 [00:13<00:29,  2.85/s]
 17%|#7        | 17/100 [00:13<00:30,  2.75/s]
 18%|#8        | 18/100 [00:14<00:28,  2.87/s]
 19%|#9        | 19/100 [00:14<00:36,  2.24/s]
 20%|##        | 20/100 [00:15<00:50,  1.60/s]
 21%|##1       | 21/100 [00:16<00:38,  2.04/s]
 22%|##2       | 22/100 [00:16<00:43,  1.80/s]
 23%|##3       | 23/100 [00:19<01:31,  1.19s/]
 24%|##4       | 24/100 [00:20<01:15,  1.00/s]
 25%|##5       | 25/100 [00:20<01:05,  1.15/s]
 26%|##6       | 26/100 [00:20<00:49,  1.49/s]
 27%|##7       | 27/100 [00:21<00:39,  1.85/s]
 28%|##8       | 28/100 [00:21<00:31,  2.32/s]
 29%|##9       | 29/100 [00:21<00:31,  2.28/s]
 30%|###       | 30/100 [00:22<00:33,  2.11/s]
 31%|###1      | 31/100 [00:22<00:34,  2.02/s]
 32%|###2      | 32/100 [00:23<00:31,  2.13/s]
 33%|###3      | 33/100 [00:23<00:27,  2.43/s]
 34%|###4      | 34/100 [00:24<00:34,  1.91/s]
 35%|###5      | 35/100 [00:24<00:30,  2.15/s]
 36%|###6      | 36/100 [00:25<00:29,  2.15/s]
 37%|###7      | 37/100 [00:25<00:28,  2.25/s]
 38%|###8      | 38/100 [00:27<00:48,  1.27/s]
 39%|###9      | 39/100 [00:27<00:44,  1.36/s]
 40%|####      | 40/100 [00:27<00:33,  1.78/s]
 41%|####1     | 41/100 [00:28<00:28,  2.08/s]
 42%|####2     | 42/100 [00:28<00:27,  2.13/s]
 43%|####3     | 43/100 [00:28<00:23,  2.45/s]
 44%|####4     | 44/100 [00:28<00:18,  3.06/s]
 45%|####5     | 45/100 [00:29<00:21,  2.53/s]
 46%|####6     | 46/100 [00:30<00:28,  1.92/s]
 47%|####6     | 47/100 [00:30<00:28,  1.84/s]
 48%|####8     | 48/100 [00:32<00:41,  1.25/s]
 49%|####9     | 49/100 [00:32<00:32,  1.56/s]
 50%|#####     | 50/100 [00:33<00:35,  1.41/s]
 51%|#####1    | 51/100 [00:33<00:25,  1.88/s]
 52%|#####2    | 52/100 [00:33<00:20,  2.39/s]
 53%|#####3    | 53/100 [00:33<00:16,  2.92/s]
 54%|#####4    | 54/100 [00:34<00:23,  1.92/s]
 55%|#####5    | 55/100 [00:35<00:20,  2.21/s]
 56%|#####6    | 56/100 [00:35<00:23,  1.86/s]
 57%|#####6    | 57/100 [00:36<00:19,  2.18/s]
 58%|#####8    | 58/100 [00:36<00:18,  2.33/s]
 59%|#####8    | 59/100 [00:36<00:15,  2.61/s]
 60%|######    | 60/100 [00:37<00:21,  1.88/s]
 61%|######1   | 61/100 [00:37<00:17,  2.25/s]
 62%|######2   | 62/100 [00:38<00:21,  1.80/s]
 63%|######3   | 63/100 [00:39<00:18,  1.99/s]
 64%|######4   | 64/100 [00:39<00:16,  2.18/s]
 65%|######5   | 65/100 [00:39<00:13,  2.53/s]
 66%|######6   | 66/100 [00:40<00:16,  2.01/s]
 68%|######8   | 68/100 [00:40<00:12,  2.56/s]
 69%|######9   | 69/100 [00:41<00:12,  2.46/s]
 70%|#######   | 70/100 [00:43<00:24,  1.25/s]
 71%|#######1  | 71/100 [00:43<00:22,  1.30/s]
 73%|#######3  | 73/100 [00:44<00:13,  2.00/s]
 74%|#######4  | 74/100 [00:44<00:11,  2.20/s]
 75%|#######5  | 75/100 [00:45<00:11,  2.23/s]
 77%|#######7  | 77/100 [00:45<00:08,  2.87/s]
 78%|#######8  | 78/100 [00:45<00:08,  2.75/s]
 79%|#######9  | 79/100 [00:46<00:07,  2.87/s]
 80%|########  | 80/100 [00:46<00:09,  2.21/s]
 81%|########1 | 81/100 [00:47<00:07,  2.69/s]
 82%|########2 | 82/100 [00:47<00:06,  2.61/s]
 83%|########2 | 83/100 [00:47<00:05,  2.98/s]
 84%|########4 | 84/100 [00:48<00:06,  2.42/s]
 85%|########5 | 85/100 [00:49<00:08,  1.70/s]
 86%|########6 | 86/100 [00:49<00:08,  1.75/s]
 87%|########7 | 87/100 [00:50<00:06,  2.09/s]
 88%|########8 | 88/100 [00:50<00:06,  1.86/s]
 90%|######### | 90/100 [00:51<00:05,  1.77/s]
 91%|#########1| 91/100 [00:52<00:04,  2.03/s]
 92%|#########2| 92/100 [00:52<00:03,  2.40/s]
 93%|#########3| 93/100 [00:52<00:02,  2.88/s]
 94%|#########3| 94/100 [00:52<00:02,  2.82/s]
 95%|#########5| 95/100 [00:53<00:01,  2.72/s]
 96%|#########6| 96/100 [00:56<00:04,  1.05s/]
 97%|#########7| 97/100 [00:56<00:02,  1.14/s]
 98%|#########8| 98/100 [00:57<00:01,  1.28/s]
 99%|#########9| 99/100 [00:57<00:00,  1.54/s]
100%|##########| 100/100 [00:57<00:00,  1.74/s]
100%|##########| 100/100 [00:57<00:00,  1.73/s]
sepal_score
Lct 7.868
1500015O10Rik 7.085
Ecel1 5.274
Fzd5 4.694
Cfap65 4.095
C1ql2 3.144
Slc9a2 2.947
Gm17634 2.904
St18 2.568
Des 2.494


We can visualize some of those genes with squidpy.pl.spatial_scatter().

sq.pl.spatial_scatter(adata, color=["Lct", "Ecel1", "Cfap65"])
Lct, Ecel1, Cfap65

Total running time of the script: ( 1 minutes 12.066 seconds)

Estimated memory usage: 475 MB