# Cell-type deconvolution using Tangram

In this tutorial, we show how to leverage Squidpy’s squidpy.im.ImageContainer for cell-type deconvolution tasks.

Mapping single-cell atlases to spatial transcriptomics data is a crucial analysis step to integrate cell-type annotations across technologies. Information on the number of nuclei under each spot can help cell-type deconvolution methods. Tangram is a cell-type deconvolution method that enables mapping of cell types to single nuclei under each spot. We will show how to leverage the image container's segmentation capabilities, together with Tangram, to map cell types of the mouse cortex from scRNA-seq data to Visium data.

To run the notebook locally, create a conda environment with conda env create -f tangram_environment.yml, using the tangram_environment.yml file provided with this tutorial.

First, let’s import some libraries.

[1]:

import scanpy as sc
import squidpy as sq
import numpy as np
import pandas as pd
from anndata import AnnData
import pathlib
import matplotlib.pyplot as plt
import matplotlib as mpl
import skimage

# import tangram for spatial deconvolution
import tangram as tg

print(f"squidpy=={sq.__version__}")
print(f"tangram=={tg.__version__}")


scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.21.6 scipy==1.8.0 pandas==1.4.2 scikit-learn==1.0.2 statsmodels==0.13.2 python-igraph==0.9.9 pynndescent==0.5.6
squidpy==1.2.2
tangram==1.0.2


We will load the public data available in Squidpy.

[2]:

adata_st = sq.datasets.visium_fluo_adata_crop()
adata_st = adata_st[
    adata_st.obs.cluster.isin([f"Cortex_{i}" for i in np.arange(1, 5)])
].copy()
img = sq.datasets.visium_fluo_image_crop()

adata_sc = sq.datasets.sc_mouse_cortex()



Here, we subset the crop of the mouse brain to only contain clusters of the brain cortex. The single-cell dataset was pre-processed with standard scanpy functions. To start off, let's visualize both the spatial and single-cell datasets.

[3]:

fig, axs = plt.subplots(1, 2, figsize=(20, 5))
sq.pl.spatial_scatter(
    adata_st, color="cluster", alpha=0.7, frameon=False, ax=axs[0]
)
sc.pl.umap(
adata_sc, color="cell_subclass", size=10, frameon=False, show=False, ax=axs[1]
)
plt.tight_layout()


## Nuclei segmentation and segmentation features

As mentioned before, we are interested in segmenting single nuclei under each spot in the Visium dataset. Squidpy makes it possible with two lines of Python code:

• squidpy.im.process applies smoothing as a pre-processing step.

• squidpy.im.segment computes segmentation masks with the watershed algorithm.

[4]:

sq.im.process(img=img, layer="image", method="smooth")
sq.im.segment(
img=img,
layer="image_smooth",
method="watershed",
channel=0,
)
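The two calls above boil down to smoothing followed by watershed segmentation. Here is a rough stand-alone analogue on a synthetic image, using scikit-image and scipy directly rather than Squidpy's wrappers — the threshold and marker choice below are illustrative assumptions, not Squidpy's actual defaults:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import gaussian
from skimage.segmentation import watershed

# toy image: two bright "nuclei" on a dark background
img = np.zeros((40, 40))
img[10, 10] = img[30, 30] = 1.0
img = gaussian(img, sigma=4)  # smoothing, analogous to sq.im.process

# threshold, then watershed from eroded components, analogous to sq.im.segment
mask = img > img.mean()
markers, _ = ndi.label(ndi.binary_erosion(mask, iterations=3))
seg = watershed(-img, markers, mask=mask)

# number of segmented objects (label 0 is background)
n_objects = len(np.unique(seg)) - 1
print(n_objects)
```

On real microscopy data the smoothing kernel and thresholding strategy matter a great deal; Squidpy exposes these through the `method` and keyword arguments of `sq.im.process` and `sq.im.segment`.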


Let’s visualize the results for an inset:

[5]:

inset_y = 1500
inset_x = 1700
inset_sy = 400
inset_sx = 500

fig, axs = plt.subplots(1, 3, figsize=(30, 10))
sq.pl.spatial_scatter(
adata_st, color="cluster", alpha=0.7, frameon=False, ax=axs[0], title=""
)
axs[0].set_title("Clusters", fontdict={"fontsize": 20})
"tissue_hires_scalef"
]
rect = mpl.patches.Rectangle(
    (inset_y * sf, inset_x * sf),
    width=inset_sx * sf,
    height=inset_sy * sf,
    ec="yellow",
    lw=4,
    fill=False,
)
axs[0].add_patch(rect)

axs[0].axes.xaxis.label.set_visible(False)
axs[0].axes.yaxis.label.set_visible(False)

axs[1].imshow(
img["image"][inset_y : inset_y + inset_sy, inset_x : inset_x + inset_sx, 0, 0]
/ 65536,
interpolation="none",
)
axs[1].grid(False)
axs[1].set_xticks([])
axs[1].set_yticks([])
axs[1].set_title("DAPI", fontdict={"fontsize": 20})

crop = img["segmented_watershed"][
inset_y : inset_y + inset_sy, inset_x : inset_x + inset_sx
].values.squeeze(-1)
crop = skimage.segmentation.relabel_sequential(crop)[0]
cmap = plt.cm.plasma
cmap.set_under(color="black")
axs[2].imshow(crop, interpolation="none", cmap=cmap, vmin=0.001)
axs[2].grid(False)
axs[2].set_xticks([])
axs[2].set_yticks([])
axs[2].set_title("Nucleous segmentation", fontdict={"fontsize": 20})

[5]:

Text(0.5, 1.0, 'Nuclei segmentation')


We can appreciate that, despite some false positives, the nuclei segmentation worked well.

We then need to extract some image features useful for the deconvolution task downstream. Specifically, we will need:

• the number of unique segmentation objects (i.e. nuclei) under each spot.

• the coordinates of the centroids of the segmentation objects.

It will become clear later how we will be using them.
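To make the two features concrete, here is a toy numpy-only sketch — a hypothetical label image and spot mask, not Squidpy's implementation — of counting the unique nuclei under a spot and computing their centroids:

```python
import numpy as np

# toy 6x6 label image: 0 = background, 1..3 = three nuclei
labels = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [3, 3, 0, 0, 0, 0],
    [3, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

# hypothetical circular "spot" mask around pixel (1, 2) with radius 2
yy, xx = np.mgrid[0:6, 0:6]
spot_mask = (yy - 1) ** 2 + (xx - 2) ** 2 <= 4

# number of unique nuclei under the spot (excluding background label 0)
under_spot = np.unique(labels[spot_mask])
n_nuclei = (under_spot > 0).sum()

# centroid of each nucleus = mean of its pixel coordinates over the image
centroids = {
    lab: (yy[labels == lab].mean(), xx[labels == lab].mean())
    for lab in under_spot if lab > 0
}
print(n_nuclei, centroids)
```

Squidpy computes the same kind of information at scale via `sq.im.calculate_image_features` with the `"segmentation"` feature, as shown in the next cell.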

[6]:

# define image layer to use for segmentation
features_kwargs = {
"segmentation": {
"label_layer": "segmented_watershed",
"props": ["label", "centroid"],
"channels": [1, 2],
}
}
# calculate segmentation features
sq.im.calculate_image_features(
img,
layer="image",
features_kwargs=features_kwargs,
features="segmentation",
)

100%|██████████| 324/324 [00:25<00:00, 12.48/s]


We can visualize the total number of objects under each spot with squidpy.

[7]:

adata_st.obs["cell_count"] = adata_st.obsm["image_features"]["segmentation_label"]


## Deconvolution and mapping

At this stage, we have all we need for the deconvolution task. First, we need to find a set of common genes between the single-cell and spatial datasets. We will use the top marker genes of each cell type from the single-cell data, intersected with the genes measured in the spatial dataset.
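The selection logic can be sketched on toy data; the gene names below are illustrative, and `genes_st` stands in for the spatial dataset's var_names:

```python
import numpy as np
import pandas as pd

# hypothetical top-marker table: one column per cell type, rows are ranked genes
markers_df = pd.DataFrame({
    "L4":    ["Rorb", "Scnn1a", "Cux2"],
    "Pvalb": ["Pvalb", "Gad1", "Rorb"],
})

# genes measured in the (toy) spatial dataset
genes_st = ["Rorb", "Gad1", "Scnn1a", "Slc17a7"]

# unique marker genes across cell types, intersected with the spatial genes
genes_sc = np.unique(markers_df.melt().value.values)
genes = sorted(set(genes_sc) & set(genes_st))
print(genes)
```

The real cells below do exactly this, with the top 100 ranked genes per cell type from `sc.tl.rank_genes_groups`.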

[8]:

sc.tl.rank_genes_groups(adata_sc, groupby="cell_subclass", use_raw=False)

WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'

[9]:

markers_df = pd.DataFrame(adata_sc.uns["rank_genes_groups"]["names"]).iloc[0:100, :]
genes_sc = np.unique(markers_df.melt().value.values)
genes_st = adata_st.var_names.values
genes = list(set(genes_sc).intersection(set(genes_st)))
len(genes)

[9]:

1281

[10]:

tg.pp_adatas(adata_sc, adata_st, genes=genes)

INFO:root:1280 training genes are saved in uns training_genes of both single cell and spatial Anndatas.
INFO:root:14785 overlapped genes are saved in uns overlap_genes of both single cell and spatial Anndatas.
INFO:root:uniform based density prior is calculated and saved in obs uniform_density of the spatial Anndata.
INFO:root:rna count based density prior is calculated and saved in obs rna_count_based_density of the spatial Anndata.


Now we are ready to instantiate the model object and set its hyperparameters. Note that Tangram uses torch under the hood: the model can be trained on the GPU by passing a CUDA device, but here we train on the CPU, which is simply slower.

[11]:

ad_map = tg.map_cells_to_space(
    adata_sc,
    adata_st,
    mode="constrained",
    target_count=adata_st.obs.cell_count.sum(),
    density_prior=np.array(adata_st.obs.cell_count) / adata_st.obs.cell_count.sum(),
    num_epochs=1000,
    device="cpu",
)

INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 1280 genes and customized density_prior in constrained mode...

Score: 0.613, KL reg: 0.125, Count reg: 5758.569, Lambda f reg: 4492.569
Score: 0.699, KL reg: 0.012, Count reg: 2.133, Lambda f reg: 734.911
Score: 0.702, KL reg: 0.012, Count reg: 0.638, Lambda f reg: 247.100
Score: 0.702, KL reg: 0.012, Count reg: 1.084, Lambda f reg: 171.527
Score: 0.702, KL reg: 0.012, Count reg: 0.805, Lambda f reg: 142.004
Score: 0.702, KL reg: 0.012, Count reg: 0.173, Lambda f reg: 126.206
Score: 0.702, KL reg: 0.012, Count reg: 1.130, Lambda f reg: 108.257
Score: 0.703, KL reg: 0.012, Count reg: 1.477, Lambda f reg: 99.011
Score: 0.703, KL reg: 0.012, Count reg: 0.960, Lambda f reg: 92.174
Score: 0.703, KL reg: 0.012, Count reg: 0.843, Lambda f reg: 83.730

INFO:root:Saving results..


As a first result, we can take the average of the mapped cells and compute proportions from it. The following function computes the proportions from the Tangram mapping result and stores them in the spatial AnnData object.

We can appreciate how the averaged results already give a sense of the success of the deconvolution step. Cortical layers are indeed found at higher proportions in the correct regions of the tissue.

Of course, some layers seem to be better resolved than others. A more refined gene selection step could help in this case.
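The proportions mentioned above amount to a per-spot row normalization of the predicted scores; a minimal pandas sketch on hypothetical scores of the kind stored in obsm tangram_ct_pred:

```python
import pandas as pd

# hypothetical per-spot cell-type scores (rows = spots, columns = cell types)
pred = pd.DataFrame(
    {"L4": [4.0, 1.0], "L5 IT": [1.0, 1.0], "Vip": [0.0, 2.0]},
    index=["spot_a", "spot_b"],
)

# row-normalize so each spot's scores sum to 1 -> cell-type proportions
props = pred.div(pred.sum(axis=1), axis=0)
print(props.loc["spot_a", "L4"])
```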

[12]:

tg.project_cell_annotations(ad_map, adata_st, annotation="cell_subclass")

INFO:root:spatial prediction dataframe is saved in obsm tangram_ct_pred of the spatial AnnData.

[14]:

adata_st.obs = pd.concat([adata_st.obs, adata_st.obsm["tangram_ct_pred"]], axis=1)

sq.pl.spatial_scatter(
    adata_st,
    color=["L2/3 IT", "L4", "L5 IT", "L5 PT", "L6 CT", "L6 IT", "L6b"],
)


And here comes the key part, where we will use the results of the previous deconvolution steps. Previously, we computed the absolute number of unique segmentation objects under each spot, together with their centroids. Let's extract them in the format required by Tangram.

In the resulting dataframe, each row represents a single segmentation object (therefore a single nucleus). We also have the image coordinates as well as the unique centroid ID, which is a string that contains both the spot ID and a numerical index.
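Such centroid IDs can be built from a per-object table with a running index within each spot; a toy pandas sketch with hypothetical barcodes (not Tangram's implementation):

```python
import pandas as pd

# toy table: one row per segmentation object, with its parent spot barcode
df = pd.DataFrame({
    "spot_idx": ["AAA-1", "AAA-1", "BBB-1"],
    "y": [10.0, 12.5, 40.0],
    "x": [5.0, 7.5, 30.0],
})

# centroid ID = spot barcode plus a running index within the spot
df["centroids"] = df["spot_idx"] + "_" + df.groupby("spot_idx").cumcount().astype(str)
print(df["centroids"].tolist())
```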

Tangram provides a convenient function to export the mapping between spot IDs and segmentation IDs to adata.uns.

[16]:

tg.create_segment_cell_df(adata_st)

INFO:root:cell segmentation dataframe is saved in uns tangram_cell_segmentation of the spatial AnnData.
INFO:root:spot centroids is saved in obsm tangram_spot_centroids of the spatial AnnData.

[17]:

adata_st.uns["tangram_cell_segmentation"].head()

[17]:

spot_idx y x centroids
0 AAATGGCATGTCTTGT-1 5304.000000 731.000000 AAATGGCATGTCTTGT-1_0
1 AAATGGCATGTCTTGT-1 5320.947519 721.331554 AAATGGCATGTCTTGT-1_1
2 AAATGGCATGTCTTGT-1 5332.942342 717.447904 AAATGGCATGTCTTGT-1_2
3 AAATGGCATGTCTTGT-1 5348.865384 558.924248 AAATGGCATGTCTTGT-1_3
4 AAATGGCATGTCTTGT-1 5342.124989 567.208502 AAATGGCATGTCTTGT-1_4

We can then use tangram.count_cell_annotations() to map the cell types resulting from the deconvolution step to the putative segmentation IDs.

[18]:

tg.count_cell_annotations(
    ad_map,
    adata_sc,
    adata_st,
    annotation="cell_subclass",
)

INFO:root:spatial cell count dataframe is saved in obsm tangram_ct_count of the spatial AnnData.

[19]:

adata_st.obsm["tangram_ct_count"].head()

[19]:

x y cell_n centroids Pvalb L4 Vip L2/3 IT Lamp5 NP ... L5 PT Astro L6b Endo Peri Meis2 Macrophage CR VLMC SMC
AAATGGCATGTCTTGT-1 641 5393 13 [AAATGGCATGTCTTGT-1_0, AAATGGCATGTCTTGT-1_1, A... 2 0 0 1 0 1 ... 2 0 0 0 0 0 1 0 0 0
AACAACTGGTAGTTGC-1 4208 1672 16 [AACAACTGGTAGTTGC-1_0, AACAACTGGTAGTTGC-1_1, A... 2 0 3 0 1 1 ... 2 1 0 0 0 0 0 0 0 0
AACAGGAAATCGAATA-1 1117 5117 28 [AACAGGAAATCGAATA-1_0, AACAGGAAATCGAATA-1_1, A... 3 0 2 0 1 1 ... 0 0 0 0 0 0 0 0 0 0
AACCCAGAGACGGAGA-1 1101 1274 5 [AACCCAGAGACGGAGA-1_0, AACCCAGAGACGGAGA-1_1, A... 1 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
AACCGTTGTGTTTGCT-1 399 4708 7 [AACCGTTGTGTTTGCT-1_0, AACCGTTGTGTTTGCT-1_1, A... 1 0 0 1 0 0 ... 0 0 0 1 0 0 0 0 0 0

5 rows × 27 columns
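Conceptually, turning these per-spot counts into one cell-type label per centroid is a simple expansion, sketched here with toy pandas data (hypothetical barcodes and counts; not Tangram's actual implementation):

```python
import pandas as pd

# toy per-spot result: 3 centroids under one spot, with counted cell types
centroids = ["S1_0", "S1_1", "S1_2"]
counts = {"Pvalb": 2, "L4": 1}

# expand the counts into one label per centroid, in order
labels = [ct for ct, n in counts.items() for _ in range(n)]
df = pd.DataFrame({"centroids": centroids, "cluster": labels})
print(df)
```

Which centroid receives which label is an assignment choice; the sketch above simply pairs them in order.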

And then finally export the results in a new AnnData object.

[20]:

adata_segment = tg.deconvolve_cell_annotations(adata_st)

[21]:

adata_segment.obs.head()

[21]:

y x centroids cluster
0 5304.000000 731.000000 AAATGGCATGTCTTGT-1_0 Pvalb
1 5320.947519 721.331554 AAATGGCATGTCTTGT-1_1 Pvalb
2 5332.942342 717.447904 AAATGGCATGTCTTGT-1_2 L2/3 IT
3 5348.865384 558.924248 AAATGGCATGTCTTGT-1_3 NP
4 5342.124989 567.208502 AAATGGCATGTCTTGT-1_4 L6 CT

Note that the AnnData object does not contain counts, but only cell-type annotations, as a result of the Tangram mapping. Nevertheless, it's convenient to create such an AnnData object for visualization purposes.

Below you can appreciate how each dot is no longer a Visium spot, but a single unique segmentation object, with its mapped cell type.

[27]:

fig, ax = plt.subplots(1, 1, figsize=(20, 20))
sq.pl.spatial_scatter(
    adata_segment, color="cluster", size=0.4, shape=None, ax=ax
)