Cell-type deconvolution using Tangram

In this tutorial, we show how to leverage Squidpy’s squidpy.im.ImageContainer for cell-type deconvolution tasks.

Mapping single-cell atlases to spatial transcriptomics data is a crucial analysis steps to integrate cell-type annotation across technologies. Information on the number of nuclei under each spot can help cell-type deconvolution methods. Tangram [Biancalani et al., 2020], (code) is a cell-type deconvolution method that enables mapping of cell-types to single nuclei under each spot. We will show how to leverage the image container segmentation capabilities, together with Tangram, to map cell types of the mouse cortex from sc-RNA-seq data to Visium data.

To run the notebook locally, create a conda environment as conda env create -f tangram_environment.yml using this tangram_environment.yml.

First, let’s import some libraries.

[1]:
import scanpy as sc
import squidpy as sq
import numpy as np
import pandas as pd
from anndata import AnnData
import pathlib
import matplotlib.pyplot as plt
import matplotlib as mpl
import skimage

# import tangram for spatial deconvolution
import tangram as tg

sc.logging.print_header()
print(f"squidpy=={sq.__version__}")
print(f"tangram=={tg.__version__}")
/opt/homebrew/Caskroom/miniforge/base/envs/squidpy_3_9/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.21.6 scipy==1.8.0 pandas==1.4.2 scikit-learn==1.0.2 statsmodels==0.13.2 python-igraph==0.9.9 pynndescent==0.5.6
squidpy==1.2.2
tangram==1.0.2

We will load the public data available in Squidpy.

[2]:
adata_st = sq.datasets.visium_fluo_adata_crop()
adata_st = adata_st[
    adata_st.obs.cluster.isin([f"Cortex_{i}" for i in np.arange(1, 5)])
].copy()
img = sq.datasets.visium_fluo_image_crop()

adata_sc = sq.datasets.sc_mouse_cortex()

Here, we subset the crop of the mouse brain to only contain clusters of the brain cortex. The pre-processed single cell dataset was taken from [Tasic et al., 2018] and pre-processed with standard scanpy functions. To start off, let’s visualize both spatial and single-cell datasets.

[3]:
fig, axs = plt.subplots(1, 2, figsize=(20, 5))
sq.pl.spatial_scatter(
    adata_st, color="cluster", alpha=0.7, frameon=False, ax=axs[0]
)
sc.pl.umap(
    adata_sc, color="cell_subclass", size=10, frameon=False, show=False, ax=axs[1]
)
plt.tight_layout()
../_images/external_tutorials_tutorial_tangram_5_0.png

Nuclei segmentation and segmentation features

As mentioned before, we are interested in segmenting single nuclei under each spot in the Visium dataset. Squidpy makes it possible with two lines of Python code:

  • squidpy.im.process applies smoothing as a pre-processing step.

  • squidpy.im.segment computes segmentation masks with watershed algorithm.

[4]:
sq.im.process(img=img, layer="image", method="smooth")
sq.im.segment(
    img=img,
    layer="image_smooth",
    method="watershed",
    channel=0,
)

Let’s visualize the results for an inset:

[5]:
inset_y = 1500
inset_x = 1700
inset_sy = 400
inset_sx = 500

fig, axs = plt.subplots(1, 3, figsize=(30, 10))
sq.pl.spatial_scatter(
    adata_st, color="cluster", alpha=0.7, frameon=False, ax=axs[0], title=""
)
axs[0].set_title("Clusters", fontdict={"fontsize": 20})
sf = adata_st.uns["spatial"]["V1_Adult_Mouse_Brain_Coronal_Section_2"]["scalefactors"][
    "tissue_hires_scalef"
]
rect = mpl.patches.Rectangle(
    (inset_y * sf, inset_x * sf),
    width=inset_sx * sf,
    height=inset_sy * sf,
    ec="yellow",
    lw=4,
    fill=False,
)
axs[0].add_patch(rect)

axs[0].axes.xaxis.label.set_visible(False)
axs[0].axes.yaxis.label.set_visible(False)

axs[1].imshow(
    img["image"][inset_y : inset_y + inset_sy, inset_x : inset_x + inset_sx, 0, 0]
    / 65536,
    interpolation="none",
)
axs[1].grid(False)
axs[1].set_xticks([])
axs[1].set_yticks([])
axs[1].set_title("DAPI", fontdict={"fontsize": 20})

crop = img["segmented_watershed"][
    inset_y : inset_y + inset_sy, inset_x : inset_x + inset_sx
].values.squeeze(-1)
crop = skimage.segmentation.relabel_sequential(crop)[0]
cmap = plt.cm.plasma
cmap.set_under(color="black")
axs[2].imshow(crop, interpolation="none", cmap=cmap, vmin=0.001)
axs[2].grid(False)
axs[2].set_xticks([])
axs[2].set_yticks([])
axs[2].set_title("Nucleous segmentation", fontdict={"fontsize": 20})
[5]:
Text(0.5, 1.0, 'Nucleous segmentation')
../_images/external_tutorials_tutorial_tangram_9_1.png

We can appreciate that despite some false positive, the nuclei segmentation worked well.

We then need to extract some image features useful for the deconvolution task downstream. Specifically, we will need:

  • the number of unique segmentation objects (i.e. nuclei) under each spot.

  • the coordinates of the centroids of the segmentation object.

It will become clear later how we will be using them.

[6]:
# define image layer to use for segmentation
features_kwargs = {
    "segmentation": {
        "label_layer": "segmented_watershed",
        "props": ["label", "centroid"],
        "channels": [1, 2],
    }
}
# calculate segmentation features
sq.im.calculate_image_features(
    adata_st,
    img,
    layer="image",
    key_added="image_features",
    features_kwargs=features_kwargs,
    features="segmentation",
    mask_circle=True,
)
100%|██████████| 324/324 [00:25<00:00, 12.48/s]

We can visualize the total number of objects under each spot with squidpy.

[7]:
adata_st.obs["cell_count"] = adata_st.obsm["image_features"]["segmentation_label"]
sq.pl.spatial_scatter(adata_st, color=["cluster", "cell_count"], frameon=False)
../_images/external_tutorials_tutorial_tangram_13_0.png

Deconvolution and mapping

At this stage, we have all we need for the deconvolution task. First, we need to find a set of common genes the single cell and spatial datasets. We will use the intersection of the highly variable genes.

[8]:
sc.tl.rank_genes_groups(adata_sc, groupby="cell_subclass", use_raw=False)
WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'
[9]:
markers_df = pd.DataFrame(adata_sc.uns["rank_genes_groups"]["names"]).iloc[0:100, :]
genes_sc = np.unique(markers_df.melt().value.values)
genes_st = adata_st.var_names.values
genes = list(set(genes_sc).intersection(set(genes_st)))
len(genes)
[9]:
1281
[10]:
tg.pp_adatas(adata_sc, adata_st, genes=genes)
INFO:root:1280 training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.
INFO:root:14785 overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.
INFO:root:uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata.
INFO:root:rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata.

Now we are ready to instantiate the model object and its hyper parameters. Note that we are loading torch and training the model on the GPU. However, it’s also possible to train it on the CPU, it will just be slower.

[11]:
ad_map = tg.map_cells_to_space(
    adata_sc,
    adata_st,
    mode="constrained",
    target_count=adata_st.obs.cell_count.sum(),
    density_prior=np.array(adata_st.obs.cell_count) / adata_st.obs.cell_count.sum(),
    num_epochs=1000,
    device="cpu",
)
INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 1280 genes and customized density_prior in constrained mode...
Score: 0.613, KL reg: 0.125, Count reg: 5758.569, Lambda f reg: 4492.569
Score: 0.699, KL reg: 0.012, Count reg: 2.133, Lambda f reg: 734.911
Score: 0.702, KL reg: 0.012, Count reg: 0.638, Lambda f reg: 247.100
Score: 0.702, KL reg: 0.012, Count reg: 1.084, Lambda f reg: 171.527
Score: 0.702, KL reg: 0.012, Count reg: 0.805, Lambda f reg: 142.004
Score: 0.702, KL reg: 0.012, Count reg: 0.173, Lambda f reg: 126.206
Score: 0.702, KL reg: 0.012, Count reg: 1.130, Lambda f reg: 108.257
Score: 0.703, KL reg: 0.012, Count reg: 1.477, Lambda f reg: 99.011
Score: 0.703, KL reg: 0.012, Count reg: 0.960, Lambda f reg: 92.174
Score: 0.703, KL reg: 0.012, Count reg: 0.843, Lambda f reg: 83.730
INFO:root:Saving results..

As a first result, we can take the average of the mapped cells and computes proportions from it. The following functions computes the proportions from the Tangram object result, and store them in the spatial AnnData object.

We can appreciate how average results already give a sense of the success of the deconvolution step. Cortical layers are indeed at higher proportions in the correct regions in the tissue.

Of course some layers seems to be better resolved then others. A more refined gene selection step could be of help in this case.

[12]:
tg.project_cell_annotations(ad_map, adata_st, annotation="cell_subclass")
INFO:root:spatial prediction dataframe is saved in `obsm` `tangram_ct_pred` of the spatial AnnData.
[14]:
adata_st.obs = pd.concat([adata_st.obs, adata_st.obsm["tangram_ct_pred"]], axis=1)

sq.pl.spatial_scatter(
    adata_st,
    color=["L2/3 IT", "L4", "L5 IT", "L5 PT", "L6 CT", "L6 IT", "L6b"],
)
../_images/external_tutorials_tutorial_tangram_22_0.png

And here comes the key part, where we will use the results of the previous deconvolution steps. Previously, we computed the absolute numbers of unique segmentation objects under each spot, together with their centroids. Let’s extract them in the right format useful for Tangram.

In the resulting dataframe, each row represents a single segmentation object (therefore a single nuclei). We also have the image coordinates as well as the unique centroid ID, which is a string that contains both the spot ID and a numerical index.

Tangram provide a convenient function to export the mapping between spot ID and segmentation ID to adata.uns.

[16]:
tg.create_segment_cell_df(adata_st)
INFO:root:cell segmentation dataframe is saved in `uns` `tangram_cell_segmentation` of the spatial AnnData.
INFO:root:spot centroids is saved in `obsm` `tangram_spot_centroids` of the spatial AnnData.
[17]:
adata_st.uns["tangram_cell_segmentation"].head()
[17]:
spot_idx y x centroids
0 AAATGGCATGTCTTGT-1 5304.000000 731.000000 AAATGGCATGTCTTGT-1_0
1 AAATGGCATGTCTTGT-1 5320.947519 721.331554 AAATGGCATGTCTTGT-1_1
2 AAATGGCATGTCTTGT-1 5332.942342 717.447904 AAATGGCATGTCTTGT-1_2
3 AAATGGCATGTCTTGT-1 5348.865384 558.924248 AAATGGCATGTCTTGT-1_3
4 AAATGGCATGTCTTGT-1 5342.124989 567.208502 AAATGGCATGTCTTGT-1_4

We can then use tangram.count_cell_annotation() to map cell types as result of the deconvolution step to putative segmentation ID.

[18]:
tg.count_cell_annotations(
    ad_map,
    adata_sc,
    adata_st,
    annotation="cell_subclass",
)
INFO:root:spatial cell count dataframe is saved in `obsm` `tangram_ct_count` of the spatial AnnData.
[19]:
adata_st.obsm["tangram_ct_count"].head()
[19]:
x y cell_n centroids Pvalb L4 Vip L2/3 IT Lamp5 NP ... L5 PT Astro L6b Endo Peri Meis2 Macrophage CR VLMC SMC
AAATGGCATGTCTTGT-1 641 5393 13 [AAATGGCATGTCTTGT-1_0, AAATGGCATGTCTTGT-1_1, A... 2 0 0 1 0 1 ... 2 0 0 0 0 0 1 0 0 0
AACAACTGGTAGTTGC-1 4208 1672 16 [AACAACTGGTAGTTGC-1_0, AACAACTGGTAGTTGC-1_1, A... 2 0 3 0 1 1 ... 2 1 0 0 0 0 0 0 0 0
AACAGGAAATCGAATA-1 1117 5117 28 [AACAGGAAATCGAATA-1_0, AACAGGAAATCGAATA-1_1, A... 3 0 2 0 1 1 ... 0 0 0 0 0 0 0 0 0 0
AACCCAGAGACGGAGA-1 1101 1274 5 [AACCCAGAGACGGAGA-1_0, AACCCAGAGACGGAGA-1_1, A... 1 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
AACCGTTGTGTTTGCT-1 399 4708 7 [AACCGTTGTGTTTGCT-1_0, AACCGTTGTGTTTGCT-1_1, A... 1 0 0 1 0 0 ... 0 0 0 1 0 0 0 0 0 0

5 rows × 27 columns

And then finally export the results in a new AnnData object.

[20]:
adata_segment = tg.deconvolve_cell_annotations(adata_st)
[21]:
adata_segment.obs.head()
[21]:
y x centroids cluster
0 5304.000000 731.000000 AAATGGCATGTCTTGT-1_0 Pvalb
1 5320.947519 721.331554 AAATGGCATGTCTTGT-1_1 Pvalb
2 5332.942342 717.447904 AAATGGCATGTCTTGT-1_2 L2/3 IT
3 5348.865384 558.924248 AAATGGCATGTCTTGT-1_3 NP
4 5342.124989 567.208502 AAATGGCATGTCTTGT-1_4 L6 CT

Note that the AnnData object does not contain counts, but only cell type annotations, as results of the Tangram mapping. Nevertheless, it’s convenient to create such AnnData object for visualization purposes.

Below you can appreciate how each dot is now not a Visium spot anymore, but a single unique segmentation object, with the mapped cell type.

[27]:
fig, ax = plt.subplots(1, 1, figsize=(20, 20))
sq.pl.spatial_scatter(
    adata_segment,
    color="cluster",
    size=0.4,
    frameon=False,
    img_alpha=0.2,
    legend_fontsize=20,
    ax=ax,
)
../_images/external_tutorials_tutorial_tangram_33_0.png