%matplotlib inline

Calculate distances to a user-defined anchor point

This example shows how to use squidpy.tl.var_by_distance() to calculate the minimum distances of all observations to a user-defined anchor point, store the results in anndata.AnnData.obsm and plot the expression by distance. using squidpy.pl.var_by_distance().

import squidpy as sq

First, let’s download the MIBI-TOF dataset.

adata = sq.datasets.mibitof()

This data set contains a cell type annotation in anndata.AnnData.obs["Cluster"] and a slide annotation in anndata.AnnData.obs["library_id"]

adata.obs
row_num point cell_id X1 center_rowcoord center_colcoord cell_size category donor Cluster batch library_id
3034-0 3086 23 2 60316.0 269.0 7.0 408.0 carcinoma 21d7 Epithelial 0 point23
3035-0 3087 23 3 60317.0 294.0 6.0 408.0 carcinoma 21d7 Epithelial 0 point23
3036-0 3088 23 4 60318.0 338.0 4.0 304.0 carcinoma 21d7 Imm_other 0 point23
3037-0 3089 23 6 60320.0 372.0 6.0 219.0 carcinoma 21d7 Myeloid_CD11c 0 point23
3038-0 3090 23 8 60322.0 417.0 5.0 303.0 carcinoma 21d7 Myeloid_CD11c 0 point23
... ... ... ... ... ... ... ... ... ... ... ... ...
47342-2 48953 16 1103 2779.0 143.0 1016.0 283.0 carcinoma 90de Fibroblast 2 point16
47343-2 48954 16 1104 2780.0 814.0 1017.0 147.0 carcinoma 90de Fibroblast 2 point16
47344-2 48955 16 1105 2781.0 874.0 1018.0 142.0 carcinoma 90de Imm_other 2 point16
47345-2 48956 16 1106 2782.0 257.0 1019.0 108.0 carcinoma 90de Fibroblast 2 point16
47346-2 48957 16 1107 2783.0 533.0 1019.0 111.0 carcinoma 90de Fibroblast 2 point16

3309 rows × 12 columns

For each slide we now want to calculate the distance of all observations to the closest Epithelial cell. In addition we want to include the condition of the donors and the donor id in the resulting design matrix As we don’t create a copy, the result will be stored in anndata.AnnData.obsm.

sq.tl.var_by_distance(
    adata=adata,
    groups="Epithelial",
    cluster_key="Cluster",
    library_key="library_id",
    covariates=["category", "donor"],
)

Since we didn’t specify a name, the resulting data frame is called “design_matrix”. NaN values indicate, that the observation belongs to an anchor point or that the coordinates for this observation weren’t available from the beginning on.

adata.obsm["design_matrix"]
Cluster library_id Epithelial Epithelial_raw category donor
3034-0 Epithelial point23 NaN 0.000000 carcinoma 21d7
3035-0 Epithelial point23 NaN 0.000000 carcinoma 21d7
3036-0 Imm_other point23 0.043157 33.105891 carcinoma 21d7
3037-0 Myeloid_CD11c point23 0.066190 50.774009 carcinoma 21d7
3038-0 Myeloid_CD11c point23 0.109999 84.380092 carcinoma 21d7
... ... ... ... ... ... ...
47342-2 Fibroblast point16 0.849905 651.958588 carcinoma 90de
47343-2 Fibroblast point16 0.418362 320.923667 carcinoma 90de
47344-2 Imm_other point16 0.439758 337.336627 carcinoma 90de
47345-2 Fibroblast point16 0.724977 556.126784 carcinoma 90de
47346-2 Fibroblast point16 0.473780 363.435001 carcinoma 90de

3309 rows × 6 columns

We now want to visualize the results and plot the expression of CD98 by distance to the closest Epithelial cell. In addition we want to differentiate between two expression trends by specifying a covariate.

sq.pl.var_by_distance(
    adata=adata,
    design_matrix_key="design_matrix",
    var="CD98",
    anchor_key="Epithelial",
    covariate="donor",
    line_palette=["blue", "orange"],
    show_scatter=False,
    figsize=(5, 4),
)
../../../_images/777af5c512223df31a3413793c66bc7a2cd45094ddfb6b3da15031e5c0832100.png