%matplotlib inline

Analyse your spatial data using sliding windows

This example shows how to use squidpy.tl.sliding_window() to divide the observations (anndata.AnnData.obs) of an anndata.AnnData object into adjacent, potentially overlapping, windows.

import matplotlib.pyplot as plt

import squidpy as sq

First, let’s download the MIBI-TOF dataset.

adata = sq.datasets.mibitof()

This dataset contains a cell type annotation in anndata.AnnData.obs["Cluster"] and a slide annotation in anndata.AnnData.obs["library_id"].

adata.obs
row_num point cell_id X1 center_rowcoord center_colcoord cell_size category donor Cluster batch library_id
3034-0 3086 23 2 60316.0 269.0 7.0 408.0 carcinoma 21d7 Epithelial 0 point23
3035-0 3087 23 3 60317.0 294.0 6.0 408.0 carcinoma 21d7 Epithelial 0 point23
3036-0 3088 23 4 60318.0 338.0 4.0 304.0 carcinoma 21d7 Imm_other 0 point23
3037-0 3089 23 6 60320.0 372.0 6.0 219.0 carcinoma 21d7 Myeloid_CD11c 0 point23
3038-0 3090 23 8 60322.0 417.0 5.0 303.0 carcinoma 21d7 Myeloid_CD11c 0 point23
... ... ... ... ... ... ... ... ... ... ... ... ...
47342-2 48953 16 1103 2779.0 143.0 1016.0 283.0 carcinoma 90de Fibroblast 2 point16
47343-2 48954 16 1104 2780.0 814.0 1017.0 147.0 carcinoma 90de Fibroblast 2 point16
47344-2 48955 16 1105 2781.0 874.0 1018.0 142.0 carcinoma 90de Imm_other 2 point16
47345-2 48956 16 1106 2782.0 257.0 1019.0 108.0 carcinoma 90de Fibroblast 2 point16
47346-2 48957 16 1107 2783.0 533.0 1019.0 111.0 carcinoma 90de Fibroblast 2 point16

3309 rows × 12 columns

Stratified by library, we now assign each cell to a sliding window of a given size, so that windows are computed separately for each slide listed in library_id.
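Stratifying by library means that each slide is windowed on its own coordinate range, so no window mixes cells from different samples. Presumably this starts by grouping cells by library_id and computing per-library bounds. The sketch below shows only that grouping step. library_bounds is a hypothetical helper, and the input is a few (x, y) coordinate pairs taken from the obs preview above.

```python
from collections import defaultdict


def library_bounds(cells):
    """Group cells by library and return each library's bounding box.

    ``cells`` is an iterable of (library_id, x, y) tuples (hypothetical input
    format). Returns {library_id: (x_min, x_max, y_min, y_max)}.
    """
    xs = defaultdict(list)
    ys = defaultdict(list)
    for lib, x, y in cells:
        xs[lib].append(x)
        ys[lib].append(y)
    return {lib: (min(xs[lib]), max(xs[lib]), min(ys[lib]), max(ys[lib])) for lib in xs}


# Two cells from each of two slides; every slide keeps its own coordinate range.
cells = [
    ("point23", 7.0, 269.0),
    ("point23", 5.0, 417.0),
    ("point16", 1016.0, 143.0),
    ("point16", 1019.0, 533.0),
]
print(library_bounds(cells))
# {'point23': (5.0, 7.0, 269.0, 417.0), 'point16': (1016.0, 1019.0, 143.0, 533.0)}
```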

sq.tl.sliding_window(
    adata=adata,
    library_key="library_id",  # to stratify by sample
    window_size=300,
    overlap=0,
    copy=False,  # we modify in place
)

Let’s inspect the column that the function has added to our data.

adata.obs["sliding_window_assignment"]
3034-0      point23_window_0
3035-0      point23_window_0
3036-0      point23_window_1
3037-0      point23_window_1
3038-0      point23_window_1
                 ...        
47342-2    point16_window_12
47343-2    point16_window_14
47344-2    point16_window_14
47345-2    point16_window_12
47346-2    point16_window_13
Name: sliding_window_assignment, Length: 3309, dtype: category
Categories (48, object): ['point23_window_0' < 'point8_window_0' < 'point16_window_0' < 'point23_window_1' ... 'point16_window_14' < 'point23_window_15' < 'point8_window_15' < 'point16_window_15']

We see that each observation has been assigned to exactly one window, stored in the sliding_window_assignment column. We can visualise this using squidpy.pl.spatial_scatter().
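With overlap=0 the windows tile each library without overlap, so every cell lands in exactly one window and a single categorical column is enough. One way to compute such a label is integer division of the coordinates by window_size. In the sketch below, window_label and the assumed 1024 × 1024 bounds are hypothetical. The numbering scheme counts down each column of the grid before moving right (k = col * n_rows + row). This happens to reproduce the labels in the preview above, but that is an observation, not a statement about squidpy's implementation.

```python
import math


def window_label(library_id, x, y, window_size, bounds):
    """Assign one cell to a single non-overlapping window (overlap=0).

    ``bounds`` is (x_min, x_max, y_min, y_max) of the cell's library.
    Hypothetical numbering: k = col * n_rows + row.
    """
    x_min, x_max, y_min, y_max = bounds
    n_cols = max(1, math.ceil((x_max - x_min) / window_size))
    n_rows = max(1, math.ceil((y_max - y_min) / window_size))
    # Integer division picks the tile; clamp so the maximum coordinate stays in the last tile.
    col = min(int((x - x_min) // window_size), n_cols - 1)
    row = min(int((y - y_min) // window_size), n_rows - 1)
    return f"{library_id}_window_{col * n_rows + row}"


bounds = (0.0, 1024.0, 0.0, 1024.0)  # assumed field of view, not read from the data
print(window_label("point23", 7.0, 269.0, 300, bounds))     # point23_window_0
print(window_label("point23", 4.0, 338.0, 300, bounds))     # point23_window_1
print(window_label("point16", 1016.0, 143.0, 300, bounds))  # point16_window_12
```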

sq.pl.spatial_scatter(
    adata, color="sliding_window_assignment", library_key="library_id", figsize=(10, 10)
)
[Figure: spatial scatter plot of all libraries, coloured by sliding_window_assignment]

Optionally, we can also look at a specific sample:

sq.pl.spatial_scatter(
    adata,
    color="sliding_window_assignment",
    library_key="library_id",
    library_id=["point8"],
    figsize=(10, 10),
)
[Figure: spatial scatter plot of point8, coloured by sliding_window_assignment]

We see that the function has created 16 windows per library (48 in total), based on a window_size of 300 and an overlap of 0. The behaviour of the function changes when we use an overlap: observations can then be assigned to multiple windows, so this information can no longer be stored in a single column. Let’s try this out.
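The count of 16 follows from simple arithmetic. Assume each library spans roughly 0 to 1019 units along both axes, which is the largest coordinate visible in the obs preview. Then 1019 / 300 rounded up gives 4 windows per axis, 4 × 4 = 16 windows per library, and 3 × 16 = 48 across the three libraries, matching the 48 categories listed above. Here it is as a small sketch with a hypothetical n_windows helper:

```python
import math


def n_windows(extent_x, extent_y, window_size, overlap=0):
    """Number of windows needed to cover an extent_x by extent_y region.

    Hypothetical helper: one window starts every ``window_size - overlap``
    units along each axis, and the per-axis counts are multiplied.
    """
    stride = window_size - overlap
    return math.ceil(extent_x / stride) * math.ceil(extent_y / stride)


per_library = n_windows(1019, 1019, window_size=300)  # assumed extent per axis
print(per_library)      # 16
print(3 * per_library)  # 48 (three libraries)
```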

adata = sq.datasets.mibitof()  # fresh copy

sq.tl.sliding_window(
    adata=adata,
    library_key="library_id",  # to stratify by sample
    window_size=300,
    overlap=50,
    copy=False,  # we modify in place
)

Inspecting anndata.AnnData.obs now shows that several boolean columns have been added, one per window and library, each indicating whether an observation is a member of that window.

Due to the overlapping assignments, we now have more “true” assignments than observations. This is because each observation can be a member of multiple windows.
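A cell lying in the strip shared by two neighbouring windows is counted once for each of them. The toy sketch below makes this concrete. memberships is a hypothetical helper, and treating window edges as half-open intervals [x0, x1) is an assumption of the sketch.

```python
def memberships(x, y, windows):
    """Indices of all windows (x0, y0, x1, y1) that contain the point (x, y).

    Window edges are treated as half-open intervals [x0, x1) (assumption).
    """
    return [
        k
        for k, (x0, y0, x1, y1) in enumerate(windows)
        if x0 <= x < x1 and y0 <= y < y1
    ]


# Two windows of size 300 overlapping by 50 along x: [0, 300) and [250, 550).
windows = [(0, 0, 300, 300), (250, 0, 550, 300)]
cells = [(100, 10), (270, 10), (400, 10)]  # the middle cell sits in the shared strip

assignments = [memberships(x, y, windows) for x, y in cells]
print(assignments)  # [[0], [0, 1], [1]]

# One boolean column per window, analogous to the columns added to adata.obs.
columns = {k: [k in a for a in assignments] for k in range(len(windows))}
print(columns)  # {0: [True, True, False], 1: [False, True, True]}

total = sum(sum(col) for col in columns.values())
print(total, "assignments for", len(cells), "cells")  # 4 assignments for 3 cells
```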

total_cells = 0
for lib_key in ["point8", "point16", "point23"]:
    cols_in_lib = adata.obs.columns[adata.obs.columns.str.contains(lib_key)]
    total_cells_in_lib = sum(adata.obs[col].sum() for col in cols_in_lib)
    total_cells += total_cells_in_lib
    print(f"Total cells in {lib_key}: {total_cells_in_lib}")

print(f"Total cells: {total_cells}")
Total cells in point8: 1421
Total cells in point16: 1421
Total cells in point23: 1727
Total cells: 4569

We can also illustrate these overlapping windows.

# .copy() gives a standalone object, so modifying obs below does not act on a view of adata
adata_point8 = adata[adata.obs["library_id"] == "point8"].copy()
cols = adata.obs.columns[adata.obs.columns.str.contains("point8")]

# convert True/False to category so we can visualise it
adata_point8.obs[cols] = adata_point8.obs[cols].astype("category")

sq.pl.spatial_scatter(
    adata_point8,
    color=cols,
    library_key="library_id",
    library_id="point8",
    legend_loc=None,
)
plt.tight_layout()
[Figure: one panel per point8 window column, showing which cells belong to each overlapping window]

Finally, we see that these specific parameters result in tiny windows with very few cells along the bottom and right edges. We can drop these partial windows by setting drop_partial_windows=True.
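Which windows count as partial can be sketched along a single axis. The sketch assumes a window is partial when it would extend past the largest coordinate of its library, and it again assumes an extent of about 0 to 1019. With window_size=300 and overlap=50 the windows start every 250 units. Only the first three fit entirely, giving 3 × 3 = 9 full windows per library. That is consistent with the nine point23 columns (window_0 to window_8) in the copy=True output further below. However, split_windows is a hypothetical helper, not squidpy's code.

```python
def split_windows(lo, hi, window_size, overlap):
    """Split window start positions along one axis into full and partial ones.

    Assumption of this sketch: a window is partial if it would extend past
    ``hi``, the largest coordinate of the library along that axis.
    """
    stride = window_size - overlap
    starts = []
    s = lo
    while s < hi:
        starts.append(s)
        s += stride
    full = [start for start in starts if start + window_size <= hi]
    partial = [start for start in starts if start + window_size > hi]
    return full, partial


full, partial = split_windows(0, 1019, window_size=300, overlap=50)
print(full)            # [0, 250, 500]
print(partial)         # [750, 1000]
print(len(full) ** 2)  # 9 full windows per library (same split on both axes)
```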

adata = sq.datasets.mibitof()  # fresh copy

sq.tl.sliding_window(
    adata=adata,
    library_key="library_id",  # to stratify by sample
    window_size=300,
    overlap=50,
    copy=False,  # we modify in place
    drop_partial_windows=True,
)

# .copy() gives a standalone object, so modifying obs below does not act on a view of adata
adata_point8 = adata[adata.obs["library_id"] == "point8"].copy()
cols = adata.obs.columns[adata.obs.columns.str.contains("point8")]

# convert True/False to category so we can visualise it
adata_point8.obs[cols] = adata_point8.obs[cols].astype("category")

sq.pl.spatial_scatter(
    adata_point8,
    color=cols,
    library_key="library_id",
    library_id="point8",
    legend_loc=None,
)
plt.tight_layout()
[Figure: point8 window memberships after dropping partial windows]

If desired, in-place modification can be avoided by passing copy=True. The function then returns a pandas.DataFrame with the assignments.
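The returned DataFrame holds one boolean column per window, plus the globalX and globalY coordinate columns. This means the number of cells per window is a column sum over the window columns. The stdlib sketch below mimics the DataFrame with a plain dict. window_counts and the toy values are illustrative only.

```python
PREFIX = "sliding_window_assignment_"


def window_counts(table):
    """Count member cells per window in a {column_name: values} mapping.

    Only columns starting with PREFIX are treated as window memberships,
    so coordinate columns such as globalX/globalY are skipped.
    """
    return {
        col[len(PREFIX):]: sum(values)  # True counts as 1, False as 0
        for col, values in table.items()
        if col.startswith(PREFIX)
    }


toy = {
    "sliding_window_assignment_point23_window_0": [True, True, False],
    "sliding_window_assignment_point23_window_1": [False, True, True],
    "globalX": [7.0, 6.0, 4.0],
    "globalY": [269.0, 294.0, 338.0],
}
print(window_counts(toy))  # {'point23_window_0': 2, 'point23_window_1': 2}
```

On the actual pandas.DataFrame, assignment.filter(like="sliding_window_assignment_").sum() should give the same per-window counts.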

adata = sq.datasets.mibitof()  # fresh copy

assignment = sq.tl.sliding_window(
    adata=adata,
    library_key="library_id",  # to stratify by sample
    window_size=300,
    overlap=50,
    copy=True,  # return the assignments instead of modifying adata in place
    drop_partial_windows=True,
)

assignment
sliding_window_assignment_point23_window_0 sliding_window_assignment_point23_window_1 sliding_window_assignment_point23_window_2 sliding_window_assignment_point23_window_3 sliding_window_assignment_point23_window_4 sliding_window_assignment_point23_window_5 sliding_window_assignment_point23_window_6 sliding_window_assignment_point23_window_7 sliding_window_assignment_point23_window_8 sliding_window_assignment_point8_window_0 ... sliding_window_assignment_point16_window_1 sliding_window_assignment_point16_window_2 sliding_window_assignment_point16_window_3 sliding_window_assignment_point16_window_4 sliding_window_assignment_point16_window_5 sliding_window_assignment_point16_window_6 sliding_window_assignment_point16_window_7 sliding_window_assignment_point16_window_8 globalX globalY
3034-0 True False False False False False False False False False ... False False False False False False False False 7.0 269.0
3035-0 True True False False False False False False False False ... False False False False False False False False 6.0 294.0
3036-0 False True False False False False False False False False ... False False False False False False False False 4.0 338.0
3037-0 False True False False False False False False False False ... False False False False False False False False 6.0 372.0
3038-0 False True False False False False False False False False ... False False False False False False False False 5.0 417.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
47342-2 False False False False False False False False False False ... False False False False False False False False 1016.0 143.0
47343-2 False False False False False False False False False False ... False False False False False False False False 1017.0 814.0
47344-2 False False False False False False False False False False ... False False False False False False False False 1018.0 874.0
47345-2 False False False False False False False False False False ... False False False False False False False False 1019.0 257.0
47346-2 False False False False False False False False False False ... False False False False False False False False 1019.0 533.0

3309 rows × 29 columns

For reproducibility

import spatialdata

import numpy
import pandas

import matplotlib

import scanpy
import squidpy

%load_ext watermark
%watermark -v -m -p numpy,pandas,matplotlib,scanpy,squidpy,spatialdata
Python implementation: CPython
Python version       : 3.10.12
IPython version      : 8.3.0

numpy      : 1.23.4
pandas     : 2.2.2
matplotlib : 3.9.2
scanpy     : 1.10.2
squidpy    : 1.6.2.dev34+gb4a49c9.d20241030
spatialdata: 0.2.2

Compiler    : Clang 15.0.7 
OS          : Darwin
Release     : 22.2.0
Machine     : arm64
Processor   : arm
CPU cores   : 8
Architecture: 64bit