Feature extraction using CellProfiler

In this tutorial, we show how to use Squidpy with functions from CellProfiler pipelines for image processing and feature extraction.

We’ll go through the following steps:

Load Visium fluorescence data.
Segment cells in Squidpy.
Calculate CellProfiler’s granularity features for image crops of Visium spots.
Compute clustering on the image features in Squidpy.

CellProfiler is typically used via its GUI interface to build image processing pipelines. First, download and install CellProfiler from the download page. Check the issues on CellProfiler Github in case of installation problems (can be tricky).

Note: In the future, CellProfiler functions will also be accessible via Python directly (according to the announcements of the CellProfiler team). Since this is not yet publicly well documented, we’ll restrict this tutorial to the laborious way of saving intermediate files to bridge Squidpy and CellProfiler. For information on how to use the cellprofiler-core package for Python integration, the following link might be helpful.

Import packages & data

import itertools
from pathlib import Path

from PIL import Image

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import anndata as ad
import scanpy as sc
import squidpy as sq

sc.logging.print_header()
print(f"squidpy=={sq.__version__}")
sc.set_figure_params(facecolor="white", figsize=(8, 8))

# load the pre-processed dataset
img = sq.datasets.visium_fluo_image_crop()
adata = sq.datasets.visium_fluo_adata_crop()

scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.21.6 scipy==1.8.0 pandas==1.4.2 scikit-learn==1.0.2 statsmodels==0.13.2 python-igraph==0.9.9 pynndescent==0.5.6
squidpy==1.2.2

sq.pl.spatial_scatter(adata, color="cluster")

../../_images/eb23e5b2a9e5079067b938cd9441c1c1b5f00cddab5e5c95708e9b78ca42fd8c.png

Setup files

Create some arbitrary empty base directory BASE_DIR and a folder imgs_dir = BASE_DIR + "tmp_imgs" in it.

BASE_DIR = "./squidpy_tutorial_cellprofiler/"
imgs_dir = BASE_DIR + "tmp_imgs/"
Path(imgs_dir).mkdir(parents=True, exist_ok=True)

Image segmentation

sq.im.process(
    img=img,
    layer="image",
    method="smooth",
    sigma=[2, 2, 0, 0],
)

sq.im.segment(
    img=img, layer="image_smooth", method="watershed", channel_ids=0, chunks=1000
)

# plot the resulting segmentation
fig, ax = plt.subplots(1, 2)
img_crop = img.crop_corner(2000, 2000, size=500)
img_crop.show(layer="image", channel=0, ax=ax[0])
img_crop.show(
    layer="segmented_watershed",
    channel=0,
    ax=ax[1],
)

../../_images/8eda3ab987a66a80ba07ac5f2e4041cc28274f771366c6df7f146821a8845272.png

Save image crops and segmentations

For each Visium spot we generate crops:

for crop, obs in itertools.islice(
    img.generate_spot_crops(
        adata, obs_names=adata.obs_names, return_obs=True, as_array=False
    ),
    2,
):
    fig, axes = plt.subplots(1, 2)
    crop.show("image_smooth", ax=axes[0])
    crop.show("segmented_watershed", ax=axes[1], cmap="Greys")
    plt.show()

../../_images/93aeecc507581ef4f85f2f5ebc9154c8541fb0a34ea557c23f3807f299a6cf36.png

../../_images/7dca466af3ae3169387899ecb9cc225d706213808ca00ea070d739b536f2b9a8.png

Save the fluorescence image and the segmentations of each crop.

for crop, obs in img.generate_spot_crops(
    adata, obs_names=adata.obs_names, return_obs=True, as_array=False
):
    Image.fromarray(
        (np.array(crop["image_smooth"]) * 255).squeeze().astype(np.uint8)
    ).save(imgs_dir + f"image_{obs}.png")
    Image.fromarray((np.array(crop["segmented_watershed"][:, :, 0])).squeeze()).save(
        imgs_dir + f"segments_{obs}.tif"
    )

CellProfiler Pipeline: Calculate Image Features

1. Open the CellProfiler GUI App. A new project should open automatically.

2. Save the project in `BASE_DIR`.

3. CP-Pipeline `Images`: Drag and Drop the folder `imgs_dir` into CellProfiler.

4. CP-Pipeline `NamesAndTypes`: Declare to load crops as color images and segmentations as objects. Crops and segmentations files are aligned automatically.

5. Convert image crops to gray images: Add `ColorToGray` module and define parameters.

6. Measure CellProfiler’s Granularity features within segments for each crop: Add `MeasureGranularity` module and define parameters.

7. Export results to csv.

8. Run CellProfiler Pipeline (takes several minutes).

You can now delete the images in imgs_dir by uncommenting the cell below:

# !rm -rf $imgs_dir

Cluster features

Load CellProfiler output of Granularity features of each Visium spot and rearrange the output for adata.obsm.

# load CellProfiler output
df = pd.read_csv(BASE_DIR + "/squidpy_Image.csv")
# set obs names as index
df.index = df["FileName_Spot_Crop"].apply(lambda s: s.split("_")[1].split(".")[0])
df.index.name = "obs"
# Get the measured Granularity features and rename them
features = [
    f"{stat}_Segments_Granularity_{i}_SpotGray"
    for stat in ["Mean", "StDev"]
    for i in range(1, 17)
]
df = df[features]
df.columns = [s.split("_")[0] + s.split("_")[2] + s.split("_")[3] for s in df.columns]

df.head()

	MeanGranularity1	MeanGranularity2	MeanGranularity3	MeanGranularity4	MeanGranularity5	MeanGranularity6	MeanGranularity7	MeanGranularity8	MeanGranularity9	MeanGranularity10	...	StDevGranularity7	StDevGranularity8	StDevGranularity9	StDevGranularity10	StDevGranularity11	StDevGranularity12	StDevGranularity13	StDevGranularity14	StDevGranularity15	StDevGranularity16
obs
AAACGAGACGGTTGAT-1	35.513010	8.562676	26.546550	14.187602	7.629247	3.334912	2.057867	0.623025	0.258596	0.336359	...	1.615079	0.490024	0.284772	0.251887	0.000000	0.000000	0.000000	0.196002	0.000000	0.0
AAAGGGATGTAGCAAG-1	34.115594	24.218906	20.290728	4.457829	0.832083	4.670038	3.027418	1.872175	0.694657	0.642282	...	4.840615	2.934531	1.024722	1.045698	0.174283	0.783332	0.156181	0.156181	0.156181	0.0
AAATGGCATGTCTTGT-1	35.680015	15.708321	15.551889	8.760072	5.392920	7.575340	5.474114	0.876559	0.121634	0.000000	...	2.082841	1.640197	0.227598	0.000000	0.392862	0.785725	1.172769	0.234967	0.000000	0.0
AAATGGTCAATGTGCC-1	27.243626	6.490712	9.602902	3.852946	3.829664	0.000000	0.000000	7.289044	8.814914	12.036803	...	0.000000	9.020724	6.604276	3.711367	1.988622	1.063700	0.303914	0.075979	0.151957	0.0
AAATTAACGGGTAGCT-1	33.575662	12.159623	21.226989	19.807115	8.313644	2.021116	0.463706	0.716149	0.283679	0.283679	...	0.328582	0.264752	0.219188	0.219188	0.000000	0.000000	0.219188	0.000000	0.000000	0.0

5 rows × 32 columns

Load image features into adata.obsm["features"] and compute Leiden clustering.

def cluster_features(features: pd.DataFrame):
    """Calculate leiden clustering of features."""
    # create temporary adata to calculate the clustering
    adata = ad.AnnData(features)
    # important - feature values are not scaled, so need to scale them before PCA
    sc.pp.scale(adata)
    # calculate leiden clustering
    sc.pp.pca(adata, n_comps=min(10, features.shape[1] - 1))
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)

    return adata.obs["leiden"]


# Save feature in adata.obsm
adata.obsm["features"] = df.loc[adata.obs_names].copy()

# Calculate leiden clustering
adata.obs["features_granularity_cluster"] = cluster_features(adata.obsm["features"])

# Plot clusterings
sq.pl.spatial_scatter(
    adata,
    color=[
        "features_granularity_cluster",
        "cluster",
    ],
    ncols=2,
)

/var/folders/pz/fw_kwtn10t52fps52c0jxm940000gn/T/ipykernel_8107/2650860513.py:5: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  adata = ad.AnnData(features)

../../_images/d1715e2d91a46cf458094b19adc9503ab171fec862679743dd32310f4f04b9d2.png

We see that e.g. the blue cluster is prominent in the Hippocampus and the Thalamus, and the brown cluster is prominent in dense regions like Dentate gyrus.

This tutorial is an example on how to integrate CellProfiler within Squidpy pipelines. You can adapt this scheme to other task distributions: e.g. segmentation with CellProfiler and image feature calculation in Squidpy, or process images with CellProfiler, followed by a Squidpy pipeline etc.