```
%matplotlib inline
```

# Analyze Merfish data

This tutorial shows how to apply Squidpy for the analysis of Merfish data.

The data used here was obtained from [Moffitt *et al.*, 2018].
We provide a pre-processed subset of the data, in `anndata.AnnData`

format.
For details on how it was pre-processed, please refer to the original paper.

## Import packages & data

To run the notebook locally, create a conda environment as *conda env create -f environment.yml* using this
`environment.yml <https://github.com/scverse/squidpy_notebooks/blob/main/environment.yml>`

_.

```
import scanpy as sc
import squidpy as sq
sc.logging.print_header()
print(f"squidpy=={sq.__version__}")
# load the pre-processed dataset
adata = sq.datasets.merfish()
adata
```

```
scanpy==1.9.2 anndata==0.8.0 umap==0.5.3 numpy==1.21.0 scipy==1.9.3 pandas==1.5.1 scikit-learn==1.1.3 statsmodels==0.13.2 python-igraph==0.10.2 pynndescent==0.5.7
squidpy==1.2.2
```

```
100%|██████████| 49.2M/49.2M [00:07<00:00, 7.05MB/s]
```

```
AnnData object with n_obs × n_vars = 73655 × 161
obs: 'Cell_ID', 'Animal_ID', 'Animal_sex', 'Behavior', 'Bregma', 'Centroid_X', 'Centroid_Y', 'Cell_class', 'Neuron_cluster_ID', 'batch'
uns: 'Cell_class_colors'
obsm: 'spatial', 'spatial3d'
```

This datasets consists of consecutive slices from the mouse hypothalamic preoptic region.
It represents an interesting example of how to work with 3D spatial data in Squidpy.
Let’s start with visualization: we can either visualize the 3D stack of slides
using `scanpy.pl.embedding()`

:

```
sc.pl.embedding(adata, basis="spatial3d", projection="3d", color="Cell_class")
```

Or visualize a single slide with `squidpy.pl.spatial_scatter()`

. Here the slide identifier
is stored in `adata.obs["Bregma"]`

, see original paper for definition.

```
sq.pl.spatial_scatter(
adata[adata.obs.Bregma == -9], shape=None, color="Cell_class", size=1
)
```

```
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
```

## Neighborhood enrichment analysis in 3D

It is important to consider whether the analysis should be performed on the 3D
spatial coordinates or the 2D coordinates for a single slice. Functions that
make use of the spatial graph can already support 3D coordinates, but it is important
to consider that the z-stack coordinate is in the same unit metrics as the x, y coordinates.
Let’s start with the neighborhood enrichment score. You can read more on the function
in the docs at Building spatial neighbors graph.
First, we need to compute a neighbor graph with `squidpy.gr.spatial_neighbors()`

.
If we want to compute the neighbor graph on the 3D coordinate space,
we need to specify `spatial_key = "spatial3d"`

.
Then we can use `squidpy.gr.nhood_enrichment()`

to compute the score, and visualize
it with `squidpy.pl.nhood_enrichment()`

.

```
sq.gr.spatial_neighbors(adata, coord_type="generic", spatial_key="spatial3d")
sq.gr.nhood_enrichment(adata, cluster_key="Cell_class")
sq.pl.nhood_enrichment(
adata, cluster_key="Cell_class", method="single", cmap="inferno", vmin=-50, vmax=100
)
```

```
100%|██████████| 1000/1000 [00:11<00:00, 88.86/s]
```

We can visualize some of the co-enriched clusters with `scanpy.pl.embedding()`

.
We will set `na_colors=(1,1,1,0)`

to make transparent the other observations,
in order to better visualize the clusters of interests across z-stacks.

```
sc.pl.embedding(
adata,
basis="spatial3d",
groups=["OD Mature 1", "OD Mature 2", "OD Mature 4"],
na_color=(1, 1, 1, 0),
projection="3d",
color="Cell_class",
)
```

We can also visualize gene expression in 3D coordinates. Let’s perform differential
expression testing with `scanpy.tl.rank_genes_groups()`

and visualize the results

```
sc.tl.rank_genes_groups(adata, groupby="Cell_class")
sc.pl.rank_genes_groups(adata, groupby="Cell_class")
```

```
WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'
```

and the expression in 3D.

```
sc.pl.embedding(adata, basis="spatial3d", projection="3d", color=["Gad1", "Mlc1"])
```

If the same analysis should be performed on a single slice, then it is advisable to
copy the sample of interest in a new `anndata.AnnData`

and use it as
a standard 2D spatial data object.

```
adata_slice = adata[adata.obs.Bregma == -9].copy()
sq.gr.spatial_neighbors(adata_slice, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="Cell_class")
sq.pl.spatial_scatter(
adata_slice,
color="Cell_class",
shape=None,
groups=[
"Ependymal",
"Pericytes",
"Endothelial 2",
],
size=10,
)
```

```
100%|██████████| 1000/1000 [00:11<00:00, 87.51/s]
```

```
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
```

## Spatially variable genes with spatial autocorrelation statistics

With Squidpy we can investigate spatial variability of gene expression.
This is an example of a function that only supports 2D data.
`squidpy.gr.spatial_autocorr()`

conveniently wraps two
spatial autocorrelation statistics: *Moran’s I* and *Geary’s C*.
They provide a score on the degree of spatial variability of gene expression.
The statistic as well as the p-value are computed for each gene, and FDR correction
is performed. For the purpose of this tutorial, let’s compute the *Moran’s I* score.
The results are stored in `adata.uns['moranI']`

and we can visualize selected genes
with `squidpy.pl.spatial_scatter()`

.

```
sq.gr.spatial_autocorr(adata_slice, mode="moran")
adata_slice.uns["moranI"].head()
sq.pl.spatial_scatter(
adata_slice, shape=None, color=["Cd24a", "Necab1", "Mlc1"], size=3
)
```

```
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
```