squidpy.gr.ripley

squidpy.gr.ripley(adata, cluster_key, mode='F', spatial_key='spatial', metric='euclidean', n_neigh=2, n_simulations=100, n_observations=1000, max_dist=None, n_steps=50, seed=None, copy=False, *, table_key=None)

Calculate various Ripley’s statistics for point processes.

Changed in version 1.8.4: Every permutation now uses an independent numpy.random.Generator spawned from a numpy.random.SeedSequence. Consequently the permutation-based results no longer depend on n_jobs / backend, but results obtained with a given seed differ from those produced by squidpy < 1.8.4. See #1232 and #1233.
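The seeding scheme described in this note can be illustrated with plain NumPy. This is a minimal sketch and not squidpy's actual code: the helpers spawn_generators and simulate_one are hypothetical, and the uniform draw only stands in for a real permutation.

```python
import numpy as np

def spawn_generators(seed, n_simulations):
    """Return one independent Generator per simulation (hypothetical helper)."""
    root = np.random.SeedSequence(seed)
    # spawn() derives independent child sequences from the root seed
    return [np.random.default_rng(child) for child in root.spawn(n_simulations)]

def simulate_one(rng):
    # Hypothetical stand-in for a single permutation / simulation
    return rng.uniform(size=3)

# Each simulation owns its stream, so the execution order does not matter
forward = [simulate_one(g) for g in spawn_generators(0, 4)]
backward = [simulate_one(g) for g in reversed(spawn_generators(0, 4))][::-1]
```

Because every simulation draws from its own stream, running them in a different order (as a parallel backend might) yields the same per-simulation values.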

Depending on the ‘mode’ argument, it computes one of the following Ripley’s statistics: ‘F’, ‘G’ or ‘L’.

‘F’ and ‘G’ are defined as:

\[F(t),G(t)=P( d_{i,j} \le t )\]

where \(d_{i,j}\) represents:

distances to a random Spatial Poisson Point Process for ‘F’.

distances to any other point of the dataset for ‘G’.
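The two definitions can be sketched in pure Python using their textbook form. This is an illustration only, not squidpy's implementation: the helper names are hypothetical, nearest-neighbor distances are assumed for \(d_{i,j}\), and uniform points in the unit square stand in for the Spatial Poisson Point Process.

```python
import math
import random

def nearest_distances(sources, targets, skip_self=False):
    """For each source point, the distance to its nearest target point."""
    out = []
    for i, s in enumerate(sources):
        candidates = [
            math.dist(s, t)
            for j, t in enumerate(targets)
            if not (skip_self and i == j)
        ]
        out.append(min(candidates))
    return out

def ecdf(distances, support):
    """Empirical P(d <= t) for every t in the support."""
    n = len(distances)
    return [sum(d <= t for d in distances) / n for t in support]

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(50)]      # observed points
reference = [(rng.random(), rng.random()) for _ in range(200)]  # random reference pattern (assumption)

support = [k / 20 for k in range(21)]  # t = 0.0, 0.05, ..., 1.0

# G: each observed point to its nearest *other* observed point
g_stat = ecdf(nearest_distances(points, points, skip_self=True), support)
# F: each random reference location to its nearest observed point
f_stat = ecdf(nearest_distances(reference, points), support)
```

Both curves are cumulative distribution functions over the support, so they rise from 0 towards 1 as t grows.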

For ‘L’, we first need to compute \(K(t)\), which is defined as:

\[K(t) = \frac{1}{\lambda} \sum_{i \ne j} \frac{I(d_{i,j}<t)}{n}\]

and then we apply a variance-stabilizing transformation:

\[L(t) = \left(\frac{K(t)}{\pi}\right)^{1/2}\]
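The two formulas for ‘L’ can be evaluated directly on a small point set. The following is a sketch under stated assumptions, not squidpy's implementation: \(\lambda\) is taken to be the point density n / area, no edge correction is applied, and the function names are hypothetical.

```python
import math
import random

def ripley_k(points, t, area):
    """K(t) = (1 / lambda) * sum_{i != j} I(d_ij < t) / n."""
    n = len(points)
    intensity = n / area  # lambda, assumed here to be the density n / area
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) < t
    )
    return pairs / (n * intensity)

def ripley_l(points, t, area):
    """L(t) = sqrt(K(t) / pi), the variance-stabilized form of K."""
    return math.sqrt(ripley_k(points, t, area) / math.pi)

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]  # uniform points in the unit square

l_small = ripley_l(points, 0.05, area=1.0)
l_large = ripley_l(points, 0.10, area=1.0)
```

For a completely random pattern, \(K(t)\) is close to \(\pi t^2\), so \(L(t)\) is roughly linear in t, which is what makes the transformed statistic easier to read than \(K(t)\) itself.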

Parameters:

adata (AnnData | SpatialData) – Annotated data object.
table_key (str | None) – Key in spatialdata.SpatialData.tables where the table is stored. Required when adata is a spatialdata.SpatialData object and ignored otherwise.
cluster_key (str) – Key in anndata.AnnData.obs where clustering is stored.
mode (Literal['F', 'G', 'L']) – Which Ripley’s statistic to compute.
spatial_key (str) – Key in anndata.AnnData.obsm where spatial coordinates are stored.
metric (str) – Which metric to use for computing distances. For available metrics, check out sklearn.metrics.DistanceMetric. For Ripley’s L specifically, only metrics supported by sklearn.neighbors.KDTree are valid (see its valid_metrics attribute).
n_neigh (int) – Number of neighbors to consider for the KNN graph.
n_simulations (int) – How many simulations to run for computing p-values.
n_observations (int) – How many observations to generate for the Spatial Poisson Point Process.
max_dist (float | None) – Maximum distance for the support. If None, max_dist=\(\sqrt{\frac{area}{2}}\).
n_steps (int) – Number of steps for the support.
seed (int | None) – Random seed for reproducibility.
copy (bool) – If True, return the result, otherwise save it to the adata object.
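As an illustration of how max_dist and n_steps interact, the sketch below builds a support from a given area. The default for max_dist follows the formula in the parameter list; the evenly spaced layout starting at 0 and the helper name default_support are assumptions, not confirmed squidpy behavior.

```python
import math

def default_support(area, n_steps=50, max_dist=None):
    """Evenly spaced distances from 0 to max_dist (assumed layout of the support)."""
    if max_dist is None:
        max_dist = math.sqrt(area / 2)  # default stated in the parameter list
    step = max_dist / (n_steps - 1)
    return [k * step for k in range(n_steps)]

# An area of 200 gives max_dist = sqrt(100) = 10
support = default_support(area=200.0, n_steps=5)
```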

Return type:

dict[str, DataFrame | ndarray[tuple[Any, ...], dtype[Any]]]

Returns:

If copy = True, returns a dict with the following keys:

’{mode}_stat’ - pandas.DataFrame containing the statistics of choice for the real observations.

’sims_stat’ - pandas.DataFrame containing the statistics of choice for the simulations.

’bins’ - numpy.ndarray containing the support.

’pvalues’ - numpy.ndarray containing the p-values for the statistics of interest.

Otherwise, modifies the adata object with the following key:

anndata.AnnData.uns ['{key_added}'] - the above mentioned dict.

Statistics and p-values are computed separately for each cluster in anndata.AnnData.obs ['{cluster_key}'].

References

For reference, check out Wikipedia or [Baddeley et al., 2015].