squidpy.gr.ripley

squidpy.gr.ripley(adata, cluster_key, mode='F', spatial_key='spatial', metric='euclidean', n_neigh=2, n_simulations=100, n_observations=1000, max_dist=None, n_steps=50, seed=None, copy=False)

Calculate various Ripley’s statistics for point processes.

Depending on the ‘mode’ argument, this function computes one of the following Ripley’s statistics: ‘F’, ‘G’, or ‘L’.

‘F’ and ‘G’ are defined as:

\[F(t),G(t)=P( d_{i,j} \le t )\]

where \(d_{i,j}\) represents:

  • distances to a random Spatial Poisson Point Process for ‘F’.

  • distances to any other point of the dataset for ‘G’.
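As an illustration of the two definitions above, here is a minimal NumPy sketch. It assumes that \(d_{i,j}\) is a nearest-neighbour distance, that uniform random points in the unit square stand in for the Spatial Poisson Point Process, and that distances are Euclidean. The helper names `nearest_distances` and `empirical_cdf` are hypothetical and are not part of squidpy; the library's own implementation may differ.

```python
import numpy as np


def nearest_distances(sources, targets, exclude_self=False):
    """Distance from each source point to its nearest target point."""
    diff = sources[:, None, :] - targets[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if exclude_self:
        # sources and targets are the same array: ignore zero self-distances
        np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1)


def empirical_cdf(distances, support):
    """Fraction of distances that are <= t, for every t in the support."""
    return (distances[:, None] <= support[None, :]).mean(axis=0)


rng = np.random.default_rng(0)
points = rng.uniform(0, 1, size=(200, 2))          # observed points (toy data)
random_points = rng.uniform(0, 1, size=(1000, 2))  # stand-in for the random process
support = np.linspace(0, 0.5, 50)

# 'G': distances from each observed point to its nearest other observed point
g_stat = empirical_cdf(nearest_distances(points, points, exclude_self=True), support)

# 'F': distances from each random location to its nearest observed point
f_stat = empirical_cdf(nearest_distances(random_points, points), support)
```

Both curves are empirical cumulative distribution functions evaluated on the same support, so they start at 0 and never decrease.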

To compute ‘L’, we first need to compute \(K(t)\), which is defined as:

\[K(t) = \frac{1}{\lambda} \sum_{i \ne j} \frac{I(d_{i,j}<t)}{n}\]

and then we apply a variance-stabilizing transformation:

\[L(t) = \left(\frac{K(t)}{\pi}\right)^{1/2}\]

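
The two formulas for \(K(t)\) and \(L(t)\) can be sketched directly in NumPy. This is a minimal sketch that assumes \(\lambda = n / area\) (the point intensity), Euclidean distances, and no edge correction. The function name `ripley_k_l` is hypothetical and not part of squidpy.

```python
import numpy as np


def ripley_k_l(points, support, area):
    """Evaluate K(t) and L(t) on the support, without edge correction."""
    n = len(points)
    intensity = n / area  # lambda, assumed to be points per unit area
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # keep only ordered pairs with i != j
    pair_dist = dist[~np.eye(n, dtype=bool)]
    # number of pairs with d_ij < t, for every t in the support
    counts = (pair_dist[:, None] < support[None, :]).sum(axis=0)
    k_stat = counts / (intensity * n)
    l_stat = np.sqrt(k_stat / np.pi)  # variance-stabilizing transformation
    return k_stat, l_stat


rng = np.random.default_rng(0)
points = rng.uniform(0, 1, size=(300, 2))  # toy data in the unit square
support = np.linspace(0, 0.25, 50)
k_stat, l_stat = ripley_k_l(points, support, area=1.0)
```

For a completely random pattern, \(K(t)\) is close to \(\pi t^2\), so \(L(t)\) is close to \(t\); this is why the transformed statistic is easier to read than \(K(t)\) itself.
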
Parameters:
  • adata (AnnData | SpatialData) – Annotated data object.

  • cluster_key (str) – Key in anndata.AnnData.obs where clustering is stored.

  • mode (Literal['F', 'G', 'L']) – Which Ripley’s statistic to compute.

  • spatial_key (str) – Key in anndata.AnnData.obsm where spatial coordinates are stored.

  • metric (str) – Which metric to use for computing distances. For available metrics, check out sklearn.neighbors.DistanceMetric.

  • n_neigh (int) – Number of neighbors to consider for the KNN graph.

  • n_simulations (int) – How many simulations to run for computing p-values.

  • n_observations (int) – How many observations to generate for the Spatial Poisson Point Process.

  • max_dist (Optional[float]) – Maximum distance for the support. If None, max_dist=\(\sqrt{area \over 2}\).

  • n_steps (int) – Number of steps for the support.

  • seed (Optional[int]) – Random seed for reproducibility.

  • copy (bool) – If True, return the result, otherwise save it to the adata object.
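
The interaction between max_dist and n_steps can be illustrated with a short sketch. It assumes the support is an evenly spaced grid starting at 0, and that `area` is the area of the region covered by the spatial coordinates. The helper `default_support` is hypothetical and not part of squidpy.

```python
import numpy as np


def default_support(area, n_steps=50, max_dist=None):
    """Evenly spaced distances from 0 to max_dist (assumed layout of the support)."""
    if max_dist is None:
        # default described above: sqrt(area / 2)
        max_dist = np.sqrt(area / 2)
    return np.linspace(0, max_dist, n_steps)


bins = default_support(area=2.0, n_steps=5)
```
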

Return type:

dict[str, DataFrame | ndarray[Any, dtype[Any]]]

Returns:

If copy = True, returns a dict with the following keys:

  • ’{mode}_stat’ - pandas.DataFrame containing the statistics of choice for the real observations.

  • ’sims_stat’ - pandas.DataFrame containing the statistics of choice for the simulations.

  • ’bins’ - numpy.ndarray containing the support.

  • ’pvalues’ - numpy.ndarray containing the p-values for the statistics of interest.
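
To show how p-values can be derived from an observed curve and a set of simulated curves, here is a generic two-sided Monte Carlo sketch. It is an assumption about one common approach and not necessarily the procedure squidpy uses. The function `monte_carlo_pvalues` is hypothetical.

```python
import numpy as np


def monte_carlo_pvalues(observed, simulated):
    """Two-sided Monte Carlo p-value at each point of the support.

    observed:  array of shape (n_steps,)
    simulated: array of shape (n_simulations, n_steps)
    """
    n_simulations = simulated.shape[0]
    # the +1 terms count the observation itself as one draw from the null
    upper = ((simulated >= observed[None, :]).sum(axis=0) + 1) / (n_simulations + 1)
    lower = ((simulated <= observed[None, :]).sum(axis=0) + 1) / (n_simulations + 1)
    return np.minimum(1.0, 2 * np.minimum(upper, lower))


rng = np.random.default_rng(0)
simulated = rng.normal(size=(100, 50))  # toy null curves
observed = rng.normal(size=50)          # toy observed curve
pvalues = monte_carlo_pvalues(observed, simulated)
```
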

Otherwise, modifies the adata object with the following key:

Statistics and p-values are computed separately for each cluster in anndata.AnnData.obs['{cluster_key}'].
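
A possible call on synthetic data is sketched below. It assumes squidpy, anndata, and pandas are installed; the wrapper `run_ripley_example`, the column name "cluster", and the random coordinates are illustrative choices and not part of the library. The cluster column is stored as a categorical.

```python
def run_ripley_example(n_cells=300, seed=0):
    """Run squidpy.gr.ripley on random coordinates and return the result dict."""
    import anndata as ad
    import numpy as np
    import pandas as pd
    import squidpy as sq

    rng = np.random.default_rng(seed)
    obs = pd.DataFrame(
        {"cluster": pd.Categorical(rng.choice(["a", "b"], size=n_cells))},
        index=[f"cell_{i}" for i in range(n_cells)],
    )
    adata = ad.AnnData(X=np.zeros((n_cells, 1), dtype=np.float32), obs=obs)
    adata.obsm["spatial"] = rng.uniform(0, 100, size=(n_cells, 2))

    # copy=True returns the dict instead of storing it on the adata object
    return sq.gr.ripley(
        adata,
        cluster_key="cluster",
        mode="L",
        n_simulations=20,
        seed=seed,
        copy=True,
    )
```

With mode="L", the returned dict should contain the keys listed above, with the per-cluster statistics under 'L_stat'.
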

References

For reference, check out Wikipedia or [Baddeley et al., 2015].