Effective Dimensionality Estimation

This module implements effective dimensionality estimation based on participation ratio and Rényi entropy methods.

Functions

driada.dimensionality.effective.eff_dim(data, enable_correction, q=2, **correction_kwargs)

Compute effective dimension of multivariate data.

Parameters:
  • data (array-like of shape (n_samples, n_features)) – Input data matrix where rows are samples and columns are features. This follows the standard scikit-learn convention.

  • enable_correction (bool) – Whether to apply finite-sample spectrum correction to eigenvalues.

  • q (float, default=2) – Order of the Rényi entropy; q=2 corresponds to the participation ratio (see _eff_dim for details).

  • **correction_kwargs – Additional arguments for spectrum correction (see correct_cov_spectrum).

Returns:

Effective dimension of the data.

Return type:

float

Raises:

ValueError – If data has invalid shape or if computation fails.

Notes

When n_features/n_samples > 0.01, spectrum correction is recommended to account for finite-sample biases in eigenvalue estimation.

The data should be organized with samples as rows and features as columns, following the standard machine learning convention. For neural data, this means timepoints as rows and neurons as columns.
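The two notes above can be turned into a quick pre-check before calling eff_dim. The sketch below is illustrative only: correction_recommended is a hypothetical helper, not part of driada, and it simply applies the n_features/n_samples > 0.01 rule of thumb to an array laid out as (n_samples, n_features).

```python
import numpy as np

def correction_recommended(data, threshold=0.01):
    """Hypothetical helper (not a driada function).

    Returns True when n_features / n_samples exceeds the threshold quoted
    in the notes, i.e. when spectrum correction is recommended.
    """
    data = np.asarray(data)
    if data.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_features)")
    n_samples, n_features = data.shape
    return n_features / n_samples > threshold

# 1000 timepoints (rows) x 50 neurons (columns): ratio 0.05 exceeds 0.01
recording = np.zeros((1000, 50))
use_correction = correction_recommended(recording)
print(use_correction)
```

The result could then be passed as enable_correction, although whether the correction actually helps depends on the data and on the options forwarded through correction_kwargs.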

Examples

>>> import numpy as np
>>> # Generate data with 3 effective dimensions
>>> n_samples, n_features = 1000, 50
>>> np.random.seed(42)  # For reproducible results
>>> latent = np.random.randn(n_samples, 3)
>>> mixing = np.random.randn(3, n_features)
>>> data = latent @ mixing + 0.1 * np.random.randn(n_samples, n_features)
>>>
>>> # Compute effective dimension
>>> eff_d = eff_dim(data, enable_correction=False, q=2)
>>> print(f"Effective dimension: {eff_d:.2f}")  # Should be close to 3
Effective dimension: 3.00

Usage Examples

Participation Ratio Method

from driada.dimensionality import eff_dim
import numpy as np

# Neural population data
neural_data = np.random.randn(1000, 100)  # 1000 timepoints, 100 neurons

# Basic effective dimension (no correction)
d_eff = eff_dim(neural_data, enable_correction=False)
print(f"Effective dimensionality: {d_eff:.2f}")

# With bias correction (recommended for finite samples)
d_eff_corrected = eff_dim(neural_data, enable_correction=True)
print(f"Corrected effective dim: {d_eff_corrected:.2f}")

Theory

Effective dimensionality quantifies how many dimensions are “effectively” used by the data:

Participation Ratio:

\[D_{eff} = \frac{(\sum_i \lambda_i)^2}{\sum_i \lambda_i^2}\]

where \(\lambda_i\) are the eigenvalues of the covariance matrix.
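As a rough illustration of the formula, the following NumPy sketch computes the participation ratio directly from covariance eigenvalues. It is not the driada implementation: the function names are hypothetical, and no finite-sample correction is applied.

```python
import numpy as np

def pr_from_eigenvalues(eigvals):
    """Participation ratio (sum lambda)^2 / sum(lambda^2) for a set of eigenvalues."""
    lam = np.clip(np.asarray(eigvals, dtype=float), 0.0, None)  # drop tiny negative round-off
    return float(lam.sum() ** 2 / np.sum(lam ** 2))

def participation_ratio(data):
    """Participation ratio of data with shape (n_samples, n_features)."""
    cov = np.atleast_2d(np.cov(np.asarray(data, dtype=float), rowvar=False))
    return pr_from_eigenvalues(np.linalg.eigvalsh(cov))

# Three equal eigenvalues and one zero: (1+1+1)^2 / (1+1+1) = 3
print(pr_from_eigenvalues([1.0, 1.0, 1.0, 0.0]))

# One dominant direction: (4+1)^2 / (16+1) = 25/17, between 1 and 2
print(pr_from_eigenvalues([4.0, 1.0]))
```

Equal eigenvalues give a value equal to their count, while a skewed spectrum pulls the value toward 1, which is why the quantity is read as the number of "effectively" used dimensions.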

Rényi Entropy:

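\[H_{\alpha} = \frac{1}{1-\alpha} \log \sum_i p_i^{\alpha}, \qquad D_{\alpha} = \exp(H_{\alpha}) = \Big(\sum_i p_i^{\alpha}\Big)^{\frac{1}{1-\alpha}}\]

where \(p_i = \lambda_i / \sum_j \lambda_j\) are the normalized eigenvalues and \(\alpha\) is the entropy order (the q parameter of eff_dim). The exponential is taken in the same base as the logarithm.

The participation ratio is the special case \(\alpha = 2\) of the Rényi entropy dimension: \(D_2 = 1 / \sum_i p_i^2 = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2\).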
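The α = 2 equivalence can be checked numerically. The sketch below assumes the dimension is the exponential of the Rényi entropy of the normalized eigenvalues p_i = λ_i / Σ_j λ_j (the reading under which α = 2 reproduces the participation ratio). The name renyi_dimension is hypothetical and is not claimed to match driada's internal _eff_dim.

```python
import numpy as np

def renyi_dimension(eigvals, alpha=2.0):
    """exp(H_alpha) of the normalized eigenvalue spectrum (illustrative sketch)."""
    lam = np.asarray(eigvals, dtype=float)
    lam = lam[lam > 0]                     # zero eigenvalues contribute nothing
    p = lam / lam.sum()                    # normalized spectrum, sums to 1
    if np.isclose(alpha, 1.0):
        entropy = -np.sum(p * np.log(p))   # Shannon limit as alpha -> 1
    else:
        entropy = np.log(np.sum(p ** alpha)) / (1.0 - alpha)
    return float(np.exp(entropy))

eigvals = np.array([4.0, 1.0])
pr = eigvals.sum() ** 2 / np.sum(eigvals ** 2)   # participation ratio, 25/17
d2 = renyi_dimension(eigvals, alpha=2.0)
print(np.isclose(pr, d2))
```

For a flat spectrum every order α gives the same answer (the number of nonzero eigenvalues); for skewed spectra, larger α weights the dominant eigenvalues more heavily and yields a smaller dimension.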

Interpretation

  • Low effective dimension (e.g., 2-5): Data lies on a low-dimensional manifold

  • Medium dimension (e.g., 10-20): Moderate complexity, typical for many neural recordings

  • High dimension (approaching number of features): Data spans the full space, possibly noise-dominated

Setting enable_correction=True accounts for finite-sample bias in the estimated eigenvalue spectrum, which gives more accurate estimates when n_samples is small relative to n_features.
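To make these regimes concrete, the sketch below computes an uncorrected participation ratio with plain NumPy (a stand-in for illustration, not a call into driada) on three synthetic datasets: a two-dimensional latent signal embedded in 20 features, isotropic noise with many samples, and the same noise with few samples. All names and sizes are arbitrary choices for the example.

```python
import numpy as np

def participation_ratio(data):
    """Uncorrected participation ratio of (n_samples, n_features) data."""
    lam = np.linalg.eigvalsh(np.cov(data, rowvar=False))
    lam = np.clip(lam, 0.0, None)
    return float(lam.sum() ** 2 / np.sum(lam ** 2))

rng = np.random.default_rng(0)
n_features = 20

# Low-dimensional case: 2 latent variables, orthonormal embedding, weak noise
basis, _ = np.linalg.qr(rng.standard_normal((n_features, 2)))
latent = rng.standard_normal((5000, 2))
low_d = latent @ basis.T + 0.01 * rng.standard_normal((5000, n_features))

# High-dimensional case: isotropic noise, many samples vs. few samples
noise_large = rng.standard_normal((5000, n_features))
noise_small = rng.standard_normal((40, n_features))

pr_low = participation_ratio(low_d)          # close to 2
pr_large = participation_ratio(noise_large)  # close to n_features
pr_small = participation_ratio(noise_small)  # noticeably below n_features
print(pr_low, pr_large, pr_small)
```

The third case shows the finite-sample bias in action: the underlying noise is the same, but with only 40 samples the sample eigenvalues spread out and the uncorrected estimate drops well below 20.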