Manifold Metrics
Spatial Correspondence Metrics for Manifold Preservation Evaluation
This module provides comprehensive metrics for evaluating how well dimensionality reduction methods preserve manifold structure.
Key metric categories:
Neighborhood preservation: How well are local neighborhoods preserved?
Distance preservation: How well are geodesic distances preserved?
Topology preservation: How well is the global structure preserved?
Shape matching: How similar are the shapes after optimal alignment?
KNN-based Metrics Comparison:
This module provides three complementary k-nearest neighbor metrics:
- knn_preservation_rate: Simple intersection-based metric
  Measures: |neighbors_original ∩ neighbors_embedding| / k
  Symmetric: Treats false positives and false negatives equally
  Use when: You want a simple, interpretable overall score
- trustworthiness: Focuses on avoiding false neighbors
  Measures: How much can we trust that embedded neighbors are true neighbors?
  Penalizes: Points that appear close in the embedding but were far apart in the original space
  Use when: False patterns in the embedding would be problematic (e.g., clustering)
- continuity: Focuses on preserving true neighbors
  Measures: How well are original neighborhoods preserved in the embedding?
  Penalizes: True neighbors that become separated in the embedding
  Use when: Losing connections would miss important structure (e.g., manifolds)
For comprehensive evaluation, use all three metrics or combine trustworthiness and continuity, as they capture complementary aspects of embedding quality.
- driada.dim_reduction.manifold_metrics.compute_distance_matrix(X, metric='euclidean')[source]
Compute pairwise distance matrix.
- Parameters:
X (np.ndarray) – Data matrix of shape (n_samples, n_features)
metric (str) – Distance metric to use (default: ‘euclidean’)
- Returns:
Symmetric distance matrix of shape (n_samples, n_samples)
- Return type:
np.ndarray
- Raises:
ValueError – If X is not a 2D array
Notes
For empty arrays, returns a (1, 1) matrix due to scipy’s squareform behavior.
- driada.dim_reduction.manifold_metrics.knn_preservation_rate(X_high, X_low, k=10, flexible=False, flexibility_factor=2.0)[source]
Compute k-nearest neighbor preservation rate.
This metric measures what fraction of k nearest neighbors in the original high-dimensional space are preserved in the low-dimensional embedding. It provides a simple, symmetric measure of neighborhood preservation.
- Parameters:
X_high (np.ndarray) – Original high-dimensional data (n_samples, n_features_high)
X_low (np.ndarray) – Low-dimensional embedding (n_samples, n_features_low)
k (int) – Number of nearest neighbors to consider
flexible (bool) – If True, check if k-NN are within (k * flexibility_factor)-NN in embedding
flexibility_factor (float) – Factor to multiply k for flexible matching (default: 2.0)
- Returns:
Preservation rate between 0 and 1. Higher values indicate better neighborhood preservation.
- Return type:
float
Notes
This metric differs from trustworthiness and continuity in that it:
Treats false positives and false negatives equally
Uses exact neighborhood matching (or flexible matching if enabled)
Does not consider the ranking of points beyond the k-th neighbor
Use this metric when:
You want a simple, interpretable measure of neighborhood preservation
Both types of errors (missing neighbors and false neighbors) are equally important
You don’t need to distinguish between different types of embedding errors
Mathematical formulation:
preservation_rate = |N_k(i, high) ∩ N_k(i, low)| / k
where N_k(i, space) is the set of k nearest neighbors of point i in that space.
See also
trustworthiness: Focuses on avoiding false neighbors in the embedding
continuity: Focuses on preserving true neighbors from the original space
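The preservation-rate formula can be sketched with NumPy alone. This is an illustration, not the library's implementation: the helper names `knn_indices` and `knn_preservation_sketch` are hypothetical, neighbors are found by brute force, and averaging the per-point rates over all points is an assumption.

```python
import numpy as np

def knn_indices(X, k):
    """Return, for every row of X, the indices of its k nearest neighbors."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def knn_preservation_sketch(X_high, X_low, k=10, flexible=False, flexibility_factor=2.0):
    # In flexible mode the embedding neighborhood is widened to k * flexibility_factor.
    k_low = int(k * flexibility_factor) if flexible else k
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k_low)
    per_point = [len(set(h.tolist()) & set(l.tolist())) / k
                 for h, l in zip(nn_high, nn_low)]
    return float(np.mean(per_point))  # assumption: per-point rates are averaged

rng = np.random.default_rng(0)
X_high = rng.normal(size=(60, 5))
X_low = X_high[:, :2]  # crude "embedding": keep two of the five coordinates

rate_self = knn_preservation_sketch(X_high, X_high, k=5)
rate_strict = knn_preservation_sketch(X_high, X_low, k=5)
rate_flexible = knn_preservation_sketch(X_high, X_low, k=5, flexible=True)
```

Flexible matching can only add matches, so `rate_flexible` is never below `rate_strict`.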
- driada.dim_reduction.manifold_metrics.trustworthiness(X_high, X_low, k=10)[source]
Compute trustworthiness of the embedding.
Trustworthiness measures how much we can trust that points nearby in the embedding are truly neighbors in the original space. It penalizes “false neighbors” - points that appear close in the embedding but were far apart in the original space.
- Parameters:
X_high (np.ndarray) – Original high-dimensional data (n_samples, n_features_high)
X_low (np.ndarray) – Low-dimensional embedding (n_samples, n_features_low)
k (int) – Number of nearest neighbors to consider
- Returns:
Trustworthiness score between 0 and 1. Higher values indicate that neighbors in the embedding can be trusted (few false neighbors).
- Return type:
float
Notes
Trustworthiness focuses on precision: are the neighbors we see in the embedding actually neighbors in the original space? This is important when false neighbors could lead to incorrect interpretations.
Use trustworthiness when:
You want to avoid spurious patterns in the embedding
False neighbors (points incorrectly appearing close) are problematic
You’re using the embedding for neighbor-based analysis or clustering
Mathematical formulation:
T(k) = 1 - (2 / (N*k*(2N - 3k - 1))) * sum_i sum_{j in U_k(i)} (r(i,j) - k)
where U_k(i) is the set of points among the k-NN of i in the embedding but not in the original space, and r(i,j) is the rank of j as a neighbor of i in the original space.
See also
continuity: Complementary metric focusing on preserving true neighbors
knn_preservation_rate: Simple symmetric measure of neighborhood preservation
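A minimal NumPy sketch of the T(k) formula follows. The helper names are hypothetical, ranks are computed by brute force, and the library's actual implementation may differ (for example in how ties are handled).

```python
import numpy as np

def neighbor_ranks(X):
    """ranks[i, j] = rank of j among the neighbors of i (1 = nearest, self = last)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)
    n = X.shape[0]
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks

def trustworthiness_sketch(X_high, X_low, k=10):
    n = X_high.shape[0]
    ranks_high = neighbor_ranks(X_high)
    ranks_low = neighbor_ranks(X_low)
    # U_k(i): in the embedding k-NN (rank_low <= k) but not in the original k-NN (rank_high > k)
    false_neighbors = (ranks_low <= k) & (ranks_high > k)
    penalty = (ranks_high[false_neighbors] - k).sum()
    return float(1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty)

rng = np.random.default_rng(1)
X_high = rng.normal(size=(80, 6))
t_self = trustworthiness_sketch(X_high, X_high, k=5)         # no false neighbors
t_proj = trustworthiness_sketch(X_high, X_high[:, :2], k=5)  # projection adds false neighbors
```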
References
Venna, J., & Kaski, S. (2006). Local multidimensional scaling. Neural Networks, 19(6-7), 889-899.
- driada.dim_reduction.manifold_metrics.continuity(X_high, X_low, k=10)[source]
Compute continuity of the embedding.
Continuity measures how well the embedding preserves the neighborhoods from the original space. It penalizes “missing neighbors” - points that were close in the original space but are far apart in the embedding.
- Parameters:
X_high (np.ndarray) – Original high-dimensional data (n_samples, n_features_high)
X_low (np.ndarray) – Low-dimensional embedding (n_samples, n_features_low)
k (int) – Number of nearest neighbors to consider
- Returns:
Continuity score between 0 and 1. Higher values indicate that original neighbors are preserved (few missing neighbors).
- Return type:
float
Notes
Continuity focuses on recall: are the true neighbors from the original space preserved in the embedding? This matters when losing true connections would hide critical structure.
Use continuity when:
You want to preserve all important relationships from the original data
Missing neighbors (losing true connections) is problematic
You’re studying the continuity of manifolds or connected structures
Mathematical formulation:
C(k) = 1 - (2 / (N*k*(2N - 3k - 1))) * sum_i sum_{j in V_k(i)} (r'(i,j) - k)
where V_k(i) is the set of k-NN of i in the original space but not in the embedding, and r'(i,j) is the rank of j as a neighbor of i in the embedding.
Together with trustworthiness:
High trustworthiness + High continuity = Excellent embedding
High trustworthiness + Low continuity = Embedding tears neighborhoods apart (true neighbors are separated)
Low trustworthiness + High continuity = Embedding creates false neighborhoods
Low trustworthiness + Low continuity = Poor embedding quality
See also
trustworthiness: Complementary metric focusing on avoiding false neighbors
knn_preservation_rate: Simple symmetric measure of neighborhood preservation
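Continuity is the same rank penalty as trustworthiness with the roles of the two spaces swapped. The sketch below uses one hypothetical helper, `rank_penalty_score`, for both metrics and compares them on a projection, which collapses one axis. It assumes brute-force ranks and is not the library's code.

```python
import numpy as np

def neighbor_ranks(X):
    """ranks[i, j] = rank of j among the neighbors of i (1 = nearest, self = last)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    order = np.argsort(dist, axis=1)
    n = X.shape[0]
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks

def rank_penalty_score(ranks_ref, ranks_query, k):
    """Penalize points inside the query k-NN whose rank in the reference space exceeds k."""
    n = ranks_ref.shape[0]
    offenders = (ranks_query <= k) & (ranks_ref > k)
    penalty = (ranks_ref[offenders] - k).sum()
    return float(1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty)

def continuity_sketch(X_high, X_low, k=10):
    # V_k(i): original-space k-NN whose embedding rank r'(i, j) exceeds k
    return rank_penalty_score(neighbor_ranks(X_low), neighbor_ranks(X_high), k)

def trustworthiness_sketch(X_high, X_low, k=10):
    # U_k(i): embedding k-NN whose original-space rank r(i, j) exceeds k
    return rank_penalty_score(neighbor_ranks(X_high), neighbor_ranks(X_low), k)

rng = np.random.default_rng(0)
X_high = rng.uniform(size=(200, 2))
X_low = X_high[:, :1]  # projecting onto one axis collapses the other
cont = continuity_sketch(X_high, X_low, k=10)
trust = trustworthiness_sketch(X_high, X_low, k=10)
```

On this projection, points that differ only along the dropped axis become false neighbors, so `trust` falls well below `cont`.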
References
Venna, J., & Kaski, S. (2006). Local multidimensional scaling. Neural Networks, 19(6-7), 889-899.
- driada.dim_reduction.manifold_metrics.geodesic_distance_correlation(X_high, X_low, k_neighbors=10, method='spearman')[source]
Compute correlation between geodesic distances on the manifold and Euclidean distances in the embedding.
Uses k-NN graph to approximate geodesic distances via shortest paths.
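- Parameters:
X_high (np.ndarray) – Original high-dimensional data (n_samples, n_features_high)
X_low (np.ndarray) – Low-dimensional embedding (n_samples, n_features_low)
k_neighbors (int) – Number of neighbors used to build the k-NN graph (default: 10)
method (str) – Correlation method (default: ‘spearman’)
- Returns:
Correlation coefficient between -1 and 1. Returns 0.0 if correlation cannot be computed (e.g., all distances are infinite).
- Return type:
float
- Raises:
ValueError – If X_high and X_low have different numbers of samples, or if k_neighbors >= n_samples.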
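The idea can be sketched as follows: build a symmetric k-NN graph, run an all-pairs shortest-path pass to approximate geodesic distances, then rank-correlate them with the embedding distances. This is a NumPy-only sketch with hypothetical helper names. It uses Floyd-Warshall and a tie-naive Spearman correlation, which are assumptions and may differ from the library's shortest-path and correlation routines.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def geodesic_distances(X, k_neighbors):
    """Approximate geodesics by shortest paths on a symmetric k-NN graph."""
    n = X.shape[0]
    d = pairwise_distances(X)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nn = np.argsort(d, axis=1)[:, 1:k_neighbors + 1]  # position 0 is the point itself
    rows = np.repeat(np.arange(n), k_neighbors)
    cols = nn.ravel()
    graph[rows, cols] = d[rows, cols]
    graph = np.minimum(graph, graph.T)  # make the graph undirected
    for m in range(n):  # Floyd-Warshall: allow paths through node m
        graph = np.minimum(graph, graph[:, m:m + 1] + graph[m:m + 1, :])
    return graph

def spearman(a, b):
    """Rank correlation (ties are broken arbitrarily in this sketch)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# A 270-degree arc embedded as its arc-length parameter t
t = np.linspace(0.0, 1.5 * np.pi, 80)
X_high = np.column_stack([np.cos(t), np.sin(t)])
X_low = t[:, None]

iu = np.triu_indices(len(t), k=1)
geo = geodesic_distances(X_high, k_neighbors=4)
rho_geo = spearman(geo[iu], pairwise_distances(X_low)[iu])
rho_euclid = spearman(pairwise_distances(X_high)[iu], pairwise_distances(X_low)[iu])
```

Straight-line distances in the original space shrink again once two points are more than half a turn apart, so `rho_euclid` is lower than `rho_geo` for this arc.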
- driada.dim_reduction.manifold_metrics.stress(X_high, X_low, normalized=True)[source]
Compute stress (sum of squared differences in distances).
- Parameters:
X_high (np.ndarray) – Original high-dimensional data (n_samples, n_features_high)
X_low (np.ndarray) – Low-dimensional embedding (n_samples, n_features_low)
normalized (bool) – If True, normalize by sum of squared distances
- Returns:
Stress value (lower is better).
- Return type:
float
- Raises:
ValueError – If X_high and X_low have different numbers of samples, or if normalized=True and all distances in X_high are zero (degenerate data).
Notes
Stress = Σ(d_ij^high - d_ij^low)² / Σ(d_ij^high)² if normalized, otherwise just Σ(d_ij^high - d_ij^low)²
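The stress formula above translates almost directly into NumPy. In this sketch the function name is hypothetical, and summing over unordered pairs i < j is an assumption; it changes the unnormalized value by a factor of two but not the normalized one.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress_sketch(X_high, X_low, normalized=True):
    iu = np.triu_indices(X_high.shape[0], k=1)  # each unordered pair once
    d_high = pairwise_distances(X_high)[iu]
    d_low = pairwise_distances(X_low)[iu]
    raw = float(((d_high - d_low) ** 2).sum())
    if not normalized:
        return raw
    denom = float((d_high ** 2).sum())
    if denom == 0.0:
        raise ValueError("all distances in X_high are zero")
    return raw / denom

rng = np.random.default_rng(0)
planar = rng.normal(size=(30, 2))
X_high = np.column_stack([planar, np.zeros(30)])  # flat sheet in 3D

stress_isometric = stress_sketch(X_high, planar)     # distances preserved exactly
stress_half = stress_sketch(X_high, 0.5 * planar)    # every distance halved
```

Halving every distance gives a normalized stress of (1 - 0.5)² = 0.25, which shows that stress is sensitive to global scale.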
- driada.dim_reduction.manifold_metrics.circular_structure_preservation(X_low, true_angles=None, k_neighbors=3)[source]
Evaluate preservation of circular structure in embedding.
- Parameters:
X_low (np.ndarray) – Low-dimensional embedding (n_samples, 2)
true_angles (np.ndarray, optional) – True angles if known (for synthetic data)
k_neighbors (int) – Number of neighbors for consecutive preservation
- Returns:
Dictionary containing various circular preservation metrics:
  - distance_cv: coefficient of variation of distances from center
  - consecutive_preservation: fraction with circular neighbors preserved
  - circular_correlation (if true_angles provided)
- Return type:
dict
- Raises:
ValueError – If X_low is not 2D (shape[1] != 2), if k_neighbors >= n_samples, or if all points are at the center (degenerate circle).
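As an illustration of the distance_cv entry only, the sketch below computes the coefficient of variation of the radii. Using the centroid as the center is an assumption, and the function name is hypothetical.

```python
import numpy as np

def distance_cv_sketch(X_low):
    center = X_low.mean(axis=0)  # assumption: the center is the centroid
    radii = np.linalg.norm(X_low - center, axis=1)
    if radii.mean() == 0.0:
        raise ValueError("all points are at the center")
    return float(radii.std() / radii.mean())

theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
ellipse = np.column_stack([3.0 * np.cos(theta), np.sin(theta)])

cv_circle = distance_cv_sketch(circle)    # near 0: all radii equal
cv_ellipse = distance_cv_sketch(ellipse)  # clearly above 0: radii vary
```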
- driada.dim_reduction.manifold_metrics.procrustes_analysis(X, Y, scaling=True, reflection=True)[source]
Perform Procrustes analysis to find optimal alignment.
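- Parameters:
X (np.ndarray) – Reference configuration that Y is aligned to
Y (np.ndarray) – Configuration to align
scaling (bool) – Whether to allow scaling (default: True)
reflection (bool) – Whether to allow reflection (default: True)
- Returns:
Y_aligned (np.ndarray) – Aligned version of Y
disparity (float) – Procrustes distance after alignment
transform_info (dict) – Dictionary containing transformation parameters:
  - ‘scale_factor’: float, the scaling factor applied
  - ‘is_reflected’: bool, whether reflection was detected
  - ‘rotation_matrix’: np.ndarray, the rotation matrix R
- Return type:
tuple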
Notes
When reflection=False and a reflection is detected, the rotation matrix is corrected using SVD decomposition to remove the reflection component.
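A standard SVD-based orthogonal Procrustes solution, including the reflection correction described in the note, can be sketched as below. The function name is hypothetical, and defining the disparity as the sum of squared residuals is an assumption; the library may normalize it differently.

```python
import numpy as np

def procrustes_sketch(X, Y, scaling=True, reflection=True):
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    # R = U @ Vt maximizes trace(R.T @ Yc.T @ Xc) over orthogonal matrices
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt
    is_reflected = bool(np.linalg.det(R) < 0)
    if is_reflected and not reflection:
        # Flip the weakest singular direction to get the best proper rotation
        U[:, -1] *= -1.0
        s[-1] *= -1.0
        R = U @ Vt
    scale = float(s.sum() / (Yc ** 2).sum()) if scaling else 1.0
    Y_aligned = scale * (Yc @ R) + mu_x
    disparity = float(((X - Y_aligned) ** 2).sum())  # assumption: sum of squared residuals
    info = {"scale_factor": scale, "is_reflected": is_reflected, "rotation_matrix": R}
    return Y_aligned, disparity, info

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
a = np.deg2rad(30.0)
Q = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

# Rotated, doubled and shifted copy: fully recoverable
_, disp_rot, info_rot = procrustes_sketch(X, 2.0 * X @ Q + np.array([1.0, -3.0]))

# Mirrored copy: recoverable only if reflection is allowed
Y_mirror = X * np.array([1.0, -1.0])
_, disp_mirror, info_mirror = procrustes_sketch(X, Y_mirror, reflection=True)
_, disp_no_reflection, _ = procrustes_sketch(X, Y_mirror, reflection=False)
```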
- driada.dim_reduction.manifold_metrics.manifold_preservation_score(X_high, X_low, k_neighbors=10, weights=None)[source]
Compute comprehensive manifold preservation score.
Combines multiple metrics into an overall assessment of how well the embedding preserves manifold structure.
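- Parameters:
X_high (np.ndarray) – Original high-dimensional data (n_samples, n_features_high)
X_low (np.ndarray) – Low-dimensional embedding (n_samples, n_features_low)
k_neighbors (int) – Number of neighbors used by the neighborhood-based metrics (default: 10)
weights (dict, optional) – Weights for combining the individual metrics (default: None)
- Returns:
Dictionary containing individual metrics and overall score
- Return type:
dict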
Notes
Input validation is performed by the individual metric functions. NaN values in geodesic_correlation are replaced with 0.0.
- driada.dim_reduction.manifold_metrics.circular_distance(angles1, angles2)[source]
Compute circular distance between two sets of angles.
Calculates the shortest angular distance between angles on a circle, accounting for the circular nature where 0 and 2π are the same point. The result is always in [0, π].
- Parameters:
angles1 (np.ndarray) – First array of angles in radians.
angles2 (np.ndarray) – Second array of angles in radians. Must be broadcastable with angles1.
- Returns:
Circular distances between corresponding angles, in range [0, π].
- Return type:
np.ndarray
Notes
Uses the formula:
abs(arctan2(sin(θ₁-θ₂), cos(θ₁-θ₂)))
to ensure the result is the shortest distance on the circle.
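The formula in the note can be written in NumPy as a quick check. The function name below is hypothetical.

```python
import numpy as np

def circular_distance_sketch(angles1, angles2):
    delta = np.asarray(angles1) - np.asarray(angles2)
    # arctan2(sin, cos) wraps the difference into [-pi, pi]; abs gives the distance
    return np.abs(np.arctan2(np.sin(delta), np.cos(delta)))

a = np.array([0.1, 0.0, 1.0])
b = np.array([2.0 * np.pi - 0.1, np.pi, 1.0 + 2.0 * np.pi])
dists = circular_distance_sketch(a, b)  # approximately [0.2, pi, 0.0]
```

The first pair straddles the 0/2π seam, so its distance is 0.2 rather than 2π - 0.2.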
- driada.dim_reduction.manifold_metrics.circular_diff(angles)[source]
Compute differences between consecutive angles, handling circular wrapping.
Calculates angle[i+1] - angle[i] for each consecutive pair, ensuring the result represents the shortest angular distance with proper sign.
- Parameters:
angles (np.ndarray) – 1D array of angles in radians.
- Returns:
Array of wrapped differences in [-π, π]. Length is len(angles) - 1. Positive values indicate counter-clockwise movement.
- Return type:
np.ndarray
- Raises:
ValueError – If angles is empty or not 1D.
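Examples
>>> import numpy as np
>>> from driada.dim_reduction.manifold_metrics import circular_diff
>>> angles = np.array([0, 3*np.pi/2, np.pi/4])  # 0°, 270°, 45°
>>> diffs = circular_diff(angles)
>>> # First diff: 270° - 0° = 270°, wrapped to -90° (shortest path)
>>> # Second diff: 45° - 270° = -225°, wrapped to +135° (forward wrap)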
- driada.dim_reduction.manifold_metrics.extract_angles_from_embedding(embedding)[source]
Extract angular positions from a 2D embedding.
Computes the angle (in radians) from the centroid to each point in a 2D embedding. Useful for detecting circular structure.
- Parameters:
embedding (np.ndarray) – 2D embedding with shape (n_samples, 2) where each row is a point in 2D space.
- Returns:
Array of angles in radians [-π, π] from the centroid to each point. Uses atan2 convention where 0 is along positive x-axis.
- Return type:
np.ndarray
- Raises:
ValueError – If embedding does not have exactly 2 dimensions (columns), or if embedding is empty.
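The description amounts to centering the points and applying atan2. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def extract_angles_sketch(embedding):
    centered = embedding - embedding.mean(axis=0)  # move the centroid to the origin
    return np.arctan2(centered[:, 1], centered[:, 0])

# Four points around the centroid (5, 5): east, north, west, south
points = np.array([[6.0, 5.0], [5.0, 6.0], [4.0, 5.0], [5.0, 4.0]])
angles = extract_angles_sketch(points)
```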
- driada.dim_reduction.manifold_metrics.find_optimal_circular_alignment(true_angles, reconstructed_angles, allow_rotation=True, allow_reflection=True)[source]
Find optimal rotation and reflection to align circular data.
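- Parameters:
true_angles (np.ndarray) – True angles in radians
reconstructed_angles (np.ndarray) – Reconstructed angles in radians
allow_rotation (bool) – Whether to allow rotation (default: True)
allow_reflection (bool) – Whether to allow reflection (default: True)
- Returns:
optimal_offset (float) – Optimal rotation offset in radians
min_error (float) – Minimum mean circular distance after alignment
is_reflected (bool) – Whether reflection was applied
- Return type:
tuple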
Notes
Uses grid search with 1-degree resolution for rotation offset. For higher precision, consider using scipy.optimize.minimize_scalar.
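The grid search described in the note can be sketched as follows. The function names are hypothetical, and the convention used here (reflection as negation of the reconstructed angles, followed by adding the offset) is an assumption about how the alignment is parameterized.

```python
import numpy as np

def circ_dist(a, b):
    d = a - b
    return np.abs(np.arctan2(np.sin(d), np.cos(d)))

def find_alignment_sketch(true_angles, reconstructed, allow_rotation=True, allow_reflection=True):
    offsets = np.deg2rad(np.arange(360)) if allow_rotation else np.array([0.0])
    best = (0.0, np.inf, False)
    for reflected in ([False, True] if allow_reflection else [False]):
        candidate = -reconstructed if reflected else reconstructed
        # One row per offset: mean circular distance after applying that offset
        shifted = candidate[None, :] + offsets[:, None]
        errors = circ_dist(true_angles[None, :], shifted).mean(axis=1)
        i = int(np.argmin(errors))
        if errors[i] < best[1]:
            best = (float(offsets[i]), float(errors[i]), reflected)
    return best

true_angles = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
reconstructed = np.deg2rad(40.0) - true_angles  # mirrored, then rotated by 40 degrees
offset, error, is_reflected = find_alignment_sketch(true_angles, reconstructed)
```

Because the true offset lies on the 1-degree grid, the search recovers it exactly; an off-grid offset would leave a residual error of up to half a degree.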
- driada.dim_reduction.manifold_metrics.compute_circular_correlation(angles1, angles2, offset=0.0)[source]
Compute circular correlation coefficient between two angular datasets.
Measures the similarity between two sets of circular data using complex representation. The result is invariant to common rotation.
- Parameters:
angles1 (np.ndarray) – First array of angles in radians.
angles2 (np.ndarray) – Second array of angles in radians. Must have same length as angles1.
offset (float, default=0.0) – Rotation offset to apply to angles2 before correlation.
- Returns:
Circular correlation coefficient in [0, 1]. Higher values indicate stronger circular correlation.
- Return type:
float
- Raises:
ValueError – If angles1 and angles2 have different lengths, if either array is empty, or if there is no variation in the data (all values identical).
Notes
Uses complex representation z = exp(i*angle) and computes correlation in complex plane. Result is the absolute value of complex correlation.
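One way to realize the complex-plane correlation is shown below. The exact normalization used by the library is not given in the docstring, so mean-centering the unit phasors and dividing by their norms are assumptions, and the function name is hypothetical.

```python
import numpy as np

def circular_correlation_sketch(angles1, angles2, offset=0.0):
    z1 = np.exp(1j * np.asarray(angles1))
    z2 = np.exp(1j * (np.asarray(angles2) + offset))
    z1c, z2c = z1 - z1.mean(), z2 - z2.mean()  # assumption: phasors are mean-centered
    denom = np.sqrt((np.abs(z1c) ** 2).sum() * (np.abs(z2c) ** 2).sum())
    if denom == 0.0:
        raise ValueError("no variation in the data")
    return float(np.abs((z1c * np.conj(z2c)).sum()) / denom)

rng = np.random.default_rng(0)
a = rng.uniform(-np.pi, np.pi, size=200)
c = rng.uniform(-np.pi, np.pi, size=200)

r_rotated_copy = circular_correlation_sketch(a, a + 1.0)     # same data, rotated
r_independent = circular_correlation_sketch(a, c)
r_independent_rotated = circular_correlation_sketch(a + 0.7, c + 0.7)
```

Taking the absolute value of the complex correlation is what removes the dependence on rotation.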
- driada.dim_reduction.manifold_metrics.compute_reconstruction_error(embedding, true_variable, manifold_type='circular', allow_rotation=True, allow_reflection=True, allow_scaling=True)[source]
Compute reconstruction error between embedding and ground truth.
- Parameters:
embedding (np.ndarray) – Low-dimensional embedding
true_variable (np.ndarray) – Ground truth variable (angles or positions)
manifold_type (str) – Type of manifold (‘circular’ or ‘spatial’)
allow_rotation (bool) – Whether to allow rotation/translation
allow_reflection (bool) – Whether to allow reflection
allow_scaling (bool) – Whether to allow scaling
- Returns:
Dictionary containing:
  - error: reconstruction error
  - correlation: correlation after alignment
  - rotation_offset: optimal rotation (for circular)
  - is_reflected: whether reflection was applied
  - scale_factor: optimal scale (if applicable)
- Return type:
dict
- Raises:
ValueError – If manifold_type is not ‘circular’ or ‘spatial’.
- driada.dim_reduction.manifold_metrics.compute_embedding_alignment_metrics(embedding, true_variable, manifold_type='circular', allow_rotation=True, allow_reflection=True, allow_scaling=True)[source]
Compute comprehensive alignment metrics between embedding and true variable.
- Parameters:
embedding (np.ndarray) – The embedding to evaluate. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – The true variable.
manifold_type (str, optional) – Type of manifold: ‘circular’ or ‘spatial’. Default is ‘circular’.
allow_rotation (bool, optional) – Whether to allow rotation when finding alignment. Default is True.
allow_reflection (bool, optional) – Whether to allow reflection when finding alignment. Default is True.
allow_scaling (bool, optional) – Whether to allow scaling when finding alignment. Default is True.
- Returns:
Dictionary containing all metrics from compute_reconstruction_error, plus for circular manifolds:
  - ‘velocity_correlation’: Correlation between angular velocities
  - ‘variance_ratio’: Ratio of embedding to true variance
- Return type:
dict
- Raises:
ValueError – For circular manifolds:
  - If fewer than 3 points (cannot compute velocity correlation)
  - If velocity correlation is NaN (no variation in velocities)
  - If true variable has zero circular variance (all points at same angle)
- driada.dim_reduction.manifold_metrics.train_simple_decoder(embedding, true_variable, manifold_type='circular')[source]
Train a simple decoder to reconstruct true variable from embedding.
- Parameters:
embedding (np.ndarray) – The embedding features. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – The true variable to reconstruct. For circular manifolds, should be angles in radians. For spatial manifolds, assumes 1D array.
manifold_type (str, optional) – Type of manifold: ‘circular’ or ‘spatial’. Default is ‘circular’.
- Returns:
A decoder function that takes an embedding and returns reconstructed variable. The decoder expects input of the same shape as the training embedding.
- Return type:
callable
- Raises:
ValueError – If embedding and true_variable have different numbers of samples, or if manifold_type is not ‘circular’ or ‘spatial’.
Notes
For circular manifolds, trains separate regressors for sin and cos components. For spatial manifolds, performs direct regression. The embedding is always standardized before training.
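The sin/cos strategy in the note can be sketched with ordinary least squares. The regressor type used by the library is not stated here, so the linear model, the function name and the zero-variance guard are all assumptions.

```python
import numpy as np

def train_circular_decoder_sketch(embedding, angles):
    mean = embedding.mean(axis=0)
    std = embedding.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # guard against constant columns

    def design(E):
        Z = (E - mean) / std  # standardize with the training statistics
        return np.column_stack([Z, np.ones(len(Z))])

    # Two regression targets fitted at once: sin(angle) and cos(angle)
    targets = np.column_stack([np.sin(angles), np.cos(angles)])
    W, *_ = np.linalg.lstsq(design(embedding), targets, rcond=None)

    def decoder(E):
        pred = design(E) @ W
        return np.arctan2(pred[:, 0], pred[:, 1])  # recombine into an angle

    return decoder

angles = np.linspace(-np.pi, np.pi, 100, endpoint=False)
embedding = 3.0 * np.column_stack([np.cos(angles), np.sin(angles)]) + np.array([2.0, -1.0])

decoder = train_circular_decoder_sketch(embedding, angles)
delta = decoder(embedding) - angles
max_error = float(np.abs(np.arctan2(np.sin(delta), np.cos(delta))).max())
```

Regressing sin and cos separately avoids the discontinuity at ±π that a direct regression on the angle would have to fit.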
- driada.dim_reduction.manifold_metrics.compute_embedding_quality(embedding, true_variable, manifold_type='circular', train_fraction=0.8, allow_rotation=True, allow_reflection=True, allow_scaling=True, random_state=42)[source]
Evaluate embedding quality using train/test split.
- Parameters:
embedding (np.ndarray) – The embedding to evaluate. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – The true variable.
manifold_type (str, optional) – Type of manifold: ‘circular’ or ‘spatial’. Default is ‘circular’.
train_fraction (float, optional) – Fraction of data to use for training. Default is 0.8.
allow_rotation (bool, optional) – Whether to allow rotation when finding alignment. Default is True.
allow_reflection (bool, optional) – Whether to allow reflection when finding alignment. Default is True.
allow_scaling (bool, optional) – Whether to allow scaling when finding alignment. Default is True.
random_state (int, optional) – Random seed for train/test split reproducibility. Default is 42.
- Returns:
Dictionary containing:
’train_error’: Reconstruction error on training set
’test_error’: Reconstruction error on test set
’generalization_gap’: Difference between test and train error (can be negative if test error is lower)
- Return type:
dict
Notes
Uses random (not sequential) split of data to avoid domain shift issues that can occur with temporal/spatial data. The alignment is computed separately for train and test sets.
- driada.dim_reduction.manifold_metrics.compute_decoding_accuracy(embedding, true_variable, manifold_type='circular', train_fraction=0.8, random_state=42)[source]
Compute decoding accuracy using a simple linear decoder.
This function trains a decoder to map from the embedding space to the true variables and measures how well it generalizes.
- Parameters:
embedding (np.ndarray) – Low-dimensional embedding. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – Ground truth variable (angles for circular, positions for spatial). Shape: (n_samples,) for circular or 1D spatial.
manifold_type (str, optional) – Type of manifold (‘circular’ or ‘spatial’). Default is ‘circular’.
train_fraction (float, optional) – Fraction of data to use for training. Default is 0.8. Must be between 0 and 1.
random_state (int, optional) – Random seed for train/test split reproducibility. Default is 42.
- Returns:
Dictionary containing:
  - ‘train_error’: float, training reconstruction error
  - ‘test_error’: float, testing reconstruction error
  - ‘test_r2’: float, proper R² score on test set
  - ‘generalization_gap’: float, difference (test_error - train_error)
- Return type:
dict
Notes
Uses random (not sequential) split of data to avoid domain shift issues that can occur with temporal/spatial data. For reproducibility, the random_state parameter controls the split.
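For the 1D spatial case, the random split and the returned quantities can be sketched as below. The function name, the use of mean absolute error as the "reconstruction error", and the plain least-squares decoder are assumptions; only the dictionary keys come from the description above.

```python
import numpy as np

def decoding_accuracy_sketch(embedding, true_variable, train_fraction=0.8, random_state=42):
    n = len(true_variable)
    perm = np.random.default_rng(random_state).permutation(n)  # random, not sequential
    n_train = int(train_fraction * n)
    train, test = perm[:n_train], perm[n_train:]

    def design(E):
        return np.column_stack([E, np.ones(len(E))])  # linear model with intercept

    w, *_ = np.linalg.lstsq(design(embedding[train]), true_variable[train], rcond=None)
    pred_train = design(embedding[train]) @ w
    pred_test = design(embedding[test]) @ w

    train_error = float(np.abs(pred_train - true_variable[train]).mean())
    test_error = float(np.abs(pred_test - true_variable[test]).mean())
    ss_res = ((true_variable[test] - pred_test) ** 2).sum()
    ss_tot = ((true_variable[test] - true_variable[test].mean()) ** 2).sum()
    return {
        "train_error": train_error,
        "test_error": test_error,
        "test_r2": float(1.0 - ss_res / ss_tot),
        "generalization_gap": test_error - train_error,
    }

rng = np.random.default_rng(0)
embedding = rng.normal(size=(200, 2))
position = 2.0 * embedding[:, 0] - embedding[:, 1] + 0.05 * rng.normal(size=200)
result = decoding_accuracy_sketch(embedding, position)
```

With a fixed `random_state` the split, and therefore every returned value, is reproducible across calls.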
- driada.dim_reduction.manifold_metrics.manifold_reconstruction_score(embedding, true_variable, manifold_type='circular', weights=None, allow_rotation=True, allow_reflection=True, allow_scaling=True, random_state=42)[source]
Compute comprehensive manifold reconstruction score.
- Parameters:
embedding (np.ndarray) – Low-dimensional embedding. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – Ground truth variable (angles or positions).
manifold_type (str, optional) – Type of manifold (‘circular’ or ‘spatial’). Default is ‘circular’.
weights (dict, optional) – Weights for combining metrics. Keys should be ‘reconstruction_error’, ‘correlation’, and ‘decoding_accuracy’. If None, uses default weights that sum to 1.0.
allow_rotation (bool, optional) – Whether to allow rotation when finding alignment. Default is True.
allow_reflection (bool, optional) – Whether to allow reflection when finding alignment. Default is True.
allow_scaling (bool, optional) – Whether to allow scaling when finding alignment. Default is True.
random_state (int, optional) – Random seed for train/test split reproducibility. Default is 42.
- Returns:
Dictionary containing:
  - ‘reconstruction_error’: float, reconstruction error after alignment
  - ‘correlation’: float, correlation after alignment
  - ‘rotation_offset’: float, rotation offset applied
  - ‘is_reflected’: bool, whether reflection was applied
  - ‘decoding_train_error’: float, decoder training error
  - ‘decoding_test_error’: float, decoder test error
  - ‘generalization_gap’: float, decoder generalization gap
  - ‘overall_reconstruction_score’: float, weighted combination of metrics
- Return type:
dict
Notes
For spatial manifolds, assumes data is normalized to unit scale for error normalization. Negative correlations are treated as 0 in scoring. The overall score is normalized to [0, 1] range where 1 is perfect.
Tools for evaluating the quality of dimensionality reduction embeddings.
Distance and Structure Metrics
- driada.dim_reduction.manifold_metrics.compute_distance_matrix
- driada.dim_reduction.manifold_metrics.circular_distance
- driada.dim_reduction.manifold_metrics.circular_diff
Preservation Metrics
- driada.dim_reduction.manifold_metrics.knn_preservation_rate
- driada.dim_reduction.manifold_metrics.trustworthiness
- driada.dim_reduction.manifold_metrics.continuity
- driada.dim_reduction.manifold_metrics.geodesic_distance_correlation
- driada.dim_reduction.manifold_metrics.stress
- driada.dim_reduction.manifold_metrics.manifold_preservation_score
Circular Manifold Analysis
- driada.dim_reduction.manifold_metrics.circular_structure_preservation(X_low, true_angles=None, k_neighbors=3)[source]
Evaluate preservation of circular structure in embedding.
- Parameters:
X_low (np.ndarray) – Low-dimensional embedding (n_samples, 2)
true_angles (np.ndarray, optional) – True angles if known (for synthetic data)
k_neighbors (int) – Number of neighbors for consecutive preservation
- Returns:
Dictionary containing various circular preservation metrics: - distance_cv: coefficient of variation of distances from center - consecutive_preservation: fraction with circular neighbors preserved - circular_correlation (if true_angles provided)
- Return type:
- Raises:
ValueError – If X_low is not 2D (shape[1] != 2), if k_neighbors >= n_samples, or if all points are at the center (degenerate circle).
- driada.dim_reduction.manifold_metrics.extract_angles_from_embedding(embedding)[source]
Extract angular positions from a 2D embedding.
Computes the angle (in radians) from the centroid to each point in a 2D embedding. Useful for detecting circular structure.
- Parameters:
embedding (np.ndarray) – 2D embedding with shape (n_samples, 2) where each row is a point in 2D space.
- Returns:
Array of angles in radians [-π, π] from the centroid to each point. Uses atan2 convention where 0 is along positive x-axis.
- Return type:
np.ndarray
- Raises:
ValueError – If embedding does not have exactly 2 dimensions (columns), or if embedding is empty.
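As a rough illustration of the centroid-and-atan2 description above, the following sketch subtracts the centroid and applies np.arctan2. The function name extract_angles_sketch is hypothetical, and the error messages are placeholders rather than the library's own.

```python
import numpy as np


def extract_angles_sketch(embedding):
    """Hypothetical sketch: angle of each 2D point relative to the centroid."""
    embedding = np.asarray(embedding, dtype=float)
    if embedding.size == 0:
        raise ValueError("embedding is empty")
    if embedding.ndim != 2 or embedding.shape[1] != 2:
        raise ValueError("embedding must have shape (n_samples, 2)")
    centered = embedding - embedding.mean(axis=0)
    # atan2 convention: 0 along the positive x-axis, counter-clockwise positive.
    return np.arctan2(centered[:, 1], centered[:, 0])
```

For four points placed at (1, 0), (0, 1), (-1, 0) and (0, -1), the centroid is the origin and the angles are 0, π/2, π and -π/2.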
- driada.dim_reduction.manifold_metrics.find_optimal_circular_alignment(true_angles, reconstructed_angles, allow_rotation=True, allow_reflection=True)[source]
Find optimal rotation and reflection to align circular data.
- Parameters:
- Return type:
- Returns:
optimal_offset (float) – Optimal rotation offset in radians
min_error (float) – Minimum mean circular distance after alignment
is_reflected (bool) – Whether reflection was applied
Notes
Uses grid search with 1-degree resolution for rotation offset. For higher precision, consider using scipy.optimize.minimize_scalar.
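The grid search described in the note can be sketched as follows. This is a minimal illustration under stated assumptions: the names circular_distance and align_circular_sketch are hypothetical, the offset is assumed to be added to the (optionally negated) reconstructed angles, and the error is taken to be the mean shortest angular distance. The library's sign and offset conventions may differ.

```python
import numpy as np


def circular_distance(a, b):
    """Shortest angular distance between angles a and b, in [0, pi]."""
    return np.abs(np.angle(np.exp(1j * (a - b))))


def align_circular_sketch(true_angles, reconstructed_angles,
                          allow_rotation=True, allow_reflection=True):
    """Hypothetical sketch of a 1-degree grid search over rotation and reflection."""
    true_angles = np.asarray(true_angles, dtype=float)
    reconstructed_angles = np.asarray(reconstructed_angles, dtype=float)
    # Candidate rotation offsets: 0, 1, ..., 359 degrees (or only 0 if disabled).
    offsets = np.deg2rad(np.arange(360)) if allow_rotation else np.array([0.0])
    signs = (1.0, -1.0) if allow_reflection else (1.0,)

    best_offset, best_error, best_reflected = 0.0, np.inf, False
    for sign in signs:
        for offset in offsets:
            candidate = sign * reconstructed_angles + offset
            error = circular_distance(true_angles, candidate).mean()
            if error < best_error:
                best_offset, best_error = float(offset), float(error)
                best_reflected = sign < 0
    return best_offset, best_error, best_reflected
```

Because the grid has 1-degree spacing, the recovered offset can be off by up to half a degree when the true offset falls between grid points.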
- driada.dim_reduction.manifold_metrics.compute_circular_correlation(angles1, angles2, offset=0.0)[source]
Compute circular correlation coefficient between two angular datasets.
Measures the similarity between two sets of circular data using complex representation. The result is invariant to a common rotation of both datasets.
- Parameters:
angles1 (np.ndarray) – First array of angles in radians.
angles2 (np.ndarray) – Second array of angles in radians. Must have same length as angles1.
offset (float, default=0.0) – Rotation offset to apply to angles2 before correlation.
- Returns:
Circular correlation coefficient in [0, 1]. Higher values indicate stronger circular correlation.
- Return type:
- Raises:
ValueError – If angles1 and angles2 have different lengths. If either array is empty. If there is no variation in the data (all values identical).
Notes
Uses complex representation z = exp(i*angle) and computes correlation in complex plane. Result is the absolute value of complex correlation.
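One way to read the note above is as a centered, normalized inner product of unit phasors. The sketch below follows that reading; it is an assumption, not the library's confirmed formula. The name circular_correlation_sketch is hypothetical, and whether the library centers the phasors before correlating is not stated in this documentation.

```python
import numpy as np


def circular_correlation_sketch(angles1, angles2, offset=0.0):
    """Hypothetical sketch: |complex correlation| of exp(i*angle) phasors."""
    a1 = np.asarray(angles1, dtype=float)
    a2 = np.asarray(angles2, dtype=float)
    if a1.shape != a2.shape:
        raise ValueError("angles1 and angles2 must have the same length")
    if a1.size == 0:
        raise ValueError("input arrays are empty")
    z1 = np.exp(1j * a1)
    z2 = np.exp(1j * (a2 + offset))
    # Center both phasor sets (assumed), then normalize like a Pearson correlation.
    z1c = z1 - z1.mean()
    z2c = z2 - z2.mean()
    denom = np.sqrt(np.sum(np.abs(z1c) ** 2) * np.sum(np.abs(z2c) ** 2))
    if denom < 1e-12:
        raise ValueError("no variation in the data")
    return float(np.abs(np.sum(z1c * np.conj(z2c))) / denom)
```

Under this definition the value lies in [0, 1] by the Cauchy-Schwarz inequality, and a constant rotation of either input only multiplies the complex correlation by a unit phase, so the absolute value is unchanged.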
Reconstruction and Alignment
- driada.dim_reduction.manifold_metrics.compute_reconstruction_error(embedding, true_variable, manifold_type='circular', allow_rotation=True, allow_reflection=True, allow_scaling=True)[source]
Compute reconstruction error between embedding and ground truth.
- Parameters:
embedding (np.ndarray) – Low-dimensional embedding
true_variable (np.ndarray) – Ground truth variable (angles or positions)
manifold_type (str) – Type of manifold (‘circular’ or ‘spatial’)
allow_rotation (bool) – Whether to allow rotation/translation
allow_reflection (bool) – Whether to allow reflection
allow_scaling (bool) – Whether to allow scaling
- Returns:
Dictionary containing:
error: reconstruction error
correlation: correlation after alignment
rotation_offset: optimal rotation (for circular)
is_reflected: whether reflection was applied
scale_factor: optimal scale (if applicable)
- Return type:
- Raises:
ValueError – If manifold_type is not ‘circular’ or ‘spatial’.
- driada.dim_reduction.manifold_metrics.compute_embedding_alignment_metrics(embedding, true_variable, manifold_type='circular', allow_rotation=True, allow_reflection=True, allow_scaling=True)[source]
Compute comprehensive alignment metrics between embedding and true variable.
- Parameters:
embedding (np.ndarray) – The embedding to evaluate. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – The true variable.
manifold_type (str, optional) – Type of manifold: ‘circular’ or ‘spatial’. Default is ‘circular’.
allow_rotation (bool, optional) – Whether to allow rotation when finding alignment. Default is True.
allow_reflection (bool, optional) – Whether to allow reflection when finding alignment. Default is True.
allow_scaling (bool, optional) – Whether to allow scaling when finding alignment. Default is True.
- Returns:
Dictionary containing all metrics from compute_reconstruction_error, plus for circular manifolds:
‘velocity_correlation’: Correlation between angular velocities
‘variance_ratio’: Ratio of embedding to true variance
- Return type:
- Raises:
ValueError – For circular manifolds:
If fewer than 3 points (cannot compute velocity correlation)
If velocity correlation is NaN (no variation in velocities)
If true variable has zero circular variance (all points at same angle)
- driada.dim_reduction.manifold_metrics.procrustes_analysis(X, Y, scaling=True, reflection=True)[source]
Perform Procrustes analysis to find optimal alignment.
- Parameters:
- Return type:
- Returns:
Y_aligned (np.ndarray) – Aligned version of Y
disparity (float) – Procrustes distance after alignment
transform_info (dict) – Dictionary containing transformation parameters:
‘scale_factor’: float, the scaling factor applied
‘is_reflected’: bool, whether reflection was detected
‘rotation_matrix’: np.ndarray, the rotation matrix R
Notes
When reflection=False and a reflection is detected, the rotation matrix is corrected using the singular value decomposition (SVD) to remove the reflection component.
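The SVD-based correction mentioned in the note is a standard step in orthogonal Procrustes alignment. The sketch below shows one common way to do it and is not taken from the library: the name procrustes_sketch is hypothetical, X and Y are assumed to have the same shape, Y is assumed to be non-degenerate (not all points identical), and the disparity is assumed to be the plain sum of squared residuals, which may differ from the library's normalization.

```python
import numpy as np


def procrustes_sketch(X, Y, scaling=True, reflection=True):
    """Hypothetical sketch: align Y to X by translation, rotation and optional scaling."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - Y.mean(axis=0)

    # Orthogonal Procrustes: R = U @ Vt minimizes ||Yc @ R - Xc||_F.
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt
    is_reflected = bool(np.linalg.det(R) < 0)
    if is_reflected and not reflection:
        # Flip the direction with the smallest singular value so that det(R) = +1.
        U[:, -1] *= -1
        s[-1] *= -1
        R = U @ Vt

    # Least-squares optimal scale for the chosen R.
    scale = s.sum() / np.sum(Yc ** 2) if scaling else 1.0
    Y_aligned = scale * Yc @ R + x_mean
    disparity = float(np.sum((X - Y_aligned) ** 2))
    info = {
        "scale_factor": float(scale),
        "is_reflected": is_reflected,
        "rotation_matrix": R,
    }
    return Y_aligned, disparity, info
```

In this sketch is_reflected reports whether the unconstrained optimum contained a reflection, which matches the "detected" wording above; with reflection=False the returned rotation_matrix is a proper rotation even when that flag is True.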
- driada.dim_reduction.manifold_metrics.manifold_reconstruction_score(embedding, true_variable, manifold_type='circular', weights=None, allow_rotation=True, allow_reflection=True, allow_scaling=True, random_state=42)[source]
Compute comprehensive manifold reconstruction score.
- Parameters:
embedding (np.ndarray) – Low-dimensional embedding. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – Ground truth variable (angles or positions).
manifold_type (str, optional) – Type of manifold (‘circular’ or ‘spatial’). Default is ‘circular’.
weights (dict, optional) – Weights for combining metrics. Keys should be ‘reconstruction_error’, ‘correlation’, and ‘decoding_accuracy’. If None, uses default weights that sum to 1.0.
allow_rotation (bool, optional) – Whether to allow rotation when finding alignment. Default is True.
allow_reflection (bool, optional) – Whether to allow reflection when finding alignment. Default is True.
allow_scaling (bool, optional) – Whether to allow scaling when finding alignment. Default is True.
random_state (int, optional) – Random seed for train/test split reproducibility. Default is 42.
- Returns:
Dictionary containing:
‘reconstruction_error’: float, reconstruction error after alignment
‘correlation’: float, correlation after alignment
‘rotation_offset’: float, rotation offset applied
‘is_reflected’: bool, whether reflection was applied
‘decoding_train_error’: float, decoder training error
‘decoding_test_error’: float, decoder test error
‘generalization_gap’: float, decoder generalization gap
‘overall_reconstruction_score’: float, weighted combination of metrics
- Return type:
Notes
For spatial manifolds, assumes data is normalized to unit scale for error normalization. Negative correlations are treated as 0 in scoring. The overall score is normalized to [0, 1] range where 1 is perfect.
Decoding and Quality Assessment
- driada.dim_reduction.manifold_metrics.compute_decoding_accuracy(embedding, true_variable, manifold_type='circular', train_fraction=0.8, random_state=42)[source]
Compute decoding accuracy using a simple linear decoder.
This function trains a decoder to map from the embedding space to the true variables and measures how well it generalizes.
- Parameters:
embedding (np.ndarray) – Low-dimensional embedding. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – Ground truth variable (angles for circular, positions for spatial). Shape: (n_samples,) for circular or 1D spatial.
manifold_type (str, optional) – Type of manifold (‘circular’ or ‘spatial’). Default is ‘circular’.
train_fraction (float, optional) – Fraction of data to use for training. Default is 0.8. Must be between 0 and 1.
random_state (int, optional) – Random seed for train/test split reproducibility. Default is 42.
- Returns:
Dictionary containing:
‘train_error’: float, training reconstruction error
‘test_error’: float, testing reconstruction error
‘test_r2’: float, proper R² score on test set
‘generalization_gap’: float, difference (test_error - train_error)
- Return type:
Notes
Uses random (not sequential) split of data to avoid domain shift issues that can occur with temporal/spatial data. For reproducibility, the random_state parameter controls the split.
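A seeded random split and the resulting generalization gap can be illustrated as below. This is a sketch under assumptions rather than the library's procedure: the helper names random_split_indices and decoding_errors_sketch are hypothetical, the split uses NumPy's default_rng (the library may use a different splitter), the decoder is an ordinary least-squares fit with an intercept, and the error is taken to be the mean squared error for a 1D spatial target.

```python
import numpy as np


def random_split_indices(n_samples, train_fraction=0.8, random_state=42):
    """Hypothetical helper: reproducible random train/test index split."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return perm[:n_train], perm[n_train:]


def decoding_errors_sketch(embedding, target, train_fraction=0.8, random_state=42):
    """Hypothetical sketch: linear decoder fit on the train split, scored on both splits."""
    embedding = np.asarray(embedding, dtype=float)
    target = np.asarray(target, dtype=float)
    train_idx, test_idx = random_split_indices(len(embedding), train_fraction, random_state)
    # Append a column of ones so the least-squares fit includes an intercept.
    A = np.column_stack([embedding, np.ones(len(embedding))])
    coef, *_ = np.linalg.lstsq(A[train_idx], target[train_idx], rcond=None)
    train_error = float(np.mean((A[train_idx] @ coef - target[train_idx]) ** 2))
    test_error = float(np.mean((A[test_idx] @ coef - target[test_idx]) ** 2))
    return {
        "train_error": train_error,
        "test_error": test_error,
        "generalization_gap": test_error - train_error,
    }
```

Shuffling before splitting means both subsets sample the whole range of the data, which is the point of the note: a sequential split of a trajectory could place the test set in a region the decoder never saw.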
- driada.dim_reduction.manifold_metrics.compute_embedding_quality(embedding, true_variable, manifold_type='circular', train_fraction=0.8, allow_rotation=True, allow_reflection=True, allow_scaling=True, random_state=42)[source]
Evaluate embedding quality using train/test split.
- Parameters:
embedding (np.ndarray) – The embedding to evaluate. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – The true variable.
manifold_type (str, optional) – Type of manifold: ‘circular’ or ‘spatial’. Default is ‘circular’.
train_fraction (float, optional) – Fraction of data to use for training. Default is 0.8.
allow_rotation (bool, optional) – Whether to allow rotation when finding alignment. Default is True.
allow_reflection (bool, optional) – Whether to allow reflection when finding alignment. Default is True.
allow_scaling (bool, optional) – Whether to allow scaling when finding alignment. Default is True.
random_state (int, optional) – Random seed for train/test split reproducibility. Default is 42.
- Returns:
Dictionary containing:
‘train_error’: Reconstruction error on training set
‘test_error’: Reconstruction error on test set
‘generalization_gap’: Difference between test and train error (can be negative if test error is lower)
- Return type:
Notes
Uses random (not sequential) split of data to avoid domain shift issues that can occur with temporal/spatial data. The alignment is computed separately for train and test sets.
- driada.dim_reduction.manifold_metrics.train_simple_decoder(embedding, true_variable, manifold_type='circular')[source]
Train a simple decoder to reconstruct true variable from embedding.
- Parameters:
embedding (np.ndarray) – The embedding features. Shape: (n_samples, n_dims).
true_variable (np.ndarray) – The true variable to reconstruct. For circular manifolds, should be angles in radians. For spatial manifolds, assumes 1D array.
manifold_type (str, optional) – Type of manifold: ‘circular’ or ‘spatial’. Default is ‘circular’.
- Returns:
A decoder function that takes an embedding and returns reconstructed variable. The decoder expects input of the same shape as the training embedding.
- Return type:
callable
- Raises:
ValueError – If embedding and true_variable have different numbers of samples, or if manifold_type is not ‘circular’ or ‘spatial’.
Notes
For circular manifolds, trains separate regressors for sin and cos components. For spatial manifolds, performs direct regression. The embedding is always standardized before training.
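The sin/cos strategy for circular targets can be sketched with plain least squares. This is illustrative only: the name train_circular_decoder_sketch is hypothetical, and ordinary linear regression is an assumed stand-in, since this documentation does not say which regressor the library uses. Regressing sin and cos separately and recombining them with atan2 avoids the discontinuity at ±π that direct regression on the angle would suffer from.

```python
import numpy as np


def train_circular_decoder_sketch(embedding, true_angles):
    """Hypothetical sketch: linear regressors for sin and cos, recombined with atan2."""
    embedding = np.asarray(embedding, dtype=float)
    true_angles = np.asarray(true_angles, dtype=float)
    if embedding.shape[0] != true_angles.shape[0]:
        raise ValueError("embedding and true_variable must have the same number of samples")

    # Standardize features using training statistics (guard against zero variance).
    mean = embedding.mean(axis=0)
    std = embedding.std(axis=0)
    std[std == 0] = 1.0

    def design(E):
        Z = (np.asarray(E, dtype=float) - mean) / std
        return np.column_stack([Z, np.ones(len(Z))])  # intercept column

    # One least-squares fit per target column: sin(angle) and cos(angle).
    targets = np.column_stack([np.sin(true_angles), np.cos(true_angles)])
    coef, *_ = np.linalg.lstsq(design(embedding), targets, rcond=None)

    def decoder(E):
        pred = design(E) @ coef
        return np.arctan2(pred[:, 0], pred[:, 1])

    return decoder
```

The returned closure stores the training mean and standard deviation, so new embeddings passed to it are standardized in the same way as the training data.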