Dimensionality Reduction Algorithms

Core algorithms and method registry for dimensionality reduction.

Method Registry

driada.dim_reduction.dr_base.METHODS_DICT = dict

Dictionary mapping method names to their DRMethod configurations.

Available methods:

  • pca - Principal Component Analysis

  • mds - Multi-dimensional Scaling

  • isomap - Isometric Feature Mapping

  • lle - Locally Linear Embedding

  • hlle - Hessian Locally Linear Embedding

  • le - Laplacian Eigenmaps

  • dmaps - Diffusion Maps

  • mvu - Maximum Variance Unfolding

  • tsne - t-Distributed Stochastic Neighbor Embedding

  • umap - Uniform Manifold Approximation and Projection

  • ae - Autoencoder

  • vae - Variational Autoencoder

Base Classes

class driada.dim_reduction.dr_base.DRMethod(is_linear, requires_graph, requires_distmat, nn_based, handles_disconnected_graphs=0, default_params=None, default_graph_params=None, default_metric_params=None)[source]

Dimensionality reduction method configuration.

is_linear

Whether the method is linear

Type:

bool

requires_graph

Whether the method requires a proximity graph

Type:

bool

requires_distmat

Whether the method requires a distance matrix

Type:

bool

nn_based

Whether the method is neural network based

Type:

bool

handles_disconnected_graphs

Whether the method can handle disconnected graphs without preprocessing

Type:

bool

default_params

Default parameters for the embedding method

Type:

dict

default_graph_params

Default graph construction parameters (if requires_graph)

Type:

dict or None

default_metric_params

Default metric parameters (if requires weights)

Type:

dict or None

Notes

Flag attributes are stored as booleans. The constructor also accepts 0/1 integer values for backward compatibility.

__init__(is_linear, requires_graph, requires_distmat, nn_based, handles_disconnected_graphs=0, default_params=None, default_graph_params=None, default_metric_params=None)[source]

Initialize a DRMethod configuration object.

Parameters:
  • is_linear (int or bool) – Whether the method is linear (1/True) or nonlinear (0/False).

  • requires_graph (int or bool) – Whether the method requires a proximity graph (1/True) or not (0/False).

  • requires_distmat (int or bool) – Whether the method requires a distance matrix (1/True) or not (0/False).

  • nn_based (int or bool) – Whether the method is neural network based (1/True) or not (0/False).

  • handles_disconnected_graphs (int or bool, default=0) – Whether the method can handle disconnected graphs without preprocessing.

  • default_params (dict or None, default=None) – Default parameters for the embedding method. If None, uses empty dict.

  • default_graph_params (dict or None, default=None) – Default graph construction parameters (if requires_graph).

  • default_metric_params (dict or None, default=None) – Default metric parameters (if requires weights).
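
The attribute and parameter descriptions above can be illustrated with a small stand-in class. This is a minimal sketch, not the library's implementation: the name DRMethodSketch is hypothetical, and the flag values chosen for the PCA-like example are assumptions made for illustration. It shows only the two documented behaviours: 0/1 flags end up as booleans, and a missing default_params becomes an empty dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DRMethodSketch:
    """Hypothetical stand-in for DRMethod (illustration only)."""

    is_linear: bool
    requires_graph: bool
    requires_distmat: bool
    nn_based: bool
    handles_disconnected_graphs: bool = False
    default_params: Optional[dict] = None
    default_graph_params: Optional[dict] = None
    default_metric_params: Optional[dict] = None

    def __post_init__(self):
        # Coerce legacy 0/1 integer flags to real booleans.
        for name in ("is_linear", "requires_graph", "requires_distmat",
                     "nn_based", "handles_disconnected_graphs"):
            setattr(self, name, bool(getattr(self, name)))
        # A missing default_params becomes an empty dict, as documented.
        if self.default_params is None:
            self.default_params = {}


# Assumed flags for a PCA-like method: linear, no graph, no distance matrix.
pca_like = DRMethodSketch(1, 0, 0, 0, default_params={"dim": 2})
graph_like = DRMethodSketch(0, 1, 0, 0)
```

Here pca_like.is_linear is the boolean True rather than the integer 1, and graph_like.default_params is an empty dict.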

Sequential Processing

driada.dim_reduction.sequences.dr_sequence(data, steps, logger=None, keep_intermediate=False, validate_compatibility=True)[source]

Perform sequential dimensionality reduction with improved validation and error handling.

Applies multiple dimensionality reduction steps in sequence, where each step operates on the output of the previous step.

Parameters:
  • data (MVData) – Initial high-dimensional data

  • steps (List[Union[Tuple[str, Dict], str]]) – List of reduction steps. Each step can be:

      - A tuple of (method_name, parameters_dict)

      - A string method name (uses default parameters)

  • logger (logging.Logger, optional) – Logger for tracking progress

  • keep_intermediate (bool, default False) – If True, returns a tuple of (final_embedding, intermediate_embeddings). If False, only returns final embedding to save memory.

  • validate_compatibility (bool, default True) – If True, validates dimension compatibility between consecutive steps.

Returns:

If keep_intermediate=False: Final embedding after all reduction steps

If keep_intermediate=True: (final_embedding, list_of_intermediate_embeddings)

Return type:

Embedding or Tuple[Embedding, List[Embedding]]

Raises:
  • ValueError – Raised in any of the following cases:

      - The steps list is empty.

      - A step has an invalid format (not a string or a (method, params) tuple).

      - A method name is not recognized.

      - The dimension compatibility check fails between steps.

  • RuntimeError – If any step fails during execution, with context about which step failed.

Examples

Create sample data and perform sequential reduction:

>>> import numpy as np
>>> from driada.dim_reduction import MVData, dr_sequence
>>> np.random.seed(42)
>>>
>>> # Create sample high-dimensional data: 50 features (rows) x 100 samples (columns)
>>> data = np.random.randn(50, 100)
>>> mvdata = MVData(data)
>>>
>>> # Simple two-step reduction: PCA then t-SNE
>>> import logging
>>> # Suppress output for clean doctest
>>> null_logger = logging.getLogger('null')
>>> null_logger.setLevel(logging.CRITICAL)
>>> embedding = dr_sequence(
...     mvdata,
...     steps=[
...         ('pca', {'dim': 10}),
...         ('tsne', {'dim': 2, 'perplexity': 20, 'random_state': 42})
...     ],
...     logger=null_logger
... )  
Calculating PCA embedding...
>>> embedding.coords.shape
(2, 100)

Using default parameters with a simpler sequence:

>>> # Just PCA reduction
>>> embedding_pca = dr_sequence(
...     mvdata,
...     steps=['pca'],
...     logger=null_logger
... )  
Calculating PCA embedding...
>>> embedding_pca.coords.shape  # Default is 2D
(2, 100)

Keep intermediate results for analysis:

>>> # Two-step reduction keeping intermediates
>>> final_emb, intermediates = dr_sequence(
...     mvdata,
...     steps=[('pca', {'dim': 20}), ('pca', {'dim': 3})],
...     keep_intermediate=True,
...     logger=null_logger
... )  
Calculating PCA embedding...
Calculating PCA embedding...
>>> len(intermediates)
2
>>> intermediates[0].coords.shape
(20, 100)
>>> final_emb.coords.shape
(3, 100)

Notes

  • Intermediate results are converted to MVData between steps.

  • Progress is logged with the actual dimensions of each step.

  • All method names are validated before any step is executed.

  • Dimension compatibility checking is optional (validate_compatibility).

  • Only the final embedding is kept by default (keep_intermediate=False), which saves memory.
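
The up-front validation of steps described under Raises can be sketched without the library. The helper below is hypothetical (normalize_steps and KNOWN_METHODS are names invented for this illustration, and the method set is an assumed subset of the registry keys). It only shows how the two accepted step formats might be brought to a common (method_name, params) form before anything runs.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical subset of registry keys, used only for this sketch.
KNOWN_METHODS = {"pca", "mds", "isomap", "tsne", "umap"}

Step = Union[str, Tuple[str, Dict]]


def normalize_steps(steps: List[Step]) -> List[Tuple[str, Dict]]:
    """Return steps as (method_name, params) pairs, validating up front."""
    if not steps:
        raise ValueError("steps list is empty")
    normalized = []
    for i, step in enumerate(steps):
        if isinstance(step, str):
            # Bare method name: no user parameters, defaults apply later.
            name, params = step, {}
        elif (isinstance(step, tuple) and len(step) == 2
              and isinstance(step[0], str) and isinstance(step[1], dict)):
            name, params = step[0], dict(step[1])
        else:
            raise ValueError(f"step {i} must be a str or (method, params) tuple")
        if name not in KNOWN_METHODS:
            raise ValueError(f"step {i}: unknown method {name!r}")
        normalized.append((name, params))
    return normalized


plan = normalize_steps([("pca", {"dim": 10}), "tsne"])
```

Validating every step before executing the first one means a typo in the last method name fails immediately rather than after the earlier, possibly expensive, reductions have run.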

Helper Functions

driada.dim_reduction.dr_base.merge_params_with_defaults(method_name, user_params=None)[source]

Merge user parameters with method defaults.

Parameters:
  • method_name (str) – Name of the DR method. Must be one of the keys in METHODS_DICT.

  • user_params (dict or None) – User-provided parameters. Can contain ‘e_params’, ‘g_params’, ‘m_params’ keys for structured format, or direct parameter values for flat format.

Returns:

Dictionary with ‘e_params’, ‘g_params’, ‘m_params’ keys containing merged parameters.

Return type:

dict

Raises:

ValueError – If method_name is not found in METHODS_DICT.

Notes

The function supports two input formats:

  1. Structured format with explicit parameter groups: {‘e_params’: {…}, ‘g_params’: {…}, ‘m_params’: {…}}

  2. Flat format where parameters are auto-distributed:

       - ‘n_neighbors’ → g_params[‘nn’]

       - ‘metric’ → m_params[‘metric_name’]

       - ‘sigma’ → m_params[‘sigma’]

       - ‘max_deleted_nodes’ → g_params[‘max_deleted_nodes’]

       - All others → e_params

The function also sets graph_preprocessing based on the method’s handles_disconnected_graphs property:

  • If True: graph_preprocessing = None

  • If False: graph_preprocessing = ‘giant_cc’
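
The flat-format routing in the Notes can be written as a small lookup table. The following is a sketch under stated assumptions: distribute_flat_params and ROUTING are hypothetical names, and the function covers only the key routing. It does not merge method defaults or set graph_preprocessing, both of which merge_params_with_defaults is documented to do.

```python
# Flat keys that are rerouted, per the Notes: flat key -> (group, target key).
ROUTING = {
    "n_neighbors": ("g_params", "nn"),
    "metric": ("m_params", "metric_name"),
    "sigma": ("m_params", "sigma"),
    "max_deleted_nodes": ("g_params", "max_deleted_nodes"),
}


def distribute_flat_params(user_params=None):
    """Split a flat parameter dict into e_params / g_params / m_params."""
    out = {"e_params": {}, "g_params": {}, "m_params": {}}
    for key, value in (user_params or {}).items():
        # Anything without an explicit route is an embedding parameter.
        group, target = ROUTING.get(key, ("e_params", key))
        out[group][target] = value
    return out


split = distribute_flat_params(
    {"dim": 3, "n_neighbors": 15, "metric": "l2", "sigma": 1.0}
)
```

With this input, dim lands in e_params, n_neighbors is renamed to nn inside g_params, and metric is renamed to metric_name inside m_params alongside sigma.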

driada.dim_reduction.dr_base.e_param_filter(para)[source]

Filter parameters to keep only those relevant for the embedding method.

Different dimensionality reduction methods require different parameters. This function ensures only the appropriate parameters are passed to avoid errors or warnings from unused parameters.

Parameters:

para (dict) – Dictionary containing all embedding parameters. Must include ‘e_method_name’ key.

Returns:

Filtered dictionary containing only parameters relevant to the specified embedding method.

Return type:

dict

Raises:

KeyError – If ‘e_method_name’ key is missing from para dict.

Notes

All methods support: ‘e_method’, ‘e_method_name’, ‘dim’ (target dimension).

Method-specific parameters:

  • ‘umap’: adds ‘min_dist’ (minimum distance in low-dimensional space)

  • ‘dmaps’, ‘auto_dmaps’: adds ‘dm_alpha’ (diffusion maps alpha parameter) and ‘dm_t’ (diffusion time)

Unknown methods are accepted and will receive only the base parameters.
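
The filtering rule in these Notes amounts to a per-method allow-list. A minimal sketch, assuming nothing beyond the keys listed above (filter_e_params and the two constants are hypothetical names, not the library's code):

```python
# Keys every embedding method keeps, per the Notes.
E_BASE_KEYS = {"e_method", "e_method_name", "dim"}

# Extra keys kept for specific methods, per the Notes.
E_EXTRA_KEYS = {
    "umap": {"min_dist"},
    "dmaps": {"dm_alpha", "dm_t"},
    "auto_dmaps": {"dm_alpha", "dm_t"},
}


def filter_e_params(para):
    """Keep only the embedding parameters relevant to para['e_method_name']."""
    name = para["e_method_name"]  # raises KeyError if the key is missing
    # Unknown methods fall back to the base keys only.
    allowed = E_BASE_KEYS | E_EXTRA_KEYS.get(name, set())
    return {k: v for k, v in para.items() if k in allowed}


umap_params = filter_e_params(
    {"e_method_name": "umap", "dim": 2, "min_dist": 0.1, "dm_t": 1}
)
```

In this call dm_t is dropped because it belongs to diffusion maps, while min_dist is kept for umap.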

driada.dim_reduction.dr_base.g_param_filter(para)[source]

Filter parameters to keep only those relevant for the graph method.

Different graph construction methods require different parameters. This function ensures only the appropriate parameters are passed to avoid errors or warnings from unused parameters.

Parameters:

para (dict) – Dictionary containing all graph construction parameters. Must include ‘g_method_name’ key.

Returns:

Filtered dictionary containing only parameters relevant to the specified graph construction method.

Return type:

dict

Raises:

KeyError – If ‘g_method_name’ key is missing from para dict.

Notes

Supported graph methods and their specific parameters:

  • ‘knn’, ‘auto_knn’, ‘umap’: requires ‘nn’ (number of neighbors)

  • ‘eps’: requires ‘eps’ (radius) and ‘min_density’ (minimum graph density)

  • ‘eknn’: requires ‘eps’, ‘min_density’, and ‘nn’

  • ‘tsne’: requires ‘perplexity’

All methods support: ‘g_method_name’, ‘max_deleted_nodes’, ‘weighted’, ‘dist_to_aff’, ‘graph_preprocessing’, ‘seed’.

Unknown methods are accepted and will receive only the base parameters.
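
The graph-method table in these Notes can be expressed as an allow-list in the same way. This is an illustrative sketch only: filter_g_params and the constants are hypothetical names, and the key sets are copied from the Notes rather than from the library source.

```python
# Keys every graph method keeps, per the Notes.
G_BASE_KEYS = {"g_method_name", "max_deleted_nodes", "weighted",
               "dist_to_aff", "graph_preprocessing", "seed"}

# Method-specific keys, per the Notes.
G_EXTRA_KEYS = {
    "knn": {"nn"},
    "auto_knn": {"nn"},
    "umap": {"nn"},
    "eps": {"eps", "min_density"},
    "eknn": {"eps", "min_density", "nn"},
    "tsne": {"perplexity"},
}


def filter_g_params(para):
    """Keep only the graph parameters relevant to para['g_method_name']."""
    name = para["g_method_name"]  # raises KeyError if the key is missing
    allowed = G_BASE_KEYS | G_EXTRA_KEYS.get(name, set())
    return {k: v for k, v in para.items() if k in allowed}


eps_params = filter_g_params(
    {"g_method_name": "eps", "eps": 0.5, "min_density": 0.1,
     "nn": 10, "weighted": 0}
)
```

For the ‘eps’ method the nn entry is discarded, since only ‘knn’-style and ‘eknn’ graphs use a neighbor count.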

driada.dim_reduction.dr_base.m_param_filter(para)[source]

Filter parameters to keep only those relevant for the chosen distance matrix construction method.

Parameters:

para (dict) – Dictionary with metric parameters, including:

  • metric_name: str or callable - name of metric or custom metric function

  • sigma: float or None - bandwidth parameter

  • p: float - parameter for minkowski metric

  • Other metric-specific parameters

Returns:

Filtered parameters appropriate for the chosen metric

Return type:

dict

Raises:
  • KeyError – If ‘metric_name’ key is missing from para dict.

  • ValueError – If metric_name is unknown (not in named_distances, not ‘hyperbolic’, and not callable).

Notes

The special metric ‘hyperbolic’ is supported in addition to the standard pynndescent named_distances. Custom callable metrics are also supported.