Dimensionality Reduction Algorithms
Core algorithms and method registry for dimensionality reduction.
Method Registry
- driada.dim_reduction.dr_base.METHODS_DICT = dict
Dictionary mapping method names to their DRMethod configurations.
Available methods:
pca - Principal Component Analysis
mds - Multi-dimensional Scaling
isomap - Isometric Feature Mapping
lle - Locally Linear Embedding
hlle - Hessian Locally Linear Embedding
le - Laplacian Eigenmaps
dmaps - Diffusion Maps
mvu - Maximum Variance Unfolding
tsne - t-Distributed Stochastic Neighbor Embedding
umap - Uniform Manifold Approximation and Projection
ae - Autoencoder
vae - Variational Autoencoder
Base Classes
- class driada.dim_reduction.dr_base.DRMethod(is_linear, requires_graph, requires_distmat, nn_based, handles_disconnected_graphs=0, default_params=None, default_graph_params=None, default_metric_params=None)
Dimensionality reduction method configuration.
- handles_disconnected_graphs
Whether the method can handle disconnected graphs without preprocessing
- Type: bool
Notes
Boolean attributes are stored internally but accept 0/1 integer values for backward compatibility.
- __init__(is_linear, requires_graph, requires_distmat, nn_based, handles_disconnected_graphs=0, default_params=None, default_graph_params=None, default_metric_params=None)
Initialize a DRMethod configuration object.
- Parameters:
is_linear (int or bool) – Whether the method is linear (1/True) or nonlinear (0/False).
requires_graph (int or bool) – Whether the method requires a proximity graph (1/True) or not (0/False).
requires_distmat (int or bool) – Whether the method requires a distance matrix (1/True) or not (0/False).
nn_based (int or bool) – Whether the method is neural network based (1/True) or not (0/False).
handles_disconnected_graphs (int or bool, default=0) – Whether the method can handle disconnected graphs without preprocessing.
default_params (dict or None, default=None) – Default parameters for the embedding method. If None, uses empty dict.
default_graph_params (dict or None, default=None) – Default graph construction parameters (used if the method requires a graph).
default_metric_params (dict or None, default=None) – Default metric parameters (used if the method requires weights).
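The 0/1 compatibility described for these parameters can be pictured with a small stand-in class. This is a sketch rather than driada's implementation: `MethodConfig` is a hypothetical name, the flag values in the example are illustrative only, and replacing None with an empty dict for the graph and metric defaults is an assumption (the source states this only for default_params).

```python
class MethodConfig:
    """Hypothetical stand-in for a DRMethod-style configuration object."""

    def __init__(self, is_linear, requires_graph, requires_distmat, nn_based,
                 handles_disconnected_graphs=0, default_params=None,
                 default_graph_params=None, default_metric_params=None):
        # Accept 0/1 or False/True and store real booleans.
        self.is_linear = bool(is_linear)
        self.requires_graph = bool(requires_graph)
        self.requires_distmat = bool(requires_distmat)
        self.nn_based = bool(nn_based)
        self.handles_disconnected_graphs = bool(handles_disconnected_graphs)
        # None becomes a fresh dict per instance (assumed for all three groups).
        self.default_params = dict(default_params or {})
        self.default_graph_params = dict(default_graph_params or {})
        self.default_metric_params = dict(default_metric_params or {})


# Illustrative flags for a linear method that needs no graph or distance matrix.
linear_method = MethodConfig(1, 0, 0, 0, default_params={"dim": 2})
```

Copying the dictionaries in the constructor keeps two configuration objects from sharing one mutable default.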
Sequential Processing
- driada.dim_reduction.sequences.dr_sequence(data, steps, logger=None, keep_intermediate=False, validate_compatibility=True)
Perform sequential dimensionality reduction with validation and error handling.
Applies multiple dimensionality reduction steps in sequence, where each step operates on the output of the previous step.
- Parameters:
data (MVData) – Initial high-dimensional data
steps (List[Union[Tuple[str, Dict], str]]) – List of reduction steps. Each step can be:
- A tuple of (method_name, parameters_dict)
- A string method name (uses default parameters)
logger (logging.Logger, optional) – Logger for tracking progress
keep_intermediate (bool, default False) – If True, returns a tuple of (final_embedding, intermediate_embeddings). If False, only returns final embedding to save memory.
validate_compatibility (bool, default True) – If True, validates dimension compatibility between consecutive steps.
- Returns:
If keep_intermediate=False: the final embedding after all reduction steps.
If keep_intermediate=True: (final_embedding, list_of_intermediate_embeddings).
- Return type:
- Raises:
ValueError – If the steps list is empty, if any step has an invalid format (neither a string nor a (method, params) tuple), if a method name is not recognized, or if the dimension compatibility check fails between steps.
RuntimeError – If any step fails during execution, with context about which step failed.
Examples
Create sample data and perform sequential reduction:
>>> import numpy as np
>>> from driada.dim_reduction import MVData, dr_sequence
>>> np.random.seed(42)
>>>
>>> # Create sample high-dimensional data (100 samples, 50 features)
>>> data = np.random.randn(50, 100)
>>> mvdata = MVData(data)
>>>
>>> # Simple two-step reduction: PCA then t-SNE
>>> import logging
>>> # Suppress output for clean doctest
>>> null_logger = logging.getLogger('null')
>>> null_logger.setLevel(logging.CRITICAL)
>>> embedding = dr_sequence(
...     mvdata,
...     steps=[
...         ('pca', {'dim': 10}),
...         ('tsne', {'dim': 2, 'perplexity': 20, 'random_state': 42})
...     ],
...     logger=null_logger
... )
Calculating PCA embedding...
>>> embedding.coords.shape
(2, 100)
Using default parameters with a simpler sequence:
>>> # Just PCA reduction
>>> embedding_pca = dr_sequence(
...     mvdata,
...     steps=['pca'],
...     logger=null_logger
... )
Calculating PCA embedding...
>>> embedding_pca.coords.shape  # Default is 2D
(2, 100)
Keep intermediate results for analysis:
>>> # Two-step reduction keeping intermediates
>>> final_emb, intermediates = dr_sequence(
...     mvdata,
...     steps=[('pca', {'dim': 20}), ('pca', {'dim': 3})],
...     keep_intermediate=True,
...     logger=null_logger
... )
Calculating PCA embedding...
Calculating PCA embedding...
>>> len(intermediates)
2
>>> intermediates[0].coords.shape
(20, 100)
>>> final_emb.coords.shape
(3, 100)
Notes
Intermediate results are converted to MVData between steps.
Progress is logged with the actual dimensions of each step.
All method names are validated before any step is executed.
Dimension compatibility checking is optional (validate_compatibility).
Only the final embedding is kept by default (keep_intermediate=False), which saves memory.
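The accepted step formats and the up-front name validation can be sketched in plain Python. This is an illustration under assumptions, not the library's code: `normalize_steps` and `KNOWN_METHODS` are hypothetical names, and the set holds only a few of the method names from the registry.

```python
from typing import Dict, List, Tuple, Union

Step = Union[str, Tuple[str, Dict]]

# Hypothetical subset of registered method names, for illustration only.
KNOWN_METHODS = {"pca", "tsne", "umap"}


def normalize_steps(steps: List[Step]) -> List[Tuple[str, Dict]]:
    """Turn every step into a (method_name, params) pair, validating first."""
    if not steps:
        raise ValueError("steps list is empty")
    normalized = []
    for i, step in enumerate(steps):
        if isinstance(step, str):
            # Bare method name: no user parameters, defaults apply later.
            name, params = step, {}
        elif (isinstance(step, tuple) and len(step) == 2
              and isinstance(step[0], str) and isinstance(step[1], dict)):
            name, params = step[0], dict(step[1])
        else:
            raise ValueError(f"step {i} has invalid format: {step!r}")
        if name not in KNOWN_METHODS:
            raise ValueError(f"step {i}: unknown method {name!r}")
        normalized.append((name, params))
    return normalized


plan = normalize_steps(["pca", ("tsne", {"dim": 2, "perplexity": 20})])
```

Validating the whole list before running anything means a typo in the last step fails immediately rather than after the earlier, possibly slow, reductions have finished.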
Helper Functions
- driada.dim_reduction.dr_base.merge_params_with_defaults(method_name, user_params=None)
Merge user parameters with method defaults.
- Parameters:
- Returns:
Dictionary with ‘e_params’, ‘g_params’, ‘m_params’ keys containing merged parameters.
- Return type: dict
- Raises:
ValueError – If method_name is not found in METHODS_DICT.
Notes
The function supports two input formats:
Structured format with explicit parameter groups: {‘e_params’: {…}, ‘g_params’: {…}, ‘m_params’: {…}}
Flat format where parameters are auto-distributed:
- ‘n_neighbors’ → g_params[‘nn’]
- ‘metric’ → m_params[‘metric_name’]
- ‘sigma’ → m_params[‘sigma’]
- ‘max_deleted_nodes’ → g_params[‘max_deleted_nodes’]
- All others → e_params
The function also sets graph_preprocessing based on the method’s handles_disconnected_graphs property:
- If True: graph_preprocessing = None
- If False: graph_preprocessing = ‘giant_cc’
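The flat-format routing described above can be sketched as a lookup table. This is a simplified illustration, not the library's implementation: `distribute_flat_params` and `FLAT_KEY_ROUTES` are hypothetical names, merging with the method's defaults is omitted, and placing graph_preprocessing inside g_params is an assumption.

```python
# Flat key -> (parameter group, key inside that group).
FLAT_KEY_ROUTES = {
    "n_neighbors": ("g_params", "nn"),
    "metric": ("m_params", "metric_name"),
    "sigma": ("m_params", "sigma"),
    "max_deleted_nodes": ("g_params", "max_deleted_nodes"),
}


def distribute_flat_params(user_params, handles_disconnected_graphs=False):
    """Route flat user parameters into embedding, graph and metric groups."""
    merged = {"e_params": {}, "g_params": {}, "m_params": {}}
    for key, value in user_params.items():
        # Keys without a special route go to the embedding parameters.
        group, target_key = FLAT_KEY_ROUTES.get(key, ("e_params", key))
        merged[group][target_key] = value
    # Methods that cannot handle disconnected graphs keep the giant component.
    merged["g_params"]["graph_preprocessing"] = (
        None if handles_disconnected_graphs else "giant_cc"
    )
    return merged


routed = distribute_flat_params(
    {"dim": 2, "n_neighbors": 15, "metric": "euclidean"}
)
```

Here `routed["g_params"]` holds `nn` and `graph_preprocessing`, `routed["m_params"]` holds `metric_name`, and `dim` stays in `routed["e_params"]`.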
- driada.dim_reduction.dr_base.e_param_filter(para)
Filter parameters to keep only those relevant for the embedding method.
Different dimensionality reduction methods require different parameters. This function ensures only the appropriate parameters are passed to avoid errors or warnings from unused parameters.
- Parameters:
para (dict) – Dictionary containing all embedding parameters. Must include ‘e_method_name’ key.
- Returns:
Filtered dictionary containing only parameters relevant to the specified embedding method.
- Return type: dict
- Raises:
KeyError – If ‘e_method_name’ key is missing from para dict.
Notes
All methods support: ‘e_method’, ‘e_method_name’, ‘dim’ (target dimension).
Method-specific parameters:
‘umap’: adds ‘min_dist’ (minimum distance in low-dimensional space)
‘dmaps’, ‘auto_dmaps’: adds ‘dm_alpha’ (diffusion maps alpha parameter) and ‘dm_t’ (diffusion time)
Unknown methods are accepted and will receive only the base parameters.
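The filtering rule in these notes amounts to a whitelist: base keys plus any method-specific keys. A minimal sketch, assuming hypothetical names (`filter_embedding_params`, `BASE_E_KEYS`, `EXTRA_E_KEYS`) and no behaviour beyond what the notes state:

```python
# Keys every embedding method keeps.
BASE_E_KEYS = {"e_method", "e_method_name", "dim"}

# Additional keys kept for specific methods.
EXTRA_E_KEYS = {
    "umap": {"min_dist"},
    "dmaps": {"dm_alpha", "dm_t"},
    "auto_dmaps": {"dm_alpha", "dm_t"},
}


def filter_embedding_params(para):
    """Keep only the keys relevant to para['e_method_name']."""
    method = para["e_method_name"]  # raises KeyError if the key is missing
    # Unknown methods fall back to the base keys only.
    allowed = BASE_E_KEYS | EXTRA_E_KEYS.get(method, set())
    return {key: value for key, value in para.items() if key in allowed}


umap_params = filter_embedding_params(
    {"e_method_name": "umap", "dim": 2, "min_dist": 0.1, "dm_t": 3}
)
```

In this call `dm_t` is dropped because it belongs to diffusion maps, while `min_dist` survives.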
- driada.dim_reduction.dr_base.g_param_filter(para)
Filter parameters to keep only those relevant for the graph method.
Different graph construction methods require different parameters. This function ensures only the appropriate parameters are passed to avoid errors or warnings from unused parameters.
- Parameters:
para (dict) – Dictionary containing all graph construction parameters. Must include ‘g_method_name’ key.
- Returns:
Filtered dictionary containing only parameters relevant to the specified graph construction method.
- Return type: dict
- Raises:
KeyError – If ‘g_method_name’ key is missing from para dict.
Notes
Supported graph methods and their specific parameters:
- ‘knn’, ‘auto_knn’, ‘umap’: requires ‘nn’ (number of neighbors)
- ‘eps’: requires ‘eps’ (radius) and ‘min_density’ (minimum graph density)
- ‘eknn’: requires ‘eps’, ‘min_density’, and ‘nn’
- ‘tsne’: requires ‘perplexity’
All methods support: ‘g_method_name’, ‘max_deleted_nodes’, ‘weighted’, ‘dist_to_aff’, ‘graph_preprocessing’, ‘seed’.
Unknown methods are accepted and will receive only the base parameters.
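To show how one parameter dictionary yields different results per graph method, the sketch below applies the key sets listed in these notes. The names `filter_graph_params`, `BASE_G_KEYS` and `EXTRA_G_KEYS` are hypothetical, and the parameter values are made up for illustration.

```python
# Keys every graph construction method keeps.
BASE_G_KEYS = {
    "g_method_name", "max_deleted_nodes", "weighted",
    "dist_to_aff", "graph_preprocessing", "seed",
}

# Additional keys kept for specific graph methods.
EXTRA_G_KEYS = {
    "knn": {"nn"},
    "auto_knn": {"nn"},
    "umap": {"nn"},
    "eps": {"eps", "min_density"},
    "eknn": {"eps", "min_density", "nn"},
    "tsne": {"perplexity"},
}


def filter_graph_params(para):
    """Keep only the keys relevant to para['g_method_name']."""
    method = para["g_method_name"]  # raises KeyError if the key is missing
    allowed = BASE_G_KEYS | EXTRA_G_KEYS.get(method, set())
    return {key: value for key, value in para.items() if key in allowed}


all_params = {
    "g_method_name": "eps", "nn": 15, "eps": 0.5,
    "min_density": 0.01, "weighted": 0, "perplexity": 30,
}
eps_params = filter_graph_params(all_params)
knn_params = filter_graph_params({**all_params, "g_method_name": "knn"})
```

The same input keeps `eps` and `min_density` for the ‘eps’ method but only `nn` for ‘knn’; `weighted` is a base key and survives in both.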
- driada.dim_reduction.dr_base.m_param_filter(para)
Filter parameters to keep only those relevant for the chosen distance matrix construction method.
- Parameters:
para (dict) – Dictionary with metric parameters including:
- metric_name: str or callable - name of metric or custom metric function
- sigma: float or None - bandwidth parameter
- p: float - parameter for minkowski metric
- Other metric-specific parameters
- Returns:
Filtered parameters appropriate for the chosen metric
- Return type: dict
- Raises:
KeyError – If ‘metric_name’ key is missing from para dict.
ValueError – If metric_name is unknown (not in named_distances, not ‘hyperbolic’, and not callable).
Notes
The special metric ‘hyperbolic’ is supported in addition to the standard pynndescent named_distances. Custom callable metrics are also supported.
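The validation order implied by the Raises section can be sketched as follows. Several parts are assumptions: `filter_metric_params` is a hypothetical name, `STANDIN_NAMED_DISTANCES` is a small stand-in for pynndescent's named_distances (used so the sketch runs without that dependency), and the rule of keeping `p` only for the minkowski metric is a guess based on the parameter description, not confirmed behaviour.

```python
# Small stand-in for pynndescent's named_distances (assumption).
STANDIN_NAMED_DISTANCES = {"euclidean", "manhattan", "cosine", "minkowski"}


def filter_metric_params(para):
    """Validate the metric and keep only the parameters it can use."""
    metric = para["metric_name"]  # raises KeyError if the key is missing
    known = (
        callable(metric)                      # custom metric function
        or metric == "hyperbolic"             # special-cased metric
        or metric in STANDIN_NAMED_DISTANCES  # standard named metric
    )
    if not known:
        raise ValueError(f"unknown metric: {metric!r}")
    kept = {"metric_name": metric, "sigma": para.get("sigma")}
    # Assumed rule: 'p' only matters for the minkowski metric.
    if metric == "minkowski" and "p" in para:
        kept["p"] = para["p"]
    return kept


euclidean_params = filter_metric_params(
    {"metric_name": "euclidean", "sigma": 1.0, "p": 3}
)
```

Checking `callable(metric)` first lets a user-supplied function bypass the name lookup entirely.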