INTENSE Pipelines

driada.intense.pipelines.substitute_circular_with_2d(feat_ids, exp, verbose=False)[source]

Substitute circular features with their _2d (cos, sin) counterparts.

For features detected as circular that have a corresponding {name}_2d MultiTimeSeries in the experiment, replaces the feature ID with the _2d version.

Parameters:
  • feat_ids (list) – List of feature IDs (strings or tuples for multi-features).

  • exp (Experiment) – Experiment object containing dynamic_features.

  • verbose (bool, default=False) – If True, print substitution information.

Returns:

(new_feat_ids, substitutions) where substitutions is a list of (original, substituted) tuples.

Return type:

tuple

Examples

>>> # Assuming exp has circular feature 'headdirection' with _2d version
>>> feat_ids = ['headdirection', 'speed']  
>>> new_ids, subs = substitute_circular_with_2d(feat_ids, exp)  
>>> new_ids  
['headdirection_2d', 'speed']
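
The substitution rule described above can be sketched without an Experiment object. This is an illustrative stand-in, not the library's implementation: a plain dict plays the role of exp.dynamic_features, and the helper name substitute_circular_sketch and the circular_names argument are assumptions made for the example.

```python
def substitute_circular_sketch(feat_ids, dynamic_features, circular_names):
    """Swap circular feature IDs for their '<name>_2d' twin when one exists."""
    new_ids, substitutions = [], []
    for fid in feat_ids:
        # Only string IDs can have a _2d twin; tuples (multi-features) pass through.
        twin = f"{fid}_2d" if isinstance(fid, str) else None
        if twin and fid in circular_names and twin in dynamic_features:
            new_ids.append(twin)
            substitutions.append((fid, twin))
        else:
            new_ids.append(fid)
    return new_ids, substitutions


# Hypothetical feature table: values are irrelevant for the ID substitution.
features = {"headdirection": None, "headdirection_2d": None, "speed": None}
new_ids, subs = substitute_circular_sketch(
    ["headdirection", "speed", ("x", "y")],
    features,
    circular_names={"headdirection"},
)
print(new_ids)  # ['headdirection_2d', 'speed', ('x', 'y')]
print(subs)     # [('headdirection', 'headdirection_2d')]
```
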
driada.intense.pipelines.compute_cell_feat_significance(exp, cell_bunch=None, feat_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, use_precomputed_stats=True, save_computed_stats=True, force_update=False, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=True, skip_delays=[], shift_window=2, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, with_disentanglement=False, feat_feat_pval_thr=0.01, multifeature_map=None, duplicate_behavior='ignore', engine='auto', store_random_shifts=False, profile=False, pre_filter_func=None, post_filter_func=None, filter_kwargs=None, remove_anti_selective=True, use_circular_2d=True)[source]

Compute statistically significant neuron-feature pairs.

Parameters:
  • exp (Experiment) – Experiment object to read and write data from

  • cell_bunch (int, iterable or None, optional) – Neuron indices. By default (cell_bunch=None), all neurons are used

  • feat_bunch (str, iterable or None, optional) – Feature names. By default (feat_bunch=None), all single features are used

  • data_type (str, optional) – Data type used for INTENSE computations. Can be ‘calcium’ or ‘spikes’. Default is ‘calcium’

  • metric (str, optional) – Similarity metric between TimeSeries. Default is ‘mi’

  • mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ or ‘ksg’. Default is ‘gcmi’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • mode (str, optional) –

    Computation mode. 3 modes are available:

    • 'stage1': perform preliminary scanning with “n_shuffles_stage1” shuffles only. Rejects strictly non-significant neuron-feature pairs, does not give definite results about significance of the others.

    • 'stage2': skip stage 1 and perform full-scale scanning (“n_shuffles_stage2” shuffles) of all neuron-feature pairs. Gives definite results, but can be very time-consuming. Also reduces statistical power of multiple comparison tests, since the number of hypotheses is very high.

    • 'two_stage': prune non-significant pairs during stage 1 and perform thorough testing for the rest during stage 2. Recommended mode.

    Default is ‘two_stage’

  • n_shuffles_stage1 (int, optional) – Number of shuffles for first stage. Default is 100

  • n_shuffles_stage2 (int, optional) – Number of shuffles for second stage. Default is 10000

  • metric_distr_type (str, optional) –

    Distribution type for shuffled metric null distribution. Options:

    • ’gamma_zi’ (default): Zero-inflated gamma distribution. Explicitly models the probability mass at zero that commonly occurs in MI null distributions. Provides superior goodness-of-fit and accurate parameter estimation without requiring artificial noise. Recommended for all analyses.

    • ’gamma’: Standard gamma distribution with small noise added (noise_ampl) to handle zeros. Provided for backward compatibility. Less statistically principled than ‘gamma_zi’.

    • Other scipy.stats distributions: ‘lognorm’, ‘norm’, etc. are supported but not recommended for MI distributions.

    Recommendation: Use ‘gamma_zi’ (default) for new analyses. It achieves equivalent detection performance while providing statistically correct goodness-of-fit and accurate parameter recovery.

    Default: ‘gamma_zi’

  • noise_ampl (float, optional) – Small noise amplitude added to MI values for numerical stability (only used with metric_distr_type=’gamma’). When using ‘gamma_zi’, this parameter is automatically set to 0 since zero-inflated gamma handles zeros explicitly without requiring artificial noise. Default: 1e-3

  • ds (int, optional) – Downsampling constant. Every ds-th point is taken from the data time series. This reduces the computational load but should be used with caution: with a large ds, important information may be lost. The Experiment class performs an internal check for this effect. Default is 1

  • use_precomputed_stats (bool, optional) –

    Whether to use stats saved in the Experiment instance. Stats are accumulated separately for stage 1 and stage 2. Notes on stats rewriting (if save_computed_stats=True):

    • To recalculate stage 1 results only, use use_precomputed_stats=False and mode='stage1'. Stage 2 stats will be erased, since they become irrelevant.

    • To recalculate stage 2 results only, use use_precomputed_stats=True and mode='stage2' or mode='two_stage'.

    • To recalculate everything, use use_precomputed_stats=False and mode='two_stage'.

    Default is True

  • save_computed_stats (bool, optional) – Whether to save computed stats to Experiment instance. Default is True

  • force_update (bool, optional) – Whether to force an update of the saved statistics when the current data hashes do not match the hashes stored with the saved stats (for example, if neuronal or behavior data has been changed externally). Default is False

  • topk1 (int, optional) – In stage 1, the true MI must rank within the top topk1 values when compared against the shuffled MIs. Default is 1

  • topk2 (int, optional) – In stage 2, the true MI must rank within the top topk2 values when compared against the shuffled MIs. Default is 5

  • multicomp_correction (str or None, optional) – Type of multiple comparison correction. Supported types are None (no correction), “bonferroni”, “holm”, and “fdr_bh” (Benjamini-Hochberg FDR). Default is ‘holm’

  • pval_thr (float, optional) – P-value threshold. If multicomp_correction=None, this is a p-value for a single pair. Otherwise it is a FWER significance level. Default is 0.01

  • find_optimal_delays (bool, optional) – Allows shifting the time series by up to ±shift_window and uses the shift with the highest MI. Default is True

  • skip_delays (list, optional) – List of features for which delays are not applied (set to 0). Only features that exist in feat_bunch will be processed. Has no effect if find_optimal_delays = False. Default is []

  • shift_window (int, optional) – Window for optimal shift search (seconds). Optimal shift (in frames) will lie in the range -shift_window*fps <= opt_shift <= shift_window*fps. Has no effect if find_optimal_delays = False. Default is 2

  • verbose (bool, optional) – Whether to print progress messages. Default is True

  • enable_parallelization (bool, optional) – Whether to enable parallel processing. Default is True

  • n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default is -1

  • seed (int, optional) – Random seed for reproducibility. Default is 42

  • with_disentanglement (bool, optional) –

    If True, performs a full INTENSE pipeline with mixed selectivity analysis:

    1. Computes behavioral feature-feature significance

    2. Computes neuron-feature significance

    3. Disentangles mixed selectivities using behavioral correlations.

    Default is False

  • feat_feat_pval_thr (float, optional) – P-value threshold for feature-feature significance testing during disentanglement. Separate from cell-feat pval_thr because the number of feature pairs (~100-200) is much smaller than neuron-feature pairs (thousands), so a stricter threshold is unnecessary. Only used when with_disentanglement=True. Default is 0.01

  • multifeature_map (dict or None, optional) – Mapping from multifeature tuples to aggregated names for disentanglement. If None, uses DEFAULT_MULTIFEATURE_MAP from disentanglement module. Only used when with_disentanglement=True. Default is None

  • duplicate_behavior (str, optional) –

    How to handle duplicate TimeSeries in neuron or feature bunches.

    • ’ignore’: Process duplicates normally (default)

    • ’raise’: Raise an error if duplicates are found

    • ’warn’: Print a warning but continue processing.

    Default is ‘ignore’

  • engine ({'auto', 'fft', 'loop'}, optional) –

    Computation engine for MI shuffles:

    • ’auto’: Use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ’fft’: Force FFT (raises error if not applicable)

    • ’loop’: Force per-shift loop (original behavior)

    FFT provides ~100x speedup for Stage 2. Default is ‘auto’

  • store_random_shifts (bool, optional) – Whether to store the random shift indices used during shuffle computation. When False (default), random_shifts1 and random_shifts2 arrays are not stored, saving significant memory (~400MB for typical datasets with N=500, M=20). Set to True if you need the shift indices for debugging or reproducibility analysis. Default is False

  • profile (bool, optional) –

    Whether to collect internal timing information. When True, info[‘timings’] will contain execution times (in seconds) for:

    • ’stage1_delay_optimization’: delay optimization (if find_optimal_delays=True)

    • ’stage1_pair_scanning’: stage 1 pair scanning

    • ’stage2_pair_scanning’: stage 2 pair scanning (if applicable)

    • ’fft_type_counts’: Dictionary of FFT type usage counts

    • ’disentanglement’: disentanglement analysis (if with_disentanglement=True)

    • ’total’: sum of all timing sections

    Default is False

  • pre_filter_func (callable or None, optional) –

    Population-level filter function (or composed filter) to run BEFORE disentanglement parallel processing. Only used when with_disentanglement=True. The filter mutates neuron selectivities and pre-computes pair decisions.

    Signature:

    def pre_filter_func(
        neuron_selectivities,    # dict: {neuron_id: [feat1, feat2, ...]} - MUTATE
        pair_decisions,          # dict: {neuron_id: {(f1, f2): 0/0.5/1}} - MUTATE
        renames,                 # dict: {neuron_id: {new_name: (old1, old2)}} - MUTATE
        cell_feat_stats,         # Pre-computed MI values (READ ONLY)
        feat_feat_significance,  # Binary matrix (READ ONLY)
        feat_names,              # List of feature names (READ ONLY)
        **kwargs,                # User-provided extra arguments from filter_kwargs
    ):
        ...
    

    Default: None (no filtering).

  • post_filter_func (callable or None, optional) –

    Population-level filter function to run AFTER disentanglement parallel processing. Can modify pair results (e.g., tie-breaking). Only used when with_disentanglement=True.

    Signature:

    def post_filter_func(
        per_neuron_disent,       # dict: {nid: {'pairs': {...}, ...}} - MUTATE
        cell_feat_stats,         # Pre-computed MI values (READ ONLY)
        feat_names,              # List of feature names (READ ONLY)
        **kwargs,                # User-provided extra arguments
    ):
        ...
    

    Default: None (no post-filtering).

  • filter_kwargs (dict or None, optional) – Dictionary of keyword arguments to pass to pre_filter_func and post_filter_func. Can include pre-extracted data like calcium_data, feature_data, thresholds, etc. Only used when with_disentanglement=True. Default: None.

  • use_circular_2d (bool, default=True) – If True, automatically substitute circular features with their _2d counterparts (cos, sin representation) for MI computation. This improves MI estimation accuracy for circular variables like head direction. Requires that create_circular_2d=True was used during experiment loading.
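
The mode, n_shuffles_stage1/n_shuffles_stage2 and topk1/topk2 parameters above describe a prune-then-confirm procedure. The following is a minimal sketch of that idea only, under loud assumptions: absolute Pearson correlation stands in for MI, shuffles are random circular shifts, and only the top-k rank criterion is applied (the null-distribution fit and multiple-comparison correction are omitted). All function names are hypothetical.

```python
import math
import random


def abs_corr(a, b):
    """Absolute Pearson correlation, used here as a stand-in for MI."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return abs(cov) / math.sqrt(va * vb)


def null_scores(x, y, n_shuffles, rng):
    """Scores of y against randomly circularly shifted copies of x."""
    n = len(x)
    scores = []
    for _ in range(n_shuffles):
        s = rng.randrange(1, n)  # never the zero shift
        scores.append(abs_corr(x[s:] + x[:s], y))
    return scores


def passes_topk(true_score, scores, topk):
    """True when fewer than topk shuffled scores reach the true score."""
    return sum(v >= true_score for v in scores) < topk


def two_stage_test(x, y, n1=100, n2=1000, topk1=1, topk2=5, seed=42):
    rng = random.Random(seed)
    true_score = abs_corr(x, y)
    # Stage 1: cheap scan that rejects clearly non-significant pairs.
    if not passes_topk(true_score, null_scores(x, y, n1, rng), topk1):
        return False
    # Stage 2: thorough test, run only for pairs that survived stage 1.
    return passes_topk(true_score, null_scores(x, y, n2, rng), topk2)


data_rng = random.Random(0)
signal = [data_rng.gauss(0, 1) for _ in range(300)]
coupled = [v + 0.1 * data_rng.gauss(0, 1) for v in signal]
print(two_stage_test(signal, coupled))  # True: the coupled pair survives both stages
```

Because most pairs are dropped after the small stage 1 budget, the expensive stage 2 shuffles are spent only on candidates, which is why 'two_stage' is the recommended mode.
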

Return type:

tuple

Returns:

  • stats (dict of dict of dicts) – Outer dict: cells, inner dict: dynamic features, last dict: stats. Can be easily converted to pandas DataFrame by pd.DataFrame(stats)

  • significance (dict of dict of bools) – Significance results for each neuron-feature pair

  • info (dict) – Additional information from compute_me_stats

  • intense_res (IntenseResults) – Complete results object

  • disentanglement_results (dict (only if with_disentanglement=True)) – Contains:

    • ’feat_feat_significance’: Feature-feature significance matrix

    • ’disent_matrix’: Disentanglement results matrix

    • ’count_matrix’: Count matrix from disentanglement

    • ’per_neuron_disent’: Per-neuron detailed results dict mapping neuron_id to ‘pairs’, ‘renames’, and ‘final_sels’ sub-dicts.

    • ’feature_names’: List of feature names

    • ’summary’: Summary statistics from disentanglement
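
The nested layout of stats and significance described above (cell, then feature, then value) can be walked with two loops. The sketch below uses hand-written toy dictionaries rather than real pipeline output; the stat keys 'me' and 'pval' are placeholders chosen for the example and are not taken from this page.

```python
# Toy stand-ins shaped like the documented return values (keys are hypothetical).
stats = {
    0: {"speed": {"me": 0.12, "pval": 0.001}, "x": {"me": 0.01, "pval": 0.40}},
    1: {"speed": {"me": 0.02, "pval": 0.30}, "x": {"me": 0.20, "pval": 0.0005}},
}
significance = {
    0: {"speed": True, "x": False},
    1: {"speed": False, "x": True},
}


def significant_pairs(stats, significance):
    """Collect (cell_id, feature, stats_dict) for every significant pair."""
    pairs = []
    for cell_id, per_feature in significance.items():
        for feat, is_significant in per_feature.items():
            if is_significant:
                pairs.append((cell_id, feat, stats[cell_id][feat]))
    return pairs


pairs = significant_pairs(stats, significance)
print([(cell, feat) for cell, feat, _ in pairs])  # [(0, 'speed'), (1, 'x')]
```
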

Raises:

ValueError – If data_type is not ‘calcium’ or ‘spikes’, or if features are not found in the experiment.

Notes

  • shift_window is converted from seconds to frames using exp.fps

  • Updates exp.optimal_nf_delays as a side effect

  • Relative MI values are computed using appropriate neural data entropy
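
The first note above (shift_window is given in seconds and converted to frames with exp.fps) can be illustrated with a small delay search. This is a sketch under assumptions: the score is a toy inner product rather than MI, the sign convention of the shift is chosen arbitrarily, and best_delay_frames is a hypothetical helper.

```python
def best_delay_frames(score_at_shift, shift_window, fps):
    """Return the frame shift in [-shift_window*fps, shift_window*fps] with the top score."""
    max_shift = int(shift_window * fps)  # seconds -> frames
    return max(range(-max_shift, max_shift + 1), key=score_at_shift)


n, true_lag = 100, 3
x = [0.0] * n
x[5], x[6] = 1.0, 0.5
y = [x[(t - true_lag) % n] for t in range(n)]  # y is x delayed by 3 frames


def overlap(shift):
    # Toy score: inner product of x with y advanced by `shift` frames.
    return sum(x[t] * y[(t + shift) % n] for t in range(n))


best = best_delay_frames(overlap, shift_window=2, fps=10)
print(best)  # 3
```

With shift_window=2 and fps=10 the search covers 41 candidate shifts (from -20 to +20 frames), so the cost of delay optimization grows linearly with both values.
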

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> import numpy as np
>>>
>>> # Create small test experiment
>>> exp = generate_synthetic_exp(n_dfeats=2, n_cfeats=1, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Basic neuron-feature analysis (stage1 for speed)
>>> stats, sig, info, res, _ = compute_cell_feat_significance(
...     exp,
...     cell_bunch=[0, 1],
...     feat_bunch=['d_feat_0'],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )  
...
>>> len(stats)  # Number of neurons analyzed
2
>>> 'd_feat_0' in stats[0]  # Feature present in results
True
>>>
>>> # With disentanglement analysis
>>> result = compute_cell_feat_significance(
...     exp,
...     cell_bunch=[0, 1],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     with_disentanglement=True,
...     verbose=False
... )  
...
>>> len(result)  # Returns 5 values with disentanglement
5
>>> stats, sig, info, res, disent = result
>>> 'disent_matrix' in disent
True
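
The pre_filter_func signature listed in the parameters above can be exercised on its own before plugging it into the pipeline. The filter below is a hypothetical example: it removes user-listed features from every neuron's selectivity list in place and leaves the other arguments untouched. The excluded_features keyword is an invented name that would be supplied through filter_kwargs.

```python
def drop_features_filter(
    neuron_selectivities,
    pair_decisions,
    renames,
    cell_feat_stats,
    feat_feat_significance,
    feat_names,
    **kwargs,
):
    """Remove excluded features from each neuron's selectivity list (mutates in place)."""
    excluded = set(kwargs.get("excluded_features", ()))
    for feats in neuron_selectivities.values():
        feats[:] = [f for f in feats if f not in excluded]


# Toy inputs shaped like the documented arguments.
selectivities = {0: ["speed", "x"], 1: ["x"]}
drop_features_filter(
    selectivities,
    pair_decisions={},
    renames={},
    cell_feat_stats=None,
    feat_feat_significance=None,
    feat_names=["speed", "x"],
    excluded_features=["x"],
)
print(selectivities)  # {0: ['speed'], 1: []}
```

In a real run the same function would be passed as pre_filter_func=drop_features_filter together with with_disentanglement=True and filter_kwargs={'excluded_features': ['x']}.
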
driada.intense.pipelines.compute_feat_feat_significance(exp, feat_bunch='all', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=1000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, duplicate_behavior='ignore', engine='auto', profile=False)[source]

Compute pairwise significance between all behavioral features.

This function calculates pairwise similarity (e.g., mutual information) between all behavioral features using the two-stage INTENSE approach. The diagonal elements are set to zero as self-similarity is prevented by the check_for_coincidence mechanism in get_mi.

Parameters:
  • exp (Experiment) – Experiment object containing behavioral data.

  • feat_bunch (str, list or None) – Feature names to analyze. Default: ‘all’ (all features including multifeatures). Can be a list of specific feature names.

  • metric (str, optional) – Similarity metric to use. Default: ‘mi’ (mutual information).

  • mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • mode (str, optional) – Computation mode: ‘two_stage’, ‘stage1’, or ‘stage2’. Default: ‘two_stage’.

  • n_shuffles_stage1 (int, optional) – Number of shuffles for stage 1. Default: 100.

  • n_shuffles_stage2 (int, optional) – Number of shuffles for stage 2. Default: 1000.

  • metric_distr_type (str, optional) – Distribution type for metric null distribution (‘gamma_zi’, ‘gamma’, etc.). Default: ‘gamma_zi’.

  • noise_ampl (float, optional) – Small noise amplitude for numerical stability. Default: 1e-3.

  • ds (int, optional) – Downsampling factor. Default: 1.

  • topk1 (int, optional) – Top-k criterion for stage 1. Default: 1.

  • topk2 (int, optional) – Top-k criterion for stage 2. Default: 5.

  • multicomp_correction (str or None, optional) – Multiple comparison correction method. Default: ‘holm’.

  • pval_thr (float, optional) – P-value threshold for significance. Default: 0.01.

  • verbose (bool, optional) – Whether to print progress information. Default: True.

  • enable_parallelization (bool, optional) – Whether to use parallel processing. Default: True.

  • n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default: -1.

  • seed (int, optional) – Random seed for reproducibility. Default: 42.

  • duplicate_behavior (str, optional) –

    How to handle duplicate TimeSeries in ts_bunch1 or ts_bunch2.

    • ’ignore’: Process duplicates normally (default)

    • ’raise’: Raise an error if duplicates are found

    • ’warn’: Print a warning but continue processing

    Default: ‘ignore’.

  • engine (str, optional) –

    Computation engine for MI calculation:

    • ’auto’: Automatically select FFT when beneficial (default)

    • ’fft’: Force FFT-based computation

    • ’loop’: Force loop-based computation (useful for comparison/debugging)

    Default: ‘auto’.

  • profile (bool, optional) –

    If True, collect timing and FFT type information. When enabled, info[‘timings’] will contain:

    • ’stage1_pair_scanning’: Stage 1 scanning time (seconds)

    • ’stage2_pair_scanning’: Stage 2 scanning time if applicable (seconds)

    • ’fft_type_counts’: Dictionary of FFT type usage counts

    • ’matrix_construction’: Symmetric matrix construction time (seconds)

    • ’total’: Total execution time (seconds)

    Default: False.
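
The default multicomp_correction='holm' in the parameters above refers to the Holm step-down procedure. The sketch below shows the textbook version of that procedure on a list of p-values; it is not taken from the library, and holm_reject is a hypothetical name.

```python
def holm_reject(pvals, alpha=0.01):
    """Holm step-down: return a list of booleans, True where the null is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared with alpha / (m - rank).
        if pvals[idx] > alpha / (m - rank):
            break  # stop at the first failure; all larger p-values are retained
        reject[idx] = True
    return reject


pvals = [0.001, 0.2, 0.003, 0.03]
print(holm_reject(pvals, alpha=0.01))  # [True, False, True, False]
```

A plain Bonferroni threshold of 0.01 / 4 = 0.0025 would reject only the first hypothesis here, while Holm also rejects the third, which shows why Holm is never less powerful at the same family-wise error level.
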

Return type:

tuple

Returns:

  • similarity_matrix (ndarray) – Matrix of similarity values between features. Element [i,j] contains the similarity between feature i and feature j. Diagonal is zero.

  • significance_matrix (ndarray) – Matrix of binary significance values. 1 indicates significant similarity.

  • p_value_matrix (ndarray) – Matrix of p-values for each comparison.

  • feature_names (list) – List of feature names corresponding to matrix indices. May include tuples for multifeatures (e.g., (‘x’, ‘y’)).

  • info (dict) – Dictionary containing additional information from compute_me_stats.

Notes

  • Uses the two-stage INTENSE approach for efficient significance testing

  • Diagonal elements are zero (self-similarity check prevents computation)

  • The function handles both discrete and continuous variables

  • Supports MultiTimeSeries (e.g., place fields from x,y coordinates)

  • For mutual information, values are in bits

  • No optimal delay search is performed (delays are set to 0)

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> import numpy as np
>>>
>>> # Create test experiment
>>> exp = generate_synthetic_exp(n_dfeats=2, n_cfeats=2, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Compute feature-feature correlations
>>> sim_mat, sig_mat, pval_mat, features, info = compute_feat_feat_significance(
...     exp,
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat.shape == (4, 4)  # 2 discrete + 2 continuous features
True
>>> np.allclose(np.diag(sim_mat), 0)  # Diagonal is zero
True
>>>
>>> # Analyze specific features only
>>> sim_mat2, sig_mat2, pval_mat2, features2, info2 = compute_feat_feat_significance(
...     exp,
...     feat_bunch=['d_feat_0', 'd_feat_1'],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat2.shape == (2, 2)
True
Raises:

ValueError – If features are not found in experiment

Notes

  • Only upper triangle is computed for efficiency (matrix is symmetric)

  • Diagonal elements are always zero (self-similarity prevented)

  • No delay optimization is performed between features

  • Supports both discrete and continuous features

  • Multifeatures are created using aggregate_multiple_ts

  • When called with circular features, feat_bunch should contain _2d-substituted names (e.g., headdirection_2d) to match the experiment’s dynamic features after circular substitution.
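
The first note above (only the upper triangle is computed, then mirrored) can be illustrated with plain lists. This is a sketch of the general idea, not the library's matrix construction code; symmetric_from_upper and the toy values are assumptions.

```python
from itertools import combinations


def symmetric_from_upper(n, pair_values):
    """Build an n x n symmetric matrix with a zero diagonal from (i, j) -> value, i < j."""
    mat = [[0.0] * n for _ in range(n)]
    for (i, j), value in pair_values.items():
        mat[i][j] = value
        mat[j][i] = value  # mirror into the lower triangle
    return mat


n_features = 3
pairs = list(combinations(range(n_features), 2))  # n*(n-1)/2 pairs instead of n*n
toy_values = dict(zip(pairs, [0.4, 0.0, 0.1]))    # hypothetical similarity values
sim = symmetric_from_upper(n_features, toy_values)
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
print(sim)    # [[0.0, 0.4, 0.0], [0.4, 0.0, 0.1], [0.0, 0.1, 0.0]]
```
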

driada.intense.pipelines.compute_cell_cell_significance(exp, cell_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=1000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, duplicate_behavior='ignore', profile=False)[source]

Compute pairwise functional correlations between neurons using INTENSE.

This function calculates pairwise similarity (e.g., mutual information) between all neurons using the two-stage INTENSE approach. This can reveal functionally correlated neurons that may form assemblies or functional modules.

Parameters:
  • exp (Experiment) – Experiment object containing neural data.

  • cell_bunch (int, list or None, optional) – Neuron indices to analyze. Default: None (all neurons).

  • data_type (str, optional) – Type of neural data: ‘calcium’ or ‘spikes’. Default: ‘calcium’.

  • metric (str, optional) – Similarity metric to use. Default: ‘mi’ (mutual information).

  • mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • mode (str, optional) – Computation mode: ‘two_stage’, ‘stage1’, or ‘stage2’. Default: ‘two_stage’.

  • n_shuffles_stage1 (int, optional) – Number of shuffles for stage 1. Default: 100.

  • n_shuffles_stage2 (int, optional) – Number of shuffles for stage 2. Default: 1000.

  • metric_distr_type (str, optional) – Distribution type for metric null distribution (‘gamma_zi’, ‘gamma’, etc.). Default: ‘gamma_zi’.

  • noise_ampl (float, optional) – Small noise amplitude for numerical stability. Default: 1e-3.

  • ds (int, optional) – Downsampling factor. Default: 1.

  • topk1 (int, optional) – Top-k criterion for stage 1. Default: 1.

  • topk2 (int, optional) – Top-k criterion for stage 2. Default: 5.

  • multicomp_correction (str or None, optional) – Multiple comparison correction method. Default: ‘holm’.

  • pval_thr (float, optional) – P-value threshold for significance. Default: 0.01.

  • verbose (bool, optional) – Whether to print progress information. Default: True.

  • enable_parallelization (bool, optional) – Whether to use parallel processing. Default: True.

  • n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default: -1.

  • seed (int, optional) – Random seed for reproducibility. Default: 42.

  • duplicate_behavior (str, optional) –

    How to handle duplicate TimeSeries in ts_bunch1 or ts_bunch2.

    • ’ignore’: Process duplicates normally (default)

    • ’raise’: Raise an error if duplicates are found

    • ’warn’: Print a warning but continue processing

    Default: ‘ignore’.

  • profile (bool, optional) –

    If True, collect timing and FFT type information. When enabled, info[‘timings’] will contain:

    • ’stage1_pair_scanning’: Stage 1 scanning time (seconds)

    • ’stage2_pair_scanning’: Stage 2 scanning time if applicable (seconds)

    • ’fft_type_counts’: Dictionary of FFT type usage counts

    • ’matrix_construction’: Symmetric matrix construction time (seconds)

    • ’total’: Total execution time (seconds)

    Default: False.

Return type:

tuple

Returns:

  • similarity_matrix (ndarray) – Matrix of similarity values between neurons. Element [i,j] contains the similarity between neuron i and neuron j. Diagonal is zero.

  • significance_matrix (ndarray) – Matrix of binary significance values. 1 indicates significant similarity.

  • p_value_matrix (ndarray) – Matrix of p-values for each comparison.

  • cell_ids (list) – List of cell IDs corresponding to matrix indices.

  • info (dict) – Dictionary containing additional information from compute_me_stats.

Notes

  • Uses the two-stage INTENSE approach for efficient significance testing

  • Diagonal elements are zero (self-similarity check prevents computation)

  • For calcium imaging data, considers temporal dynamics

  • For spike data, uses discrete MI formulation

  • Can identify functional assemblies through graph analysis of significant pairs

  • No optimal delay search is performed (synchronous activity assumed)

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from driada.information.info_base import TimeSeries
>>> import numpy as np
>>>
>>> # Create experiment with correlated neurons
>>> exp = generate_synthetic_exp(n_dfeats=1, n_cfeats=1, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Make neurons 0 and 1 correlated
>>> noise = np.random.RandomState(42).randn(len(exp.neurons[0].ca.data)) * 0.1
>>> exp.neurons[1].ca = TimeSeries(
...     exp.neurons[0].ca.data + noise, discrete=False
... )
>>>
>>> # Compute neuron-neuron correlations
>>> sim_mat, sig_mat, pval_mat, cells, info = compute_cell_cell_significance(
...     exp,
...     cell_bunch=[0, 1, 2],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat.shape == (3, 3)
True
>>> np.allclose(np.diag(sim_mat), 0)  # Self-correlation is zero
True
>>> sim_mat[0, 1] > sim_mat[0, 2]  # Neurons 0,1 more correlated than 0,2
True
Raises:

ValueError – If data_type is not ‘calcium’ or ‘spikes’, or if spike data is missing for requested neurons.

Notes

  • Only upper triangle is computed for efficiency (matrix is symmetric)

  • Warns if all neurons have identical spike data

  • Computes network statistics when verbose=True

  • Synchronous activity assumed (no delay optimization)
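
The notes above mention identifying functional assemblies through graph analysis of significant pairs. One simple reading of that idea is to treat the binary significance matrix as an adjacency matrix and extract connected components. The sketch below does this with a breadth-first search on a toy matrix; it is an illustration, not the library's network analysis, and the function name is hypothetical.

```python
from collections import deque


def connected_groups(sig_mat):
    """Group neuron indices that are linked through chains of significant pairs."""
    n = len(sig_mat)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            i = queue.popleft()
            group.append(i)
            for j in range(n):
                if sig_mat[i][j] and j not in seen:
                    seen.add(j)
                    queue.append(j)
        groups.append(sorted(group))
    return groups


# Toy significance matrix: pairs (0, 1) and (1, 2) are significant, neuron 3 is isolated.
sig = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(connected_groups(sig))  # [[0, 1, 2], [3]]
```
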

driada.intense.pipelines.compute_embedding_selectivity(exp, embedding_methods=None, cell_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, use_precomputed_stats=True, save_computed_stats=True, force_update=False, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=True, shift_window=2, remove_anti_selective=True, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42)[source]

Compute INTENSE selectivity between neurons and dimensionality reduction embeddings.

This function treats each embedding component as a dynamic feature and computes the mutual information between neural activity and embedding dimensions. This reveals how individual neurons contribute to the population-level manifold structure.

Parameters:
  • exp (Experiment) – Experiment object with stored embeddings

  • embedding_methods (str, list or None) – Names of embedding methods to analyze. If None, analyzes all stored embeddings.

  • cell_bunch (int, iterable or None) – Neuron indices. By default (None), all neurons will be taken

  • data_type (str) – Data type used for embeddings and INTENSE (‘calcium’ or ‘spikes’)

  • metric (str) – Similarity metric between TimeSeries (default: ‘mi’)

  • mi_estimator (str) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • mode (str) – Computation mode: ‘stage1’, ‘stage2’, or ‘two_stage’ (default)

  • n_shuffles_stage1 (int) – Number of shuffles for first stage (default: 100)

  • n_shuffles_stage2 (int) – Number of shuffles for second stage (default: 10000)

  • metric_distr_type (str) – Distribution type for shuffled metric distribution fit (default: ‘gamma_zi’)

  • noise_ampl (float) – Small noise amplitude added to improve numerical fit (default: 1e-3)

  • ds (int) – Downsampling constant (default: 1)

  • use_precomputed_stats (bool) – Whether to use stats saved in Experiment instance (default: True)

  • save_computed_stats (bool) – Whether to save computed stats to Experiment instance (default: True)

  • force_update (bool) – Force update saved statistics if data hash collision found (default: False)

  • topk1 (int) – True MI for stage 1 should be among topk1 MI shuffles (default: 1)

  • topk2 (int) – True MI for stage 2 should be among topk2 MI shuffles (default: 5)

  • multicomp_correction (str or None) – Multiple comparison correction type: None, ‘bonferroni’, or ‘holm’ (default)

  • pval_thr (float) – P-value threshold (default: 0.01)

  • find_optimal_delays (bool) – Find optimal temporal delays between neural activity and embeddings (default: True)

  • shift_window (int) – Window for optimal shift search in seconds (default: 2)

  • verbose (bool) – Print progress information (default: True)

  • enable_parallelization (bool) – Enable parallel computation (default: True)

  • n_jobs (int) – Number of parallel jobs, -1 for all cores (default: -1)

  • seed (int) – Random seed (default: 42)

Returns:

results – Dictionary keyed by embedding method name. Each value contains:

  • ’stats’: Statistics for each neuron-component pair

  • ’significance’: Significance results

  • ’info’: Additional information from compute_me_stats

  • ’intense_results’: Full IntenseResults object from INTENSE computation

  • ’significant_neurons’: Dict of neurons significantly selective to embedding components

  • ’n_components’: Number of embedding components

  • ’component_selectivity’: For each component, list of selective neurons

Return type:

dict

Raises:

ValueError – If no embeddings are found for the specified data_type, or if an embedding method is not found.

Notes

  • Temporarily adds embedding components as dynamic features

  • Forces use_precomputed_stats=False for temporary features

  • Component names follow pattern “{method}_comp{index}”

  • Cleanup in finally block ensures experiment state restored

  • Only stage2 significance is considered for results
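
The notes above describe adding embedding components as temporary dynamic features named "{method}_comp{index}" and restoring the experiment in a finally block. A minimal sketch of that pattern follows, with a plain dict standing in for the experiment's feature table; the helper name and the analyze callback are assumptions made for illustration.

```python
def with_temporary_components(dynamic_features, method, columns, analyze):
    """Register embedding columns as temporary features, run analyze, then clean up."""
    names = [f"{method}_comp{i}" for i in range(len(columns))]
    try:
        for name, column in zip(names, columns):
            dynamic_features[name] = column
        return analyze(dynamic_features, names)
    finally:
        # Runs even if analyze raises, so the feature table is always restored.
        for name in names:
            dynamic_features.pop(name, None)


features = {"speed": [0.1, 0.2, 0.3]}
embedding_columns = [[1.0, 2.0, 3.0], [0.5, 0.4, 0.3]]  # two toy components
used = with_temporary_components(
    features, "pca", embedding_columns, lambda feats, names: list(names)
)
print(used)            # ['pca_comp0', 'pca_comp1']
print(list(features))  # ['speed']
```
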

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from sklearn.decomposition import PCA
>>> import numpy as np
>>>
>>> # Create experiment
>>> exp = generate_synthetic_exp(n_dfeats=1, n_cfeats=1, nneurons=5,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Create and store PCA embedding
>>> neural_data = np.array([exp.neurons[i].ca.data for i in range(5)]).T
>>> pca = PCA(n_components=2, random_state=42)
>>> embedding = pca.fit_transform(neural_data)
>>> exp.store_embedding(embedding, method_name='pca', data_type='calcium')
>>>
>>> # Compute embedding selectivity
>>> results = compute_embedding_selectivity(
...     exp,
...     embedding_methods=['pca'],
...     cell_bunch=[0, 1, 2],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )  
...
>>>
>>> 'pca' in results
True
>>> results['pca']['n_components']
2
>>> 'component_selectivity' in results['pca']
True

See also

compute_cell_feat_significance

Compute selectivity for behavioral features

get_functional_organization

Analyze organization in embeddings

compare_embeddings

Compare multiple embedding methods

High-level analysis pipelines for computing statistical significance of neural selectivity.

All pipeline functions are importable from the top-level driada.intense namespace:

>>> from driada.intense import compute_cell_feat_significance
>>> from driada.intense import compute_cell_cell_significance
>>> from driada.intense import compute_embedding_selectivity
>>> import inspect
>>> 'n_shuffles_stage1' in inspect.signature(compute_cell_feat_significance).parameters
True
>>> 'n_shuffles_stage2' in inspect.signature(compute_cell_cell_significance).parameters
True

compute_cell_cell_significance produces pairwise similarity and significance matrices. The significance matrix can be wrapped in a Network for spectral and topological analysis:

sim_mat, sig_mat, pval_mat, cell_ids, info = compute_cell_cell_significance(
    exp, n_shuffles_stage1=100, n_shuffles_stage2=1000, ds=5
)

import scipy.sparse as sp
from driada.network import Network

net = Network(adj=sp.csr_matrix(sig_mat), preprocessing='giant_cc')
net.diagonalize(mode='nlap')
spectrum = net.get_spectrum('nlap')

Main Functions

driada.intense.pipelines.compute_cell_feat_significance(exp, cell_bunch=None, feat_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, use_precomputed_stats=True, save_computed_stats=True, force_update=False, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=True, skip_delays=[], shift_window=2, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, with_disentanglement=False, feat_feat_pval_thr=0.01, multifeature_map=None, duplicate_behavior='ignore', engine='auto', store_random_shifts=False, profile=False, pre_filter_func=None, post_filter_func=None, filter_kwargs=None, remove_anti_selective=True, use_circular_2d=True)[source]

Calculates significant neuron-feature pairs

Parameters:
  • exp (Experiment) – Experiment object to read and write data from

  • cell_bunch (int, iterable or None, optional) – Neuron indices. By default (cell_bunch=None), all neurons are used

  • feat_bunch (str, iterable or None, optional) – Feature names. By default (feat_bunch=None), all single features are used

  • data_type (str, optional) – Data type used for INTENSE computations. Can be ‘calcium’ or ‘spikes’. Default is ‘calcium’

  • metric (str, optional) – Similarity metric between TimeSeries. Default is ‘mi’

  • mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ or ‘ksg’. Default is ‘gcmi’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • mode (str, optional) –

    Computation mode. 3 modes are available:

    • 'stage1': perform preliminary scanning with “n_shuffles_stage1” shuffles only. Rejects strictly non-significant neuron-feature pairs, does not give definite results about significance of the others.

    • 'stage2': skip stage 1 and perform full-scale scanning (“n_shuffles_stage2” shuffles) of all neuron-feature pairs. Gives definite results, but can be very time-consuming. Also reduces statistical power of multiple comparison tests, since the number of hypotheses is very high.

    • 'two_stage': prune non-significant pairs during stage 1 and perform thorough testing for the rest during stage 2. Recommended mode.

    Default is ‘two_stage’

  • n_shuffles_stage1 (int, optional) – Number of shuffles for first stage. Default is 100

  • n_shuffles_stage2 (int, optional) – Number of shuffles for second stage. Default is 10000

  • metric_distr_type (str, optional) –

    Distribution type for shuffled metric null distribution. Options:

    • ’gamma_zi’ (default): Zero-inflated gamma distribution. Explicitly models the probability mass at zero that commonly occurs in MI null distributions. Provides superior goodness-of-fit and accurate parameter estimation without requiring artificial noise. Recommended for all analyses.

    • ’gamma’: Standard gamma distribution with small noise added (noise_ampl) to handle zeros. Provided for backward compatibility. Less statistically principled than ‘gamma_zi’.

    • Other scipy.stats distributions: ‘lognorm’, ‘norm’, etc. are supported but not recommended for MI distributions.

    Recommendation: Use ‘gamma_zi’ (default) for new analyses. It achieves equivalent detection performance while providing statistically correct goodness-of-fit and accurate parameter recovery.

    Default: ‘gamma_zi’

  • noise_ampl (float, optional) – Small noise amplitude added to MI values for numerical stability (only used with metric_distr_type=’gamma’). When using ‘gamma_zi’, this parameter is automatically set to 0 since zero-inflated gamma handles zeros explicitly without requiring artificial noise. Default: 1e-3

  • ds (int, optional) – Downsampling constant. Every ds-th point is taken from the data time series. This reduces the computational load but needs caution, since important information may be lost when ds is large. The Experiment class performs an internal check for this effect. Default is 1

  • use_precomputed_stats (bool, optional) –

    Whether to use stats saved in the Experiment instance. Stats are accumulated separately for stage1 and stage2. Notes on rewriting stats data (if save_computed_stats=True):

    • To recalculate stage1 results only, use use_precomputed_stats=False and mode=’stage1’. Stage 2 stats will be erased, since they become irrelevant.

    • To recalculate stage2 results only, use use_precomputed_stats=True and mode=’stage2’ or mode=’two_stage’.

    • To recalculate everything, use use_precomputed_stats=False and mode=’two_stage’.

    Default is True

  • save_computed_stats (bool, optional) – Whether to save computed stats to Experiment instance. Default is True

  • force_update (bool, optional) – Whether to force saved statistics data update in case the collision between actual data hashes and saved stats data hashes is found (for example, if neuronal or behavior data has been changed externally). Default is False

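  • topk1 (int, optional) – Stage 1 criterion: the true MI must rank among the topk1 highest values of the stage 1 shuffle distribution. Default is 1

  • topk2 (int, optional) – Stage 2 criterion: the true MI must rank among the topk2 highest values of the stage 2 shuffle distribution. Default is 5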

  • multicomp_correction (str or None, optional) – Type of multiple comparison correction. Supported types are None (no correction), “bonferroni”, “holm”, and “fdr_bh” (Benjamini-Hochberg FDR). Default is ‘holm’

  • pval_thr (float, optional) – P-value threshold. If multicomp_correction=None, this is a p-value for a single pair. Otherwise it is a FWER significance level. Default is 0.01

  • find_optimal_delays (bool, optional) – Allows slight shifting (not more than +- shift_window) of time series, selects a shift with the highest MI as default. Default is True

  • skip_delays (list, optional) – List of features for which delays are not applied (set to 0). Only features that exist in feat_bunch will be processed. Has no effect if find_optimal_delays = False. Default is []

  • shift_window (int, optional) – Window for optimal shift search (seconds). Optimal shift (in frames) will lie in the range -shift_window*fps <= opt_shift <= shift_window*fps. Has no effect if find_optimal_delays = False. Default is 2

  • verbose (bool, optional) – Whether to print progress messages. Default is True

  • enable_parallelization (bool, optional) – Whether to enable parallel processing. Default is True

  • n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default is -1

  • seed (int, optional) – Random seed for reproducibility. Default is 42

  • with_disentanglement (bool, optional) –

    If True, performs a full INTENSE pipeline with mixed selectivity analysis:

    1. Computes behavioral feature-feature significance

    2. Computes neuron-feature significance

    3. Disentangles mixed selectivities using behavioral correlations.

    Default is False

  • feat_feat_pval_thr (float, optional) – P-value threshold for feature-feature significance testing during disentanglement. Separate from cell-feat pval_thr because the number of feature pairs (~100-200) is much smaller than neuron-feature pairs (thousands), so a stricter threshold is unnecessary. Only used when with_disentanglement=True. Default is 0.01

  • multifeature_map (dict or None, optional) – Mapping from multifeature tuples to aggregated names for disentanglement. If None, uses DEFAULT_MULTIFEATURE_MAP from disentanglement module. Only used when with_disentanglement=True. Default is None

  • duplicate_behavior (str, optional) –

    How to handle duplicate TimeSeries in neuron or feature bunches.

    • ’ignore’: Process duplicates normally (default)

    • ’raise’: Raise an error if duplicates are found

    • ’warn’: Print a warning but continue processing.

    Default is ‘ignore’

  • engine ({'auto', 'fft', 'loop'}, optional) –

    Computation engine for MI shuffles:

    • ’auto’: Use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ’fft’: Force FFT (raises error if not applicable)

    • ’loop’: Force per-shift loop (original behavior)

    FFT provides ~100x speedup for Stage 2. Default is ‘auto’

  • store_random_shifts (bool, optional) – Whether to store the random shift indices used during shuffle computation. When False (default), random_shifts1 and random_shifts2 arrays are not stored, saving significant memory (~400MB for typical datasets with N=500, M=20). Set to True if you need the shift indices for debugging or reproducibility analysis. Default is False

  • profile (bool, optional) –

    Whether to collect internal timing information. When True, info[‘timings’] will contain execution times (in seconds) for:

    • ’stage1_delay_optimization’: delay optimization (if find_optimal_delays=True)

    • ’stage1_pair_scanning’: stage 1 pair scanning

    • ’stage2_pair_scanning’: stage 2 pair scanning (if applicable)

    • ’fft_type_counts’: Dictionary of FFT type usage counts

    • ’disentanglement’: disentanglement analysis (if with_disentanglement=True)

    • ’total’: sum of all timing sections

    Default is False

  • pre_filter_func (callable or None, optional) –

    Population-level filter function (or composed filter) to run BEFORE disentanglement parallel processing. Only used when with_disentanglement=True. The filter mutates neuron selectivities and pre-computes pair decisions.

    Signature:

    def pre_filter_func(
        neuron_selectivities,    # dict: {neuron_id: [feat1, feat2, ...]} - MUTATE
        pair_decisions,          # dict: {neuron_id: {(f1, f2): 0/0.5/1}} - MUTATE
        renames,                 # dict: {neuron_id: {new_name: (old1, old2)}} - MUTATE
        cell_feat_stats,         # Pre-computed MI values (READ ONLY)
        feat_feat_significance,  # Binary matrix (READ ONLY)
        feat_names,              # List of feature names (READ ONLY)
        **kwargs,                # User-provided extra arguments from filter_kwargs
    ):
        ...
    

    Default: None (no filtering).

  • post_filter_func (callable or None, optional) –

    Population-level filter function to run AFTER disentanglement parallel processing. Can modify pair results (e.g., tie-breaking). Only used when with_disentanglement=True.

    Signature:

    def post_filter_func(
        per_neuron_disent,       # dict: {nid: {'pairs': {...}, ...}} - MUTATE
        cell_feat_stats,         # Pre-computed MI values (READ ONLY)
        feat_names,              # List of feature names (READ ONLY)
        **kwargs,                # User-provided extra arguments
    ):
        ...
    

    Default: None (no post-filtering).

  • filter_kwargs (dict or None, optional) – Dictionary of keyword arguments to pass to pre_filter_func and post_filter_func. Can include pre-extracted data like calcium_data, feature_data, thresholds, etc. Only used when with_disentanglement=True. Default: None.

  • use_circular_2d (bool, default=True) – If True, automatically substitute circular features with their _2d counterparts (cos, sin representation) for MI computation. This improves MI estimation accuracy for circular variables like head direction. Requires that create_circular_2d=True was used during experiment loading.
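
As an illustration of the pre_filter_func signature listed above, here is a minimal sketch of a filter that removes unwanted features from every neuron's selectivity list. The function name drop_excluded_features and the exclude keyword are hypothetical and not part of driada. The sketch also assumes that mutating the dictionaries in place is sufficient and that no return value is expected.

```python
def drop_excluded_features(
    neuron_selectivities,    # dict: {neuron_id: [feat1, feat2, ...]}, mutated in place
    pair_decisions,          # dict: {neuron_id: {(f1, f2): 0/0.5/1}}, untouched here
    renames,                 # dict: {neuron_id: {new_name: (old1, old2)}}, untouched here
    cell_feat_stats,         # read only
    feat_feat_significance,  # read only
    feat_names,              # read only
    **kwargs,                # extra arguments, e.g. supplied through filter_kwargs
):
    """Hypothetical pre-filter: drop features named in kwargs['exclude']."""
    exclude = set(kwargs.get("exclude", ()))
    for feats in neuron_selectivities.values():
        # Slice assignment keeps the same list object, so the caller sees the change
        feats[:] = [f for f in feats if f not in exclude]


# Stand-alone check with toy data (no Experiment needed)
selectivities = {0: ["speed", "x"], 1: ["x"]}
drop_excluded_features(selectivities, {}, {}, None, None, ["speed", "x"], exclude=["x"])
print(selectivities)  # {0: ['speed'], 1: []}
```

Under these assumptions, such a filter would be passed as pre_filter_func=drop_excluded_features together with filter_kwargs={'exclude': ['x']} and with_disentanglement=True.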

Return type:

tuple

Returns:

  • stats (dict of dict of dicts) – Outer dict: cells, inner dict: dynamic features, last dict: stats. Can be easily converted to pandas DataFrame by pd.DataFrame(stats)

  • significance (dict of dict of bools) – Significance results for each neuron-feature pair

  • info (dict) – Additional information from compute_me_stats

  • intense_res (IntenseResults) – Complete results object

  • disentanglement_results (dict (only if with_disentanglement=True)) – Contains:

    • ’feat_feat_significance’: Feature-feature significance matrix

    • ’disent_matrix’: Disentanglement results matrix

    • ’count_matrix’: Count matrix from disentanglement

    • ’per_neuron_disent’: Per-neuron detailed results dict mapping neuron_id to ‘pairs’, ‘renames’, and ‘final_sels’ sub-dicts.

    • ’feature_names’: List of feature names

    • ’summary’: Summary statistics from disentanglement
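
The significance flags returned above depend on the multicomp_correction setting. As a rough illustration of the default ‘holm’ option, the following sketch implements the standard Holm step-down procedure in plain Python. It is not taken from the driada source, and the function name holm_reject is hypothetical.

```python
def holm_reject(pvals, alpha=0.01):
    """Return a list of booleans, True where the Holm procedure rejects H0."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k)
        if pvals[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


pvals = [0.001, 0.04, 0.003, 0.2]
print(holm_reject(pvals, alpha=0.01))  # [True, False, True, False]
```

With a plain Bonferroni threshold of 0.01 / 4 = 0.0025, the p-value 0.003 would not be rejected, which shows why Holm is the less conservative of the two while still controlling the FWER.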

Raises:

ValueError – If data_type is not ‘calcium’ or ‘spikes’, or if features are not found in the experiment

Notes

  • shift_window is converted from seconds to frames using exp.fps

  • Updates exp.optimal_nf_delays as a side effect

  • Relative MI values are computed using appropriate neural data entropy
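
The seconds-to-frames conversion of shift_window and the choice of the best delay can be sketched as follows. This is a simplified illustration, not the library's implementation. The helper names are hypothetical and the metric values are made-up numbers.

```python
def shift_range_in_frames(shift_window_sec, fps):
    """Return (min_shift, max_shift) in frames for a window given in seconds."""
    max_shift = int(shift_window_sec * fps)
    return -max_shift, max_shift


def best_shift(metric_by_shift):
    """Pick the shift (in frames) with the highest metric value."""
    return max(metric_by_shift, key=metric_by_shift.get)


lo, hi = shift_range_in_frames(shift_window_sec=2, fps=20)
print(lo, hi)  # -40 40

# Toy metric values for a few candidate shifts (made-up numbers)
toy_mi = {-40: 0.01, 0: 0.05, 10: 0.12, 40: 0.02}
print(best_shift(toy_mi))  # 10
```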

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> import numpy as np
>>>
>>> # Create small test experiment
>>> exp = generate_synthetic_exp(n_dfeats=2, n_cfeats=1, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Basic neuron-feature analysis (stage1 for speed)
>>> stats, sig, info, res, _ = compute_cell_feat_significance(
...     exp,
...     cell_bunch=[0, 1],
...     feat_bunch=['d_feat_0'],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )  
...
>>> len(stats)  # Number of neurons analyzed
2
>>> 'd_feat_0' in stats[0]  # Feature present in results
True
>>>
>>> # With disentanglement analysis
>>> result = compute_cell_feat_significance(
...     exp,
...     cell_bunch=[0, 1],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     with_disentanglement=True,
...     verbose=False
... )  
...
>>> len(result)  # Returns 5 values with disentanglement
5
>>> stats, sig, info, res, disent = result
>>> 'disent_matrix' in disent
True

driada.intense.pipelines.compute_feat_feat_significance(exp, feat_bunch='all', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=1000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, duplicate_behavior='ignore', engine='auto', profile=False)[source]

Compute pairwise significance between all behavioral features.

This function calculates pairwise similarity (e.g., mutual information) between all behavioral features using the two-stage INTENSE approach. The diagonal elements are set to zero as self-similarity is prevented by the check_for_coincidence mechanism in get_mi.

Parameters:
  • exp (Experiment) – Experiment object containing behavioral data.

  • feat_bunch (str, list or None) – Feature names to analyze. Default: ‘all’ (all features including multifeatures). Can be a list of specific feature names.

  • metric (str, optional) – Similarity metric to use. Default: ‘mi’ (mutual information).

  • mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • mode (str, optional) – Computation mode: ‘two_stage’, ‘stage1’, or ‘stage2’. Default: ‘two_stage’.

  • n_shuffles_stage1 (int, optional) – Number of shuffles for stage 1. Default: 100.

  • n_shuffles_stage2 (int, optional) – Number of shuffles for stage 2. Default: 1000.

  • metric_distr_type (str, optional) – Distribution type for metric null distribution (‘gamma_zi’, ‘gamma’, etc.). Default: ‘gamma_zi’.

  • noise_ampl (float, optional) – Small noise amplitude for numerical stability. Default: 1e-3.

  • ds (int, optional) – Downsampling factor. Default: 1.

  • topk1 (int, optional) – Top-k criterion for stage 1. Default: 1.

  • topk2 (int, optional) – Top-k criterion for stage 2. Default: 5.

  • multicomp_correction (str or None, optional) – Multiple comparison correction method. Default: ‘holm’.

  • pval_thr (float, optional) – P-value threshold for significance. Default: 0.01.

  • verbose (bool, optional) – Whether to print progress information. Default: True.

  • enable_parallelization (bool, optional) – Whether to use parallel processing. Default: True.

  • n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default: -1.

  • seed (int, optional) – Random seed for reproducibility. Default: 42.

  • duplicate_behavior (str, optional) –

    How to handle duplicate TimeSeries among the selected features.

    • ’ignore’: Process duplicates normally (default)

    • ’raise’: Raise an error if duplicates are found

    • ’warn’: Print a warning but continue processing

    Default: ‘ignore’.

  • engine (str, optional) –

    Computation engine for MI calculation:

    • ’auto’: Automatically select FFT when beneficial (default)

    • ’fft’: Force FFT-based computation

    • ’loop’: Force loop-based computation (useful for comparison/debugging)

    Default: ‘auto’.

  • profile (bool, optional) –

    If True, collect timing and FFT type information. When enabled, info[‘timings’] will contain:

    • ’stage1_pair_scanning’: Stage 1 scanning time (seconds)

    • ’stage2_pair_scanning’: Stage 2 scanning time if applicable (seconds)

    • ’fft_type_counts’: Dictionary of FFT type usage counts

    • ’matrix_construction’: Symmetric matrix construction time (seconds)

    • ’total’: Total execution time (seconds)

    Default: False.

Return type:

tuple

Returns:

  • similarity_matrix (ndarray) – Matrix of similarity values between features. Element [i,j] contains the similarity between feature i and feature j. Diagonal is zero.

  • significance_matrix (ndarray) – Matrix of binary significance values. 1 indicates significant similarity.

  • p_value_matrix (ndarray) – Matrix of p-values for each comparison.

  • feature_names (list) – List of feature names corresponding to matrix indices. May include tuples for multifeatures (e.g., (‘x’, ‘y’)).

  • info (dict) – Dictionary containing additional information from compute_me_stats.

Notes

  • Uses the two-stage INTENSE approach for efficient significance testing

  • Diagonal elements are zero (self-similarity check prevents computation)

  • The function handles both discrete and continuous variables

  • Supports MultiTimeSeries (e.g., place fields from x,y coordinates)

  • For mutual information, values are in bits

  • No optimal delay search is performed (delays are set to 0)
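
To make the two-stage idea in the notes above concrete, here is a minimal, self-contained sketch. It is an illustration under simplifying assumptions, not the INTENSE implementation: absolute Pearson correlation stands in for mutual information, shuffles are random circular shifts, the top-k rule is interpreted as "fewer than topk shuffles reach the observed value", and no parametric fit or multiple comparison correction is applied. All function names are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a cheap stand-in for MI."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def passes_topk(x, y, n_shuffles, topk, rng):
    """True if fewer than `topk` shuffles reach the observed metric value."""
    observed = abs_corr(x, y)
    n_higher = 0
    for _ in range(n_shuffles):
        shift = rng.randrange(1, len(y))   # random circular shift
        shuffled = y[shift:] + y[:shift]
        if abs_corr(x, shuffled) >= observed:
            n_higher += 1
    return n_higher < topk


def two_stage(x, y, n1=100, n2=1000, topk1=1, topk2=5, seed=42):
    rng = random.Random(seed)
    if not passes_topk(x, y, n1, topk1, rng):  # stage 1: cheap pruning
        return False
    return passes_topk(x, y, n2, topk2, rng)   # stage 2: thorough test


# Toy data: y is x plus a little noise, so the pair is strongly coupled
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(300)]
y = [a + 0.1 * data_rng.gauss(0, 1) for a in x]
print(two_stage(x, y, n1=100, n2=500))  # True
```

The point of the design is cost: most unrelated pairs are discarded after the small stage 1 budget, so the expensive stage 2 shuffles are spent only on candidates that survived.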

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from driada.intense import compute_feat_feat_significance
>>> import numpy as np
>>>
>>> # Create test experiment
>>> exp = generate_synthetic_exp(n_dfeats=2, n_cfeats=2, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Compute feature-feature correlations
>>> sim_mat, sig_mat, pval_mat, features, info = compute_feat_feat_significance(
...     exp,
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat.shape == (4, 4)  # 2 discrete + 2 continuous features
True
>>> np.allclose(np.diag(sim_mat), 0)  # Diagonal is zero
True
>>>
>>> # Analyze specific features only
>>> sim_mat2, sig_mat2, pval_mat2, features2, info2 = compute_feat_feat_significance(
...     exp,
...     feat_bunch=['d_feat_0', 'd_feat_1'],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat2.shape == (2, 2)
True

Raises:

ValueError – If features are not found in experiment

Notes

  • Only upper triangle is computed for efficiency (matrix is symmetric)

  • Diagonal elements are always zero (self-similarity prevented)

  • No delay optimization is performed between features

  • Supports both discrete and continuous features

  • Multifeatures are created using aggregate_multiple_ts

  • When called with circular features, feat_bunch should contain _2d-substituted names (e.g., headdirection_2d) to match the experiment’s dynamic features after circular substitution.
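
The note that only the upper triangle is computed can be illustrated with a short sketch that mirrors pairwise values into a full symmetric matrix with a zero diagonal. The helper name symmetric_from_upper and the input format (a dict keyed by index pairs) are assumptions made for this example, not the library's internal representation.

```python
def symmetric_from_upper(n, pair_values):
    """Build an n x n symmetric matrix with a zero diagonal from i < j entries."""
    mat = [[0.0] * n for _ in range(n)]
    for (i, j), value in pair_values.items():
        if i >= j:
            raise ValueError("expected upper-triangle indices with i < j")
        mat[i][j] = value
        mat[j][i] = value  # mirror into the lower triangle
    return mat


upper = {(0, 1): 0.3, (0, 2): 0.0, (1, 2): 0.7}
sim = symmetric_from_upper(3, upper)
print(sim)  # [[0.0, 0.3, 0.0], [0.3, 0.0, 0.7], [0.0, 0.7, 0.0]]
```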

driada.intense.pipelines.compute_cell_cell_significance(exp, cell_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=1000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, duplicate_behavior='ignore', profile=False)[source]

Compute pairwise functional correlations between neurons using INTENSE.

This function calculates pairwise similarity (e.g., mutual information) between all neurons using the two-stage INTENSE approach. This can reveal functionally correlated neurons that may form assemblies or functional modules.

Parameters:
  • exp (Experiment) – Experiment object containing neural data.

  • cell_bunch (int, list or None, optional) – Neuron indices to analyze. Default: None (all neurons).

  • data_type (str, optional) – Type of neural data: ‘calcium’ or ‘spikes’. Default: ‘calcium’.

  • metric (str, optional) – Similarity metric to use. Default: ‘mi’ (mutual information).

  • mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • mode (str, optional) – Computation mode: ‘two_stage’, ‘stage1’, or ‘stage2’. Default: ‘two_stage’.

  • n_shuffles_stage1 (int, optional) – Number of shuffles for stage 1. Default: 100.

  • n_shuffles_stage2 (int, optional) – Number of shuffles for stage 2. Default: 1000.

  • metric_distr_type (str, optional) – Distribution type for metric null distribution (‘gamma_zi’, ‘gamma’, etc.). Default: ‘gamma_zi’.

  • noise_ampl (float, optional) – Small noise amplitude for numerical stability. Default: 1e-3.

  • ds (int, optional) – Downsampling factor. Default: 1.

  • topk1 (int, optional) – Top-k criterion for stage 1. Default: 1.

  • topk2 (int, optional) – Top-k criterion for stage 2. Default: 5.

  • multicomp_correction (str or None, optional) – Multiple comparison correction method. Default: ‘holm’.

  • pval_thr (float, optional) – P-value threshold for significance. Default: 0.01.

  • verbose (bool, optional) – Whether to print progress information. Default: True.

  • enable_parallelization (bool, optional) – Whether to use parallel processing. Default: True.

  • n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default: -1.

  • seed (int, optional) – Random seed for reproducibility. Default: 42.

  • duplicate_behavior (str, optional) –

    How to handle duplicate TimeSeries among the selected neurons.

    • ’ignore’: Process duplicates normally (default)

    • ’raise’: Raise an error if duplicates are found

    • ’warn’: Print a warning but continue processing

    Default: ‘ignore’.

  • profile (bool, optional) –

    If True, collect timing and FFT type information. When enabled, info[‘timings’] will contain:

    • ’stage1_pair_scanning’: Stage 1 scanning time (seconds)

    • ’stage2_pair_scanning’: Stage 2 scanning time if applicable (seconds)

    • ’fft_type_counts’: Dictionary of FFT type usage counts

    • ’matrix_construction’: Symmetric matrix construction time (seconds)

    • ’total’: Total execution time (seconds)

    Default: False.

Return type:

tuple

Returns:

  • similarity_matrix (ndarray) – Matrix of similarity values between neurons. Element [i,j] contains the similarity between neuron i and neuron j. Diagonal is zero.

  • significance_matrix (ndarray) – Matrix of binary significance values. 1 indicates significant similarity.

  • p_value_matrix (ndarray) – Matrix of p-values for each comparison.

  • cell_ids (list) – List of cell IDs corresponding to matrix indices.

  • info (dict) – Dictionary containing additional information from compute_me_stats.

Notes

  • Uses the two-stage INTENSE approach for efficient significance testing

  • Diagonal elements are zero (self-similarity check prevents computation)

  • For calcium imaging data, considers temporal dynamics

  • For spike data, uses discrete MI formulation

  • Can identify functional assemblies through graph analysis of significant pairs

  • No optimal delay search is performed (synchronous activity assumed)
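
One simple form of the graph analysis mentioned above is to treat the binary significance matrix as an adjacency matrix and list its connected components as candidate assemblies. The sketch below does this in plain Python with a toy matrix. The function name is hypothetical, and real analyses may prefer the Network class or a dedicated graph library.

```python
def connected_components(sig_mat):
    """Group node indices that are linked through significant pairs."""
    n = len(sig_mat)
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            comp.append(node)
            for other in range(n):
                if sig_mat[node][other] and other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(comp))
    return components


# Toy significance matrix: neurons 0-1 and 2-3 form two separate groups
sig = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(connected_components(sig))  # [[0, 1], [2, 3]]
```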

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from driada.information.info_base import TimeSeries
>>> import numpy as np
>>>
>>> # Create experiment with correlated neurons
>>> exp = generate_synthetic_exp(n_dfeats=1, n_cfeats=1, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Make neurons 0 and 1 correlated
>>> noise = np.random.RandomState(42).randn(len(exp.neurons[0].ca.data)) * 0.1
>>> exp.neurons[1].ca = TimeSeries(
...     exp.neurons[0].ca.data + noise, discrete=False
... )
>>>
>>> # Compute neuron-neuron correlations
>>> sim_mat, sig_mat, pval_mat, cells, info = compute_cell_cell_significance(
...     exp,
...     cell_bunch=[0, 1, 2],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat.shape == (3, 3)
True
>>> np.allclose(np.diag(sim_mat), 0)  # Self-correlation is zero
True
>>> sim_mat[0, 1] > sim_mat[0, 2]  # Neurons 0,1 more correlated than 0,2
True

Raises:

ValueError – If data_type is not ‘calcium’ or ‘spikes’, or if spike data is missing for requested neurons

Notes

  • Only upper triangle is computed for efficiency (matrix is symmetric)

  • Warns if all neurons have identical spike data

  • Computes network statistics when verbose=True

  • Synchronous activity assumed (no delay optimization)

driada.intense.pipelines.compute_embedding_selectivity(exp, embedding_methods=None, cell_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, use_precomputed_stats=True, save_computed_stats=True, force_update=False, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=True, shift_window=2, remove_anti_selective=True, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42)[source]

Compute INTENSE selectivity between neurons and dimensionality reduction embeddings.

This function treats each embedding component as a dynamic feature and computes the mutual information between neural activity and embedding dimensions. This reveals how individual neurons contribute to the population-level manifold structure.

Parameters:
  • exp (Experiment) – Experiment object with stored embeddings

  • embedding_methods (str, list or None) – Names of embedding methods to analyze. If None, analyzes all stored embeddings.

  • cell_bunch (int, iterable or None) – Neuron indices. By default (None), all neurons will be taken

  • data_type (str) – Data type used for embeddings and INTENSE (‘calcium’ or ‘spikes’)

  • metric (str) – Similarity metric between TimeSeries (default: ‘mi’)

  • mi_estimator (str) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • mode (str) – Computation mode: ‘stage1’, ‘stage2’, or ‘two_stage’ (default)

  • n_shuffles_stage1 (int) – Number of shuffles for first stage (default: 100)

  • n_shuffles_stage2 (int) – Number of shuffles for second stage (default: 10000)

  • metric_distr_type (str) – Distribution type for shuffled metric distribution fit (default: ‘gamma_zi’)

  • noise_ampl (float) – Small noise amplitude added to improve numerical fit (default: 1e-3)

  • ds (int) – Downsampling constant (default: 1)

  • use_precomputed_stats (bool) – Whether to use stats saved in Experiment instance (default: True)

  • save_computed_stats (bool) – Whether to save computed stats to Experiment instance (default: True)

  • force_update (bool) – Force update saved statistics if data hash collision found (default: False)

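  • topk1 (int) – Stage 1 criterion: the true MI must rank among the topk1 highest values of the stage 1 shuffle distribution (default: 1)

  • topk2 (int) – Stage 2 criterion: the true MI must rank among the topk2 highest values of the stage 2 shuffle distribution (default: 5)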

  • multicomp_correction (str or None) – Multiple comparison correction type: None, ‘bonferroni’, or ‘holm’ (default)

  • pval_thr (float) – P-value threshold (default: 0.01)

  • find_optimal_delays (bool) – Find optimal temporal delays between neural activity and embeddings (default: True)

  • shift_window (int) – Window for optimal shift search in seconds (default: 2)

  • verbose (bool) – Print progress information (default: True)

  • enable_parallelization (bool) – Enable parallel computation (default: True)

  • n_jobs (int) – Number of parallel jobs, -1 for all cores (default: -1)

  • seed (int) – Random seed (default: 42)

Returns:

results – Dictionary with keys as embedding method names, each containing:

  • ‘stats’: Statistics for each neuron-component pair

  • ‘significance’: Significance results

  • ‘info’: Additional information from compute_me_stats

  • ‘intense_results’: Full IntenseResults object from INTENSE computation

  • ‘significant_neurons’: Dict of neurons significantly selective to embedding components

  • ‘n_components’: Number of embedding components

  • ‘component_selectivity’: For each component, list of selective neurons

Return type:

dict

Raises:

ValueError – If no embeddings are found for the specified data_type, or if a requested embedding method is not found

Notes

  • Temporarily adds embedding components as dynamic features

  • Forces use_precomputed_stats=False for temporary features

  • Component names follow pattern “{method}_comp{index}”

  • Cleanup in finally block ensures experiment state restored

  • Only stage2 significance is considered for results
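
The component naming pattern from the notes above can be reproduced with a one-line helper, which is handy when looking up temporary feature names in the results. The helper name is hypothetical, and zero-based indexing is an assumption that should be checked against actual output.

```python
def component_names(method, n_components):
    """Names in the "{method}_comp{index}" pattern (zero-based index assumed)."""
    return [f"{method}_comp{i}" for i in range(n_components)]


print(component_names("pca", 2))  # ['pca_comp0', 'pca_comp1']
```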

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from sklearn.decomposition import PCA
>>> import numpy as np
>>>
>>> # Create experiment
>>> exp = generate_synthetic_exp(n_dfeats=1, n_cfeats=1, nneurons=5,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Create and store PCA embedding
>>> neural_data = np.array([exp.neurons[i].ca.data for i in range(5)]).T
>>> pca = PCA(n_components=2, random_state=42)
>>> embedding = pca.fit_transform(neural_data)
>>> exp.store_embedding(embedding, method_name='pca', data_type='calcium')
>>>
>>> # Compute embedding selectivity
>>> results = compute_embedding_selectivity(
...     exp,
...     embedding_methods=['pca'],
...     cell_bunch=[0, 1, 2],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )  
...
>>>
>>> 'pca' in results
True
>>> results['pca']['n_components']
2
>>> 'component_selectivity' in results['pca']
True

See also

compute_cell_feat_significance

Compute selectivity for behavioral features

get_functional_organization

Analyze organization in embeddings

compare_embeddings

Compare multiple embedding methods

Usage Example

from driada.intense import compute_cell_feat_significance
from driada.experiment import load_demo_experiment

exp = load_demo_experiment()

stats, significance, info, results = compute_cell_feat_significance(
    exp,
    n_shuffles_stage1=100,
    n_shuffles_stage2=1000,
    ds=5,
    find_optimal_delays=False,
)