INTENSE Pipelines

driada.intense.pipelines.substitute_circular_with_2d(feat_ids, exp, verbose=False)

Substitute circular features with their _2d (cos, sin) counterparts.

For each feature detected as circular that has a corresponding {name}_2d MultiTimeSeries in the experiment, the feature ID is replaced with its _2d version.

Parameters:

feat_ids (list) – List of feature IDs (strings or tuples for multi-features).
exp (Experiment) – Experiment object containing dynamic_features.
verbose (bool, default=False) – If True, print substitution information.

Returns:

(new_feat_ids, substitutions) where substitutions is a list of (original, substituted) tuples.

Return type:

tuple

Examples

>>> from driada.intense.pipelines import substitute_circular_with_2d
>>> # Assuming exp has circular feature 'headdirection' with a _2d version
>>> feat_ids = ['headdirection', 'speed']
>>> new_ids, subs = substitute_circular_with_2d(feat_ids, exp)
>>> new_ids
['headdirection_2d', 'speed']
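The substitution rule described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not DRIADA's code: a plain dict stands in for exp.dynamic_features, and the set of circular feature names is passed in explicitly instead of being detected.

```python
def substitute_circular_sketch(feat_ids, dynamic_features, circular_features, verbose=False):
    """Swap each circular feature ID for '{name}_2d' when that key exists.

    Hypothetical stand-ins: `dynamic_features` is a plain dict keyed by
    feature name and `circular_features` is a set of names already flagged
    as circular. The real function reads both from an Experiment.
    """
    new_ids, substitutions = [], []
    for fid in feat_ids:
        candidate = f"{fid}_2d" if isinstance(fid, str) else None
        # Tuples (multi-features) pass through unchanged in this sketch.
        if candidate is not None and fid in circular_features and candidate in dynamic_features:
            new_ids.append(candidate)
            substitutions.append((fid, candidate))
            if verbose:
                print(f"Substituted {fid} -> {candidate}")
        else:
            new_ids.append(fid)
    return new_ids, substitutions


features = {"headdirection": None, "headdirection_2d": None, "speed": None}
new_ids, subs = substitute_circular_sketch(["headdirection", "speed"], features, {"headdirection"})
print(new_ids)  # ['headdirection_2d', 'speed']
print(subs)     # [('headdirection', 'headdirection_2d')]
```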

driada.intense.pipelines.compute_cell_feat_significance(exp, cell_bunch=None, feat_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, use_precomputed_stats=True, save_computed_stats=True, force_update=False, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=True, skip_delays=[], shift_window=2, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, with_disentanglement=False, feat_feat_pval_thr=0.01, multifeature_map=None, duplicate_behavior='ignore', engine='auto', store_random_shifts=False, profile=False, pre_filter_func=None, post_filter_func=None, filter_kwargs=None, remove_anti_selective=True, use_circular_2d=True)

Calculates significant neuron-feature pairs.

Parameters:

exp (Experiment) – Experiment object to read data from and write results to
cell_bunch (int, iterable or None, optional) – Neuron indices. By default (cell_bunch=None), all neurons are used
feat_bunch (str, iterable or None, optional) – Feature names. By default (feat_bunch=None), all single features are used
data_type (str, optional) – Data type used for INTENSE computations. Can be ‘calcium’ or ‘spikes’. Default is ‘calcium’
metric (str, optional) – Similarity metric between TimeSeries. Default is ‘mi’
mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ or ‘ksg’. Default is ‘gcmi’
mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.
mode (str, optional) –
Computation mode. 3 modes are available:
- 'stage1': perform preliminary scanning with “n_shuffles_stage1” shuffles only. Rejects clearly non-significant neuron-feature pairs but does not give definite results about the significance of the others.
- 'stage2': skip stage 1 and perform full-scale scanning (“n_shuffles_stage2” shuffles) of all neuron-feature pairs. Gives definite results, but can be very time-consuming. It also reduces the statistical power of multiple comparison correction, since the number of hypotheses is very high.
- 'two_stage': prune non-significant pairs during stage 1 and perform thorough testing for the rest during stage 2. Recommended mode.
Default is ‘two_stage’
n_shuffles_stage1 (int, optional) – Number of shuffles for first stage. Default is 100
n_shuffles_stage2 (int, optional) – Number of shuffles for second stage. Default is 10000
metric_distr_type (str, optional) –
Distribution type for shuffled metric null distribution. Options:
- ‘gamma_zi’ (default): Zero-inflated gamma distribution. Explicitly models the probability mass at zero that commonly occurs in MI null distributions. Provides superior goodness-of-fit and accurate parameter estimation without requiring artificial noise. Recommended for all analyses.
- ‘gamma’: Standard gamma distribution with small noise (noise_ampl) added to handle zeros. Provided for backward compatibility; less statistically principled than ‘gamma_zi’.
- Other scipy.stats distributions: ‘lognorm’, ‘norm’, etc. are supported but not recommended for MI distributions.
Recommendation: Use ‘gamma_zi’ (default) for new analyses. It achieves detection performance equivalent to ‘gamma’ while providing a statistically correct goodness-of-fit and accurate parameter recovery.

Default: ‘gamma_zi’
noise_ampl (float, optional) – Small noise amplitude added to MI values for numerical stability (only used with metric_distr_type=’gamma’). When using ‘gamma_zi’, this parameter is automatically set to 0 since zero-inflated gamma handles zeros explicitly without requiring artificial noise. Default: 1e-3
ds (int, optional) – Downsampling constant. Every ds-th point of the data time series is used. This reduces the computational load but requires caution, since with a large ds some important information may be lost. The Experiment class performs an internal check for this effect. Default is 1
use_precomputed_stats (bool, optional) – Whether to use stats saved in the Experiment instance. Stats are accumulated separately for stage 1 and stage 2. Notes on rewriting stats data (if save_computed_stats=True):
- To recalculate stage 1 results only, use use_precomputed_stats=False and mode=‘stage1’. Stage 2 stats will be erased, since they become irrelevant.
- To recalculate stage 2 results only, use use_precomputed_stats=True and mode=‘stage2’ or mode=‘two_stage’.
- To recalculate everything, use use_precomputed_stats=False and mode=‘two_stage’.
Default is True
save_computed_stats (bool, optional) – Whether to save computed stats to Experiment instance. Default is True
force_update (bool, optional) – Whether to force an update of saved statistics if the hashes of the current data do not match the hashes stored with the saved stats (for example, if neuronal or behavioral data have been changed externally). Default is False
topk1 (int, optional) – Stage 1 criterion: the true MI must rank among the top topk1 values when compared with the shuffled MI values. Default is 1
topk2 (int, optional) – Stage 2 criterion: the true MI must rank among the top topk2 values when compared with the shuffled MI values. Default is 5
multicomp_correction (str or None, optional) – Type of multiple comparison correction. Supported types are None (no correction), “bonferroni”, “holm”, and “fdr_bh” (Benjamini-Hochberg FDR). Default is ‘holm’
pval_thr (float, optional) – P-value threshold. If multicomp_correction=None, this is the threshold for a single pair. Otherwise it is the family-wise error rate (FWER) level for “bonferroni” and “holm”, or the false discovery rate (FDR) level for “fdr_bh”. Default is 0.01
find_optimal_delays (bool, optional) – If True, allows a slight shift (at most ±shift_window) between the time series and uses the shift with the highest MI. Default is True
skip_delays (list, optional) – List of features for which delays are not applied (set to 0). Only features that exist in feat_bunch will be processed. Has no effect if find_optimal_delays = False. Default is []
shift_window (int, optional) – Window for optimal shift search (seconds). Optimal shift (in frames) will lie in the range -shift_window*fps <= opt_shift <= shift_window*fps. Has no effect if find_optimal_delays = False. Default is 2
verbose (bool, optional) – Whether to print progress messages. Default is True
enable_parallelization (bool, optional) – Whether to enable parallel processing. Default is True
n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default is -1
seed (int, optional) – Random seed for reproducibility. Default is 42
with_disentanglement (bool, optional) –
If True, performs a full INTENSE pipeline with mixed selectivity analysis:
1. Computes behavioral feature-feature significance
2. Computes neuron-feature significance
3. Disentangles mixed selectivities using behavioral correlations.
Default is False
feat_feat_pval_thr (float, optional) – P-value threshold for feature-feature significance testing during disentanglement. Separate from cell-feat pval_thr because the number of feature pairs (~100-200) is much smaller than neuron-feature pairs (thousands), so a stricter threshold is unnecessary. Only used when with_disentanglement=True. Default is 0.01
multifeature_map (dict or None, optional) – Mapping from multifeature tuples to aggregated names for disentanglement. If None, uses DEFAULT_MULTIFEATURE_MAP from disentanglement module. Only used when with_disentanglement=True. Default is None
duplicate_behavior (str, optional) –
How to handle duplicate TimeSeries in neuron or feature bunches.
- ‘ignore’: Process duplicates normally (default)
- ‘raise’: Raise an error if duplicates are found
- ‘warn’: Print a warning but continue processing
Default is ‘ignore’
engine ({'auto', 'fft', 'loop'}, optional) –
Computation engine for MI shuffles:
- ‘auto’: Use FFT when applicable (univariate continuous GCMI with nsh >= 50)
- ‘fft’: Force FFT (raises an error if not applicable)
- ‘loop’: Force per-shift loop (original behavior)
FFT provides ~100x speedup for Stage 2. Default is ‘auto’
store_random_shifts (bool, optional) – Whether to store the random shift indices used during shuffle computation. When False (default), random_shifts1 and random_shifts2 arrays are not stored, saving significant memory (~400MB for typical datasets with N=500, M=20). Set to True if you need the shift indices for debugging or reproducibility analysis. Default is False
profile (bool, optional) –
Whether to collect internal timing information. When True, info[‘timings’] will contain execution times (in seconds) for:
- ‘stage1_delay_optimization’: delay optimization (if find_optimal_delays=True)
- ‘stage1_pair_scanning’: stage 1 pair scanning
- ‘stage2_pair_scanning’: stage 2 pair scanning (if applicable)
- ‘fft_type_counts’: Dictionary of FFT type usage counts
- ‘disentanglement’: disentanglement analysis (if with_disentanglement=True)
- ‘total’: sum of all timing sections
Default is False

pre_filter_func (callable or None, optional) –

Population-level filter function (or composed filter) to run BEFORE disentanglement parallel processing. Only used when with_disentanglement=True. The filter mutates neuron selectivities and pre-computes pair decisions.

Signature:

def pre_filter_func(
    neuron_selectivities,    # dict: {neuron_id: [feat1, feat2, ...]} - MUTATE
    pair_decisions,          # dict: {neuron_id: {(f1, f2): 0/0.5/1}} - MUTATE
    renames,                 # dict: {neuron_id: {new_name: (old1, old2)}} - MUTATE
    cell_feat_stats,         # Pre-computed MI values (READ ONLY)
    feat_feat_significance,  # Binary matrix (READ ONLY)
    feat_names,              # List of feature names (READ ONLY)
    **kwargs,                # User-provided extra arguments from filter_kwargs
):
    ...

Default: None (no filtering).

post_filter_func (callable or None, optional) –

Population-level filter function to run AFTER disentanglement parallel processing. Can modify pair results (e.g., tie-breaking). Only used when with_disentanglement=True.

Signature:

def post_filter_func(
    per_neuron_disent,       # dict: {nid: {'pairs': {...}, ...}} - MUTATE
    cell_feat_stats,         # Pre-computed MI values (READ ONLY)
    feat_names,              # List of feature names (READ ONLY)
    **kwargs,                # User-provided extra arguments
):
    ...

Default: None (no post-filtering).

filter_kwargs (dict or None, optional) – Dictionary of keyword arguments to pass to pre_filter_func and post_filter_func. Can include pre-extracted data like calcium_data, feature_data, thresholds, etc. Only used when with_disentanglement=True. Default: None.
use_circular_2d (bool, default=True) – If True, automatically substitute circular features with their _2d counterparts (cos, sin representation) for MI computation. This improves MI estimation accuracy for circular variables like head direction. Requires that create_circular_2d=True was used during experiment loading.
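The reason for the _2d substitution can be seen with a small stdlib sketch: a raw angle treated as a linear variable puts 0.01 rad and 2*pi - 0.01 rad far apart, even though they are almost the same heading, while the (cos, sin) encoding keeps them close. This illustrates the encoding idea only; it does not call DRIADA, and how DRIADA builds its {name}_2d MultiTimeSeries is not shown.

```python
import math


def circular_to_2d(angles_rad):
    """Map angles in radians to (cos, sin) points on the unit circle."""
    return [(math.cos(a), math.sin(a)) for a in angles_rad]


# Two nearly identical headings on either side of the 0 / 2*pi wrap-around.
a, b = 0.01, 2 * math.pi - 0.01

raw_gap = abs(a - b)                             # looks far apart as raw numbers
circle_gap = math.dist(*circular_to_2d([a, b]))  # close together on the unit circle
print(f"raw: {raw_gap:.2f}, 2d: {circle_gap:.2f}")  # raw: 6.26, 2d: 0.02
```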

Return type:

tuple

Returns:

stats (dict of dict of dicts) – Outer dict: cells, inner dict: dynamic features, last dict: stats. Can be converted to a pandas DataFrame with pd.DataFrame(stats)
significance (dict of dict of bools) – Significance results for each neuron-feature pair
info (dict) – Additional information from compute_me_stats
intense_res (IntenseResults) – Complete results object
disentanglement_results (dict (only if with_disentanglement=True)) – Contains:
- ‘feat_feat_significance’: Feature-feature significance matrix
- ‘disent_matrix’: Disentanglement results matrix
- ‘count_matrix’: Count matrix from disentanglement
- ‘per_neuron_disent’: Per-neuron detailed results dict mapping neuron_id to ‘pairs’, ‘renames’, and ‘final_sels’ sub-dicts.
- ‘feature_names’: List of feature names
- ‘summary’: Summary statistics from disentanglement
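If a long-format table (one row per neuron-feature pair) is more convenient than pd.DataFrame(stats), the nested dicts can be flattened as in this sketch. It assumes only the nesting described above (cell, then feature, then a stats dict) and the dict-of-dict-of-bools shape of significance; the stat names in the toy input are placeholders, not guaranteed keys. The resulting list of dicts can be passed to pandas.DataFrame.

```python
def stats_to_records(stats, significance=None):
    """Flatten stats[cell][feature][stat_name] into one flat dict per pair.

    If `significance` (dict of dict of bools) is given, each record also gets
    a 'significant' entry (None when the pair is missing from it).
    """
    records = []
    for cell_id, per_feature in stats.items():
        for feat_id, pair_stats in per_feature.items():
            row = {"cell": cell_id, "feature": feat_id, **pair_stats}
            if significance is not None:
                row["significant"] = significance.get(cell_id, {}).get(feat_id)
            records.append(row)
    return records


# Toy input; 'me' and 'pval' are placeholder stat names for illustration.
stats = {0: {"speed": {"me": 0.12, "pval": 1e-4}}, 1: {"speed": {"me": 0.01, "pval": 0.4}}}
sig = {0: {"speed": True}, 1: {"speed": False}}
for row in stats_to_records(stats, sig):
    print(row)
```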

Raises:

ValueError – If data_type is not ‘calcium’ or ‘spikes’, or if features are not found in the experiment

Notes

shift_window is converted from seconds to frames using exp.fps
Updates exp.optimal_nf_delays as a side effect
Relative MI values are computed using appropriate neural data entropy
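Since shift_window is given in seconds and converted to frames with exp.fps, the delay search can be pictured as an exhaustive scan over that frame range that keeps the highest-scoring shift. The sketch below assumes a hypothetical scoring callable; DRIADA's actual delay optimization (including how it rounds shift_window * fps and how it handles skip_delays) may differ.

```python
def optimal_shift(score, shift_window, fps):
    """Return the frame shift in [-shift_window*fps, +shift_window*fps] maximizing score.

    `score(shift)` is a hypothetical callable giving the metric (e.g. MI)
    between a neuron and a feature shifted by `shift` frames.
    """
    max_shift = int(shift_window * fps)  # seconds -> frames (truncated in this sketch)
    candidates = range(-max_shift, max_shift + 1)
    return max(candidates, key=score)  # ties resolve to the earliest (most negative) shift


def toy_score(shift):
    # Toy metric peaking at a lag of 7 frames.
    return -(shift - 7) ** 2


print(optimal_shift(toy_score, shift_window=2, fps=10))  # 7
```

A peak outside the window is never reached: with shift_window=2 and fps=10, the search is limited to shifts between -20 and 20 frames.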

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from driada.intense.pipelines import compute_cell_feat_significance
>>>
>>> # Create small test experiment
>>> exp = generate_synthetic_exp(n_dfeats=2, n_cfeats=1, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Basic neuron-feature analysis (stage1 for speed); 4 values without disentanglement
>>> stats, sig, info, res = compute_cell_feat_significance(
...     exp,
...     cell_bunch=[0, 1],
...     feat_bunch=['d_feat_0'],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> len(stats)  # Number of neurons analyzed
2
>>> 'd_feat_0' in stats[0]  # Feature present in results
True
>>>
>>> # With disentanglement analysis
>>> result = compute_cell_feat_significance(
...     exp,
...     cell_bunch=[0, 1],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     with_disentanglement=True,
...     verbose=False
... )
>>> len(result)  # Returns 5 values with disentanglement
5
>>> stats, sig, info, res, disent = result
>>> 'disent_matrix' in disent
True
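A filter passed as pre_filter_func only has to accept the documented arguments and mutate the MUTATE ones in place. The function below is a hypothetical example written against that signature, not a filter shipped with DRIADA: it drops user-listed features from every neuron's selectivity list, reading that list from filter_kwargs under an assumed key, exclude_features.

```python
def drop_features_filter(neuron_selectivities, pair_decisions, renames,
                         cell_feat_stats, feat_feat_significance, feat_names,
                         **kwargs):
    """Example pre-filter: remove the features named in kwargs['exclude_features']
    from every neuron's selectivity list, mutating the lists in place.

    'exclude_features' is a hypothetical key supplied via filter_kwargs.
    pair_decisions and renames are deliberately left untouched here.
    """
    exclude = set(kwargs.get("exclude_features", ()))
    for feats in neuron_selectivities.values():
        # Slice assignment keeps the same list object, so the caller sees the change.
        feats[:] = [f for f in feats if f not in exclude]


# Standalone check with toy inputs (read-only arguments are unused, so None is fine).
sels = {0: ["speed", "headdirection_2d"], 1: ["speed"]}
drop_features_filter(sels, {}, {}, None, None, ["speed", "headdirection_2d"],
                     exclude_features=["speed"])
print(sels)  # {0: ['headdirection_2d'], 1: []}

# Intended use (not executed here):
# compute_cell_feat_significance(exp, with_disentanglement=True,
#                                pre_filter_func=drop_features_filter,
#                                filter_kwargs={"exclude_features": ["speed"]})
```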

driada.intense.pipelines.compute_feat_feat_significance(exp, feat_bunch='all', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=1000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, duplicate_behavior='ignore', engine='auto', profile=False)

Compute pairwise significance between all behavioral features.

This function calculates pairwise similarity (e.g., mutual information) between all behavioral features using the two-stage INTENSE approach. Diagonal elements are set to zero, because self-similarity is excluded by the check_for_coincidence mechanism in get_mi.

Parameters:

exp (Experiment) – Experiment object containing behavioral data.
feat_bunch (str, list or None) – Feature names to analyze. Default: ‘all’ (all features including multifeatures). Can be a list of specific feature names.
metric (str, optional) – Similarity metric to use. Default: ‘mi’ (mutual information).
mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’
mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.
mode (str, optional) – Computation mode: ‘two_stage’, ‘stage1’, or ‘stage2’. Default: ‘two_stage’.
n_shuffles_stage1 (int, optional) – Number of shuffles for stage 1. Default: 100.
n_shuffles_stage2 (int, optional) – Number of shuffles for stage 2. Default: 1000.
metric_distr_type (str, optional) – Distribution type for metric null distribution (‘gamma_zi’, ‘gamma’, etc.). Default: ‘gamma_zi’.
noise_ampl (float, optional) – Small noise amplitude for numerical stability. Default: 1e-3.
ds (int, optional) – Downsampling factor. Default: 1.
topk1 (int, optional) – Top-k criterion for stage 1. Default: 1.
topk2 (int, optional) – Top-k criterion for stage 2. Default: 5.
multicomp_correction (str or None, optional) – Multiple comparison correction method. Default: ‘holm’.
pval_thr (float, optional) – P-value threshold for significance. Default: 0.01.
verbose (bool, optional) – Whether to print progress information. Default: True.
enable_parallelization (bool, optional) – Whether to use parallel processing. Default: True.
n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default: -1.
seed (int, optional) – Random seed for reproducibility. Default: 42.
duplicate_behavior (str, optional) –
How to handle duplicate TimeSeries in ts_bunch1 or ts_bunch2:
- ‘ignore’: Process duplicates normally (default)
- ‘raise’: Raise an error if duplicates are found
- ‘warn’: Print a warning but continue processing
Default: ‘ignore’.
engine (str, optional) –
Computation engine for MI calculation:
- ‘auto’: Automatically select FFT when beneficial (default)
- ‘fft’: Force FFT-based computation
- ‘loop’: Force loop-based computation (useful for comparison/debugging)
Default: ‘auto’.
profile (bool, optional) –
If True, collect timing and FFT type information. When enabled, info[‘timings’] will contain:
- ‘stage1_pair_scanning’: Stage 1 scanning time (seconds)
- ‘stage2_pair_scanning’: Stage 2 scanning time if applicable (seconds)
- ‘fft_type_counts’: Dictionary of FFT type usage counts
- ‘matrix_construction’: Symmetric matrix construction time (seconds)
- ‘total’: Total execution time (seconds)
Default: False.
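The engine parameter's ‘auto’ rule is spelled out in compute_cell_feat_significance (FFT for univariate continuous GCMI with nsh >= 50). Assuming the same rule applies here, the dispatch can be sketched as below; the is_univariate and is_continuous flags are hypothetical inputs, and how DRIADA detects them is not shown.

```python
def choose_engine(engine, metric, mi_estimator, is_univariate, is_continuous, nsh):
    """Mirror the documented engine rule: FFT applies to univariate,
    continuous GCMI with nsh >= 50 shuffles."""
    fft_applicable = (metric == "mi" and mi_estimator == "gcmi"
                      and is_univariate and is_continuous and nsh >= 50)
    if engine == "auto":
        return "fft" if fft_applicable else "loop"
    if engine == "fft":
        if not fft_applicable:
            raise ValueError("engine='fft' requested, but FFT is not applicable")
        return "fft"
    if engine == "loop":
        return "loop"
    raise ValueError(f"Unknown engine: {engine!r}")


print(choose_engine("auto", "mi", "gcmi", True, True, nsh=1000))  # fft
print(choose_engine("auto", "mi", "ksg", True, True, nsh=1000))   # loop
```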

Return type:

tuple

Returns:

similarity_matrix (ndarray) – Matrix of similarity values between features. Element [i,j] contains the similarity between feature i and feature j. Diagonal is zero.
significance_matrix (ndarray) – Matrix of binary significance values. 1 indicates significant similarity.
p_value_matrix (ndarray) – Matrix of p-values for each comparison.
feature_names (list) – List of feature names corresponding to matrix indices. May include tuples for multifeatures (e.g., (‘x’, ‘y’)).
info (dict) – Dictionary containing additional information from compute_me_stats.
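The returned matrices can be turned into a ranked list of significant feature pairs by reading only the upper triangle. The helper below is a hypothetical convenience, not part of DRIADA; the toy matrices are made up for illustration and use a tuple name to mimic a multifeature.

```python
def significant_pairs(significance_matrix, p_value_matrix, names):
    """Return (name_i, name_j, p) for every i < j flagged as significant,
    sorted by p-value. Works with nested lists or 2-D NumPy arrays."""
    n = len(names)
    pairs = [
        (names[i], names[j], p_value_matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)  # upper triangle only: each pair is read once
        if significance_matrix[i][j]
    ]
    return sorted(pairs, key=lambda pair: pair[2])


names = ["d_feat_0", "d_feat_1", ("x", "y")]  # tuples denote multifeatures
sig = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
pvals = [[1.0, 0.003, 1e-5], [0.003, 1.0, 0.5], [1e-5, 0.5, 1.0]]
print(significant_pairs(sig, pvals, names))
# [('d_feat_0', ('x', 'y'), 1e-05), ('d_feat_0', 'd_feat_1', 0.003)]
```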

Notes

Uses the two-stage INTENSE approach for efficient significance testing
Diagonal elements are zero (self-similarity check prevents computation)
The function handles both discrete and continuous variables
Supports MultiTimeSeries (e.g., place fields from x,y coordinates)
For mutual information, values are in bits
No optimal delay search is performed (delays are set to 0)
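The two-stage approach noted above spends few shuffles on every pair and many shuffles only on pairs that survive. The schematic below illustrates that control flow with a rank-based top-k test (the topk1/topk2 criterion). It is a simplified sketch, not DRIADA's implementation: the shuffled-value generator is a hypothetical callable, and the parametric null fit (metric_distr_type), p-value threshold and multiple comparison correction that stage 2 also applies are omitted.

```python
import random


def passes_topk(true_value, shuffled_values, topk):
    """True if fewer than `topk` shuffled values reach the true value,
    i.e. the true value ranks among the top `topk` overall."""
    n_at_least = sum(1 for v in shuffled_values if v >= true_value)
    return n_at_least < topk


def two_stage_screen(true_values, draw_null, n_shuffles_stage1=100,
                     n_shuffles_stage2=1000, topk1=1, topk2=5, seed=42):
    """Schematic screening of {pair: true_metric_value}.

    draw_null(pair, rng) is a hypothetical callable returning one shuffled
    metric value for `pair` (e.g. MI after a random circular shift).
    """
    rng = random.Random(seed)

    def null_sample(pair, n):
        return [draw_null(pair, rng) for _ in range(n)]

    # Stage 1: a cheap scan with few shuffles prunes clearly non-significant pairs.
    survivors = [pair for pair, value in true_values.items()
                 if passes_topk(value, null_sample(pair, n_shuffles_stage1), topk1)]
    # Stage 2: only the survivors get the expensive many-shuffle test.
    significant = [pair for pair in survivors
                   if passes_topk(true_values[pair],
                                  null_sample(pair, n_shuffles_stage2), topk2)]
    return survivors, significant


def uniform_null(pair, rng):
    # Toy null distribution: uniform on [0, 1) regardless of the pair.
    return rng.random()


true_mi = {("a", "b"): 10.0, ("a", "c"): -1.0}  # toy labels and values
print(two_stage_screen(true_mi, uniform_null))
```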

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from driada.intense.pipelines import compute_feat_feat_significance
>>> import numpy as np
>>>
>>> # Create test experiment
>>> exp = generate_synthetic_exp(n_dfeats=2, n_cfeats=2, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Compute feature-feature correlations
>>> sim_mat, sig_mat, pval_mat, features, info = compute_feat_feat_significance(
...     exp,
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat.shape == (4, 4)  # 2 discrete + 2 continuous features
True
>>> np.allclose(np.diag(sim_mat), 0)  # Diagonal is zero
True
>>>
>>> # Analyze specific features only
>>> sim_mat2, sig_mat2, pval_mat2, features2, info2 = compute_feat_feat_significance(
...     exp,
...     feat_bunch=['d_feat_0', 'd_feat_1'],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat2.shape == (2, 2)
True

Raises:

ValueError – If features are not found in the experiment

Notes

Only upper triangle is computed for efficiency (matrix is symmetric)
Diagonal elements are always zero (self-similarity prevented)
No delay optimization is performed between features
Supports both discrete and continuous features
Multifeatures are created using aggregate_multiple_ts
When called with circular features, feat_bunch should contain _2d-substituted names (e.g., headdirection_2d) to match the experiment’s dynamic features after circular substitution.
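Because the similarity matrix is symmetric with a zero diagonal, only the n(n-1)/2 upper-triangle pairs need to be evaluated. A minimal sketch of that bookkeeping, with a hypothetical compute callable standing in for the pairwise test (this is not DRIADA's matrix construction code):

```python
def symmetric_from_upper(n, compute):
    """Build an n x n symmetric matrix by calling compute(i, j) only for i < j.

    The diagonal is left at zero; `compute` is a hypothetical pairwise metric.
    Returns the matrix and the number of compute() calls made.
    """
    mat = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        for j in range(i + 1, n):
            mat[i][j] = mat[j][i] = compute(i, j)  # mirror into the lower triangle
            calls += 1
    return mat, calls


mat, calls = symmetric_from_upper(4, lambda i, j: float(i + j))
print(calls)  # 6 == 4 * 3 / 2, versus 16 entries in the full matrix
```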

driada.intense.pipelines.compute_cell_cell_significance(exp, cell_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=1000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, duplicate_behavior='ignore', profile=False)

Compute pairwise functional correlations between neurons using INTENSE.

This function calculates pairwise similarity (e.g., mutual information) between all neurons using the two-stage INTENSE approach. This can reveal functionally correlated neurons that may form assemblies or functional modules.

Parameters:

exp (Experiment) – Experiment object containing neural data.
cell_bunch (int, list or None, optional) – Neuron indices to analyze. Default: None (all neurons).
data_type (str, optional) – Type of neural data: ‘calcium’ or ‘spikes’. Default: ‘calcium’.
metric (str, optional) – Similarity metric to use. Default: ‘mi’ (mutual information).
mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’
mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.
mode (str, optional) – Computation mode: ‘two_stage’, ‘stage1’, or ‘stage2’. Default: ‘two_stage’.
n_shuffles_stage1 (int, optional) – Number of shuffles for stage 1. Default: 100.
n_shuffles_stage2 (int, optional) – Number of shuffles for stage 2. Default: 1000.
metric_distr_type (str, optional) – Distribution type for metric null distribution (‘gamma_zi’, ‘gamma’, etc.). Default: ‘gamma_zi’.
noise_ampl (float, optional) – Small noise amplitude for numerical stability. Default: 1e-3.
ds (int, optional) – Downsampling factor. Default: 1.
topk1 (int, optional) – Top-k criterion for stage 1. Default: 1.
topk2 (int, optional) – Top-k criterion for stage 2. Default: 5.
multicomp_correction (str or None, optional) – Multiple comparison correction method. Default: ‘holm’.
pval_thr (float, optional) – P-value threshold for significance. Default: 0.01.
verbose (bool, optional) – Whether to print progress information. Default: True.
enable_parallelization (bool, optional) – Whether to use parallel processing. Default: True.
n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default: -1.
seed (int, optional) – Random seed for reproducibility. Default: 42.
duplicate_behavior (str, optional) –
How to handle duplicate TimeSeries in ts_bunch1 or ts_bunch2:
- ‘ignore’: Process duplicates normally (default)
- ‘raise’: Raise an error if duplicates are found
- ‘warn’: Print a warning but continue processing
Default: ‘ignore’.
profile (bool, optional) –
If True, collect timing and FFT type information. When enabled, info[‘timings’] will contain:
- ‘stage1_pair_scanning’: Stage 1 scanning time (seconds)
- ‘stage2_pair_scanning’: Stage 2 scanning time if applicable (seconds)
- ‘fft_type_counts’: Dictionary of FFT type usage counts
- ‘matrix_construction’: Symmetric matrix construction time (seconds)
- ‘total’: Total execution time (seconds)
Default: False.
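When profile=True, a quick breakdown of where time went can be printed from info[‘timings’]. The helper below is a hypothetical convenience written against the keys listed above; the numbers in the example dict are made up for illustration, and ‘some_fft_type’ is a placeholder, since the actual FFT type labels are not documented here.

```python
def summarize_timings(timings):
    """Return (section, seconds, share_of_total) rows sorted by time, longest first.

    Skips 'total' itself and non-numeric entries such as 'fft_type_counts'.
    """
    total = timings.get("total", 0.0)
    rows = [
        (key, value, value / total if total else 0.0)
        for key, value in timings.items()
        if key != "total" and isinstance(value, (int, float))
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative numbers only, shaped like the documented info['timings'] keys.
example = {"stage1_pair_scanning": 1.0, "stage2_pair_scanning": 3.0,
           "fft_type_counts": {"some_fft_type": 12}, "total": 4.0}
for section, seconds, share in summarize_timings(example):
    print(f"{section:<24} {seconds:6.2f} s  {share:5.1%}")
```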

Return type:

tuple

Returns:

similarity_matrix (ndarray) – Matrix of similarity values between neurons. Element [i,j] contains the similarity between neuron i and neuron j. Diagonal is zero.
significance_matrix (ndarray) – Matrix of binary significance values. 1 indicates significant similarity.
p_value_matrix (ndarray) – Matrix of p-values for each comparison.
cell_ids (list) – List of cell IDs corresponding to matrix indices.
info (dict) – Dictionary containing additional information from compute_me_stats.

Notes

Uses the two-stage INTENSE approach for efficient significance testing
Diagonal elements are zero (self-similarity check prevents computation)
For calcium imaging data, considers temporal dynamics
For spike data, uses discrete MI formulation
Can identify functional assemblies through graph analysis of significant pairs
No optimal delay search is performed (synchronous activity assumed)
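One simple form of the graph analysis mentioned above is to treat significant pairs as edges and report connected components as candidate assemblies. The sketch below does that with a breadth-first search; it is an illustrative choice, not a DRIADA routine, and other methods (for example community detection) may separate assemblies more finely.

```python
from collections import deque


def functional_assemblies(significance_matrix, cell_ids):
    """Connected components of the graph whose edges are significant pairs.

    Works with nested lists or 2-D NumPy arrays; singletons are included.
    """
    n = len(cell_ids)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:  # breadth-first search from `start`
            i = queue.popleft()
            component.append(cell_ids[i])
            for j in range(n):
                if j != i and significance_matrix[i][j] and not seen[j]:
                    seen[j] = True
                    queue.append(j)
        groups.append(component)
    return groups


sig = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0]]
print(functional_assemblies(sig, [0, 1, 2, 3]))  # [[0, 1, 2], [3]]
```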

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from driada.intense.pipelines import compute_cell_cell_significance
>>> from driada.information.info_base import TimeSeries
>>> import numpy as np
>>>
>>> # Create experiment with correlated neurons
>>> exp = generate_synthetic_exp(n_dfeats=1, n_cfeats=1, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Make neurons 0 and 1 correlated
>>> noise = np.random.RandomState(42).randn(len(exp.neurons[0].ca.data)) * 0.1
>>> exp.neurons[1].ca = TimeSeries(
...     exp.neurons[0].ca.data + noise, discrete=False
... )
>>>
>>> # Compute neuron-neuron correlations
>>> sim_mat, sig_mat, pval_mat, cells, info = compute_cell_cell_significance(
...     exp,
...     cell_bunch=[0, 1, 2],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat.shape == (3, 3)
True
>>> np.allclose(np.diag(sim_mat), 0)  # Self-correlation is zero
True
>>> sim_mat[0, 1] > sim_mat[0, 2]  # Neurons 0,1 more correlated than 0,2
True

Raises:

ValueError – If data_type is not ‘calcium’ or ‘spikes’, or if spike data is missing for requested neurons

Notes

Only upper triangle is computed for efficiency (matrix is symmetric)
Warns if all neurons have identical spike data
Computes network statistics when verbose=True
Synchronous activity assumed (no delay optimization)

driada.intense.pipelines.compute_embedding_selectivity(exp, embedding_methods=None, cell_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, use_precomputed_stats=True, save_computed_stats=True, force_update=False, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=True, shift_window=2, remove_anti_selective=True, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42)

Compute INTENSE selectivity between neurons and dimensionality reduction embeddings.

This function treats each embedding component as a dynamic feature and computes the mutual information between neural activity and embedding dimensions. This reveals how individual neurons contribute to the population-level manifold structure.

Parameters:

exp (Experiment) – Experiment object with stored embeddings
embedding_methods (str, list or None) – Names of embedding methods to analyze. If None, analyzes all stored embeddings.
cell_bunch (int, iterable or None) – Neuron indices. By default (None), all neurons will be taken
data_type (str) – Data type used for embeddings and INTENSE (‘calcium’ or ‘spikes’)
metric (str) – Similarity metric between TimeSeries (default: ‘mi’)
mi_estimator (str) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’
mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.
mode (str) – Computation mode: ‘stage1’, ‘stage2’, or ‘two_stage’ (default)
n_shuffles_stage1 (int) – Number of shuffles for first stage (default: 100)
n_shuffles_stage2 (int) – Number of shuffles for second stage (default: 10000)
metric_distr_type (str) – Distribution type for the shuffled metric null distribution fit (default: ‘gamma_zi’)
noise_ampl (float) – Small noise amplitude added to improve numerical fit (default: 1e-3)
ds (int) – Downsampling constant (default: 1)
use_precomputed_stats (bool) – Whether to use stats saved in Experiment instance (default: True)
save_computed_stats (bool) – Whether to save computed stats to Experiment instance (default: True)
force_update (bool) – Force update saved statistics if data hash collision found (default: False)
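topk1 (int) – Stage 1 criterion: the true MI must rank among the top topk1 values when compared with the shuffled MI values (default: 1)
topk2 (int) – Stage 2 criterion: the true MI must rank among the top topk2 values when compared with the shuffled MI values (default: 5)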
multicomp_correction (str or None) – Multiple comparison correction type: None, ‘bonferroni’, or ‘holm’ (default)
pval_thr (float) – P-value threshold (default: 0.01)
find_optimal_delays (bool) – Find optimal temporal delays between neural activity and embeddings (default: True)
shift_window (int) – Window for optimal shift search in seconds (default: 2)
verbose (bool) – Print progress information (default: True)
enable_parallelization (bool) – Enable parallel computation (default: True)
n_jobs (int) – Number of parallel jobs, -1 for all cores (default: -1)
seed (int) – Random seed (default: 42)
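The default multicomp_correction=‘holm’ refers to the Holm-Bonferroni step-down procedure. The stdlib sketch below implements that textbook procedure (it is not DRIADA's code) and shows why Holm can reject more hypotheses than plain Bonferroni at the same pval_thr.

```python
def holm_reject(pvals, alpha=0.01):
    """Holm-Bonferroni step-down procedure controlling the FWER at `alpha`.

    Returns booleans aligned with `pvals` (True = hypothesis rejected).
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank, 0-based) is compared with alpha / (m - k).
        if pvals[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first p-value that fails
    return reject


def bonferroni_reject(pvals, alpha=0.01):
    """Single-step Bonferroni: every p-value is compared with alpha / m."""
    return [p <= alpha / len(pvals) for p in pvals]


pvals = [0.001, 0.004, 0.02, 0.0001]
print(holm_reject(pvals))        # [True, True, False, True]
print(bonferroni_reject(pvals))  # [True, False, False, True]
```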

Returns:

results – Dictionary keyed by embedding method name. Each value contains:
- ‘stats’: Statistics for each neuron-component pair
- ‘significance’: Significance results
- ‘info’: Additional information from compute_me_stats
- ‘intense_results’: Full IntenseResults object from INTENSE computation
- ‘significant_neurons’: Dict of neurons significantly selective to embedding components
- ‘n_components’: Number of embedding components
- ‘component_selectivity’: For each component, list of selective neurons

Return type:

dict

Raises:

ValueError – If no embeddings are found for the specified data_type, or if an embedding method is not found

Notes

Temporarily adds embedding components as dynamic features
Forces use_precomputed_stats=False for temporary features
Component names follow pattern “{method}_comp{index}”
Cleanup in a finally block ensures the experiment state is restored
Only stage2 significance is considered for results
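The naming and bookkeeping described in these notes can be sketched without DRIADA: split each embedding column into a named series, then collect, per component, the neurons whose pair is significant. Zero-based component indices and a dict-of-dict-of-bools significance structure are assumptions of this sketch; the real function also registers the components as temporary dynamic features and removes them afterwards.

```python
def embedding_to_features(method, embedding):
    """Split an (n_timepoints x n_components) embedding, given as rows,
    into one named 1-D series per component ("{method}_comp{index}")."""
    n_components = len(embedding[0])
    return {f"{method}_comp{k}": [row[k] for row in embedding]
            for k in range(n_components)}


def component_selectivity(significance, method, n_components):
    """Map each component name to the cells whose pair with it is significant."""
    names = [f"{method}_comp{k}" for k in range(n_components)]
    return {name: [cell for cell, per_feat in significance.items() if per_feat.get(name)]
            for name in names}


emb = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8]]  # 3 time points, 2 components
print(list(embedding_to_features("pca", emb)))  # ['pca_comp0', 'pca_comp1']

sig = {0: {"pca_comp0": True, "pca_comp1": False}, 2: {"pca_comp1": True}}
print(component_selectivity(sig, "pca", 2))  # {'pca_comp0': [0], 'pca_comp1': [2]}
```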

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from driada.intense.pipelines import compute_embedding_selectivity
>>> from sklearn.decomposition import PCA
>>> import numpy as np
>>>
>>> # Create experiment
>>> exp = generate_synthetic_exp(n_dfeats=1, n_cfeats=1, nneurons=5,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Create and store PCA embedding
>>> neural_data = np.array([exp.neurons[i].ca.data for i in range(5)]).T
>>> pca = PCA(n_components=2, random_state=42)
>>> embedding = pca.fit_transform(neural_data)
>>> exp.store_embedding(embedding, method_name='pca', data_type='calcium')
>>>
>>> # Compute embedding selectivity
>>> results = compute_embedding_selectivity(
...     exp,
...     embedding_methods=['pca'],
...     cell_bunch=[0, 1, 2],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>>
>>> 'pca' in results
True
>>> results['pca']['n_components']
2
>>> 'component_selectivity' in results['pca']
True

Main Functions

driada.intense.pipelines.compute_cell_feat_significance(exp, cell_bunch=None, feat_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, use_precomputed_stats=True, save_computed_stats=True, force_update=False, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=True, skip_delays=[], shift_window=2, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, with_disentanglement=False, feat_feat_pval_thr=0.01, multifeature_map=None, duplicate_behavior='ignore', engine='auto', store_random_shifts=False, profile=False, pre_filter_func=None, post_filter_func=None, filter_kwargs=None, remove_anti_selective=True, use_circular_2d=True)

Calculates significant neuron-feature pairs.

Parameters:

exp (Experiment) – Experiment object to read data from and write results to
cell_bunch (int, iterable or None, optional) – Neuron indices. By default (cell_bunch=None), all neurons are used
feat_bunch (str, iterable or None, optional) – Feature names. By default (feat_bunch=None), all single features are used
data_type (str, optional) – Data type used for INTENSE computations. Can be ‘calcium’ or ‘spikes’. Default is ‘calcium’
metric (str, optional) – Similarity metric between TimeSeries. Default is ‘mi’
mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ or ‘ksg’. Default is ‘gcmi’
mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.
mode (str, optional) –
Computation mode. Three modes are available:
- 'stage1': perform preliminary scanning with “n_shuffles_stage1” shuffles only. Rejects clearly non-significant neuron-feature pairs but gives no definitive result on the significance of the remaining ones.
- 'stage2': skip stage 1 and perform full-scale scanning (“n_shuffles_stage2” shuffles) of all neuron-feature pairs. Gives definitive results but can be very time-consuming. It also reduces the statistical power of the multiple-comparison correction, since the number of hypotheses is very high.
- 'two_stage': prune non-significant pairs during stage 1 and perform thorough testing for the rest during stage 2. Recommended mode.
Default is ‘two_stage’
n_shuffles_stage1 (int, optional) – Number of shuffles for first stage. Default is 100
n_shuffles_stage2 (int, optional) – Number of shuffles for second stage. Default is 10000
metric_distr_type (str, optional) –
Distribution type for shuffled metric null distribution. Options:
- ‘gamma_zi’ (default): Zero-inflated gamma distribution. Explicitly models the probability mass at zero that commonly occurs in MI null distributions. Provides a better goodness-of-fit than ‘gamma’ and accurate parameter estimation without requiring artificial noise. Recommended for all analyses.
- ‘gamma’: Standard gamma distribution with small noise (noise_ampl) added to handle zeros. Provided for backward compatibility. Less statistically principled than ‘gamma_zi’.
- Other scipy.stats distributions: ‘lognorm’, ‘norm’, etc. are supported but not recommended for MI distributions.
Recommendation: Use ‘gamma_zi’ (default) for new analyses. It achieves detection performance equivalent to ‘gamma’ while providing a statistically correct goodness-of-fit and accurate parameter recovery.

Default: ‘gamma_zi’
noise_ampl (float, optional) – Small noise amplitude added to MI values for numerical stability (only used with metric_distr_type='gamma'). When using ‘gamma_zi’, this parameter is automatically set to 0, since the zero-inflated gamma handles zeros explicitly without requiring artificial noise. Default: 1e-3
ds (int, optional) – Downsampling constant: only every ds-th point of each time series is used. This reduces the computational load but requires caution, since a large ds may discard important information. The Experiment class performs an internal check for this effect. Default is 1
use_precomputed_stats (bool, optional) – Whether to use stats saved in the Experiment instance. Stats are accumulated separately for stage 1 and stage 2. Notes on overwriting saved stats (if save_computed_stats=True): to recalculate stage 1 results only, use “use_precomputed_stats=False” and “mode='stage1'”; stage 2 stats will be erased, since they become irrelevant. To recalculate stage 2 results only, use “use_precomputed_stats=True” and “mode='stage2'” or “mode='two_stage'”. To recalculate everything, use “use_precomputed_stats=False” and “mode='two_stage'”. Default is True
save_computed_stats (bool, optional) – Whether to save computed stats to Experiment instance. Default is True
force_update (bool, optional) – Whether to force an update of saved statistics when the hashes of the current data do not match the hashes stored with the saved stats (for example, if neuronal or behavioral data has been changed externally). Default is False
topk1 (int, optional) – Stage 1 criterion: the true MI must rank among the top topk1 values when compared with the n_shuffles_stage1 shuffled MI values. Default is 1
topk2 (int, optional) – Stage 2 criterion: the true MI must rank among the top topk2 values when compared with the n_shuffles_stage2 shuffled MI values. Default is 5
multicomp_correction (str or None, optional) – Type of multiple comparison correction. Supported types are None (no correction), “bonferroni”, “holm”, and “fdr_bh” (Benjamini-Hochberg FDR). Default is ‘holm’
pval_thr (float, optional) – P-value threshold. If multicomp_correction=None, this is the threshold for a single pair. Otherwise it is the significance level of the correction: the family-wise error rate (FWER) for “bonferroni” and “holm”, or the false discovery rate for “fdr_bh”. Default is 0.01
find_optimal_delays (bool, optional) – Allows small shifts (at most ±shift_window) of the time series and selects the shift with the highest MI. Default is True
skip_delays (list, optional) – List of features for which delays are not applied (set to 0). Only features that exist in feat_bunch will be processed. Has no effect if find_optimal_delays = False. Default is []
shift_window (int, optional) – Window for optimal shift search (seconds). Optimal shift (in frames) will lie in the range -shift_window*fps <= opt_shift <= shift_window*fps. Has no effect if find_optimal_delays = False. Default is 2
verbose (bool, optional) – Whether to print progress messages. Default is True
enable_parallelization (bool, optional) – Whether to enable parallel processing. Default is True
n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default is -1
seed (int, optional) – Random seed for reproducibility. Default is 42
with_disentanglement (bool, optional) –
If True, performs a full INTENSE pipeline with mixed selectivity analysis:
1. Computes behavioral feature-feature significance
2. Computes neuron-feature significance
3. Disentangles mixed selectivities using behavioral correlations.
Default is False
feat_feat_pval_thr (float, optional) – P-value threshold for feature-feature significance testing during disentanglement. It is separate from the cell-feature pval_thr because the number of feature pairs (~100-200) is much smaller than the number of neuron-feature pairs (thousands), so a stricter threshold is unnecessary. Only used when with_disentanglement=True. Default is 0.01
multifeature_map (dict or None, optional) – Mapping from multifeature tuples to aggregated names for disentanglement. If None, uses DEFAULT_MULTIFEATURE_MAP from disentanglement module. Only used when with_disentanglement=True. Default is None
duplicate_behavior (str, optional) –
How to handle duplicate TimeSeries in neuron or feature bunches.
- ‘ignore’: Process duplicates normally (default)
- ‘raise’: Raise an error if duplicates are found
- ‘warn’: Print a warning but continue processing.
Default is ‘ignore’
engine ({'auto', 'fft', 'loop'}, optional) –
Computation engine for MI shuffles:
- ‘auto’: Use FFT when applicable (univariate continuous GCMI with nsh >= 50)
- ‘fft’: Force FFT (raises an error if not applicable)
- ‘loop’: Force the per-shift loop (original behavior)
FFT provides a ~100x speedup for stage 2. Default is ‘auto’
store_random_shifts (bool, optional) – Whether to store the random shift indices used during shuffle computation. When False (default), the random_shifts1 and random_shifts2 arrays are not stored, saving significant memory (~400 MB for typical datasets with N=500, M=20). Set to True if you need the shift indices for debugging or reproducibility analysis. Default is False
profile (bool, optional) –
Whether to collect internal timing information. When True, info[‘timings’] will contain the following entries (times in seconds):
- ‘stage1_delay_optimization’: delay optimization (if find_optimal_delays=True)
- ‘stage1_pair_scanning’: stage 1 pair scanning
- ‘stage2_pair_scanning’: stage 2 pair scanning (if applicable)
- ‘fft_type_counts’: Dictionary of FFT type usage counts
- ‘disentanglement’: disentanglement analysis (if with_disentanglement=True)
- ‘total’: sum of all timing sections
Default is False

pre_filter_func (callable or None, optional) –

Population-level filter function (or composed filter) to run BEFORE disentanglement parallel processing. Only used when with_disentanglement=True. The filter mutates neuron selectivities and pre-computes pair decisions.

Signature:

def pre_filter_func(
    neuron_selectivities,    # dict: {neuron_id: [feat1, feat2, ...]} - MUTATE
    pair_decisions,          # dict: {neuron_id: {(f1, f2): 0/0.5/1}} - MUTATE
    renames,                 # dict: {neuron_id: {new_name: (old1, old2)}} - MUTATE
    cell_feat_stats,         # Pre-computed MI values (READ ONLY)
    feat_feat_significance,  # Binary matrix (READ ONLY)
    feat_names,              # List of feature names (READ ONLY)
    **kwargs,                # User-provided extra arguments from filter_kwargs
):
    ...

Default: None (no filtering).

post_filter_func (callable or None, optional) –

Population-level filter function to run AFTER disentanglement parallel processing. Can modify pair results (e.g., tie-breaking). Only used when with_disentanglement=True.

Signature:

def post_filter_func(
    per_neuron_disent,       # dict: {nid: {'pairs': {...}, ...}} - MUTATE
    cell_feat_stats,         # Pre-computed MI values (READ ONLY)
    feat_names,              # List of feature names (READ ONLY)
    **kwargs,                # User-provided extra arguments
):
    ...

Default: None (no post-filtering).

filter_kwargs (dict or None, optional) – Dictionary of keyword arguments to pass to pre_filter_func and post_filter_func. Can include pre-extracted data like calcium_data, feature_data, thresholds, etc. Only used when with_disentanglement=True. Default: None.
use_circular_2d (bool, default=True) – If True, automatically substitute circular features with their _2d counterparts (cos, sin representation) for MI computation. This improves MI estimation accuracy for circular variables like head direction. Requires that create_circular_2d=True was used during experiment loading.
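The interplay of mode, n_shuffles_stage1, n_shuffles_stage2, topk1, topk2 and pval_thr can be illustrated with a simplified, self-contained two-stage shuffle test on toy data. This is a sketch under stated assumptions, not DRIADA's implementation: absolute Pearson correlation stands in for MI, the null distribution comes from random circular shifts (the minimum shift of 100 frames is an arbitrary choice), stage 2 uses an empirical p-value instead of a fitted ‘gamma_zi’ distribution, no multiple-comparison correction is applied, and all function names are hypothetical.

```python
import numpy as np

def similarity(x, y):
    # Stand-in for MI (assumption): absolute Pearson correlation.
    return abs(np.corrcoef(x, y)[0, 1])

def shuffle_null(x, y, n_shuffles, rng, min_shift=100):
    # Null distribution: similarity after random circular shifts of y.
    shifts = rng.integers(min_shift, len(y) - min_shift, size=n_shuffles)
    return np.array([similarity(x, np.roll(y, s)) for s in shifts])

def passes_topk(true_val, null, topk):
    # The true value must rank among the topk largest of {true_val} plus the null values.
    return np.sum(null >= true_val) < topk

def two_stage_test(x, y, rng, n1=100, n2=1000, topk1=1, topk2=5, pval_thr=0.01):
    true_val = similarity(x, y)
    # Stage 1: cheap screening with few shuffles prunes clearly non-significant pairs.
    if not passes_topk(true_val, shuffle_null(x, y, n1, rng), topk1):
        return False
    # Stage 2: many shuffles, rank criterion plus an empirical p-value.
    null2 = shuffle_null(x, y, n2, rng)
    pval = (np.sum(null2 >= true_val) + 1) / (n2 + 1)
    return bool(passes_topk(true_val, null2, topk2) and pval < pval_thr)

rng = np.random.default_rng(0)
n = 2000
# Slowly varying, non-periodic toy "feature" (smoothed white noise, standardized).
feature = np.convolve(rng.standard_normal(n), np.ones(50) / 50, mode="same")
feature = (feature - feature.mean()) / feature.std()
coupled = feature + 0.5 * rng.standard_normal(n)  # toy neuron tracking the feature
unrelated = rng.standard_normal(n)                # toy neuron ignoring the feature

coupled_sig = two_stage_test(coupled, feature, rng)
unrelated_sig = two_stage_test(unrelated, feature, rng)
```

In this picture, mode='stage1' stops after the first screen, mode='stage2' runs only the second test on every pair, and mode='two_stage' runs both, so the expensive stage 2 shuffles are spent only on pairs that survive stage 1.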

Return type:

tuple

Returns:

stats (dict of dict of dicts) – Outer keys: cells; inner keys: dynamic features; innermost dicts: statistics. Can be converted to a pandas DataFrame with pd.DataFrame(stats)
significance (dict of dict of bools) – Significance results for each neuron-feature pair
info (dict) – Additional information from compute_me_stats
intense_res (IntenseResults) – Complete results object
disentanglement_results (dict (only if with_disentanglement=True)) – Contains:
- ‘feat_feat_significance’: Feature-feature significance matrix
- ‘disent_matrix’: Disentanglement results matrix
- ‘count_matrix’: Count matrix from disentanglement
- ‘per_neuron_disent’: Per-neuron detailed results dict mapping neuron_id to ‘pairs’, ‘renames’, and ‘final_sels’ sub-dicts.
- ‘feature_names’: List of feature names
- ‘summary’: Summary statistics from disentanglement
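The stats entry above mentions conversion to a pandas DataFrame. The sketch below shows what pd.DataFrame(stats) produces for a nested dict of that shape, and how to flatten it into one row per (cell, feature) pair. The toy dictionary and its statistic names ('me', 'pval') are hypothetical placeholders; check the keys present in real output before relying on them.

```python
import pandas as pd

# Toy stand-in for `stats`: {cell_id: {feature_name: {stat_name: value}}}.
# The stat names 'me' and 'pval' are hypothetical placeholders.
stats = {
    0: {"d_feat_0": {"me": 0.12, "pval": 1e-4}, "c_feat_0": {"me": 0.01, "pval": 0.40}},
    1: {"d_feat_0": {"me": 0.03, "pval": 0.20}, "c_feat_0": {"me": 0.20, "pval": 1e-6}},
}

# Direct conversion: one column per cell, one row per feature, each entry a stats dict.
wide = pd.DataFrame(stats)

# Long ("tidy") format: one row per (cell, feature) pair, one column per statistic.
rows = [
    {"cell": cell, "feature": feat, **feat_stats}
    for cell, per_feat in stats.items()
    for feat, feat_stats in per_feat.items()
]
long = pd.DataFrame(rows).set_index(["cell", "feature"])
print(long.sort_values("pval"))
```

The long format is usually easier to filter, sort and plot than the wide one, whose cells still hold dictionaries.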

Raises:

ValueError – If data_type is not ‘calcium’ or ‘spikes’, or if features are not found in the experiment

Notes

shift_window is converted from seconds to frames using exp.fps
Updates exp.optimal_nf_delays as a side effect
Relative MI values are computed using appropriate neural data entropy
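The note about shift_window and exp.fps can be illustrated with a brute-force delay search: the window in seconds becomes a maximum shift in frames, every integer shift in that range is scored, and the best one is kept. This is a simplified sketch, not DRIADA's delay optimization: absolute Pearson correlation replaces MI, shifts are circular, rounding shift_window * fps to an integer is an assumption, and find_best_shift is a hypothetical name.

```python
import numpy as np

def find_best_shift(neural, feature, fps, shift_window=2.0):
    # Convert the search window from seconds to frames (rounding is an assumption).
    max_shift = int(round(shift_window * fps))
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        # Circularly shift the neural trace and score it against the feature
        # (absolute correlation as a stand-in for MI).
        score = abs(np.corrcoef(np.roll(neural, shift), feature)[0, 1])
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score

rng = np.random.default_rng(1)
fps = 20
feature = np.convolve(rng.standard_normal(3000), np.ones(10) / 10, mode="same")
# Toy neural trace that lags the feature by 15 frames (0.75 s) and carries some noise.
neural = np.roll(feature, 15) + 0.05 * rng.standard_normal(3000)

shift, score = find_best_shift(neural, feature, fps=fps, shift_window=2)
```

Here the search covers ±40 frames, and the best shift is -15: rolling the trace back by 15 frames realigns it with the feature. The sign convention is specific to this sketch.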

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> import numpy as np
>>>
>>> # Create small test experiment
>>> exp = generate_synthetic_exp(n_dfeats=2, n_cfeats=1, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Basic neuron-feature analysis (stage1 for speed)
>>> result = compute_cell_feat_significance(
...     exp,
...     cell_bunch=[0, 1],
...     feat_bunch=['d_feat_0'],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> stats, sig, info, res = result[:4]  # first four outputs (see Returns)
>>> len(stats)  # Number of neurons analyzed
2
>>> 'd_feat_0' in stats[0]  # Feature present in results
True
>>>
>>> # With disentanglement analysis
>>> result = compute_cell_feat_significance(
...     exp,
...     cell_bunch=[0, 1],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     with_disentanglement=True,
...     verbose=False
... )
>>> len(result)  # Returns 5 values with disentanglement
5
>>> stats, sig, info, res, disent = result
>>> 'disent_matrix' in disent
True
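The pre_filter_func signature documented above can be illustrated with a minimal, hypothetical filter that removes user-chosen features from every neuron's selectivity list before disentanglement and drops any pre-computed pair decision involving them. The name drop_excluded_features and the excluded_features entry of filter_kwargs are assumptions made for this sketch, not part of the documented API; only the argument names and dictionary structures come from the signature. It is exercised here on toy dictionaries rather than a real experiment.

```python
def drop_excluded_features(
    neuron_selectivities,    # dict: {neuron_id: [feat1, feat2, ...]} - mutated in place
    pair_decisions,          # dict: {neuron_id: {(f1, f2): 0/0.5/1}} - mutated in place
    renames,                 # dict: {neuron_id: {new_name: (old1, old2)}} - untouched here
    cell_feat_stats,         # read only, unused in this sketch
    feat_feat_significance,  # read only, unused in this sketch
    feat_names,              # read only, unused in this sketch
    **kwargs,
):
    # 'excluded_features' is a hypothetical key supplied through filter_kwargs.
    excluded = set(kwargs.get("excluded_features", ()))
    for feats in neuron_selectivities.values():
        # Slice assignment mutates the existing list instead of rebinding it.
        feats[:] = [f for f in feats if f not in excluded]
    for decisions in pair_decisions.values():
        # Forget pre-computed decisions for pairs that involve an excluded feature.
        for pair in [p for p in decisions if excluded.intersection(p)]:
            del decisions[pair]

# Toy inputs that follow the documented structures.
sels = {0: ["x", "speed", "headdirection_2d"], 1: ["speed"]}
decisions = {0: {("x", "speed"): 1, ("speed", "headdirection_2d"): 0.5}}
drop_excluded_features(sels, decisions, {}, None, None,
                       ["x", "speed", "headdirection_2d"], excluded_features=["speed"])
```

With a real experiment it would be passed as pre_filter_func=drop_excluded_features together with with_disentanglement=True and filter_kwargs={'excluded_features': ['speed']}.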

driada.intense.pipelines.compute_feat_feat_significance(exp, feat_bunch='all', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=1000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, duplicate_behavior='ignore', engine='auto', profile=False)[source]

Compute pairwise significance between all behavioral features.

This function calculates pairwise similarity (e.g., mutual information) between all behavioral features using the two-stage INTENSE approach. The diagonal elements are set to zero because the check_for_coincidence mechanism in get_mi prevents self-similarity from being computed.

Parameters:

exp (Experiment) – Experiment object containing behavioral data.
feat_bunch (str, list or None) – Feature names to analyze. Default: ‘all’ (all features including multifeatures). Can be a list of specific feature names.
metric (str, optional) – Similarity metric to use. Default: ‘mi’ (mutual information).
mi_estimator (str, optional) – Mutual information estimator to use when metric='mi'. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’
mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.
mode (str, optional) – Computation mode: ‘two_stage’, ‘stage1’, or ‘stage2’. Default: ‘two_stage’.
n_shuffles_stage1 (int, optional) – Number of shuffles for stage 1. Default: 100.
n_shuffles_stage2 (int, optional) – Number of shuffles for stage 2. Default: 1000.
metric_distr_type (str, optional) – Distribution type for metric null distribution (‘gamma_zi’, ‘gamma’, etc.). Default: ‘gamma_zi’.
noise_ampl (float, optional) – Small noise amplitude for numerical stability. Default: 1e-3.
ds (int, optional) – Downsampling factor. Default: 1.
topk1 (int, optional) – Top-k criterion for stage 1. Default: 1.
topk2 (int, optional) – Top-k criterion for stage 2. Default: 5.
multicomp_correction (str or None, optional) – Multiple comparison correction method. Default: ‘holm’.
pval_thr (float, optional) – P-value threshold for significance. Default: 0.01.
verbose (bool, optional) – Whether to print progress information. Default: True.
enable_parallelization (bool, optional) – Whether to use parallel processing. Default: True.
n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default: -1.
seed (int, optional) – Random seed for reproducibility. Default: 42.
duplicate_behavior (str, optional) – How to handle duplicate TimeSeries among the features being compared:
- ‘ignore’: Process duplicates normally (default)
- ‘raise’: Raise an error if duplicates are found
- ‘warn’: Print a warning but continue processing
Default: ‘ignore’.
engine (str, optional) – Computation engine for MI calculation:
- ‘auto’: Automatically select FFT when beneficial (default)
- ‘fft’: Force FFT-based computation
- ‘loop’: Force loop-based computation (useful for comparison/debugging)
Default: ‘auto’.
profile (bool, optional) – If True, collect timing and FFT type information. Default: False. When enabled, info[‘timings’] will contain:
- ‘stage1_pair_scanning’: Stage 1 scanning time (seconds)
- ‘stage2_pair_scanning’: Stage 2 scanning time if applicable (seconds)
- ‘fft_type_counts’: Dictionary of FFT type usage counts
- ‘matrix_construction’: Symmetric matrix construction time (seconds)
- ‘total’: Total execution time (seconds)
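The metric_distr_type='gamma_zi' default describes the shuffled null as a zero-inflated gamma distribution: a point mass at zero with probability π plus a gamma distribution for the positive values, so an observed value x > 0 has p-value (1 - π)·S(x), where S is the survival function of the gamma part. The sketch below illustrates that formula; it is not DRIADA's fitting code. Estimating π as the fraction of exact zeros and fitting the gamma part by maximum likelihood with scipy.stats.gamma.fit (location fixed at 0) are assumptions, and zi_gamma_pvalue is a hypothetical name.

```python
import numpy as np
from scipy.stats import gamma

def zi_gamma_pvalue(null_values, observed):
    null_values = np.asarray(null_values, dtype=float)
    # Zero-inflation weight (assumed estimator): fraction of shuffles that are exactly zero.
    pi_zero = np.mean(null_values == 0)
    if observed <= 0:
        return 1.0  # every null value is >= 0, so P(X >= observed) = 1
    positive = null_values[null_values > 0]
    # Fit the gamma component to the positive shuffles, location fixed at 0.
    shape, _, scale = gamma.fit(positive, floc=0)
    # P(X >= observed) = P(X > 0) * P(Gamma >= observed)
    return (1 - pi_zero) * gamma.sf(observed, shape, loc=0, scale=scale)

rng = np.random.default_rng(2)
# Toy null: about 30% exact zeros, the rest gamma-distributed MI-like values.
null = np.where(rng.random(1000) < 0.3, 0.0, rng.gamma(2.0, 0.01, size=1000))
p_small = zi_gamma_pvalue(null, 0.01)  # value typical of the null
p_large = zi_gamma_pvalue(null, 0.15)  # value far in the tail
```

Modeling the zeros explicitly is what lets the ‘gamma_zi’ option avoid the artificial noise that the plain ‘gamma’ fit needs.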

Return type:

tuple

Returns:

similarity_matrix (ndarray) – Matrix of similarity values between features. Element [i,j] contains the similarity between feature i and feature j. Diagonal is zero.
significance_matrix (ndarray) – Matrix of binary significance values. 1 indicates significant similarity.
p_value_matrix (ndarray) – Matrix of p-values for each comparison.
feature_names (list) – List of feature names corresponding to matrix indices. May include tuples for multifeatures (e.g., (‘x’, ‘y’)).
info (dict) – Dictionary containing additional information from compute_me_stats.

Notes

Uses the two-stage INTENSE approach for efficient significance testing
Diagonal elements are zero (self-similarity check prevents computation)
The function handles both discrete and continuous variables
Supports MultiTimeSeries (e.g., place fields from x,y coordinates)
For mutual information, values are in bits
No optimal delay search is performed (delays are set to 0)
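The default mi_estimator='gcmi' is the Gaussian-copula MI estimator. For two univariate continuous variables its core idea fits in a few lines: rank-transform each variable, map the ranks through the inverse standard normal CDF, and use the closed-form MI of two Gaussians, I = -½·log2(1 - ρ²) bits, where ρ is the correlation of the transformed variables. This is a simplified sketch, not DRIADA's estimator: it ignores refinements such as bias correction and the multivariate and discrete cases, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata

def copula_normalize(x):
    # Empirical CDF values strictly inside (0, 1), then the standard normal quantile.
    u = rankdata(x) / (len(x) + 1)
    return norm.ppf(u)

def gcmi_1d(x, y):
    # Simplified Gaussian-copula MI in bits for two univariate continuous variables.
    gx, gy = copula_normalize(x), copula_normalize(y)
    rho = np.corrcoef(gx, gy)[0, 1]
    return -0.5 * np.log2(1 - rho**2)

rng = np.random.default_rng(3)
x = rng.standard_normal(5000)
y_dep = np.exp(x) + 0.3 * rng.standard_normal(5000)  # monotonic dependence plus noise
y_ind = rng.standard_normal(5000)                     # independent of x
mi_dep = gcmi_1d(x, y_dep)
mi_ind = gcmi_1d(x, y_ind)
```

Because the copula transform depends only on ranks, the nonlinear but monotonic dependence on exp(x) is captured just as well as a linear one would be.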

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> import numpy as np
>>>
>>> # Create test experiment
>>> exp = generate_synthetic_exp(n_dfeats=2, n_cfeats=2, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Compute feature-feature correlations
>>> sim_mat, sig_mat, pval_mat, features, info = compute_feat_feat_significance(
...     exp,
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat.shape == (4, 4)  # 2 discrete + 2 continuous features
True
>>> np.allclose(np.diag(sim_mat), 0)  # Diagonal is zero
True
>>>
>>> # Analyze specific features only
>>> sim_mat2, sig_mat2, pval_mat2, features2, info2 = compute_feat_feat_significance(
...     exp,
...     feat_bunch=['d_feat_0', 'd_feat_1'],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat2.shape == (2, 2)
True

Raises:

ValueError – If features are not found in the experiment

Notes

Only upper triangle is computed for efficiency (matrix is symmetric)
Diagonal elements are always zero (self-similarity prevented)
No delay optimization is performed between features
Supports both discrete and continuous features
Multifeatures are created using aggregate_multiple_ts
When called with circular features, feat_bunch should contain _2d-substituted names (e.g., headdirection_2d) to match the experiment’s dynamic features after circular substitution.
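The notes above (only the upper triangle is computed, the matrix is symmetric, and the diagonal stays zero) correspond to the following pattern, which needs n(n - 1)/2 similarity evaluations for n series instead of n². Absolute Pearson correlation is a stand-in for the INTENSE metric, and pairwise_similarity_matrix is a hypothetical helper.

```python
import itertools
import numpy as np

def pairwise_similarity_matrix(series):
    # series: array of shape (n_series, n_timepoints)
    n = len(series)
    sim = np.zeros((n, n))  # diagonal stays zero (no self-similarity)
    for i, j in itertools.combinations(range(n), 2):  # i < j: upper triangle only
        # Stand-in metric (assumption): absolute Pearson correlation.
        value = abs(np.corrcoef(series[i], series[j])[0, 1])
        sim[i, j] = sim[j, i] = value  # mirror into the lower triangle
    return sim

rng = np.random.default_rng(4)
base = rng.standard_normal(500)
data = np.vstack([base, base + 0.1 * rng.standard_normal(500), rng.standard_normal(500)])
sim = pairwise_similarity_matrix(data)
```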

driada.intense.pipelines.compute_cell_cell_significance(exp, cell_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=1000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42, duplicate_behavior='ignore', profile=False)[source]

Compute pairwise functional correlations between neurons using INTENSE.

This function calculates pairwise similarity (e.g., mutual information) between all neurons using the two-stage INTENSE approach. This can reveal functionally correlated neurons that may form assemblies or functional modules.

Parameters:

exp (Experiment) – Experiment object containing neural data.
cell_bunch (int, list or None, optional) – Neuron indices to analyze. Default: None (all neurons).
data_type (str, optional) – Type of neural data: ‘calcium’ or ‘spikes’. Default: ‘calcium’.
metric (str, optional) – Similarity metric to use. Default: ‘mi’ (mutual information).
mi_estimator (str, optional) – Mutual information estimator to use when metric='mi'. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’
mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.
mode (str, optional) – Computation mode: ‘two_stage’, ‘stage1’, or ‘stage2’. Default: ‘two_stage’.
n_shuffles_stage1 (int, optional) – Number of shuffles for stage 1. Default: 100.
n_shuffles_stage2 (int, optional) – Number of shuffles for stage 2. Default: 1000.
metric_distr_type (str, optional) – Distribution type for metric null distribution (‘gamma_zi’, ‘gamma’, etc.). Default: ‘gamma_zi’.
noise_ampl (float, optional) – Small noise amplitude for numerical stability. Default: 1e-3.
ds (int, optional) – Downsampling factor. Default: 1.
topk1 (int, optional) – Top-k criterion for stage 1. Default: 1.
topk2 (int, optional) – Top-k criterion for stage 2. Default: 5.
multicomp_correction (str or None, optional) – Multiple comparison correction method. Default: ‘holm’.
pval_thr (float, optional) – P-value threshold for significance. Default: 0.01.
verbose (bool, optional) – Whether to print progress information. Default: True.
enable_parallelization (bool, optional) – Whether to use parallel processing. Default: True.
n_jobs (int, optional) – Number of parallel jobs. -1 means use all processors. Default: -1.
seed (int, optional) – Random seed for reproducibility. Default: 42.
duplicate_behavior (str, optional) – How to handle duplicate TimeSeries among the neurons being compared:
- ‘ignore’: Process duplicates normally (default)
- ‘raise’: Raise an error if duplicates are found
- ‘warn’: Print a warning but continue processing
Default: ‘ignore’.
profile (bool, optional) – If True, collect timing and FFT type information. Default: False. When enabled, info[‘timings’] will contain:
- ‘stage1_pair_scanning’: Stage 1 scanning time (seconds)
- ‘stage2_pair_scanning’: Stage 2 scanning time if applicable (seconds)
- ‘fft_type_counts’: Dictionary of FFT type usage counts
- ‘matrix_construction’: Symmetric matrix construction time (seconds)
- ‘total’: Total execution time (seconds)
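The default multicomp_correction='holm' refers to the Holm-Bonferroni step-down procedure. A minimal sketch of its rejection rule on a plain array of p-values: sort the p-values, compare the k-th smallest (0-based) with pval_thr / (m - k), and stop at the first failure. The function name holm_reject is hypothetical and this is not DRIADA's code.

```python
import numpy as np

def holm_reject(pvals, alpha=0.01):
    # Holm-Bonferroni step-down: returns a boolean mask of rejected hypotheses.
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(order):
        if pvals[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject

mask = holm_reject([0.001, 0.04, 0.0005, 0.3, 0.003], alpha=0.01)
```

With plain Bonferroni (threshold 0.01 / 5 = 0.002) the p-value 0.003 would not be rejected, whereas Holm's third threshold (0.01 / 3) rejects it. Holm is never less powerful than Bonferroni and still controls the FWER.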

Return type:

tuple

Returns:

similarity_matrix (ndarray) – Matrix of similarity values between neurons. Element [i,j] contains the similarity between neuron i and neuron j. Diagonal is zero.
significance_matrix (ndarray) – Matrix of binary significance values. 1 indicates significant similarity.
p_value_matrix (ndarray) – Matrix of p-values for each comparison.
cell_ids (list) – List of cell IDs corresponding to matrix indices.
info (dict) – Dictionary containing additional information from compute_me_stats.

Notes

Uses the two-stage INTENSE approach for efficient significance testing
Diagonal elements are zero (self-similarity check prevents computation)
For calcium imaging data, considers temporal dynamics
For spike data, uses discrete MI formulation
Can identify functional assemblies through graph analysis of significant pairs
No optimal delay search is performed (synchronous activity assumed)
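The note about identifying functional assemblies can be made concrete with a minimal sketch: treat the significance matrix returned by compute_cell_cell_significance as the adjacency matrix of an undirected graph and take its connected components as candidate assemblies. Connected components are only one simple choice (community detection is a common alternative). The function name find_assemblies is hypothetical, and a toy matrix stands in for real output.

```python
from collections import deque
import numpy as np

def find_assemblies(sig_mat, cell_ids=None, min_size=2):
    # Treat significant pairs as edges of an undirected graph.
    adj = np.asarray(sig_mat) > 0
    n = adj.shape[0]
    cell_ids = list(range(n)) if cell_ids is None else list(cell_ids)
    seen, assemblies = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nb in np.flatnonzero(adj[node]).tolist():
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(component) >= min_size:
            assemblies.append(sorted(cell_ids[i] for i in component))
    return assemblies

# Toy symmetric significance matrix: cells 0-2 linked through cell 0, cells 3-4 paired.
sig = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
assemblies = find_assemblies(sig, cell_ids=[10, 11, 12, 13, 14])
```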

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from driada.information.info_base import TimeSeries
>>> import numpy as np
>>>
>>> # Create experiment with correlated neurons
>>> exp = generate_synthetic_exp(n_dfeats=1, n_cfeats=1, nneurons=3,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Make neurons 0 and 1 correlated
>>> noise = np.random.RandomState(42).randn(len(exp.neurons[0].ca.data)) * 0.1
>>> exp.neurons[1].ca = TimeSeries(
...     exp.neurons[0].ca.data + noise, discrete=False
... )
>>>
>>> # Compute neuron-neuron correlations
>>> sim_mat, sig_mat, pval_mat, cells, info = compute_cell_cell_significance(
...     exp,
...     cell_bunch=[0, 1, 2],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>> sim_mat.shape == (3, 3)
True
>>> np.allclose(np.diag(sim_mat), 0)  # Self-correlation is zero
True
>>> sim_mat[0, 1] > sim_mat[0, 2]  # Neurons 0,1 more correlated than 0,2
True

Raises:

ValueError – If data_type is not ‘calcium’ or ‘spikes’, or if spike data is missing for the requested neurons

Notes

Only upper triangle is computed for efficiency (matrix is symmetric)
Warns if all neurons have identical spike data
Computes network statistics when verbose=True
Synchronous activity assumed (no delay optimization)

driada.intense.pipelines.compute_embedding_selectivity(exp, embedding_methods=None, cell_bunch=None, data_type='calcium', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, mode='two_stage', n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, use_precomputed_stats=True, save_computed_stats=True, force_update=False, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=True, shift_window=2, remove_anti_selective=True, verbose=True, enable_parallelization=True, n_jobs=-1, seed=42)[source]

Compute INTENSE selectivity between neurons and dimensionality reduction embeddings.

This function treats each embedding component as a dynamic feature and computes the mutual information between neural activity and embedding dimensions. This reveals how individual neurons contribute to the population-level manifold structure.

Parameters:

exp (Experiment) – Experiment object with stored embeddings
embedding_methods (str, list or None) – Names of embedding methods to analyze. If None, analyzes all stored embeddings.
cell_bunch (int, iterable or None) – Neuron indices. By default (None), all neurons will be taken
data_type (str) – Data type used for embeddings and INTENSE (‘calcium’ or ‘spikes’)
metric (str) – Similarity metric between TimeSeries (default: ‘mi’)
mi_estimator (str) – Mutual information estimator to use when metric='mi'. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’
mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.
mode (str) – Computation mode: ‘stage1’, ‘stage2’, or ‘two_stage’ (default)
n_shuffles_stage1 (int) – Number of shuffles for first stage (default: 100)
n_shuffles_stage2 (int) – Number of shuffles for second stage (default: 10000)
metric_distr_type (str) – Distribution type for the shuffled metric distribution fit (default: ‘gamma_zi’)
noise_ampl (float) – Small noise amplitude added to improve numerical fit (default: 1e-3)
ds (int) – Downsampling constant (default: 1)
use_precomputed_stats (bool) – Whether to use stats saved in Experiment instance (default: True)
save_computed_stats (bool) – Whether to save computed stats to Experiment instance (default: True)
force_update (bool) – Force update saved statistics if data hash collision found (default: False)
topk1 (int) – True MI for stage 1 should be among topk1 MI shuffles (default: 1)
topk2 (int) – True MI for stage 2 should be among topk2 MI shuffles (default: 5)
multicomp_correction (str or None) – Multiple comparison correction type: None, ‘bonferroni’, or ‘holm’ (default)
pval_thr (float) – P-value threshold (default: 0.01)
find_optimal_delays (bool) – Find optimal temporal delays between neural activity and embeddings (default: True)
shift_window (int) – Window for optimal shift search in seconds (default: 2)
verbose (bool) – Print progress information (default: True)
enable_parallelization (bool) – Enable parallel computation (default: True)
n_jobs (int) – Number of parallel jobs, -1 for all cores (default: -1)
seed (int) – Random seed (default: 42)

Returns:

results – Dictionary keyed by embedding method name. Each value is a dict containing:
- ‘stats’: Statistics for each neuron-component pair
- ‘significance’: Significance results
- ‘info’: Additional information from compute_me_stats
- ‘intense_results’: Full IntenseResults object from the INTENSE computation
- ‘significant_neurons’: Dict of neurons significantly selective to embedding components
- ‘n_components’: Number of embedding components
- ‘component_selectivity’: For each component, list of selective neurons

Return type:

dict

Raises:

ValueError – If no embeddings are found for the specified data_type, or if an embedding method is not found

Notes

Temporarily adds embedding components as dynamic features
Forces use_precomputed_stats=False for temporary features
Component names follow pattern “{method}_comp{index}”
Cleanup in a finally block ensures the experiment state is restored
Only stage2 significance is considered for results
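The notes say that each embedding component is temporarily treated as a dynamic feature named “{method}_comp{index}”. The hypothetical helper below only illustrates that naming and splitting step on a plain NumPy array of shape (n_frames, n_components). It assumes a 0-based index and does not create DRIADA TimeSeries objects or modify the experiment.

```python
import numpy as np

def split_embedding(embedding, method):
    # embedding: array of shape (n_frames, n_components), one column per component.
    embedding = np.asarray(embedding)
    if embedding.ndim != 2:
        raise ValueError("expected a 2D array of shape (n_frames, n_components)")
    # Name each component '{method}_comp{index}' (0-based index is an assumption).
    return {f"{method}_comp{i}": embedding[:, i].copy() for i in range(embedding.shape[1])}

emb = np.arange(12, dtype=float).reshape(6, 2)  # toy 2-component embedding, 6 frames
components = split_embedding(emb, "pca")
```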

Examples

>>> from driada.experiment.synthetic import generate_synthetic_exp
>>> from sklearn.decomposition import PCA
>>> import numpy as np
>>>
>>> # Create experiment
>>> exp = generate_synthetic_exp(n_dfeats=1, n_cfeats=1, nneurons=5,
...                              duration=60, fps=10, seed=42, verbose=False)
>>>
>>> # Create and store PCA embedding
>>> neural_data = np.array([exp.neurons[i].ca.data for i in range(5)]).T
>>> pca = PCA(n_components=2, random_state=42)
>>> embedding = pca.fit_transform(neural_data)
>>> exp.store_embedding(embedding, method_name='pca', data_type='calcium')
>>>
>>> # Compute embedding selectivity
>>> results = compute_embedding_selectivity(
...     exp,
...     embedding_methods=['pca'],
...     cell_bunch=[0, 1, 2],
...     mode='stage1',
...     n_shuffles_stage1=10,
...     verbose=False
... )
>>>
>>> 'pca' in results
True
>>> results['pca']['n_components']
2
>>> 'component_selectivity' in results['pca']
True

Usage Example

from driada.intense import compute_cell_feat_significance
from driada.experiment import load_demo_experiment

exp = load_demo_experiment()

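result = compute_cell_feat_significance(
    exp,
    n_shuffles_stage1=100,
    n_shuffles_stage2=1000,
    ds=5,
    find_optimal_delays=False,
)
# The first four outputs are stats, significance, info and the IntenseResults object;
# a fifth (disentanglement results) is documented for with_disentanglement=True.
stats, significance, info, results = result[:4]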