INTENSE Core Implementation

class driada.intense.intense_base.StageConfig(stage_num, n_shuffles, mask, topk, pval_thr=None, multicomp_correction=None)

Configuration for a single stage of INTENSE computation.

Encapsulates all stage-specific parameters so that a single scan_stage() function can handle both Stage 1 and Stage 2.

Parameters:
stage_num

Stage number (1 or 2).

Type:

int

n_shuffles

Number of shuffles for this stage.

Type:

int

mask

Binary mask indicating which pairs to compute.

Type:

np.ndarray

topk

The true MI must rank within the top k values relative to the shuffled values.

Type:

int

pval_thr

Base p-value threshold (Stage 2 only). Default 0.05.

Type:

float, optional

multicomp_correction

Multiple comparison correction method (Stage 2 only). Options: ‘holm’, ‘bonferroni’, etc.

Type:

str, optional

stage_num: int
n_shuffles: int
mask: ndarray
topk: int
pval_thr: Optional[float] = None
multicomp_correction: Optional[str] = None
__init__(stage_num, n_shuffles, mask, topk, pval_thr=None, multicomp_correction=None)
Return type:

None
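To make the role of these fields concrete, here is a minimal sketch of how the two stages of a two-stage run might be configured. It uses a hypothetical stand-in dataclass, StageConfigSketch, with the same fields as StageConfig instead of importing driada. The numbers mirror the compute_me_stats defaults (100 and 10000 shuffles, topk1=1, topk2=5, pval_thr=0.01, 'holm'), and the Stage 1 pass mask is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class StageConfigSketch:
    """Hypothetical stand-in with the same fields as the documented StageConfig."""

    stage_num: int
    n_shuffles: int
    mask: np.ndarray
    topk: int
    pval_thr: Optional[float] = None
    multicomp_correction: Optional[str] = None


n_neurons, n_features = 4, 3

# Stage 1: screen every pair with few shuffles; only the rank criterion applies.
stage1 = StageConfigSketch(stage_num=1, n_shuffles=100,
                           mask=np.ones((n_neurons, n_features), dtype=int),
                           topk=1)

# Made-up Stage 1 outcome: only two pairs survive the screening.
stage1_pass = np.zeros((n_neurons, n_features), dtype=int)
stage1_pass[0, 1] = 1
stage1_pass[2, 0] = 1

# Stage 2: thorough testing restricted to the survivors, with p-value settings.
stage2 = StageConfigSketch(stage_num=2, n_shuffles=10000, mask=stage1_pass,
                           topk=5, pval_thr=0.01, multicomp_correction="holm")
```

The design point is that Stage 2 only needs a different mask and the p-value settings; everything else is the same kind of scan.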

driada.intense.intense_base.get_calcium_feature_me_profile(exp, cell_id=None, feat_id=None, cbunch=None, fbunch=None, shift_window=2, ds=1, metric='mi', mi_estimator='gcmi', data_type='calcium')

Compute metric profile between neurons and behavioral features across time shifts.

Parameters:
  • exp (Experiment) – Experiment object containing neurons and behavioral features.

  • cell_id (int, optional) – Index of a single neuron in exp.neurons. Deprecated - use cbunch instead.

  • feat_id (str or tuple of str, optional) – Single feature name(s) to analyze. Deprecated - use fbunch instead.

  • cbunch (int, iterable or None, optional) – Neuron indices. If None (default), all neurons will be analyzed. Takes precedence over cell_id if both provided.

  • fbunch (str, iterable or None, optional) – Feature names. If None (default), all single features will be analyzed. Takes precedence over feat_id if both provided.

  • shift_window (int, optional) – Maximum shift to test in each direction (seconds). Default: 2. Converted to frames internally using exp.fps.

  • ds (int, optional) – Downsampling factor. Default: 1 (no downsampling).

  • metric (str, optional) – Similarity metric to compute. Default: ‘mi’. Options: ‘mi’ (mutual information), ‘spearman’ (Spearman correlation), or any other metric supported by the get_sim function.

  • mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’

  • data_type (str, optional) – Type of neural data to use. Default: ‘calcium’. Options: ‘calcium’ (calcium imaging data) or ‘spikes’ (spike data).

Returns:

If a single cell_id and feat_id are provided (backward compatibility):

{‘me0’: float, ‘shifted_me’: list of float}

If cbunch or fbunch is used:

Nested dictionary with structure: {cell_id: {feat_id: {‘me0’: float, ‘shifted_me’: list}}}, where shifted_me contains the metric values for shifts ranging from -shift_window to +shift_window.

Return type:

dict

Notes

  • shift_window is in seconds, converted to frames using exp.fps

  • Total number of shifts tested: 2 * shift_window * fps / ds

  • Multi-feature analysis (tuple feat_id) only supported for metric=’mi’

  • A progress bar is displayed during computation
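The sketch below illustrates what a metric-versus-shift profile looks like on synthetic data. It is not the library's implementation: shift_profile and gaussian_mi_bits are hypothetical helpers, MI is estimated with the closed-form Gaussian formula -0.5·log2(1 - r²) instead of GCMI, shifts are circular (np.roll), and the window is assumed to be int(shift_window * fps / ds) downsampled frames on each side of zero, with the zero shift included.

```python
import numpy as np


def gaussian_mi_bits(x, y):
    """Closed-form MI of a bivariate Gaussian in bits; a stand-in for GCMI."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log2(1.0 - r ** 2)


def shift_profile(x, y, fps, shift_window, ds=1):
    """Hypothetical helper: MI between x and circular shifts of y.

    shift_window is in seconds. After downsampling by ds, the window is assumed
    to be int(shift_window * fps / ds) samples on each side of zero.
    """
    x, y = np.asarray(x)[::ds], np.asarray(y)[::ds]
    w = int(shift_window * fps / ds)
    shifts = np.arange(-w, w + 1)
    profile = np.array([gaussian_mi_bits(x, np.roll(y, s)) for s in shifts])
    me0 = gaussian_mi_bits(x, y)  # value at zero shift
    return me0, shifts, profile


rng = np.random.default_rng(0)
behavior = rng.standard_normal(2000)
# Synthetic "neuron" that follows the behavior 5 frames later, plus noise.
neuron = np.roll(behavior, 5) + 0.5 * rng.standard_normal(2000)

me0, shifts, profile = shift_profile(neuron, behavior, fps=20, shift_window=1)
best_shift = shifts[np.argmax(profile)]  # expected near 5 frames
```

With fps=20 and shift_window=1, this sketch tests 41 shifts (20 on each side plus zero), and the peak of the profile recovers the built-in lag.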

Examples

This function requires an Experiment object containing neural recordings and behavioral features, so the example below is illustrative only:

>>> # Illustrative only (requires an actual Experiment object):
>>> # from driada.intense.intense_base import get_calcium_feature_me_profile
>>> # exp = load_experiment()  # Load your experiment data
>>> #
>>> # # Single neuron and feature: returns {'me0': ..., 'shifted_me': [...]}
>>> # profile = get_calcium_feature_me_profile(exp, cell_id=0, feat_id='speed',
>>> #                                          shift_window=2, ds=5)
>>> # me0, shifted = profile['me0'], profile['shifted_me']
>>> #
>>> # # Or analyze multiple neurons and features at once
>>> # results = get_calcium_feature_me_profile(exp, cbunch=[0, 1],
>>> #                                          fbunch=['speed', 'direction'],
>>> #                                          shift_window=1, ds=2)
>>> # # Access results: results[neuron_id][feature_name]['me0']
>>> pass  # Actual usage requires an Experiment object

driada.intense.intense_base.scan_pairs(ts_bunch1, ts_bunch2, metric, nsh, optimal_delays, random_shifts=None, mi_estimator='gcmi', ds=1, mask=None, noise_const=0.001, seed=None, enable_progressbar=True, engine='auto', fft_cache=None, mi_estimator_kwargs=None)

Calculate similarity metric and shuffled distributions for pairs of time series.

This function computes the similarity metric between all pairs from ts_bunch1 and ts_bunch2, along with shuffled distributions for significance testing.

Parameters:
  • ts_bunch1 (list of TimeSeries or MultiTimeSeries) – First set of time series (typically neural signals).

  • ts_bunch2 (list of TimeSeries or MultiTimeSeries) – Second set of time series (typically behavioral variables).

  • metric (str) – Similarity metric to compute. See validate_metric for supported options.

  • nsh (int) – Number of shuffles for significance testing.

  • optimal_delays (np.ndarray) – Optimal delays array of shape (len(ts_bunch1), len(ts_bunch2)). Contains best shifts in frames.

  • random_shifts (np.ndarray, optional) – Pre-generated random shifts of shape (len(ts_bunch1), len(ts_bunch2), nsh). If None, shifts will be generated using seed and stable keys.

  • mi_estimator (str, default='gcmi') – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ (Gaussian copula) or ‘ksg’ (k-nearest neighbors).

  • ds (int, default=1) – Downsampling factor. Every ds-th point is used from the time series.

  • mask (np.ndarray, optional) – Binary mask array of shape (len(ts_bunch1), len(ts_bunch2)). 0 skips calculation, 1 proceeds.

  • noise_const (float, default=1e-3) – Small noise amplitude added to improve numerical stability.

  • seed (int, optional) – Random seed for reproducibility.

  • enable_progressbar (bool, default=True) – Whether to show progress bar during computation.

  • engine ({'auto', 'fft', 'loop'}, default='auto') –

    Computation engine for MI shuffles. Options:

    • ‘auto’: use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ‘fft’: force FFT (raises an error if not applicable)

    • ‘loop’: force the per-shift loop (original behavior)

  • fft_cache (dict, optional) – Pre-computed FFT cache from _build_fft_cache. Keys are (key1, key2) tuples using stable identifiers from _get_ts_key(). If provided, avoids redundant data extraction.

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

Return type:

tuple[ndarray, ndarray]

Returns:

  • random_shifts (np.ndarray) – Array of shape (len(ts_bunch1), len(ts_bunch2), nsh) containing random shifts used for shuffled distribution computation.

  • me_total (np.ndarray) – Array of shape (len(ts_bunch1), len(ts_bunch2), nsh+1). Contains true metric values at index 0 and shuffled values at indices 1:nsh+1.

Notes

  • True metric values: me_total[:,:,0]

  • Shuffled values: me_total[:,:,1:]

  • Random shifts are drawn uniformly over the length of the time series

  • Noise is added as: value * (1 + noise_const * U(-1,1))

  • FFT optimization provides ~100x speedup for univariate continuous GCMI
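The notes above describe the layout of me_total and the noise model. The following sketch reproduces that layout for a single pair. pair_shuffle_distribution is a hypothetical helper; absolute Pearson correlation stands in for the real metric, and shifts are circular. Seeding each pair from (seed, i, j) is an assumption about how per-pair reproducibility could be achieved, not a description of the library's keying scheme. The zero shift is excluded from the draws so that no shuffle reproduces the true alignment.

```python
import numpy as np


def pair_shuffle_distribution(x, y, nsh, seed, i, j, noise_const=1e-3):
    """Hypothetical helper producing one row laid out like me_total[i, j].

    Index 0 holds the true value and indices 1..nsh the shuffled values.
    Absolute Pearson correlation stands in for the real similarity metric.
    """
    n = len(x)
    # Per-pair generator seeded from (seed, i, j): a pair's shifts do not
    # depend on which other pairs are computed (an illustrative assumption).
    rng = np.random.default_rng([seed, i, j])
    # Uniform circular shifts; 0 is excluded so no shuffle equals the true alignment.
    shifts = rng.integers(1, n, size=nsh)
    values = np.empty(nsh + 1)
    values[0] = abs(np.corrcoef(x, y)[0, 1])
    for k, s in enumerate(shifts, start=1):
        values[k] = abs(np.corrcoef(x, np.roll(y, s))[0, 1])
    # Multiplicative jitter: value * (1 + noise_const * U(-1, 1)).
    values *= 1.0 + noise_const * rng.uniform(-1.0, 1.0, size=nsh + 1)
    return shifts, values


data_rng = np.random.default_rng(1)
x = data_rng.standard_normal(500)
y = x + data_rng.standard_normal(500)
shifts, me_row = pair_shuffle_distribution(x, y, nsh=50, seed=42, i=0, j=1)
```

Calling the helper again with the same (seed, i, j) returns identical shifts and values, regardless of which other pairs were computed.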

driada.intense.intense_base.scan_pairs_parallel(ts_bunch1, ts_bunch2, metric, nsh, optimal_delays, mi_estimator='gcmi', ds=1, mask=None, noise_const=0.001, seed=None, n_jobs=-1, engine='auto', fft_cache=None, mi_estimator_kwargs=None)

Calculate metric values and shuffles for time series pairs using parallel processing.

Parameters:
  • ts_bunch1 (list of TimeSeries) – First set of time series.

  • ts_bunch2 (list of TimeSeries) – Second set of time series.

  • metric (str) – Similarity metric to compute. Options: ‘mi’ (mutual information), ‘spearman’ (Spearman correlation), or any other metric supported by the get_sim function.

  • nsh (int) – Number of shuffles to perform.

  • optimal_delays (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2))) – Pre-computed optimal delays for each pair.

  • mi_estimator (str, default='gcmi') – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ (Gaussian copula) or ‘ksg’ (k-nearest neighbors).

  • ds (int, default=1) – Downsampling factor.

  • mask (np.ndarray, optional) – Binary mask of shape (len(ts_bunch1), len(ts_bunch2)). 0 = skip computation, 1 = compute. Default: all ones.

  • noise_const (float, default=1e-3) – Small noise added to improve numerical stability.

  • seed (int, optional) – Random seed for reproducibility.

  • n_jobs (int, default=-1) – Number of parallel jobs. -1 uses all cores.

  • engine ({'auto', 'fft', 'loop'}, default='auto') –

    Computation engine for MI shuffles. Options:

    • ‘auto’: use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ‘fft’: force FFT (raises an error if not applicable)

    • ‘loop’: force the per-shift loop (original behavior)

  • fft_cache (dict, optional) – Pre-computed FFT cache mapping (key1, key2) tuples to FFTCacheEntry objects. Keys are stable identifiers from _get_ts_key(). If provided, avoids redundant data extraction. If None, FFT type is computed fresh for each pair.

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

Return type:

tuple[ndarray, ndarray]

Returns:

  • random_shifts (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2), nsh)) – Random shifts used for shuffling.

  • me_total (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2), nsh+1)) – Metric values. [:,:,0] contains true values, [:,:,1:] contains shuffles.

Raises:

ValueError – If input validation fails or parameters are invalid.

Notes

  • Parallelization is done by splitting ts_bunch1 across workers

  • Each worker handles a subset of ts_bunch1 against all of ts_bunch2

  • Uses the threading backend if PyTorch is present (checked lazily), otherwise loky

  • Random seeding ensures reproducibility across different mask configurations

  • FFT optimization provides ~100x speedup for univariate continuous GCMI
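Why can FFT help? For univariate continuous data, GCMI essentially reduces to the Gaussian MI -0.5·log2(1 - r²) of copula-normalized signals, and circularly shifting a signal only shifts its rank transform. The correlation at every circular shift can therefore be obtained at once from one FFT-based cross-correlation in O(n log n), instead of O(n) work per shift. The sketch below assumes circular shifts and uses plain z-scoring in place of the copula transform; whether the library's FFT engine is organized exactly this way is an assumption.

```python
import numpy as np


def all_shift_correlations_fft(x, y):
    """Correlation between x and every circular shift of y, computed via FFT.

    Returns r with r[s] == corr(x, np.roll(y, s)) for s = 0 .. n-1.
    """
    n = len(x)
    xz = (x - x.mean()) / x.std()
    yz = (y - y.mean()) / y.std()
    # Circular cross-correlation c[s] = sum_t xz[t] * yz[t - s], all s at once.
    c = np.fft.irfft(np.fft.rfft(xz) * np.conj(np.fft.rfft(yz)), n=n)
    return c / n


rng = np.random.default_rng(3)
x = rng.standard_normal(256)
# y leads x by 7 samples, so the correlation should peak at shift 7.
y = np.roll(x, -7) + 0.5 * rng.standard_normal(256)

r_fft = all_shift_correlations_fft(x, y)
mi_bits = -0.5 * np.log2(1.0 - r_fft ** 2)  # Gaussian MI at every shift
```

A per-shift loop over np.corrcoef(x, np.roll(y, s)) gives the same values, one shift at a time.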

See also

scan_pairs

Sequential version of this function

scan_pairs_router

Wrapper that chooses between parallel and sequential

Examples

>>> # Minimal example with 2x2 pairs
>>> import numpy as np
>>> from driada.information.info_base import TimeSeries
>>> from driada.intense.intense_base import scan_pairs_parallel
>>> np.random.seed(42)  # For reproducibility
>>> # Small data: 2 neurons, 2 behaviors, 50 timepoints
>>> neurons = [TimeSeries(np.random.randn(50), discrete=False) for _ in range(2)]
>>> behaviors = [TimeSeries(np.random.randn(50), discrete=False) for _ in range(2)]
>>> delays = np.zeros((2, 2), dtype=int)  # No delays
>>> # Just 5 shuffles for demonstration
>>> shifts, metrics = scan_pairs_parallel(neurons, behaviors, 'mi',
...                                      5, delays, n_jobs=1, seed=42)
>>> shifts.shape
(2, 2, 5)
>>> metrics.shape  # Original + 5 shuffles = 6 total
(2, 2, 6)

driada.intense.intense_base.scan_pairs_router(ts_bunch1, ts_bunch2, metric, nsh, optimal_delays, mi_estimator='gcmi', ds=1, mask=None, noise_const=0.001, seed=None, enable_parallelization=True, n_jobs=-1, engine='auto', fft_cache=None, mi_estimator_kwargs=None)

Route metric computation to parallel or sequential implementation.

Parameters:
  • ts_bunch1 (list of TimeSeries) – First set of time series.

  • ts_bunch2 (list of TimeSeries) – Second set of time series.

  • metric (str) – Similarity metric to compute. Options: ‘mi’ (mutual information), ‘spearman’ (Spearman correlation), or any other metric supported by the get_sim function.

  • nsh (int) – Number of shuffles to perform.

  • optimal_delays (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2))) – Pre-computed optimal delays for each pair.

  • mi_estimator (str, default='gcmi') – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ (Gaussian copula) or ‘ksg’ (k-nearest neighbors).

  • ds (int, default=1) – Downsampling factor.

  • mask (np.ndarray, optional) – Binary mask of shape (len(ts_bunch1), len(ts_bunch2)). 0 = skip computation, 1 = compute. Default: all ones.

  • noise_const (float, default=1e-3) – Small noise added to improve numerical stability.

  • seed (int, optional) – Random seed for reproducibility.

  • enable_parallelization (bool, default=True) – Whether to use parallel processing.

  • n_jobs (int, default=-1) – Number of parallel jobs if parallelization enabled. -1 uses all cores.

  • engine ({'auto', 'fft', 'loop'}, default='auto') –

    Computation engine for MI shuffles. Options:

    • ‘auto’: use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ‘fft’: force FFT (raises an error if not applicable)

    • ‘loop’: force the per-shift loop (original behavior)

  • fft_cache (dict, optional) – Pre-computed FFT cache mapping (global_i, j) tuples to FFTCacheEntry objects. If provided, avoids redundant data extraction. If None, FFT type is computed fresh for each pair. Use _build_fft_cache() to create this cache.

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

Return type:

tuple[ndarray, ndarray]

Returns:

  • random_shifts (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2), nsh)) – Random shifts used for shuffling.

  • me_total (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2), nsh+1)) – Metric values. [:,:,0] contains true values, [:,:,1:] contains shuffles.

Notes

This function automatically chooses between sequential and parallel implementations based on the enable_parallelization flag. It’s the recommended entry point for scan_pairs functionality.

FFT optimization provides ~100x speedup for univariate continuous GCMI.

See also

scan_pairs

Sequential implementation

scan_pairs_parallel

Parallel implementation

Examples

>>> # Router example - chooses sequential or parallel execution
>>> import numpy as np
>>> from driada.information.info_base import TimeSeries
>>> from driada.intense.intense_base import scan_pairs_router
>>> np.random.seed(42)
>>> # Minimal data for fast execution
>>> neurons = [TimeSeries(np.random.randn(30), discrete=False) for _ in range(2)]
>>> behaviors = [TimeSeries(np.random.randn(30), discrete=False) for _ in range(2)]
>>> delays = np.zeros((2, 2), dtype=int)  # No delays
>>> # Use sequential mode (enable_parallelization=False)
>>> shifts, metrics = scan_pairs_router(neurons, behaviors, 'mi',
...                                    3, delays, enable_parallelization=False, seed=42)
>>> metrics.shape  # 1 original + 3 shuffles = 4 total
(2, 2, 4)
>>> # First slice contains actual MI values
>>> metrics[:, :, 0].shape
(2, 2)

driada.intense.intense_base.scan_stage(ts_bunch1, ts_bunch2, config, optimal_delays, metric, mi_estimator, metric_distr_type, noise_const, ds, seed, enable_parallelization, n_jobs, engine, fft_cache=None, verbose=True, mi_estimator_kwargs=None)

Execute a single stage of INTENSE computation.

This function encapsulates the common logic between Stage 1 and Stage 2:

  1. Scan pairs to compute metric values and shuffle distributions

  2. Compute statistical tables from the results

  3. Apply the appropriate criterion (Stage 1: rank-based filtering using topk; Stage 2: p-value based with multiple comparison correction)

For Stage 2, the multiple comparison correction threshold is computed internally from the stage statistics using config.pval_thr and config.multicomp_correction.
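A minimal sketch of the Stage 1 rank criterion, applied to an array laid out like me_total (true value at index 0, shuffles after it). rank_criterion is a hypothetical helper. Reading topk as "fewer than topk shuffles exceed the true value", and resolving ties in favor of the true value, are assumptions.

```python
import numpy as np


def rank_criterion(me_total, topk, mask=None):
    """Hypothetical Stage 1-style filter on an array shaped like me_total.

    me_total[..., 0] is the true value and me_total[..., 1:] the shuffles.
    A pair passes if fewer than `topk` shuffles exceed its true value.
    """
    true_vals = me_total[..., :1]
    n_greater = (me_total[..., 1:] > true_vals).sum(axis=-1)
    passed = n_greater < topk
    if mask is not None:
        passed &= mask.astype(bool)  # masked-out pairs never pass
    return passed.astype(int)


me_total = np.array([[[0.9, 0.1, 0.2, 0.3],     # true value beats all shuffles
                      [0.2, 0.5, 0.1, 0.05]]])  # one shuffle beats the true value
stage1_pass = rank_criterion(me_total, topk=1)
```

With topk=1 only the first pair passes; relaxing to topk=2 lets the second pair through as well.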

Parameters:
  • ts_bunch1 (list of TimeSeries or MultiTimeSeries) – First set of time series (typically neural signals).

  • ts_bunch2 (list of TimeSeries or MultiTimeSeries) – Second set of time series (typically behavioral variables).

  • config (StageConfig) – Configuration for this stage (stage number, n_shuffles, mask, topk, etc.).

  • optimal_delays (np.ndarray) – Optimal delays array of shape (len(ts_bunch1), len(ts_bunch2)).

  • metric (str) – Similarity metric to compute.

  • mi_estimator (str) – Mutual information estimator (‘gcmi’ or ‘ksg’).

  • metric_distr_type (str) – Distribution type for fitting shuffled metric values.

  • noise_const (float) – Small noise amplitude added for numerical stability.

  • ds (int) – Downsampling factor.

  • seed (int) – Random seed for reproducibility.

  • enable_parallelization (bool) – Whether to use parallel processing.

  • n_jobs (int) – Number of parallel jobs if parallelization enabled.

  • engine (str) – Computation engine (‘auto’, ‘fft’, ‘loop’).

  • fft_cache (dict, optional) – Pre-computed FFT cache for accelerated computation.

  • verbose (bool, default=True) – Whether to print stage information.

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

Return type:

tuple[dict, dict, dict]

Returns:

  • stage_stats (dict) – Statistical results for all pairs from get_table_of_stats.

  • stage_significance (dict) – Significance results for all pairs from apply_stage_criterion.

  • stage_info (dict) – Additional information, including:

    • ‘random_shifts’: random shifts array used for shuffling

    • ‘me_total’: full metric values array (true + shuffles)

    • ‘pass_mask’: binary mask of pairs that passed the criterion

    • ‘multicorr_thr’: multiple comparison threshold (Stage 2 only; None for Stage 1)

driada.intense.intense_base.compute_me_stats(ts_bunch1, ts_bunch2, names1=None, names2=None, mode='two_stage', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, precomputed_mask_stage1=None, precomputed_mask_stage2=None, n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=False, skip_delays=[], shift_window=100, verbose=True, seed=None, enable_parallelization=True, n_jobs=-1, duplicate_behavior='ignore', engine='auto', store_random_shifts=False, profile=False)

Calculates similarity metric statistics for TimeSeries or MultiTimeSeries pairs.

Parameters:
  • ts_bunch1 (list of TimeSeries objects) – First set of time series

  • ts_bunch2 (list of TimeSeries objects) – Second set of time series

  • names1 (list of str, optional) – names that will be given to time series from ts_bunch1 in the final results

  • names2 (list of str, optional) – names that will be given to time series from ts_bunch2 in the final results

  • mode (str, default='two_stage') –

    Computation mode. Options:

    • 'stage1': preliminary scanning with n_shuffles_stage1 shuffles only. Rejects strictly non-significant pairs but gives no definitive result about the significance of the remaining pairs.

    • 'stage2': skip stage 1 and perform full-scale scanning (n_shuffles_stage2 shuffles) of all pairs. Gives definitive results but can be very time-consuming. It also reduces the statistical power of the multiple comparison correction, since the number of hypotheses is very high.

    • 'two_stage': prune non-significant pairs during stage 1 then perform thorough testing for the rest during stage 2. Recommended.

  • metric (str, default='mi') – similarity metric between TimeSeries

  • mi_estimator (str, default='gcmi') – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ or ‘ksg’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • precomputed_mask_stage1 (np.array, optional) – Precomputed mask for skipping some of the possible pairs in stage 1. Shape: (len(ts_bunch1), len(ts_bunch2)). A value of 0 skips the calculation for that pair; 1 means the calculation proceeds.

  • precomputed_mask_stage2 (np.array, optional) – Precomputed mask for skipping some of the possible pairs in stage 2. Shape: (len(ts_bunch1), len(ts_bunch2)). A value of 0 skips the calculation for that pair; 1 means the calculation proceeds.

  • n_shuffles_stage1 (int, default=100) – number of shuffles for first stage

  • n_shuffles_stage2 (int, default=10000) – number of shuffles for second stage

  • metric_distr_type (str, default="gamma_zi") –

    Distribution type for shuffled metric null distribution. Options:

    • ‘gamma_zi’ (default): Zero-inflated gamma distribution. Explicitly models the probability mass at zero that commonly occurs in MI null distributions. Provides superior goodness-of-fit and accurate parameter estimation without requiring artificial noise.

    • ‘gamma’: Standard gamma distribution with small noise added (noise_ampl) to handle zeros. Provided for backward compatibility. Less statistically principled than ‘gamma_zi’.

    • Other scipy.stats distributions: ‘lognorm’, ‘norm’, etc. are supported but not recommended for MI distributions.

  • noise_ampl (float, default=1e-3) – Small noise amplitude, which is added to metrics to improve numerical fit

  • ds (int, default=1) – Downsampling constant. Every ds-th point of the data time series is used.

  • topk1 (int, default=1) – the true MI in stage 1 must rank among the top topk1 values relative to the stage 1 MI shuffles

  • topk2 (int, default=5) – the true MI in stage 2 must rank among the top topk2 values relative to the stage 2 MI shuffles

  • multicomp_correction (str or None, default='holm') – type of multiple comparisons correction. Supported types are None (no correction), “bonferroni”, “holm”, and “fdr_bh”.

  • pval_thr (float, default=0.01) – p-value threshold. If multicomp_correction=None, this is the p-value threshold for a single pair. For FWER methods (bonferroni, holm), this is the family-wise error rate; for FDR methods (fdr_bh), it is the false discovery rate.

  • find_optimal_delays (bool, default=False) – If True, allows slight shifting of the time series (by at most ±shift_window frames) and selects the shift with the highest MI as the optimal delay.

  • skip_delays (list, default=[]) – List of indices from ts_bunch2 for which delays are not applied (set to 0). Has no effect if find_optimal_delays=False.

  • shift_window (int, default=100) – Window for optimal shift search (frames). Optimal shift will lie in the range -shift_window <= opt_shift <= shift_window

  • verbose (bool, default=True) – whether to print intermediate information

  • seed (int, optional) – random seed for reproducibility

  • enable_parallelization (bool, default=True) – whether to use parallel processing for computations

  • n_jobs (int, default=-1) – number of parallel jobs to use. -1 means use all available processors

  • duplicate_behavior (str, default='ignore') –

    How to handle duplicate TimeSeries in ts_bunch1 or ts_bunch2. Options:

    • ‘ignore’: process duplicates normally (default)

    • ‘raise’: raise an error if duplicates are found

    • ‘warn’: print a warning but continue processing

  • engine ({'auto', 'fft', 'loop'}, default='auto') –

    Computation engine for MI shuffles. FFT optimization provides ~100x speedup for Stage 2. Options:

    • ‘auto’: use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ‘fft’: force FFT (raises an error if not applicable)

    • ‘loop’: force the per-shift loop (original behavior)

  • store_random_shifts (bool, default=False) – Whether to store the random shift indices used during shuffle computation. When False (default), random_shifts1 and random_shifts2 arrays are not stored in accumulated_info, saving significant memory (e.g., ~400MB for typical datasets). Set to True if you need the shift indices for debugging or reproducibility analysis.

  • profile (bool, default=False) –

    Whether to collect internal timing information. When True, accumulated_info includes a ‘timings’ dict with execution times (in seconds) for:

    • ‘stage1_delay_optimization’: delay optimization (if find_optimal_delays=True)

    • ‘stage1_pair_scanning’: stage 1 pair scanning

    • ‘stage2_pair_scanning’: stage 2 pair scanning (if applicable)

    • ‘total’: sum of all timing sections

Returns:

  • stats (dict of dict of dicts) – Outer dict keys: indices of ts_bunch1 (or names1, if given). Inner dict keys: indices of ts_bunch2 (or names2, if given). Last dict: dictionary of stats variables. Can easily be converted to a pandas DataFrame with pd.DataFrame(stats).

  • significance (dict of dict of dicts) – Outer dict keys: indices of ts_bunch1 (or names1, if given). Inner dict keys: indices of ts_bunch2 (or names2, if given). Last dict: dictionary of significance-related variables. Can easily be converted to a pandas DataFrame with pd.DataFrame(significance).

  • accumulated_info (dict) – Data collected during computation.

Raises:

ValueError – If mode is not ‘stage1’, ‘stage2’, or ‘two_stage’. If multicomp_correction is not None, ‘bonferroni’, ‘holm’, or ‘fdr_bh’. If pval_thr is not between 0 and 1. If duplicate_behavior is not ‘ignore’, ‘raise’, or ‘warn’. If duplicate TimeSeries found and duplicate_behavior=’raise’.

Notes

  • When comparing the same bunch (ts_bunch1 is ts_bunch2), the diagonal of masks is automatically set to 0 to avoid self-comparisons.

  • In ‘stage2’ mode, dummy stage1 structures are created with placeholder values to maintain consistency in the return format.

  • For stage2, the final mask combines stage1 results with precomputed_mask_stage2 using logical AND.

  • Input masks are never modified; copies are created when needed.
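Stage 2 combines pval_thr with multicomp_correction. The sketch below shows textbook versions of the named corrections (Bonferroni, Holm step-down, Benjamini-Hochberg step-up), returning a boolean "significant" flag per pair. multicomp_reject is a hypothetical helper: the library itself reports a threshold (multicorr_thr), whose exact computation may differ, and the use of <= comparisons here is an assumption.

```python
import numpy as np


def multicomp_reject(pvals, alpha, method="holm"):
    """Textbook multiple-comparison decisions; True means 'significant'."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    if method is None:
        return p <= alpha  # no correction: alpha applies to each pair
    if method == "bonferroni":
        return p <= alpha / m
    order = np.argsort(p)
    sorted_p = p[order]
    ranks = np.arange(1, m + 1)
    reject_sorted = np.zeros(m, dtype=bool)
    if method == "holm":
        # Step-down: reject p_(1), p_(2), ... until p_(k) > alpha / (m - k + 1).
        ok = sorted_p <= alpha / (m - ranks + 1)
        n_reject = m if ok.all() else int(np.argmin(ok))
        reject_sorted[:n_reject] = True
    elif method == "fdr_bh":
        # Step-up: find the largest k with p_(k) <= k * alpha / m, reject p_(1..k).
        passing = np.nonzero(sorted_p <= ranks * alpha / m)[0]
        if passing.size:
            reject_sorted[: passing[-1] + 1] = True
    else:
        raise ValueError(f"unsupported method: {method!r}")
    reject = np.empty(m, dtype=bool)
    reject[order] = reject_sorted
    return reject


pvals = [0.01, 0.04, 0.02, 0.005]
bonf = multicomp_reject(pvals, 0.05, "bonferroni")  # [True, False, False, True]
holm = multicomp_reject(pvals, 0.05, "holm")        # all True
```

Holm never rejects fewer hypotheses than Bonferroni at the same alpha, which is why it is a sensible default FWER method.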

Low-level computation functions for INTENSE analysis.

Classes

class driada.intense.intense_base.IntenseResults

Container for INTENSE computation results.

info

Metadata about the computation (optimal delays, thresholds, etc.).

Type:

dict

intense_params

Parameters used for the INTENSE computation.

Type:

dict

stats

Statistical results (p-values, metric values, etc.).

Type:

dict

significance

Significance test results for each neuron-feature pair.

Type:

dict

update(property_name, data)

Add or update a property with data.

update_multiple(datadict)

Update multiple properties from a dictionary.

Examples

>>> # Create results container and add analysis outputs
>>> results = IntenseResults()
>>> # Add statistical results
>>> results.update('stats', {'neuron1': {'feature1': {'me': 0.5, 'pval': 0.01}}})
>>> # Add computation metadata
>>> results.update('info', {'optimal_delays': [[0, 5], [10, 0]],
...                        'n_shuffles': 1000})
>>> # Access stored data
>>> results.stats['neuron1']['feature1']['me']
0.5
__init__()[source]

Initialize an empty IntenseResults container.

Creates an IntenseResults object with no initial data. Properties are added dynamically using the update() or update_multiple() methods.

Notes

The IntenseResults class serves as a flexible container for storing INTENSE computation outputs. It allows dynamic addition of properties to accommodate different analysis configurations and results.

Common properties added during INTENSE analysis:

  • ‘stats’: statistical test results (p-values, metric values)

  • ‘significance’: binary significance indicators

  • ‘info’: computation metadata (delays, parameters used)

  • ‘intense_params’: parameters used for the computation

See also

compute_cell_feat_significance

Main function that returns IntenseResults

update(property_name, data)

Add or update a property with data.

Stores analysis results as attributes of the IntenseResults object, allowing flexible storage of various data types and structures.

Parameters:
  • property_name (str) – Name of the property to store. Will become an attribute of the object accessible via dot notation.

  • data (any) – Data to store. Can be any Python object: arrays, dictionaries, dataframes, custom objects, etc.

Examples

>>> # Store different types of analysis results
>>> import numpy as np
>>> from driada.intense import IntenseResults
>>> results = IntenseResults()
>>> # Add mutual information matrix
>>> results.update('mi_matrix', np.array([[0, 0.5], [0.5, 0]]))
>>> # Add list of significant neuron-feature pairs
>>> results.update('significant_pairs', [(0, 1), (2, 3)])
>>> # Access via attribute notation
>>> results.mi_matrix
array([[0. , 0.5],
       [0.5, 0. ]])
>>> results.significant_pairs
[(0, 1), (2, 3)]

Notes

Property names should be valid Python identifiers. Existing properties will be overwritten without warning.

update_multiple(datadict)

Update multiple properties from a dictionary.

Batch update of multiple properties at once, useful for storing related analysis results together.

Parameters:

datadict (dict) – Dictionary mapping property names to data values. Each key-value pair will be stored as an attribute.

Examples

>>> # Batch update multiple analysis results at once
>>> import numpy as np
>>> from driada.intense import IntenseResults
>>> results = IntenseResults()
>>> # Add multiple related results together
>>> results.update_multiple({
...     'mi_values': np.array([0.1, 0.5, 0.3]),
...     'p_values': np.array([0.05, 0.001, 0.02]),
...     'significant': np.array([False, True, True]),
...     'parameters': {'metric': 'mi', 'correction': 'fdr'}
... })
>>> # All properties are now accessible
>>> results.mi_values
array([0.1, 0.5, 0.3])
>>> results.significant
array([False,  True,  True])

See also

update

Add single property

populate_stage1_pvals()

Reconstruct and populate pre_pval in stats from saved Stage 1 shuffle data.

Fills in stats[neuron][feature][“pre_pval”] (initially None) with actual p-values computed from the saved me_total1 distributions. Modifies self.stats in place.

Returns:

  • pre_pvals (np.ndarray, shape (n_neurons, n_features)) – The reconstructed p-value array.

  • mi_values (np.ndarray, shape (n_neurons, n_features)) – The MI values array.

Raises:
  • AttributeError – If required attributes (info, stats, intense_params) are missing.

  • KeyError – If me_total1 is not present in info.
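For intuition about what a p-value derived from a shuffle distribution can look like, here is the standard empirical estimate computed from an array shaped like me_total1. This is an assumption for illustration only: populate_stage1_pvals may instead derive p-values from a distribution fitted according to metric_distr_type. empirical_pvals is a hypothetical helper.

```python
import numpy as np


def empirical_pvals(me_total):
    """Empirical p-values from an array shaped like me_total (true value at index 0).

    Uses the standard (1 + #{shuffles >= true}) / (nsh + 1) estimate,
    which is never exactly zero.
    """
    true_vals = me_total[..., :1]
    shuffles = me_total[..., 1:]
    nsh = shuffles.shape[-1]
    n_ge = (shuffles >= true_vals).sum(axis=-1)
    return (1 + n_ge) / (nsh + 1)


me_total1 = np.array([[[0.9, 0.1, 0.2, 0.3],
                       [0.2, 0.5, 0.1, 0.05]]])
pre_pvals = empirical_pvals(me_total1)  # shape (n_neurons, n_features) = (1, 2)
```

With only a few shuffles the smallest attainable p-value is 1 / (nsh + 1), one reason Stage 1 results are treated as preliminary.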

validate_against_ground_truth(ground_truth, verbose=True)

Compare INTENSE detections against known ground truth.

Validates the analysis results against a ground truth dictionary, typically generated by generate_tuned_selectivity_exp(). Computes sensitivity, precision, F1 score, and per-type detection rates.

Parameters:
  • ground_truth (dict) –

    Ground truth from generate_tuned_selectivity_exp(). Must contain “expected_pairs” and may contain “neuron_types”:

    • “expected_pairs”: list of (neuron_idx, feature_name) tuples

    • “neuron_types”: dict mapping neuron_idx to group name (optional)

  • verbose (bool, optional) – Print detailed results. Default: True.

Returns:

metrics – Validation metrics containing:

  • “true_positives” : int – number of correctly detected pairs

  • “false_positives” : int – number of spurious detections

  • “false_negatives” : int – number of missed pairs

  • “sensitivity” : float – TP / (TP + FN)

  • “precision” : float – TP / (TP + FP)

  • “f1” : float – harmonic mean of sensitivity and precision

  • “type_stats” : dict – per-neuron-type statistics

  • “tp_pairs” : set – true positive (neuron, feature) pairs

  • “fp_pairs” : set – false positive pairs

  • “fn_pairs” : set – false negative pairs

Return type:

dict

Notes

This method requires that the IntenseResults object has a ‘significance’ attribute populated with neuron-feature significance results.
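The metrics listed above follow directly from set operations on (neuron, feature) pairs. detection_metrics below is a hypothetical helper, not the method itself; returning 0.0 when a denominator is zero is an assumption, and the per-type statistics are omitted.

```python
def detection_metrics(detected_pairs, expected_pairs):
    """Sensitivity, precision and F1 from sets of (neuron, feature) pairs."""
    detected, expected = set(detected_pairs), set(expected_pairs)
    tp_pairs = detected & expected
    fp_pairs = detected - expected
    fn_pairs = expected - detected
    tp, fp, fn = len(tp_pairs), len(fp_pairs), len(fn_pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * sensitivity * precision / (sensitivity + precision)
          if sensitivity + precision else 0.0)
    return {"true_positives": tp, "false_positives": fp, "false_negatives": fn,
            "sensitivity": sensitivity, "precision": precision, "f1": f1,
            "tp_pairs": tp_pairs, "fp_pairs": fp_pairs, "fn_pairs": fn_pairs}


expected = [(0, "hd"), (1, "x")]
detected = [(0, "hd"), (2, "y")]  # one hit, one spurious, one missed
metrics = detection_metrics(detected, expected)
```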

Examples

>>> # After running INTENSE analysis
>>> from driada.intense import IntenseResults
>>> results = IntenseResults()
>>> # ... populate with analysis results ...
>>> ground_truth = {"expected_pairs": [(0, "hd"), (1, "x")],
...                 "neuron_types": {0: "hd_cell", 1: "place_cell"}}
>>> # metrics = results.validate_against_ground_truth(ground_truth)

See also

generate_tuned_selectivity_exp

Generates experiments with ground truth

memory_usage()

Return memory usage breakdown in bytes.

Analyzes the memory consumption of all stored data in the IntenseResults object, providing a detailed breakdown by attribute. Useful for diagnosing memory issues and verifying that memory optimizations (like store_random_shifts=False) are working as expected.

Returns:

usage – Dictionary mapping attribute names to their memory usage in bytes. Keys include:

  • “info.{key}”: memory for each numpy array or DataFrame in the info dict (for DataFrames, the sum over all columns)

  • “stats”: approximate memory for the stats dict

  • “significance”: approximate memory for the significance dict

Return type:

dict

Notes

  • For numpy arrays, uses the .nbytes attribute for accurate measurement

  • For pandas DataFrames, uses memory_usage(deep=True) for accurate measurement

  • For other objects, uses sys.getsizeof() which may underestimate nested structures

  • The “random_shifts1” and “random_shifts2” arrays are the largest consumers when store_random_shifts=True
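Because me_total arrays scale as n1 × n2 × (nsh + 1), their size is easy to estimate before a run. This sketch assumes float64 metric values (consistent with the 40400-byte example that follows) and, as an unverified assumption, int64 shift indices. estimate_scan_memory is a hypothetical helper.

```python
import numpy as np


def estimate_scan_memory(n1, n2, nsh, value_dtype=np.float64, shift_dtype=np.int64):
    """Bytes for an (n1, n2, nsh + 1) metric array and an (n1, n2, nsh) shift array."""
    value_size = np.dtype(value_dtype).itemsize
    shift_size = np.dtype(shift_dtype).itemsize
    return {
        "me_total": n1 * n2 * (nsh + 1) * value_size,
        "random_shifts": n1 * n2 * nsh * shift_size,
    }


small = estimate_scan_memory(10, 5, 100)       # matches np.zeros((10, 5, 101)).nbytes
large = estimate_scan_memory(500, 20, 10000)   # a Stage 2-sized scan
```

For 500 neurons, 20 features and 10000 shuffles, the metric array alone comes to 800,080,000 bytes under these assumptions, with a similar amount for the shift indices if they are kept.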

Examples

>>> from driada.intense import IntenseResults
>>> import numpy as np
>>> results = IntenseResults()
>>> results.update('info', {
...     'me_total1': np.zeros((10, 5, 101)),
...     'me_total2': np.zeros((10, 5, 10001))
... })
>>> usage = results.memory_usage()
>>> 'info.me_total1' in usage
True
>>> usage['info.me_total1']
40400

Function Groups

Mutual Information Computation
driada.intense.intense_base.get_calcium_feature_me_profile(exp, cell_id=None, feat_id=None, cbunch=None, fbunch=None, shift_window=2, ds=1, metric='mi', mi_estimator='gcmi', data_type='calcium')[source]

Compute metric profile between neurons and behavioral features across time shifts.

Parameters:
  • exp (Experiment) – Experiment object containing neurons and behavioral features.

  • cell_id (int, optional) – Index of a single neuron in exp.neurons. Deprecated - use cbunch instead.

  • feat_id (str or tuple of str, optional) – Single feature name(s) to analyze. Deprecated - use fbunch instead.

  • cbunch (int, iterable or None, optional) – Neuron indices. If None (default), all neurons will be analyzed. Takes precedence over cell_id if both provided.

  • fbunch (str, iterable or None, optional) – Feature names. If None (default), all single features will be analyzed. Takes precedence over feat_id if both provided.

  • shift_window (int, optional) – Maximum shift to test in each direction (seconds). Default: 2. Converted to frames internally using exp.fps.

  • ds (int, optional) – Downsampling factor. Default: 1 (no downsampling).

  • metric (str, optional) –

    Similarity metric to compute. Default: ‘mi’. Options:

    • ‘mi’: Mutual information

    • ‘spearman’: Spearman correlation

    • Other metrics supported by the get_sim function

  • mi_estimator (str, optional) – Mutual information estimator to use when metric=’mi’. Default: ‘gcmi’. Options: ‘gcmi’ or ‘ksg’

  • data_type (str, optional) –

    Type of neural data to use. Default: ‘calcium’. Options:

    • ‘calcium’: Use calcium imaging data

    • ‘spikes’: Use spike data

Returns:

If single cell_id and feat_id provided (backward compatibility):

{‘me0’: float, ‘shifted_me’: list of float}

If cbunch or fbunch used:

Nested dictionary with structure {cell_id: {feat_id: {‘me0’: float, ‘shifted_me’: list}}}, where shifted_me contains metric values ordered from -shift_window to +shift_window.

Return type:

dict

Notes

  • shift_window is in seconds, converted to frames using exp.fps

  • Total number of shifts tested: 2 * shift_window * fps / ds

  • Multi-feature analysis (tuple feat_id) only supported for metric=’mi’

  • Progress bar shows computation progress
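The sketch below illustrates the shift bookkeeping described above for one pair of 1-D signals. It converts shift_window seconds to frames with fps, divides by ds, and evaluates the metric at every shift in that range. It is a simplified stand-in, not driada's implementation, and it makes four assumptions:

  • Pearson correlation (np.corrcoef) replaces MI.

  • Shifts are circular (np.roll).

  • The zero shift is included, so it returns 2n + 1 values.

  • The sign convention for shifts is arbitrary.

The name `shift_profile` is hypothetical.

```python
import numpy as np


def shift_profile(x, y, fps, shift_window=2, ds=1):
    """Hypothetical sketch: metric between x and circularly shifted y over a shift window.

    shift_window is in seconds; it is converted to frames with fps and then to
    downsampled frames with ds. Pearson correlation stands in for MI here.
    """
    x = np.asarray(x, dtype=float)[::ds]
    y = np.asarray(y, dtype=float)[::ds]
    n = int(shift_window * fps / ds)             # max shift per side, in downsampled frames
    me0 = np.corrcoef(x, y)[0, 1]                # value at zero shift
    shifted = [np.corrcoef(x, np.roll(y, s))[0, 1] for s in range(-n, n + 1)]
    return {"me0": me0, "shifted_me": shifted}  # shifted_me runs from -n to +n


rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.roll(x, 4) + 0.5 * rng.standard_normal(2000)  # y lags x by 4 frames
prof = shift_profile(x, y, fps=20, shift_window=2, ds=1)
print(len(prof["shifted_me"]))  # 81 values: shifts -40..40
```

In this toy case the profile peaks at shift -4, the shift that undoes the built-in lag.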

Examples

This function requires an Experiment object, which contains neural recordings and behavioral features. Here’s a conceptual example:

>>> # Pseudo-code example (requires an actual Experiment object):
>>> # exp = load_experiment()  # Load your experiment data
>>> #
>>> # # Analyze the metric profile between neuron 0 and the speed feature
>>> # res = get_calcium_feature_me_profile(exp, cbunch=0, fbunch='speed',
>>> #                                      shift_window=2, ds=5)
>>> # me0 = res[0]['speed']['me0']              # metric at zero shift
>>> # profile = res[0]['speed']['shifted_me']   # metric at each shift
>>> #
>>> # # Or analyze multiple neurons and features at once
>>> # results = get_calcium_feature_me_profile(exp, cbunch=[0, 1],
>>> #                                          fbunch=['speed', 'direction'],
>>> #                                          shift_window=1, ds=2)
>>> # # Access results: results[neuron_id][feature_name]['me0']
>>> pass  # Actual usage requires Experiment object
driada.intense.intense_base.scan_pairs(ts_bunch1, ts_bunch2, metric, nsh, optimal_delays, random_shifts=None, mi_estimator='gcmi', ds=1, mask=None, noise_const=0.001, seed=None, enable_progressbar=True, engine='auto', fft_cache=None, mi_estimator_kwargs=None)[source]

Calculate similarity metric and shuffled distributions for pairs of time series.

This function computes the similarity metric between all pairs from ts_bunch1 and ts_bunch2, along with shuffled distributions for significance testing.

Parameters:
  • ts_bunch1 (list of TimeSeries or MultiTimeSeries) – First set of time series (typically neural signals).

  • ts_bunch2 (list of TimeSeries or MultiTimeSeries) – Second set of time series (typically behavioral variables).

  • metric (str) – Similarity metric to compute. See validate_metric for supported options.

  • nsh (int) – Number of shuffles for significance testing.

  • optimal_delays (np.ndarray) – Optimal delays array of shape (len(ts_bunch1), len(ts_bunch2)). Contains best shifts in frames.

  • random_shifts (np.ndarray, optional) – Pre-generated random shifts of shape (len(ts_bunch1), len(ts_bunch2), nsh). If None, shifts will be generated using seed and stable keys.

  • mi_estimator (str, default='gcmi') – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ (Gaussian copula) or ‘ksg’ (k-nearest neighbors).

  • ds (int, default=1) – Downsampling factor. Every ds-th point is used from the time series.

  • mask (np.ndarray, optional) – Binary mask array of shape (len(ts_bunch1), len(ts_bunch2)). 0 skips calculation, 1 proceeds.

  • noise_const (float, default=1e-3) – Small noise amplitude added to improve numerical stability.

  • seed (int, optional) – Random seed for reproducibility.

  • enable_progressbar (bool, default=True) – Whether to show progress bar during computation.

  • engine ({'auto', 'fft', 'loop'}, default='auto') –

    Computation engine for MI shuffles:

    • ‘auto’: Use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ‘fft’: Force FFT (raises an error if not applicable)

    • ‘loop’: Force per-shift loop (original behavior)

  • fft_cache (dict, optional) – Pre-computed FFT cache from _build_fft_cache. Keys are (key1, key2) tuples using stable identifiers from _get_ts_key(). If provided, avoids redundant data extraction.

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

Return type:

tuple[ndarray, ndarray]

Returns:

  • random_shifts (np.ndarray) – Array of shape (len(ts_bunch1), len(ts_bunch2), nsh) containing random shifts used for shuffled distribution computation.

  • me_total (np.ndarray) – Array of shape (len(ts_bunch1), len(ts_bunch2), nsh+1). Contains true metric values at index 0 and shuffled values at indices 1:nsh+1.

Notes

  • True metric values: me_total[:,:,0]

  • Shuffled values: me_total[:,:,1:]

  • Random shifts are drawn uniformly from the range of possible offsets, up to the time series length

  • Noise is added as: value * (1 + noise_const * U(-1,1))

  • FFT optimization provides ~100x speedup for univariate continuous GCMI
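The layout of me_total and the noise rule in these Notes can be illustrated for a single pair. The sketch below is not driada's code. It uses a minimal univariate Gaussian-copula MI (rank transform to normal scores, then Gaussian MI in bits, without bias correction) and draws circular shifts uniformly over the full series length. The following are simplifying assumptions:

  • optimal_delays, ds and mask are ignored.

  • The jitter is applied to every entry, including the true value.

The names copnorm, gcmi_1d and shuffle_test are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def copnorm(x):
    """Rank-based Gaussian copula transform of a 1-D array (ties broken by position)."""
    ranks = np.argsort(np.argsort(x)) + 1              # ranks 1..n
    u = ranks / (len(x) + 1)                           # strictly inside (0, 1)
    return np.array([_STD_NORMAL.inv_cdf(float(v)) for v in u])


def gcmi_1d(x, y):
    """Univariate Gaussian-copula MI in bits (no bias correction)."""
    r = np.corrcoef(copnorm(x), copnorm(y))[0, 1]
    return -0.5 * np.log2(1.0 - min(r * r, 1.0 - 1e-12))


def shuffle_test(x, y, nsh, noise_const=1e-3, seed=None):
    """Return (random_shifts, me_total): me_total[0] is the true value, me_total[1:] the shuffles."""
    rng = np.random.default_rng(seed)
    shifts = rng.integers(0, len(y), size=nsh)         # uniform over the series length
    me_total = np.empty(nsh + 1)
    me_total[0] = gcmi_1d(x, y)
    for k, s in enumerate(shifts, start=1):
        me_total[k] = gcmi_1d(x, np.roll(y, s))        # circular shift breaks the pairing
    # Multiplicative jitter value * (1 + noise_const * U(-1, 1)), applied to every entry here
    me_total *= 1 + noise_const * rng.uniform(-1, 1, size=nsh + 1)
    return shifts, me_total


rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = x + 0.5 * rng.standard_normal(500)                 # strongly dependent pair
shifts, me_total = shuffle_test(x, y, nsh=50, seed=7)
```

Circular shifting keeps each signal's marginal distribution and autocorrelation intact while destroying their alignment. That is why the shuffled values form a null distribution for the true value in me_total[0].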

driada.intense.intense_base.scan_pairs_parallel(ts_bunch1, ts_bunch2, metric, nsh, optimal_delays, mi_estimator='gcmi', ds=1, mask=None, noise_const=0.001, seed=None, n_jobs=-1, engine='auto', fft_cache=None, mi_estimator_kwargs=None)[source]

Calculate metric values and shuffles for time series pairs using parallel processing.

Parameters:
  • ts_bunch1 (list of TimeSeries) – First set of time series.

  • ts_bunch2 (list of TimeSeries) – Second set of time series.

  • metric (str) –

    Similarity metric to compute:

    • ‘mi’: Mutual information

    • ‘spearman’: Spearman correlation

    • Other metrics supported by the get_sim function

  • nsh (int) – Number of shuffles to perform.

  • optimal_delays (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2))) – Pre-computed optimal delays for each pair.

  • mi_estimator (str, default='gcmi') – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ (Gaussian copula) or ‘ksg’ (k-nearest neighbors).

  • ds (int, default=1) – Downsampling factor.

  • mask (np.ndarray, optional) – Binary mask of shape (len(ts_bunch1), len(ts_bunch2)). 0 = skip computation, 1 = compute. Default: all ones.

  • noise_const (float, default=1e-3) – Small noise added to improve numerical stability.

  • seed (int, optional) – Random seed for reproducibility.

  • n_jobs (int, default=-1) – Number of parallel jobs. -1 uses all cores.

  • engine ({'auto', 'fft', 'loop'}, default='auto') –

    Computation engine for MI shuffles:

    • ‘auto’: Use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ‘fft’: Force FFT (raises an error if not applicable)

    • ‘loop’: Force per-shift loop (original behavior)

  • fft_cache (dict, optional) – Pre-computed FFT cache mapping (key1, key2) tuples to FFTCacheEntry objects. Keys are stable identifiers from _get_ts_key(). If provided, avoids redundant data extraction. If None, FFT type is computed fresh for each pair.

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

Return type:

tuple[ndarray, ndarray]

Returns:

  • random_shifts (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2), nsh)) – Random shifts used for shuffling.

  • me_total (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2), nsh+1)) – Metric values. [:,:,0] contains true values, [:,:,1:] contains shuffles.

Raises:

ValueError – If input validation fails or parameters are invalid.

Notes

  • Parallelization is done by splitting ts_bunch1 across workers

  • Each worker handles a subset of ts_bunch1 against all of ts_bunch2

  • Uses threading backend if PyTorch present (checked lazily), else loky

  • Random seeding ensures reproducibility across different mask configurations

  • FFT optimization provides ~100x speedup for univariate continuous GCMI
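These Notes rely on two ideas that work together. First, rows of ts_bunch1 are split across workers. Second, each pair's random stream must not depend on how the work is split or which pairs are masked. One way to get that property is to seed a generator from (seed, i, j), sketched below. It assumes a pair can be identified by its indices; driada itself uses stable keys, whose exact form is not shown here. A thread pool from concurrent.futures stands in for joblib, and all function names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def pair_shifts(seed, i, j, nsh, length):
    """Hypothetical per-pair RNG: the draws depend only on (seed, i, j)."""
    rng = np.random.default_rng([seed, int(i), int(j)])
    return rng.integers(0, length, size=nsh)


def _rows_worker(rows, n2, seed, nsh, length):
    # One worker handles a subset of ts_bunch1 rows against all of ts_bunch2
    return {(int(i), j): pair_shifts(seed, i, j, nsh, length)
            for i in rows for j in range(n2)}


def parallel_shifts(n1, n2, seed, nsh, length, n_jobs=2):
    chunks = np.array_split(np.arange(n1), n_jobs)
    out = np.zeros((n1, n2, nsh), dtype=np.int64)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        parts = pool.map(lambda rows: _rows_worker(rows, n2, seed, nsh, length), chunks)
        for part in parts:
            for (i, j), shifts in part.items():
                out[i, j] = shifts
    return out


a = parallel_shifts(4, 3, seed=42, nsh=5, length=100, n_jobs=1)
b = parallel_shifts(4, 3, seed=42, nsh=5, length=100, n_jobs=3)
print(np.array_equal(a, b))  # True: the split across workers does not change the draws
```

Because each pair owns its generator, skipping a pair through a mask leaves every other pair's shifts unchanged. A single shared generator consumed in loop order would not have that property.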

See also

scan_pairs

Sequential version of this function

scan_pairs_router

Wrapper that chooses between parallel and sequential

Examples

>>> # Minimal example with 2x2 pairs
>>> import numpy as np
>>> from driada.information.info_base import TimeSeries
>>> from driada.intense.intense_base import scan_pairs_parallel
>>> np.random.seed(42)  # For reproducibility
>>> # Small data: 2 neurons, 2 behaviors, 50 timepoints
>>> neurons = [TimeSeries(np.random.randn(50), discrete=False) for _ in range(2)]
>>> behaviors = [TimeSeries(np.random.randn(50), discrete=False) for _ in range(2)]
>>> delays = np.zeros((2, 2), dtype=int)  # No delays
>>> # Just 5 shuffles for demonstration
>>> shifts, metrics = scan_pairs_parallel(neurons, behaviors, 'mi',
...                                      5, delays, n_jobs=1, seed=42)
>>> shifts.shape
(2, 2, 5)
>>> metrics.shape  # Original + 5 shuffles = 6 total
(2, 2, 6)
driada.intense.intense_base.scan_pairs_router(ts_bunch1, ts_bunch2, metric, nsh, optimal_delays, mi_estimator='gcmi', ds=1, mask=None, noise_const=0.001, seed=None, enable_parallelization=True, n_jobs=-1, engine='auto', fft_cache=None, mi_estimator_kwargs=None)[source]

Route metric computation to parallel or sequential implementation.

Parameters:
  • ts_bunch1 (list of TimeSeries) – First set of time series.

  • ts_bunch2 (list of TimeSeries) – Second set of time series.

  • metric (str) –

    Similarity metric to compute:

    • ‘mi’: Mutual information

    • ‘spearman’: Spearman correlation

    • Other metrics supported by the get_sim function

  • nsh (int) – Number of shuffles to perform.

  • optimal_delays (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2))) – Pre-computed optimal delays for each pair.

  • mi_estimator (str, default='gcmi') – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ (Gaussian copula) or ‘ksg’ (k-nearest neighbors).

  • ds (int, default=1) – Downsampling factor.

  • mask (np.ndarray, optional) – Binary mask of shape (len(ts_bunch1), len(ts_bunch2)). 0 = skip computation, 1 = compute. Default: all ones.

  • noise_const (float, default=1e-3) – Small noise added to improve numerical stability.

  • seed (int, optional) – Random seed for reproducibility.

  • enable_parallelization (bool, default=True) – Whether to use parallel processing.

  • n_jobs (int, default=-1) – Number of parallel jobs if parallelization enabled. -1 uses all cores.

  • engine ({'auto', 'fft', 'loop'}, default='auto') –

    Computation engine for MI shuffles:

    • ‘auto’: Use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ‘fft’: Force FFT (raises an error if not applicable)

    • ‘loop’: Force per-shift loop (original behavior)

  • fft_cache (dict, optional) – Pre-computed FFT cache mapping (key1, key2) tuples to FFTCacheEntry objects. Keys are stable identifiers from _get_ts_key(). If provided, avoids redundant data extraction. If None, FFT type is computed fresh for each pair. Use _build_fft_cache() to create this cache.

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

Return type:

tuple[ndarray, ndarray]

Returns:

  • random_shifts (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2), nsh)) – Random shifts used for shuffling.

  • me_total (np.ndarray of shape (len(ts_bunch1), len(ts_bunch2), nsh+1)) – Metric values. [:,:,0] contains true values, [:,:,1:] contains shuffles.

Notes

This function automatically chooses between sequential and parallel implementations based on the enable_parallelization flag. It’s the recommended entry point for scan_pairs functionality.

FFT optimization provides ~100x speedup for univariate continuous GCMI.
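One common way an FFT engine can evaluate every circular shift at once is sketched below. It illustrates the general technique and is not a description of driada's code. For standardized (for example, copula-normalized) signals, the Pearson correlation with each circular shift of y is a circular cross-correlation. A single forward/inverse FFT pair computes it for all n shifts in O(n log n), instead of O(n) work per shift. For univariate Gaussian variables, MI in bits is then -0.5·log2(1 - r²). The function names are hypothetical.

```python
import numpy as np


def circular_corr_fft(x, y):
    """Pearson r between x and np.roll(y, s) for every s in range(len(x))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    # Circular cross-correlation: c[s] = sum_t xs[t] * ys[(t - s) % n]
    c = np.fft.irfft(np.fft.rfft(xs) * np.conj(np.fft.rfft(ys)), n=n)
    return c / n


def gaussian_mi_bits(r):
    """MI of a bivariate Gaussian with correlation r, in bits."""
    r2 = np.clip(np.asarray(r) ** 2, 0.0, 1.0 - 1e-12)
    return -0.5 * np.log2(1.0 - r2)


rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = np.roll(x, 10) + rng.standard_normal(256)
r_fft = circular_corr_fft(x, y)
r_loop = np.array([np.corrcoef(x, np.roll(y, s))[0, 1] for s in range(x.size)])
print(np.allclose(r_fft, r_loop))  # True
mi_all_shifts = gaussian_mi_bits(r_fft)
```

The per-shift loop in r_loop is the "loop" baseline. Both produce the same correlations, which is why an FFT engine can stand in for it in the univariate continuous Gaussian case.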

See also

scan_pairs

Sequential implementation

scan_pairs_parallel

Parallel implementation

Examples

>>> # Router example - chooses sequential or parallel execution
>>> import numpy as np
>>> from driada.information.info_base import TimeSeries
>>> from driada.intense.intense_base import scan_pairs_router
>>> np.random.seed(42)
>>> # Minimal data for fast execution
>>> neurons = [TimeSeries(np.random.randn(30), discrete=False) for _ in range(2)]
>>> behaviors = [TimeSeries(np.random.randn(30), discrete=False) for _ in range(2)]
>>> delays = np.zeros((2, 2), dtype=int)  # No delays
>>> # Use sequential mode (enable_parallelization=False)
>>> shifts, metrics = scan_pairs_router(neurons, behaviors, 'mi',
...                                    3, delays, enable_parallelization=False, seed=42)
>>> metrics.shape  # 1 original + 3 shuffles = 4 total
(2, 2, 4)
>>> # First slice contains actual MI values
>>> metrics[:, :, 0].shape
(2, 2)
Statistical Analysis
driada.intense.intense_base.compute_me_stats(ts_bunch1, ts_bunch2, names1=None, names2=None, mode='two_stage', metric='mi', mi_estimator='gcmi', mi_estimator_kwargs=None, precomputed_mask_stage1=None, precomputed_mask_stage2=None, n_shuffles_stage1=100, n_shuffles_stage2=10000, metric_distr_type='gamma_zi', noise_ampl=0.001, ds=1, topk1=1, topk2=5, multicomp_correction='holm', pval_thr=0.01, find_optimal_delays=False, skip_delays=[], shift_window=100, verbose=True, seed=None, enable_parallelization=True, n_jobs=-1, duplicate_behavior='ignore', engine='auto', store_random_shifts=False, profile=False)[source]

Calculates similarity metric statistics for TimeSeries or MultiTimeSeries pairs.

Parameters:
  • ts_bunch1 (list of TimeSeries objects) – First set of time series

  • ts_bunch2 (list of TimeSeries objects) – Second set of time series

  • names1 (list of str, optional) – Names given to the time series from ts_bunch1 in the final results

  • names2 (list of str, optional) – Names given to the time series from ts_bunch2 in the final results

  • mode (str, default='two_stage') –

    Computation mode. Options:

    • 'stage1': preliminary scanning with n_shuffles_stage1 shuffles only. Rejects strictly non-significant pairs, does not give definite results about significance of the others.

    • 'stage2': skip stage 1, perform full-scale scanning (n_shuffles_stage2 shuffles) of all pairs. Gives definite results but can be very time-consuming. Also reduces statistical power of multiple comparison tests since the number of hypotheses is very high.

    • 'two_stage': prune non-significant pairs during stage 1 then perform thorough testing for the rest during stage 2. Recommended.

  • metric (str, default='mi') – similarity metric between TimeSeries

  • mi_estimator (str, default='gcmi') – Mutual information estimator to use when metric=’mi’. Options: ‘gcmi’ or ‘ksg’

  • mi_estimator_kwargs (dict, optional) – Additional keyword arguments passed to the MI estimator function.

  • precomputed_mask_stage1 (np.ndarray, optional) – Precomputed mask for skipping some of the possible pairs in stage 1. Shape: (len(ts_bunch1), len(ts_bunch2)). A value of 0 skips the pair; 1 means the calculation will proceed.

  • precomputed_mask_stage2 (np.ndarray, optional) – Precomputed mask for skipping some of the possible pairs in stage 2. Shape: (len(ts_bunch1), len(ts_bunch2)). A value of 0 skips the pair; 1 means the calculation will proceed.

  • n_shuffles_stage1 (int, default=100) – number of shuffles for first stage

  • n_shuffles_stage2 (int, default=10000) – number of shuffles for second stage

  • metric_distr_type (str, default="gamma_zi") –

    Distribution type for shuffled metric null distribution. Options:

    • ’gamma_zi’ (default): Zero-inflated gamma distribution. Explicitly models the probability mass at zero that commonly occurs in MI null distributions. Provides superior goodness-of-fit and accurate parameter estimation without requiring artificial noise.

    • ’gamma’: Standard gamma distribution with small noise added (noise_ampl) to handle zeros. Provided for backward compatibility. Less statistically principled than ‘gamma_zi’.

    • Other scipy.stats distributions: ‘lognorm’, ‘norm’, etc. are supported but not recommended for MI distributions.

  • noise_ampl (float, default=1e-3) – Small noise amplitude added to metric values to improve the numerical fit of the null distribution

  • ds (int, default=1) – Downsampling factor. Every ds-th point of each time series is used.

  • topk1 (int, default=1) – In stage 1, the true MI should rank within the top topk1 values relative to its shuffled MI values

  • topk2 (int, default=5) – In stage 2, the true MI should rank within the top topk2 values relative to its shuffled MI values

  • multicomp_correction (str or None, default='holm') – type of multiple comparisons correction. Supported types are None (no correction), “bonferroni”, “holm”, and “fdr_bh”.

  • pval_thr (float, default=0.01) – P-value threshold. If multicomp_correction=None, this is the threshold for a single pair. For FWER methods (bonferroni, holm), this is the family-wise error rate. For FDR methods (fdr_bh), this is the false discovery rate.

  • find_optimal_delays (bool, default=False) – If True, searches shifts of at most ±shift_window frames between the time series of each pair and uses the shift with the highest MI.

  • skip_delays (list, default=[]) – List of indices from ts_bunch2 for which delays are not applied (set to 0). Has no effect if find_optimal_delays = False

  • shift_window (int, default=100) – Window for optimal shift search (frames). Optimal shift will lie in the range -shift_window <= opt_shift <= shift_window

  • verbose (bool, default=True) – whether to print intermediate information

  • seed (int, optional) – random seed for reproducibility

  • enable_parallelization (bool, default=True) – whether to use parallel processing for computations

  • n_jobs (int, default=-1) – number of parallel jobs to use. -1 means use all available processors

  • duplicate_behavior (str, default='ignore') –

    How to handle duplicate TimeSeries in ts_bunch1 or ts_bunch2:

    • ‘ignore’: Process duplicates normally (default)

    • ‘raise’: Raise an error if duplicates are found

    • ‘warn’: Print a warning but continue processing

  • engine ({'auto', 'fft', 'loop'}, default='auto') –

    Computation engine for MI shuffles:

    • ‘auto’: Use FFT when applicable (univariate continuous GCMI with nsh >= 50)

    • ‘fft’: Force FFT (raises an error if not applicable)

    • ‘loop’: Force per-shift loop (original behavior)

    FFT optimization provides ~100x speedup for Stage 2.

  • store_random_shifts (bool, default=False) – Whether to store the random shift indices used during shuffle computation. When False (default), random_shifts1 and random_shifts2 arrays are not stored in accumulated_info, saving significant memory (e.g., ~400MB for typical datasets). Set to True if you need the shift indices for debugging or reproducibility analysis.

  • profile (bool, default=False) –

    Whether to collect internal timing information. When True, accumulated_info includes a ‘timings’ dict with execution times (in seconds) for:

    • ‘stage1_delay_optimization’: delay optimization (if find_optimal_delays=True)

    • ‘stage1_pair_scanning’: stage 1 pair scanning

    • ‘stage2_pair_scanning’: stage 2 pair scanning (if applicable)

    • ‘total’: sum of all timing sections
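For the FWER options of multicomp_correction, pval_thr plays the role of alpha. The sketch below shows the textbook Bonferroni and Holm step-down rules in NumPy. The function names are hypothetical, the code is not driada's, and fdr_bh is not covered. The example shows why Holm never rejects fewer hypotheses than Bonferroni at the same alpha.

```python
import numpy as np


def bonferroni_reject(pvals, alpha):
    """Reject H_i when p_i <= alpha / m."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / max(p.size, 1)


def holm_reject(pvals, alpha):
    """Holm step-down procedure controlling the family-wise error rate at alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                        # smallest p-value first
    thresholds = alpha / (m - np.arange(m))      # alpha/m, alpha/(m-1), ..., alpha/1
    passed = p[order] <= thresholds
    # Stop at the first p-value that fails its threshold
    k = m if passed.all() else int(np.argmin(passed))
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject


pvals = [0.01, 0.02, 0.03, 0.005]
print(bonferroni_reject(pvals, 0.05).tolist())  # [True, False, False, True]
print(holm_reject(pvals, 0.05).tolist())        # [True, True, True, True]
```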

Returns:

  • stats (dict of dict of dicts) – Outer dict keys: indices of ts_bunch1, or names1 if given. Inner dict keys: indices of ts_bunch2, or names2 if given. Last dict: dictionary of stats variables. Can be converted to a pandas DataFrame with pd.DataFrame(stats).

  • significance (dict of dict of dicts) – Outer dict keys: indices of ts_bunch1, or names1 if given. Inner dict keys: indices of ts_bunch2, or names2 if given. Last dict: dictionary of significance-related variables. Can be converted to a pandas DataFrame with pd.DataFrame(significance).

  • accumulated_info (dict) – Data collected during computation.

Raises:

ValueError –

  • If mode is not ‘stage1’, ‘stage2’, or ‘two_stage’.

  • If multicomp_correction is not None, ‘bonferroni’, ‘holm’, or ‘fdr_bh’.

  • If pval_thr is not between 0 and 1.

  • If duplicate_behavior is not ‘ignore’, ‘raise’, or ‘warn’.

  • If duplicate TimeSeries are found and duplicate_behavior=’raise’.

Notes

  • When comparing the same bunch (ts_bunch1 is ts_bunch2), the diagonal of masks is automatically set to 0 to avoid self-comparisons.

  • In ‘stage2’ mode, dummy stage1 structures are created with placeholder values to maintain consistency in the return format.

  • For stage2, the final mask combines stage1 results with precomputed_mask_stage2 using logical AND.

  • Input masks are never modified; copies are created when needed.
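The stage-1 pruning and mask rules described here can be sketched on an already computed me_total array of shape (n1, n2, nsh+1), laid out as scan_pairs returns it. The sketch makes two assumptions. A pair passes stage 1 when its true value ranks within the top topk1 of the true and shuffled values. Ties count in favor of the true value. The function names are hypothetical and not part of driada.

```python
import numpy as np


def stage1_pass(me_total, topk):
    """True where me_total[..., 0] ranks within the top `topk` of true + shuffled values."""
    true_vals = me_total[..., 0]
    n_greater = (me_total[..., 1:] > true_vals[..., None]).sum(axis=-1)
    return n_greater + 1 <= topk                 # rank 1 = no shuffle exceeds the true value


def stage2_mask(stage1_ok, precomputed_mask_stage2=None, same_bunch=False):
    """Combine stage-1 results with an optional precomputed mask; inputs are not modified."""
    mask = stage1_ok.astype(int)                 # astype returns a new array
    if precomputed_mask_stage2 is not None:
        mask = np.logical_and(mask, precomputed_mask_stage2).astype(int)
    if same_bunch:
        np.fill_diagonal(mask, 0)                # no self-comparisons
    return mask


# Toy me_total for a 2 x 2 grid with 3 shuffles: [true, sh1, sh2, sh3]
me_total = np.array([[[5.0, 1, 2, 3], [1.0, 2, 3, 4]],
                     [[3.0, 3, 1, 0], [9.0, 1, 1, 1]]])
ok = stage1_pass(me_total, topk=1)
print(ok.tolist())                                 # [[True, False], [True, True]]
print(stage2_mask(ok, same_bunch=True).tolist())   # [[0, 0], [1, 0]]
```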

I/O
driada.intense.io.save_results(results, fname, compressed=False)[source]

Save IntenseResults to NPZ file.

Arrays are stored as NPZ arrays and dicts are embedded as JSON strings. Uncompressed files are fastest to write; compression makes files roughly 10x smaller.

Parameters:
  • results (IntenseResults) – Results object from INTENSE analysis.

  • fname (str or Path) – Output file path (should end with .npz).

  • compressed (bool, default False) – If True, use zlib compression (slower but 10x smaller files).
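The storage scheme described above (arrays as NPZ entries, dicts as JSON strings) can be reproduced with plain NumPy and json. This is a minimal sketch of the idea, not the file layout driada actually writes. The "__json__" key prefix and the function names are hypothetical. One JSON caveat worth knowing: integer dict keys come back as strings.

```python
import io
import json
import numpy as np

JSON_PREFIX = "__json__"  # hypothetical marker for JSON-encoded dict entries


def save_npz_with_json(file, arrays, dicts, compressed=False):
    payload = dict(arrays)
    for key, value in dicts.items():
        # A 0-d unicode array can be stored without pickling
        payload[JSON_PREFIX + key] = np.array(json.dumps(value))
    saver = np.savez_compressed if compressed else np.savez
    saver(file, **payload)


def load_npz_with_json(file):
    arrays, dicts = {}, {}
    with np.load(file, allow_pickle=False) as data:
        for key in data.files:
            if key.startswith(JSON_PREFIX):
                dicts[key[len(JSON_PREFIX):]] = json.loads(data[key].item())
            else:
                arrays[key] = data[key]
    return arrays, dicts


buf = io.BytesIO()  # in-memory file; a path ending in .npz works the same way
save_npz_with_json(buf, {"me_total1": np.zeros((2, 2, 3))},
                   {"stats": {"cell1": {"feat1": {"me": 0.5}}}, "by_id": {0: 1}})
buf.seek(0)
arrays, dicts = load_npz_with_json(buf)
print(dicts["stats"]["cell1"]["feat1"]["me"])  # 0.5
print(dicts["by_id"])  # {'0': 1}: JSON turned the int key into a string
```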

Examples

>>> from driada.intense.io import IntenseResults, save_results, load_results
>>> results = IntenseResults()
>>> results.update('stats', {'cell1': {'feat1': {'me': 0.5}}})
>>> save_results(results, 'test.npz')
>>> loaded = load_results('test.npz')
>>> loaded.stats['cell1']['feat1']['me']
0.5
driada.intense.io.load_results(fname)[source]

Load IntenseResults from NPZ file.

Parameters:

fname (str or Path) – Path to the NPZ file.

Returns:

Reconstructed results object.

Return type:

IntenseResults

Examples

>>> from driada.intense.io import IntenseResults, save_results, load_results
>>> results = IntenseResults()
>>> results.update('stats', {'cell1': {'feat1': {'me': 0.5}}})
>>> save_results(results, 'test.npz')
>>> loaded = load_results('test.npz')
>>> loaded.stats['cell1']['feat1']['me']
0.5