INTENSE Statistics

driada.intense.stats.chebyshev_ineq(data, val)

Calculate upper bound on tail probability using Chebyshev’s inequality.

Parameters:
  • data (array-like) – Sample data to estimate mean and std from. Must have non-zero variance.

  • val (float) – Value to compute tail probability for.

Returns:

p_bound – Upper bound on P(X >= val) based on Chebyshev’s inequality.

Return type:

float

Raises:

ValueError – If data has zero variance (all values are identical).

Notes

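Chebyshev’s inequality states that P(abs(X - μ) >= k*σ) <= 1/k² for any k > 0. For val > μ, the event X >= val implies abs(X - μ) >= z*σ, which gives P(X >= val) <= 1/z², where z = (val - μ)/σ. The bound is informative only when z > 1.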
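The bound above can be sketched with the standard library alone. The name `chebyshev_upper_bound`, the use of the population standard deviation, and the clamping of the result to 1.0 are assumptions made for illustration; the library's implementation may differ on each point.

```python
import statistics


def chebyshev_upper_bound(data, val):
    """Chebyshev-style upper bound on P(X >= val); illustrative sketch only."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # assumption: population std (ddof=0)
    if sigma == 0:
        raise ValueError("data has zero variance (all values are identical)")
    z = (val - mu) / sigma
    if z <= 0:
        return 1.0  # assumption: no useful bound at or below the mean
    return min(1.0, 1.0 / z**2)  # assumption: clamp to a valid probability


sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # mean 3, population variance 2
print(chebyshev_upper_bound(sample, 7.0))  # z**2 = 16 / 2 = 8, bound near 1/8
```

Because the inequality makes no distributional assumption, the bound is usually much looser than a p-value from a fitted distribution.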

driada.intense.stats.get_lognormal_p(data, val)

Calculate p-value assuming log-normal distribution.

Parameters:
  • data (array-like) – Sample data to fit log-normal distribution. Must contain positive values.

  • val (float) – Observed value to compute p-value for.

Returns:

p_value – P(X >= val) under fitted log-normal distribution.

Return type:

float

Raises:

ValueError – If data contains non-positive values.

Notes

Fits log-normal distribution with floc=0 (zero lower bound). Log-normal distribution is suitable for positive-valued data.
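With the location fixed at zero, the maximum-likelihood log-normal fit has a closed form: the mean and population standard deviation of log(data). The sketch below uses that fact to compute the tail probability with the standard library only. The name `lognormal_tail_p` is hypothetical, and the library itself may obtain the same parameters through scipy.stats rather than this shortcut.

```python
import math
from statistics import NormalDist, fmean, pstdev


def lognormal_tail_p(data, val):
    """P(X >= val) under a log-normal fit with location fixed at 0 (sketch)."""
    if any(x <= 0 for x in data):
        raise ValueError("data must contain only positive values")
    logs = [math.log(x) for x in data]
    mu = fmean(logs)      # ML estimate of the mean of log(X)
    sigma = pstdev(logs)  # ML estimate of the std of log(X), ddof=0
    if val <= 0:
        return 1.0        # all probability mass lies above zero
    # Survival function via symmetry of the normal: sf(x) = cdf(2*mu - x).
    return NormalDist(mu, sigma).cdf(2 * mu - math.log(val))


data = [math.exp(v) for v in (-1.0, 0.0, 1.0)]  # log-values centred on 0
print(lognormal_tail_p(data, 1.0))   # val at the fitted median
print(lognormal_tail_p(data, 10.0))  # further into the upper tail
```

The sketch assumes the data are not all identical; `NormalDist` rejects a zero standard deviation.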

driada.intense.stats.get_gamma_p(data, val)

Calculate p-value assuming gamma distribution.

Parameters:
  • data (array-like) – Sample data to fit gamma distribution. Must contain positive values.

  • val (float) – Observed value to compute p-value for.

Returns:

p_value – P(X >= val) under fitted gamma distribution.

Return type:

float

Raises:

ValueError – If data contains non-positive values.

Notes

Fits gamma distribution with floc=0 (zero lower bound). Gamma distribution is suitable for positive-valued data.

driada.intense.stats.get_gamma_zi_p(data, val, zero_threshold=1e-10)

Calculate p-value using zero-inflated gamma (ZIG) distribution.

Parameters:
  • data (array-like) – Sample data to fit ZIG distribution (typically shuffled MI values). Can contain zeros and positive values.

  • val (float) – Observed value to compute p-value for.

  • zero_threshold (float, optional) – Values below this threshold are considered zeros. Default: 1e-10.

Returns:

p_value – P(X >= val) under fitted ZIG distribution.

Return type:

float

Notes

Zero-Inflated Gamma model:

  • P(X = 0) = pi (zero inflation parameter)

  • P(X > 0) = 1 - pi, with positive values distributed as Gamma(shape, scale)

P-value calculation:

  • If val <= zero_threshold: p_value = 1.0 (zeros never significant)

  • If val > zero_threshold: p_value = (1-pi) * Gamma.sf(val) (zero mass doesn’t contribute to tail probability for positive values)

Edge cases:

  • All zeros (pi=1): Returns 1.0 for any val

  • No zeros (pi=0): Equivalent to pure gamma distribution

  • Gamma fit fails: Returns conservative 1.0
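The rules above can be sketched as follows. This is a minimal illustration, not the library's code: the name `zig_tail_p`, the requirement of at least two positive samples, and the broad `except` clause are assumptions.

```python
import numpy as np
from scipy import stats


def zig_tail_p(data, val, zero_threshold=1e-10):
    """P(X >= val) under a zero-inflated gamma model (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    if val <= zero_threshold:
        return 1.0  # zeros are never significant
    positive = data[data >= zero_threshold]
    if positive.size < 2:
        return 1.0  # assumption: too few positive samples to fit (covers pi=1)
    pi = 1.0 - positive.size / data.size  # estimated zero-inflation weight
    try:
        # Fit only the positive part, with the location fixed at zero.
        shape, _, scale = stats.gamma.fit(positive, floc=0)
    except Exception:
        return 1.0  # conservative value when the gamma fit fails
    return float((1.0 - pi) * stats.gamma.sf(val, shape, loc=0, scale=scale))


shuffled = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # 40% zeros
print(zig_tail_p(shuffled, 1.0))
print(zig_tail_p(shuffled, 0.0))  # at or below the threshold, so 1.0
```

Scaling the gamma survival function by (1 - pi) keeps the p-value below the fraction of non-zero samples, so heavily zero-inflated shuffle distributions cannot produce a large tail probability for a positive observed value.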

driada.intense.stats.get_distribution_function(dist_name)

Get distribution function from scipy.stats by name.

Parameters:

dist_name (str) – Name of distribution (e.g., ‘gamma’, ‘lognorm’, ‘norm’).

Returns:

dist – Distribution function object.

Return type:

scipy.stats distribution

Raises:

ValueError – If distribution name not found in scipy.stats.
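A name-based lookup of this kind can be sketched with `getattr`. The name `lookup_distribution` and the type check against `rv_continuous` and `rv_discrete` are assumptions; the library may validate names differently.

```python
from scipy import stats


def lookup_distribution(dist_name):
    """Return the scipy.stats distribution object with the given name (sketch)."""
    dist = getattr(stats, dist_name, None)
    # Reject missing names and attributes that are not distributions,
    # such as ordinary functions living in the scipy.stats namespace.
    if not isinstance(dist, (stats.rv_continuous, stats.rv_discrete)):
        raise ValueError(f"Distribution '{dist_name}' not found in scipy.stats")
    return dist


gamma = lookup_distribution("gamma")
print(gamma.sf(2.0, 1.5, loc=0, scale=1.0))  # tail probability for a = 1.5
```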

driada.intense.stats.get_mi_distr_pvalue(data, val, distr_type='gamma')

Calculate p-value by fitting a distribution to data.

Parameters:
  • data (array-like) – Sample data (typically shuffled metric values).

  • val (float) – Observed value to compute p-value for.

  • distr_type (str, optional) – Distribution type to fit. Default: ‘gamma’. Options:

      - ‘gamma’: Gamma distribution (requires positive values)

      - ‘gamma_zi’: Zero-inflated gamma (handles zeros explicitly)

      - ‘lognorm’: Log-normal distribution

      - Any scipy.stats distribution name

Returns:

p_value – P(X >= val) under fitted distribution. Returns 1.0 if distribution fitting fails.

Return type:

float

Notes

  • For ‘gamma_zi’, uses zero-inflated gamma model (handles zeros)

  • For ‘gamma’ and ‘lognorm’, fits with floc=0 (zero lower bound)

  • For other distributions, uses default fitting

  • Returns conservative p-value (1.0) on fitting errors

driada.intense.stats.reconstruct_stage1_pvals(me_total1, metric_distr_type='gamma')

Reconstruct Stage 1 p-values from saved shuffle distributions.

Stage 1 skips p-value computation for performance (pre_pval=None). This function reconstructs them post-hoc from the saved me_total1 array using the same distribution fitting as Stage 2.

Parameters:
  • me_total1 (np.ndarray, shape (n1, n2, nsh1+1)) – Stage 1 MI array. [:,:,0] = true MI, [:,:,1:] = shuffles.

  • metric_distr_type (str, optional) – Distribution type for p-value fitting. Default: ‘gamma’.

Returns:

  • pre_pvals (np.ndarray, shape (n1, n2)) – Reconstructed p-values. NaN where MI data is missing/zero.

  • mi_values (np.ndarray, shape (n1, n2)) – True MI values (me_total1[:,:,0]).
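The array layout and the NaN convention can be illustrated as below. The sketch only shows how the true values and shuffles are sliced and where NaN is left in place. The names `reconstruct_pvals_sketch` and `empirical_tail_p` are hypothetical, and the empirical tail fraction is a stand-in, not the fitted-distribution p-value the library computes.

```python
import numpy as np


def empirical_tail_p(shuffles, val):
    """Hypothetical stand-in for a fitted-distribution p-value."""
    return (np.sum(shuffles >= val) + 1) / (shuffles.size + 1)


def reconstruct_pvals_sketch(me_total1, pval_fn=empirical_tail_p):
    true_mi = me_total1[:, :, 0]    # shape (n1, n2): observed values
    shuffles = me_total1[:, :, 1:]  # shape (n1, n2, nsh1): shuffled values
    pvals = np.full(true_mi.shape, np.nan)
    for i, j in np.ndindex(*true_mi.shape):
        val = true_mi[i, j]
        if np.isnan(val) or val == 0:
            continue  # leave NaN where the MI value is missing or zero
        pvals[i, j] = pval_fn(shuffles[i, j], val)
    return pvals, true_mi


me = np.array([
    [[0.5, 0.1, 0.2, 0.3], [0.0, 0.1, 0.2, 0.3]],
    [[np.nan, 0.1, 0.2, 0.3], [0.2, 0.1, 0.2, 0.3]],
])
pvals, mi = reconstruct_pvals_sketch(me)
print(pvals)
```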

driada.intense.stats.get_mask(ptable, rtable, pval_thr, rank_thr)

Create binary mask based on p-value and rank thresholds.

Parameters:
  • ptable (np.ndarray) – Array of p-values.

  • rtable (np.ndarray) – Array of ranks (0 to 1).

  • pval_thr (float) – P-value threshold. Values <= pval_thr pass.

  • rank_thr (float) – Rank threshold. Values >= rank_thr pass.

Returns:

mask – Binary mask: 1 where both conditions satisfied (p <= pval_thr AND rank >= rank_thr), 0 otherwise.

Return type:

np.ndarray
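The two conditions combine element-wise, which NumPy expresses directly. The example values and the integer dtype of the result are assumptions chosen for illustration.

```python
import numpy as np

ptable = np.array([[0.001, 0.20], [0.04, 0.01]])  # p-values
rtable = np.array([[0.99, 0.99], [0.50, 0.95]])   # ranks in [0, 1]
pval_thr, rank_thr = 0.05, 0.95

# 1 where p <= pval_thr AND rank >= rank_thr, 0 otherwise.
mask = ((ptable <= pval_thr) & (rtable >= rank_thr)).astype(int)
print(mask)
```

Here only the pairs at positions (0, 0) and (1, 1) pass: (0, 1) fails the p-value condition and (1, 0) fails the rank condition.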

driada.intense.stats.stats_not_empty(pair_stats, current_data_hash, stage=1)

Check if statistics are valid and complete for given stage.

Parameters:
  • pair_stats (dict) – Dictionary of computed statistics.

  • current_data_hash (str) – Hash of current data to validate against.

  • stage (int, optional) – Stage to check (1 or 2). Default: 1.

Returns:

is_valid – True if stats are valid and complete, False otherwise.

Return type:

bool

Raises:

ValueError – If stage is not 1 or 2.

driada.intense.stats.criterion1(pair_stats, nsh1, topk=1)

Check if pair passes stage 1 significance criterion.

Parameters:
  • pair_stats (dict) – Dictionary containing ‘pre_rval’ from stage 1 analysis.

  • nsh1 (int) – Number of shuffles for first stage.

  • topk (int, optional) – True MI should rank in top k among shuffles. Default: 1.

Returns:

crit_passed – True if pair’s rank exceeds threshold (1 - topk/(nsh1+1)).

Return type:

bool

Notes

The criterion checks whether pre_rval > (1 - topk/(nsh1+1)). For topk=1 and nsh1=100, this requires pre_rval > 100/101 ≈ 0.9901.
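The threshold arithmetic is easy to check numerically. The helper name `rank_threshold` is hypothetical; only the formula comes from the notes above.

```python
def rank_threshold(n_shuffles, topk=1):
    """Rank that the true value must strictly exceed (illustrative helper)."""
    return 1 - topk / (n_shuffles + 1)


thr = rank_threshold(100, topk=1)  # 1 - 1/101
print(thr)
print(1.0 > thr)   # the top rank passes
print(0.99 > thr)  # a rank of exactly 0.99 does not pass
```

Since the comparison is strict and 100/101 is slightly above 0.99, a rank of exactly 0.99 fails the stage 1 criterion for these settings.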

driada.intense.stats.criterion2(pair_stats, nsh2, pval_thr, topk=5)

Check if pair passes stage 2 significance criterion.

Parameters:
  • pair_stats (dict) – Dictionary containing ‘rval’ and ‘pval’ from stage 2 analysis.

  • nsh2 (int) – Number of shuffles for second stage.

  • pval_thr (float) – P-value threshold after multiple hypothesis correction.

  • topk (int, optional) – True MI should rank in top k among shuffles. Default: 5.

Returns:

crit_passed – True if both conditions are met:

  1) rval > (1 - topk/(nsh2+1))

  2) pval < pval_thr

Return type:

bool

Notes

Both rank and p-value criteria must be satisfied. Missing ‘rval’ or ‘pval’ results in False.
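A minimal sketch of the combined check, assuming `pair_stats` is a plain dict and that a key stored as None counts as missing. The name `criterion2_sketch` is hypothetical.

```python
def criterion2_sketch(pair_stats, nsh2, pval_thr, topk=5):
    """Stage 2 check: rank and p-value conditions must both hold (sketch)."""
    rval = pair_stats.get("rval")
    pval = pair_stats.get("pval")
    if rval is None or pval is None:
        return False  # missing 'rval' or 'pval' results in False
    rank_ok = rval > 1 - topk / (nsh2 + 1)
    pval_ok = pval < pval_thr  # strict inequality
    return rank_ok and pval_ok


print(criterion2_sketch({"rval": 0.999, "pval": 1e-4}, nsh2=1000, pval_thr=1e-3))
print(criterion2_sketch({"rval": 0.999}, nsh2=1000, pval_thr=1e-3))
```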

driada.intense.stats.apply_stage_criterion(stage_stats, stage_num, n1, n2, n_shuffles, topk, multicorr_thr=None)

Apply stage-appropriate significance criterion to all pairs.

Thin wrapper that delegates to criterion1 (Stage 1) or criterion2 (Stage 2) based on stage_num. This enables a unified scan_stage() function.

Parameters:
  • stage_stats (dict) – Nested dictionary with statistics from get_table_of_stats.

  • stage_num (int) – Stage number (1 or 2).

  • n1 (int) – Number of items in first dimension (neurons).

  • n2 (int) – Number of items in second dimension (features).

  • n_shuffles (int) – Number of shuffles used in this stage.

  • topk (int) – True MI should rank in top k among shuffles.

  • multicorr_thr (float, optional) – Multiple comparison corrected p-value threshold. Required for stage_num=2, ignored for stage_num=1.

Returns:

  • significance (dict) – Nested dict with boolean significance for each pair. Keys are stage-specific: ‘stage1’ or ‘stage2’.

  • pass_mask (np.ndarray) – Binary mask of shape (n1, n2) with 1 for pairs that passed.

Return type:

tuple

driada.intense.stats.get_all_nonempty_pvals(all_stats, ids1, ids2)

Extract all non-empty p-values from nested statistics dictionary.

Parameters:
  • all_stats (dict of dict) – Nested dictionary with statistics.

  • ids1 (list) – First dimension indices.

  • ids2 (list) – Second dimension indices.

Returns:

all_pvals – List of all non-None p-values found.

Return type:

list
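The extraction can be sketched as a double loop over the two index lists. The nesting `all_stats[id1][id2]` and the key name 'pval' are assumptions inferred from the other functions on this page, and `collect_pvals` is a hypothetical name.

```python
def collect_pvals(all_stats, ids1, ids2, key="pval"):
    """Collect every non-None p-value from a nested stats dict (sketch)."""
    pvals = []
    for i in ids1:
        for j in ids2:
            # Missing pairs or missing keys are skipped rather than raising.
            pval = all_stats.get(i, {}).get(j, {}).get(key)
            if pval is not None:
                pvals.append(pval)
    return pvals


stats_table = {
    0: {0: {"pval": 0.01}, 1: {"pval": None}},
    1: {0: {}, 1: {"pval": 0.2}},
}
print(collect_pvals(stats_table, ids1=[0, 1], ids2=[0, 1]))
```

A flat list like this is the usual input for a multiple-comparison correction, which needs the number of p-values actually computed rather than the full table size.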

driada.intense.stats.get_table_of_stats(metable, optimal_delays, precomputed_mask=None, metric_distr_type='gamma_zi', nsh=0, stage=1)

Convert metric table to statistics dictionary.

Parameters:
  • metable (np.ndarray of shape (n1, n2, nsh+1)) – Metric values where [:,:,0] is true values, [:,:,1:] are shuffles.

  • optimal_delays (np.ndarray of shape (n1, n2)) – Optimal delays for each pair.

  • precomputed_mask (np.ndarray, optional) – Binary mask: 1 = compute stats, 0 = skip. Default: all ones.

  • metric_distr_type (str, optional) – Distribution for p-value calculation. Default: ‘gamma_zi’.

  • nsh (int, optional) – Number of shuffles. Default: 0.

  • stage (int, optional) – Stage (1 or 2) determines which stats to compute. Default: 1.

Returns:

stage_stats – Nested dictionary with computed statistics for each pair.

Return type:

dict of dict

driada.intense.stats.merge_stage_stats(stage1_stats, stage2_stats)

Merge statistics from stage 1 and stage 2.

Parameters:
  • stage1_stats (dict of dict) – Statistics from stage 1 (preliminary).

  • stage2_stats (dict of dict) – Statistics from stage 2 (full).

Returns:

merged_stats – Combined statistics with both stage 1 and 2 results.

Return type:

dict of dict

driada.intense.stats.merge_stage_significance(stage_1_significance, stage_2_significance)

Merge significance results from stage 1 and stage 2.

Parameters:
  • stage_1_significance (dict of dict) – Significance results from stage 1.

  • stage_2_significance (dict of dict) – Significance results from stage 2.

Returns:

merged_significance – Combined significance results.

Return type:

dict of dict

Statistical tools for INTENSE analysis including distribution fitting and p-value computation.

Function Groups

Distribution Functions
driada.intense.stats.get_lognormal_p(data, val)

driada.intense.stats.get_gamma_p(data, val)

driada.intense.stats.get_distribution_function(dist_name)

driada.intense.stats.chebyshev_ineq(data, val)

P-value Computation
driada.intense.stats.get_mi_distr_pvalue(data, val, distr_type='gamma')

driada.intense.stats.get_all_nonempty_pvals(all_stats, ids1, ids2)

Data Filtering and Validation
driada.intense.stats.get_mask(ptable, rtable, pval_thr, rank_thr)

driada.intense.stats.stats_not_empty(pair_stats, current_data_hash, stage=1)

driada.intense.stats.criterion1(pair_stats, nsh1, topk=1)

driada.intense.stats.criterion2(pair_stats, nsh2, pval_thr, topk=5)

Result Aggregation
driada.intense.stats.get_table_of_stats(metable, optimal_delays, precomputed_mask=None, metric_distr_type='gamma_zi', nsh=0, stage=1)

driada.intense.stats.merge_stage_stats(stage1_stats, stage2_stats)

driada.intense.stats.merge_stage_significance(stage_1_significance, stage_2_significance)