INTENSE Statistics
- driada.intense.stats.chebyshev_ineq(data, val)
Calculate upper bound on tail probability using Chebyshev’s inequality.
- Parameters:
data (array-like) – Sample data to estimate mean and std from. Must have non-zero variance.
val (float) – Value to compute tail probability for.
- Returns:
p_bound – Upper bound on P(X >= val) based on Chebyshev’s inequality.
- Return type:
float
- Raises:
ValueError – If data has zero variance (all values are identical).
Notes
Chebyshev’s inequality states that
P(abs(X - μ) >= k*σ) <= 1/k²
This gives
P(X >= val) <= 1/z²
where z = (val - μ)/σ.
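The bound in the Notes can be sketched with the standard library alone. This is an illustration, not the library's implementation: the name `chebyshev_tail_bound` is hypothetical, the use of the population standard deviation is an assumption (the docs do not say which estimator is used), and returning the trivial bound 1.0 when val <= μ or when 1/z² exceeds 1 is a choice made here. The one-sided bound follows because the event X >= val is contained in abs(X - μ) >= z*σ whenever z > 0.

```python
import statistics


def chebyshev_tail_bound(data, val):
    """Illustrative upper bound on P(X >= val) from Chebyshev's inequality."""
    mu = statistics.fmean(data)
    # Assumption: population standard deviation (the library's estimator is not stated).
    sigma = statistics.pstdev(data)
    if sigma == 0:
        raise ValueError("data has zero variance (all values are identical)")
    z = (val - mu) / sigma
    if z <= 0:
        # Chebyshev says nothing useful at or below the mean; use the trivial bound.
        return 1.0
    # Cap at 1.0 because a probability bound above 1 carries no information.
    return min(1.0, 1.0 / z**2)


samples = [1.0, 2.0, 3.0, 4.0, 5.0]   # mean 3, population variance 2
bound = chebyshev_tail_bound(samples, 7.0)  # z**2 = 16 / 2 = 8
print(bound)
```

The bound holds for any distribution with finite variance, which makes it much looser than a p-value from a fitted distribution.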
- driada.intense.stats.get_lognormal_p(data, val)
Calculate p-value assuming log-normal distribution.
- Parameters:
data (array-like) – Sample data to fit log-normal distribution. Must contain positive values.
val (float) – Observed value to compute p-value for.
- Returns:
p_value – P(X >= val) under fitted log-normal distribution.
- Return type:
float
- Raises:
ValueError – If data contains non-positive values.
Notes
Fits log-normal distribution with floc=0 (zero lower bound). Log-normal distribution is suitable for positive-valued data.
- driada.intense.stats.get_gamma_p(data, val)
Calculate p-value assuming gamma distribution.
- Parameters:
data (array-like) – Sample data to fit gamma distribution. Must contain positive values.
val (float) – Observed value to compute p-value for.
- Returns:
p_value – P(X >= val) under fitted gamma distribution.
- Return type:
float
- Raises:
ValueError – If data contains non-positive values.
Notes
Fits gamma distribution with floc=0 (zero lower bound). Gamma distribution is suitable for positive-valued data.
- driada.intense.stats.get_gamma_zi_p(data, val, zero_threshold=1e-10)
Calculate p-value using zero-inflated gamma (ZIG) distribution.
- Parameters:
- Returns:
p_value – P(X >= val) under fitted ZIG distribution.
- Return type:
float
Notes
Zero-Inflated Gamma model:
P(X = 0) = pi (zero-inflation parameter)
For x > 0, the density is (1-pi) * Gamma(x; shape, scale), so P(X > 0) = 1 - pi
P-value calculation:
If val <= zero_threshold: p_value = 1.0 (zeros never significant)
If val > zero_threshold: p_value = (1-pi) * Gamma.sf(val) (zero mass doesn’t contribute to tail probability for positive values)
Edge cases:
All zeros (pi=1): Returns 1.0 for any val
No zeros (pi=0): Equivalent to pure gamma distribution
Gamma fit fails: Returns conservative 1.0
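The p-value rules above can be sketched as a small mixture calculation. This is a dependency-free illustration under stated assumptions: `zig_pvalue` and `positive_sf` are hypothetical names, `positive_sf` stands in for the survival function of the fitted gamma component, and counting values at or below `zero_threshold` as zeros is an assumption about how pi is estimated. The example plugs in a gamma with shape 1 (an exponential) only because its survival function has a closed form.

```python
import math


def zig_pvalue(data, val, positive_sf, zero_threshold=1e-10):
    """Illustrative P(X >= val) under a zero-inflated model.

    positive_sf is a stand-in for the survival function of the fitted
    gamma component (hypothetical parameter, not part of the library API).
    """
    if val <= zero_threshold:
        return 1.0  # zeros are never significant
    # Assumption: pi is estimated as the fraction of values at or below the threshold.
    n_zero = sum(1 for x in data if x <= zero_threshold)
    pi = n_zero / len(data)
    if pi == 1.0:
        return 1.0  # all zeros: no positive part to fit
    # The zero mass does not contribute to the tail for positive val.
    return (1.0 - pi) * positive_sf(val)


def example_sf(x, scale=2.0):
    """Survival function of a gamma with shape 1 and the given scale."""
    return math.exp(-x / scale)


data = [0.0, 0.0, 0.5, 1.5, 3.0, 4.0, 0.0, 2.0]  # 3 zeros out of 8, so pi = 0.375
p = zig_pvalue(data, 4.0, example_sf)
print(p)
```

Scaling the tail by (1 - pi) makes a positive observation look more significant the more often the shuffled values are exactly zero.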
- driada.intense.stats.get_distribution_function(dist_name)
Get distribution function from scipy.stats by name.
- Parameters:
dist_name (str) – Name of distribution (e.g., ‘gamma’, ‘lognorm’, ‘norm’).
- Returns:
dist – Distribution function object.
- Return type:
scipy.stats distribution
- Raises:
ValueError – If distribution name not found in scipy.stats.
- driada.intense.stats.get_mi_distr_pvalue(data, val, distr_type='gamma')
Calculate p-value by fitting a distribution to data.
- Parameters:
data (array-like) – Sample data (typically shuffled metric values).
val (float) – Observed value to compute p-value for.
distr_type (str, optional) – Distribution type to fit. Default: ‘gamma’. Options:
  - ‘gamma’: Gamma distribution (requires positive values)
  - ‘gamma_zi’: Zero-inflated gamma (handles zeros explicitly)
  - ‘lognorm’: Log-normal distribution
  - Any other scipy.stats distribution name
- Returns:
p_value – P(X >= val) under fitted distribution. Returns 1.0 if distribution fitting fails.
- Return type:
float
Notes
For ‘gamma_zi’, uses zero-inflated gamma model (handles zeros)
For ‘gamma’ and ‘lognorm’, fits with floc=0 (zero lower bound)
For other distributions, uses default fitting
Returns conservative p-value (1.0) on fitting errors
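The fitting behaviour listed in the Notes can be sketched with SciPy directly. This is a minimal sketch, not the library's code: `tail_pvalue` is a hypothetical name, looking the distribution up with `getattr` on `scipy.stats` is an assumption, and the ‘gamma_zi’ branch is omitted here.

```python
import numpy as np
from scipy import stats


def tail_pvalue(data, val, distr_type="gamma"):
    """Illustrative P(X >= val) under a distribution fitted to data."""
    dist = getattr(stats, distr_type, None)
    if dist is None or not hasattr(dist, "fit"):
        raise ValueError(f"unknown distribution: {distr_type}")
    try:
        if distr_type in ("gamma", "lognorm"):
            # Pin the location at zero (zero lower bound).
            params = dist.fit(data, floc=0)
        else:
            params = dist.fit(data)  # default fitting
        return float(dist.sf(val, *params))
    except Exception:
        # Conservative fallback when fitting fails.
        return 1.0


rng = np.random.default_rng(0)
shuffled = rng.gamma(shape=2.0, scale=0.01, size=500)  # synthetic null values

p_typical = tail_pvalue(shuffled, 0.02)  # near the bulk of the null distribution
p_extreme = tail_pvalue(shuffled, 0.2)   # far in the right tail
print(p_typical, p_extreme)
```

Because the p-value comes from the fitted survival function rather than from counting shuffles, it can resolve values far smaller than 1/(number of shuffles).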
- driada.intense.stats.reconstruct_stage1_pvals(me_total1, metric_distr_type='gamma')
Reconstruct Stage 1 p-values from saved shuffle distributions.
Stage 1 skips p-value computation for performance (pre_pval=None). This function reconstructs them post-hoc from the saved me_total1 array using the same distribution fitting as Stage 2.
- Parameters:
me_total1 (np.ndarray, shape (n1, n2, nsh1+1)) – Stage 1 MI array. [:,:,0] = true MI, [:,:,1:] = shuffles.
metric_distr_type (str, optional) – Distribution type for p-value fitting. Default: ‘gamma’.
- Returns:
pre_pvals (np.ndarray, shape (n1, n2)) – Reconstructed p-values. NaN where MI data is missing/zero.
mi_values (np.ndarray, shape (n1, n2)) – True MI values (me_total1[:,:,0]).
- driada.intense.stats.get_mask(ptable, rtable, pval_thr, rank_thr)
Create binary mask based on p-value and rank thresholds.
- Parameters:
- Returns:
mask – Binary mask: 1 where both conditions satisfied (p <= pval_thr AND rank >= rank_thr), 0 otherwise.
- Return type:
np.ndarray
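The masking rule stated in Returns amounts to an element-wise comparison. The sketch below assumes NumPy arrays of equal shape and an integer mask dtype. `make_mask` is a hypothetical name, not the library function.

```python
import numpy as np


def make_mask(ptable, rtable, pval_thr, rank_thr):
    """Illustrative mask: 1 where p <= pval_thr AND rank >= rank_thr, else 0."""
    ptable = np.asarray(ptable, dtype=float)
    rtable = np.asarray(rtable, dtype=float)
    # Comparisons involving NaN evaluate to False, so missing entries become 0.
    return ((ptable <= pval_thr) & (rtable >= rank_thr)).astype(int)


ptable = np.array([[0.001, 0.20], [0.04, 0.01]])
rtable = np.array([[0.995, 0.999], [0.90, 0.99]])
mask = make_mask(ptable, rtable, pval_thr=0.05, rank_thr=0.99)
print(mask)
```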
- driada.intense.stats.stats_not_empty(pair_stats, current_data_hash, stage=1)
Check if statistics are valid and complete for given stage.
- Parameters:
- Returns:
is_valid – True if stats are valid and complete, False otherwise.
- Return type:
bool
- Raises:
ValueError – If stage is not 1 or 2.
- driada.intense.stats.criterion1(pair_stats, nsh1, topk=1)
Check if pair passes stage 1 significance criterion.
- Parameters:
- Returns:
crit_passed – True if pair’s rank exceeds threshold (1 - topk/(nsh1+1)).
- Return type:
bool
Notes
The criterion checks whether pre_rval > (1 - topk/(nsh1+1)). For topk=1 and nsh1=100, this requires pre_rval > 1 - 1/101 ≈ 0.9901.
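The rank threshold is easy to check by hand. The sketch below is illustrative only: `passes_stage1` is a hypothetical name, `pair_stats` is assumed to be a plain dict holding ‘pre_rval’, and returning False for a missing value is an assumption.

```python
def passes_stage1(pair_stats, nsh1, topk=1):
    """Illustrative stage 1 check: rank must exceed 1 - topk/(nsh1+1)."""
    rval = pair_stats.get("pre_rval")
    if rval is None:
        return False  # assumption: a missing rank cannot pass
    return rval > 1 - topk / (nsh1 + 1)


threshold = 1 - 1 / (100 + 1)  # topk=1, nsh1=100
print(round(threshold, 4))
print(passes_stage1({"pre_rval": 1.0}, nsh1=100))
print(passes_stage1({"pre_rval": 0.99}, nsh1=100))
```

With topk=1 the true value has to outrank every shuffle, so a rank of exactly 0.99 falls just short of the 100-shuffle threshold.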
- driada.intense.stats.criterion2(pair_stats, nsh2, pval_thr, topk=5)
Check if pair passes stage 2 significance criterion.
- Parameters:
- Returns:
crit_passed – True if both conditions are met:
1) rval > (1 - topk/(nsh2+1))
2) pval < pval_thr
- Return type:
bool
Notes
Both rank and p-value criteria must be satisfied. Missing ‘rval’ or ‘pval’ results in False.
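The two-part test can be sketched as follows. The name `passes_stage2` is hypothetical and `pair_stats` is assumed to be a plain dict. Both inequalities are strict, as stated in Returns.

```python
def passes_stage2(pair_stats, nsh2, pval_thr, topk=5):
    """Illustrative stage 2 check: rank and p-value criteria must both hold."""
    rval = pair_stats.get("rval")
    pval = pair_stats.get("pval")
    if rval is None or pval is None:
        return False  # missing 'rval' or 'pval' results in False
    rank_ok = rval > 1 - topk / (nsh2 + 1)
    pval_ok = pval < pval_thr
    return rank_ok and pval_ok


# With nsh2=1000 and topk=5 the rank threshold is 1 - 5/1001, about 0.995.
print(passes_stage2({"rval": 0.999, "pval": 0.001}, nsh2=1000, pval_thr=0.01))
print(passes_stage2({"rval": 0.990, "pval": 0.001}, nsh2=1000, pval_thr=0.01))
print(passes_stage2({"rval": 0.999}, nsh2=1000, pval_thr=0.01))
```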
- driada.intense.stats.apply_stage_criterion(stage_stats, stage_num, n1, n2, n_shuffles, topk, multicorr_thr=None)
Apply stage-appropriate significance criterion to all pairs.
Thin wrapper that delegates to criterion1 (Stage 1) or criterion2 (Stage 2) based on stage_num. Enables unified scan_stage() function.
- Parameters:
stage_stats (dict) – Nested dictionary with statistics from get_table_of_stats.
stage_num (int) – Stage number (1 or 2).
n1 (int) – Number of items in first dimension (neurons).
n2 (int) – Number of items in second dimension (features).
n_shuffles (int) – Number of shuffles used in this stage.
topk (int) – True MI should rank in top k among shuffles.
multicorr_thr (float, optional) – Multiple comparison corrected p-value threshold. Required for stage_num=2, ignored for stage_num=1.
- Return type:
tuple of (dict, np.ndarray)
- Returns:
significance (dict) – Nested dict with boolean significance for each pair. Keys are stage-specific: ‘stage1’ or ‘stage2’.
pass_mask (np.ndarray) – Binary mask of shape (n1, n2) with 1 for pairs that passed.
- driada.intense.stats.get_all_nonempty_pvals(all_stats, ids1, ids2)
Extract all non-empty p-values from nested statistics dictionary.
- driada.intense.stats.get_table_of_stats(metable, optimal_delays, precomputed_mask=None, metric_distr_type='gamma_zi', nsh=0, stage=1)
Convert metric table to statistics dictionary.
- Parameters:
metable (np.ndarray of shape (n1, n2, nsh+1)) – Metric values where [:,:,0] is true values, [:,:,1:] are shuffles.
optimal_delays (np.ndarray of shape (n1, n2)) – Optimal delays for each pair.
precomputed_mask (np.ndarray, optional) – Binary mask: 1 = compute stats, 0 = skip. Default: all ones.
metric_distr_type (str, optional) – Distribution for p-value calculation. Default: ‘gamma_zi’.
nsh (int, optional) – Number of shuffles. Default: 0.
stage (int, optional) – Stage (1 or 2) determines which stats to compute. Default: 1.
- Returns:
stage_stats – Nested dictionary with computed statistics for each pair.
- Return type:
dict
- driada.intense.stats.merge_stage_stats(stage1_stats, stage2_stats)
Merge statistics from stage 1 and stage 2.
- driada.intense.stats.merge_stage_significance(stage_1_significance, stage_2_significance)
Merge significance results from stage 1 and stage 2.
Statistical tools for INTENSE analysis including distribution fitting and p-value computation.
Function Groups
- Distribution Functions
- driada.intense.stats.get_lognormal_p(data, val)
- driada.intense.stats.get_gamma_p(data, val)
- driada.intense.stats.get_distribution_function(dist_name)
- driada.intense.stats.chebyshev_ineq(data, val)
- P-value Computation
- driada.intense.stats.get_mi_distr_pvalue(data, val, distr_type='gamma')
- Data Filtering and Validation
- driada.intense.stats.get_mask(ptable, rtable, pval_thr, rank_thr)
- driada.intense.stats.stats_not_empty(pair_stats, current_data_hash, stage=1)
- driada.intense.stats.criterion1(pair_stats, nsh1, topk=1)
- driada.intense.stats.criterion2(pair_stats, nsh2, pval_thr, topk=5)
- Result Aggregation
- driada.intense.stats.get_table_of_stats(metable, optimal_delays, precomputed_mask=None, metric_distr_type='gamma_zi', nsh=0, stage=1)
- driada.intense.stats.merge_stage_stats(stage1_stats, stage2_stats)