Information Theory Utilities

Utility functions and helper classes for information-theoretic computations.

Data Types

Advanced type detection for time series data.

This module provides comprehensive detection of time series types, including:

  • Discrete vs. continuous classification

  • Circular/periodic signal detection

  • Probabilistic type inference with confidence scores

  • Detection of various data patterns (binary, categorical, count, etc.)

class driada.information.time_series_types.TimeSeriesType(primary_type, subtype, confidence, is_circular, circular_period, periodicity, metadata)

Result of time series type detection.

Encapsulates the results of comprehensive time series analysis, including classification of the primary data type (discrete/continuous), subtype details, periodicity information, and confidence scores.

This class provides a structured way to represent and query time series characteristics, useful for selecting appropriate analysis methods.

Parameters:
  • primary_type (Literal['discrete', 'continuous', 'ambiguous'])

  • subtype (Literal['binary', 'categorical', 'count', 'timeline', 'linear', 'circular'] | None)

  • confidence (float)

  • is_circular (bool)

  • circular_period (float | None)

  • periodicity (float | None)

  • metadata (Dict[str, float])

primary_type

Primary classification of the time series:

  • ‘discrete’: integer-valued or categorical data

  • ‘continuous’: real-valued measurements

  • ‘ambiguous’: the type cannot be determined with confidence

Type:

{‘discrete’, ‘continuous’, ‘ambiguous’}

subtype

More specific classification:

  • Discrete subtypes: ‘binary’ (0/1), ‘categorical’, ‘count’ (non-negative integers)

  • Continuous subtypes: ‘linear’, ‘circular’ (phase/angle), ‘timeline’

Type:

{‘binary’, ‘categorical’, ‘count’, ‘timeline’, ‘linear’, ‘circular’}, optional

confidence

Confidence score for the classification (0-1). Higher values indicate more certainty in the type detection.

Type:

float

is_circular

Whether the data represents circular/angular quantities (e.g., phases, angles, time-of-day).

Type:

bool

circular_period

Period of circular data (e.g., 2π for radians, 360 for degrees).

Type:

float, optional

periodicity

Detected period from autocorrelation analysis. Non-circular data can still be periodic (e.g., oscillations, rhythms).

Type:

float, optional

metadata

Statistical properties computed during detection, including:

  • n_unique: number of unique values

  • unique_ratio: fraction of unique values

  • is_integer: whether all values are integers

  • has_decimals: whether any decimals are present

  • entropy: Shannon entropy

  • various other statistical measures

Type:

dict
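To make the metadata keys concrete, here is a minimal NumPy sketch that computes a few of them. The helper name `basic_metadata` and the exact definitions (for example, entropy in bits over the empirical distribution of values) are assumptions for illustration; the library may define or compute these statistics differently.

```python
import numpy as np


def basic_metadata(data):
    """Hypothetical helper: a few metadata statistics under assumed definitions.

    Assumes `data` is a non-empty 1D numeric array.
    """
    x = np.asarray(data, dtype=np.float64)
    # Unique values and how often each occurs
    values, counts = np.unique(x, return_counts=True)
    p = counts / x.size
    # All values integral, e.g. 2.0 counts as an integer
    is_integer = bool(np.all(x == np.round(x)))
    return {
        "n_unique": int(values.size),
        "unique_ratio": values.size / x.size,
        "is_integer": is_integer,
        "has_decimals": not is_integer,
        # Shannon entropy of the value distribution, in bits (assumed base)
        "entropy": float(-np.sum(p * np.log2(p))),
    }
```

For example, `basic_metadata([0, 1, 1, 2])` reports 3 unique values, a unique ratio of 0.75, and 1.5 bits of entropy.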

Examples

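>>> import numpy as np
>>> from driada.information.time_series_types import analyze_time_series_type
>>> # Binary spike trains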
>>> spikes = np.array([0, 0, 1, 0, 1, 1, 0, 0])
>>> result = analyze_time_series_type(spikes)
>>> result.primary_type
'discrete'
>>> result.subtype
'binary'
>>> # Circular phase data
>>> phases = np.random.uniform(0, 2*np.pi, 100)
>>> result = analyze_time_series_type(phases)
>>> result.is_circular
True

Notes

The detection algorithm uses multiple statistical tests and heuristics to determine the most likely type. For ambiguous cases (e.g., discretized continuous data), the confidence score helps indicate uncertainty.

primary_type: Literal['discrete', 'continuous', 'ambiguous']
subtype: Optional[Literal['binary', 'categorical', 'count', 'timeline', 'linear', 'circular']]
confidence: float
is_circular: bool
circular_period: Optional[float]
periodicity: Optional[float]
metadata: Dict[str, float]
property is_discrete: bool

Check if the time series is primarily discrete.

Returns:

True if the primary type is discrete, False otherwise.

Return type:

bool

property is_continuous: bool

Check if the time series is primarily continuous.

Returns:

True if the primary type is continuous, False otherwise.

Return type:

bool

property is_ambiguous: bool

Check if the time series type is ambiguous.

Returns:

True if the type detection was ambiguous (could not confidently classify as discrete or continuous), False otherwise.

Return type:

bool

property is_periodic: bool

Check if the time series has detected periodicity.

Returns:

True if a valid periodicity was detected (periodicity is not None and is a positive, finite number), False otherwise.

Return type:

bool

Notes

Returns True only for valid positive finite periods.
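The four documented properties can be mirrored by a small stand-in dataclass. `TimeSeriesTypeSketch` is hypothetical and is not the library class; its `is_periodic` follows the Notes by accepting only positive, finite periods.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Literal, Optional


@dataclass
class TimeSeriesTypeSketch:
    """Hypothetical stand-in mirroring the documented TimeSeriesType fields."""

    primary_type: Literal["discrete", "continuous", "ambiguous"]
    subtype: Optional[str]
    confidence: float
    is_circular: bool
    circular_period: Optional[float]
    periodicity: Optional[float]
    metadata: Dict[str, float] = field(default_factory=dict)

    @property
    def is_discrete(self) -> bool:
        return self.primary_type == "discrete"

    @property
    def is_continuous(self) -> bool:
        return self.primary_type == "continuous"

    @property
    def is_ambiguous(self) -> bool:
        return self.primary_type == "ambiguous"

    @property
    def is_periodic(self) -> bool:
        # Only a positive, finite period counts as detected periodicity
        p = self.periodicity
        return p is not None and math.isfinite(p) and p > 0
```

With this sketch, a periodicity of `None`, `0.0`, or `float("inf")` all yield `is_periodic == False`.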

__init__(primary_type, subtype, confidence, is_circular, circular_period, periodicity, metadata)
Parameters:
  • primary_type (Literal['discrete', 'continuous', 'ambiguous'])

  • subtype (Literal['binary', 'categorical', 'count', 'timeline', 'linear', 'circular'] | None)

  • confidence (float)

  • is_circular (bool)

  • circular_period (float | None)

  • periodicity (float | None)

  • metadata (Dict[str, float])

Return type:

None

driada.information.time_series_types.analyze_time_series_type(data, name=None, confidence_threshold=0.7, min_samples=30, verbose=False)

Analyze and detect the type of a time series using comprehensive statistical analysis.

Parameters:
  • data (np.ndarray) – 1D array of time series values. Must contain numeric data.

  • name (str, optional) – Name of the time series (used for context-aware detection). Default: None.

  • confidence_threshold (float) – Minimum confidence for a definitive classification. Default: 0.7.

  • min_samples (int) – Minimum number of samples for reliable detection. Default: 30.

  • verbose (bool) – If True, print detection details. Default: False.

Returns:

A TimeSeriesType object with the detected primary type, subtype, confidence, circularity, and periodicity.

Return type:

TimeSeriesType

Raises:
  • ValueError – If data is empty, contains non-numeric values, or contains NaN/Inf values.

  • TypeError – If data cannot be converted to numpy array.
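The Raises contract above can be illustrated with a small input validator. `validate_series` is a hypothetical helper written for this page, not the library's own validation code; it only shows which inputs would map to which exception.

```python
import numpy as np


def validate_series(data):
    """Hypothetical validator mirroring the documented Raises contract."""
    try:
        arr = np.asarray(data)
    except Exception as exc:  # e.g. ragged nested sequences
        raise TypeError(f"cannot convert data to a numpy array: {exc}") from exc
    if arr.size == 0:
        raise ValueError("data is empty")
    # Accept bool, signed int, unsigned int and float dtypes only
    if arr.dtype.kind not in "biuf":
        raise ValueError(f"data must be numeric, got dtype {arr.dtype}")
    arr = arr.astype(np.float64)
    if not np.all(np.isfinite(arr)):
        raise ValueError("data contains NaN or Inf values")
    return arr
```

Under this sketch, `[]`, `["a", "b"]` and `[1.0, np.nan]` all raise ValueError, while `[1, 2, 3]` is returned as a float64 array.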

driada.information.time_series_types.is_discrete_time_series(ts, return_confidence=False)

Return whether a time series is discrete (True) or continuous (False).

Parameters:
  • ts (np.ndarray) – Time series data

  • return_confidence (bool) – If True, also return confidence score

Returns:

True if discrete, False if continuous. If return_confidence=True, returns (is_discrete, confidence)

Return type:

bool or tuple

Raises:
  • ValueError – If data is empty, contains non-numeric values, or contains NaN/Inf values.

  • TypeError – If data cannot be converted to numpy array.

driada.information.time_series_types.detect_ts_type(ts, return_confidence=False)

Return whether a time series is discrete (True) or continuous (False).

Parameters:
  • ts (np.ndarray) – Time series data

  • return_confidence (bool) – If True, also return confidence score

Returns:

True if discrete, False if continuous. If return_confidence=True, returns (is_discrete, confidence)

Return type:

bool or tuple

Raises:
  • ValueError – If data is empty, contains non-numeric values, or contains NaN/Inf values.

  • TypeError – If data cannot be converted to numpy array.

Helper Functions

driada.information.info_utils.py_fast_digamma_arr(data)

Compute digamma function for an array of values using fast approximation.

This is a JIT-compiled version that processes arrays efficiently. It uses an asymptotic series expansion, which is accurate for x > 5; smaller arguments are first shifted into that range with a recurrence relation.

Parameters:

data (ndarray) – Input array of positive values. All values must be > 0.

Returns:

Array of digamma values corresponding to input data.

Return type:

ndarray

Raises:

Nothing. Invalid inputs (x <= 0) return NaN instead of raising an exception, because of numba JIT compilation constraints.

Notes

The algorithm uses a recurrence relation to shift x to the range where the series expansion is accurate (x > 5), then applies an asymptotic expansion with correction terms.

For x <= 0, the function returns NaN to avoid infinite loops.
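The same scheme can be written as a vectorized NumPy sketch without numba: shift every positive argument above 5 with the recurrence, then apply the asymptotic series. The name `digamma_arr_sketch` is hypothetical, and truncating the series after the 1/(252 x^6) term is an assumption; the library's exact correction terms may differ.

```python
import numpy as np


def digamma_arr_sketch(data):
    """Hypothetical vectorized digamma: recurrence shift + asymptotic series."""
    x = np.asarray(data, dtype=np.float64)
    out = np.full(x.shape, np.nan)  # x <= 0 (and NaN) stay NaN
    valid = x > 0
    xv = x[valid]  # boolean indexing returns a copy, safe to modify
    acc = np.zeros_like(xv)
    # Recurrence psi(x) = psi(x + 1) - 1/x until every argument exceeds 5
    small = xv <= 5.0
    while small.any():
        acc[small] -= 1.0 / xv[small]
        xv[small] += 1.0
        small = xv <= 5.0
    inv2 = 1.0 / (xv * xv)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = np.log(xv) - 0.5 / xv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    out[valid] = series + acc
    return out
```

For instance, the entry for 1.0 is close to -0.5772157 (minus the Euler-Mascheroni constant), and entries for 0.0 or negative inputs are NaN.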

driada.information.info_utils.py_fast_digamma(x)

Compute digamma function for a single value using fast approximation.

This is a JIT-compiled scalar version of the digamma (psi) function. It uses an asymptotic series expansion, which is accurate for x > 5; smaller arguments are first shifted into that range with a recurrence relation.

Parameters:

x (float) – Input value. Must be positive (x > 0).

Returns:

The digamma function value psi(x).

Return type:

float

Raises:

Nothing. Invalid inputs (x <= 0) return NaN instead of raising an exception, because of numba JIT compilation constraints.

Notes

The digamma function is the logarithmic derivative of the gamma function: psi(x) = d/dx log(Gamma(x)) = Gamma’(x) / Gamma(x)

The algorithm uses:

  1. The recurrence relation psi(x) = psi(x+1) - 1/x to shift the argument to x > 5.

  2. An asymptotic expansion for large x with Bernoulli-number correction terms.

For x <= 0, the function returns NaN to avoid infinite loops.
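The two steps can be sketched in plain Python for a single value. `digamma_sketch` is a hypothetical name, not the library function, and the series is assumed to stop at the 1/(252 x^6) term.

```python
import math


def digamma_sketch(x: float) -> float:
    """Hypothetical scalar digamma via recurrence shift + asymptotic series."""
    if not x > 0:  # covers x <= 0 and NaN; mirrors the documented NaN return
        return math.nan
    acc = 0.0
    # Step 1: psi(x) = psi(x + 1) - 1/x, repeated until x > 5
    while x <= 5.0:
        acc -= 1.0 / x
        x += 1.0
    # Step 2: psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
```

Checking against known values: psi(1) is minus the Euler-Mascheroni constant and psi(2) = 1 + psi(1).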

driada.information.info_utils.binary_mi_score(contingency)

Calculate mutual information for discrete variables from a contingency table.

Computes the mutual information between two discrete random variables based on their joint probability distribution represented as a contingency table.

Parameters:

contingency (ndarray of shape (n_classes_x, n_classes_y)) – Contingency table where element [i, j] contains the count of samples with x=i and y=j. Must contain non-negative values.

Returns:

Mutual information score in nats (natural log base). Returns 0.0 if either variable has only one class.

Return type:

float

Raises:
  • ValueError – If contingency table has wrong dimensions or contains negative values.

  • TypeError – If contingency is not array-like or contains non-numeric values.

Notes

The mutual information is calculated as: MI(X,Y) = sum_ij P(x_i, y_j) * log(P(x_i, y_j) / (P(x_i) * P(y_j)))

This implementation:

  • Handles sparse contingency tables efficiently by summing only over non-zero entries

  • Returns 0 for degenerate cases (a variable with a single class)

  • Clips small negative results caused by numerical error to 0
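The formula and the three implementation points above can be sketched directly in NumPy. `mi_from_contingency` is a hypothetical helper, not the library source; in particular, the single-class check here only looks at the table's shape, which is an assumption.

```python
import numpy as np


def mi_from_contingency(contingency):
    """Hypothetical MI (in nats) from a 2D contingency table of counts."""
    c = np.asarray(contingency, dtype=np.float64)
    if c.ndim != 2:
        raise ValueError("contingency table must be 2-D")
    if (c < 0).any():
        raise ValueError("contingency table must be non-negative")
    # Degenerate case: one variable has a single class
    if c.shape[0] == 1 or c.shape[1] == 1:
        return 0.0
    total = c.sum()
    if total == 0:
        return 0.0
    p_x = c.sum(axis=1) / total  # marginal P(x_i)
    p_y = c.sum(axis=0) / total  # marginal P(y_j)
    # Sum only over non-zero cells (0 * log 0 is taken as 0)
    rows, cols = np.nonzero(c)
    p_xy = c[rows, cols] / total
    mi = np.sum(p_xy * np.log(p_xy / (p_x[rows] * p_y[cols])))
    # Clip tiny negative values caused by floating-point error
    return max(float(mi), 0.0)
```

A perfectly dependent 2x2 table such as `[[5, 0], [0, 5]]` gives ln 2 nats, while a uniform table `[[1, 1], [1, 1]]` gives 0.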

JIT-Optimized Functions

JIT-compiled entropy calculation functions for performance optimization.

Performance characteristics:

  • entropy_d_jit: Faster than the NumPy-based computation for small datasets (< 1000 samples), but slower for large datasets, where NumPy’s highly optimized C implementation dominates.

  • joint_entropy_dd_jit: Consistently faster across all dataset sizes (2x-30x speedup) as it avoids the overhead of numpy’s histogram2d function.

The implementations use vectorized operations without explicit loops where possible, leveraging numba’s ability to compile numpy operations efficiently.

driada.information.entropy_jit.entropy_d_jit(x)

JIT-compiled discrete entropy calculation using vectorized operations.

Parameters:

x (array-like) – Discrete variable values. Must be numeric and sortable.

Returns:

Entropy in bits. Returns 0.0 for empty arrays.

Return type:

float

Raises:
  • AttributeError – If x doesn’t have required array attributes (size).

  • TypeError – If x contains non-numeric or non-sortable values.

Notes

Uses Shannon entropy formula H = -Σ(p*log2(p)) where p are the probabilities of each unique value. The implementation sorts the input array to efficiently count unique values.
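The sort-and-count idea in the Notes can be shown in plain NumPy (no numba). `entropy_bits_sorted` is a hypothetical name; this illustrates the technique and is not the library's compiled code.

```python
import numpy as np


def entropy_bits_sorted(x):
    """Hypothetical discrete entropy in bits using sort-and-count."""
    x = np.asarray(x)
    if x.size == 0:
        return 0.0
    s = np.sort(x, kind="mergesort")
    # Positions where the sorted value changes mark the start of a new run
    change = np.flatnonzero(s[1:] != s[:-1]) + 1
    edges = np.concatenate(([0], change, [s.size]))
    counts = np.diff(edges)  # run lengths = counts of each unique value
    p = counts / s.size
    return float(-np.sum(p * np.log2(p)))
```

Two equally likely values give 1 bit, four equally likely values give 2 bits, and a constant array gives 0.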

driada.information.entropy_jit.joint_entropy_dd_jit(x, y)

JIT-compiled joint entropy for two discrete variables.

Parameters:
  • x (array-like) – First discrete variable. Must be numeric.

  • y (array-like) – Second discrete variable. Must be numeric and same length as x.

Returns:

Joint entropy H(X,Y) in bits.

Return type:

float

Raises:
  • ValueError – If x and y have different lengths or are empty.

  • AttributeError – If x or y don’t have required array attributes.

  • TypeError – If x or y contain non-numeric values.

Notes

Creates a joint encoding of (x,y) pairs and calculates entropy of the joint distribution. Uses overflow-safe encoding with automatic fallback to Cantor pairing for large value ranges.
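A minimal sketch of the joint-encoding idea: shift both variables to start at 0, combine each pair into one integer code (mixed-radix encoding, x * n_y + y), and take the entropy of the codes. `joint_entropy_bits` is hypothetical; it assumes integer-coded inputs and does not implement the overflow check or the Cantor-pairing fallback described above.

```python
import numpy as np


def joint_entropy_bits(x, y):
    """Hypothetical joint entropy H(X, Y) in bits via pair encoding."""
    x = np.asarray(x, dtype=np.int64)  # assumes integer-coded values
    y = np.asarray(y, dtype=np.int64)
    if x.size == 0 or x.size != y.size:
        raise ValueError("x and y must be non-empty and of equal length")
    xs = x - x.min()
    ys = y - y.min()
    n_y = int(ys.max()) + 1
    # Each (x, y) pair maps to a unique code; very large value ranges could
    # overflow int64 here, which is where a fallback encoding would be needed
    codes = xs * n_y + ys
    _, counts = np.unique(codes, return_counts=True)
    p = counts / codes.size
    return float(-np.sum(p * np.log2(p)))
```

Two independent fair binary variables give 2 bits; two identical binary variables give 1 bit.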