Entropy Estimation
This module provides functions for estimating entropy of discrete and continuous random variables.
Functions
- driada.information.entropy.entropy_d(x)[source]
Calculate entropy for a discrete variable.
Selects between a JIT-compiled and a NumPy implementation based on input size. The JIT version is used for arrays smaller than ENTROPY_D_JIT_THRESHOLD (1000 elements).
- Parameters:
x (array-like) – Discrete variable values. Should contain numeric values (integers or floats representing discrete states).
- Returns:
Entropy in bits.
- Return type:
float
- Raises:
ValueError – If input is not numeric.
Examples
>>> entropy_d([1, 1, 2, 2])  # uniform binary distribution
1.0
>>> entropy_d([1, 2, 3, 4])  # uniform 4-way distribution
2.0
Notes
For small datasets (< 1000 elements), the JIT-compiled implementation is used if available. For larger datasets, the NumPy implementation is used to avoid JIT compilation overhead.
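To make the quantity concrete, here is a minimal, dependency-free sketch of a plug-in (relative frequency) entropy estimate in bits. The name `entropy_bits` is hypothetical and this is not the library's JIT or NumPy code path; it only illustrates the formula that the examples above are consistent with.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete states.

    Hypothetical helper for illustration; not part of driada.
    """
    values = list(values)
    n = len(values)
    counts = Counter(values)
    # H = -sum_x p(x) * log2 p(x), with p(x) estimated as count / n
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

h_binary = entropy_bits([1, 1, 2, 2])    # two equally likely states
h_four_way = entropy_bits([1, 2, 3, 4])  # four equally likely states
```

With two equally likely states the estimate is 1 bit, and with four it is 2 bits, matching the doctest values above.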
- driada.information.entropy.joint_entropy_dd(x, y)[source]
Calculate joint entropy for two discrete variables.
Uses the JIT-compiled version, which is consistently faster than the histogram2d approach across all dataset sizes.
- Parameters:
x (array-like) – First discrete variable. Must have same length as y.
y (array-like) – Second discrete variable. Must have same length as x.
- Returns:
Joint entropy H(X,Y) in bits.
- Return type:
float
Examples
>>> joint_entropy_dd([1, 1, 2, 2], [1, 2, 1, 2])  # independent
2.0
>>> joint_entropy_dd([1, 1, 2, 2], [1, 1, 2, 2])  # perfectly dependent
1.0
Notes
When JIT compilation is available, the JIT version is always used because it is consistently faster. Otherwise the function falls back to the histogram2d-based implementation.
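As an illustration of what is being estimated, the sketch below computes joint entropy by counting co-occurring (x, y) pairs. It is a plain-Python stand-in, not the JIT or histogram2d implementation described above, and `joint_entropy_bits` is a hypothetical name.

```python
import math
from collections import Counter

def joint_entropy_bits(x, y):
    """Plug-in joint entropy H(X,Y) in bits from paired discrete samples.

    Hypothetical helper for illustration; not part of driada.
    """
    x, y = list(x), list(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    # Each distinct (x, y) pair is treated as one joint state
    pair_counts = Counter(zip(x, y))
    return -sum((c / n) * math.log2(c / n) for c in pair_counts.values())

h_independent = joint_entropy_bits([1, 1, 2, 2], [1, 2, 1, 2])  # 4 distinct pairs
h_dependent = joint_entropy_bits([1, 1, 2, 2], [1, 1, 2, 2])    # 2 distinct pairs
```

Four equally likely pairs give 2 bits and two equally likely pairs give 1 bit, in line with the doctest values above.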
- driada.information.entropy.conditional_entropy_cdd(z, x, y, k=5, estimator='gcmi')[source]
Calculate conditional differential entropy for a continuous variable given two discrete variables.
Computes H(Z|X,Y) where Z is continuous and X,Y are discrete. Two estimators are available: GCMI (fast, Gaussian assumption) and KSG (accurate, nonparametric).
- Parameters:
z (array-like) – Continuous variable. Must have same length as x and y.
x (array-like) – First discrete variable. Must have same length as z and y.
y (array-like) – Second discrete variable. Must have same length as z and x.
k (int, optional) – For KSG: number of nearest neighbors. For GCMI: minimum subset size threshold (partitions smaller than k are excluded). Default: 5.
estimator ({'gcmi', 'ksg'}, optional) – Entropy estimation method:
- 'gcmi': Fast, assumes Gaussian distribution
- 'ksg': Accurate, nonparametric k-nearest neighbor approach
Default: 'gcmi'.
- Returns:
Conditional entropy H(Z|X,Y) in bits.
- Return type:
float
Examples
>>> z = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7]
>>> x = [1, 1, 2, 2, 1, 2]
>>> y = [1, 2, 1, 2, 1, 1]
>>> result = conditional_entropy_cdd(z, x, y, k=3)
>>> isinstance(result, float)
True
Notes
The GCMI estimator is faster but assumes the data follow a Gaussian distribution. The KSG estimator is slower but works for arbitrary continuous distributions.
- driada.information.entropy.conditional_entropy_cd(z, x, k=5, estimator='gcmi')[source]
Calculate conditional differential entropy for a continuous variable given a discrete variable.
Computes H(Z|X) where Z is continuous and X is discrete. Two estimators are available: GCMI (fast, Gaussian assumption) and KSG (accurate, nonparametric).
- Parameters:
z (array-like) – Continuous variable. Must have same length as x.
x (array-like) – Discrete variable. Must have same length as z.
k (int, optional) – For KSG: number of nearest neighbors. For GCMI: minimum subset size threshold (partitions smaller than k are excluded). Default: 5.
estimator ({'gcmi', 'ksg'}, optional) – Entropy estimation method:
- 'gcmi': Fast, assumes Gaussian distribution
- 'ksg': Accurate, nonparametric k-nearest neighbor approach
Default: 'gcmi'.
- Returns:
Conditional entropy H(Z|X) in bits.
- Return type:
float
Examples
>>> z = [0.1, 0.2, 0.8, 0.9]
>>> x = [1, 1, 2, 2]
>>> result = conditional_entropy_cd(z, x, k=1)
>>> isinstance(result, float)
True
Notes
The GCMI estimator is faster but assumes the data follow a Gaussian distribution. The KSG estimator is slower but works for arbitrary continuous distributions.
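The idea of a Gaussian-assumption conditional entropy can be sketched as a weighted average of per-state Gaussian differential entropies, H(Z|X) = sum over x of p(x) * 0.5 * log2(2*pi*e*var(Z|X=x)). The code below is only that textbook formula under stated assumptions: it does not reproduce the library's GCMI or KSG estimators, and the names `gaussian_conditional_entropy_bits` and `min_size` are hypothetical (`min_size` loosely mirrors the documented role of k as a minimum subset size).

```python
import math
from collections import defaultdict
from statistics import variance

def gaussian_conditional_entropy_bits(z, x, min_size=2):
    """Estimate H(Z|X) in bits, treating Z as Gaussian within each state of X.

    Hypothetical helper for illustration; not part of driada.
    States with fewer than min_size samples are skipped.
    """
    min_size = max(min_size, 2)  # sample variance needs at least 2 points
    groups = defaultdict(list)
    for zi, xi in zip(z, x):
        groups[xi].append(zi)
    kept = [g for g in groups.values() if len(g) >= min_size]
    n = sum(len(g) for g in kept)
    if n == 0:
        raise ValueError("no state of x has enough samples")
    total = 0.0
    for g in kept:
        var = variance(g)  # unbiased sample variance; must be > 0
        # Differential entropy of a 1-D Gaussian: 0.5 * log2(2*pi*e*var)
        h_state = 0.5 * math.log2(2 * math.pi * math.e * var)
        total += (len(g) / n) * h_state
    return total

h_z_given_x = gaussian_conditional_entropy_bits(
    [0.0, 2.0, 10.0, 12.0], [1, 1, 2, 2]
)
```

Differential entropy can be negative when the within-state variance is small, and a state with zero variance makes the logarithm undefined, so a practical estimator needs to handle degenerate states explicitly.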
Usage Examples
Joint Entropy
from driada.information.entropy import joint_entropy_dd
import numpy as np
# Joint entropy of two variables
x = np.random.randint(0, 4, 1000)
y = np.random.randint(0, 3, 1000)
H_xy = joint_entropy_dd(x, y)
print(f"H(X,Y) = {H_xy:.3f} bits")
Conditional Entropy
from driada.information.entropy import conditional_entropy_cdd
import numpy as np
# H(Z|X,Y) - uncertainty in continuous Z given discrete X,Y
x = np.random.randint(0, 3, 1000) # Discrete
y = np.random.randint(0, 2, 1000) # Discrete
z = np.random.randn(1000) + x # Continuous, depends on X
H_z_given_xy = conditional_entropy_cdd(z, x, y)
print(f"H(Z|X,Y) = {H_z_given_xy:.3f} bits")
Theory
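Shannon Entropy:
For a discrete random variable X with probability mass function p(x):
\[ H(X) = -\sum_{x} p(x) \log_2 p(x) \]
For a continuous random variable X with probability density p(x) (differential entropy):
\[ h(X) = -\int p(x) \log_2 p(x) \, dx \]
Conditional Entropy:
\[ H(Y \mid X) = H(X, Y) - H(X) \]
Mutual Information:
\[ I(X; Y) = H(X) + H(Y) - H(X, Y) \]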
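The standard identities H(Y|X) = H(X,Y) - H(X) and I(X;Y) = H(X) + H(Y) - H(X,Y) can be checked numerically on small discrete samples. The sketch below uses a plain-Python plug-in entropy; all function names are hypothetical and none of this is the library's own code.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Plug-in Shannon entropy in bits; samples may be scalars or tuples."""
    samples = list(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mutual_information_bits(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy_bits(x) + entropy_bits(y) - entropy_bits(zip(x, y))

def conditional_entropy_bits(y, x):
    """H(Y|X) = H(X,Y) - H(X)."""
    return entropy_bits(zip(x, y)) - entropy_bits(x)

x = [1, 1, 2, 2]
mi_independent = mutual_information_bits(x, [1, 2, 1, 2])  # y carries no info about x
mi_dependent = mutual_information_bits(x, [1, 1, 2, 2])    # y is a copy of x
h_y_given_x = conditional_entropy_bits([1, 2, 1, 2], x)
```

For the independent pair the mutual information is 0 bits and H(Y|X) equals H(Y) = 1 bit; when y copies x, the mutual information equals H(X) = 1 bit.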