environmentaltools.temporal.fit_distribution
- environmentaltools.temporal.fit_distribution(data: DataFrame, bins: int, model: str)
Fit a probability distribution and compute a goodness-of-fit metric.
Fits a specified probability distribution to data using maximum likelihood estimation and computes the sum of squared errors (SSE) between the empirical and theoretical cumulative distribution functions.
- Parameters:
data (pd.DataFrame or array-like) – Time series data to fit
bins (int) – Number of bins for histogram computation
model (scipy.stats distribution) – Probability distribution model to fit (e.g., scipy.stats.weibull_min)
- Returns:
Array containing [SSE, param1, param2, …, loc, scale] where:
- SSE (float)
Sum of squared errors between empirical and fitted CDF
- param1, param2, … (float)
Shape parameters of the distribution
- loc (float)
Location parameter
- scale (float)
Scale parameter
- Return type:
np.ndarray
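Because the number of shape parameters varies by distribution, indexing the returned array from both ends is a convenient way to unpack it. The helper below is a minimal sketch that assumes only the [SSE, param1, …, loc, scale] layout described above. The name `unpack_fit_result` is hypothetical and not part of environmentaltools.

```python
import numpy as np


def unpack_fit_result(result):
    """Split a [SSE, shapes..., loc, scale] array into its parts.

    Hypothetical helper, not part of environmentaltools.
    """
    result = np.asarray(result, dtype=float)
    sse = result[0]                      # first entry: goodness-of-fit metric
    shapes = tuple(result[1:-2])         # zero or more shape parameters
    loc, scale = result[-2], result[-1]  # last two entries
    return sse, shapes, loc, scale


# One shape parameter (e.g. a Weibull fit): illustrative values only.
sse, shapes, loc, scale = unpack_fit_result([0.02, 1.5, 0.0, 2.0])

# No shape parameters (e.g. a normal fit): shapes is an empty tuple.
sse_n, shapes_n, loc_n, scale_n = unpack_fit_result([0.01, 0.0, 1.0])
```

Slicing with `result[1:-2]` yields an empty tuple for distributions such as the normal, so the same code handles any number of shape parameters.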
Notes
The function uses the distribution's built-in scipy fit method. The generalized Pareto distribution is handled specially: 0.01 is passed as the initial estimate of its shape parameter.
SSE is computed as: sum((empirical_cdf - theoretical_cdf)^2)
If fitting fails (NaN result), SSE is set to 1e10.
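The steps in these notes can be sketched with NumPy and SciPy alone. This is an illustration under stated assumptions, not the library's implementation: the function name `sse_of_fit` is hypothetical, and evaluating the empirical CDF at the right-hand histogram bin edges is an assumption, since the notes do not say where the two CDFs are compared.

```python
import numpy as np
import scipy.stats as st


def sse_of_fit(data, bins, model):
    """Hypothetical sketch of the documented behaviour (not the library code)."""
    data = np.asarray(data, dtype=float).ravel()

    # Empirical CDF: cumulative fraction of samples up to each right bin edge
    # (assumption: the comparison points are the histogram bin edges).
    counts, edges = np.histogram(data, bins=bins)
    empirical_cdf = np.cumsum(counts) / counts.sum()

    # Maximum likelihood fit; for the generalized Pareto distribution,
    # 0.01 is passed as the starting value of the shape parameter.
    if model is st.genpareto:
        params = model.fit(data, 0.01)
    else:
        params = model.fit(data)

    # Theoretical CDF at the same points, then the sum of squared errors.
    theoretical_cdf = model.cdf(edges[1:], *params)
    sse = np.sum((empirical_cdf - theoretical_cdf) ** 2)

    # A NaN result is treated as a failed fit and given a large penalty.
    if np.isnan(sse):
        sse = 1e10

    return np.concatenate(([sse], params))


# Usage with synthetic, normally distributed data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=2000)
result = sse_of_fit(sample, bins=50, model=st.norm)
```

The 1e10 penalty means a failed fit always ranks last when candidate distributions are compared by SSE.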
Examples
>>> import numpy as np
>>> import scipy.stats as st
>>> from environmentaltools.temporal import fit_distribution
>>> data = st.weibull_min.rvs(1.5, loc=0, scale=2, size=1000)
>>> results = fit_distribution(data, bins=50, model=st.weibull_min)
>>> sse, shape, loc, scale = results[0], results[1], results[2], results[3]