environmentaltools.temporal.fit_distribution

environmentaltools.temporal.fit_distribution(data: DataFrame, bins: int, model: str)

Fit a probability distribution and compute a goodness-of-fit metric.

Fits a specified probability distribution to data using maximum likelihood estimation and computes the sum of squared errors (SSE) between the empirical and theoretical cumulative distribution functions.

Parameters:
  • data (pd.DataFrame or array-like) – Time series data to fit

  • bins (int) – Number of bins for histogram computation

  • model (scipy.stats distribution) – Probability distribution model to fit (e.g., scipy.stats.weibull_min)

Returns:

Array containing [SSE, param1, param2, …, loc, scale] where:

  • SSE (float) – Sum of squared errors between the empirical and fitted CDF

  • param1, param2, … (float) – Shape parameters of the distribution

  • loc (float) – Location parameter

  • scale (float) – Scale parameter

Return type:

np.ndarray

Notes

The function uses scipy’s built-in fit method with special handling for the generalized Pareto distribution (uses 0.01 as initial shape parameter).

SSE is computed as: sum((empirical_cdf - theoretical_cdf)^2)

If fitting fails (NaN result), SSE is set to 1e10.
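The behaviour described in these notes can be approximated with scipy alone. The following is a minimal sketch, not the library's implementation: the function name `sketch_fit_sse` is hypothetical, and building the empirical CDF from cumulative histogram counts at the right bin edges is an assumption about how the comparison is made.

```python
import numpy as np
import scipy.stats as st


def sketch_fit_sse(data, bins, model):
    """Hypothetical re-creation of the documented behaviour (not the library code)."""
    values = np.asarray(data, dtype=float).ravel()

    # Assumption: empirical CDF = cumulative histogram counts at the right bin edges.
    counts, edges = np.histogram(values, bins=bins)
    empirical_cdf = np.cumsum(counts) / counts.sum()

    # Maximum likelihood fit; genpareto gets 0.01 as the initial shape guess.
    if model.name == "genpareto":
        params = model.fit(values, 0.01)
    else:
        params = model.fit(values)

    # Theoretical CDF evaluated at the same right bin edges.
    theoretical_cdf = model.cdf(edges[1:], *params)

    # SSE = sum((empirical_cdf - theoretical_cdf)^2), with the documented penalty on NaN.
    sse = np.sum((empirical_cdf - theoretical_cdf) ** 2)
    if np.isnan(sse):
        sse = 1e10
    return np.array([sse, *params])


sample = st.weibull_min.rvs(1.5, loc=0, scale=2, size=1000, random_state=0)
result = sketch_fit_sse(sample, bins=50, model=st.weibull_min)
```

Because scipy's `fit` returns shape parameters first, then loc and scale, prepending the SSE yields the same [SSE, shapes…, loc, scale] layout described under Returns.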

Examples

>>> import numpy as np
>>> import scipy.stats as st
>>> from environmentaltools.temporal import fit_distribution
>>> data = st.weibull_min.rvs(1.5, loc=0, scale=2, size=1000)
>>> results = fit_distribution(data, bins=50, model=st.weibull_min)
>>> sse, shape, loc, scale = results[0], results[1], results[2], results[3]
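The fixed indices above work for a distribution with exactly one shape parameter. For distributions with a different number of shape parameters, the documented layout [SSE, param1, …, loc, scale] can be unpacked from both ends. The sketch below assumes that layout holds; `split_result` is a hypothetical helper, and the arrays are placeholder values rather than real fit results.

```python
import numpy as np
import scipy.stats as st


def split_result(results, model):
    """Hypothetical helper: split [SSE, shapes..., loc, scale] into named parts."""
    results = np.asarray(results, dtype=float)
    sse = results[0]
    shapes = results[1:-2]            # empty for distributions without shape parameters
    loc, scale = results[-2], results[-1]
    # scipy distributions expose their number of shape parameters as `numargs`.
    if len(shapes) != model.numargs:
        raise ValueError("result length does not match the model's shape parameters")
    return sse, shapes, loc, scale


# Placeholder arrays in the documented layout (illustrative values only).
two_shape = np.array([0.02, 2.0, 5.0, 0.0, 1.0])   # st.beta has two shapes: a, b
no_shape = np.array([0.01, 0.0, 1.0])              # st.norm has no shape parameters

sse_b, shapes_b, loc_b, scale_b = split_result(two_shape, st.beta)
sse_n, shapes_n, loc_n, scale_n = split_result(no_shape, st.norm)
```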