environmentaltools.temporal.look_models
- environmentaltools.temporal.look_models(data, variable, percentiles=[1], file_name='models_out', funcs='natural')
Fit multiple probability models to data and rank by estimation quality.
Tests various probability distributions from scipy.stats and ranks them by Sum of Squared Errors (SSE) to identify the best-fitting model for the data.
- Parameters:
data (pd.DataFrame) – Raw time series containing the variable to fit
variable (str) – Name of the variable column to analyze
percentiles (list, optional) – CDF values at which a mixed distribution switches from one probability model to the next. Default: [1]
file_name (str, optional) – Name of the output file to save fitted parameters. Default: ‘models_out’
funcs (str or list, optional) – Probability models to test. Options:
  - ‘natural’: Common environmental distributions (default)
  - None: All continuous distributions in scipy.stats
  - list: Custom list of distribution names
- Returns:
DataFrame with fitted parameters sorted by SSE (best fit first). Columns: ‘models’, ‘sse’, and distribution parameters (‘a’, ‘b’, ‘c’, etc.)
- Return type:
pd.DataFrame
Notes
The ‘natural’ option includes distributions commonly used in environmental modeling: alpha, beta, expon, genpareto, genextreme, gamma, gumbel_r, gumbel_l, triang, lognorm, norm, rayleigh, weibull_min, weibull_max.
The SSE is computed between the empirical CDF and the fitted CDF of each candidate model. A lower SSE indicates a better fit.
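The ranking criterion can be illustrated with a minimal, dependency-free sketch. This is not the library's implementation, which fits scipy.stats distributions. The helper names (empirical_cdf, rank_by_sse), the plotting position i / (n + 1), and the choice of only two candidate models with closed-form maximum-likelihood fits are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def empirical_cdf(n):
    """Plotting positions i / (n + 1) for a sorted sample (an assumed convention)."""
    return [(i + 1) / (n + 1) for i in range(n)]


def rank_by_sse(sample):
    """Fit two candidate models and rank them by SSE against the empirical CDF."""
    xs = sorted(sample)
    ecdf = empirical_cdf(len(xs))

    # Normal model: the maximum-likelihood estimates are the sample mean
    # and the population standard deviation.
    normal = NormalDist(fmean(xs), pstdev(xs))
    # Exponential model with location fixed at 0 (an assumption):
    # the maximum-likelihood estimate of the scale is the sample mean.
    scale = fmean(xs)

    fitted = {
        "norm": [normal.cdf(x) for x in xs],
        "expon": [1.0 - math.exp(-x / scale) if x > 0 else 0.0 for x in xs],
    }
    # Sum of squared differences between fitted and empirical CDF values.
    sse = {
        name: sum((f - e) ** 2 for f, e in zip(cdf, ecdf))
        for name, cdf in fitted.items()
    }
    # Best fit (lowest SSE) first, mirroring the ordering described above.
    return sorted(sse.items(), key=lambda item: item[1])


random.seed(0)
sample = [random.gauss(10.0, 2.0) for _ in range(500)]  # synthetic data
ranking = rank_by_sse(sample)
for name, value in ranking:
    print(f"{name}: SSE = {value:.4f}")
```

Because the synthetic sample is drawn from a normal distribution, the normal model ranks first here. The exponential model, with its location pinned at zero, cannot reproduce the shape of the empirical CDF and receives a much larger SSE.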
Examples
>>> results = look_models(data, 'wave_height', file_name='wave_models')
>>> print(results.head())  # Shows top 5 best-fitting models