environmentaltools.temporal.look_models

environmentaltools.temporal.look_models(data, variable, percentiles=[1], file_name='models_out', funcs='natural')

Fit multiple probability models to data and rank by estimation quality.

Tests various probability distributions from scipy.stats and ranks them by Sum of Squared Errors (SSE) to identify the best-fitting model for the data.

Parameters:
  • data (pd.DataFrame) – Raw time series containing the variable to fit

  • variable (str) – Name of the variable column to analyze

  • percentiles (list, optional) – CDF values at which a mixed distribution switches from one probability model to the next. Default: [1]

  • file_name (str, optional) – Name of the output file to save fitted parameters. Default: ‘models_out’

  • funcs (str or list, optional) – Probability models to test. Options:

      - ‘natural’: Common environmental distributions (default)
      - None: All continuous distributions in scipy.stats
      - list: Custom list of distribution names

Returns:

DataFrame with fitted parameters sorted by SSE (best fit first). Columns: ‘models’, ‘sse’, and distribution parameters (‘a’, ‘b’, ‘c’, etc.)

Return type:

pd.DataFrame

Notes

The ‘natural’ option includes distributions commonly used in environmental modeling: alpha, beta, expon, genpareto, genextreme, gamma, gumbel_r, gumbel_l, triang, lognorm, norm, rayleigh, weibull_min, weibull_max.
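
As a rough illustration of what a list of distribution names refers to, each name in the ‘natural’ set can be looked up as an attribute of scipy.stats. The sketch below is not taken from the library: the helper name resolve_models and the NATURAL constant are hypothetical, and how look_models resolves names internally is an assumption.

```python
from scipy import stats

# Names copied from the note above. NATURAL is a hypothetical constant,
# not an attribute exposed by environmentaltools.
NATURAL = [
    "alpha", "beta", "expon", "genpareto", "genextreme", "gamma",
    "gumbel_r", "gumbel_l", "triang", "lognorm", "norm", "rayleigh",
    "weibull_min", "weibull_max",
]


def resolve_models(names):
    """Map distribution names to scipy.stats continuous distribution objects.

    Hypothetical helper: look_models may resolve names differently.
    """
    return {name: getattr(stats, name) for name in names}


models = resolve_models(NATURAL)
```

A custom list passed as funcs would presumably use the same scipy.stats names, for example ['norm', 'gamma'].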

The SSE is computed between the empirical CDF and the fitted CDF. Lower SSE indicates better fit.
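
The ranking criterion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: the helper cdf_sse is hypothetical, the empirical CDF uses the plotting position i / (n + 1), and parameters come from scipy's maximum-likelihood fit. look_models may differ on any of these points.

```python
import numpy as np
from scipy import stats


def cdf_sse(sample, dist):
    """Return (SSE, fitted parameters) for one scipy.stats distribution.

    Hypothetical helper; the SSE is taken between the empirical CDF and
    the fitted CDF, both evaluated at the sorted sample values.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Empirical CDF with the i / (n + 1) plotting position (an assumption).
    ecdf = np.arange(1, n + 1) / (n + 1)
    params = dist.fit(x)  # maximum-likelihood estimate of shape, loc, scale
    sse = float(np.sum((ecdf - dist.cdf(x, *params)) ** 2))
    return sse, params


# Synthetic sample drawn from a normal distribution, for illustration only.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=500)

# Score two candidate models and order them from best to worst fit.
scores = {name: cdf_sse(sample, getattr(stats, name))[0] for name in ("norm", "expon")}
ranking = sorted(scores, key=scores.get)
```

With normally distributed input, the normal model should come out ahead of the exponential one, which mirrors the "best fit first" ordering of the returned DataFrame.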

Examples

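>>> from environmentaltools.temporal import look_models
>>> # data: pd.DataFrame with a 'wave_height' column
>>> results = look_models(data, 'wave_height', file_name='wave_models')
>>> print(results.head())  # Shows the five best-fitting models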