Temporal Module

Temporal module for time series analysis and simulation.

This module provides comprehensive tools for statistical characterization and simulation of environmental time series, including: - Marginal and joint distribution fitting - Non-stationary analysis - Extreme value analysis (POT and annual maxima) - Time series simulation - Classification and regime analysis

The subpackage temporal package aimed at providing users with a friendly, general code to statistically characterize a vector random process (RP) to obtain realizations of it. It is implemented in Python - an interpreted, high-level, object-oriented programming language widely used in the scientific community - and it makes the most of the Python packages ecosystem. Among the existing Python packages, it uses Numpy, which is the fundamental package for scientific computing in Python [1], SciPy, which offers a wide range of optimization and statistics routines [2]. Pandas [3] to analyse and manipulate data.

The tools implemented in the package named temporal allow to capture the statistical properties of a non stationary (NS) vector RP by using compound or piecewise parametric PMs to properly describe all the range of values and to simulate uni- or multivariate time series with the same random behavior. The statistical parameters of the distributions are assumed to depend on time and are expanded into a Generalized Fourier Series (GFS) [4] in order to reproduce their NS behavior. The applicability of the present approach has been illustrated in several works with different purposes, among others: (i) the observed wave climate variability in the preceding century and expected changes in projections under a climate change scenario [5]; (ii) the optimal design and management of an oscillating water column system [6] [7], (iii) the planning of maintenance strategies of coastal structures [8], (iv) the analysis of monthly Wolf sunspot number over a 22 year period [4], and (v) the simulation of estuarine water conditions for the management of the estuary [9].

Analysis Functions

Marginal Fitting

fit_marginal_distribution(df, parameters[, ...])

Fits a stationary (or not), simple or mixed probability model to data.

check_marginal_params(param)

Checks the input parameters and includes some required arguments for the computation

add_noise_to_array(data, variables[, ...])

Adds small random noise to the selected variable(s) in a time series for better estimations.

look_models(data, variable[, percentiles, ...])

Fit multiple probability models to data and rank by estimation quality.

Storm Analysis

storm_series(data, cols, info)

Extract storm events from time series data using threshold-based criteria.

storm_properties(data, cols, info)

Extract and characterize individual storm event properties.

Dependencies

dependencies(df, param)

Fit temporal dependency structure using Vector Autoregression (VAR) model.

check_dependencies_params(param)

Checks the input parameters and includes some required arguments for the computation of multivariate dependencies

fit_var_model(data, order)

Computes the coefficients of the VAR(p) model and chooses the model with lowest BIC.

varfit_OLS(y, z)

Estimates the parameters of VAR using the RMSE described in Lutkepohl (ecs.

ensemble_dt(models[, percentiles])

Compute ensemble of temporal dependency parameters from multiple models.

Indicators and Confidence

iso_indicators(indicators, reference, variable)

Compute indicators from iso-probability lines of non-stationary CDF.

confidence_bands(rmse, n, confidence_level)

Calculate confidence bands for model predictions using RMSE.

generate_outputfilename(parameters)

_summary_

Statistical Fitting Functions (Core)

Stationary Analysis

stationary_analysis(df, param)

Fit stationary probability distribution models to time series data.

fit_distribution(data, bins, model)

Fit a probability distribution and compute goodness-of-fit metric.

Non-Stationary Analysis

nonstationary_analysis(df, param)

Perform non-stationary probability analysis with time-varying parameters.

nonst_fit(df, param)

Fits a non-stationary probability model

fourier_initialization(df, param)

Initialize Fourier expansion parameters using moving window analysis.

fourier_expansion(data, par, param)

Estimate time-varying distribution parameters using Fourier expansion.

initial_params(param, par, pos, imode, comp)

Prepares the parameters from previous fit to the following

matching_lower_bound(par)

Matching conditions between two probability models (PMs).

Distribution Functions

fit(df_, param_, par0, mode_, ref)

Fits the data to the given probability model

negative_log_likelihood(par, df, imod, ...)

Compute negative log-likelihood for time-varying distribution parameters.

get_params(df, param, par, imod, t_expans)

Gets the parameters of the probability models for fitting

ppf(df, param)

Computes the inverse of the probability function

cdf(df, param[, ppf])

Computes the cumulative distribution function

transform(data, params)

Apply power transformation to normalize data distribution.

inverse_transform(data, params[, ensemble])

Reverse power transformation to original data scale.

numerical_cdf_pdf_at_n(n, param, variable[, ...])

Compute the cdf and pdf numerically at a given time "n" (normalized year)

ensemble_cdf(data, param, variable[, nodes])

Computes the ensemble of the cumulative distribution function for several models

ensemble_ppf(df, param, variable[, nodes])

Computes the inverse of the cumulative distribution function

params_t_expansion(mod, param, nper)

Computes the oscillatory dependency in time of the parameters

Simulation Functions

do_simulations(param[, tidalConstituents, ...])

Simulates time series with the probability model given in param and the temporal dependency given in df_dt.

check_parameters(param)

_summary_ TODO

var_simulation(par, lsim, distribution)

Creates new normalized multivariate time series

class_seasons(date[, type_])

Obtains the season of any date

Regime Analysis Functions

Confidence Intervals and Bootstrapping

confidence_intervals(boot, alpha, resample, ...)

Calculate confidence intervals using percentile or BCa bootstrap methods.

bootstrapping(peaks, tri, method, func, ...)

Perform bootstrap resampling for return period estimation.

Probability Models

probability_model_fit(data, tr, method, ...)

Estimate probability distribution parameters and return period values.

l_mom(data, func)

Estimate distribution parameters using L-moments method.

lmom_genpareto(data, tr, nyears)

Fit Generalized Pareto Distribution using L-moments with diagnostic metrics.

Extreme Value Analysis

pot_method(df, var_, window_size[, alpha, ...])

Perform Peaks Over Threshold (POT) analysis with multiple thresholds.

au2(par, data)

Calculate Anderson-Darling test statistic for GPD goodness-of-fit.

fit_gpd_bootstrap(df_eventos, alpha, umb, ...)

Compute GPD parameters with bootstrap resampling and confidence intervals.

annual_maxima_method(df, alpha, method, ...)

Fit GEV or Gumbel distribution to annual maxima with bootstrap CI.

Classification Functions

Storm Classification

class_storm_seasons(df_vars_ciclos[, type_])

Splits the data into seasons.

classification(cases, cases_sha, data, ...)

Classifies data using various machine learning classifiers.

MDA and Reconstruction

maximum_dissimilarity_algorithm(data, ...[, ...])

Implements the Maximum Dissimilarity Algorithm (Camus et al. 2011).

reconstruction(cases_deep, data_deep, ...[, ...])

Reconstructs deep water variables from shallow water data using regression methods.

regression(base_train, target_train, base_pred)

Performs regression using various interpolation and machine learning methods.

normalize(data, variables[, circular])

Normalizes data using the maximum distance between values

Copula Analysis

Bivariate Dependence Modeling

class environmentaltools.temporal.Copula(X, Y, family, *args)[source]

Bases: object

Copula-based bivariate dependence modeling.

Estimates copula parameters for joint random variables and generates correlated samples. Supports Clayton, Frank, and Gumbel copula families.

Parameters:
  • X (np.ndarray) – Data of the first variable (1-dimensional)

  • Y (np.ndarray) – Data of the second variable (1-dimensional, same size as X)

  • family (str) – Copula family: ‘clayton’, ‘frank’, or ‘gumbel’

  • *args (tuple, optional) – Theoretical marginal distributions: (F1, p1, F2, p2) where F1, F2 are scipy.stats distributions and p1, p2 are parameter tuples

X

First variable data

Type:

np.ndarray

Y

Second variable data

Type:

np.ndarray

family

Selected copula family

Type:

str

theta

Estimated copula parameter

Type:

float

tau

Kendall’s tau correlation coefficient

Type:

float

pr

Pearson correlation coefficient and p-value

Type:

tuple

sr

Spearman correlation coefficient and p-value

Type:

tuple

U

Generated uniform samples for first variable

Type:

np.ndarray or None

V

Generated uniform samples for second variable

Type:

np.ndarray or None

X1

Generated samples in original scale for first variable

Type:

np.ndarray

Y1

Generated samples in original scale for second variable

Type:

np.ndarray

Notes

Based on the ambhas 0.4.0 module by Sat Kumar Tomer. Original: http://civil.iisc.ernet.in/~satkumar/

Modifications include: - Conditional copulas for all families - Replaced statistics package with sklearn.neighbors.KernelDensity

The copula parameter theta is estimated from Kendall’s tau: - Clayton: theta = 2*tau / (1 - tau) - Frank: theta optimized via Debye function - Gumbel: theta = 1 / (1 - tau)

Examples

>>> from scipy.stats import norm, lognorm
>>> x = np.random.normal(0, 1, 1000)
>>> y = 2*x + np.random.normal(0, 0.5, 1000)
>>> cop = Copula(x, y, 'clayton')
>>> cop.generate_xy(500)
>>> print(cop.theta, cop.tau)

References

Nelsen, R.B. (2006). An Introduction to Copulas. Springer.

Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall.

estimate(data=None)[source]

Estimate ensemble statistics for Y conditioned on X values.

Parameters:

data (np.ndarray, optional) – X values at which to estimate Y statistics. If None, uses self.X.

Returns:

  • Y1_mean (np.ndarray) – Mean of Y for each X value

  • Y1_std (np.ndarray) – Standard deviation of Y for each X value

  • Y1_ll (np.ndarray) – 25th percentile (lower quartile) of Y for each X value

  • Y1_ul (np.ndarray) – 75th percentile (upper quartile) of Y for each X value

Notes

The function: 1. Generates copula ensemble if not already available (10000 samples) 2. Bins X1 samples into 50 bins 3. Computes statistics within each bin 4. Interpolates statistics linearly to provided data points

Useful for uncertainty quantification and conditional predictions.

Examples

>>> cop = Copula(x, y, 'frank')
>>> x_new = np.linspace(x.min(), x.max(), 100)
>>> y_mean, y_std, y_ll, y_ul = cop.estimate(x_new)
>>> plt.plot(x_new, y_mean)
>>> plt.fill_between(x_new, y_ll, y_ul, alpha=0.3)
generate_C(u, v)[source]

Compute copula values C(u, v) on a grid.

Parameters:
  • u (np.ndarray) – Grid points for first variable (uniform scale)

  • v (np.ndarray) – Grid points for second variable (uniform scale)

Notes

Computes the copula CDF C(u,v) = P(U ≤ u, V ≤ v) on a meshgrid using family-specific formulas:

Clayton: C(u,v) = (u^(-theta) + v^(-theta) - 1)^(-1/theta)

Frank: C(u,v) = -1/theta * log(1 + (exp(-theta*u)-1)*(exp(-theta*v)-1)/(exp(-theta)-1))

Gumbel: C(u,v) = exp(-((−log u)^theta + (−log v)^theta)^(1/theta))

Results stored in self.C as a 2D array.

Examples

>>> u = np.linspace(0.01, 0.99, 50)
>>> v = np.linspace(0.01, 0.99, 50)
>>> cop.generate_C(u, v)
>>> plt.contourf(u, v, cop.C)
generate_cond()[source]

Generate conditional samples V given U using the conditional copula.

Generates V from the conditional distribution C(V|U) where U is provided in self.U. Uses the conditional copula: C(v|u) = ∂C(u,v)/∂u

Raises:

ValueError – For Clayton: if theta <= -1 or theta == 0 For Frank: if theta == 0 For Gumbel: if theta <= 1

Notes

Conditional generation formulas:

Clayton:
V = U^(-theta) * (U^(-theta) + W^(-theta) - 1)^(-1/theta) /

(U * (U^(-theta) + W^(-theta) - 1))

Frank:

Uses exponential transformation of conditional inverse

Gumbel:

Complex expression involving logarithms and power transformations

Results stored in self.V. Requires self.U to be set beforehand.

Examples

>>> cop.U = np.array([0.2, 0.5, 0.8])
>>> cop.generate_cond()
>>> print(cop.V)
generate_uv(n=1000)[source]

Generate random uniform variables (U, V) from the copula.

Parameters:

n (int, optional) – Number of copula samples to generate. Default is 1000.

Raises:

ValueError – For Clayton: if theta <= -1 or theta == 0 For Frank: if theta == 0 For Gumbel: if theta <= 1

Notes

Generation algorithms by copula family:

Clayton: Uses conditional distribution method - U ~ Uniform(0,1) - V = U * (W^(-theta/(1+theta)) - 1 + U^theta)^(-1/theta)

Frank: Uses conditional inverse method - Solves implicit equation involving exponentials

Gumbel: Uses Marshall-Olkin algorithm - Generates stable distribution random variables - Uses rejection sampling for complex dependence structure

Results stored in self.U and self.V attributes.

Examples

>>> cop = Copula(x, y, 'gumbel')
>>> cop.generate_uv(500)
>>> print(cop.U[:5], cop.V[:5])
generate_xy(n=0)[source]

Generate random samples (X, Y) in original scale from the copula.

Parameters:

n (int, optional) – Number of samples to generate. If 0 and U is not None, generates conditional samples. Default is 0.

Notes

The function:

  1. Generates (U, V) from copula if not already generated

  2. Transforms to original scale using:

    • Theoretical marginals (if provided in __init__)

    • Empirical inverse CDFs via kernel density estimation

  3. Stores results in self.X1 and self.Y1

  4. Resets self.U and self.V to None

If U is already set, generates conditional V given U. Uses Epanechnikov kernel with bandwidth=0.1 for empirical CDFs.

Examples

>>> cop = Copula(x, y, 'clayton')
>>> cop.generate_xy(1000)
>>> plt.scatter(cop.X1, cop.Y1)

Utility Functions

Event Detection

environmentaltools.temporal.extreme_events(data, var_name, threshold, minimum_interarrival_time, minimum_cycle_length, interpolation=False, interpolation_method='linear', interpolation_freq='1min', truncate=False, extra_info=False)[source]

Extract storm and calm cycles from time series data using threshold-based detection.

Identifies independent extreme events (storms) and calm periods by detecting threshold crossings, merging nearby events, and filtering short-duration events. Optionally interpolates crossing times for improved temporal resolution.

Parameters:
  • data (pd.DataFrame) – Time series data with datetime index

  • var_name (str) – Name of the variable column to analyze

  • threshold (float) – Threshold value separating storm (cycle) and calm periods

  • minimum_interarrival_time (pd.Timedelta) – Minimum time between storms to ensure statistical independence

  • minimum_cycle_length (pd.Timedelta) – Minimum storm duration to consider it valid

  • interpolation (bool, default=False) – Enable interpolation to estimate precise threshold crossing times

  • interpolation_method (str, default='linear') – Interpolation method: ‘linear’, ‘cubic’, ‘nearest’, etc.

  • interpolation_freq (str, default='1min') – Frequency for interpolation grid (e.g., ‘1min’, ’10s’, ‘1h’)

  • truncate (bool, default=False) – Clip storm values below threshold and calm values above threshold to the threshold value

  • extra_info (bool, default=False) – Return additional diagnostic information

Returns:

  • cycles (list of pd.Series) – List of storm cycles, each as a Series with values and timestamps

  • calm_periods (list of pd.Series) – List of calm periods between storms

  • info (dict, optional) – Additional information (only if extra_info=True):

    • ’data_cycles’pd.DataFrame

      Data for all storm periods

    • ’data_calm_periods’pd.DataFrame

      Data for all calm periods

    • ’cycles_indexes_clipped’list

      Indices where storm values were clipped (if truncate=True)

    • ’calm_periods_indexes_clipped’list

      Indices where calm values were clipped (if truncate=True)

    • ’interpolation_indexes’np.ndarray

      Indices of interpolated threshold crossings (if interpolation=True)

Notes

The algorithm follows these steps:

  1. Identify initial threshold crossings

  2. Optionally interpolate to refine crossing times

  3. Merge nearby storms (within minimum_interarrival_time)

  4. Remove short storms (< minimum_cycle_length)

  5. Optionally truncate values to threshold

Storm independence is crucial for extreme value analysis. The minimum_interarrival_time should be chosen based on the decorrelation time of the process (typically 3-5 times the characteristic timescale).

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'Hs': [1.2, 2.5, 3.1, 2.8, 1.5, 0.8, 1.0, 2.9, 3.5, 2.1]
... }, index=pd.date_range('2020', periods=10, freq='6h'))
>>> cycles, calms = extreme_events(
...     df, 'Hs', threshold=2.0,
...     minimum_interarrival_time=pd.Timedelta('12h'),
...     minimum_cycle_length=pd.Timedelta('6h')
... )
>>> print(f"Found {len(cycles)} storm events")
environmentaltools.temporal.events_duration(events)[source]

Calculate duration of each event (storm or calm period).

Computes the time span of each event from its first to last timestamp.

Parameters:

events (list of pd.Series) – List of events where each element is a Series with values and datetime index representing the event time evolution

Returns:

Duration of each event (as timedelta) with the event start time as index

Return type:

pd.Series

Examples

>>> events = [
...     pd.Series([2.5, 3.0, 2.8], index=pd.date_range('2020-01-01', periods=3, freq='1h')),
...     pd.Series([2.2, 2.7], index=pd.date_range('2020-01-05', periods=2, freq='1h'))
... ]
>>> durations = events_duration(events)
>>> print(durations)
environmentaltools.temporal.values_over_threshold(data, threshold)[source]

Identify indices where data exceeds threshold using sparse arrays.

Efficiently detects contiguous blocks of values above threshold using pandas sparse arrays for memory efficiency.

Parameters:
  • data (pd.Series) – Time series data

  • threshold (float) – Threshold value for event detection

Returns:

  • cycles_start (np.ndarray) – Integer indices marking the start of each exceedance block

  • cycles_end (np.ndarray) – Integer indices marking the end (exclusive) of each exceedance block

Notes

Uses pandas sparse block arrays for efficient storage and detection of contiguous regions above threshold.

Examples

>>> data = pd.Series([1.0, 2.5, 3.0, 2.8, 1.5, 0.8, 2.9, 3.1])
>>> start, end = values_over_threshold(data, threshold=2.0)
>>> print(f"Exceedance blocks: {list(zip(start, end))}")

Interpolation

environmentaltools.temporal.interpolation_series(data, index_start, index_end)[source]

Prepare data for interpolation between cycle boundaries.

Creates a DataFrame with start and end values grouped for subsequent interpolation across threshold crossings.

Parameters:
  • data (pd.Series) – Time series data

  • index_start (array-like) – Indices just before threshold crossings

  • index_end (array-like) – Indices just after threshold crossings

Returns:

Combined data with ‘value’ and ‘group’ columns for interpolation

Return type:

pd.DataFrame

Notes

Groups are created to interpolate each crossing independently.

environmentaltools.temporal.interpolation_boundaries_index(data, cycles_start, cycles_end, interpolation_freq, interpolation_method, threshold)[source]

Find precise threshold crossing times through interpolation.

Interpolates between data points surrounding threshold crossings to estimate the exact time when the threshold was crossed.

Parameters:
  • data (pd.Series) – Time series data with datetime index

  • cycles_start (np.ndarray) – Indices of cycle start positions

  • cycles_end (np.ndarray) – Indices of cycle end positions

  • interpolation_freq (str) – Temporal frequency for interpolation grid (e.g., ‘1min’, ’10s’)

  • interpolation_method (str) – Interpolation method: ‘linear’, ‘cubic’, ‘quadratic’, etc.

  • threshold (float) – Threshold value to find crossing times for

Returns:

DatetimeIndex values of estimated threshold crossing times

Return type:

np.ndarray

Notes

The function: 1. Creates dense time grid at interpolation_freq around crossings 2. Interpolates data values onto this grid 3. Finds timestamps closest to threshold value

This provides sub-sample temporal resolution for event boundaries.

environmentaltools.temporal.interpolation_nearest(group, threshold)[source]

Find nearest interpolated value to threshold.

Helper function to locate the timestamp where interpolated values are closest to the threshold value.

Parameters:
  • group (pd.DataFrame) – DataFrame with ‘value’ column containing interpolated data

  • threshold (float) – Target threshold value

Returns:

Single-row DataFrame with the value closest to threshold

Return type:

pd.DataFrame

Notes

Uses absolute difference to find the nearest match. Returns the row with minimum absolute difference (value - threshold).

Event Analysis

environmentaltools.temporal.extreme_indexes(data, extreme_start, extreme_end)[source]

Create boolean mask for extreme event periods.

Generates a boolean array indicating which time steps fall within any extreme event period.

Parameters:
  • data (pd.Series) – Time series data (used for length reference)

  • extreme_start (np.ndarray) – Array of event start indices

  • extreme_end (np.ndarray) – Array of event end indices (exclusive)

Returns:

Boolean array where True indicates time steps within extreme events

Return type:

np.ndarray

Notes

The function: 1. Creates condition matrix checking if each index falls within any event 2. Uses broadcasting: (n_events, n_timesteps) comparison 3. Returns True if index >= start AND < end for any event

Examples

>>> data = pd.Series(range(100))
>>> starts = np.array([10, 50])
>>> ends = np.array([20, 60])
>>> mask = extreme_indexes(data, starts, ends)
>>> mask[15]  # Within first event
True
>>> mask[30]  # Between events
False
environmentaltools.temporal.near_events(data, cycles_start, cycles_end, minimum_interarrival_time, interpolation)[source]

Identify non-independent events for merging.

Determines which consecutive events are separated by less than the minimum interarrival time, indicating they should be merged into single events.

Parameters:
  • data (pd.Series) – Time series data with datetime index

  • cycles_start (np.ndarray) – Array of cycle start indices

  • cycles_end (np.ndarray) – Array of cycle end indices

  • minimum_interarrival_time (pd.Timedelta) – Minimum time gap required between independent events

  • interpolation (bool) – Whether interpolation was used for threshold crossings

Returns:

Boolean array indicating which events are too close to the previous event

Return type:

np.ndarray

Notes

Implements the independence criterion for extreme events:

  • Events separated by less than minimum_interarrival_time are considered part of the same storm system

  • Typically set based on decorrelation time scale of the process

The interpolation flag adjusts index selection:

  • True: Uses exact crossing times (cycles_end - 1)

  • False: Uses data sample indices (cycles_start - 1, cycles_end)

Examples

>>> minimum_gap = pd.Timedelta(hours=72)
>>> near = near_events(data, starts, ends, minimum_gap, interpolation=False)
>>> # Merge events where near is True

References