Temporal Module

Temporal module for time series analysis and simulation.

This module provides comprehensive tools for statistical characterization and simulation of environmental time series, including: - Marginal and joint distribution fitting - Non-stationary analysis - Extreme value analysis (POT and annual maxima) - Time series simulation - Classification and regime analysis

The subpackage temporal package aimed at providing users with a friendly, general code to statistically characterize a vector random process (RP) to obtain realizations of it. It is implemented in Python - an interpreted, high-level, object-oriented programming language widely used in the scientific community - and it makes the most of the Python packages ecosystem. Among the existing Python packages, it uses Numpy, which is the fundamental package for scientific computing in Python [1], SciPy, which offers a wide range of optimization and statistics routines [2]. Pandas [3] to analyse and manipulate data.

The tools implemented in the package named temporal allow to capture the statistical properties of a non stationary (NS) vector RP by using compound or piecewise parametric PMs to properly describe all the range of values and to simulate uni- or multivariate time series with the same random behavior. The statistical parameters of the distributions are assumed to depend on time and are expanded into a Generalized Fourier Series (GFS) [4] in order to reproduce their NS behavior. The applicability of the present approach has been illustrated in several works with different purposes, among others: (i) the observed wave climate variability in the preceding century and expected changes in projections under a climate change scenario [5]; (ii) the optimal design and management of an oscillating water column system [6] [7], (iii) the planning of maintenance strategies of coastal structures [8], (iv) the analysis of monthly Wolf sunspot number over a 22 year period [4], and (v) the simulation of estuarine water conditions for the management of the estuary [9].

Analysis Functions

Marginal Fitting

`fit_marginal_distribution`(df, parameters[, ...])	Fits a stationary (or not), simple or mixed probability model to data.
`check_marginal_params`(param)	Checks the input parameters and includes some required arguments for the computation
`add_noise_to_array`(data, variables[, ...])	Adds small random noise to the selected variable(s) in a time series for better estimations.
`look_models`(data, variable[, percentiles, ...])	Fit multiple probability models to data and rank by estimation quality.

Storm Analysis

`storm_series`(data, cols, info)	Extract storm events from time series data using threshold-based criteria.
`storm_properties`(data, cols, info)	Extract and characterize individual storm event properties.

Dependencies

`dependencies`(df, param)	Fit temporal dependency structure using Vector Autoregression (VAR) model.
`check_dependencies_params`(param)	Checks the input parameters and includes some required arguments for the computation of multivariate dependencies
`fit_var_model`(data, order)	Computes the coefficients of the VAR(p) model and chooses the model with lowest BIC.
`varfit_OLS`(y, z)	Estimates the parameters of VAR using the RMSE described in Lutkepohl (ecs.
`ensemble_dt`(models[, percentiles])	Compute ensemble of temporal dependency parameters from multiple models.

Indicators and Confidence

`iso_indicators`(indicators, reference, variable)	Compute indicators from iso-probability lines of non-stationary CDF.
`confidence_bands`(rmse, n, confidence_level)	Calculate confidence bands for model predictions using RMSE.
`generate_outputfilename`(parameters)	_summary_

Statistical Fitting Functions (Core)

Stationary Analysis

`stationary_analysis`(df, param)	Fit stationary probability distribution models to time series data.
`fit_distribution`(data, bins, model)	Fit a probability distribution and compute goodness-of-fit metric.

Non-Stationary Analysis

`nonstationary_analysis`(df, param)	Perform non-stationary probability analysis with time-varying parameters.
`nonst_fit`(df, param)	Fits a non-stationary probability model
`fourier_initialization`(df, param)	Initialize Fourier expansion parameters using moving window analysis.
`fourier_expansion`(data, par, param)	Estimate time-varying distribution parameters using Fourier expansion.
`initial_params`(param, par, pos, imode, comp)	Prepares the parameters from previous fit to the following
`matching_lower_bound`(par)	Matching conditions between two probability models (PMs).

Distribution Functions

`fit`(df_, param_, par0, mode_, ref)	Fits the data to the given probability model
`negative_log_likelihood`(par, df, imod, ...)	Compute negative log-likelihood for time-varying distribution parameters.
`get_params`(df, param, par, imod, t_expans)	Gets the parameters of the probability models for fitting
`ppf`(df, param)	Computes the inverse of the probability function
`cdf`(df, param[, ppf])	Computes the cumulative distribution function
`transform`(data, params)	Apply power transformation to normalize data distribution.
`inverse_transform`(data, params[, ensemble])	Reverse power transformation to original data scale.
`numerical_cdf_pdf_at_n`(n, param, variable[, ...])	Compute the cdf and pdf numerically at a given time "n" (normalized year)
`ensemble_cdf`(data, param, variable[, nodes])	Computes the ensemble of the cumulative distribution function for several models
`ensemble_ppf`(df, param, variable[, nodes])	Computes the inverse of the cumulative distribution function
`params_t_expansion`(mod, param, nper)	Computes the oscillatory dependency in time of the parameters

Simulation Functions

`do_simulations`(param[, tidalConstituents, ...])	Simulates time series with the probability model given in param and the temporal dependency given in df_dt.
`check_parameters`(param)	_summary_ TODO
`var_simulation`(par, lsim, distribution)	Creates new normalized multivariate time series
`class_seasons`(date[, type_])	Obtains the season of any date

Regime Analysis Functions

Confidence Intervals and Bootstrapping

`confidence_intervals`(boot, alpha, resample, ...)	Calculate confidence intervals using percentile or BCa bootstrap methods.
`bootstrapping`(peaks, tri, method, func, ...)	Perform bootstrap resampling for return period estimation.

Probability Models

`probability_model_fit`(data, tr, method, ...)	Estimate probability distribution parameters and return period values.
`l_mom`(data, func)	Estimate distribution parameters using L-moments method.
`lmom_genpareto`(data, tr, nyears)	Fit Generalized Pareto Distribution using L-moments with diagnostic metrics.

Extreme Value Analysis

`pot_method`(df, var_, window_size[, alpha, ...])	Perform Peaks Over Threshold (POT) analysis with multiple thresholds.
`au2`(par, data)	Calculate Anderson-Darling test statistic for GPD goodness-of-fit.
`fit_gpd_bootstrap`(df_eventos, alpha, umb, ...)	Compute GPD parameters with bootstrap resampling and confidence intervals.
`annual_maxima_method`(df, alpha, method, ...)	Fit GEV or Gumbel distribution to annual maxima with bootstrap CI.

Classification Functions

Storm Classification

`class_storm_seasons`(df_vars_ciclos[, type_])	Splits the data into seasons.
`classification`(cases, cases_sha, data, ...)	Classifies data using various machine learning classifiers.

MDA and Reconstruction

`maximum_dissimilarity_algorithm`(data, ...[, ...])	Implements the Maximum Dissimilarity Algorithm (Camus et al. 2011).
`reconstruction`(cases_deep, data_deep, ...[, ...])	Reconstructs deep water variables from shallow water data using regression methods.
`regression`(base_train, target_train, base_pred)	Performs regression using various interpolation and machine learning methods.
`normalize`(data, variables[, circular])	Normalizes data using the maximum distance between values

Copula Analysis

Bivariate Dependence Modeling

class environmentaltools.temporal.Copula(X, Y, family, *args)[source]

Bases: object

Copula-based bivariate dependence modeling.

Estimates copula parameters for joint random variables and generates correlated samples. Supports Clayton, Frank, and Gumbel copula families.

Parameters:

X (np.ndarray) – Data of the first variable (1-dimensional)
Y (np.ndarray) – Data of the second variable (1-dimensional, same size as X)
family (str) – Copula family: ‘clayton’, ‘frank’, or ‘gumbel’
*args (tuple, optional) – Theoretical marginal distributions: (F1, p1, F2, p2) where F1, F2 are scipy.stats distributions and p1, p2 are parameter tuples

X

First variable data

Type:: np.ndarray

Y

Second variable data

Type:: np.ndarray

family

Selected copula family

Type:: str

theta

Estimated copula parameter

Type:: float

tau

Kendall’s tau correlation coefficient

Type:: float

pr

Pearson correlation coefficient and p-value

Type:: tuple

sr

Spearman correlation coefficient and p-value

Type:: tuple

U

Generated uniform samples for first variable

Type:: np.ndarray or None

V

Generated uniform samples for second variable

Type:: np.ndarray or None

X1

Generated samples in original scale for first variable

Type:: np.ndarray

Y1

Generated samples in original scale for second variable

Type:: np.ndarray

Notes

Based on the ambhas 0.4.0 module by Sat Kumar Tomer. Original: http://civil.iisc.ernet.in/~satkumar/

Modifications include: - Conditional copulas for all families - Replaced statistics package with sklearn.neighbors.KernelDensity

The copula parameter theta is estimated from Kendall’s tau: - Clayton: theta = 2*tau / (1 - tau) - Frank: theta optimized via Debye function - Gumbel: theta = 1 / (1 - tau)

Examples

>>> from scipy.stats import norm, lognorm
>>> x = np.random.normal(0, 1, 1000)
>>> y = 2*x + np.random.normal(0, 0.5, 1000)
>>> cop = Copula(x, y, 'clayton')
>>> cop.generate_xy(500)
>>> print(cop.theta, cop.tau)

References

Nelsen, R.B. (2006). An Introduction to Copulas. Springer.

Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall.

estimate(data=None)[source]

Estimate ensemble statistics for Y conditioned on X values.

Parameters:

data (np.ndarray, optional) – X values at which to estimate Y statistics. If None, uses self.X.

Returns:

Y1_mean (np.ndarray) – Mean of Y for each X value
Y1_std (np.ndarray) – Standard deviation of Y for each X value
Y1_ll (np.ndarray) – 25th percentile (lower quartile) of Y for each X value
Y1_ul (np.ndarray) – 75th percentile (upper quartile) of Y for each X value

Notes

The function: 1. Generates copula ensemble if not already available (10000 samples) 2. Bins X1 samples into 50 bins 3. Computes statistics within each bin 4. Interpolates statistics linearly to provided data points

Useful for uncertainty quantification and conditional predictions.

Examples

>>> cop = Copula(x, y, 'frank')
>>> x_new = np.linspace(x.min(), x.max(), 100)
>>> y_mean, y_std, y_ll, y_ul = cop.estimate(x_new)
>>> plt.plot(x_new, y_mean)
>>> plt.fill_between(x_new, y_ll, y_ul, alpha=0.3)

generate_C(u, v)[source]

Compute copula values C(u, v) on a grid.

Parameters:

u (np.ndarray) – Grid points for first variable (uniform scale)
v (np.ndarray) – Grid points for second variable (uniform scale)

Notes

Computes the copula CDF C(u,v) = P(U ≤ u, V ≤ v) on a meshgrid using family-specific formulas:

Clayton: C(u,v) = (u^(-theta) + v^(-theta) - 1)^(-1/theta)

Frank: C(u,v) = -1/theta * log(1 + (exp(-theta*u)-1)*(exp(-theta*v)-1)/(exp(-theta)-1))

Gumbel: C(u,v) = exp(-((−log u)^theta + (−log v)^theta)^(1/theta))

Results stored in self.C as a 2D array.

Examples

>>> u = np.linspace(0.01, 0.99, 50)
>>> v = np.linspace(0.01, 0.99, 50)
>>> cop.generate_C(u, v)
>>> plt.contourf(u, v, cop.C)

generate_cond()[source]

Generate conditional samples V given U using the conditional copula.

Generates V from the conditional distribution C(V|U) where U is provided in self.U. Uses the conditional copula: C(v|u) = ∂C(u,v)/∂u

Raises:: ValueError – For Clayton: if theta <= -1 or theta == 0 For Frank: if theta == 0 For Gumbel: if theta <= 1

Notes

Conditional generation formulas:

Clayton:

V = U^(-theta) * (U^(-theta) + W^(-theta) - 1)^(-1/theta) /: (U * (U^(-theta) + W^(-theta) - 1))

Frank:

Uses exponential transformation of conditional inverse

Gumbel:

Complex expression involving logarithms and power transformations

Results stored in self.V. Requires self.U to be set beforehand.

Examples

>>> cop.U = np.array([0.2, 0.5, 0.8])
>>> cop.generate_cond()
>>> print(cop.V)

generate_uv(n=1000)[source]

Generate random uniform variables (U, V) from the copula.

Parameters:: n (int, optional) – Number of copula samples to generate. Default is 1000.
Raises:: ValueError – For Clayton: if theta <= -1 or theta == 0 For Frank: if theta == 0 For Gumbel: if theta <= 1

Notes

Generation algorithms by copula family:

Clayton: Uses conditional distribution method - U ~ Uniform(0,1) - V = U * (W^(-theta/(1+theta)) - 1 + U^theta)^(-1/theta)

Frank: Uses conditional inverse method - Solves implicit equation involving exponentials

Gumbel: Uses Marshall-Olkin algorithm - Generates stable distribution random variables - Uses rejection sampling for complex dependence structure

Results stored in self.U and self.V attributes.

Examples

>>> cop = Copula(x, y, 'gumbel')
>>> cop.generate_uv(500)
>>> print(cop.U[:5], cop.V[:5])

generate_xy(n=0)[source]

Generate random samples (X, Y) in original scale from the copula.

Parameters:: n (int, optional) – Number of samples to generate. If 0 and U is not None, generates conditional samples. Default is 0.

Notes

The function:

Generates (U, V) from copula if not already generated
Transforms to original scale using:
- Theoretical marginals (if provided in __init__)
- Empirical inverse CDFs via kernel density estimation
Stores results in self.X1 and self.Y1
Resets self.U and self.V to None

If U is already set, generates conditional V given U. Uses Epanechnikov kernel with bandwidth=0.1 for empirical CDFs.

Examples

>>> cop = Copula(x, y, 'clayton')
>>> cop.generate_xy(1000)
>>> plt.scatter(cop.X1, cop.Y1)

Utility Functions

Event Detection

environmentaltools.temporal.extreme_events(data, var_name, threshold, minimum_interarrival_time, minimum_cycle_length, interpolation=False, interpolation_method='linear', interpolation_freq='1min', truncate=False, extra_info=False)[source]

Extract storm and calm cycles from time series data using threshold-based detection.

Identifies independent extreme events (storms) and calm periods by detecting threshold crossings, merging nearby events, and filtering short-duration events. Optionally interpolates crossing times for improved temporal resolution.

Parameters:

data (pd.DataFrame) – Time series data with datetime index
var_name (str) – Name of the variable column to analyze
threshold (float) – Threshold value separating storm (cycle) and calm periods
minimum_interarrival_time (pd.Timedelta) – Minimum time between storms to ensure statistical independence
minimum_cycle_length (pd.Timedelta) – Minimum storm duration to consider it valid
interpolation (bool, default=False) – Enable interpolation to estimate precise threshold crossing times
interpolation_method (str, default='linear') – Interpolation method: ‘linear’, ‘cubic’, ‘nearest’, etc.
interpolation_freq (str, default='1min') – Frequency for interpolation grid (e.g., ‘1min’, ’10s’, ‘1h’)
truncate (bool, default=False) – Clip storm values below threshold and calm values above threshold to the threshold value
extra_info (bool, default=False) – Return additional diagnostic information

Returns:

cycles (list of pd.Series) – List of storm cycles, each as a Series with values and timestamps
calm_periods (list of pd.Series) – List of calm periods between storms
info (dict, optional) – Additional information (only if extra_info=True):
- ’data_cycles’pd.DataFrame
  Data for all storm periods
- ’data_calm_periods’pd.DataFrame
  Data for all calm periods
- ’cycles_indexes_clipped’list
  Indices where storm values were clipped (if truncate=True)
- ’calm_periods_indexes_clipped’list
  Indices where calm values were clipped (if truncate=True)
- ’interpolation_indexes’np.ndarray
  Indices of interpolated threshold crossings (if interpolation=True)

Notes

The algorithm follows these steps:

Identify initial threshold crossings
Optionally interpolate to refine crossing times
Merge nearby storms (within minimum_interarrival_time)
Remove short storms (< minimum_cycle_length)
Optionally truncate values to threshold

Storm independence is crucial for extreme value analysis. The minimum_interarrival_time should be chosen based on the decorrelation time of the process (typically 3-5 times the characteristic timescale).

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'Hs': [1.2, 2.5, 3.1, 2.8, 1.5, 0.8, 1.0, 2.9, 3.5, 2.1]
... }, index=pd.date_range('2020', periods=10, freq='6h'))
>>> cycles, calms = extreme_events(
...     df, 'Hs', threshold=2.0,
...     minimum_interarrival_time=pd.Timedelta('12h'),
...     minimum_cycle_length=pd.Timedelta('6h')
... )
>>> print(f"Found {len(cycles)} storm events")

environmentaltools.temporal.events_duration(events)[source]

Calculate duration of each event (storm or calm period).

Computes the time span of each event from its first to last timestamp.

Parameters:: events (list of pd.Series) – List of events where each element is a Series with values and datetime index representing the event time evolution
Returns:: Duration of each event (as timedelta) with the event start time as index
Return type:: pd.Series

Examples

>>> events = [
...     pd.Series([2.5, 3.0, 2.8], index=pd.date_range('2020-01-01', periods=3, freq='1h')),
...     pd.Series([2.2, 2.7], index=pd.date_range('2020-01-05', periods=2, freq='1h'))
... ]
>>> durations = events_duration(events)
>>> print(durations)

environmentaltools.temporal.values_over_threshold(data, threshold)[source]

Identify indices where data exceeds threshold using sparse arrays.

Efficiently detects contiguous blocks of values above threshold using pandas sparse arrays for memory efficiency.

Parameters:

data (pd.Series) – Time series data
threshold (float) – Threshold value for event detection

Returns:

cycles_start (np.ndarray) – Integer indices marking the start of each exceedance block
cycles_end (np.ndarray) – Integer indices marking the end (exclusive) of each exceedance block

Notes

Uses pandas sparse block arrays for efficient storage and detection of contiguous regions above threshold.

Examples

>>> data = pd.Series([1.0, 2.5, 3.0, 2.8, 1.5, 0.8, 2.9, 3.1])
>>> start, end = values_over_threshold(data, threshold=2.0)
>>> print(f"Exceedance blocks: {list(zip(start, end))}")

Interpolation

environmentaltools.temporal.interpolation_series(data, index_start, index_end)[source]

Prepare data for interpolation between cycle boundaries.

Creates a DataFrame with start and end values grouped for subsequent interpolation across threshold crossings.

Parameters:

data (pd.Series) – Time series data
index_start (array-like) – Indices just before threshold crossings
index_end (array-like) – Indices just after threshold crossings

Returns:

Combined data with ‘value’ and ‘group’ columns for interpolation

Return type:

pd.DataFrame

Notes

Groups are created to interpolate each crossing independently.

environmentaltools.temporal.interpolation_boundaries_index(data, cycles_start, cycles_end, interpolation_freq, interpolation_method, threshold)[source]

Find precise threshold crossing times through interpolation.

Interpolates between data points surrounding threshold crossings to estimate the exact time when the threshold was crossed.

Parameters:

data (pd.Series) – Time series data with datetime index
cycles_start (np.ndarray) – Indices of cycle start positions
cycles_end (np.ndarray) – Indices of cycle end positions
interpolation_freq (str) – Temporal frequency for interpolation grid (e.g., ‘1min’, ’10s’)
interpolation_method (str) – Interpolation method: ‘linear’, ‘cubic’, ‘quadratic’, etc.
threshold (float) – Threshold value to find crossing times for

Returns:

DatetimeIndex values of estimated threshold crossing times

Return type:

np.ndarray

Notes

The function: 1. Creates dense time grid at interpolation_freq around crossings 2. Interpolates data values onto this grid 3. Finds timestamps closest to threshold value

This provides sub-sample temporal resolution for event boundaries.

environmentaltools.temporal.interpolation_nearest(group, threshold)[source]

Find nearest interpolated value to threshold.

Helper function to locate the timestamp where interpolated values are closest to the threshold value.

Parameters:

group (pd.DataFrame) – DataFrame with ‘value’ column containing interpolated data
threshold (float) – Target threshold value

Returns:

Single-row DataFrame with the value closest to threshold

Return type:

pd.DataFrame

Notes

Uses absolute difference to find the nearest match. Returns the row with minimum absolute difference (value - threshold).

Event Analysis

environmentaltools.temporal.extreme_indexes(data, extreme_start, extreme_end)[source]

Create boolean mask for extreme event periods.

Generates a boolean array indicating which time steps fall within any extreme event period.

Parameters:

data (pd.Series) – Time series data (used for length reference)
extreme_start (np.ndarray) – Array of event start indices
extreme_end (np.ndarray) – Array of event end indices (exclusive)

Returns:

Boolean array where True indicates time steps within extreme events

Return type:

np.ndarray

Notes

The function: 1. Creates condition matrix checking if each index falls within any event 2. Uses broadcasting: (n_events, n_timesteps) comparison 3. Returns True if index >= start AND < end for any event

Examples

>>> data = pd.Series(range(100))
>>> starts = np.array([10, 50])
>>> ends = np.array([20, 60])
>>> mask = extreme_indexes(data, starts, ends)
>>> mask[15]  # Within first event
True
>>> mask[30]  # Between events
False

environmentaltools.temporal.near_events(data, cycles_start, cycles_end, minimum_interarrival_time, interpolation)[source]

Identify non-independent events for merging.

Determines which consecutive events are separated by less than the minimum interarrival time, indicating they should be merged into single events.

Parameters:

data (pd.Series) – Time series data with datetime index
cycles_start (np.ndarray) – Array of cycle start indices
cycles_end (np.ndarray) – Array of cycle end indices
minimum_interarrival_time (pd.Timedelta) – Minimum time gap required between independent events
interpolation (bool) – Whether interpolation was used for threshold crossings

Returns:

Boolean array indicating which events are too close to the previous event

Return type:

np.ndarray

Notes

Implements the independence criterion for extreme events:

Events separated by less than minimum_interarrival_time are considered part of the same storm system
Typically set based on decorrelation time scale of the process

The interpolation flag adjusts index selection:

True: Uses exact crossing times (cycles_end - 1)
False: Uses data sample indices (cycles_start - 1, cycles_end)

Examples

>>> minimum_gap = pd.Timedelta(hours=72)
>>> near = near_events(data, starts, ends, minimum_gap, interpolation=False)
>>> # Merge events where near is True

References

[2]

Virtanen, Pauli and Gommers, Ralf and Oliphant, Travis E. and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and {van der Walt}, Stéfan J. and Brett, Matthew and Wilson, Joshua and Millman, K. Jarrod and Mayorov, Nikolay and Nelson, Andrew R. J. and Jones, Eric and Kern, Robert and Larson, Eric and Carey, C J and Polat, Ilhan and Feng, Yu and Moore, Eric W. and {VanderPlas}, Jake and Laxalde, Denis and Perktold, Josef and Cimrman, Robert and Henriksen, Ian and Quintero, E. A. and Harris, Charles R. and Archibald, Anne M. and Ribeiro, Antonio H. and Pedregosa, Fabian and {van Mulbregt}, Paul and {SciPy 1.0 Contributors} (2020). {{SciPy} 1.0: Fundamental Algorithms for Scientific Computing in Python}. Nature Methods.