Temporal Module
Temporal module for time series analysis and simulation.
This module provides comprehensive tools for statistical characterization and simulation of environmental time series, including: - Marginal and joint distribution fitting - Non-stationary analysis - Extreme value analysis (POT and annual maxima) - Time series simulation - Classification and regime analysis
The subpackage temporal package aimed at providing users with a friendly, general code to statistically characterize a vector random process (RP) to obtain realizations of it. It is implemented in Python - an interpreted, high-level, object-oriented programming language widely used in the scientific community - and it makes the most of the Python packages ecosystem. Among the existing Python packages, it uses Numpy, which is the fundamental package for scientific computing in Python [1], SciPy, which offers a wide range of optimization and statistics routines [2]. Pandas [3] to analyse and manipulate data.
The tools implemented in the package named temporal allow to capture the statistical properties of a non stationary (NS) vector RP by using compound or piecewise parametric PMs to properly describe all the range of values and to simulate uni- or multivariate time series with the same random behavior. The statistical parameters of the distributions are assumed to depend on time and are expanded into a Generalized Fourier Series (GFS) [4] in order to reproduce their NS behavior. The applicability of the present approach has been illustrated in several works with different purposes, among others: (i) the observed wave climate variability in the preceding century and expected changes in projections under a climate change scenario [5]; (ii) the optimal design and management of an oscillating water column system [6] [7], (iii) the planning of maintenance strategies of coastal structures [8], (iv) the analysis of monthly Wolf sunspot number over a 22 year period [4], and (v) the simulation of estuarine water conditions for the management of the estuary [9].
Analysis Functions
Marginal Fitting
|
Fits a stationary (or not), simple or mixed probability model to data. |
|
Checks the input parameters and includes some required arguments for the computation |
|
Adds small random noise to the selected variable(s) in a time series for better estimations. |
|
Fit multiple probability models to data and rank by estimation quality. |
Storm Analysis
|
Extract storm events from time series data using threshold-based criteria. |
|
Extract and characterize individual storm event properties. |
Dependencies
|
Fit temporal dependency structure using Vector Autoregression (VAR) model. |
|
Checks the input parameters and includes some required arguments for the computation of multivariate dependencies |
|
Computes the coefficients of the VAR(p) model and chooses the model with lowest BIC. |
|
Estimates the parameters of VAR using the RMSE described in Lutkepohl (ecs. |
|
Compute ensemble of temporal dependency parameters from multiple models. |
Indicators and Confidence
|
Compute indicators from iso-probability lines of non-stationary CDF. |
|
Calculate confidence bands for model predictions using RMSE. |
|
_summary_ |
Statistical Fitting Functions (Core)
Stationary Analysis
|
Fit stationary probability distribution models to time series data. |
|
Fit a probability distribution and compute goodness-of-fit metric. |
Non-Stationary Analysis
|
Perform non-stationary probability analysis with time-varying parameters. |
|
Fits a non-stationary probability model |
|
Initialize Fourier expansion parameters using moving window analysis. |
|
Estimate time-varying distribution parameters using Fourier expansion. |
|
Prepares the parameters from previous fit to the following |
|
Matching conditions between two probability models (PMs). |
Distribution Functions
|
Fits the data to the given probability model |
|
Compute negative log-likelihood for time-varying distribution parameters. |
|
Gets the parameters of the probability models for fitting |
|
Computes the inverse of the probability function |
|
Computes the cumulative distribution function |
|
Apply power transformation to normalize data distribution. |
|
Reverse power transformation to original data scale. |
|
Compute the cdf and pdf numerically at a given time "n" (normalized year) |
|
Computes the ensemble of the cumulative distribution function for several models |
|
Computes the inverse of the cumulative distribution function |
|
Computes the oscillatory dependency in time of the parameters |
Simulation Functions
|
Simulates time series with the probability model given in param and the temporal dependency given in df_dt. |
|
_summary_ TODO |
|
Creates new normalized multivariate time series |
|
Obtains the season of any date |
Regime Analysis Functions
Confidence Intervals and Bootstrapping
|
Calculate confidence intervals using percentile or BCa bootstrap methods. |
|
Perform bootstrap resampling for return period estimation. |
Probability Models
|
Estimate probability distribution parameters and return period values. |
|
Estimate distribution parameters using L-moments method. |
|
Fit Generalized Pareto Distribution using L-moments with diagnostic metrics. |
Extreme Value Analysis
|
Perform Peaks Over Threshold (POT) analysis with multiple thresholds. |
|
Calculate Anderson-Darling test statistic for GPD goodness-of-fit. |
|
Compute GPD parameters with bootstrap resampling and confidence intervals. |
|
Fit GEV or Gumbel distribution to annual maxima with bootstrap CI. |
Classification Functions
Storm Classification
|
Splits the data into seasons. |
|
Classifies data using various machine learning classifiers. |
MDA and Reconstruction
|
Implements the Maximum Dissimilarity Algorithm (Camus et al. 2011). |
|
Reconstructs deep water variables from shallow water data using regression methods. |
|
Performs regression using various interpolation and machine learning methods. |
|
Normalizes data using the maximum distance between values |
Copula Analysis
Bivariate Dependence Modeling
- class environmentaltools.temporal.Copula(X, Y, family, *args)[source]
Bases:
objectCopula-based bivariate dependence modeling.
Estimates copula parameters for joint random variables and generates correlated samples. Supports Clayton, Frank, and Gumbel copula families.
- Parameters:
X (np.ndarray) – Data of the first variable (1-dimensional)
Y (np.ndarray) – Data of the second variable (1-dimensional, same size as X)
family (str) – Copula family: ‘clayton’, ‘frank’, or ‘gumbel’
*args (tuple, optional) – Theoretical marginal distributions: (F1, p1, F2, p2) where F1, F2 are scipy.stats distributions and p1, p2 are parameter tuples
- X
First variable data
- Type:
np.ndarray
- Y
Second variable data
- Type:
np.ndarray
- family
Selected copula family
- Type:
str
- theta
Estimated copula parameter
- Type:
float
- tau
Kendall’s tau correlation coefficient
- Type:
float
- pr
Pearson correlation coefficient and p-value
- Type:
tuple
- sr
Spearman correlation coefficient and p-value
- Type:
tuple
- U
Generated uniform samples for first variable
- Type:
np.ndarray or None
- V
Generated uniform samples for second variable
- Type:
np.ndarray or None
- X1
Generated samples in original scale for first variable
- Type:
np.ndarray
- Y1
Generated samples in original scale for second variable
- Type:
np.ndarray
Notes
Based on the ambhas 0.4.0 module by Sat Kumar Tomer. Original: http://civil.iisc.ernet.in/~satkumar/
Modifications include: - Conditional copulas for all families - Replaced statistics package with sklearn.neighbors.KernelDensity
The copula parameter theta is estimated from Kendall’s tau: - Clayton: theta = 2*tau / (1 - tau) - Frank: theta optimized via Debye function - Gumbel: theta = 1 / (1 - tau)
Examples
>>> from scipy.stats import norm, lognorm >>> x = np.random.normal(0, 1, 1000) >>> y = 2*x + np.random.normal(0, 0.5, 1000) >>> cop = Copula(x, y, 'clayton') >>> cop.generate_xy(500) >>> print(cop.theta, cop.tau)
References
Nelsen, R.B. (2006). An Introduction to Copulas. Springer.
Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall.
- estimate(data=None)[source]
Estimate ensemble statistics for Y conditioned on X values.
- Parameters:
data (np.ndarray, optional) – X values at which to estimate Y statistics. If None, uses self.X.
- Returns:
Y1_mean (np.ndarray) – Mean of Y for each X value
Y1_std (np.ndarray) – Standard deviation of Y for each X value
Y1_ll (np.ndarray) – 25th percentile (lower quartile) of Y for each X value
Y1_ul (np.ndarray) – 75th percentile (upper quartile) of Y for each X value
Notes
The function: 1. Generates copula ensemble if not already available (10000 samples) 2. Bins X1 samples into 50 bins 3. Computes statistics within each bin 4. Interpolates statistics linearly to provided data points
Useful for uncertainty quantification and conditional predictions.
Examples
>>> cop = Copula(x, y, 'frank') >>> x_new = np.linspace(x.min(), x.max(), 100) >>> y_mean, y_std, y_ll, y_ul = cop.estimate(x_new) >>> plt.plot(x_new, y_mean) >>> plt.fill_between(x_new, y_ll, y_ul, alpha=0.3)
- generate_C(u, v)[source]
Compute copula values C(u, v) on a grid.
- Parameters:
u (np.ndarray) – Grid points for first variable (uniform scale)
v (np.ndarray) – Grid points for second variable (uniform scale)
Notes
Computes the copula CDF C(u,v) = P(U ≤ u, V ≤ v) on a meshgrid using family-specific formulas:
Clayton: C(u,v) = (u^(-theta) + v^(-theta) - 1)^(-1/theta)
Frank: C(u,v) = -1/theta * log(1 + (exp(-theta*u)-1)*(exp(-theta*v)-1)/(exp(-theta)-1))
Gumbel: C(u,v) = exp(-((−log u)^theta + (−log v)^theta)^(1/theta))
Results stored in self.C as a 2D array.
Examples
>>> u = np.linspace(0.01, 0.99, 50) >>> v = np.linspace(0.01, 0.99, 50) >>> cop.generate_C(u, v) >>> plt.contourf(u, v, cop.C)
- generate_cond()[source]
Generate conditional samples V given U using the conditional copula.
Generates V from the conditional distribution C(V|U) where U is provided in self.U. Uses the conditional copula: C(v|u) = ∂C(u,v)/∂u
- Raises:
ValueError – For Clayton: if theta <= -1 or theta == 0 For Frank: if theta == 0 For Gumbel: if theta <= 1
Notes
Conditional generation formulas:
- Clayton:
- V = U^(-theta) * (U^(-theta) + W^(-theta) - 1)^(-1/theta) /
(U * (U^(-theta) + W^(-theta) - 1))
- Frank:
Uses exponential transformation of conditional inverse
- Gumbel:
Complex expression involving logarithms and power transformations
Results stored in self.V. Requires self.U to be set beforehand.
Examples
>>> cop.U = np.array([0.2, 0.5, 0.8]) >>> cop.generate_cond() >>> print(cop.V)
- generate_uv(n=1000)[source]
Generate random uniform variables (U, V) from the copula.
- Parameters:
n (int, optional) – Number of copula samples to generate. Default is 1000.
- Raises:
ValueError – For Clayton: if theta <= -1 or theta == 0 For Frank: if theta == 0 For Gumbel: if theta <= 1
Notes
Generation algorithms by copula family:
Clayton: Uses conditional distribution method - U ~ Uniform(0,1) - V = U * (W^(-theta/(1+theta)) - 1 + U^theta)^(-1/theta)
Frank: Uses conditional inverse method - Solves implicit equation involving exponentials
Gumbel: Uses Marshall-Olkin algorithm - Generates stable distribution random variables - Uses rejection sampling for complex dependence structure
Results stored in self.U and self.V attributes.
Examples
>>> cop = Copula(x, y, 'gumbel') >>> cop.generate_uv(500) >>> print(cop.U[:5], cop.V[:5])
- generate_xy(n=0)[source]
Generate random samples (X, Y) in original scale from the copula.
- Parameters:
n (int, optional) – Number of samples to generate. If 0 and U is not None, generates conditional samples. Default is 0.
Notes
The function:
Generates (U, V) from copula if not already generated
Transforms to original scale using:
Theoretical marginals (if provided in __init__)
Empirical inverse CDFs via kernel density estimation
Stores results in self.X1 and self.Y1
Resets self.U and self.V to None
If U is already set, generates conditional V given U. Uses Epanechnikov kernel with bandwidth=0.1 for empirical CDFs.
Examples
>>> cop = Copula(x, y, 'clayton') >>> cop.generate_xy(1000) >>> plt.scatter(cop.X1, cop.Y1)
Utility Functions
Event Detection
- environmentaltools.temporal.extreme_events(data, var_name, threshold, minimum_interarrival_time, minimum_cycle_length, interpolation=False, interpolation_method='linear', interpolation_freq='1min', truncate=False, extra_info=False)[source]
Extract storm and calm cycles from time series data using threshold-based detection.
Identifies independent extreme events (storms) and calm periods by detecting threshold crossings, merging nearby events, and filtering short-duration events. Optionally interpolates crossing times for improved temporal resolution.
- Parameters:
data (pd.DataFrame) – Time series data with datetime index
var_name (str) – Name of the variable column to analyze
threshold (float) – Threshold value separating storm (cycle) and calm periods
minimum_interarrival_time (pd.Timedelta) – Minimum time between storms to ensure statistical independence
minimum_cycle_length (pd.Timedelta) – Minimum storm duration to consider it valid
interpolation (bool, default=False) – Enable interpolation to estimate precise threshold crossing times
interpolation_method (str, default='linear') – Interpolation method: ‘linear’, ‘cubic’, ‘nearest’, etc.
interpolation_freq (str, default='1min') – Frequency for interpolation grid (e.g., ‘1min’, ’10s’, ‘1h’)
truncate (bool, default=False) – Clip storm values below threshold and calm values above threshold to the threshold value
extra_info (bool, default=False) – Return additional diagnostic information
- Returns:
cycles (list of pd.Series) – List of storm cycles, each as a Series with values and timestamps
calm_periods (list of pd.Series) – List of calm periods between storms
info (dict, optional) – Additional information (only if extra_info=True):
- ’data_cycles’pd.DataFrame
Data for all storm periods
- ’data_calm_periods’pd.DataFrame
Data for all calm periods
- ’cycles_indexes_clipped’list
Indices where storm values were clipped (if truncate=True)
- ’calm_periods_indexes_clipped’list
Indices where calm values were clipped (if truncate=True)
- ’interpolation_indexes’np.ndarray
Indices of interpolated threshold crossings (if interpolation=True)
Notes
The algorithm follows these steps:
Identify initial threshold crossings
Optionally interpolate to refine crossing times
Merge nearby storms (within minimum_interarrival_time)
Remove short storms (< minimum_cycle_length)
Optionally truncate values to threshold
Storm independence is crucial for extreme value analysis. The minimum_interarrival_time should be chosen based on the decorrelation time of the process (typically 3-5 times the characteristic timescale).
Examples
>>> import pandas as pd >>> df = pd.DataFrame({ ... 'Hs': [1.2, 2.5, 3.1, 2.8, 1.5, 0.8, 1.0, 2.9, 3.5, 2.1] ... }, index=pd.date_range('2020', periods=10, freq='6h')) >>> cycles, calms = extreme_events( ... df, 'Hs', threshold=2.0, ... minimum_interarrival_time=pd.Timedelta('12h'), ... minimum_cycle_length=pd.Timedelta('6h') ... ) >>> print(f"Found {len(cycles)} storm events")
- environmentaltools.temporal.events_duration(events)[source]
Calculate duration of each event (storm or calm period).
Computes the time span of each event from its first to last timestamp.
- Parameters:
events (list of pd.Series) – List of events where each element is a Series with values and datetime index representing the event time evolution
- Returns:
Duration of each event (as timedelta) with the event start time as index
- Return type:
pd.Series
Examples
>>> events = [ ... pd.Series([2.5, 3.0, 2.8], index=pd.date_range('2020-01-01', periods=3, freq='1h')), ... pd.Series([2.2, 2.7], index=pd.date_range('2020-01-05', periods=2, freq='1h')) ... ] >>> durations = events_duration(events) >>> print(durations)
- environmentaltools.temporal.values_over_threshold(data, threshold)[source]
Identify indices where data exceeds threshold using sparse arrays.
Efficiently detects contiguous blocks of values above threshold using pandas sparse arrays for memory efficiency.
- Parameters:
data (pd.Series) – Time series data
threshold (float) – Threshold value for event detection
- Returns:
cycles_start (np.ndarray) – Integer indices marking the start of each exceedance block
cycles_end (np.ndarray) – Integer indices marking the end (exclusive) of each exceedance block
Notes
Uses pandas sparse block arrays for efficient storage and detection of contiguous regions above threshold.
Examples
>>> data = pd.Series([1.0, 2.5, 3.0, 2.8, 1.5, 0.8, 2.9, 3.1]) >>> start, end = values_over_threshold(data, threshold=2.0) >>> print(f"Exceedance blocks: {list(zip(start, end))}")
Interpolation
- environmentaltools.temporal.interpolation_series(data, index_start, index_end)[source]
Prepare data for interpolation between cycle boundaries.
Creates a DataFrame with start and end values grouped for subsequent interpolation across threshold crossings.
- Parameters:
data (pd.Series) – Time series data
index_start (array-like) – Indices just before threshold crossings
index_end (array-like) – Indices just after threshold crossings
- Returns:
Combined data with ‘value’ and ‘group’ columns for interpolation
- Return type:
pd.DataFrame
Notes
Groups are created to interpolate each crossing independently.
- environmentaltools.temporal.interpolation_boundaries_index(data, cycles_start, cycles_end, interpolation_freq, interpolation_method, threshold)[source]
Find precise threshold crossing times through interpolation.
Interpolates between data points surrounding threshold crossings to estimate the exact time when the threshold was crossed.
- Parameters:
data (pd.Series) – Time series data with datetime index
cycles_start (np.ndarray) – Indices of cycle start positions
cycles_end (np.ndarray) – Indices of cycle end positions
interpolation_freq (str) – Temporal frequency for interpolation grid (e.g., ‘1min’, ’10s’)
interpolation_method (str) – Interpolation method: ‘linear’, ‘cubic’, ‘quadratic’, etc.
threshold (float) – Threshold value to find crossing times for
- Returns:
DatetimeIndex values of estimated threshold crossing times
- Return type:
np.ndarray
Notes
The function: 1. Creates dense time grid at interpolation_freq around crossings 2. Interpolates data values onto this grid 3. Finds timestamps closest to threshold value
This provides sub-sample temporal resolution for event boundaries.
- environmentaltools.temporal.interpolation_nearest(group, threshold)[source]
Find nearest interpolated value to threshold.
Helper function to locate the timestamp where interpolated values are closest to the threshold value.
- Parameters:
group (pd.DataFrame) – DataFrame with ‘value’ column containing interpolated data
threshold (float) – Target threshold value
- Returns:
Single-row DataFrame with the value closest to threshold
- Return type:
pd.DataFrame
Notes
Uses absolute difference to find the nearest match. Returns the row with minimum absolute difference (value - threshold).
Event Analysis
- environmentaltools.temporal.extreme_indexes(data, extreme_start, extreme_end)[source]
Create boolean mask for extreme event periods.
Generates a boolean array indicating which time steps fall within any extreme event period.
- Parameters:
data (pd.Series) – Time series data (used for length reference)
extreme_start (np.ndarray) – Array of event start indices
extreme_end (np.ndarray) – Array of event end indices (exclusive)
- Returns:
Boolean array where True indicates time steps within extreme events
- Return type:
np.ndarray
Notes
The function: 1. Creates condition matrix checking if each index falls within any event 2. Uses broadcasting: (n_events, n_timesteps) comparison 3. Returns True if index >= start AND < end for any event
Examples
>>> data = pd.Series(range(100)) >>> starts = np.array([10, 50]) >>> ends = np.array([20, 60]) >>> mask = extreme_indexes(data, starts, ends) >>> mask[15] # Within first event True >>> mask[30] # Between events False
- environmentaltools.temporal.near_events(data, cycles_start, cycles_end, minimum_interarrival_time, interpolation)[source]
Identify non-independent events for merging.
Determines which consecutive events are separated by less than the minimum interarrival time, indicating they should be merged into single events.
- Parameters:
data (pd.Series) – Time series data with datetime index
cycles_start (np.ndarray) – Array of cycle start indices
cycles_end (np.ndarray) – Array of cycle end indices
minimum_interarrival_time (pd.Timedelta) – Minimum time gap required between independent events
interpolation (bool) – Whether interpolation was used for threshold crossings
- Returns:
Boolean array indicating which events are too close to the previous event
- Return type:
np.ndarray
Notes
Implements the independence criterion for extreme events:
Events separated by less than minimum_interarrival_time are considered part of the same storm system
Typically set based on decorrelation time scale of the process
The interpolation flag adjusts index selection:
True: Uses exact crossing times (cycles_end - 1)
False: Uses data sample indices (cycles_start - 1, cycles_end)
Examples
>>> minimum_gap = pd.Timedelta(hours=72) >>> near = near_events(data, starts, ends, minimum_gap, interpolation=False) >>> # Merge events where near is True