Time Series Analysis and Model Selection for 3 ISOs using RAVEN

Team: Manjur Raj Basnet, Jacob Bryan, Hailei Wang

Introduction

Project goal and objectives

  • Understanding the historical data trends
  • Selection of appropriate model parameters capable of generating synthetic time series close to the actual historical data

Significance

  • Synthetic histories represent the underlying data sets well: they provide independent and identically distributed samples that are easy to control and scale
  • A single model can generate accurate synthetic histories, removing the need to select parameters individually for each model

The study covers 3 Independent System Operators (ISOs): CAISO (California), ERCOT (Texas), and MISO (Midcontinent). A combined Fourier and ARMA (FARMA) model is used to produce synthetic time series. The entire analysis is conducted in the Risk Analysis Virtual Environment (RAVEN). The Wasserstein distance is used as the measure for selecting the model parameters (P, Q, pivot length, and number of clusters).

Important Terminology

RAVEN

RAVEN is a software tool under development at Idaho National Laboratory (INL) for uncertainty quantification, model optimization, and related activities. In this work, RAVEN is used to generate the time series model. Some of the important RAVEN parameters are described below.

Pivot Length

Pivot length is a RAVEN feature that allows segmentation of the ARMA ROM (Reduced Order Model). For example, if 8760 data points are taken from one year of hourly data, a pivot length of 24 means that each segment covers a 24-hour period.
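The segmentation idea can be sketched in a few lines of plain Python. This is an illustration only, not RAVEN's implementation: the helper name `split_into_segments` is hypothetical, and rejecting a series whose length is not a multiple of the pivot length is an assumption made here for simplicity.

```python
def split_into_segments(series, pivot_length):
    """Split a series into consecutive segments of `pivot_length` samples.

    Hypothetical helper for illustration; not part of RAVEN.
    """
    if pivot_length <= 0:
        raise ValueError("pivot_length must be positive")
    if len(series) % pivot_length != 0:
        # Assumption: only whole segments are allowed in this sketch.
        raise ValueError("series length must be a multiple of pivot_length")
    return [series[i:i + pivot_length]
            for i in range(0, len(series), pivot_length)]


# One year of hourly placeholder values (8760 points).
hourly = list(range(8760))
segments = split_into_segments(hourly, 24)
print(len(segments), len(segments[0]))  # 365 segments of 24 hours each
```

With the other pivot lengths considered in this study, 146 and 365, the same 8760 points split into 60 and 24 segments, respectively.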

Number of clusters

K-means clustering is used to cluster the data. Given the number of clusters, the segments are grouped so that each segment joins the cluster with the nearest mean.
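A minimal sketch of the nearest-mean grouping, written in plain Python, is shown below. It is a simplification under stated assumptions: the function names are hypothetical, the raw segment values are clustered directly, and a deterministic farthest-point initialization is used. None of these choices is claimed to match what RAVEN does internally.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start at the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each segment joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each cluster mean from its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Six short placeholder segments: three low-valued, three high-valued.
toy_segments = [
    [1.0, 1.1, 0.9, 1.0], [1.2, 1.0, 1.1, 0.9], [0.9, 0.8, 1.0, 1.1],
    [10.0, 10.2, 9.8, 10.1], [9.9, 10.0, 10.3, 9.7], [10.1, 9.8, 10.0, 10.2],
]
labels, centroids = kmeans(toy_segments, k=2)
print(labels)
```

In this toy case the low-valued and high-valued segments end up in separate clusters.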

ARMA order P and Q

An appropriate selection of P and Q helps in fitting the data. P is the autoregressive order, whereas Q is the moving-average order.
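To make the roles of P and Q concrete, the sketch below simulates an ARMA(P, Q) process in plain Python, where P is the number of autoregressive coefficients and Q the number of moving-average coefficients. The function name `simulate_arma` and the coefficient values are illustrative assumptions, not values from this study.

```python
import random


def simulate_arma(phi, theta, n, sigma=1.0, seed=0, burn_in=200):
    """Simulate x_t = sum_i phi[i]*x_{t-1-i} + e_t + sum_j theta[j]*e_{t-1-j}.

    len(phi) is the AR order P and len(theta) is the MA order Q.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    x, eps = [], []
    for t in range(n + burn_in):
        e = rng.gauss(0.0, sigma)  # white-noise term
        # Autoregressive part: weighted sum of previous values.
        ar = sum(c * x[t - 1 - i] for i, c in enumerate(phi) if t - 1 - i >= 0)
        # Moving-average part: weighted sum of previous noise terms.
        ma = sum(c * eps[t - 1 - j] for j, c in enumerate(theta) if t - 1 - j >= 0)
        x.append(ar + ma + e)
        eps.append(e)
    # Discard the start-up transient.
    return x[burn_in:]


# ARMA(1, 1) with made-up coefficients: P = 1, Q = 1.
series = simulate_arma(phi=[0.5], theta=[0.3], n=1000)
print(len(series))
```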

Fourier and ARMA (FARMA)

Fourier analysis is used to capture the seasonal trends in the data. ARMA is then applied to characterize the remaining stationary process by imposing a linear dependence on previous values of the variable and on a series of white-noise terms.
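One common way to write this two-step decomposition is given below. The symbols are assumptions introduced here for illustration: x_t is the signal, T_k are the Fourier periods, a_k and b_k the Fourier coefficients, y_t the stationary residual, \phi_i and \theta_j the ARMA coefficients, and \varepsilon_t white noise.

```latex
x_t = \sum_{k=1}^{K} \left[ a_k \sin\!\left( \frac{2\pi t}{T_k} \right)
      + b_k \cos\!\left( \frac{2\pi t}{T_k} \right) \right] + y_t,
\qquad
y_t = \sum_{i=1}^{P} \phi_i \, y_{t-i} + \varepsilon_t
      + \sum_{j=1}^{Q} \theta_j \, \varepsilon_{t-j}
```

The first sum carries the seasonal structure, and the ARMA(P, Q) term models what is left after that structure is removed.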

Methodology

The study consisted of multiple phases from data collection to data processing and analysis.

Data Collection and Processing

Data were collected with Python scripts that used API keys to fetch data from the websites. Price, demand, and solar/wind data were obtained, with prices expressed in $/MWh and solar/wind output as capacity factors. The parameter values used in the combinations are tabulated below:

Parameter            Values
P                    [1, 2, 3]
Q                    [0, 1, 2, 3]
Pivot Length         [24, 146, 365]
Number of Clusters   [2, 4, 8, 12, 16, 20]
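The size of the search space follows directly from the table. As a minimal sketch, assuming a full grid over all four parameters (the dictionary keys are hypothetical names, not RAVEN input names), the combinations can be enumerated with the standard library:

```python
from itertools import product

# Candidate values taken from the table above; key names are illustrative.
param_grid = {
    "P": [1, 2, 3],
    "Q": [0, 1, 2, 3],
    "pivot_length": [24, 146, 365],
    "n_clusters": [2, 4, 8, 12, 16, 20],
}

# Cartesian product of all candidate values, one dict per combination.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
print(len(combinations))  # 3 * 4 * 3 * 6 = 216
```

Under that full-grid assumption, each variable of each ISO is evaluated against 216 candidate models.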

The Python script was run separately for each of the 3 ISOs and took about 6 hours per ISO on average.

Wasserstein Distance

The Wasserstein distance is a metric that measures the minimum cost of transforming one probability distribution into another. In this study, this optimal transport formulation is used to compare the synthetic data with the actual data: the distance is the minimum effort needed to transform the synthetically generated time series into the actual historical data. The Wasserstein distance formula used for this study is given below.

Wasserstein Distance formula
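As a reference point, a widely used form for one-dimensional distributions is the first Wasserstein distance. Whether the study used exactly this variant is an assumption. Here \mu and \nu are the two distributions, \Gamma(\mu, \nu) is the set of joint distributions with those marginals, and F_\mu and F_\nu are the cumulative distribution functions.

```latex
W_1(\mu, \nu)
  = \inf_{\gamma \in \Gamma(\mu, \nu)} \int |x - y| \, d\gamma(x, y)
  = \int_{-\infty}^{\infty} \left| F_\mu(x) - F_\nu(x) \right| dx
```

The second equality holds in one dimension and is what makes the distance cheap to evaluate on empirical samples.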

Based on the given parameter combinations, the following model parameter values gave the least Wasserstein distance:

ISO     Variable   P   Q   N cluster   Pivot Length
CAISO   Price      3   2   8           365
CAISO   Solar      1   1   20          146
CAISO   Load       1   0   20          365
CAISO   Wind       1   0   12          24
ERCOT   Price      3   2   20          24
ERCOT   Solar      2   1   20          146
ERCOT   Load       2   0   20          365
ERCOT   Wind       2   2   12          365
MISO    Price      3   2   20          365
MISO    Solar      3   2   16          365
MISO    Load       1   3   8           146
MISO    Wind       2   1   12          146
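The selection rule behind a table like this can be sketched as follows. The code assumes the empirical one-dimensional first Wasserstein distance between equal-size samples, which is the mean absolute difference of the sorted values. The function names and the toy numbers are hypothetical and are not the study's data or scripts.

```python
def wasserstein_1d(a, b):
    """Empirical first Wasserstein distance between two equal-size samples:
    the mean absolute difference between their sorted values."""
    if not a or len(a) != len(b):
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def select_best(historical, candidates):
    """Return the parameter key whose synthetic series is closest to the
    historical series, along with all distances.

    `candidates` maps a parameter tuple to a synthetic series.
    """
    scores = {params: wasserstein_1d(historical, synthetic)
              for params, synthetic in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: keys are (P, Q) pairs, values are made-up synthetic series.
historical = [1.0, 2.0, 3.0, 4.0]
candidates = {
    (1, 0): [1.1, 2.1, 2.9, 4.1],
    (3, 2): [3.0, 4.0, 5.0, 6.0],
}
best, scores = select_best(historical, candidates)
print(best)
```

For samples of unequal size, `scipy.stats.wasserstein_distance` computes the same quantity from the empirical distribution functions.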

Results and Conclusions

The following parameter values were obtained as a result of normalization, and the results were plotted for the time series model.

ISO     P   Q   N cluster   Pivot Length
CAISO   1   2   20          146
ERCOT   1   0   20          146
MISO    1   1   16          365
[Figure: time series model results]

The model is found to capture the data dynamics and volatility. Lower values of P and Q were found to work well for the model without resulting in data underfitting. Thus, the Wasserstein distance served as a reliable metric for selecting the model parameters in this study.