Introduction to the LSTM model for time series

Ahmad Firdaus
Feb 7, 2023

LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) designed to handle long-term dependencies in sequential data, including time series. It is commonly used for tasks such as sentiment analysis, language translation, and stock market prediction. An LSTM uses gate mechanisms (an input gate, a forget gate, and an output gate) to control the flow of information, so it can retain important information over long spans and discard what is irrelevant. This makes it well suited to modeling complex relationships between past and present observations in a time series.
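
To make the gate mechanism concrete, here is a minimal sketch of a single LSTM step in plain Python, using the standard LSTM equations with one input and one hidden unit. The function names and the weights in `params` are hypothetical and hand-picked for illustration (they are not trained and are not part of Keras).

```python
import math

def sigmoid(z):
    """Logistic function: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell (one input, one hidden unit).

    p is a dict of illustrative weights. The forget (f), input (i) and
    output (o) gates and the candidate (g) each have an input weight w*,
    a recurrent weight u* and a bias b*.
    """
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose part of the memory as the output
    return h, c

# hand-picked weights: forget gate wide open, input gate shut
params = {"wf": 0.0, "uf": 0.0, "bf": 50.0,
          "wi": 0.0, "ui": 0.0, "bi": -50.0,
          "wo": 1.0, "uo": 0.0, "bo": 0.0,
          "wg": 1.0, "ug": 0.0, "bg": 0.0}

h, c = 0.0, 0.7
for x in [10, 20, 30]:
    h, c = lstm_step(x, h, c, params)
print(c)  # the cell state stays very close to its initial value of 0.7
```

With the forget gate near 1 and the input gate near 0, the cell state is carried across time steps almost unchanged, which is how an LSTM can remember information over long spans.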

There are several different types of LSTM models that can be used for time series forecasting, each with its own strengths and limitations. Here are a few of the most commonly used LSTM models for time series forecasting:

  1. Univariate LSTM: This type of LSTM model is used for univariate time series forecasting, where only a single variable is used to predict future values.
  2. Multivariate LSTM: This type of LSTM model is used for multivariate time series forecasting, where multiple variables are used to make predictions.
  3. Encoder-Decoder LSTM: This type of LSTM model is used for sequence-to-sequence prediction problems, where the goal is to predict an output sequence given an input sequence.
  4. Stateful LSTM: This type of LSTM model is used for time series forecasting when the state of the network from one iteration to the next must be retained.
  5. Stacked LSTM: This type of LSTM model uses multiple LSTM layers stacked on top of each other to improve the model’s ability to capture complex relationships in the time series data.
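
In terms of data preparation, the main difference between the univariate and multivariate cases is the number of features per time step. The sketch below uses plain Python and made-up data; the helper name `split_multivariate` and its `target_col` parameter are hypothetical. It windows a two-variable series into the [samples, timesteps, features] layout that an LSTM expects.

```python
def split_multivariate(rows, n_steps, target_col):
    """Window a multivariate series.

    rows: list of observations, each a list with one value per variable.
    Returns (X, y) where X[k] holds n_steps consecutive rows and y[k] is
    the value of column target_col in the row that follows them.
    """
    X, y = [], []
    for start in range(len(rows) - n_steps):
        end = start + n_steps
        X.append(rows[start:end])
        y.append(rows[end][target_col])
    return X, y

# made-up data: two variables observed at five time steps
rows = [[10, 15], [20, 25], [30, 35], [40, 45], [50, 55]]
X, y = split_multivariate(rows, n_steps=3, target_col=0)

# the layout is [samples, timesteps, features]
print(len(X), len(X[0]), len(X[0][0]))  # 2 3 2
print(y)                                # [40, 50]
```

For a univariate series the last dimension would simply be 1.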

The choice of which LSTM model to use for a specific time series forecasting problem depends on the characteristics of the data and the requirements of the problem at hand.

Here’s a brief description of each type of univariate LSTM model, along with its data preparation techniques:

  1. Vanilla LSTM: A Vanilla LSTM is a simple univariate LSTM model that takes in a single time series as input and outputs a prediction for the next value in the series. The goal of a Vanilla LSTM is to learn the underlying patterns in the time series data and use that information to make future predictions.
  2. Stacked LSTM: A Stacked LSTM is a type of univariate LSTM model that uses multiple LSTM layers stacked on top of each other to improve the model’s ability to capture complex relationships in the time series data. The goal of a Stacked LSTM is to capture more complex patterns in the time series data than a single-layer LSTM model.
  3. Bidirectional LSTM: A Bidirectional LSTM is a type of univariate LSTM model that takes in a time series in both forward and backward directions, allowing the model to capture dependencies in the time series data in both directions. The goal of a Bidirectional LSTM is to capture dependencies in the time series data that may not be captured by a Vanilla LSTM or a Stacked LSTM.
  4. CNN LSTM: A CNN LSTM is a type of univariate LSTM model that uses a Convolutional Neural Network (CNN) layer before the LSTM layer to automatically extract features from the time series data. The goal of a CNN LSTM is to capture patterns in the time series data at multiple scales, allowing the model to make more accurate predictions.

It’s important to note that the choice of a univariate LSTM model will depend on the characteristics of the time series data and the requirements of the forecasting problem. Each type of univariate LSTM model has its own strengths and limitations, and some models may be more suitable for certain types of time series data than others.

Here is a code example of a univariate LSTM.

The Vanilla LSTM model, by Jason Brownlee, PhD:

# univariate lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=200, verbose=0)
# demonstrate prediction
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
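
To see what the windowing step hands to the model, the sketch below reimplements it in plain Python and prints every input/output pair. The name `split_sequence_py` is hypothetical; it returns lists rather than NumPy arrays, and it does not train a model.

```python
def split_sequence_py(sequence, n_steps):
    """Plain-Python windowing: each sample is n_steps values plus the next value."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return X, y

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
X, y = split_sequence_py(raw_seq, 3)
for window, target in zip(X, y):
    print(window, target)
# [10, 20, 30] 40
# [20, 30, 40] 50
# [30, 40, 50] 60
# [40, 50, 60] 70
# [50, 60, 70] 80
# [60, 70, 80] 90
```

Nine observations with three time steps give six training samples. The window [70, 80, 90] used for the prediction is not among them, because the value that follows it is unknown.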

Now you can change the model to other types. Here is an example of a Stacked LSTM.

...
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

Here is an example of a Bidirectional LSTM.


# the Bidirectional wrapper needs its own import
from keras.layers import Bidirectional
# define model
model = Sequential()
model.add(Bidirectional(LSTM(50, activation='relu'), input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

A Convolutional Neural Network (CNN) is a type of neural network originally designed to handle two-dimensional image data. CNNs can also be very effective at automatically extracting and learning features from one-dimensional sequence data, such as a univariate time series.
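
As a rough illustration of what a one-dimensional convolution followed by max pooling computes, here is a sketch in plain Python. The helper names `conv1d` and `max_pool1d`, the data, and the kernel are all hypothetical; in a real CNN the kernel weights are learned during training rather than picked by hand.

```python
def conv1d(series, kernel):
    """Slide the kernel along the series and take a dot product at each position."""
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, series[i:i + k]))
            for i in range(len(series) - k + 1)]

def max_pool1d(values, pool_size):
    """Keep the largest value in each non-overlapping block of pool_size values."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]

series = [10, 20, 30, 50, 70, 60]
diff_kernel = [-1, 1]  # hand-picked: responds to the change between neighbours
features = conv1d(series, diff_kernel)
print(features)                 # [10, 10, 20, 20, -10]
print(max_pool1d(features, 2))  # [10, 20]
```

Here the kernel turns the raw series into a sequence of local changes, and pooling keeps the strongest response in each block. A trained convolutional layer applies many such kernels at once.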

The first step is to divide the input sequences into smaller subsequences that the CNN can process. For instance, we can split the univariate time series into input/output pairs, where each pair has four time steps as input and one time step as output. Each input can then be split into two sub-samples of two time steps each. The CNN interprets each two-step sub-sample and produces a sequence of sub-sample interpretations, which is then fed to the LSTM as input.

...
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
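
To show what this reshape does to a single sample, here is a plain-Python sketch. The helper name `to_subsequences` is hypothetical, and it assumes one feature per time step, as in the example above.

```python
def to_subsequences(window, n_seq, n_steps):
    """Split one window into n_seq sub-samples of n_steps values, one feature each."""
    assert len(window) == n_seq * n_steps, "window length must equal n_seq * n_steps"
    return [[[value] for value in window[s * n_steps:(s + 1) * n_steps]]
            for s in range(n_seq)]

sample = [10, 20, 30, 40]
print(to_subsequences(sample, n_seq=2, n_steps=2))
# [[[10], [20]], [[30], [40]]]
```

The four time steps become two sub-samples of two steps each, and every value is wrapped in a list of length one to form the features dimension.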

Here is the full example of the CNN LSTM model.

# univariate cnn lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import TimeDistributed
from keras.layers import Conv1D
from keras.layers import MaxPooling1D

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 4
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
n_features = 1
n_seq = 2
n_steps = 2
X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
# define model
model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=500, verbose=0)
# demonstrate prediction
x_input = array([60, 70, 80, 90])
x_input = x_input.reshape((1, n_seq, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
