Time Series: Air Passengers Forecasting Using a Bidirectional LSTM
A bidirectional LSTM is a type of recurrent neural network (RNN) architecture that processes sequences in both forward and backward directions. The outputs of the forward and backward LSTMs are then concatenated, giving the network more context for its predictions (the short shape check after the list below illustrates this). This bidirectional processing can improve the performance of the network, especially when working with long sequences of data.
The benefits of a bidirectional LSTM model include:
a. Improved performance in sequence prediction tasks.
b. Better understanding of contextual information in sequences.
c. Increased accuracy for tasks with long-term dependencies.
d. Better handling of sequence data with complex patterns.
e. Better ability to capture dependencies in sequences with different time scales.
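As a quick check of the concatenation described above, here is a minimal sketch (assuming TensorFlow/Keras, which the rest of this post uses):
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Bidirectional
# A toy batch of 2 sequences, 11 time steps, 1 feature
x = tf.random.normal((2, 11, 1))
# A plain LSTM with 100 units produces a (2, 100) output...
print(LSTM(100)(x).shape)
# ...while the bidirectional wrapper concatenates the forward and
# backward outputs, doubling the feature dimension to (2, 200)
print(Bidirectional(LSTM(100))(x).shape)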
Forecasting air passenger demand with a bidirectional LSTM model involves the following steps:
- Data collection: Obtain historical air passenger data, such as monthly passenger totals.
- Data preparation: Clean and preprocess the data, such as handling missing values and transforming the data into a suitable format for modeling.
- Model selection: Select a bidirectional LSTM model as the model architecture.
- Model training: Train the model on the preprocessed data using a suitable optimization algorithm.
- Model evaluation: Evaluate the performance of the model using metrics such as mean squared error (MSE) or mean absolute error (MAE).
- Model deployment: Deploy the trained model to make predictions on new data.
- Model updating: Regularly update the model with new data to ensure that it continues to make accurate predictions.
By using a bidirectional LSTM model, it is possible to effectively capture both short-term and long-term dependencies in air passenger data and make accurate predictions of future air passenger demand.
Dataset: the Kaggle Air Passengers dataset; here is the link.
This dataset provides monthly totals of US airline passengers from 1949 to 1960 and corresponds to R's built-in AirPassengers dataset.
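Before splitting, the series must be loaded into a DataFrame. A minimal loading sketch, assuming the Kaggle CSV was saved as 'AirPassengers.csv' with a 'Month' date column (the file and column names are assumptions and may differ in your download):
import pandas as pd
# Load the monthly totals; 'Month' becomes a datetime index
data = pd.read_csv('AirPassengers.csv', index_col='Month', parse_dates=True)
# The rest of the code refers to a single 'Passengers' column
data.columns = ['Passengers']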
Split Data & Normalization
# Hold out the last 12 months as the test set
test_size = 12
# Split the dataset into training and test data
train = data.iloc[:len(data) - test_size]
test = data.iloc[-test_size:].copy()  # copy so prediction columns can be added later
# Scale the data to [0, 1]; the scaler is fit on the training data only, so no test-set information leaks into training
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(train)
scaled_train = scaler.transform(train)
scaled_test = scaler.transform(test)
# Importing the keras library and the TimeseriesGenerator
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
# Length of each input window: 11 past months predict the next month
length = 11
# No. of features (univariate series)
n_features = 1
# Creating the time series generator
time_series_generator = TimeseriesGenerator(scaled_train, scaled_train, length=length, batch_size=1)
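Each item the generator yields pairs a window of 11 scaled values with the value that immediately follows it; a quick check:
# First (input, target) pair produced by the generator
X, y = time_series_generator[0]
print(X.shape, y.shape)  # (1, 11, 1) and (1, 1): 11 months predict the 12th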
Training and testing a bidirectional LSTM model
# Importing the necessary libraries to construct the deep neural network model
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Bidirectional
# Use of random seed to get the same results at every run
tf.random.set_seed(0)
np.random.seed(0)
# Use the He uniform initializer to set the initial random weights of the model layers
initializer = tf.keras.initializers.HeUniform(seed=0)
model = Sequential() # Initially, the network model is defined
# The selected activation function is the rectified linear unit (ReLU)
model.add(Bidirectional(LSTM(100, activation='relu', kernel_initializer=initializer),
                        input_shape=(length, n_features)))
# The output layer consists of 1 neuron with a 'linear' activation function
model.add(Dense(1, activation='linear', kernel_initializer=initializer))
# The model is compiled using MSE as loss function and Adam as optimizer
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=opt, loss='mse')
# A validation generator is constructed in a similar way to the previous time_series_generator with the only difference being
# the use of scaled_test values for validation purposes
# With 12 test points and a window of 11, this yields a single validation sample
time_series_val_generator = TimeseriesGenerator(scaled_test, scaled_test, length=length, batch_size=1)
# The model is trained for 30 epochs; at each epoch, both the training and validation losses can be observed
model.fit(time_series_generator, epochs=30, shuffle=False,
          validation_data=time_series_val_generator, verbose=0)
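The per-epoch losses recorded during training can be inspected from the model's history object, for example with a quick pandas plot:
# Plot the training and validation loss curves recorded by fit()
losses = pd.DataFrame(model.history.history)
losses.plot(title='Training vs. validation loss (MSE)')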
LSTM performance on the training set:
The model's fitted values closely track the actual training data, which indicates that the model has learned the series well. Let's see how it performs on the test dataset.
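To obtain the 'LSTM Predictions' column used in the calculation below, a standard rolling one-step-ahead forecast can be used: start from the last 11 training values, predict the next month, slide the prediction into the window, and repeat. A minimal sketch:
# Rolling one-step-ahead forecast over the 12 test months
test_predictions = []
current_batch = scaled_train[-length:].reshape((1, length, n_features))
for i in range(len(test)):
    # Predict the next scaled value from the current window
    current_pred = model.predict(current_batch, verbose=0)[0]
    test_predictions.append(current_pred)
    # Slide the window forward: drop the oldest value, append the prediction
    current_batch = np.append(current_batch[:, 1:, :], [[current_pred]], axis=1)
# Undo the MinMax scaling and attach the forecasts to the test frame
test['LSTM Predictions'] = scaler.inverse_transform(test_predictions).flatten()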
The LSTM predictions on the test set also track the actual values well, with a mean percent error of 6.371454, computed as follows:
# Percent error of the LSTM predictions on the test set
test_err = abs((test['Passengers'] - test['LSTM Predictions']) / test['Passengers']) * 100
test_err = test_err.to_frame('Test Set Error')
# Summary statistics of LSTM predictions percent error
test_err.describe().transpose()
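To visualize the distribution mentioned in the comment above, a quick histogram:
# Histogram of the per-month percent errors on the test set
test_err.plot.hist(bins=6, title='LSTM test set percent error distribution')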
Here is a full picture of the LSTM predictions on the air passengers series.
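The plot can be reproduced by overlaying the training history, the actual test values, and the forecasts; a sketch assuming matplotlib is available:
import matplotlib.pyplot as plt
# Training history plus actual vs. predicted values for the 12 test months
ax = train['Passengers'].plot(figsize=(12, 5), label='Training data')
test['Passengers'].plot(ax=ax, label='Actual')
test['LSTM Predictions'].plot(ax=ax, label='LSTM Predictions')
ax.set_ylabel('Passengers')
ax.legend()
plt.show()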
For the full code, you can visit the link here.