Ad Click Prediction with Classification Machine Learning

Ahmad Firdaus
6 min read · Feb 16, 2023


Image source: www.freepik.com

Online advertising has become a crucial part of digital marketing strategies. Advertisers pay for their ads to be shown to potential customers, but the success of an ad campaign depends on users actually clicking on them. Predicting which ads are more likely to be clicked can therefore help advertisers optimize their campaigns and save money.

In this article, we will use machine learning modeling to predict which users are potential customers in digital advertising.

The business team wants to improve its digital advertising methods to entice potential customers to click on a product, so that the cost incurred is not excessive.

Our goal is to develop a machine-learning model capable of detecting potential users who are likely to convert or show interest in an advertisement, so that we can reduce advertising costs on digital platforms.

Here is the dataset to practice with.

Import Libraries

# data manipulation
import pandas as pd
import numpy as np

# visualization
import matplotlib.pyplot as plt
import seaborn as sns

# datetime parsing for feature extraction
from datetime import datetime as dt

# train/test splitting
from sklearn.model_selection import train_test_split

# classification models
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# evaluation metrics
from sklearn.metrics import accuracy_score, recall_score, precision_score
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# container for collecting experiment results
from collections import defaultdict

# silence warnings for cleaner notebook output
from warnings import filterwarnings
filterwarnings('ignore')

Load Data

# define dataframe
df = pd.read_csv('Ad Click Data.csv')
df.head()

Exploratory Data Analysis

User Distribution

Luckily, the data we will use has fairly balanced labels, so we do not need additional preprocessing to handle imbalanced classes.
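
A quick check confirms this (a minimal sketch; the target column is named 'Clicked on Ad' in this dataset):

# inspect the class balance of the target
print(df['Clicked on Ad'].value_counts(normalize=True))

# the same information as a plot
sns.countplot(data=df, x='Clicked on Ad')
plt.title('User Distribution')
plt.show()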

Daily Internet Usage Distribution

We can see the spread of daily internet usage (in minutes) from the EDA above. This distribution has an intriguing property: users who use the internet infrequently have a higher chance of clicking on an ad than those who use it regularly.

This would suggest that people who use the internet infrequently tend to pay closer attention to website advertisements.
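
One way to reproduce this view (a sketch, not necessarily the exact plot from the article):

# distribution of daily internet usage, split by click behavior
sns.histplot(data=df, x='Daily Internet Usage', hue='Clicked on Ad', kde=True)
plt.title('Daily Internet Usage Distribution')
plt.show()

Swapping in x='Daily Time Spent on Site' produces the distribution discussed next.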

Daily Time Spend Distribution

Because daily internet usage shows such a distinct distribution, we also look at how users interact with the website itself. The EDA above shows that time spent on the website and internet usage have similar distributions. In other words, even users who visit a website only briefly might be potential customers.

Internet Usage vs Time Spent on Site

Having seen that internet usage and time spent on the website behave similarly, we now examine how the two features relate to the target.
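
A scatter plot of the two features, colored by the target, makes this relationship visible (a sketch):

# internet usage vs time on site, colored by the target label
sns.scatterplot(data=df, x='Daily Internet Usage',
                y='Daily Time Spent on Site', hue='Clicked on Ad')
plt.title('Internet Usage vs Time Spent on Site')
plt.show()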

According to this plot, users can be split into two groups based on internet usage and time spent on the website: active users and non-active users.

These two groups appear to have a significant impact on whether someone clicks on an advertisement. As the visualization above shows, active users are less likely to click on an advertisement than inactive users.

In conclusion, we can optimize our advertising system for users who are not actively using the internet.

Correlation

The Pearson correlation

Based on the correlation matrix above, there is no multicollinearity (strong correlation between features), so we can use all of the features for modeling. However, Pearson correlation cannot tell us how each feature relates to the target. To determine the relationship between the features and the target, we will use the Predictive Power Score (PPS) in the sections that follow.
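
The correlation matrix above can be reproduced along these lines (a sketch using the numeric columns only):

# Pearson correlation between numeric features
corr = df.select_dtypes(include=np.number).corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Pearson Correlation')
plt.show()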

Predictive Power Score (PPScore)

In the PPS matrix, we concentrate solely on the Clicked on Ad column, since that variable is our intended target.
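
The PPS matrix can be computed with the ppscore package (an assumption on our side, since the article does not show this step; install it with pip install ppscore):

import ppscore as pps

# predictive power of each feature with respect to the target
pps_target = pps.predictors(df, y='Clicked on Ad')
print(pps_target[['x', 'ppscore']])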

Features quite relevant to the target:

  1. Daily Internet Usage,
  2. Age,
  3. Area Income, and
  4. Daily Time Spent on Site.

This correlation graphic can serve as guidance for modeling.

Data Preprocessing

For data preprocessing, we need to clean the data so that it can be applied to several machine learning models.

The steps we need to do are:

  1. Handle Missing Value
  2. Extract Datetime Data
  3. Split Target and Features
  4. One-hot encode categorical features

## UDFs for extracting datetime features from the Timestamp column
def extract_day_of_week(time):
    return dt.strptime(time, '%m/%d/%Y %H:%M').weekday()

def extract_day_of_month(time):
    return dt.strptime(time, '%m/%d/%Y %H:%M').day

def extract_month(time):
    return dt.strptime(time, '%m/%d/%Y %H:%M').month

Handle Missing Value

# impute numeric columns with the mean and the categorical column with the mode
df['Daily Time Spent on Site'].fillna(df['Daily Time Spent on Site'].mean(), inplace=True)
df['Area Income'].fillna(df['Area Income'].mean(), inplace=True)
df['Daily Internet Usage'].fillna(df['Daily Internet Usage'].mean(), inplace=True)
df['Gender'].fillna(df['Gender'].mode()[0], inplace=True)

Extract Datetime Data

# extract datetime features, then drop the raw Timestamp column
df['day_of_week'] = df['Timestamp'].apply(extract_day_of_week)
df['day_of_month'] = df['Timestamp'].apply(extract_day_of_month)
df['month'] = df['Timestamp'].apply(extract_month)

df = df.drop(labels=['Timestamp'], axis=1)

Split Target and Features

X = df.drop(labels=['Clicked on Ad'], axis=1)
# encode the target: 'No' -> 0, otherwise 1
y = np.where(df['Clicked on Ad']=='No', 0, 1)

Get Dummies for All Categorical Features

X_dummy = pd.get_dummies(X)
X_dummy

Build Model

The modeling stage comes next, where we build several models and compare their performance. Since the target classes are balanced, accuracy is an appropriate metric.

The steps for modeling are as follows:

  1. Split the data into train and test sets
  2. Train on the default (unscaled) data (Experiment 1)
  3. Train on normalized data (Experiment 2)

Splitting Train and Test Dataset

X_train, X_test, y_train, y_test = train_test_split(X_dummy, y, test_size=0.3, stratify=y, random_state=123)

print('Train shape:', X_train.shape)
print('Test shape:', X_test.shape)

# Train shape: (700, 2214)
# Test shape: (300, 2214)

## UDF for experimenting with several classification models
def experiment(X_train, X_test, y_train, y_test):
    """
    Run the same experiment across several classification models.
    We just need the data as input.

    Parameters
    ----------
    X_train : training data containing the features
    X_test  : testing data containing the features
    y_train : train target
    y_test  : test target
    """
    result = defaultdict(list)

    knn = KNeighborsClassifier()
    logreg = LogisticRegression()
    dtc = DecisionTreeClassifier()
    rf = RandomForestClassifier()
    grad = GradientBoostingClassifier()

    list_model = [('K-Nearest Neighbor', knn),
                  ('Logistic Regression', logreg),
                  ('Decision Tree', dtc),
                  ('Random Forest', rf),
                  ('Gradient Boosting', grad)]

    for model_name, model in list_model:
        # time the training step
        start = dt.now()
        model.fit(X_train, y_train)
        duration = (dt.now() - start).total_seconds()

        y_pred = model.predict(X_test)

        accuracy = accuracy_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)

        result['model_name'].append(model_name)
        result['model'].append(model)
        result['accuracy'].append(accuracy)
        result['recall'].append(recall)
        result['precision'].append(precision)
        result['duration'].append(duration)

    return result

First Experiment
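
Running the UDF on the unscaled data could look like this (the variable name result1 is our assumption; the original article only shows the resulting table):

result1 = experiment(X_train, X_test, y_train, y_test)
pd.DataFrame(result1).sort_values('accuracy', ascending=False)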

Modeling with the default data (simple preprocessing only) gives the following picture: the decision tree classifier achieves the highest accuracy, while the random forest achieves the highest precision. Some models, such as logistic regression and k-nearest neighbors, do not reach very good accuracy.

Second Experiment

from sklearn.preprocessing import MinMaxScaler

# scale every feature to the [0, 1] range; fit on the train set only to avoid leakage
minmax_scaler = MinMaxScaler()
X_train_minmax = minmax_scaler.fit_transform(X_train)
X_test_minmax = minmax_scaler.transform(X_test)
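
The experiment is then repeated on the scaled data; the result2 dictionary produced here is used again in the evaluation step below:

result2 = experiment(X_train_minmax, X_test_minmax, y_train, y_test)
pd.DataFrame(result2).sort_values('accuracy', ascending=False)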

After applying the min-max scaler, we see considerable gains in several models. Based on these results, we select the random forest as the best model, because it achieves the highest accuracy and precision.

Evaluation

# index 3 in list_model corresponds to the random forest
final_model = result2['model'][3]
y_pred = final_model.predict(X_test_minmax)

#-------------------------------------------------

cm = confusion_matrix(y_test, y_pred)

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=final_model.classes_)
# pass an Axes to disp.plot so the figsize actually takes effect
fig, ax = plt.subplots(figsize=(13, 10))
disp.plot(ax=ax)
plt.show()

We now want to analyze the performance of our chosen random forest model in depth.

The random forest produces a very clean confusion matrix.

There is very little prediction error (the off-diagonal cells), so accuracy, precision, and recall are all high, as the scores below confirm.
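
The corresponding scores can be printed directly with the metrics we imported earlier (a minimal sketch):

print('Accuracy :', accuracy_score(y_test, y_pred))
print('Recall   :', recall_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))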

For the complete code, you can visit my GitHub here.
