
Maximizing CTR in eCommerce with Explainable AI and Machine Learning

Updated: May 13

In this blog post, we will walk through a CTR (click-through rate) use case by exploring a publicly available dataset of Facebook ads (more details below). The purpose of the use case is to illustrate why explainable AI methods are not just "nice to have" charts in some reports.

What is Explainable AI?


Think of explainable AI (XAI) as AI with a "show your work" feature. Imagine asking a friend why they made a certain decision and getting a clear explanation that you can understand.

With XAI, AI systems are designed to explain their decisions in a way that makes sense to humans. Instead of just getting a prediction or result, we also get insights into why the AI made that choice. It's like having a conversation with the AI to understand its reasoning, which is pretty cool and helpful, especially in high-stakes domains such as healthcare or credit risk. XAI is employed and demanded in many such cases for a variety of reasons.

In recent years, a plethora of methods has been proposed in the scientific literature that try to explain models and make the decision-making process transparent. Transparency in such processes is mandatory, and since 2023 it is required by law:
Article 13 of the EU AI Act provides the requirement of transparency and provision of information for high-risk AI systems, according to which "high-risk AI systems shall be designed and developed in such a way to ensure that their operation is sufficiently transparent to enable providers and users to reasonably understand the system's functioning."

More often than not, the constant demand for better results leads to very complex AI models. On the one hand, "better" results (better with respect to a specific KPI) may mean more sales, higher customer satisfaction, and so on. On the other hand, this increasing complexity creates so-called "black-box" models, which are so complex that not even humans can really understand how they arrive at their predictions. For example, deep neural networks containing millions to billions of parameters can be considered black-box models.

We can separate the XAI methods as follows:


Explainable AI Categorization


Local XAI methods are like zooming in on a specific decision or prediction made by an AI system. Imagine putting a magnifying glass on that one particular instance and asking the AI, "Hey, why did you do this?" These methods focus on explaining individual outputs or decisions, giving you insights into why the AI made that specific choice. It's helpful when you want to understand the reasoning behind a particular prediction or decision without diving into the entire AI model's complexity. Think of it as getting a detailed explanation for a single action rather than the whole process.

Global XAI methods, on the other hand, take the opposite approach: they step back and look at the big picture of how the AI system works overall. Instead of focusing on just one decision, global methods aim to explain the model's overall behavior or patterns. They are useful for understanding the general logic and trends in the AI's decision-making process, helping you grasp its overall behavior and performance.

We can also separate XAI methods into model-specific and model-agnostic. Model-specific methods are designed to work with a particular type of AI model, such as neural networks or decision trees, providing detailed insights into its unique structure and parameters. In contrast, model-agnostic methods offer explanations that can be applied to any AI model, focusing on general patterns across different architectures, which makes them versatile tools for understanding the decision-making of various model types.

In this post, we will review two well-known techniques: SHAP and LIME.

1. SHAP (SHapley Additive exPlanations) falls under the category of model-agnostic XAI methods. It is designed to provide explanations for the output of any machine learning model by computing Shapley values, which are a concept from cooperative game theory. SHAP can be applied to a wide range of models, including but not limited to linear models, tree-based models (such as decision trees and random forests), support vector machines (SVMs), and deep learning models (like neural networks).
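To make the idea concrete, here is a minimal pure-Python sketch of exact Shapley values for a toy three-feature model. The model `f` below is entirely made up for illustration; the `shap` library computes (approximate) Shapley values far more efficiently for real models.

```python
from itertools import combinations
from math import factorial

FEATURES = ["word_count", "days", "min_age"]

def f(present):
    # Toy "model": a score contributed by each feature that is present,
    # plus an interaction bonus when word_count and days appear together.
    score = 0.0
    if "word_count" in present:
        score += 0.4
    if "days" in present:
        score += 0.2
    if "word_count" in present and "days" in present:
        score += 0.1
    return score

def shapley_value(feature):
    # Average the feature's marginal contribution over all subsets of the
    # other features, weighted as in cooperative game theory.
    n = len(FEATURES)
    others = [x for x in FEATURES if x != feature]
    total = 0.0
    for k in range(n):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (f(set(subset) | {feature}) - f(set(subset)))
    return total

values = {feat: shapley_value(feat) for feat in FEATURES}
# Efficiency property: the values sum to f(all features) - f(no features).
assert abs(sum(values.values()) - (f(set(FEATURES)) - f(set()))) < 1e-9
```

Note how the interaction bonus is split evenly between `word_count` and `days`, which is exactly the fairness property that makes Shapley values attractive for attribution.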

2. LIME (Local Interpretable Model-agnostic Explanations) is also categorized as a model-agnostic XAI method. LIME works by approximating the behavior of a black-box model locally around a specific instance of interest by training an interpretable model, such as a linear regression or decision tree, on perturbed samples generated around that instance. This approach allows LIME to provide explanations for individual predictions of any machine learning model, regardless of its underlying architecture or complexity, making it a versatile tool for local interpretability.
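The core LIME idea fits in a few lines: perturb around an instance, weight the perturbations by proximity, and fit a simple surrogate. The sketch below uses a made-up one-feature `black_box` and a closed-form weighted linear fit in place of the `lime` library:

```python
import math
import random

def black_box(word_count):
    # Stand-in for a complex model: the probability of the "poor quality"
    # class rises smoothly with the number of words (made-up behaviour).
    return 1 / (1 + math.exp(-(word_count - 60) / 10))

def lime_slope(x0, n_samples=500, width=15.0, seed=0):
    """Fit a locally weighted linear surrogate around x0; return its slope."""
    rng = random.Random(seed)
    # Perturb the instance of interest and query the black-box model.
    xs = [x0 + rng.gauss(0, width) for _ in range(n_samples)]
    ys = [black_box(x) for x in xs]
    # Proximity kernel: perturbations closer to x0 get larger weights.
    ws = [math.exp(-(((x - x0) / width) ** 2)) for x in xs]
    # Weighted least squares for y ~ a + b*x, in closed form.
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    num = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    den = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return num / den

slope = lime_slope(80.0)
# A positive slope means that, locally, adding words pushes the
# prediction toward the "poor quality" class.
```

The surrogate's coefficients are the "LIME scores": they describe the black box only in the neighbourhood of the chosen instance, not globally.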

 

Use case: Increasing CTR using actionable insights from XAI methods


In this use case, we don't use XAI just to understand the model's behavior. We aim to use the explanations to make the advertisements better and increase their CTR. Let's first introduce the dataset: a collection of 3.5K social media (Facebook) ads released by the US House of Representatives in May 2018. Table 1 describes the features and Table 2 describes the meta-features.


Table 1

| Name | Type | Example |
| --- | --- | --- |
| AdText | Unstructured text | It is an American history. African-American citizens sat behind signs like these on city buses. |
| Clicks | Number | 32 |
| Impressions | Number | 321 |
| Age | Text | 18 - 65+, 20 - 45 |
| CreationDate | Date | 06:16:15 08:20:31 AM |
| EndDate | Date | 06:17:15 08:20:30 AM |
| Behaviors | Text | New smartphone and tablet users, Multicultural Affinity |
| AdSpend | Number | 599 |
| ExcludedConnections | Text | Exclude people who like Memopolis |

The dataset contains other features as well, but they were not used in this analysis. Meta-features were generated from the fields in Table 1 and are described in Table 2.

Table 2

| Name | Based On | Type | Example | Description |
| --- | --- | --- | --- | --- |
| Days | CreationDate & EndDate | Number | 4 | Number of days the ad was visible |
| total_word_count | AdText | Number | 12 | Number of total words |
| capital_word_count | AdText | Number | 2 | Number of capitalized words |
| noun_count | AdText | Number | 2 | Using POS tagger, get #NOUNS |
| verbs_count | AdText | Number | 1 | Using POS tagger, get #VERBS |
| sent_class: pos/neg/neu | AdText | One-hot encoding | [1 0 0] | Using transformer, get sentiment |
| question_count | AdText | Number | 1 | Number of "?" |
| exclamation_count | AdText | Number | 1 | Number of "!" |
| behaviours_cnt | Behaviors | Number | 3 | Number of different behaviors |
| exclude_cnt | ExcludedConnections | Number | 1 | Number of different excluded groups |
| min_age/max_age | Age | Number | 15, 55 | Min/max age |

For POS tagging, a spaCy model was employed, while for sentiment annotation a Hugging Face transformer was used. The data also contained variations of the same ad with slightly different text. These variants were detected by preprocessing the AdText field and merged together, resulting in a dataset of 2.3K samples.
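Several of the count-based meta-features in Table 2 can be derived in a few lines; a minimal sketch below (the POS and sentiment features would come from the spaCy and Hugging Face models and are omitted here):

```python
def text_meta_features(ad_text):
    # Derive the simple count-based meta-features from the raw AdText.
    words = ad_text.split()
    return {
        "total_word_count": len(words),
        "capital_word_count": sum(1 for w in words if w[:1].isupper()),
        "question_count": ad_text.count("?"),
        "exclamation_count": ad_text.count("!"),
    }

feats = text_meta_features(
    "Did you know? African-American citizens sat behind signs like these!"
)
```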

The AdSpend feature was removed after the initial analysis for two reasons: 1) money spent on the ad dominated both the global and the local impact on the decision making, which makes sense, since more money means more visibility, more exposure, and so on; and 2) the actionable suggestions should improve the ad itself rather than touch the customer's budget.

 

Objective


Employ a classifier to assess the effectiveness of advertisements by distinguishing between high-quality and low-quality ones. When an advertisement is identified as low-quality, use explainable AI techniques to propose actionable recommendations for turning it into a high-quality ad. Quality is defined by a threshold of 50 clicks: ads receiving fewer than 50 clicks are categorized as low-quality (class label 1), and those exceeding 50 clicks are considered high-quality (class label 0). This categorization results in a dataset comprising 711 low-quality and 1,622 high-quality ads. The idea is to employ the classifier as a referee that assesses the qualitative aspects of an advertisement and determines the degree of quality, using the predicted probabilities as the measure.
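The labeling step above is a one-liner; a sketch (treating exactly 50 clicks as high quality is an assumption, since the threshold boundary is not specified):

```python
CLICK_THRESHOLD = 50

def quality_label(clicks):
    # Label 1 = low quality (fewer than 50 clicks), 0 = high quality.
    return 1 if clicks < CLICK_THRESHOLD else 0

labels = [quality_label(c) for c in [32, 321, 49, 50]]
```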


As the base model, a CatBoostClassifier was employed and fitted on the whole dataset. After training, the model performed well overall, scoring around 90% ROC AUC.
Model's performance in terms of AUC ROC
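The training setup can be sketched as follows. To keep the snippet dependency-light, scikit-learn's `GradientBoostingClassifier` stands in for CatBoost and the data is synthetic; the pipeline shape (fit, predict probabilities, score with ROC AUC) is the same:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the meta-features (word count, days, ages, ...).
X = rng.normal(size=(1000, 5))
# "Poor quality" (label 1) correlates with a high first column.
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```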

CatBoostClassifier is an ensemble method that uses trees as base learners, so it is easy to extract which features the model itself considers most influential. Let's look at the top 5 below.


Model's feature importance

So total word count, days, capital word count, noun count, and verb count are the top 5 most important features according to the classifier itself. Now let's see what SHAP finds by probing the model.

SHAP values of the model's features

As we can see, 3 of the 5 features overlap: SHAP dropped noun and verb count in favor of min and max age.

Now let's take a single sample that belongs to the poor-quality class and is predicted as such. The example is drawn based on its predicted probability, i.e., a high predicted probability means the model is very confident the sample is of poor quality. For single predictions and explanations we use LIME. The selected instance was (correctly) classified as poor quality with 85% probability. The LIME scores are shown below.


Explanations for the predicted sample using LIME

What this figure says, practically, is that the total word count, min age, and noun count push the instance toward the poor-quality label, while days and max age push it toward the good-quality class. Taking a closer look, we can already see some actionable suggestions.


The sample is labeled as poor (label 1), and we can see that the total word count is already too high. Let's start reducing this feature and observe how the model responds w.r.t. the poor class.

Probability shift w.r.t the class due to varying number of words

As expected, by reducing only the total number of words, the model's confidence drops until, at 60 words, the class label flips (43% probability of being labeled poor). So recommending a reduction in words to the seller is an easy and inexpensive way to turn the ad from poor to good quality. Although this process is simple, it is also quite naive. Instead of tuning a single feature, let's feed the tunable features identified by the XAI methods into an AutoML optimization solver (Bayesian optimization) and let it find the combination that minimizes the confidence in the poor class. This way we also control which features are appropriate for the user to change, e.g., the seller does not want to change the geographical region but can change the age group. The results are shown below.
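The single-feature sweep can be sketched as follows; `predict_poor_proba` is a made-up stand-in for the trained classifier's predicted probability, not the actual model:

```python
import math

def predict_poor_proba(total_word_count):
    # Made-up stand-in: confidence in the "poor" class grows with word count.
    return 1 / (1 + math.exp(-(total_word_count - 62) / 8))

original_words = 90
# Probe the model while reducing only the word count, all else fixed.
sweep = {w: predict_poor_proba(w) for w in range(original_words, 30, -10)}
# The largest word count at which the predicted label flips to "good".
flip_point = max(w for w, p in sweep.items() if p < 0.5)
```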

Now the results look much better! The probability of the poor-quality class (label 1) dropped from 85% to 16%. The search optimization was able to find the combination of perturbations that provides the best suggestion.
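The multi-feature search looks roughly like this; plain random search stands in for the Bayesian optimizer here, and the probability function is again a made-up stand-in for the classifier:

```python
import math
import random

def predict_poor_proba(word_count, min_age):
    # Made-up stand-in for the classifier's "poor quality" probability.
    z = 0.08 * (word_count - 60) - 0.05 * (min_age - 25)
    return 1 / (1 + math.exp(-z))

# Only features the seller is willing to change enter the search space.
SEARCH_SPACE = {"word_count": (20, 90), "min_age": (18, 45)}

def optimize(n_trials=2000, seed=0):
    # Minimize the poor-class probability over the allowed perturbations.
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cand = {k: rng.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}
        p = predict_poor_proba(**cand)
        if best is None or p < best[0]:
            best = (p, cand)
    return best

best_proba, best_params = optimize()
```

A Bayesian optimizer does the same loop but proposes candidates from a surrogate model instead of uniformly at random, so it needs far fewer evaluations of the classifier.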

LLMs to the rescue!

Now that we know how many words are needed for the model to change its decision, the next question is: can we automatically suggest alternative text that fits our criteria?

Well, the answer is obviously yes! LLMs such as ChatGPT, LLaMA, and others can be employed to serve our needs.


There is a plethora of open-source models that can assist us. We just need to provide the right template based on the type of suggestions we get from the XAI, e.g., if the most influential features are the number of total words and the number of verbs, the template would be something like:


Act as an advertisement expert. Use 20 words where 7 are verbs to reconstruct the following advertisement into an improved version: "African-American soldiers played a decisive role in the US Army on the western frontier during the Plains Wars, but it's not mentioned in our history books."

The employed open-source LLM responds with the following:


"African-American soldiers were pivotal in shaping US Army history on the western frontier, yet textbooks overlook their significant contributions."

This example illustrates a very simple case. By scaling the combination of XAI methods and LLMs, we can automatically provide better suggestions for poor advertisements to our customers.
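Assembling such prompts from the XAI output is straightforward to automate; a sketch (the template wording mirrors the example above and is of course adjustable):

```python
def build_prompt(ad_text, total_words, verb_count):
    # Turn the optimizer's suggested feature values into an LLM prompt.
    return (
        "Act as an advertisement expert. "
        f"Use {total_words} words where {verb_count} are verbs "
        "to reconstruct the following advertisement into an improved version: "
        f'"{ad_text}"'
    )

prompt = build_prompt(
    "African-American soldiers played a decisive role in the US Army...",
    total_words=20,
    verb_count=7,
)
```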


To take it one step further, we can let the customer/user choose from a list of suggestions which improved advertisement serves their needs. The selection (customer preference) can then be used for A/B testing to improve the recommendations in the future.



Conclusion


In conclusion, using XAI to increase CTR is a straightforward approach. By providing actionable suggestions to sellers, the quality of the ads can drastically improve. The process not only enhances the recommendations but also yields valuable insight into which specific features should be modified. The results demonstrate a substantial improvement, with the probability of an ad being classified as poor quality reduced from 85% to 16%, underscoring the effectiveness of this refined approach. Furthermore, LLMs present an exciting opportunity to automate the process, seamlessly identifying and enhancing advertisements.


** If you are interested in other ML use-cases, please contact me using the form (and also include a publicly available dataset for this case, I'm always curious to explore new problems).

