Naive Bayes classifiers are a type of machine learning algorithm that can be used to make predictions based on a set of input features. They are called "naive" because they make the assumption that each input feature is independent of every other feature. Despite this simplification, naive Bayes classifiers have been found to be surprisingly effective in many real-world applications, including text classification, spam filtering, and fraud detection.

The underlying principle of the naive Bayes classifier is Bayes' theorem, which is a way of calculating the probability of an event based on prior knowledge of conditions that might be related to the event. In the case of the naive Bayes classifier, the event is the predicted class of a new input instance, and the conditions are the values of its input features.

## A classical Bayesian classification use case: spam email detection

To illustrate how the naive Bayes classifier works, let's consider a simple example of text classification. Suppose we have a collection of emails that have been labeled as either "spam" or "not spam," and we want to build a classifier that can automatically classify new emails as either spam or not spam.

To train the naive Bayes classifier, we first need to compute the probabilities of each feature given each class. For example, we might compute the probability that an email containing the word "viagra" is spam, and the probability that an email containing the word "viagra" is not spam. We can do this by counting the number of occurrences of each feature in each class, and dividing by the total number of instances in that class.

Once we have these probabilities, we can use Bayes' theorem to compute the probability of each class given a new input instance. For example, suppose we receive a new email that contains the words "cheap viagra" and "limited time offer." We can compute the probability that this email is spam by multiplying together the probabilities of each feature given the spam class, and then multiplying by the prior probability of the spam class. We can do the same calculation for the not spam class, and then compare the two probabilities to make a prediction.

### Reminder: Bayes' theorem

Bayes' theorem is as follows: P(A|B) × P(B) = P(B|A) × P(A), where P(A|B) reads "the probability of A given B".

Bayes' theorem can also be rewritten as:

P(A|B) = P(B|A) × P(A) / P(B)
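As a quick numerical sanity check, the rewritten form can be evaluated in Python using the spam and "viagra" figures from the worked example later in this article:

```python
# P(A|B) = P(B|A) * P(A) / P(B)
# A = "email is spam", B = "email contains the word viagra".
p_a = 0.4          # P(spam): 40 spam emails out of 100
p_b_given_a = 0.5  # P(viagra | spam): 20 occurrences in 40 spam emails
p_b = 0.25         # P(viagra): 25 occurrences in 100 emails

p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.8
```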

In real life, Bayes' theorem can be applied to situations in which there are several predicting features, instead of a single feature B.

Let B = (B1, B2, …, Bn) be the "feature vector". Under the **assumption that all features are independent of each other**, the multi-dimensional Bayes theorem is:

P(A|B1, …, Bn) = P(A) × P(B1|A) × P(B2|A) × … × P(Bn|A) / (P(B1) × P(B2) × … × P(Bn))
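This formula translates directly into a few lines of Python. The sketch below is illustrative: the function name `naive_bayes` and its parameters (the class prior, the list of per-feature likelihoods P(Bi|A), and the list of feature marginals P(Bi)) are our own naming, not from the article:

```python
from math import prod  # Python 3.8+

def naive_bayes(prior, likelihoods, marginals):
    """Multi-feature Bayes under the independence assumption:
    P(A | B1..Bn) = P(A) * prod(P(Bi|A)) / prod(P(Bi))."""
    return prior * prod(likelihoods) / prod(marginals)

# With a single feature, this reduces to ordinary Bayes' theorem:
print(naive_bayes(0.4, [0.5], [0.25]))  # 0.8
```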

## How the naive Bayes classifier's math works

Here's a simple example of how this calculation might work in practice. We will show how to build a naive Bayes classifier that takes into account two features (the occurrences of two different keywords) to make classification predictions.

Suppose we have a training set of 100 emails, 40 of which are spam and 60 of which are not spam. We also have a vocabulary of 1,000 words that we will use as our input features.

To train the naive Bayes classifier, we first count the number of occurrences of each word in each class. For example, we might find that the word "viagra" occurs 20 times in the spam emails and 5 times in the not spam emails. Likewise, the word "cheap" occurs 2 times in the spam emails, and 1 time in the not spam emails.

The quantitative details are provided below:

| Word | Occurrences in spam (40 emails) | Occurrences in not spam (60 emails) |
|---|---|---|
| viagra | 20 | 5 |
| cheap | 2 | 1 |

We can then compute the probabilities of each word given each class by dividing these counts by the number of emails in each class:

- P(viagra | spam) = 20/40 = 0.50 and P(viagra | not spam) = 5/60 ≈ 0.083
- P(cheap | spam) = 2/40 = 0.05 and P(cheap | not spam) = 1/60 ≈ 0.017
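These per-class word probabilities can be computed with a minimal Python sketch; the dictionary of counts simply mirrors the figures of the worked example:

```python
# Word-occurrence counts from the worked example:
# 40 spam emails and 60 not-spam emails.
counts = {
    "spam":     {"viagra": 20, "cheap": 2, "emails": 40},
    "not spam": {"viagra": 5,  "cheap": 1, "emails": 60},
}

def word_given_class(word, label):
    """P(word | class): occurrences of the word in that class,
    divided by the number of emails in that class."""
    return counts[label][word] / counts[label]["emails"]

print(word_given_class("viagra", "spam"))   # 0.5
print(word_given_class("cheap", "spam"))    # 0.05
```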

Next, suppose we receive a new email that contains the words "cheap viagra" and "limited time offer." We can compute the probability that this email is spam by multiplying the probabilities of each word given the spam class, and then multiplying by the prior probability of the spam class. We can do the same calculation for the not spam class, and then compare the two probabilities to make a prediction.

In this case, the email contains both "viagra" and "cheap", hence both features of the feature vector are true.

Applying Bayes' formula, the partial probability of the email being spam, given "viagra" and "cheap", is:

P'(spam | viagra, cheap) = P(spam) × P(viagra | spam) × P(cheap | spam) / (P(viagra) × P(cheap))

Using the numeric values:

P'(spam | viagra, cheap) = 0.4 × 0.5 × 0.05 / (0.25 × 0.03) ≈ 1.33

Likewise, we can compute the partial probability of the email **not being spam**, given "viagra" and "cheap": 0.6 × (5/60) × (1/60) / (0.25 × 0.03) ≈ 0.11.

The final probability of the email being spam, given "viagra" and "cheap", is then:

P(spam | viagra, cheap) = 1.33 / (1.33 + 0.11) ≈ 0.9231, i.e. 92.31%.

We can compute the probability of an email being spam for all four combinations of the features "viagra" and "cheap":

| "viagra" present | "cheap" present | P(spam) |
|---|---|---|
| yes | yes | 92.31% |
| yes | no | 79.44% |
| no | yes | 52.17% |
| no | no | 26.00% |
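The four combinations can be checked with the short script below (a sketch using the counts of the worked example; the function name `score` is our own):

```python
from itertools import product

# Figures from the worked example: 40 spam / 60 not-spam emails;
# "viagra" appears in 20 spam / 5 not-spam, "cheap" in 2 spam / 1 not-spam.
priors = {"spam": 0.4, "not spam": 0.6}
likelihood = {  # P(word present | class)
    "spam":     {"viagra": 20 / 40, "cheap": 2 / 40},
    "not spam": {"viagra": 5 / 60,  "cheap": 1 / 60},
}

def score(label, viagra, cheap):
    """Unnormalised naive Bayes score: P(class) * prod of P(feature | class),
    using 1 - P(word | class) when the word is absent."""
    p = priors[label]
    p *= likelihood[label]["viagra"] if viagra else 1 - likelihood[label]["viagra"]
    p *= likelihood[label]["cheap"] if cheap else 1 - likelihood[label]["cheap"]
    return p

for viagra, cheap in product([True, False], repeat=2):
    s = score("spam", viagra, cheap)
    n = score("not spam", viagra, cheap)
    print(f"viagra={viagra!s:5} cheap={cheap!s:5} P(spam)={s / (s + n):.2%}")
```

Running this reproduces the four probabilities above (92.31%, 79.44%, 52.17%, 26.00%). Note that the feature marginals P(viagra) and P(cheap) cancel out when the two class scores are normalised, which is why they can be omitted here.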

The results above lead to the following observations:

Adding a second keyword ("cheap") to the first keyword ("viagra") increases the predictor's level of information and accuracy: the probability of an email being spam rises from 80% to 92.31% once the "cheap" condition is added.

Likewise, the absence of both keywords further decreases the probability, albeit marginally, of the email being spam: from 26.67% down to 26%.

## Bayesian classifiers are good enough for many real-world use cases

This example shows that Bayesian classifiers are simple tools that are nonetheless good enough for many real-life cases.

Basedig provides services for data analysis and classification. Do not hesitate to contact us if you have applications to be addressed.
