Supervised machine learning tasks such as text classification are a natural fit for this sort of methodology. The Naive Bayes classifier is one such technique. It belongs to the family of generative learning algorithms, which aim to model the distribution of inputs known to belong to a certain class. Unlike discriminative classifiers such as logistic regression, it does not learn which features are most important for drawing distinctions between classes.

A brief overview of Bayesian statistics

The Naive Bayes technique is often described as a probabilistic classifier because it is based on Bayes’ Theorem. It would be difficult to explain the method without first providing a basic understanding of Bayesian statistics. The theorem, also known as Bayes’ Rule, allows us to “invert” conditional probabilities. As a refresher, a conditional probability is the likelihood of an event occurring given that another event has already occurred, and it is represented by the formula below.

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

For example, A could be the event that an email is spam and B the event that it contains a particular word.
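To make the formula concrete, here is a minimal Python sketch that computes a conditional probability from raw counts; the inbox numbers are invented purely for illustration.

```python
# Toy counts for a hypothetical inbox (made-up numbers for illustration).
total_emails = 1000
spam_and_free = 120   # emails that are spam AND contain the word "free"
contains_free = 200   # emails that contain the word "free"

# P(spam | "free") = P(spam AND "free") / P("free")
p_joint = spam_and_free / total_emails   # P(A ∩ B) = 0.12
p_free = contains_free / total_emails    # P(B) = 0.20
p_spam_given_free = p_joint / p_free     # 0.12 / 0.20 = 0.60

print(f"P(spam | 'free') = {p_spam_given_free:.2f}")  # 0.60
```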

Bayes’ Theorem is distinguished by its use of sequential events: an initial probability is updated as more information is acquired. These two probabilities are referred to as the prior probability and the posterior probability. The prior probability, also called the marginal probability, is the original likelihood of an event before it is contextualised within a particular situation. The posterior probability is the likelihood of the event after the additional data has been observed. This updating step is the core of the Naive Bayes algorithm.

In the statistics and machine learning literature, medical testing is often given as an example of this idea. Take Jane, who is being tested to learn whether she has diabetes so that she may begin taking preventive steps. Assume that the prior probability of having diabetes across the total population is 5%. If her test comes back positive, however, the prior probability is updated to account for the new information and becomes the posterior probability. Based on Bayes’ Theorem, this situation can be described with the following equation:

$$P(\text{Diabetic} \mid \text{Positive}) = \frac{P(\text{Positive} \mid \text{Diabetic}) \, P(\text{Diabetic})}{P(\text{Positive})}$$
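As a sketch of how this calculation plays out, the following Python snippet plugs in the 5% prior along with an assumed test sensitivity and false-positive rate; the two test characteristics are hypothetical and not given in the text.

```python
# Hypothetical test characteristics (assumed for illustration only).
p_diabetic = 0.05             # prior: P(Diabetic)
p_pos_given_diabetic = 0.90   # sensitivity: P(Positive | Diabetic)
p_pos_given_healthy = 0.08    # false-positive rate: P(Positive | Not Diabetic)

# Evidence P(Positive), via the law of total probability.
p_positive = (p_pos_given_diabetic * p_diabetic
              + p_pos_given_healthy * (1 - p_diabetic))

# Bayes' Theorem: posterior = likelihood * prior / evidence.
posterior = p_pos_given_diabetic * p_diabetic / p_positive
print(f"P(Diabetic | Positive) = {posterior:.3f}")  # ~0.372
```

Note that even with a positive result, the posterior stays well below certainty because the prior is small, which is exactly the kind of update Bayes’ Theorem formalises.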

However, because our knowledge of prior probabilities is unlikely to be exact given other variables such as diet, age, and family history, we typically rely on probability distributions estimated from random samples, which simplifies the equation to the general form of Bayes’ Theorem:

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

How Naive Bayes works

Naive Bayes classifiers earn the label “naive” from the simplifying assumptions they operate under. A Naive Bayes model assumes that the predictors are conditionally independent, that is, uncorrelated with one another given the class. It also assumes that every feature contributes equally to the outcome. Although these assumptions are often violated in practical situations (for example, a word in an email depends on the word that precedes it), they make classification problems far more computationally tractable: the model needs only a single probability per feature rather than separate probabilities for every combination of variables. Despite this unrealistic independence assumption, the classifier performs well in practice, particularly with small sample sizes.
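To show the independence assumption in action, here is a minimal from-scratch multinomial Naive Bayes sketch in Python; the tiny spam/ham training set is made up for the example. Each word contributes a single per-class probability, and the per-word terms are simply combined (summed in log space), exactly as described above.

```python
from collections import Counter
import math

# Tiny made-up training set (assumed purely for illustration).
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

# Count word frequencies per class and class frequencies (the priors).
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    scores = {}
    for label in class_counts:
        # Log prior: log P(class).
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            # Independence assumption: add log P(word | class) per word,
            # with Laplace smoothing so unseen words don't zero the score.
            score += math.log((word_counts[label][word] + 1)
                              / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("claim your free money"))   # spam
print(predict("team meeting on monday"))  # ham
```

The Laplace smoothing term (+1 in the numerator, plus the vocabulary size in the denominator) is one common way to keep a word never seen in a class from driving its probability to zero.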