1. Explain “Bayes' Theorem” as you have understood it, in your own words (2 Marks).
2. Elaborate the operation of the “Naïve Bayes classifier”, considering any dataset available (4 Marks).
3. What are the other types of “Naïve Bayes classifier” and how do you decide what to use? (6 Marks)
4. Download the “SPAM.CSV” dataset from Kaggle and create a SPAM filter using the “Naïve Bayes classifier”. Support your answer with all required steps, explanations and associated screenshots (8 Marks).
Answers
Explanation:
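1. Bayes' theorem gives the probability of an event based on new information that is, or may be, related to that event. In symbols, P(A|B) = P(B|A) * P(A) / P(B), where P(A) is the prior probability of A, P(B|A) is the likelihood of observing B when A holds, and P(A|B) is the updated (posterior) probability of A. The formula can also be used to see how the probability of an event would be affected by hypothetical new information, supposing that information turns out to be true.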
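As a small worked illustration of the theorem, the sketch below computes a posterior probability for a two-outcome case. The function name `posterior` and all the numbers (20% of messages being spam, the word "free" appearing in 60% of spam and 5% of non-spam) are made up for the example and are not taken from any dataset.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H and evidence E."""
    # Total probability of the evidence:
    # P(E) = P(E | H) * P(H) + P(E | not H) * P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
    return likelihood * prior / evidence


# Hypothetical numbers: P(spam) = 0.2, P("free" | spam) = 0.6,
# P("free" | not spam) = 0.05.
p_spam_given_free = posterior(prior=0.2, likelihood=0.6, false_positive_rate=0.05)
print(round(p_spam_given_free, 3))
```

With these assumed numbers, seeing the word "free" raises the probability of spam from the prior of 0.2 to 0.75.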
2. Here’s a situation you might run into in a data science project:
You are working on a classification problem and have generated your set of hypotheses, created features and discussed the importance of variables. Within an hour, stakeholders want to see the first cut of the model.
What will you do? You have hundreds of thousands of data points and quite a few variables in your training data set. In such a situation, if I were in your place, I would use “Naive Bayes”, which can be extremely fast relative to other classification algorithms. It applies Bayes' theorem under the “naive” assumption that the features are independent of each other given the class, and it predicts the class with the highest resulting probability for each unseen data point.
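To make the operation concrete, here is a minimal from-scratch sketch of a word-count Naive Bayes classifier. The four training messages, the `fit` and `predict` helper names, and the use of add-one (Laplace) smoothing are assumptions chosen for illustration; a real project would normally rely on a library implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy dataset of (message, label) pairs.
train = [
    ("win free prize now", "spam"),
    ("free cash offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]


def fit(samples):
    """Count how often each class and each word-per-class occurs."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for text, label in samples:
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict(text, class_counts, word_counts, vocab):
    """Return the class with the highest log posterior score."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior, log P(class)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen word/class pairs above zero.
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = fit(train)
print(predict("free prize", *model))
print(predict("noon meeting", *model))
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow when messages contain many words.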
3. Three commonly used Naive Bayes models in the scikit-learn library are listed below; the choice depends mainly on the kind of features in the data:
Gaussian: It is used in classification and it assumes that features follow a normal distribution.
Multinomial: It is used for discrete counts. ...
Bernoulli: The Bernoulli model is useful if your feature vectors are binary (i.e. zeros and ones).
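The three variants listed above can be tried side by side with scikit-learn's `GaussianNB`, `MultinomialNB` and `BernoulliNB` classes. The tiny arrays below are invented purely to show which kind of feature matrix suits each model; they are not from any real dataset.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data: four samples, two classes.
y = np.array([0, 0, 1, 1])
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
X_binary = (X_counts > 0).astype(int)  # presence/absence of each feature

# Gaussian: continuous, roughly normally distributed features.
gaussian = GaussianNB().fit(X_continuous, y)
# Multinomial: non-negative counts, such as word frequencies.
multinomial = MultinomialNB().fit(X_counts, y)
# Bernoulli: binary features, such as "word present or not".
bernoulli = BernoulliNB().fit(X_binary, y)

pred_gaussian = gaussian.predict([[1.1, 2.0]])[0]
pred_multinomial = multinomial.predict([[2, 0, 1]])[0]
pred_bernoulli = bernoulli.predict([[1, 0, 1]])[0]
print(pred_gaussian, pred_multinomial, pred_bernoulli)
```

For a text spam filter built on word counts, the multinomial model is the usual starting point, with the Bernoulli model as an alternative when only word presence is recorded.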
4. So far, we have developed many machine learning models, generated numeric predictions on the testing data, and tested the results, all offline. In reality, generating predictions is only part of a machine learning project, although it is the most important part in my opinion.
Consider a system that uses machine learning to detect spam SMS text messages. Our ML system's workflow is: Train offline -> Make model available as a service -> Predict online.
A classifier is trained offline with spam and non-spam messages. The trained model is deployed as a service to serve users.
Figure 1
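The "train offline, then make the model available" steps can be sketched as follows. This is only an illustration under stated assumptions: the six messages stand in for the Kaggle file (whose column names are not given here), and `pickle` from the standard library is one possible way to hand the trained model to a service.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the real dataset: messages and their labels.
messages = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash, text now",
    "Urgent: claim your free cash prize",
    "Are we still meeting for lunch today",
    "I will call you later tonight",
    "See you at the meeting tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Train offline: word counts feed a multinomial Naive Bayes model.
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
pipeline.fit(messages, labels)

# Serialize the trained pipeline so a separate service could load it.
blob = pickle.dumps(pipeline)

# Later, inside the service: load the model and predict online.
loaded = pickle.loads(blob)
predictions = list(loaded.predict(["claim your free prize", "see you at lunch"]))
print(predictions)
```

Bundling the vectorizer and the classifier in one pipeline means the service applies exactly the same text preprocessing that was used during training.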
When we develop a machine learning model, we need to think about how to deploy it, that is, how to make this model available to other users.
Kaggle and data science bootcamps are great for learning how to build and optimize models, but they don’t teach engineers how to take those models to the next step. There is a major difference between building a model and actually getting it ready for people to use in their products and services.
In this article, we will focus on both: building a machine learning model for spam SMS message classification, then creating an API for the model using Flask, the Python micro framework for building web applications. This API allows us to use the model's predictive capabilities through HTTP requests. Let’s get started!
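A minimal sketch of such an API is shown below. The `/predict` route, the `message` JSON field and the tiny inline training set are all assumptions made for illustration; the request is sent through Flask's built-in test client so the example runs without starting a server.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data standing in for the real spam dataset.
messages = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash, text now",
    "Urgent: claim your free cash prize",
    "Are we still meeting for lunch today",
    "I will call you later tonight",
    "See you at the meeting tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(messages, labels)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Classify the 'message' field of a JSON request body."""
    payload = request.get_json(silent=True)
    message = payload.get("message") if isinstance(payload, dict) else None
    if not isinstance(message, str) or not message.strip():
        return jsonify(error="expected a JSON body with a non-empty 'message'"), 400
    label = model.predict([message])[0]
    return jsonify(message=message, label=str(label))


# Exercise the endpoint in-process with Flask's test client.
client = app.test_client()
response = client.post("/predict", json={"message": "claim your free prize"})
print(response.status_code, response.get_json())
```

To serve real traffic, the same `app` object would be started with a command such as `flask run` or placed behind a production WSGI server rather than called through the test client.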