<summary>     Table of contents   </summary>   {: .text-delta }

  1. Links
  2. Machine Learning
  3. Steps in Machine Learning
  4. Exploratory data analysis - EDA
  5. Definitions
  6. Machine Learning algorithms
  7. Supervised Learning
    1. Regression
    2. Linear Regression
    3. Logistic Regression
  8. Unsupervised Learning
    1. Unsupervised Learning - Examples
  9. Reinforcement Learning
    1. Facebook - Horizon
  10. Machine Learning modules
    1. SciPy
    2. scikit-learn
  11. Consumer Behavior
  12. Bernoulli
    1. Example Stress
    2. Example Twitter
    3. Example Spam
      1. Test of the algorithm
  13. Links

Machine Learning

Machine learning is all around us. From antilock braking systems, to autopilot systems in airplanes and cars, smart speakers, which serve as personal digital assistants, to systems that learn our movie preferences and recommend what to watch next in Netflix.

Machine Learning is the process of creating models that can perform a certain task without the need for a human explicitly programming it to do something

Steps in Machine Learning

There are six major steps in the machine learning process.

  1. Data Collection
  2. Data Exploration
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Insights

Exploratory data analysis - EDA

In statistics, Exploratory Data Analysis (EDA) is an approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods.

EDA

Definitions

Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, including learning, reasoning and self-correction

Machine learning (ML) is an application of AI generating systems that can learn and improve without being programmed

Deep Learning (DL) is a subset of Machine Learning and Artificial Intelligence. The term refers to a particular approach used for creating and training neural networks that are considered highly promising decision-making nodes

Machine Learning algorithms

Supervised Learning

The area of Machine Learning where you have a set of independent variables which helps you to analyse the dependent variable and the relation between them

Whatever you want to predict is called as Dependent Variable, while variables that you use to predict are called as Independent Variables

You want to predict the age of the person based on the person’s height and weight, then height and weight will be the independent variables, while age will be the dependent variable

Supervised learning is the most popular paradigm for machine learning. It is the easiest to understand and the simplest to implement.

Regression

Regression is the kind of Supervised Learning that learns from the Labelled Datasets and is then able to predict a continuous-valued output for the new data given to the algorithm.

It is used whenever the output required is a number such as money or height etc.

  • Linear Regression
  • Logistic Regression

bg right:30% 90%

Linear Regression

This algorithm assumes that there is a linear relationship between the 2 variables, Input (X) and Output (Y), of the data it has learnt from.

The Input variable is called the Independent Variable and the Output variable is called the Dependent Variable.

When unseen data is passed to the algorithm, it uses the function, calculates and maps the input to a continuous value for the output.

bg right:60% 100%

Logistic Regression

This algorithm predicts discrete values for the set of Independent variables that have been passed to it.

It does the prediction by mapping the unseen data to the logit function that has been programmed into it.

The algorithm predicts the probability of the new data and so it’s output lies between the range of 0 and 1.

bg right:30% 90%

Unsupervised Learning

You don’t have any dependent variable. You just have collection of variables and try to find out similarity between them and classify them into clusters

Unsupervised Learning - Examples

  • Customer segmentation, or understanding different customer groups around which to build marketing or other business strategies
  • Recommender systems, which involve grouping together users with similar viewing patterns in order to recommend similar content.
  • Anomaly detection, including fraud detection or detecting defective mechanical parts - predictive maintenance

Reinforcement Learning

It is the training of machine learning models to make a sequence of decisions.

The machine learns to achieve a goal in an uncertain, potentially complex environment. The machine employs trial and error to come up with a solution to the problem.

Facebook - Horizon

Facebook has developed an open-source reinforcement learning platform — Horizon. The platform uses reinforcement learning to optimize large-scale production systems.

Facebook has used Horizon internally

  • To personalize suggestions
  • Deliver more meaningful notifications to users
  • Optimize video streaming quality

Read more about Horizon

Machine Learning modules

SciPy is an ecosystem of Python libraries for mathematics, science and engineering.

It is an add-on to Python that you will need for machine learning.

The SciPy ecosystem is comprised of the following core modules relevant to machine learning:

  • NumPy: A foundation for SciPy that allows you to efficiently work with data in arrays.
  • Matplotlib: Allows you to create 2D charts and plots from data.
  • Pandas: Tools and data structures to organize and analyze your data.

To be effective at machine learning in Python you must use thise modules:

  • We will prepare your data as NumPy arrays for modeling in machine learning algorithms.
  • We will use Matplotlib (and wrappers of Matplotlib in other frameworks) to create plots and charts of your data.
  • We will use Pandas to load, explore, and better understand your data.

SciPy

SciPy provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics and many other classes of problems.

scikit-learn

The scikit-learn library is how you can develop and practice machine learning in Python. It is built upon and requires the SciPy ecosystem.

The name scikit suggests that it is a SciPy plug-in or toolkit. The focus of the library is machine learning algorithms for classification, regression, clustering and more. It also provides tools for related tasks such as evaluating models, tuning parameters and pre-processing data.

1
2
import scipy
import sklearn

Consumer Behavior

Using AI and Machine Learning to Predict Consumer Behavior

Companies understand that predicting customer behavior fills the gap in the markets and identifies products that are needed and which could generate bigger revenue.

Bernoulli

Bernoulli Naive Bayes is one of the variants of the Naive Bayes algorithm in machine learning. It is very useful to be used when the dataset is in a binary distribution where the output label is either present or absent (0 or 1).

  • Is it possible to detect stress from text string on the internet - In a chat for example - with machine Learning?
  • It it possible to detect hatefully language on Twitter?
  • Can you use ML to detect spam?

Example Stress

Stress detection with machine learning from text strings. The dataset for this task contains data posted on subreddit’s related to mental health.

This dataset contains various mental health problems shared by people about their life. This dataset is labelled as 0 and 1, where 0 indicates no stress and 1 indicates stress.

The data is: stress_csv.csv

I have used Jupyter Lab for this - DetectStress.ipynb

Example Twitter

Use the Bernoulli algorithm for detection of the language on Twitter.

Data is in twitter.csv

Example Spam

This example look’s at the text in the csv file: spam.csv

The text is labeled in the colum class with:

  • ham = Not spam
  • spam = Spam

Test of the algorithm

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

data = pd.read_csv("spam.csv", encoding= 'latin-1')
data = data[["class", "message"]]

x = np.array(data["message"])
y = np.array(data["class"])

cv = CountVectorizer()
x = cv.fit_transform(x)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.33, random_state=42)

model = BernoulliNB(binarize=0.0)
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))

Links


Table of contents