# Example of Logistic Regression
Your goal is to build a logistic regression model in Python that predicts whether candidates will be admitted to a prestigious university.

Here, there are two possible outcomes: Admitted (represented by the value of ‘1’) vs. Rejected (represented by the value of ‘0’).

You can then build a logistic regression model in Python, where:

- The dependent variable represents whether a person gets admitted
- The 3 independent variables are the GMAT score, GPA, and years of work experience
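As a rough sketch of what such a model computes: logistic regression passes a weighted sum of the independent variables through the sigmoid function, which yields a probability of admission between 0 and 1. The intercept and coefficient values below are made-up placeholders chosen only to illustrate the formula; they are not fitted from this dataset.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept and coefficients, for illustration only
intercept = -20.0
coef_gmat, coef_gpa, coef_work = 0.02, 1.0, 0.8

def admission_probability(gmat, gpa, work_experience):
    # Weighted sum of the three independent variables
    z = intercept + coef_gmat * gmat + coef_gpa * gpa + coef_work * work_experience
    return sigmoid(z)

p = admission_probability(gmat=700, gpa=3.5, work_experience=4)
# A probability of at least 0.5 is treated as "admitted" (1), otherwise "rejected" (0)
predicted_class = 1 if p >= 0.5 else 0
print(round(p, 3), predicted_class)
```

Fitting the model amounts to finding the intercept and coefficients that best match the observed outcomes.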

In [20]:
#!pip3 install scikit-learn
#!pip3 install seaborn

In [21]:
# Import Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
import matplotlib.pyplot as plt

# Data
Note that the dataset contains 40 observations. In practice, you’ll need a larger sample size to get more accurate results.

In [22]:
# Dataframe
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
              'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
              'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
              'admitted': [1,1,0,1,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,0,1,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
              }

df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])
print(df)

    gmat  gpa  work_experience  admitted
0    780  4.0                3         1
1    750  3.9                4         1
2    690  3.3                3         0
3    710  3.7                5         1
4    680  3.9                4         0
5    730  3.7                6         1
6    690  2.3                1         0
7    720  3.3                4         1
8    740  3.3                5         1
9    690  1.7                1         0
10   610  2.7                3         0
11   690  3.7                5         1
12   710  3.7                6         1
13   680  3.3                4         0
14   770  3.3                3         1
15   610  3.0                1         0
16   580  2.7                4         0
17   650  3.7                6         1
18   540  2.7                2         0
19   590  2.3                3         0
20   620  3.3                2         1
21   600  2.0                1         0
22   550  2.3                4         0
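23   550  2.7                1         0
24   570  3.0                2         0
25   670  3.3                6         1
26   660  3.7                4         1
27   580  2.3                2         0
28   650  3.7                6         1
29   660  3.3                5         1
30   640  3.0                1         0
31   620  2.7                2         0
32   660  4.0                4         1
33   660  3.3                6         1
34   680  3.3                5         1
35   650  2.3                1         0
36   670  2.7                2         0
37   580  3.3                1         0
38   590  1.7                4         0
39   690  3.7                5         1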

Set the independent variables (represented as X) and the dependent variable (represented as y):

In [23]:
X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']

Apply train_test_split. For example, you can set the test size to 0.25, so that the model is tested on 25% of the dataset and trained on the remaining 75%.

In [24]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
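To see what the split produces, here is a small sketch on placeholder arrays with the same number of rows (40) as the dataset. The names `X_demo` and `y_demo` are stand-ins introduced for this illustration, not the admissions data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 40 rows, 3 feature columns, alternating 0/1 labels
X_demo = np.arange(120).reshape(40, 3)
y_demo = np.arange(40) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
print(X_tr.shape, X_te.shape)  # 30 training rows and 10 test rows

# Repeating the call with the same random_state reproduces the same split
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
print(np.array_equal(X_te, X_te_again))
```

Fixing random_state keeps the split reproducible between runs. If the classes were imbalanced, passing stratify=y would be one way to keep the class proportions similar in both parts.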

### Apply the logistic regression

In [25]:
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)
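The following sketch shows how the fitted model's class predictions relate to its predicted probabilities. It uses a tiny made-up one-feature dataset (`X_toy`, `y_toy`), which is an assumption for illustration and not the admissions data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data, for illustration only
X_toy = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y_toy = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X_toy, y_toy)

# Fitted parameters of the linear part of the model
print(model.intercept_, model.coef_)

scores = model.decision_function(X_toy)    # intercept + coefficient * feature
proba = model.predict_proba(X_toy)[:, 1]   # estimated probability of class 1
labels = model.predict(X_toy)              # predicted class (0 or 1)

# For a binary model, the probability is the sigmoid of the linear score,
# and the predicted class is 1 where that probability exceeds 0.5
print(np.allclose(proba, 1 / (1 + np.exp(-scores))))
print(np.array_equal(labels, (proba > 0.5).astype(int)))
```

Inspecting predict_proba alongside predict can help show how confident the model is about each prediction.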

### Create the Confusion Matrix

In [None]:
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)

print('Accuracy: ', metrics.accuracy_score(y_test, y_pred))
plt.show()

- Accuracy of 0.8
- TP = True Positives = 4
- TN = True Negatives = 4
- FP = False Positives = 1
- FN = False Negatives = 1

Accuracy = (TP + TN) / Total = (4 + 4) / 10 = 0.8

The accuracy is therefore 80% for the test set.
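The arithmetic above can be checked with a short sketch. The label lists below are hypothetical, arranged only so that they produce the same counts (4 TP, 4 TN, 1 FP, 1 FN); they are not the actual test-set rows.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels arranged to give 4 TP, 4 TN, 1 FP, and 1 FN
y_true_demo = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted admits that are correct
recall = tp / (tp + fn)                      # share of actual admits that were found

print(tn, fp, fn, tp)
print(accuracy, accuracy_score(y_true_demo, y_pred_demo))
```

Precision and recall are included because accuracy alone can hide how the errors split between false positives and false negatives.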

### Analysing the Results

In [None]:
# Test data
print(X_test)

In [None]:
# Prediction
# 1 = admitted, while 0 = rejected
print(y_pred)

### Checking the Prediction for a New Set of Data
Suppose you have a new set of data with 5 new candidates.

Your goal is to use the existing logistic regression model to predict whether the new candidates will get admitted.

In [None]:
new_candidates = {'gmat': [590, 740, 680, 610, 710],
                  'gpa': [2, 3.7, 3.3, 2.3, 3],
                  'work_experience': [3, 4, 6, 1, 5]
                  }

df2 = pd.DataFrame(new_candidates, columns = ['gmat', 'gpa', 'work_experience'])

In [None]:
print(df2)

In [None]:
# Use a separate name so the test-set predictions in y_pred are not overwritten
y_pred_new = logistic_regression.predict(df2)

In [None]:
print(df2)
print(y_pred_new)

The first and fourth candidates are not expected to be admitted, while the other candidates are expected to be admitted.
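To see how confident the model is about each new candidate, one option is to attach the predicted class and the predicted probability of admission to the new data. The sketch below is self-contained and repeats the dataset and split from above. The names `model`, `new_df`, `results`, `predicted`, and `prob_admitted` are introduced here for illustration, and the exact probabilities may vary with the scikit-learn version.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
              'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
              'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
              'admitted': [1,1,0,1,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,0,1,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
              }
df = pd.DataFrame(candidates)

X = df[['gmat', 'gpa', 'work_experience']]
y = df['admitted']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Same model settings as in the walkthrough above
model = LogisticRegression()
model.fit(X_train, y_train)

new_df = pd.DataFrame({'gmat': [590, 740, 680, 610, 710],
                       'gpa': [2, 3.7, 3.3, 2.3, 3],
                       'work_experience': [3, 4, 6, 1, 5]})

# Keep the inputs and the model output side by side
results = new_df.copy()
results['predicted'] = model.predict(new_df)                    # 1 = admitted, 0 = rejected
results['prob_admitted'] = model.predict_proba(new_df)[:, 1]   # probability of class 1
print(results)
```

A probability close to 0.5 suggests a borderline candidate, whereas values near 0 or 1 indicate a more confident prediction.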