Let's start by importing the libraries¶

# Import NumPy and pandas
import numpy as np
import pandas as pd

#Importing Data Visualization Libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Read the dataset¶

result = pd.read_csv("StudentsPerformance.csv")

Let's start exploring the data¶

result.head()

Let's convert the column names to upper case for convenience¶

# .str.upper() returns a new Index with every column label upper-cased
result.columns = result.columns.str.upper()
result.head()
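
As a minimal sketch of the renaming step, here is the same idea on a toy frame. The two-row `toy` DataFrame is made up for illustration and is not part of the dataset; `.str.upper()` on the column index is one common way to upper-case labels.

```python
import pandas as pd

# Toy frame with made-up values, standing in for StudentsPerformance.csv
toy = pd.DataFrame({"gender": ["female", "male"], "math score": [72, 47]})

# Upper-case every column label; the data itself is untouched
toy.columns = toy.columns.str.upper()

print(list(toy.columns))
```

After this step, columns have to be addressed by their upper-case names, for example `toy["MATH SCORE"]`.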

Checking the type of data in each column¶

result.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   GENDER                       1000 non-null   object
 1   RACE/ETHNICITY               1000 non-null   object
 2   PARENTAL LEVEL OF EDUCATION  1000 non-null   object
 3   LUNCH                        1000 non-null   object
 4   TEST PREPARATION COURSE      1000 non-null   object
 5   MATH SCORE                   1000 non-null   int64 
 6   READING SCORE                1000 non-null   int64 
 7   WRITING SCORE                1000 non-null   int64 
dtypes: int64(3), object(5)
memory usage: 62.6+ KB

Let's check for null values in each column¶

result.isna().sum()

GENDER                         0
RACE/ETHNICITY                 0
PARENTAL LEVEL OF EDUCATION    0
LUNCH                          0
TEST PREPARATION COURSE        0
MATH SCORE                     0
READING SCORE                  0
WRITING SCORE                  0
dtype: int64

Statistical Analysis on the data¶

result.describe()

The lowest score is 0 in Math, 17 in Reading, and 10 in Writing!¶

The highest score in all three subjects is 100!¶
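
The extremes above can also be read off directly instead of scanning the describe() table. This is a small sketch on a toy frame (the three rows are made up for illustration); it shows `min()`, `max()` and `idxmin()`, which reports the row label where each minimum occurs.

```python
import pandas as pd

# Toy scores, made up for illustration
toy = pd.DataFrame({
    "MATH SCORE": [72, 0, 90],
    "READING SCORE": [72, 17, 95],
    "WRITING SCORE": [74, 10, 93],
})

lowest = toy.min()          # smallest value in each column
highest = toy.max()         # largest value in each column
lowest_rows = toy.idxmin()  # row label where each column's minimum occurs

print(lowest, highest, lowest_rows, sep="\n\n")
```

On the real data, `result.loc[lowest_rows["MATH SCORE"]]` would show the full record of the student with the lowest Math score.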

Let's analyse the students' marks in Math, Reading and Writing¶

sns.pairplot(result, hue = 'GENDER', palette = 'coolwarm')

<seaborn.axisgrid.PairGrid at 0x20eaa10aac8>

Heatmap for our data¶

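# Correlation matrix of the numeric score columns
# numeric_only=True (pandas 1.5 or later) skips the text columns; pandas 2.x raises an error without it
result.corr(numeric_only=True)

sns.heatmap(result.corr(numeric_only=True), cmap = 'PuRd', annot = True)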

<matplotlib.axes._subplots.AxesSubplot at 0x20eaad2d288>

Reading and Writing scores are the most strongly correlated (0.95), while Math and Writing scores are the least correlated (0.80), although all three pairs are strongly positive!¶
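
To make the correlation numbers less of a black box, here is a hedged sketch that computes the Pearson coefficient by hand and compares it with pandas. It uses only the five Reading and Writing values shown by the first result.head(), so the value it prints is for those five rows, not for the full dataset.

```python
import numpy as np
import pandas as pd

# Reading and Writing scores of the five rows shown by result.head()
toy = pd.DataFrame({
    "READING SCORE": [72, 90, 95, 57, 78],
    "WRITING SCORE": [74, 88, 93, 44, 75],
})

x = toy["READING SCORE"].to_numpy(dtype=float)
y = toy["WRITING SCORE"].to_numpy(dtype=float)

# Pearson r: covariance of the centred values divided by the product of their spreads
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas uses the Pearson method by default
r_pandas = toy["READING SCORE"].corr(toy["WRITING SCORE"])

print(r_manual, r_pandas)
```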

Let's add an "AVERAGE" column to the dataset¶

result["AVERAGE"] = (result["MATH SCORE"] + result["READING SCORE"] + result["WRITING SCORE"])/3

result.head()
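
An equivalent way to build the AVERAGE column is a row-wise mean over the score columns. The sketch below uses the scores of the first two rows from result.head(); the list name `score_cols` is made up for illustration.

```python
import pandas as pd

score_cols = ["MATH SCORE", "READING SCORE", "WRITING SCORE"]

# Scores of the first two rows shown by result.head()
toy = pd.DataFrame({
    "MATH SCORE": [72, 69],
    "READING SCORE": [72, 90],
    "WRITING SCORE": [74, 88],
})

# axis=1 averages across the columns of each row
toy["AVERAGE"] = toy[score_cols].mean(axis=1)

# Same result as summing the three columns and dividing by 3
manual = (toy["MATH SCORE"] + toy["READING SCORE"] + toy["WRITING SCORE"]) / 3

print(toy)
```

The row-wise mean keeps working unchanged if another score column is appended to `score_cols`.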

Let's find out the relation between Reading and Writing Scores¶

sns.lmplot(x='READING SCORE',y='WRITING SCORE',data=result,hue='GENDER')

<seaborn.axisgrid.FacetGrid at 0x20eaa10ac88>

Let's find out how Math Scores vary with Gender and Race/Ethnicity¶

result.pivot_table(values='MATH SCORE',index='GENDER',columns='RACE/ETHNICITY')

pvresult = result.pivot_table(values='MATH SCORE',index='GENDER',columns='RACE/ETHNICITY')
sns.heatmap(pvresult, annot = True)

<matplotlib.axes._subplots.AxesSubplot at 0x20eab1ec308>

We can see that Group E males have the highest average Math Score (76.7), while Group A females have the lowest (58.5)!¶
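
The pivot tables in this section are group means in a two-way layout. As a small sketch on made-up data (the `toy` frame is hypothetical), the same table can be produced with groupby() followed by unstack(), which makes it clear that pivot_table averages by default.

```python
import pandas as pd

# Made-up rows for illustration
toy = pd.DataFrame({
    "GENDER": ["female", "female", "male", "male", "male"],
    "RACE/ETHNICITY": ["group A", "group B", "group A", "group A", "group B"],
    "MATH SCORE": [60, 70, 50, 70, 80],
})

# pivot_table aggregates with the mean unless aggfunc says otherwise
pivot = toy.pivot_table(values="MATH SCORE", index="GENDER", columns="RACE/ETHNICITY")

# Equivalent: mean per (gender, group) pair, then move the group level into columns
grouped = toy.groupby(["GENDER", "RACE/ETHNICITY"])["MATH SCORE"].mean().unstack()

print(pivot)
```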

Similarly, let's analyse the Reading Scores¶

result.pivot_table(values='READING SCORE',index='GENDER',columns='RACE/ETHNICITY')

pvresult = result.pivot_table(values='READING SCORE',index='GENDER',columns='RACE/ETHNICITY')
sns.heatmap(pvresult,cmap='YlOrRd',linecolor='white',linewidths=1, annot = True)

<matplotlib.axes._subplots.AxesSubplot at 0x20eab2e64c8>

We can see that Group E females, on average, fare better than the other groups in Reading!¶

Now, for Writing Scores¶

result.pivot_table(values='WRITING SCORE',index='GENDER',columns='RACE/ETHNICITY')

pvresult = result.pivot_table(values='WRITING SCORE',index='GENDER',columns='RACE/ETHNICITY')
sns.heatmap(pvresult, cmap = 'YlGnBu',linecolor='black',linewidths=1, annot = True)

<matplotlib.axes._subplots.AxesSubplot at 0x20eab3a1b88>

We can see that Group A males have the lowest average Writing Score, while Group E females have the highest!¶

Let's analyse the average scores now¶

result.pivot_table(values='AVERAGE',index='GENDER',columns='RACE/ETHNICITY')

pvresult = result.pivot_table(values='AVERAGE',index='GENDER',columns='RACE/ETHNICITY')
sns.heatmap(pvresult, cmap = 'Reds', annot = True)

<matplotlib.axes._subplots.AxesSubplot at 0x20eab45b088>

On average, Group E females score the highest, while Group A males score the lowest!¶

Let's see the percentage distribution of males and females in the dataset¶

(result.GENDER.value_counts()/len(result)) * 100

female    51.8
male      48.2
Name: GENDER, dtype: float64
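
The same percentages can be obtained without dividing by len(result), since value_counts() accepts normalize=True. A minimal sketch on a made-up Series:

```python
import pandas as pd

# Made-up values for illustration
toy = pd.Series(["female", "male", "female", "female"], name="GENDER")

# normalize=True returns fractions that sum to 1; multiply by 100 for percentages
pct = toy.value_counts(normalize=True) * 100

print(pct)
```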

gender = result['GENDER'].value_counts()
# Take the labels from the counts' own index so every slice is guaranteed its matching label
labels = gender.index
plt.pie(gender,labels=labels,autopct="%1.1f%%",shadow=True,explode=(0.04,0.04),startangle=90)
plt.title('GENDER DISTRIBUTION',fontsize=15)
plt.show()

result.GENDER.value_counts()

female    518
male      482
Name: GENDER, dtype: int64

sns.countplot(x='GENDER', data=result, palette = 'magma')

<matplotlib.axes._subplots.AxesSubplot at 0x20eab5523c8>

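The gender split is close to even: 51.8% female and 48.2% male!¶

gender = result.groupby("GENDER")
# numeric_only=True averages only the score columns; pandas 2.x raises an error without it
gender.mean(numeric_only=True)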

gender.describe().transpose()

On average, males score higher than females in Math (68.7 vs 63.6), while females score higher in Reading (72.6 vs 65.5) and Writing (72.5 vs 63.3)!¶
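
To put a number on how large these gaps are, one option (not part of the original analysis) is Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below plugs in the means, standard deviations and counts from the gender.describe() output; the helper name `cohens_d` is made up for illustration.

```python
import math

def cohens_d(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Standardized mean difference (a minus b) using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * std_a ** 2 + (n_b - 1) * std_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# (mean, std, count) per gender, copied from the gender.describe() output
d_math = cohens_d(68.728216, 14.356277, 482, 63.633205, 15.491453, 518)     # male minus female
d_reading = cohens_d(72.608108, 14.378245, 518, 65.473029, 13.931832, 482)  # female minus male
d_writing = cohens_d(72.467181, 14.844842, 518, 63.311203, 14.113832, 482)  # female minus male

print(round(d_math, 2), round(d_reading, 2), round(d_writing, 2))
```

A larger d means the two groups overlap less, so comparing the three values shows which subject has the widest gender gap relative to the spread of scores.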

Let's find out the percentage of students who completed the Test Preparation Course before taking the tests¶

(result['TEST PREPARATION COURSE'].value_counts()/len(result)) * 100

none         64.2
completed    35.8
Name: TEST PREPARATION COURSE, dtype: float64

test = result['TEST PREPARATION COURSE'].value_counts()
# Take the labels from the counts' own index so every slice is guaranteed its matching label
labels = test.index
plt.pie(test,labels=labels,autopct="%1.1f%%",shadow=True,explode=(0.04,0.04),startangle=90)
plt.title('TEST PREPARATION COURSE',fontsize=15)
plt.show()

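tpc = result.groupby("TEST PREPARATION COURSE")
# numeric_only=True averages only the score columns; pandas 2.x raises an error without it
tpc.mean(numeric_only=True)

Students who completed the Test Preparation Course score higher on average in all three subjects (an AVERAGE of 72.7 vs 65.0)!¶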
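
To see which subject differs most between the two groups, the group means can be subtracted row from row. This sketch rebuilds the tpc.mean() output by hand (values copied from that table) and is only meant to illustrate the arithmetic.

```python
import pandas as pd

# Group means copied from the tpc.mean() output
tpc_means = pd.DataFrame(
    {
        "MATH SCORE": [69.695531, 64.077882],
        "READING SCORE": [73.893855, 66.534268],
        "WRITING SCORE": [74.418994, 64.504673],
        "AVERAGE": [72.669460, 65.038941],
    },
    index=["completed", "none"],
)

# Gap in points between students who completed the course and those who did not
gap = tpc_means.loc["completed"] - tpc_means.loc["none"]

# Subject with the widest gap (AVERAGE is left out because it is derived from the other three)
largest = gap.drop("AVERAGE").idxmax()

print(gap.round(2))
print(largest)
```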

Now, let's see how test scores differ with the Test Preparation Course, by gender¶

fig, ax = plt.subplots(1, 3, figsize=(16,4))
sns.violinplot(x="TEST PREPARATION COURSE", y='MATH SCORE', data=result,hue='GENDER',split=True,palette='PuRd', ax = ax[0])
sns.violinplot(x="TEST PREPARATION COURSE", y='READING SCORE', data=result,hue='GENDER',split = True, 
               palette='Purples', ax = ax[1])
sns.violinplot(x="TEST PREPARATION COURSE", y='WRITING SCORE', data=result,hue='GENDER',split = True, 
               palette='RdPu', ax = ax[2])

<matplotlib.axes._subplots.AxesSubplot at 0x20eab67bac8>

Let's see how the average marks differ between students who completed the Test Preparation Course and those who did not¶

sns.boxplot(x="TEST PREPARATION COURSE", y="AVERAGE", hue = "GENDER", data = result)

<matplotlib.axes._subplots.AxesSubplot at 0x20eab8f09c8>

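The box plot shows higher average scores for students who completed the Test Preparation Course!¶

Does a parent's level of education influence the student's performance? Let's find out!¶

p_edu = result.groupby("PARENTAL LEVEL OF EDUCATION")
# numeric_only=True averages only the score columns; pandas 2.x raises an error without it
p_edu.mean(numeric_only=True)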

fig, ax = plt.subplots(3, 1, figsize=(16,16))

sns.boxplot(x = 'PARENTAL LEVEL OF EDUCATION', y = 'MATH SCORE', data = result, ax = ax[0], palette = "magma")

sns.boxplot(x = 'PARENTAL LEVEL OF EDUCATION', y = 'READING SCORE', data = result, ax = ax[1], palette = "plasma")

sns.boxplot(x = 'PARENTAL LEVEL OF EDUCATION', y = 'WRITING SCORE', data = result, ax = ax[2], palette = "inferno")

<matplotlib.axes._subplots.AxesSubplot at 0x20eab77f388>

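Now, let's see how the average scores vary with Parental Level of Education¶

# Plot against parental education (not the test preparation course), with a wide figure so the labels fit
plt.figure(figsize = (16,5))
sns.boxplot(x="PARENTAL LEVEL OF EDUCATION", y='AVERAGE', data=result,hue='GENDER', palette='inferno')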

<matplotlib.axes._subplots.AxesSubplot at 0x20eacf2f3c8>

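Average scores tend to rise with Parental Level of Education: students whose parents hold a master's degree average the highest (73.6), and those whose parents finished only high school the lowest (63.1)!¶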
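
The groupby output lists the education levels alphabetically, which hides any trend. One way to sort them meaningfully is an ordered categorical. The ordering in `edu_order` below is an assumption (the dataset does not define one), and the `toy` rows are made up for illustration.

```python
import pandas as pd

# Assumed ordering from least to most formal education; not defined by the dataset itself
edu_order = [
    "some high school", "high school", "some college",
    "associate's degree", "bachelor's degree", "master's degree",
]

# Made-up rows for illustration
toy = pd.DataFrame({
    "PARENTAL LEVEL OF EDUCATION": ["master's degree", "high school", "some college", "high school"],
    "AVERAGE": [90.0, 60.0, 70.0, 64.0],
})

# An ordered categorical makes groupby (and plots) follow edu_order instead of the alphabet
toy["PARENTAL LEVEL OF EDUCATION"] = pd.Categorical(
    toy["PARENTAL LEVEL OF EDUCATION"], categories=edu_order, ordered=True
)

# observed=True keeps only the levels that actually occur in the data
by_edu = toy.groupby("PARENTAL LEVEL OF EDUCATION", observed=True)["AVERAGE"].mean()

print(by_edu)
```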

Let's find the share of students belonging to each Race/Ethnicity group¶

# Let's find the percentage distribution
(result["RACE/ETHNICITY"].value_counts()/len(result)) * 100

group C    31.9
group D    26.2
group B    19.0
group E    14.0
group A     8.9
Name: RACE/ETHNICITY, dtype: float64

sns.countplot(x='RACE/ETHNICITY', data=result, palette = 'Reds')
sns.despine()

Group C is the largest group (31.9% of students), while Group A is the smallest (8.9%)!¶

sns.boxplot(x = 'RACE/ETHNICITY', y = 'AVERAGE', data = result, palette = "magma")

<matplotlib.axes._subplots.AxesSubplot at 0x20eaca883c8>

Therefore, we can see that Group E students have a higher average than the other groups!¶

Let's see how the distribution of Parental Level of Education varies with Race/Ethnicity¶

plt.figure(figsize = (16,5))
sns.countplot(x="PARENTAL LEVEL OF EDUCATION", hue="RACE/ETHNICITY", data=result, palette='viridis')

<matplotlib.axes._subplots.AxesSubplot at 0x20eacb56248>

Let's find out the percentage of students who receive Standard and Free/Reduced Lunch¶

(result["LUNCH"].value_counts()/len(result)) * 100

standard        64.5
free/reduced    35.5
Name: LUNCH, dtype: float64

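lunch = result['LUNCH'].value_counts()
# Take the labels from the counts' own index so every slice is guaranteed its matching label
labels = lunch.index
# Plot the lunch counts (the original passed the test preparation counts here by mistake)
plt.pie(lunch,labels=labels,autopct="%1.1f%%",shadow=True,explode=(0.04,0.04),startangle=90)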
plt.title('LUNCH DISTRIBUTION',fontsize=15)
plt.show()

# Plotting the figures
fig, ax = plt.subplots(3, 1, figsize=(16,16))
sns.swarmplot(x="RACE/ETHNICITY", y='MATH SCORE', data=result,hue='LUNCH',palette='Purples', ax = ax[0])
sns.swarmplot(x="RACE/ETHNICITY", y='READING SCORE', data=result,hue='LUNCH', palette='Blues', ax = ax[1])
sns.swarmplot(x="RACE/ETHNICITY", y='WRITING SCORE', data=result,hue='LUNCH', palette='Greens', ax = ax[2])

<matplotlib.axes._subplots.AxesSubplot at 0x20eacc9bfc8>

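Let's see if the type of Lunch is related to the scores of students¶

lunch_grp = result.groupby("LUNCH")
# numeric_only=True averages only the score columns; pandas 2.x raises an error without it
lunch_grp.mean(numeric_only=True)

Students with Standard Lunch score higher on average than those with Free/Reduced Lunch (an AVERAGE of 70.8 vs 62.2)!¶

Let's see how the type of Lunch varies with Race/Ethnicity¶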

sns.countplot(x="RACE/ETHNICITY", hue="LUNCH", data=result, palette='Oranges')

<matplotlib.axes._subplots.AxesSubplot at 0x20eacd7ef08>

Group C has the most students on Free/Reduced Lunch, while Group A has the fewest!¶

Is Free/Reduced Lunch Gender Biased? Let's find out!¶

sns.countplot(x="LUNCH", data=result,hue = 'GENDER', palette='YlGnBu')

<matplotlib.axes._subplots.AxesSubplot at 0x20eacd6f708>

Output of result.describe():
	MATH SCORE	READING SCORE	WRITING SCORE
count	1000.00000	1000.000000	1000.000000
mean	66.08900	69.169000	68.054000
std	15.16308	14.600192	15.195657
min	0.00000	17.000000	10.000000
25%	57.00000	59.000000	57.750000
50%	66.00000	70.000000	69.000000
75%	77.00000	79.000000	79.000000
max	100.00000	100.000000	100.000000

Output of result.corr():
	MATH SCORE	READING SCORE	WRITING SCORE
MATH SCORE	1.000000	0.817580	0.802642
READING SCORE	0.817580	1.000000	0.954598
WRITING SCORE	0.802642	0.954598	1.000000

Pivot table of mean MATH SCORE by GENDER and RACE/ETHNICITY:
RACE/ETHNICITY	group A	group B	group C	group D	group E
GENDER
female	58.527778	61.403846	62.033333	65.248062	70.811594
male	63.735849	65.930233	67.611511	69.413534	76.746479

Pivot table of mean READING SCORE by GENDER and RACE/ETHNICITY:
RACE/ETHNICITY	group A	group B	group C	group D	group E
GENDER
female	69.000000	71.076923	71.944444	74.046512	75.840580
male	61.735849	62.848837	65.424460	66.135338	70.295775

Pivot table of mean WRITING SCORE by GENDER and RACE/ETHNICITY:
RACE/ETHNICITY	group A	group B	group C	group D	group E
GENDER
female	67.861111	70.048077	71.777778	75.023256	75.536232
male	59.150943	60.220930	62.712230	65.413534	67.394366

Output of the first result.head(), before the columns were renamed:
	gender	race/ethnicity	parental level of education	lunch	test preparation course	math score	reading score	writing score
0	female	group B	bachelor's degree	standard	none	72	72	74
1	female	group C	some college	standard	completed	69	90	88
2	female	group B	master's degree	standard	none	90	95	93
3	male	group A	associate's degree	free/reduced	none	47	57	44
4	male	group C	some college	standard	none	76	78	75

Pivot table of mean AVERAGE by GENDER and RACE/ETHNICITY:
RACE/ETHNICITY	group A	group B	group C	group D	group E
GENDER
female	65.129630	67.509615	68.585185	71.439276	74.062802
male	61.540881	63.000000	65.249400	66.987469	71.478873

Output of gender.mean():
	MATH SCORE	READING SCORE	WRITING SCORE	AVERAGE
GENDER
female	63.633205	72.608108	72.467181	69.569498
male	68.728216	65.473029	63.311203	65.837483

Output of gender.describe().transpose():
	GENDER	female	male
MATH SCORE	count	518.000000	482.000000
	mean	63.633205	68.728216
	std	15.491453	14.356277
	min	0.000000	27.000000
	25%	54.000000	59.000000
	50%	65.000000	69.000000
	75%	74.000000	79.000000
	max	100.000000	100.000000
READING SCORE	count	518.000000	482.000000
	mean	72.608108	65.473029
	std	14.378245	13.931832
	min	17.000000	23.000000
	25%	63.250000	56.000000
	50%	73.000000	66.000000
	75%	83.000000	75.000000
	max	100.000000	100.000000
WRITING SCORE	count	518.000000	482.000000
	mean	72.467181	63.311203
	std	14.844842	14.113832
	min	10.000000	15.000000
	25%	64.000000	53.000000
	50%	74.000000	64.000000
	75%	82.000000	73.750000
	max	100.000000	100.000000
AVERAGE	count	518.000000	482.000000
	mean	69.569498	65.837483
	std	14.541809	13.698840
	min	9.000000	23.000000
	25%	60.666667	56.000000
	50%	70.333333	66.333333
	75%	78.666667	76.250000
	max	100.000000	100.000000

Output of tpc.mean():
	MATH SCORE	READING SCORE	WRITING SCORE	AVERAGE
TEST PREPARATION COURSE
completed	69.695531	73.893855	74.418994	72.669460
none	64.077882	66.534268	64.504673	65.038941

Output of p_edu.mean() grouped by PARENTAL LEVEL OF EDUCATION:
	MATH SCORE	READING SCORE	WRITING SCORE	AVERAGE
PARENTAL LEVEL OF EDUCATION
associate's degree	67.882883	70.927928	69.896396	69.569069
bachelor's degree	69.389831	73.000000	73.381356	71.923729
high school	62.137755	64.704082	62.448980	63.096939
master's degree	69.745763	75.372881	75.677966	73.598870
some college	67.128319	69.460177	68.840708	68.476401
some high school	63.497207	66.938547	64.888268	65.108007

Output of the groupby mean over LUNCH:
	MATH SCORE	READING SCORE	WRITING SCORE	AVERAGE
LUNCH
free/reduced	58.921127	64.653521	63.022535	62.199061
standard	70.034109	71.654264	70.823256	70.837209

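Free/Reduced Lunch by gender: females outnumber males in both the Standard and the Free/Reduced Lunch groups, which is expected given that there are more females in the dataset (518 vs 482), so the counts alone do not show a gender bias!¶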
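
Raw counts can mislead when the groups differ in size, so a fairer check is the share of each gender on each lunch type. Here is a minimal sketch with pd.crosstab on made-up rows (the `toy` frame is hypothetical), where females have higher counts in both lunch types but exactly the same share as males.

```python
import pandas as pd

# Made-up rows: 6 females and 3 males, each with one third on free/reduced lunch
toy = pd.DataFrame({
    "GENDER": ["female"] * 6 + ["male"] * 3,
    "LUNCH": ["standard"] * 4 + ["free/reduced"] * 2 + ["standard"] * 2 + ["free/reduced"],
})

counts = pd.crosstab(toy["GENDER"], toy["LUNCH"])

# normalize="index" turns each row into fractions that sum to 1
share = pd.crosstab(toy["GENDER"], toy["LUNCH"], normalize="index")

print(counts)
print(share)
```

On the real data, `pd.crosstab(result["GENDER"], result["LUNCH"], normalize="index")` would give the comparable shares.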