Descriptive Statistics


Measures of Central Tendency

Measures of central tendency, also called measures of central location, are single values that attempt to describe a data set by identifying the central position within it. They are also referred to as summary statistics. The most common measures are:

Mean:

The average of the numbers: the sum of all the values divided by the number of values. Mathematically, it is given by:

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$

Here $\bar{X}$ is the mean, $\sum X_i$ is the sum of all data values, and N is the size of the data space.

E.g., for the array [1, 2, 3, 4, 5], the mean is the sum of the array divided by its length: 15/5 = 3.
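The worked example above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `arithmetic_mean` is chosen here for illustration, and the standard-library `statistics.mean` is shown alongside it as a cross-check.

```python
from statistics import mean


def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty sequence is undefined")
    return sum(values) / len(values)


data = [1, 2, 3, 4, 5]

# Hand-rolled version: 15 / 5
manual_result = arithmetic_mean(data)

# Standard-library version, used only to cross-check the result
library_result = mean(data)

print(manual_result, library_result)
```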

 

The notation used for the mean can be confusing: is it $\mu$ or $\bar{X}$? In general, $\mu$ is used for the population mean, whereas $\bar{X}$ is used for the sample mean.

As a measure of central tendency, the mean has both strengths and drawbacks. Among its strengths, it uses every data point of the sample or population. Although the mean is often not one of the actual values, it is the value that minimizes the sum of squared deviations from the data points. In addition, the sum of the deviations of each value from the mean is always 0.
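Both properties can be checked numerically. The sketch below reuses the array [1, 2, 3, 4, 5] from the earlier example; the list of candidate centers is an arbitrary choice made here for illustration, and "error" is taken to mean the sum of squared deviations.

```python
data = [1, 2, 3, 4, 5]
mean_value = sum(data) / len(data)

# Property 1: deviations from the mean cancel out.
deviations = [x - mean_value for x in data]
sum_of_deviations = sum(deviations)


def sum_squared_error(center, values):
    """Total squared distance between a candidate center and the values."""
    return sum((x - center) ** 2 for x in values)


# Property 2: among these (arbitrary) candidates, the mean gives the
# smallest sum of squared deviations.
candidates = [2.0, 2.5, 3.0, 3.5, 4.0]
best_center = min(candidates, key=lambda c: sum_squared_error(c, data))

print(sum_of_deviations, best_center)
```

Trying other candidate values gives the same outcome: the sum of squared deviations is smallest exactly at the mean.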

Despite its wide range of applications, the mean has limitations. Because it involves all the data points, it is sensitive to outliers. For this reason, the median is preferred over the mean when the data are skewed.
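A small illustration of this sensitivity, using made-up data: the second list below replaces one value with a hypothetical outlier (500). The standard-library `statistics` module is used for both measures.

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]  # hypothetical outlier, for illustration only

clean_mean = mean(clean)
clean_median = median(clean)

skewed_mean = mean(with_outlier)
skewed_median = median(with_outlier)

# The mean is pulled far toward the outlier; the median does not move.
print(clean_mean, skewed_mean)
print(clean_median, skewed_median)
```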

 

Median:

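The middle element of the data once the values are sorted in order. For a sorted sample X (indexed from 1), the standard definition is:

$$\text{Median} = \begin{cases} X_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(X_{n/2} + X_{n/2+1}\right) & \text{if } n \text{ is even} \end{cases}$$

with n denoting the size of the sample space X.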

E.g., for the sorted array [1, 3, 4, 5, 7] with n = 5, the median is the element at 0-based index (5 - 1)/2 = 2, i.e. array[2] = 4.
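A minimal Python sketch of the median follows. It sorts the input first and, for an even number of values, averages the two middle elements, which is the usual convention. The function name `median_of` is chosen here for illustration.

```python
def median_of(values):
    """Return the middle value of the data after sorting.

    For an even number of values, the two middle values are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sequence is undefined")
    mid = n // 2  # 0-based index of the (upper) middle element
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median_of([1, 3, 4, 5, 7])       # middle element at index 2
unsorted_case = median_of([7, 1, 4, 3, 5])  # same data, shuffled
even_case = median_of([1, 3, 4, 5])         # average of 3 and 4

print(odd_case, unsorted_case, even_case)
```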

 

Mode:

The most frequently occurring number in the data.

E.g., for the array [1, 3, 2, 3, 4, 5], the mode is 3, since it appears twice and every other value appears only once.

The mode is mainly used with categorical data, where values are compared by how often they appear.
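The mode can be sketched in Python with `collections.Counter`. The helper `modes` below is a name chosen here for illustration; it returns every value tied for the highest count, since a data set can have more than one mode. The color list is made-up categorical data.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Counter keeps first-seen order, so ties are returned in that order.
    return [value for value, count in counts.items() if count == highest]


numeric_modes = modes([1, 3, 2, 3, 4, 5])

# Hypothetical categorical data with a tie between two categories
color_modes = modes(["red", "blue", "red", "blue", "green"])

print(numeric_modes, color_modes)
```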
