Formula for coefficient of kurtosis and skewness in probability
Answers
A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis.
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case.
The histogram is an effective graphical technique for showing both the skewness and kurtosis of data set.
For univariate data Y1, Y2, ..., YN, the formula for skewness is:
g1=∑Ni=1(Yi−Y¯)3/Ns3
where Y¯ is the mean, s is the standard deviation, and N is the number of data points. Note that in computing the skewness, the s is computed with N in the denominator rather than N - 1.
The above formula for skewness is referred to as the Fisher-Pearson coefficient of skewness. Many software programs actually compute the adjusted Fisher-Pearson coefficient of skewness
G1=N(N−1)−−−−−−−−√N−2∑Ni=1(Yi−Y¯)3/Ns3
This is an adjustment for sample size. The adjustment approaches 1 as N gets large. For reference, the adjustment factor is 1.49 for N = 5, 1.19 for N = 10, 1.08 for N = 20, 1.05 for N = 30, and 1.02 for N = 100.
The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. If the data are multi-modal, then this may affect the sign of the skewness.
Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.
It should be noted that there are alternative definitions of skewness in the literature. For example, the Galton skewness (also known as Bowley's skewness) is defined as
Galton skewness=Q1+Q3−2Q2Q3−Q1
where Q1 is the lower quartile, Q3 is the upper quartile, and Q2 is the median.
The Pearson 2 skewness coefficient is defined as
Sk2=3(Y¯−Y~)s
where Y~ is the sample median.
There are many other definitions for skewness that will not be discussed here.
Definition of Kurtosis For univariate data Y1, Y2, ..., YN, the formula for kurtosis is:
kurtosis=∑Ni=1(Yi−Y¯)4/Ns4
where Y¯ is the mean, s is the standard deviation, and N is the number of data points. Note that in computing the kurtosis, the standard deviation is computed using N in the denominator rather than N - 1.
Alternative Definition of Kurtosis The kurtosis for a standard normal distribution is three. For this reason, some sources use the following definition of kurtosis (often referred to as "excess kurtosis"):
kurtosis=∑Ni=1(Yi−Y¯)4/Ns4−3
This definition is used so that the standard normal distribution has a kurtosis of zero. In addition, with the second definition positive kurtosis indicates a "heavy-tailed" distribution and negative kurtosis indicates a "light tailed" distribution.
Which definition of kurtosis is used is a matter of convention (this handbook uses the original definition). When using software to compute the sample kurtosis, you need to be aware of which convention is being followed. Many sources use the term kurtosis when they are actually computing "excess kurtosis", so it may not always be clear.