In stats, why is the median sometimes a better representation of the middle value than the mean? Also, when comparing two data sets, what does the median suggest about the skewness of each data set?
A good question. When we analyse a large data set with a symmetric distribution (for example a normal distribution, also called a Gaussian curve), the mean is approximately equal to the median. With a skewed distribution, however, the mean gets pulled away from where most of the points sit and towards the tail. This follows from the formula for the mean, (sum of all x)/(number of x): every value contributes in proportion to its size, so a few extreme values shift the result. If there are a lot of values near 1 and some values near 4, the median will be near 1, but that is not necessarily true for the mean, which is dragged up towards 4. The median depends only on the order of the points, not on how far away the extreme ones are: we find it by discarding points from both edges towards the centre, which gives a more "central" value.
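To make the "lots of values near 1, some near 4" case concrete, here is a small sketch using Python's standard statistics module. The numbers in skewed and symmetric are made up purely for illustration.

```python
from statistics import mean, median

# Hypothetical data: seven values near 1 and three values near 4.
skewed = [0.9, 1.0, 1.0, 1.0, 1.1, 1.1, 1.2, 3.9, 4.0, 4.1]

# Hypothetical symmetric data for comparison.
symmetric = [1, 2, 3, 4, 5]

skewed_mean = mean(skewed)      # pulled up towards the few values near 4
skewed_median = median(skewed)  # stays with the bulk of the points near 1

print("skewed:    mean =", round(skewed_mean, 2), " median =", skewed_median)
print("symmetric: mean =", mean(symmetric), " median =", median(symmetric))
```

For the skewed list the mean comes out well above the median, while for the symmetric list the two coincide.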
When we are comparing two data sets with different skews but similar x-values, I would think that (if the values of x span the same range) the side of the middle of the range on which the median falls indicates the skew of the distribution. Suppose we have two data sets confined between x=1 and x=5, so the middle of the range is 3. One has most of its points near 1 (its median will be below 3) and the other has most of its points near 4.5 (its median will be above 3). Hence, from the two medians alone, we can tell that the first distribution is bunched towards the low end, with its tail stretching towards 5, and that the opposite holds for the second.
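The comparison of two data sets on the range 1 to 5 can be sketched roughly as below. The helper median_side and both sample lists are hypothetical, and the rule of comparing the median with the middle of the range is only a rough heuristic that assumes both data sets really do span the same range.

```python
from statistics import median

def median_side(data, low, high):
    """Report which side of the range midpoint the median falls on.

    Hypothetical helper: a rough heuristic, not a formal skewness measure.
    """
    midpoint = (low + high) / 2
    m = median(data)
    if m < midpoint:
        return "bulk near low end, tail towards high values"
    if m > midpoint:
        return "bulk near high end, tail towards low values"
    return "median at the midpoint, no skew suggested"

# Hypothetical data sets, both confined between 1 and 5.
low_heavy = [1, 1, 1.5, 1.5, 2, 2, 3, 4, 5]    # most points near 1
high_heavy = [1, 2, 3, 4, 4, 4.5, 4.5, 5, 5]   # most points near 4.5

print(median(low_heavy), median_side(low_heavy, 1, 5))
print(median(high_heavy), median_side(high_heavy, 1, 5))
```

Here the first median falls below 3 and the second above it, matching the two cases described.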
I might not have been very clear, but I am trying my best. Hope it helps. (On terminology: a distribution bunched at the low end with a long tail towards high values is positively skewed, and one bunched at the high end with a long tail towards low values is negatively skewed.)