Statistical Parameters
Types of data
- Numerical
  - Discrete
  - Continuous
- Categorical
- Ordinal (e.g., movie ratings)
Measures of central tendency
- Mean
- Median
- Mode
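A small sketch using Python's standard `statistics` module to compute the three measures on a made-up dataset (the data values are illustrative only):

```python
import statistics

# Illustrative data; the value 3 appears most often
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 30 / 6 = 5
median = statistics.median(data)  # even count: average of the middle values 3 and 5 = 4.0
mode = statistics.mode(data)      # most frequent value = 3

print(mean, median, mode)
```

Note how the single large value (10) pulls the mean above the median; the median is less sensitive to extreme values.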
Variance : Signifies how spread out the data is. Mathematically, the variance (σ²) is the average of the squared differences from the mean.
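A minimal sketch of the definition above, computing the population variance by hand (the helper name `variance` and the data are illustrative, not from any library):

```python
def variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]
# Mean is 5; squared deviations are 9, 1, 1, 1, 0, 0, 4, 16, which sum to 32
print(variance(data))  # 32 / 8 = 4.0
```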
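Standard Deviation : The square root of the variance (σ). Because it is expressed in the same units as the data, it is commonly used to identify outliers, e.g. points lying more than a few standard deviations from the mean.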
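A sketch of using the standard deviation to flag outliers. The threshold `k=2` is an assumed rule of thumb (2 or 3 are common choices), and the helper names `std_dev` and `outliers` are hypothetical:

```python
import math

def std_dev(values):
    """Population standard deviation: square root of the population variance."""
    n = len(values)
    mu = sum(values) / n
    var = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(var)

def outliers(values, k=2.0):
    """Return values lying more than k standard deviations from the mean.

    k=2 is an assumed rule of thumb, not a universal standard.
    """
    mu = sum(values) / len(values)
    sigma = std_dev(values)
    return [x for x in values if abs(x - mu) > k * sigma]

data = [10, 12, 11, 13, 12, 11, 10, 40]
print(std_dev(data))   # roughly 9.545
print(outliers(data))  # [40]
```

Keep in mind that an extreme value also inflates σ itself, so in small samples this check can miss outliers; more robust variants use the median instead of the mean.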
Population vs Sample
A population is the whole group of interest, whereas a sample is a subset of the population that is used to estimate the population's characteristics, such as its mean, standard deviation, etc.
Population variance and Sample variance
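- Population variance : $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, where $N$ is the population size and $\mu$ is the population mean
- Sample variance : $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $n$ is the sample size and $\bar{x}$ is the sample mean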
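The two formulas differ only in the divisor. A sketch comparing hand-written versions (hypothetical helper names) with the standard library's `statistics.pvariance` (divisor N) and `statistics.variance` (divisor n − 1):

```python
import statistics

def population_variance(values):
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)        # divide by N

def sample_variance(values):
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)  # divide by n - 1

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data), statistics.pvariance(data))  # both equal 32 / 8
print(sample_variance(data), statistics.variance(data))       # both equal 32 / 7
```

The same data gives a slightly larger result under the sample formula, because the sum of squared deviations (32) is divided by 7 instead of 8.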
Why is there n − 1 in the denominator of the sample variance when it is used as an estimate of the population variance?
This correction is called Bessel's correction.
Theoretical Explanation : The standard deviation calculated with a divisor of n − 1 is a standard deviation calculated from the sample as an estimate of the standard deviation of the population from which the sample was drawn. Because the observed values fall, on average, closer to the sample mean than to the population mean, the standard deviation calculated using deviations from the sample mean underestimates the desired standard deviation of the population. Using n − 1 instead of n as the divisor corrects for that by making the result a little bit bigger.
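A small Monte Carlo sketch of the underestimation described above. It assumes a standard normal population (true variance 1.0), samples of size 5, and a fixed seed; all of these choices are for illustration only:

```python
import random

def biased_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)        # divisor n

def unbiased_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # divisor n - 1 (Bessel's correction)

random.seed(0)
true_var = 1.0          # assumed population: standard normal
n, trials = 5, 20000
avg_biased = 0.0
avg_unbiased = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    avg_biased += biased_var(sample) / trials
    avg_unbiased += unbiased_var(sample) / trials

print(f"true variance        : {true_var}")
print(f"mean with divisor n  : {avg_biased:.3f}")    # should land near (n - 1) / n = 0.8
print(f"mean with divisor n-1: {avg_unbiased:.3f}")  # should land near 1.0
```

With divisor n, the average estimate settles near (n − 1)/n times the true variance. The n − 1 version removes that bias for the variance. Its square root is still slightly biased as an estimate of σ, but the bias is much smaller.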
Degrees of Freedom : It is the number of values in the calculation that are free to vary.
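In the above case of n values, the degrees of freedom is n − 1: once the sample mean is fixed, the deviations from it must sum to zero, so only n − 1 of them can vary freely. For estimating population variance from sample variance, we therefore divide by the degrees of freedom rather than by the sample size.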
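A quick illustration of why one degree of freedom is lost. The deviations from the sample mean always sum to zero, so the last deviation is determined by the others (the data values are arbitrary):

```python
data = [4.0, 7.0, 1.0, 8.0]
mean = sum(data) / len(data)              # 20 / 4 = 5.0
deviations = [x - mean for x in data]     # [-1.0, 2.0, -4.0, 3.0]

# Deviations from the sample mean always sum to zero...
print(sum(deviations))                    # 0.0

# ...so once n - 1 of them are known, the last one is forced.
free = deviations[:-1]
last = -sum(free)
print(last, deviations[-1])               # 3.0 3.0
```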
- Mathematical Proof : Check here
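A compact sketch of the standard argument that the n − 1 divisor gives an unbiased estimate. It assumes the $x_i$ are independent draws from a population with mean $\mu$ and variance $\sigma^2$, so that $\operatorname{Var}(\bar{x}) = \sigma^2/n$:

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2 \\
\mathbb{E}\Big[\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2 \\
\Rightarrow\quad \mathbb{E}[s^2] &= \mathbb{E}\Big[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] = \sigma^2
\end{aligned}
```

Dividing the same sum by n instead gives an expected value of $\frac{n-1}{n}\sigma^2$, which is the underestimate that Bessel's correction removes.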
Other measures
- Z-Score : How many standard deviations a data point lies from the mean, $z = \frac{x - \mu}{\sigma}$
- Coefficient of variation : A relative measure of variability (the standard deviation relative to the mean), $CV = \frac{\sigma}{\mu}$
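A minimal sketch of both measures. It uses the population standard deviation (`statistics.pstdev`), which is an assumption; use `statistics.stdev` instead when working with a sample. The helper names are hypothetical:

```python
import statistics

def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

def coefficient_of_variation(values):
    """Standard deviation relative to the mean (population SD assumed here)."""
    return statistics.pstdev(values) / statistics.mean(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(data)       # 5
sigma = statistics.pstdev(data)  # 2.0
print(z_score(9, mu, sigma))            # (9 - 5) / 2.0 = 2.0
print(coefficient_of_variation(data))   # 2.0 / 5 = 0.4
```

Because the CV divides by the mean, it is only meaningful for data measured on a scale with a true zero and a positive mean.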