Probability
Basic Rules
Union :
p(A∨B)=p(A)+p(B)−p(A∧B)
Joint :
p(A,B)=p(A∧B)=p(A∣B)p(B)
p(A)=∑_b p(A,B=b)=∑_b p(A∣B=b)p(B=b)
Conditional :
p(A∣B)=p(A,B)/p(B), if p(B)>0
Bayes' Rule :
p(X=x∣Y=y)=p(X=x,Y=y)/p(Y=y)=p(X=x)p(Y=y∣X=x) / ∑_x′ p(X=x′)p(Y=y∣X=x′)
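A quick numeric sketch of Bayes' rule in Python (the prevalence and test accuracies below are made-up illustration values):

```python
# Hypothetical disease-testing example of Bayes' rule (all numbers invented).
p_disease = 0.01                 # p(X = disease)
p_pos_given_disease = 0.95       # p(Y = positive | X = disease)
p_pos_given_healthy = 0.05       # p(Y = positive | X = healthy)

# Denominator: p(Y = positive) = sum over x' of p(X = x') p(Y = positive | X = x')
p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy

# Bayes' rule: p(X = disease | Y = positive)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(p_disease_given_pos)       # ~0.16
```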
Unconditional Independence :
p(X,Y)=p(X)p(Y)
Conditional Independence :
p(X,Y∣Z)=p(X∣Z)p(Y∣Z)
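A minimal sketch of checking unconditional independence numerically, assuming a small made-up joint table over two binary variables:

```python
import numpy as np

# Hypothetical joint distribution p(X, Y) over two binary variables (rows = X, cols = Y).
joint = np.array([[0.12, 0.28],
                  [0.18, 0.42]])

p_x = joint.sum(axis=1)          # marginal p(X)
p_y = joint.sum(axis=0)          # marginal p(Y)

# X and Y are independent iff p(X, Y) == p(X) p(Y) for every cell.
print(np.allclose(joint, np.outer(p_x, p_y)))   # True for this particular table
```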
Density function
For continuous data
Examples
- Uniform Distribution
- Normal/Gaussian Distribution
- Exponential Distribution
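A short sketch evaluating these densities with scipy.stats (the parameter choices are arbitrary):

```python
from scipy import stats

x = 1.0
print(stats.uniform(loc=0, scale=2).pdf(x))   # Uniform on [0, 2]: density 0.5
print(stats.norm(loc=0, scale=1).pdf(x))      # Standard normal density at x = 1
print(stats.expon(scale=1.0).pdf(x))          # Exponential with rate 1: e^{-1}
```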
Mass function
For discrete data
Examples
- Binomial probability mass function
- Poisson probability mass function
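Similarly for the mass functions, again with arbitrary parameters:

```python
from scipy import stats

print(stats.binom(n=10, p=0.5).pmf(3))   # P(3 successes in 10 fair-coin trials)
print(stats.poisson(mu=4).pmf(2))        # P(2 events when the mean rate is 4)
```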
Mean And Variance
Mean (μ)
Discrete : E[X]≜∑_{x∈X} x p(x)
Continuous : E[X]≜∫_X x p(x) dx
Variance (σ²)
Var[X]≜E[(X−μ)²]
E[X²]=μ²+σ²
std[X]≜√Var[X]
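A numpy sketch of these definitions on sampled data, assuming an arbitrary Gaussian sample; it also checks E[X²]=μ²+σ² empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)   # samples with μ = 5, σ = 2

mu = x.mean()
var = ((x - mu) ** 2).mean()          # Var[X] = E[(X − μ)²]
print(mu, var, np.sqrt(var))          # ≈ 5, 4, 2
print((x ** 2).mean(), mu**2 + var)   # E[X²] ≈ μ² + σ²
```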
Percentiles and Moments
Percentiles
The x-th percentile of a dataset is the value below which x% of the data falls
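For example, with numpy (made-up data):

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(np.percentile(data, 50))   # median: half the data lies below this value
print(np.percentile(data, 90))   # roughly 90% of the data lies below this value
```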
Moments
Moments are simple summary calculations in mathematical statistics; they can be used to describe a probability distribution's mean, variance, and skewness.
For a discrete data set,
the s-th moment of the values x_1, x_2, x_3, ..., x_n is given by the formula:
(x_1^s + x_2^s + x_3^s + ... + x_n^s)/n
The first moment is the simple mean
Moments about the mean
((x_1−μ)^s + (x_2−μ)^s + (x_3−μ)^s + ... + (x_n−μ)^s)/n
- First moment about the mean is zero
- Second moment about the mean is the variance
- Third moment about the mean, standardized by σ³, gives the skewness
- Fourth moment about the mean, standardized by σ⁴, gives the kurtosis
The same definitions apply to continuous random variables, with the sums replaced by integrals over the density.
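A sketch computing raw and central moments for a small made-up sample; note that scipy.stats.skew and scipy.stats.kurtosis report the standardized third and fourth central moments:

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

def raw_moment(x, s):
    return np.mean(x ** s)                  # (x_1^s + ... + x_n^s) / n

def central_moment(x, s):
    return np.mean((x - x.mean()) ** s)     # ((x_1 − μ)^s + ... + (x_n − μ)^s) / n

print(raw_moment(x, 1), x.mean())           # first raw moment = mean
print(central_moment(x, 1))                 # first central moment ≈ 0
print(central_moment(x, 2), x.var())        # second central moment = variance
print(stats.skew(x), central_moment(x, 3) / x.std() ** 3)                     # skewness
print(stats.kurtosis(x, fisher=False), central_moment(x, 4) / x.std() ** 4)   # kurtosis
```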
Covariance and Correlation
Describe the degree to which two random variables or sets of random variables tend to deviate from their expected values in similar ways
Covariance
Cov(X,Y)=σ_XY=E[(X−E[X])(Y−E[Y])]
Cov(X,Y)=E[XY]−E[X]E[Y]
If x is a d-dimensional random vector, its covariance matrix is defined to be the following symmetric, positive semi-definite matrix:
Cov[x]=E[(x−E[x])(x−E[x])ᵀ] =
⎛ var[X₁]        cov[X₁,X₂]    ⋯   cov[X₁,X_d] ⎞
⎜ cov[X₂,X₁]     var[X₂]       ⋯   cov[X₂,X_d] ⎟
⎜     ⋮              ⋮         ⋱        ⋮      ⎟
⎝ cov[X_d,X₁]    cov[X_d,X₂]   ⋯   var[X_d]    ⎠
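An empirical sketch with numpy (random made-up data; bias=True divides by n, matching the expectation-based definition above):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))             # 1000 samples of a 3-dimensional vector
X[:, 2] = X[:, 0] + 0.5 * X[:, 1]          # make the third component depend on the first two

cov = np.cov(X, rowvar=False, bias=True)   # d x d symmetric matrix; diagonal holds variances
print(cov)
```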
Covariances can range from −∞ to +∞. Sometimes it is more convenient to work with a normalized measure that has a finite upper bound.
Correlation
Corr(X,Y)=Cov(X,Y)/(σ_X σ_Y)
−1≤Corr[X,Y]≤1
If there is an exact linear relationship Y=aX+b between X and Y, then Corr[X,Y]=1 when a>0 and −1 when a<0, i.e. |Corr[X,Y]|=1
If X and Y are independent, meaning p(X,Y)=p(X)p(Y), then Cov[X,Y]=0 and hence Corr[X,Y]=0, so they are uncorrelated. The converse is not true: uncorrelated does not imply independent.
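A sketch of that last point, using made-up data: Y=X² is fully determined by X, so the two are clearly not independent, yet for a symmetric X the correlation is roughly zero, while an exact increasing linear relationship gives correlation 1:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)      # symmetric around 0
y = x ** 2                        # deterministic function of x, so not independent

print(np.corrcoef(x, y)[0, 1])    # ≈ 0: uncorrelated despite full dependence

# By contrast, an exact increasing linear relationship gives correlation 1.
z = 3 * x + 2
print(np.corrcoef(x, z)[0, 1])    # ≈ 1.0
```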