Logistic Regression as a Neural Network
Binary Classification
Notation:
(x, y): a single training example, where x ∈ ℝ^(n_x) and y ∈ {0, 1}
m: number of training examples
X ∈ ℝ^(n_x × m): input matrix whose columns are the examples x^(i)
Y ∈ ℝ^(1 × m): row vector of the labels y^(i)
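To make the shapes concrete, here is a minimal NumPy sketch (the values n_x = 3 and m = 4 are arbitrary, chosen only for illustration):

```python
import numpy as np

n_x, m = 3, 4                          # 3 features, 4 training examples (arbitrary)
X = np.random.randn(n_x, m)            # each column X[:, i] is one example x^(i)
Y = np.random.randint(0, 2, (1, m))    # labels y^(i) in {0, 1}

print(X.shape)   # (3, 4)  -> R^(n_x x m)
print(Y.shape)   # (1, 4)  -> R^(1 x m)
```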
Logistic Regression
Given x, we want ŷ = P(y = 1 | x)
Parameters: w ∈ ℝ^(n_x), b ∈ ℝ
Output: ŷ = σ(wᵀx + b), where σ is the sigmoid function
σ(z) = 1 / (1 + e^(-z))
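A minimal sketch of this prediction in NumPy (the function names sigmoid and predict are illustrative, not from the original notes):

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """y_hat = sigma(w^T x + b) for a single example x of shape (n_x, 1)."""
    z = np.dot(w.T, x) + b
    return sigmoid(z)
```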
Cost Function
Given {(x^(1), y^(1)), ..., (x^(m), y^(m))}, we want ŷ^(i) ≈ y^(i)
Loss function (squared error): L(ŷ, y) = (1/2)(ŷ − y)²
However, this squared-error loss is not generally used, because it makes the optimization problem non-convex (gradient descent can get stuck in local optima).
So we define the loss function instead as
L(ŷ, y) = −(y log ŷ + (1 − y) log(1 − ŷ))
The loss function measures the error on a single training example.
Cost function (the average loss over the whole training set):
J(w, b) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i))
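As a sketch, the cost can be computed in one vectorized expression, assuming A is a 1 × m array holding the predictions ŷ^(i) (the small eps term is an added safeguard against log(0), not part of the formula above):

```python
import numpy as np

def compute_cost(A, Y, eps=1e-12):
    """J = -(1/m) * sum( y*log(y_hat) + (1-y)*log(1-y_hat) )."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps)) / m
```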
Gradient Descent
w := w − α · ∂J(w, b)/∂w
b := b − α · ∂J(w, b)/∂b
Here α denotes the learning rate.
Given a single example (x, y):
z = Σᵢ wᵢxᵢ + b = wᵀx + b
a = σ(z)
L(a, y) = −(y log(a) + (1 − y) log(1 − a))
Derivatives (using the shorthand da = ∂L/∂a, dz = ∂L/∂z, and so on):
da = ∂L/∂a = −y/a + (1 − y)/(1 − a)
dz = ∂L/∂z = a − y
dwᵢ = ∂L/∂wᵢ = xᵢ · dz
db = ∂L/∂b = dz
So the parameters are updated as wᵢ := wᵢ − α · dwᵢ and b := b − α · db
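Putting the forward pass and these derivatives together, a single-example gradient-descent step might look like the sketch below (variable names follow the dz/dw/db shorthand; the function itself is illustrative):

```python
import numpy as np

def single_example_step(w, b, x, y, alpha):
    """One gradient-descent step on one example: x has shape (n_x, 1), y is 0 or 1."""
    # Forward pass
    z = np.dot(w.T, x) + b           # z = w^T x + b
    a = 1.0 / (1.0 + np.exp(-z))     # a = sigma(z)

    # Backward pass
    dz = a - y                       # dL/dz = a - y
    dw = x * dz                      # dL/dw_i = x_i * dz
    db = dz                          # dL/db = dz

    # Parameter update
    w = w - alpha * dw
    b = b - alpha * db
    return w, b
```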
The above is for a single training example; for m training examples we average these gradients over the whole training set.
Instead of looping over the examples, we can use vectorized matrix operations to compute everything efficiently. One iteration of gradient descent then becomes:
Z = wᵀX + b  =>  np.dot(w.T, X) + b
A = σ(Z)
dZ = A − Y
dw = (1/m) · X · dZᵀ  =>  np.dot(X, dZ.T) / m
db = (1/m) · np.sum(dZ)
w := w − α · dw
b := b − α · db
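A complete vectorized training loop built from these operations might look like this sketch (the zero initialization, learning rate, and iteration count are arbitrary choices for illustration):

```python
import numpy as np

def train_logistic_regression(X, Y, alpha=0.01, num_iterations=1000):
    """Vectorized gradient descent. X has shape (n_x, m), Y has shape (1, m)."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0

    for _ in range(num_iterations):
        # Forward pass over all m examples at once
        Z = np.dot(w.T, X) + b           # shape (1, m)
        A = 1.0 / (1.0 + np.exp(-Z))     # A = sigma(Z)

        # Backward pass
        dZ = A - Y                       # shape (1, m)
        dw = np.dot(X, dZ.T) / m         # shape (n_x, 1)
        db = np.sum(dZ) / m              # scalar

        # Parameter update
        w = w - alpha * dw
        b = b - alpha * db

    return w, b
```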