Logistic Regression As Neural Network

Binary Classification

Notations:

- $(x, y)$: $x \in \mathbb{R}^{n_x}$, $y \in \{0, 1\}$
- $m$: number of training examples
- $X \in \mathbb{R}^{n_x \times m}$
- $Y \in \mathbb{R}^{1 \times m}$
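For concreteness, here is a minimal NumPy sketch of these shapes; the sizes `n_x = 4` and `m = 10` and the random data are arbitrary placeholders, not part of the notes.

```python
import numpy as np

# Dummy data with the shapes from the notation above; the sizes are arbitrary.
n_x, m = 4, 10                        # n_x features, m training examples

X = np.random.randn(n_x, m)           # each column is one training example x^(i)
Y = np.random.randint(0, 2, (1, m))   # each entry is a label y^(i) in {0, 1}

print(X.shape)  # (4, 10)  i.e. R^{n_x x m}
print(Y.shape)  # (1, 10)  i.e. R^{1 x m}
```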

Logistic Regression

Given $x$, we want $\hat{y} = P(y = 1 \mid x)$

Parameters: $w \in \mathbb{R}^{n_x}$, $b \in \mathbb{R}$

Output: $\hat{y} = \sigma(w^T x + b)$, where $\sigma$ denotes the sigmoid function

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$
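A minimal NumPy sketch of the sigmoid and the resulting prediction $\hat{y} = \sigma(w^T x + b)$; the helper names `sigmoid` and `predict` are illustrative, not from the notes.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """y_hat = sigmoid(w^T x + b) for a single column vector x of shape (n_x, 1)."""
    return sigmoid(np.dot(w.T, x) + b)
```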

Cost Function

Given $\{ (x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)}) \}$, we want $\hat{y}^{(i)} = y^{(i)}$

Loss function: $L(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$

However, this squared-error loss is not generally used, because the resulting optimization problem is non-convex (gradient descent can get stuck in local optima).

Instead, the loss function is defined as $L(\hat{y}, y) = -\big(y \log \hat{y} + (1-y)\log(1-\hat{y})\big)$

The loss function measures the error on a single training example.

Cost Function:

$J(w, b) = \dfrac{1}{m}\displaystyle\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$
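One possible NumPy version of this cost, assuming `Y_hat` and `Y` are both $1 \times m$ arrays; the helper name `cost` is illustrative.

```python
import numpy as np

def cost(Y_hat, Y):
    """Average cross-entropy cost over the m columns of Y_hat and Y (both 1 x m)."""
    m = Y.shape[1]
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return np.sum(losses) / m
```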

Gradient Descent

$w := w - \alpha \dfrac{\partial J(w, b)}{\partial w}$

$b := b - \alpha \dfrac{\partial J(w, b)}{\partial b}$

Here $\alpha$ denotes the learning rate.

Given $x, y$:

$z = \left(\sum_i w_i x_i\right) + b$

$a = \sigma(z)$

$L(a, y) = -\big(y \log(a) + (1-y)\log(1-a)\big)$

Derivatives:

$\dfrac{\partial L}{\partial a} = \partial a = \dfrac{1-y}{1-a} - \dfrac{y}{a}$

$\dfrac{\partial L}{\partial z} = \partial z = a - y$

$\dfrac{\partial L}{\partial w_i} = \partial w_i = x_i\, \partial z$

$\partial b = \partial z$

So the final updates are $w_i := w_i - \alpha\, \partial w_i$ and $b := b - \alpha\, \partial b$.
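The derivatives above translate directly into code. Below is a sketch of one gradient step on a single example; the function name `single_example_step` and the argument shapes are assumptions made for illustration.

```python
import numpy as np

def single_example_step(w, b, x, y, alpha):
    """One forward/backward pass and parameter update on a single example.

    Assumed shapes: w (n_x, 1), x (n_x, 1); b and y are scalars, y in {0, 1}.
    """
    z = np.dot(w.T, x).item() + b     # z = sum_i w_i * x_i + b
    a = 1.0 / (1.0 + np.exp(-z))      # a = sigmoid(z)
    dz = a - y                        # dL/dz = a - y
    dw = x * dz                       # dL/dw_i = x_i * dz
    db = dz                           # dL/db = dz
    w = w - alpha * dw                # w_i := w_i - alpha * dw_i
    b = b - alpha * db                # b  := b  - alpha * db
    return w, b
```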

The above is for one training example; we need to repeat the same process for all $m$ training examples.

Instead of using explicit loops, we can use vectorized matrix operations to compute this efficiently. Here are the operations we need:

- $Z = w^T X + b$, computed as `np.dot(w.T, X) + b`
- $A = \sigma(Z)$
- $dZ = A - Y$
- $dw = \dfrac{1}{m} X\, (dZ)^T$
- $db = \dfrac{1}{m}\,$`np.sum(dZ)`
- $w := w - \alpha\, dw$
- $b := b - \alpha\, db$
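Putting these operations together, here is one possible end-to-end sketch of the vectorized training loop; the function name `train`, the zero initialization, and the default hyperparameters are assumptions, not part of the notes.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def train(X, Y, alpha=0.01, num_iterations=1000):
    """Vectorized logistic regression trained with gradient descent.

    X: (n_x, m) inputs, Y: (1, m) labels in {0, 1}.
    """
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    for _ in range(num_iterations):
        Z = np.dot(w.T, X) + b        # (1, m)
        A = sigmoid(Z)                # (1, m) activations
        dZ = A - Y                    # (1, m)
        dw = np.dot(X, dZ.T) / m      # (n_x, 1) gradient of J w.r.t. w
        db = np.sum(dZ) / m           # scalar gradient of J w.r.t. b
        w = w - alpha * dw            # gradient descent update
        b = b - alpha * db
    return w, b
```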
