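First, let's stack the forward layers of a feed-forward neural network.

Matching the shapes.
The output $a^{[i]}$ of layer i is fed into layer i+1 as its input.
W: the weights; think of them as the edges between two layers. Shape: (number of nodes in the previous layer x number of nodes in the next layer).
b: the bias, which plays the role of b in y = ax + b. Shape: (1 x number of nodes in the next layer).
Z: the value after the layer's linear transformation and before the activation function.
Why an activation function is needed: without a nonlinear function, stacked layers collapse into a single linear transformation, and a linear model cannot solve the XOR problem.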
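The point about nonlinearity can be checked numerically. The following is a minimal NumPy sketch, not code from the original post: the layer sizes (4, 3, 2) and all variable names are made up for illustration. It shows that two layers without an activation function compute exactly the same mapping as one merged linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers with NO activation function (sizes are arbitrary).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(1, 3))
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 2))

x = rng.normal(size=(1, 4))

# Pass x through both layers in sequence.
stacked = (x @ W1 + b1) @ W2 + b2

# The same mapping written as ONE linear layer.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
single = x @ W_merged + b_merged

# True when both computations agree up to floating-point error.
collapses = np.allclose(stacked, single)
print(collapses)
```

Because the merged layer is still linear, adding more layers of this kind adds no expressive power, which is why a nonlinearity such as ReLU is placed between layers.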

Conditions

input feature size = 400
output classes = 10
number of hidden nodes = 50, 20, 30
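Given these conditions, the shape of every W and b follows from consecutive pairs of layer sizes. Here is a small sketch of that bookkeeping in NumPy. The `params` dictionary, its key names, and the initialization (small random weights, zero biases) are assumptions chosen for illustration and are not specified in the post.

```python
import numpy as np

# Layer sizes from the conditions: input, three hidden layers, output.
layer_sizes = [400, 50, 20, 30, 10]

rng = np.random.default_rng(0)
params = {}
for i in range(1, len(layer_sizes)):
    n_in, n_out = layer_sizes[i - 1], layer_sizes[i]
    # Small random weights and zero biases: an illustrative choice only.
    params[f"W{i}"] = rng.normal(scale=0.01, size=(n_in, n_out))
    params[f"b{i}"] = np.zeros((1, n_out))

# Print each parameter name with its shape, e.g. W1 (400, 50).
for name, value in params.items():
    print(name, value.shape)
```

Each W has shape (previous nodes x next nodes) and each b has shape (1 x next nodes), which is the rule stated at the top of the post.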

Layer 1

$Z^{[1]} = x \times W^{[1]} + b^{[1]}$

$x \in \mathbb{R}^{1 \times 400}$
$b^{[1]} \in \mathbb{R}^{1 \times 50}$
$W^{[1]} \in \mathbb{R}^{400 \times 50}$
$\therefore Z^{[1]} \in \mathbb{R}^{1 \times 50}$

output of Layer 1

$a^{[1]} = ReLU(Z^{[1]})$

Layer 2

$Z^{[2]} = a^{[1]} \times W^{[2]} + b^{[2]}$

$a^{[1]} \in \mathbb{R}^{1 \times 50}$
$b^{[2]} \in \mathbb{R}^{1 \times 20}$
$W^{[2]} \in \mathbb{R}^{50 \times 20}$
$\therefore Z^{[2]} \in \mathbb{R}^{1 \times 20}$

output of Layer 2

$a^{[2]} = ReLU(Z^{[2]})$

Layer 3

$Z^{[3]} = a^{[2]} \times W^{[3]} + b^{[3]}$

$a^{[2]} \in \mathbb{R}^{1 \times 20}$
$b^{[3]} \in \mathbb{R}^{1 \times 30}$
$W^{[3]} \in \mathbb{R}^{20 \times 30}$
$\therefore Z^{[3]} \in \mathbb{R}^{1 \times 30}$

output of Layer 3

$a^{[3]} = ReLU(Z^{[3]})$

Layer 4

$Z^{[4]} = a^{[3]} \times W^{[4]} + b^{[4]}$

$a^{[3]} \in \mathbb{R}^{1 \times 30}$
$b^{[4]} \in \mathbb{R}^{1 \times 10}$
$W^{[4]} \in \mathbb{R}^{30 \times 10}$
$\therefore Z^{[4]} \in \mathbb{R}^{1 \times 10}$
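The four layers above can be sketched as one forward pass. This is a hedged illustration under stated assumptions: the helper names `relu` and `forward`, the random initialization, and the random input are hypothetical and not part of the post. ReLU is applied after layers 1 to 3, and the last layer returns $Z^{[4]}$ without an activation, matching the equations above.

```python
import numpy as np

def relu(z):
    # Element-wise max(z, 0).
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Return the final pre-activation and the list of all Z^[i]."""
    a = x
    zs = []
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b          # Z^[i+1] = a^[i] W^[i+1] + b^[i+1]
        zs.append(z)
        # Hidden layers use ReLU; the last layer keeps the raw Z.
        a = relu(z) if i < last else z
    return a, zs

layer_sizes = [400, 50, 20, 30, 10]
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.01, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros((1, n_out)) for n_out in layer_sizes[1:]]

x = rng.normal(size=(1, 400))  # one sample with 400 features
logits, zs = forward(x, weights, biases)
print([z.shape for z in zs])
```

If any W had the wrong shape, the `@` product would raise an error, so running the pass is itself a check that the shapes line up.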

Cross Entropy

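Cross entropy is used as the evaluation measure (loss) for classification models.
$P_{data}(y|x^{(i)})$: the true probability of label y for input $x^{(i)}$. In a classification problem it is 1 for the correct class and 0 for every other class.
$P_{model}(y|x^{(i)}; \theta)$: the model's prediction for input $x^{(i)}$ under parameters $\theta$, obtained by applying softmax to the logits.
In the training data each class is either correct or incorrect, so the target is 1 or 0, which is a one-hot encoding.
What are logits? They are the model's raw predictions before they enter an activation function such as softmax.
$\therefore logits = Z^{[4]}$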

softmax
- An activation function that turns the predicted values into a probability distribution whose entries sum to 1.
- Each output is exponentiated and then divided by the sum of the exponentials of all outputs.
$softmax(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$
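The softmax formula can be sketched in NumPy as follows. Note one addition that is not in the post: the row maximum is subtracted before exponentiating. This is a common numerical-stability trick and leaves the result unchanged, because the common factor cancels between numerator and denominator. The function name and the sample values are illustrative.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability (assumption, not in
    # the post); softmax is invariant to adding a constant to every z_i.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

z = np.array([[2.0, 1.0, 0.1]])  # made-up logits for one sample
probs = softmax(z)
print(probs, probs.sum())
```

The largest logit receives the largest probability, and the entries of each row sum to 1.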
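Cross Entropy = $J(\theta)$
$J(\theta) = -\frac{1}{m} \sum_{i = 1}^{m}\sum_{y}P_{data}(y|x^{(i)}) \cdot \log P_{model}(y|x^{(i)}; \theta)$
$J(\theta) = -\frac{1}{m} \sum_{i = 1}^{m}\text{one-hot}(y^{(i)})^T \cdot \log h_{\theta}(x^{(i)})$
Here m is the number of training samples and $h_{\theta}(x^{(i)}) = softmax(Z^{[4]})$ is the model's predicted distribution for $x^{(i)}$.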
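As a rough sketch of the second form of $J(\theta)$, the code below computes the mean cross entropy from logits and integer labels. The function names, the batch of m = 4 samples, and the all-zero logits are assumptions made for illustration. The log of the softmax is computed in one step for numerical stability, which is an implementation choice rather than something stated in the post.

```python
import numpy as np

def one_hot(y, num_classes):
    # Row i has a 1 in column y[i] and 0 elsewhere.
    out = np.zeros((len(y), num_classes))
    out[np.arange(len(y)), y] = 1.0
    return out

def cross_entropy(logits, y):
    # Stable log-softmax: log(softmax(z)) = z - max - log(sum(exp(z - max))).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    targets = one_hot(y, logits.shape[1])
    # J = -(1/m) * sum_i one_hot(y_i)^T . log h(x_i)
    return -np.mean(np.sum(targets * log_probs, axis=1))

logits = np.zeros((4, 10))       # m = 4 samples, 10 classes, all-equal logits
y = np.array([0, 3, 7, 9])       # made-up integer class labels
loss = cross_entropy(logits, y)
print(loss)
```

With all-equal logits the model predicts 1/10 for every class, so the loss equals log(10), a useful sanity check for an untrained 10-class classifier.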

Next time, we will start from the cross-entropy expression derived here and work backwards through the network to look at backpropagation.

Suppose the loss is computed by the following expressions.

Substituting the upper expression into the lower one gives

o = one_hot(y) · log(softmax(Z))

e = J(theta)
