ML Coursera 4 - w4: NN - Representation

Posted on 16/09/2018, in Machine Learning.

I first took these notes while following the Machine Learning course on Coursera.
Lectures in this week: Lecture 8.

In this post

Neural Networks
- Forward propagation: vectorized implementation
Examples and Intuitions I
Examples and Intuitions II
Multiclass classification
Programming Exercise: Multi-class Classification and Neural Networks

Neural Networks

Check [this link](https://medium.com/datathings/neural-networks-and-backpropagation-explained-in-a-simple-way-f540a3611f5e) to see the basic idea of NN and forward/backward propagation.

Download Lecture 8.

Algorithms that try to mimic the brain.
Popular in the 1980s and 1990s (an old idea).
Activation function = sigmoid (logistic) function $g(z) = \frac{1}{1+e^{-z}}$
The parameters $\theta$ are sometimes called weights.
Neural network: a first layer (input layer), intermediate layers (hidden layers) and a last layer (output layer)
Notations:
- $a_i^{(j)}$ = “activation” of unit $i$ in layer $j$
  
  $$ a_{\text{unit}}^{(\text{layer})} $$
- $\Theta^{(j)}$ = matrix of weights controlling function mapping from layer $j$ to layer $j+1$
- $\Theta^{(j)} \in \mathbb{R}^{m\times (n+1)}$ where $m$ = number of units in the current layer (layer $j+1$), $n$ = number of units in the previous layer (layer $j$), not counting the bias unit 0.
  
  $$ \begin{align} \Theta^{(\text{previous})} &: \text{previous layer} \to \text{current layer} \\ \Theta^{(\text{previous})} &\in \mathbb{R}^{\text{current} \times (\text{previous}+1)} \end{align} $$
  
  $$ a_{k\in \text{current units}}^{(\text{current layer})} = g\left( \sum_{i\in \text{prev units}} \Theta_{ki}^{(\text{prev layer})} a_i^{(\text{prev layer})} \right) $$
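
To make the dimension rule above concrete, here is a minimal Octave sketch. The layer sizes and the weight values are made up for illustration, and the bias unit $a_0 = 1$ is assumed to be the first entry of the previous layer's activation vector.

```octave
% Hypothetical sizes: previous layer has n = 3 units, current layer has m = 2 units.
n = 3;
m = 2;
g = @(z) 1 ./ (1 + exp(-z));     % sigmoid activation

% One row per current unit, one column per previous unit plus the bias unit.
Theta = zeros(m, n + 1);
Theta(1, :) = [0.5 -1 2 0];      % made-up weights, for illustration only
Theta(2, :) = [-1 0 1 1];

% Previous-layer activations; the first entry is the bias unit a_0 = 1.
a_prev = [1; 0.2; 0.4; 0.6];

% Activation of unit k: g( sum_i Theta(k,i) * a_prev(i) )
a_cur = zeros(m, 1);
for k = 1:m
  a_cur(k) = g(sum(Theta(k, :) .* a_prev'));
end

disp(size(Theta));               % prints the dimensions m and n+1, here 2 and 4
```

The loop is written element by element on purpose, so that it mirrors the summation formula; the vectorized form is simply `g(Theta * a_prev)`.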

Forward propagation: vectorized implementation

Neural network 3

$$ \begin{align} z^{(2)} &= \Theta^{(1)} a^{(1)}\\ a^{(2)} &= g(z^{(2)}) \end{align} $$

A neural network looks like logistic regression, except that instead of using the raw features $x_1, x_2, \ldots$, it uses the activations $a^{(j)}_1, a^{(j)}_2, \ldots$ as inputs.
A neural network learns its own features.
Architecture = how the different neurons are connected to each other.
Finally, when layer $j+1$ is the output layer:

$$ \begin{align} h_{\theta}(x) = a^{(j+1)} = g(z^{(j+1)}) = g(\Theta^{(j)}a^{(j)}). \end{align} $$
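
The vectorized equations above can be sketched in Octave for a single example. The network shape (2 inputs, 2 hidden units, 1 output) and all weight values are hypothetical; the only thing the sketch adds to the formulas is the explicit bias unit that is prepended before each matrix product.

```octave
g = @(z) 1 ./ (1 + exp(-z));     % sigmoid activation

% Hypothetical weights: Theta1 is 2 x (2+1), Theta2 is 1 x (2+1).
Theta1 = [0.1 0.3 -0.2; -0.4 0.2 0.5];
Theta2 = [0.3 -0.6 0.8];

x = [1.0; 2.0];                  % one made-up example, as a column vector

a1 = [1; x];                     % input layer with bias unit a_0 = 1
z2 = Theta1 * a1;                % z^(2) = Theta^(1) a^(1)
a2 = [1; g(z2)];                 % hidden layer with bias unit
z3 = Theta2 * a2;                % z^(3) = Theta^(2) a^(2)
h  = g(z3);                      % h_Theta(x) = a^(3)
```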

Examples and Intuitions I

Neural network 4

Neural network 5
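
Assuming the figures in this part show the logic-gate examples from the lecture, a single sigmoid unit can be checked against the AND and OR truth tables as follows. The weight values are the ones I recall from the lecture and should be treated as an assumption; any weights with the same sign pattern and large enough magnitude behave the same way.

```octave
g = @(z) 1 ./ (1 + exp(-z));     % sigmoid activation

% All four combinations of two binary inputs, with a leading bias column of ones.
X = [1 0 0; 1 0 1; 1 1 0; 1 1 1];

theta_and = [-30; 20; 20];       % assumed weights for x1 AND x2
theta_or  = [-10; 20; 20];       % assumed weights for x1 OR x2

% The sigmoid saturates near 0 or 1, so rounding recovers the truth table.
and_out = round(g(X * theta_and));
or_out  = round(g(X * theta_or));
```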

Examples and Intuitions II

Neural network 6

Neural network 7
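
Assuming this part continues with the XNOR example from the lecture, the sketch below stacks two hidden units and one output unit to compute it. The weights are assumed values chosen so that each unit acts as a logic gate; they are not taken from the figures.

```octave
g = @(z) 1 ./ (1 + exp(-z));     % sigmoid activation

X = [0 0; 0 1; 1 0; 1 1];        % all combinations of x1, x2
m = size(X, 1);

% Row 1: x1 AND x2. Row 2: (NOT x1) AND (NOT x2). Assumed weights.
Theta1 = [-30 20 20; 10 -20 -20];
% Output unit: (hidden unit 1) OR (hidden unit 2). Assumed weights.
Theta2 = [-10 20 20];

a1 = [ones(m, 1) X];                   % add bias column to the inputs
a2 = [ones(m, 1) g(a1 * Theta1')];     % hidden layer with bias column
h  = g(a2 * Theta2');                  % output layer
xnor_out = round(h);                   % 1 when x1 and x2 are equal
```

The point of the example is that XNOR is not linearly separable, so a single unit cannot compute it, but one hidden layer is enough.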

Multiclass classification

Neural network 8

Neural network 9
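
For multiclass classification the network has one output unit per class, and each label is represented as a vector with a single 1. A minimal sketch, with a made-up number of classes, made-up labels and made-up network outputs:

```octave
num_labels = 4;                  % hypothetical number of classes
y = [2; 4; 1; 3];                % hypothetical labels, one per example

% Row i of Y is the target output vector for example i.
I = eye(num_labels);
Y = I(y, :);

% Made-up network outputs: one row per example, one column per class.
h = [0.1 0.8 0.2 0.1; 0.2 0.1 0.3 0.9; 0.7 0.2 0.1 0.1; 0.1 0.3 0.6 0.2];

% Predicted class = index of the largest output in each row.
[~, p] = max(h, [], 2);
```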

Programming Exercise: Multi-class Classification and Neural Networks

See the instructions and explanation for ex3.

Review Multiclass classification: one-vs-all.

Review Advanced Optimization.

lrCostFunction.m

This exercise is the same as the one from the previous week.

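h = sigmoid(X*theta); % hypothesis, m x 1
% regularized cost; theta(1) (the bias term) is not regularized
J = 1/m * ( -y' * log(h) - (1-y)' * log(1-h) ) + lambda/(2*m) * sum(theta(2:end).^2);

% gradient: no regularization term for theta(1)
grad(1,1) = 1/m * X(:,1)' * (h-y);
grad(2:end,1) = 1/m * X(:,2:end)' * (h-y) + lambda/m * theta(2:end,1);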
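
A quick sanity check for these lines, on a tiny made-up data set: with `theta` set to zeros every prediction is 0.5, so the cost should equal $\log 2 \approx 0.693$ whatever the data, and the regularization term vanishes. The data, `lambda`, and the inline `sigmoid` definition below are assumptions for the sketch, not part of the exercise files.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Tiny made-up data set: 4 examples, bias column plus 2 features.
X = [1 0.5 1.2; 1 1.5 0.3; 1 2.0 2.2; 1 3.1 0.7];
y = [0; 0; 1; 1];
m = size(X, 1);
lambda = 1;
theta = zeros(3, 1);

h = sigmoid(X * theta);          % all predictions are 0.5 when theta is zero
J = 1/m * ( -y' * log(h) - (1-y)' * log(1-h) ) + lambda/(2*m) * sum(theta(2:end).^2);

grad = zeros(size(theta));
grad(1,1) = 1/m * X(:,1)' * (h-y);
grad(2:end,1) = 1/m * X(:,2:end)' * (h-y) + lambda/m * theta(2:end,1);
```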

oneVsAll.m

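initial_theta = zeros(n + 1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1:num_labels
  % train one binary classifier per class: (y == c) is 1 for class c, 0 otherwise
  theta = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
  all_theta(c,:) = theta(:)'; % store the learned parameters as row c
end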

fmincg works similarly to fminunc, but is more efficient when dealing with a large number of parameters.
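
The key trick in `oneVsAll.m` is the expression `(y == c)`, which turns the multiclass labels into a binary target for class `c`. A small sketch with made-up labels shows the targets each classifier would be trained on:

```octave
y = [1; 3; 2; 3; 1];             % made-up labels for 5 examples
num_labels = 3;

% Column c holds the binary target vector used to train classifier c.
Yc = zeros(numel(y), num_labels);
for c = 1:num_labels
  Yc(:, c) = (y == c);           % 1 where the label is c, 0 elsewhere
end
```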

predictOneVsAll.m

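h = X * all_theta'; % one score per class (sigmoid omitted: it is monotonic, so the argmax is unchanged)
[~, p] = max(h,[],2); % index of the largest score in each row = predicted label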
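
The prediction uses the raw scores `X * all_theta'` without applying the sigmoid. Because the sigmoid is strictly increasing, the class with the largest score is also the class with the largest probability. A small check with made-up parameters (3 classes, bias plus 2 features):

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Made-up parameters: one row per class.
all_theta = [0.5 -1.0 2.0; -0.3 0.8 -0.5; 0.1 0.2 0.3];
% Made-up examples, with the bias column of ones already added.
X = [1 0.2 0.9; 1 1.5 0.1; 1 0.0 0.0];

scores = X * all_theta';
[~, p_linear]  = max(scores, [], 2);            % argmax of the raw scores
[~, p_sigmoid] = max(sigmoid(scores), [], 2);   % argmax of the probabilities
```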

predict.m : “you implemented multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses as it is only a linear classifier.”

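  X = [ones(m, 1) X]; % add the bias column of ones to the input layer (a1)
  z2 = X * Theta1';
  a2 = sigmoid(z2);
  a2 = [ones(m, 1) a2]; % add the bias column of ones to a2
  h = a2 * Theta2'; % output scores (sigmoid omitted: it does not change the argmax)
  [~, p] = max(h,[],2);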
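
To keep track of the matrix shapes in `predict.m`, here is a sketch that runs the same forward pass on random stand-in data. It assumes the ex3 dimensions (400 input pixels, 25 hidden units, 10 classes); the random weights are placeholders for the trained `Theta1` and `Theta2`, so the predicted labels themselves are meaningless.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

m = 5;                                     % made-up number of examples
X = rand(m, 400);                          % stand-in for m images of 20x20 pixels
Theta1 = rand(25, 401) - 0.5;              % placeholder weights, layer 1 -> 2
Theta2 = rand(10, 26) - 0.5;               % placeholder weights, layer 2 -> 3

a1 = [ones(m, 1) X];                       % m x 401
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];   % m x 26
h  = sigmoid(a2 * Theta2');                % m x 10
[~, p] = max(h, [], 2);                    % m x 1, one label in 1..10 per example
```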

Next: Week 5.