Anh-Thi DINH

ML Coursera 4 - w4: NN - Representation

Posted on 16/09/2018, in Machine Learning.

This note was first taken while I was following the machine learning course on Coursera.
Lectures in this week: Lecture 8.

Go back to Week 3.

Neural Networks

Check [this link](https://medium.com/datathings/neural-networks-and-backpropagation-explained-in-a-simple-way-f540a3611f5e) to see the basic idea of NN and forward/backward propagation.

Download Lecture 8.
  • Algorithms that try to mimic the brain.
  • Widely used in the 80s and 90s (an old idea).
  • Activation function = sigmoid (logistic) $g(z)$.
  • The parameters $\Theta$ are (sometimes) called weights.
  • Neural network: the first layer (input layer), intermediate layers (hidden layers) and the last layer (output layer)

    Neural network 1

  • Notations:
    • $a_i^{(j)}$ = “activation” of unit $i$ in layer $j$

      $$ a_{\text{unit}}^{(\text{layer})} $$
    • $\Theta^{(j)}$ = matrix of weights controlling function mapping from layer $j$ to layer $j+1$
    • $\Theta \in \mathbb{R}^{m\times (n+1)}$ where $m$ = number of units in the current layer and $n$ = number of units in the previous layer (not including the bias unit 0); see the sketch after this list.

      $$ \begin{align} \Theta^{(\text{previous})} &: \text{previous layer} \to \text{current layer} \\ \Theta^{(\text{previous})} &\in \mathbb{R}^{\text{current} \times (\text{previous}+1)} \end{align} $$

      Neural network 2

      $$ a_{k\in \text{current units}}^{(\text{current layer})} = g\left( \sum_{i\in \text{prev units}} \Theta_{ki}^{(\text{prev layer})} a_i^{(\text{prev layer})} \right) $$
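
For concreteness, here is a minimal sketch (in Octave, with made-up layer sizes: 3 units in the previous layer, 5 in the current one) of how the dimensions of $\Theta$ work out when computing one layer from the previous one:

    Theta1 = randn(5, 3 + 1);   % hypothetical weights: 5 current units x (3 previous units + bias)
    x  = [0.2; 0.7; -1.3];      % one example with 3 features
    a1 = [1; x];                % add the bias unit a_0^(1) = 1
    z2 = Theta1 * a1;           % 5 x 1 vector
    a2 = 1 ./ (1 + exp(-z2));   % sigmoid activation of each unit in the current layer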

Forward propagation: vectorized implementation

Neural network 3

$$ \begin{align} z^{(2)} &= \Theta^{(1)} a^{(1)}\\ a^{(2)} &= g(z^{(2)}) \end{align} $$
  • A neural network looks like logistic regression, except that instead of using the raw inputs $x_1, x_2, \ldots$, it uses the features $a^{(j)}_1, a^{(j)}_2, \ldots$ computed by the previous layer.
  • The neural network learns its own features.
  • Architecture = how the different neurons are connected to each other.
  • Final output (a vectorized sketch follows this list):

    $$ h_{\Theta}(x) = a^{(j+1)} = g(z^{(j+1)}) = g(\Theta^{(j)}a^{(j)}). $$
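
A minimal sketch (Octave) of the vectorized forward pass for one example $x$ in a 3-layer network, assuming weight matrices Theta1 and Theta2 are already given:

    g  = @(z) 1 ./ (1 + exp(-z));   % sigmoid activation
    a1 = [1; x];                    % input layer plus the bias unit, a^(1)
    z2 = Theta1 * a1;
    a2 = [1; g(z2)];                % hidden layer activations plus the bias unit
    z3 = Theta2 * a2;
    h  = g(z3);                     % h_Theta(x) = a^(3)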

Examples and Intuitions I

Neural network 4

Neural network 5
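As far as I remember from this part of the lecture (the figures above), a single sigmoid unit with weights $-30, 20, 20$ computes the logical AND of two binary inputs. A quick check in Octave:

    g = @(z) 1 ./ (1 + exp(-z));
    Theta = [-30 20 20];               % bias, x1, x2
    X = [0 0; 0 1; 1 0; 1 1];          % all binary input combinations
    h = g([ones(4,1) X] * Theta')      % ~ [0; 0; 0; 1], i.e. x1 AND x2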

Examples and Intuitions II

Neural network 6

Neural network 7
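The idea in these figures (as I recall it): combine an AND unit and a (NOT x1) AND (NOT x2) unit in the hidden layer, then OR them in the output layer to get XNOR. A sketch in Octave with the weights shown in the slides:

    g = @(z) 1 ./ (1 + exp(-z));
    Theta1 = [-30  20  20;    % a2_1 = x1 AND x2
               10 -20 -20];   % a2_2 = (NOT x1) AND (NOT x2)
    Theta2 = [-10  20  20];   % a3 = a2_1 OR a2_2 = x1 XNOR x2
    X  = [0 0; 0 1; 1 0; 1 1];
    a2 = g([ones(4,1) X] * Theta1');
    h  = g([ones(4,1) a2] * Theta2')   % ~ [1; 0; 0; 1]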

Multiclass classification

Neural network 8

Neural network 9
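The key point of these slides: for a multiclass problem the output layer has one unit per class, and the label $y$ is recoded as a vector with a single 1. A minimal sketch (Octave, assuming $K = 4$ classes):

    % the label y = 3 becomes the "one-hot" vector [0; 0; 1; 0]
    y_vec = ((1:4)' == 3);
    % the network outputs h in R^4; the predicted class is the unit with the largest output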


Programming Exercise: Multi-class Classification and Neural Networks

See again Multiclass classification: one-vs-all.
See again Advanced Optimization.
  • lrCostFunction.m

    This exercise is the same as the one in the previous week (regularized logistic regression cost and gradient).

    h = sigmoid(X*theta); % hypothesis
    % regularized cost: the bias parameter theta(1) is not regularized
    J = 1/m * ( -y' * log(h) - (1-y)' * log(1-h) ) + lambda/(2*m) * sum(theta(2:end).^2);

    grad(1,1) = 1/m * X(:,1)' * (h-y);                                      % no regularization for theta(1)
    grad(2:end,1) = 1/m * X(:,2:end)' * (h-y) + lambda/m * theta(2:end,1);  % regularized gradient
    
  • oneVsAll.m

    initial_theta = zeros(n + 1, 1);
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    for c = 1:num_labels
        % train one regularized logistic regression classifier for class c vs. the rest
        [theta] = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
        all_theta(c,:) = theta(:); % store it as the c-th row of all_theta
    end
    

    Note: fmincg works similarly to fminunc, but is more efficient when dealing with a large number of parameters.

  • predictOneVsAll.m

    h = X * all_theta';   % class scores; the sigmoid is not needed because it is monotonic
    [~, p] = max(h,[],2); % predicted class = index of the largest score
    
  • predict.m : “you implemented multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses as it is only a linear classifier.”

      X = [ones(m, 1) X];   % add the bias column to X (this is a1)
      z2 = X * Theta1';
      a2 = sigmoid(z2);
      a2 = [ones(m, 1) a2]; % add the bias column to a2
      h = a2 * Theta2';     % output layer scores (argmax is unchanged by the sigmoid)
      [~, p] = max(h,[],2);
    
Next to Week 5.