Guide to CMU Deep Learning

#Course material

#Notes

  • L02 What can a network represent

    • As a universal Boolean function / classifier / approximator
    • The roles of depth and width in a network
  • L03 Learning the network

    • Empirical Risk
    • Optimization problem statement
  • L03.5 A brief note on derivatives

    • Multiple variables
    • Minimization
  • L04 Backpropagation

    • Chain rule / Subgradient
    • Backpropagation / Vector formulation
  • L05 Convergence

    • Backpropagation prefers consistency over perfection (which is good)
    • Problems with second-order methods / choosing the learning rate
  • L06 Optimization

    • Rprop / Quickprop
    • Momentum / Nesterov’s Accelerated Gradient
    • Batch / Stochastic / Mini-batch gradient descent
  • L07 Optimizers and regularizers

    • Second moments: RMSprop / Adam
    • Batch normalization
    • Regularizer / dropout
  • L08 Motivation of CNN

    • The need for shift invariance
    • Scanning networks / why distribute the scan / receptive field / stride / pooling
  • L09 Cascade Correlation

    • Why Is Backprop So Slow?
    • The advantages of cascade correlation
  • L10 CNN architecture

    • Architecture / number of parameters / convolution layers / max pooling
  • L11 Using CNNs to understand the neural basis of vision (guest lecture)

  • L12 Backpropagation in CNNs

    • Computing $\nabla_{Z^{(l)}}\mathrm{Div}$ / $\nabla_{Y^{(l-1)}}\mathrm{Div}$ / $\nabla_{w^{(l)}}\mathrm{Div}$
      • Regular convolution run on the shifted derivative maps using the flipped filter (see the 1-D sketch after this notes list)
    • Derivative of Max pooling / Mean pooling
    • Transposed Convolution / Depth-wise convolution
    • LeNet-5 / AlexNet / VGGNet / GoogLeNet / ResNet / DenseNet
  • L13 Recurrent Networks

    • Model / Architecture
    • Backpropagation Through Time
    • Bidirectional RNN
  • L14 Stability analysis and LSTMs

    • Stability: ability to remember / saturation / different activations
    • Vanishing gradient
    • LSTM: architecture / forward / backward
    • Gated Recurrent Units (GRU)
  • L15 Divergence of RNN

    • One to one / Many to many / Many to one / Seq2seq divergence
  • Language modelling: Representing words

  • L16 Connectionist Temporal Classification

    • Sequence-to-sequence models / time-synchronous / order-synchronous
    • Iteratively estimating the output table: Viterbi algorithm / expected divergence
    • Repetitive decoding problem / Beam search
  • L17 Seq2seq & Attention

    • Autoencoder / attention weight / beam search
  • L18 Representation

    • Autoencoder / non-linear manifold
  • L19 Hopfield network

    • Loopy network / energy / content-addressable memory
    • Store a specific pattern / orthogonal patterns
  • L20 Boltzmann machines 1

    • Training Hopfield nets: geometric approach / optimization
    • Boltzmann Distribution
  • L21 Boltzmann machines 2

    • Stochastic system: Boltzmann machines
    • Training and sampling of this model, as well as Restricted Boltzmann Machines
  • L22 Variational Autoencoders 1

    • Generative models: PCA, Gaussian mixtures, factor analysis, autoencoders
    • EM algorithm for generative model
  • L23 Variational Autoencoders 2

    • Non-linear Gaussian Model
    • VAEs
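
The L12 note above claims that the input gradient of a convolution layer can be computed by running a regular (full) convolution of the upstream derivative map with the flipped filter. Below is a minimal 1-D NumPy sketch of that claim, not code from the course: the variable names and the toy divergence Div = sum(y) are my own, and the forward pass is the usual "valid" cross-correlation used in CNN layers.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)      # input signal
w = rng.standard_normal(3)      # filter

# Forward pass: "valid" cross-correlation, y[i] = sum_k w[k] * x[i + k]
y = np.correlate(x, w, mode="valid")

# Toy divergence Div = sum(y), so dDiv/dy[i] = 1 for every i
dy = np.ones_like(y)

# Claimed backward rule: full convolution of the derivative map with the filter
# (np.convolve flips its second argument, which is exactly the "flipped filter")
dx_claimed = np.convolve(dy, w, mode="full")

# Reference: accumulate dDiv/dx directly from the forward definition
dx_reference = np.zeros_like(x)
for i in range(len(y)):
    for k in range(len(w)):
        dx_reference[i + k] += dy[i] * w[k]

assert np.allclose(dx_claimed, dx_reference)
print("input gradient == full convolution with the flipped filter")
```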

#Ref

#Summary

#RNN

  • Recurrent networks are poor at memorization
    • Memory can explode or vanish depending on the weights and the activation (see the sketch below)
  • They also suffer from the vanishing gradient problem during training
    • The error at any time step cannot affect parameter updates too far in the past
  • LSTMs are an alternative formalism in which memory depends more directly on the input than on the network’s parameters/structure
    • Memory lives in a “Constant Error Carousel” with no weights or activations; pattern recognizers instead switch it directly and “increment/decrement” it
    • They do not suffer from the vanishing gradient problem, but they can still suffer from exploding gradients
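
A toy numerical sketch of the points above (values, function names, and the frozen gates are my own, not from the lectures): with a tanh recurrence the influence of the first input on the final state dies out whether the recurrent weight is small (memory vanishes) or large (the activation saturates), while an additive LSTM-style cell state, the “Constant Error Carousel”, preserves it.

```python
import numpy as np

T = 50        # sequence length
eps = 1e-4    # step size for a numerical derivative

def rnn_final_state(x0, w_rec):
    """Vanilla RNN: h_t = tanh(w_rec * h_{t-1} + x_t), driven only by x_0."""
    h = 0.0
    for t in range(T):
        h = np.tanh(w_rec * h + (x0 if t == 0 else 0.0))
    return h

def cec_final_state(x0, forget=1.0):
    """LSTM-style cell state: c_t = forget * c_{t-1} + input; no squashing of c.
    Gates are frozen to constants purely for illustration."""
    c = 0.0
    for t in range(T):
        c = forget * c + (x0 if t == 0 else 0.0)
    return c

def sensitivity(f, *args):
    """Numerical d(final state)/d(x_0): how much the first input still matters at t = T."""
    return (f(1.0 + eps, *args) - f(1.0, *args)) / eps

print("vanilla RNN, |w| < 1:", sensitivity(rnn_final_state, 0.5))   # ~0: memory vanishes
print("vanilla RNN, |w| > 1:", sensitivity(rnn_final_state, 2.0))   # ~0: tanh saturates
print("CEC cell, forget = 1:", sensitivity(cec_final_state, 1.0))   # 1.0: x_0 fully preserved
```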