# Motivation of CNN

**Scan for patterns**

## #Motivation

- Find a word in a **signal** or find an item in a **picture**
- The need for **shift invariance**
    - The *location* of a pattern is not important

- So we can **scan** with the same MLP for the pattern (see the sketch after this list)
    - Just one giant network
    - Restriction: all subnets are identical
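
A minimal sketch of the scan, assuming a 1-D input signal and a tiny two-layer MLP; all sizes, weights, and the detection threshold here are illustrative, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny MLP that looks at a window of 8 samples (weights are illustrative).
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # hidden layer
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)    # output: "pattern present?"

def mlp(window):
    h = np.maximum(0, W1 @ window + b1)           # ReLU hidden layer
    return W2 @ h + b2

signal = rng.normal(size=100)
window = 8

# Scan: apply the SAME MLP (same W1, W2) at every position.
scores = np.array([mlp(signal[t:t + window])
                   for t in range(len(signal) - window + 1)])

# The pattern is "found" if the detector fires anywhere -> shift invariance.
found = scores.max() > 0.5                        # threshold is arbitrary
```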

- Regular networks vs. scanning networks
    - In a **regular MLP** every neuron in a layer is connected by a **unique** weight to every unit in the previous layer
    - In a **scanning MLP** each neuron is connected to a **subset** of neurons in the previous layer
        - The weight matrix is sparse
        - The weight matrix is block-structured with identical blocks (see the sketch below)
    - **The network is a shared-parameter model**
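
To make the "sparse, block-structured" claim concrete, here is a sketch (sizes are illustrative) that builds the giant weight matrix of the equivalent "one big network" for one scanned layer and checks it against the sliding-window computation:

```python
import numpy as np

rng = np.random.default_rng(1)
in_len, window, hidden = 12, 4, 3
W = rng.normal(size=(hidden, window))             # one shared block of weights
x = rng.normal(size=in_len)

n_pos = in_len - window + 1                       # number of scan positions

# Giant weight matrix: mostly zeros, with the SAME block W
# repeated along a diagonal band.
big_W = np.zeros((n_pos * hidden, in_len))
for p in range(n_pos):
    big_W[p * hidden:(p + 1) * hidden, p:p + window] = W

scan = np.concatenate([W @ x[p:p + window] for p in range(n_pos)])
assert np.allclose(big_W @ x, scan)               # identical outputs
```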

## #Modifications

**Order changed**
- Intuitively, scan at one position and get the output, then scan the next position
- But we can also first scan all positions with one layer, then move to the next layer
- The result is the **same** (verified in the sketch below)
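
A sketch of the order swap, assuming an illustrative two-layer scanned net: finishing the whole MLP at each position gives the same outputs as evaluating layer 1 at all positions first, then layer 2:

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(5, 4))                      # layer 1: window of 4 -> 5 hidden
W2 = rng.normal(size=(2, 5))                      # layer 2: 5 hidden -> 2 outputs
x = rng.normal(size=20)
positions = range(len(x) - 4 + 1)

# Order 1: finish the whole MLP at each position before moving on.
by_position = [W2 @ np.maximum(0, W1 @ x[p:p + 4]) for p in positions]

# Order 2: scan layer 1 over ALL positions first, then scan layer 2.
layer1 = [np.maximum(0, W1 @ x[p:p + 4]) for p in positions]
by_layer = [W2 @ h for h in layer1]

assert all(np.allclose(a, b) for a, b in zip(by_position, by_layer))
```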

**Distributing the scan**
- Evaluate small patterns in the first layer
- The higher layer implicitly learns the **arrangement** of sub-patterns that represents the **larger** pattern

**Why distribute?**
- More generalizable
- Distribution forces *localized* patterns in lower layers

- Number of parameters
    - Far fewer parameters than a regular MLP, since one block of weights is reused at every position
    - Significant gains from shared computation (see the comparison below)
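
A back-of-the-envelope comparison (all sizes made up for illustration) of the weight counts of a regular MLP with unique weights at every position versus a scanning MLP that reuses one shared block:

```python
in_len, window, hidden = 100, 8, 16
n_pos = in_len - window + 1                       # 93 scan positions

# Regular MLP: every hidden unit has its own weights to the full input.
regular = (n_pos * hidden) * in_len

# Scanning MLP: one shared block of weights reused at every position.
scanning = hidden * window

print("regular: ", regular)                       # 148800 unique weights
print("scanning:", scanning)                      # 128 shared weights
```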

## #Terminology

- The pattern in the **input** image that each filter sees is its **receptive field**
- Stride
    - Scanning with a stride greater than 1 effectively coarsens the granularity of the scan
    - This reduces the size of the resulting maps (see the sketch below)
- Non-overlapping strides
    - Partition the output of the layer into blocks with no overlap
    - Within each block, retain only the highest value
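
A sketch of a strided scan (filter, stride, and sizes are illustrative): stepping by more than one sample shrinks the output map:

```python
import numpy as np

x = np.arange(16.0)
window, stride = 4, 2
w = np.ones(window) / window                      # a trivial averaging filter

# Scan positions jump by `stride`, so the output map has fewer entries.
out = np.array([w @ x[p:p + window]
                for p in range(0, len(x) - window + 1, stride)])
print(len(x), "->", len(out))                     # 16 -> 7
```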

- Pooling
    - We would like to account for some jitter in the first-level patterns
    - Max pooling
        - Is just a neuron that computes the max of its inputs instead of a weighted sum (see the sketch below)
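
A sketch of non-overlapping max pooling (block size and values illustrative): partition the map into blocks and keep the largest value in each, which tolerates small jitter in where a pattern fired:

```python
import numpy as np

feature_map = np.array([0.1, 0.9, 0.2, 0.0, 0.8, 0.1, 0.3, 0.4])
block = 2

# Non-overlapping blocks; each pooled value is just a max "neuron".
pooled = feature_map.reshape(-1, block).max(axis=1)
print(pooled)                                     # [0.9 0.2 0.8 0.4]
```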

- This entire structure is called a Convolutional Neural Network (CNN)
- The 1-D scan version of the convolutional neural network is the *time-delay neural network* (TDNN)
    - Used primarily for speech recognition
    - Max pooling is optional, since jitter matters in speech