#Hopfield Net
- So far, the neural networks we have used for computation have all been feedforward structures
#Loopy network

- Each neuron is a perceptron with +1/-1 output
- Every neuron receives input from every other neuron
- Every neuron outputs signals to every other neuron

- At each time, each neuron receives a "field" $\sum_{j \neq i} w_{j i} y_{j}+b_{i}$
- If the sign of the field matches its own sign, it does not respond
- If the sign of the field opposes its own sign, it "flips" to match the sign of the field

- If the sign of the field at any neuron opposes its own sign, it "flips" to match the field
- Which will change the field at other nodes
- Which may then flip... and so on...
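A minimal sketch of this update rule in NumPy (the function and variable names are illustrative, not from the notes; a zero field is resolved to +1 by convention):

```python
import numpy as np

def update_neuron(y, W, b, i):
    """Flip neuron i if the local field opposes its current sign.

    y : vector of +1/-1 outputs, W : weight matrix with zero diagonal, b : bias vector.
    """
    field = W[:, i] @ y + b[i]            # sum_{j != i} w_ji y_j + b_i  (w_ii = 0)
    y[i] = 1.0 if field >= 0 else -1.0    # take the sign of the field; no change if signs already agree
    return y
```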
#Flip behavior
- Let $y^{-}_{i}$ be the output of the $i$-th neuron just before it responds to the current field
- Let $y_{i}^{+}$ be the output of the $i$-th neuron just after it responds to the current field
- If $y_{i}^{-}=\operatorname{sign}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)$, then $y_{i}^{+} = y_{i}^{-}$
- If the sign of the field matches its own sign, the neuron does not flip, and
$$ y_{i}^{+}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)-y_{i}^{-}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)=0 $$
- If $y_{i}^{-}\neq\operatorname{sign}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)$, then $y_{i}^{+} = -y_{i}^{-}$, and
$$ y_{i}^{+}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)-y_{i}^{-}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)=2 y_{i}^{+}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right) $$
- This term is always positive, because after the flip $y_{i}^{+}$ has the same sign as the field
- Every flip of a neuron is guaranteed to locally increase $y_{i}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)$
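As a quick numeric check of the flip case (the field value $+2$ is just an illustrative number): if $y_i^{-} = -1$ while the field is $+2$, the neuron flips to $y_i^{+} = +1$ and

$$
y_{i}^{+}\cdot 2 - y_{i}^{-}\cdot 2 = 2-(-2) = 4 = 2\,y_{i}^{+}\cdot 2 > 0 .
$$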
#Globally
- Consider the following sum across all nodes
$$
\begin{array}{c}
D\left(y_{1}, y_{2}, \ldots, y_{N}\right)=\sum_{i} y_{i}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right) \\
=\sum_{i, j \neq i} w_{i j} y_{i} y_{j}+\sum_{i} b_{i} y_{i}
\end{array}
$$
- Assume $w_{ii} = 0$
- For any unit $k$ that "flips" because of the local field
$$ \Delta D\left(y_{k}\right)=D\left(y_{1}, \ldots, y_{k}^{+}, \ldots, y_{N}\right)-D\left(y_{1}, \ldots, y_{k}^{-}, \ldots, y_{N}\right) $$
$$ \Delta D\left(y_{k}\right)=\left(y_{k}^{+}-y_{k}^{-}\right)\left(\sum_{j \neq k} w_{j k} y_{j}+b_{k}\right) $$
- This is always positive!
- Every flip of a unit results in an increase in $D$
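A small NumPy sketch of this check (illustrative names; symmetric weights with zero diagonal are assumed, and the pairwise term is counted once per pair, as in the $\sum_{j>i}$ energy used later):

```python
import numpy as np

def D(y, W, b):
    """D = sum_{i, j>i} w_ij y_i y_j + sum_i b_i y_i  (W symmetric, zero diagonal)."""
    return 0.5 * (y @ W @ y) + b @ y

rng = np.random.default_rng(0)
N = 8
W = rng.normal(size=(N, N)); W = (W + W.T) / 2    # symmetric weights
np.fill_diagonal(W, 0)                            # w_ii = 0
b = rng.normal(size=N)
y = rng.choice([-1.0, 1.0], size=N)

k = 3                                             # any unit
field = W[:, k] @ y + b[k]
y_after = y.copy()
y_after[k] = 1.0 if field >= 0 else -1.0          # respond to the local field
assert D(y_after, W, b) >= D(y, W, b)             # a flip never decreases D
```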
#Overall
- Flipping a unit will result in an increase (non-decrease) of
$$ D=\sum_{i, j \neq i} w_{i j} y_{i} y_{j}+\sum_{i} b_{i} y_{i} $$
- $D$ is bounded
$$ D_{\max }=\sum_{i, j \neq i}\left|w_{i j}\right|+\sum_{i}\left|b_{i}\right| $$
- The minimum increment of $D$ in a flip is
$$ \Delta D_{\min }=\min _{i,\left\{y_{i},\, i=1 \ldots N\right\}} 2\left|\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right| $$
- Any sequence of flips must converge in a finite number of steps
- Think of this as an infinitely deep network in which the weights at every layer are identical
- Find the maximum layer!
#The Energy of a Hopfield Net
- Define the Energy of the network as
$$ E=-\sum_{i, j \neq i} w_{i j} y_{i} y_{j}-\sum_{i} b_{i} y_{i} $$
- Just the negative of $D$
- The evolution of a Hopfield network constantly decreases its energy
- This is analogous to the potential energy of a spin glass (a system of magnetic dipoles)
- The system will evolve until the energy hits a local minimum
- We drop the bias terms from here on for simplicity

- The network will evolve until it arrives at a local minimum in the energy contour
#Content-addressable memory

- Each of the minima is a "stored" pattern
- If the network is initialized close to a stored pattern, it will inevitably evolve to that pattern
- This is a content-addressable memory
- Recall memory content from partial or corrupted values
- Also called associative memory
- Evolve and recall pattern by content, not by location
#Evolution

- The network will evolve until it arrives at a local minimum in the energy contour
- We proved that every change in the network results in a decrease in energy
- So the path to the energy minimum is monotonic
#For 2-neuron net

- The energy is symmetric under sign inversion of the state: $-\frac{1}{2} \mathbf{y}^{T} \mathbf{W} \mathbf{y}=-\frac{1}{2}(-\mathbf{y})^{T} \mathbf{W}(-\mathbf{y})$
- If $\hat{y}$ is a local minimum, so is $-\hat{y}$
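As a concrete (hypothetical) instance with two neurons, a symmetric weight $w>0$ and no bias:

$$
\mathbf{W}=\begin{pmatrix}0 & w \\ w & 0\end{pmatrix},\qquad
E(y_1,y_2)=-\tfrac{1}{2}\,\mathbf{y}^{T}\mathbf{W}\mathbf{y}=-w\,y_1 y_2 ,
$$

so $E(+1,+1)=E(-1,-1)=-w$ are the two mirror-image minima, while $E(+1,-1)=E(-1,+1)=+w$.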
#Computational algorithm

- Very simple
- Updates can be done sequentially, or all at once
- Convergence: stop when the state (and hence the energy below) no longer changes (see the sketch below)
$$ E=-\sum_{i} \sum_{j>i} w_{j i} y_{j} y_{i} $$
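One possible implementation of the sequential (asynchronous) version, sketched in NumPy with illustrative names; the network is assumed bias-free, as above:

```python
import numpy as np

def hopfield_energy(y, W):
    """E = -sum_{i, j>i} w_ji y_j y_i  (W symmetric, zero diagonal)."""
    return -0.5 * (y @ W @ y)

def run_hopfield(y, W, max_sweeps=100):
    """Asynchronously update units until no unit flips during a full sweep."""
    y = y.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in np.random.permutation(len(y)):   # visit units in random order
            new_yi = 1.0 if W[:, i] @ y >= 0 else -1.0
            if new_yi != y[i]:
                y[i] = new_yi                      # this flip lowers the energy
                changed = True
        if not changed:                            # no flips: a local energy minimum
            break
    return y

# Example: random symmetric weights, random start
rng = np.random.default_rng(0)
N = 10
W = rng.normal(size=(N, N)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
y0 = rng.choice([-1.0, 1.0], size=N)
y_final = run_hopfield(y0, W)
print(hopfield_energy(y0, W), "->", hopfield_energy(y_final, W))   # energy never increases
```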
#Issues
#Store a specific pattern
- A network can store multiple patterns
- Every stable point is a stored pattern
- So we could design the net to store multiple patterns
- Remember that every stored pattern $P$ is actually two stored patterns, $P$ and $-P$
- How can this quadratic energy have multiple minima? (A convex quadratic over continuous inputs would have only one)
- Because the outputs are constrained to the discrete values $\{-1, 1\}$
- Hebbian learning: $w_{j i}=y_{j} y_{i}$
- Design a stationary pattern
- $\operatorname{sign}\left(\sum_{j \neq i} w_{j i} y_{j}\right)=y_{i} \quad \forall i$
- So
- $\operatorname{sign}\left(\sum_{j \neq i} w_{j i} y_{j}\right)=\operatorname{sign}\left(\sum_{j \neq i} y_{j} y_{i} y_{j}\right)$
- $\quad=\operatorname{sign}\left(\sum_{j \neq i} y_{j}^{2} y_{i}\right)=\operatorname{sign}\left(y_{i}\right)=y_{i}$
- Energy
$$ E=-\sum_{i} \sum_{j<i} w_{j i} y_{j} y_{i}=-\sum_{i} \sum_{j<i} y_{i}^{2} y_{j}^{2}=-\sum_{i} \sum_{j<i} 1=-\frac{1}{2} N(N-1) $$
- This is the lowest possible energy value for the network

- Stored pattern has lowest energy
- No matter where the network begins, it will evolve into the stored pattern (the lowest-energy state), as in the sketch below
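A toy demonstration of storing a single pattern with the Hebbian rule and recalling it from a corrupted copy (a sketch; the pattern, sizes, and number of corrupted bits are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
pattern = rng.choice([-1.0, 1.0], size=N)

W = np.outer(pattern, pattern)       # Hebbian weights w_ji = y_j y_i
np.fill_diagonal(W, 0)               # keep w_ii = 0

# The stored pattern sits at the lowest possible energy, -0.5 N (N - 1)
energy = -0.5 * (pattern @ W @ pattern)
assert np.isclose(energy, -0.5 * N * (N - 1))

# Corrupt a few bits and let the network evolve back to the stored pattern
y = pattern.copy()
y[:3] *= -1                          # flip 3 of the 16 bits
changed = True
while changed:                       # asynchronous updates until no unit changes
    changed = False
    for i in range(N):
        new_yi = 1.0 if W[:, i] @ y >= 0 else -1.0
        if new_yi != y[i]:
            y[i], changed = new_yi, True
assert np.array_equal(y, pattern)    # recalled the full pattern from a corrupted cue
```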
#How many patterns can we store?
- To store more than one pattern
$$ w_{j i}=\sum_{\mathbf{y}_{p} \in\left\{\mathbf{y}_{p}\right\}} y_{i}^{p} y_{j}^{p} $$
- $\left\{\mathbf{y}_{p}\right\}$ is the set of patterns to store
- Super/subscript $p$ represents the specific pattern
- Hopfield: a network of $N$ neurons can store up to ~$0.15N$ patterns through Hebbian learning (derivation given in the slides)
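A sketch of the multi-pattern Hebbian rule (illustrative names; with $K$ well below $0.14N$ the stored patterns are typically stationary, but this is not guaranteed for arbitrary patterns):

```python
import numpy as np

def hebbian_weights(patterns):
    """W = sum_p y_p y_p^T with zero diagonal; `patterns` is a (K, N) array of +1/-1 rows."""
    W = patterns.T @ patterns            # sum of outer products over the K patterns
    np.fill_diagonal(W, 0)               # keep w_ii = 0
    return W

rng = np.random.default_rng(2)
K, N = 3, 64                             # 3 random patterns, well under 0.14 * 64
patterns = rng.choice([-1.0, 1.0], size=(K, N))
W = hebbian_weights(patterns)

# Check stationarity: each stored pattern should reproduce itself under one update of every unit
for p in patterns:
    print(np.array_equal(np.sign(W @ p), p))   # typically True when K << 0.14 N
```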
#Orthogonal / Non-orthogonal patterns
- Orthogonal patterns
    - Patterns are local minima (stationary and stable)
    - No other local minima exist
    - But the patterns are perfectly confusable for recall
- Non-orthogonal patterns
    - Patterns are local minima (stationary and stable)
    - No other local minima exist
    - Actual energy wells form around the patterns
    - Patterns may be perfectly recalled! (Note K > 0.14 N)
- Two orthogonal 6-bit patterns
    - Perfectly stationary and stable
    - Several spurious "fake-memory" local minima appear
#Observations
- Many "parasitic" patterns
    - Undesired patterns that also become stable states or attractors
- Patterns that are non-orthogonal are easier to remember
    - I.e. patterns that are closer to each other are easier to remember than patterns that are farther apart!
- It seems possible to store K > 0.14N patterns
    - i.e. obtain a weight matrix W such that K > 0.14N patterns are stationary
    - Possible to make more than 0.14N patterns at least 1-bit stable