Boltzmann Machines 2

This lecture redefined the regular Hopfield net as a stochastic system: the Boltzmann machine. It then discussed training and sampling issues of the Boltzmann machine model and introduced Restricted Boltzmann Machines, a model commonly used in practice.
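A minimal sketch of the stochastic update that distinguishes a Boltzmann machine from a deterministic Hopfield net (the function name, the ±1 units, and the temperature parameter `T` are illustrative assumptions, not taken from the lecture):

```python
import numpy as np

def stochastic_update(y, W, b, T=1.0, rng=np.random.default_rng(0)):
    """One sweep of stochastic updates: instead of deterministically flipping
    to match its field (Hopfield), each unit turns on (+1) with probability
    sigma(2 * field / T), so higher-energy states remain reachable."""
    for i in rng.permutation(len(y)):
        field = W[i] @ y - W[i, i] * y[i] + b[i]   # sum over j != i of w_ji * y_j + b_i
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field / T))
        y[i] = 1 if rng.random() < p_plus else -1
    return y
```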

Boltzmann Machines 1

Training Hopfield nets: geometric approach. The behavior of $\mathbf{E}(\mathbf{y})=\mathbf{y}^{T} \mathbf{W} \mathbf{y}$ with $\mathbf{W}=\mathbf{Y} \mathbf{Y}^{T}-N_{p} \mathbf{I}$ is identical to the behavior with $\mathbf{W}=\mathbf{Y}\mathbf{Y}^{T}$: the energy landscape only differs by an additive constant, so the gradients and the locations of the minima remain the same (the two matrices have the same eigenvectors). Since $\mathbf{y}^{T}\left(\mathbf{Y} \mathbf{Y}^{T}-N_{p} \mathbf{I}\right) \mathbf{y}=\mathbf{y}^{T} \mathbf{Y} \mathbf{Y}^{T} \mathbf{y}-N N_{p}$, we use $\mathbf{y}^{T} \mathbf{Y} \mathbf{Y}^{T} \mathbf{y}$ for the analysis.
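A quick numerical check of the additive-constant claim (the dimensions and random patterns below are illustrative): for any $\pm 1$ state $\mathbf{y}$ we have $\mathbf{y}^{T}\mathbf{I}\mathbf{y}=N$, so the two energies differ by exactly $N N_{p}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_p = 8, 3                                 # number of units, number of stored patterns
Y = rng.choice([-1, 1], size=(N, N_p))        # patterns as columns of Y
y = rng.choice([-1, 1], size=N)               # an arbitrary +1/-1 state

e_full  = y @ (Y @ Y.T - N_p * np.eye(N)) @ y   # energy with W = Y Y^T - N_p I
e_plain = y @ (Y @ Y.T) @ y                     # energy with W = Y Y^T
assert np.isclose(e_plain - e_full, N * N_p)    # they differ by the constant N * N_p
```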

Self-Supervised Learning: a survey

Self-Supervised Representation Learning. Broadly speaking, all generative models can be considered self-supervised, but with different goals: generative models focus on creating diverse and realistic images, while self-supervised representation learning cares about producing good features that are generally helpful for many tasks. Image-based tasks include distortion, as in Exemplar-CNN (Dosovitskiy et al., 2015), and rotation of an entire image (Gidaris et al., 2018)…
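As a concrete illustration of the rotation pretext task, here is a minimal sketch (it assumes images are H×W×C arrays; the function name and label encoding are my own):

```python
import numpy as np

def rotation_pretext_batch(images, rng=np.random.default_rng(0)):
    """Rotate each image by 0/90/180/270 degrees and use the rotation index
    (0-3) as a free classification label for the pretext task."""
    ks = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, ks)])
    return rotated, ks
```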

Hopfield network

Hopfield Net. So far, the neural networks we have used for computation have all been feedforward structures. A Hopfield net is a loopy network: each neuron is a perceptron with a +1/-1 output, and every neuron receives input from, and sends its output to, every other neuron. At each time, each neuron receives a "field" $\sum_{j \neq i} w_{j i} y_{j}+b_{i}$. If the sign of the field matches the neuron's own sign, it does not respond; if the sign of the field opposes its own sign, it "flips" to match the field. A flip changes the field at the other nodes, which may then flip in turn.
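A minimal sketch of this flip rule in code (asynchronous updates in random order; the names and the convergence check are illustrative):

```python
import numpy as np

def hopfield_evolve(y, W, b, max_sweeps=100, rng=np.random.default_rng(0)):
    """Run asynchronous Hopfield updates: a unit flips only when the sign of
    its field disagrees with its output; stop once a full sweep changes nothing."""
    y = y.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(y)):
            field = W[i] @ y - W[i, i] * y[i] + b[i]   # sum over j != i of w_ji * y_j + b_i
            s = 1 if field >= 0 else -1
            if s != y[i]:
                y[i] = s            # flip to match the field; other fields change too
                changed = True
        if not changed:             # stable state: a local minimum of the energy
            break
    return y
```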

Loss functions in neural networks

Kullback-Leibler divergence. Information theory quantifies information according to three intuitions: likely events should have low information content, less likely events should have higher information content, and independent events should have additive information. For example, finding out that a tossed coin has come up as heads twice should convey twice as much information as finding out that a tossed coin has come…
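The coin example and the divergence itself are easy to check numerically (a small illustration; the distributions P and Q below are made up):

```python
import numpy as np

# Self-information I(x) = -log2 P(x): rarer events carry more information,
# and for independent events the information adds.
p_heads = 0.5
info_one = -np.log2(p_heads)            # 1 bit for a single head
info_two = -np.log2(p_heads ** 2)       # 2 bits for two independent heads
assert np.isclose(info_two, 2 * info_one)

# KL divergence D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))
P = np.array([0.7, 0.2, 0.1])
Q = np.array([0.5, 0.3, 0.2])
kl = float(np.sum(P * np.log(P / Q)))
```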

Representation

Logistic regression. This is the perceptron with a sigmoid activation; it actually computes the probability that the input belongs to class 1. Decision boundaries may be obtained by comparing the probability to a threshold; these boundaries will be lines (hyperplanes in higher dimensions), so the sigmoid perceptron is a linear classifier. Estimating the model: given training data $\left(X_{1}, y_{1}\right),\left(X_{2}, y_{2}\right), \ldots,\left(X_{N}, y_{N}\right)$, where the $X$ are vectors and the $y$ are binary (0/1) class values, the total probability of the data is
$$
P\left(\left(X_{1}, y_{1}\right),\left(X_{2}, y_{2}\right), \ldots,\left(X_{N}, y_{N}\right)\right)=\prod_{i} P\left(X_{i}, y_{i}\right)
$$
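In practice it is the conditional likelihood $P(y \mid X)$ that gets maximized; a minimal sketch (the function names and the use of NumPy are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, b, X, y):
    """Log-probability of the labels under the sigmoid perceptron:
    P(y=1 | x) = sigmoid(w.x + b); the product over examples becomes a sum of logs."""
    p = sigmoid(X @ w + b)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
```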

Model Training Tips (炼丹心法)

What is the difference between an autoencoder and an encoder-decoder? Parameters: in what order should a neural network's hyperparameters be tuned? Epoch, iteration, and batch size in neural networks. Neural network…

PyTorch Tips

Tensor: Torch's broadcasting mechanism; contiguous vs. non-contiguous tensors. Optimizer: if using CUDA, create the model and move it to the GPU before constructing the optimizers (see the official docs); effect of calling model.cuda() after constructing an optimizer. Counting the number of parameters…
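A minimal sketch of the recommended ordering and of the parameter count (the `nn.Linear` model is just a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)              # stand-in model

# Move the model to the GPU *before* constructing the optimizer, so the
# optimizer holds references to the CUDA parameters.
if torch.cuda.is_available():
    model = model.cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Count parameters.
n_params    = sum(p.numel() for p in model.parameters())
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```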

Seq2seq and attention model

Generating Language: Synthesis. Input: symbols as one-hot vectors, where the dimensionality of the vector is the size of the "vocabulary". These are projected down to lower-dimensional "embeddings". The hidden units are (one or more layers of) LSTM units. The output at each time is a probability distribution that ideally assigns peak probability to the next word in the sequence. Divergence:
$$
\operatorname{Div}(\mathbf{Y}_{\text {target}}(1 \ldots T), \mathbf{Y}(1 \ldots T))=\sum_{t}\operatorname{Xent}(\mathbf{Y}_{\text {target}}(t), \mathbf{Y}(t))=-\sum_{t} \log Y(t, w_{t+1})
$$
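A minimal sketch of this divergence (it assumes `Y` is a T×V array of predicted next-word distributions and `next_word_ids` holds the indices of the true next words $w_{t+1}$; the names are mine):

```python
import numpy as np

def sequence_divergence(Y, next_word_ids):
    """Div = sum_t Xent(Y_target(t), Y(t)) = -sum_t log Y(t, w_{t+1}):
    with one-hot targets, each cross-entropy term reduces to the negative
    log probability assigned to the true next word."""
    t = np.arange(len(next_word_ids))
    return float(-np.sum(np.log(Y[t, next_word_ids])))
```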