Deep learning — Initialize a thin network from a wider pre-trained network model

Ryan
3 min read · Jan 23, 2021

My blog: https://compassinbabel.org/post/1e251080-889b-4fc8-a2fa-2458ea6fdce7

Models such as VGG-16 have a huge number of parameters, so it is not practical to train them on a home computer. At the same time, it is well known that starting from a pre-trained model can improve performance. A natural idea is therefore to reduce a large pre-trained model to a reasonable size so that it can be trained on an ordinary computer.

In Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification, the authors propose a method for building a thin network from a pre-trained network model.

First, we need a unified way to represent the layers of a neural network. Let x_l and y_l denote the input and output of layer l. Then both a fully connected layer and a convolutional layer can be represented by

y_l = W_l x_l

where W_l is the parameter matrix of layer l.
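For a fully connected layer this is the usual matrix-vector product; a convolution can be put in the same form by unfolding the input into patches (the im2col trick). Below is a minimal numpy sketch of both cases; the shapes (a 7×5 weight matrix, a 6×6 single-channel image with 3×3 kernels) are illustrative choices of mine, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fully connected layer: y_l = W_l x_l is a plain matrix-vector product.
W_fc = rng.standard_normal((7, 5))   # 7 output units, 5 input units
x_fc = rng.standard_normal(5)
y_fc = W_fc @ x_fc                   # shape (7,)

# Convolutional layer in the same form: each row of W_conv is a flattened
# kernel, each column of `patches` is a flattened input patch (im2col).
kernels = rng.standard_normal((7, 3, 3))   # 7 output channels, 3x3 kernels, 1 input channel
image = rng.standard_normal((6, 6))        # single-channel 6x6 input

# Unfold the image into all 3x3 patches (valid positions only) -> (9, 16)
patches = np.stack([image[i:i + 3, j:j + 3].ravel()
                    for i in range(4) for j in range(4)], axis=1)
W_conv = kernels.reshape(7, -1)            # (7, 9): one row per output channel
Y = W_conv @ patches                       # (7, 16): seven 4x4 output maps
print(y_fc.shape, Y.reshape(7, 4, 4).shape)
```

Once every layer is written as a single parameter matrix, shrinking a layer reduces to choosing a subset of that matrix.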

Our objective is to reduce the size of the network, which can be done by selecting a subnetwork of it. Since a neural network is characterized by its weights, the question becomes how to select a subset of the weights. Given the layered structure of neural networks, it is not surprising that the authors propose a layer-by-layer approach.

The main idea is that we start from the first layer and select a subset of rows in the parameter matrix. Note that a selection of rows in the weight matrix corresponds to a selection of elements in the layer output.
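This correspondence between rows and output elements is easy to verify numerically. Here is a small numpy sketch with an arbitrary 7×5 weight matrix and a hypothetical row subset (the particular numbers are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((7, 5))   # layer l: 5 inputs, 7 outputs
x = rng.standard_normal(5)
y = W @ x

rows = [0, 2, 5]                  # hypothetical subset of selected rows
W_thin = W[rows, :]               # the thin layer keeps only those rows
y_thin = W_thin @ x

# Keeping rows of W is the same as keeping the matching output elements.
assert np.allclose(y_thin, y[rows])
```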

For example, the figure below shows the weight matrix and the input of layer l. As we can see, the input dimension is 5 and the output dimension is 7. To reduce the size of this layer, we can select a subset of rows in the weight matrix; the selected parts are highlighted in blue in the figure below.

In summary, row selection is used to reduce the output size of a layer, while column selection in the next layer's weight matrix adjusts its input dimension to match, as the sketch below illustrates.
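Putting both steps together on two consecutive layers gives the following numpy sketch; the layer sizes and the kept rows are again just illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((7, 5))   # layer l:   5 -> 7
W2 = rng.standard_normal((4, 7))   # layer l+1: 7 -> 4
x = rng.standard_normal(5)

rows = [0, 2, 5]                   # rows kept in layer l (hypothetical choice)

# Row selection shrinks layer l's output from 7 units to 3 ...
W1_thin = W1[rows, :]              # (3, 5)
# ... so layer l+1 must drop the matching columns to accept a 3-dim input.
W2_thin = W2[:, rows]              # (4, 3)

out_thin = W2_thin @ (W1_thin @ x) # forward pass through the thin two-layer network
print(out_thin.shape)              # (4,)
```

Note that the thin network's output only approximates the original one; how good the approximation is depends on which rows we keep, which is exactly the selection problem discussed next.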

The remaining question is how to select the rows of the weight matrix. In the paper, a greedy method called Simultaneous Orthogonal Matching Pursuit (SOMP) is used; more details can be found in the paper Algorithms for Simultaneous Sparse Approximation. Part I: Greedy Pursuit.
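The sketch below implements a common textbook variant of greedy SOMP in numpy and applies it to the row-selection problem: the rows of W serve both as the dictionary atoms and as the signals to approximate, so the selected rows are those whose span best reconstructs the whole matrix. This is my reading of how the method is used here, not a verbatim reproduction of the paper's procedure.

```python
import numpy as np

def somp(D, Y, k):
    """Greedy Simultaneous Orthogonal Matching Pursuit (textbook variant).

    D : (m, n) dictionary, one atom per column
    Y : (m, s) signals to approximate jointly
    k : number of atoms to select
    Returns the indices of the selected atoms.
    """
    residual = Y.copy()
    selected = []
    for _ in range(k):
        # Correlation of every atom with the current residual, summed over signals.
        scores = np.abs(D.T @ residual).sum(axis=1)
        if selected:
            scores[selected] = -np.inf      # never pick the same atom twice
        selected.append(int(np.argmax(scores)))
        # Re-project Y onto the span of the selected atoms and update the residual.
        D_sel = D[:, selected]
        coeffs, *_ = np.linalg.lstsq(D_sel, Y, rcond=None)
        residual = Y - D_sel @ coeffs
    return selected

# Pick 3 rows of a 7x5 weight matrix whose span best reconstructs all rows.
rng = np.random.default_rng(0)
W = rng.standard_normal((7, 5))
rows = somp(W.T, W.T, k=3)   # atoms and signals are both the rows of W (as columns)
print(rows)
```

The selected rows (together with their least-squares coefficients) can then be used to initialize the thin layer before fine-tuning.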
