New Initialization Mechanisms for Convolutional Neural Networks

Keyu Long, Tiffany Yu, Grace Lam, Daniel Shi, Licheng Hu
GitHub Repo Report

Introduction

Neural networks, mirroring the human brain's pattern-recognition abilities, have revolutionized machine learning. Their success spans fields from healthcare to finance, showcasing exceptional problem-solving and data-interpretation capabilities. As neural networks increasingly outperform traditional algorithms, it becomes crucial to unravel their complex learning mechanisms. Previous studies by Beaglehole et al. (2023) and Radhakrishnan et al. (2023) formulated the Convolutional Neural Feature Ansatz, demonstrating that the features selected by a convolutional network can be recovered by computing the average gradient outer product (AGOP) of the trained network with respect to image patches, and that this quantity is captured by the empirical covariance matrices of the filters at any given layer. These investigations identified the Average Gradient Outer Product (AGOP) and the Neural Feature Matrix (NFM) as key objects characterizing feature learning in neural networks.

Another critical aspect of deep learning that influences neural network performance and convergence is the method of initialization. Proper initialization is crucial: because neural networks are trained with backpropagation, improper initialization can lead to vanishing or exploding gradients, degrading the overall training process. Our study combines the concepts of NFM and AGOP with initialization methods. This exploration seeks to address the question: how does using the Neural Feature Matrix and Average Gradient Outer Product as initialization affect the performance of neural networks?

Feature Learning with NFM (Neural Feature Matrix) and AGOP (Average Gradient Outer Product)

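As a rough sketch of how these two objects can be computed for a convolutional layer in PyTorch: the NFM is the empirical covariance of a layer's filters, and the AGOP averages outer products of the network's gradients with respect to image patches. The function names below are ours, and the AGOP estimator is a simplified proxy (restricted to the input layer) for the one used in the cited papers.

```python
import torch
import torch.nn.functional as F

def neural_feature_matrix(conv_layer):
    """NFM sketch: empirical (uncentered) covariance of a conv layer's filters.

    Filters of shape (out_ch, in_ch, k, k) are flattened into the rows of a
    matrix W of shape (out_ch, in_ch * k * k); the NFM is W^T W.
    """
    W = conv_layer.weight.detach().reshape(conv_layer.out_channels, -1)
    return W.T @ W

def patch_agop(model, images, kernel_size=3, padding=1):
    """AGOP sketch at the input layer: average, over samples, patches, and
    output classes, of the outer product of each logit's gradient with
    respect to the input, unfolded into filter-sized patches.
    """
    images = images.clone().requires_grad_(True)
    logits = model(images)                                    # (B, num_classes)
    agop = 0.0
    for c in range(logits.shape[1]):
        (grad,) = torch.autograd.grad(logits[:, c].sum(), images, retain_graph=True)
        patches = F.unfold(grad, kernel_size=kernel_size, padding=padding)   # (B, C*k*k, L)
        patches = patches.permute(0, 2, 1).reshape(-1, patches.shape[1])     # (B*L, C*k*k)
        agop = agop + patches.T @ patches / patches.shape[0]
    return agop / logits.shape[1]
```

Per the Convolutional Neural Feature Ansatz, the NFM of a trained layer and the corresponding patch AGOP are expected to be closely aligned, which is what motivates using them for initialization.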

Dataset

To investigate the application of the Neural Feature Matrix (NFM) and Average Gradient Outer Product (AGOP) as initialization methods, we evaluate their performance across four datasets: SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet.
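A minimal data-loading sketch for these benchmarks, assuming torchvision and a locally downloaded Tiny ImageNet folder (paths, batch size, and the bare ToTensor transform are placeholders rather than the exact settings used in our experiments):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor()])  # add normalization/augmentation as needed

loaders = {
    "SVHN": DataLoader(
        datasets.SVHN("data/svhn", split="train", download=True, transform=transform),
        batch_size=128, shuffle=True),
    "CIFAR-10": DataLoader(
        datasets.CIFAR10("data/cifar10", train=True, download=True, transform=transform),
        batch_size=128, shuffle=True),
    "CIFAR-100": DataLoader(
        datasets.CIFAR100("data/cifar100", train=True, download=True, transform=transform),
        batch_size=128, shuffle=True),
    # Tiny ImageNet is not bundled with torchvision; it is read from a local folder.
    "Tiny ImageNet": DataLoader(
        datasets.ImageFolder("data/tiny-imagenet-200/train", transform=transform),
        batch_size=128, shuffle=True),
}
```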

Methods Overview

Training graphs

(Figure: training and validation accuracy curves, and the validation-minus-training loss gap, for each initialization method on CIFAR-100.)

In the training graph, both the training and validation accuracy of Kaiming_NFM outperform Kaiming_uniform, the default initialization method in PyTorch, on CIFAR-100. We also plotted the difference between validation loss and training loss for each initialization method across the training process. When the difference is negative, the validation loss is lower than the training loss; this can be a sign of underfitting, meaning the model is not yet capturing the underlying trends in the training data. When the difference is positive (above 0 on the y-axis), the validation loss is higher than the training loss, indicating overfitting. From the graph, our AGOP and NFM initialization methods were more robust to overfitting than the other initialization methods, and Kaiming_NFM also performed best among all initialization methods that use Kaiming scaling.
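A small sketch of the loss-gap diagnostic described above, assuming a hypothetical history dictionary with per-epoch training and validation losses:

```python
import matplotlib.pyplot as plt

def plot_loss_gap(history, label):
    """Plot validation loss minus training loss per epoch.

    Values below zero suggest underfitting; values above zero suggest
    overfitting. `history` is assumed to hold per-epoch 'train_loss'
    and 'val_loss' lists recorded during training.
    """
    gap = [v - t for v, t in zip(history["val_loss"], history["train_loss"])]
    plt.plot(range(1, len(gap) + 1), gap, label=label)
    plt.axhline(0.0, color="gray", linestyle="--")
    plt.xlabel("epoch")
    plt.ylabel("validation loss - training loss")
    plt.legend()
```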

Results and Discussion

(Table: highest validation accuracy for each initialization method on SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet.)

NFM initialization outperformed all other initialization methods, achieving the highest validation accuracy across all the datasets. Kaiming NFM, in particular, achieved remarkable results on three of the four datasets, attaining the highest validation accuracies of 93.23% on SVHN, 48.33% on CIFAR-100, and 33.78% on Tiny ImageNet.

Our initialization methods can be viewed as a soft form of transfer learning: we took a model pre-trained on ImageNet and used it to initialize training on simpler datasets. It would be worth investigating the reverse setting, initializing from a model pre-trained on a simpler dataset and training on a more complex one, along with other potential areas of improvement.
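The report does not spell out the exact construction of the Kaiming_NFM initializer, so the following is only a hypothetical sketch of the idea: draw Kaiming-scaled filters and color them with the square root of the NFM taken from the corresponding layer of an ImageNet-pre-trained VGG11 (all function names are ours; assumes a recent torchvision):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg11

def nfm_sqrt(conv_layer, eps=1e-6):
    """Symmetric square root of the NFM (filter covariance) of a conv layer."""
    W = conv_layer.weight.detach().reshape(conv_layer.out_channels, -1)
    evals, evecs = torch.linalg.eigh(W.T @ W)
    return evecs @ torch.diag(evals.clamp_min(eps).sqrt()) @ evecs.T

def kaiming_nfm_init(target_conv, source_conv):
    """Hypothetical 'Kaiming NFM' sketch: Kaiming-initialize the filters,
    then color them with the NFM square root of a pre-trained layer."""
    nn.init.kaiming_uniform_(target_conv.weight, nonlinearity="relu")
    with torch.no_grad():
        W = target_conv.weight.reshape(target_conv.out_channels, -1)
        target_conv.weight.copy_((W @ nfm_sqrt(source_conv)).reshape_as(target_conv.weight))

# Usage: initialize a fresh VGG11 for CIFAR-100 from an ImageNet-pre-trained VGG11.
pretrained = vgg11(weights="IMAGENET1K_V1").features
model = vgg11(num_classes=100)
for tgt, src in zip(model.features, pretrained):
    if isinstance(tgt, nn.Conv2d):
        kaiming_nfm_init(tgt, src)
```

An analogous AGOP-based variant would use the patch AGOP of the pre-trained model in place of the filter covariance.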

Conclusion

In conclusion, our study examined the impact of advanced initialization methods on the performance of neural networks, with a specific focus on the Neural Feature Matrix (NFM) and Average Gradient Outer Product (AGOP). The investigation centered on applying these methods to the VGG11 model, trained on a range of datasets from SVHN to Tiny ImageNet.

Our findings suggest that integrating NFM and AGOP with traditional initialization methods can lead to substantial improvements in validation accuracy. Notably, the Kaiming NFM initialization outperformed standard practice on several datasets, marking a significant step forward in neural network training strategies. This advancement offers potential enhancements in the many applications where deep learning models are pivotal.

Acknowledgement and References

[1] Daniel Beaglehole, Adityanarayanan Radhakrishnan, Parthe Pandit, and Mikhail Belkin. Mechanism of feature learning in convolutional neural networks, 2023.

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, 2015.

[3] Adityanarayanan Radhakrishnan, Daniel Beaglehole, Parthe Pandit, and Mikhail Belkin. Mechanism for feature learning in neural networks and backpropagation-free machine learning models. Science, 0(0).