Knowledge Distillation with Pruning and Low-Rank Representations Paper
Project Overview
- This project focuses on compressing large-scale machine learning models while maintaining their accuracy. It explores the effectiveness of knowledge distillation, pruning, and low-rank representations in neural networks, specifically for Vision Transformers (ViTs) in image classification and BERT in natural language processing.
Methodology and Key Techniques
- Knowledge Distillation: Transferring knowledge from a larger "teacher" model to a smaller "student" model by training the student to match the teacher's softened output distribution alongside the ground-truth labels (see the loss sketch after this list).
- Pruning: Reducing neural network complexity by removing less important parameters, via structured pruning (whole blocks of weights), global unstructured pruning (the smallest-magnitude weights across the network), and layer pruning (entire transformer blocks); a sketch of each style follows the list.
- Low-Rank Representations: Approximating a full weight matrix with the product of two smaller matrices, reducing both parameter count and compute (see the factorization sketch below).
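A minimal sketch of the soft-target distillation loss described above, assuming a PyTorch setup; the temperature `T` and weighting `alpha` are illustrative defaults, not values reported in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend the soft-target KL loss (teacher) with hard-label cross-entropy."""
    # Soft targets: teacher and student probabilities at temperature T
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard targets: standard cross-entropy against ground-truth labels
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```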
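The three pruning styles can be sketched with `torch.nn.utils.prune` (an assumption about tooling; the paper's exact implementation may differ). The 30% amounts and the toy `model` below are placeholders.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for one transformer MLP block
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

# Structured pruning: remove whole rows (output channels) of a weight matrix by L2 norm
prune.ln_structured(model[0], name="weight", amount=0.3, n=2, dim=0)

# Global unstructured pruning: drop the smallest-magnitude weights across all layers
parameters_to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(
    parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.3
)

# Layer pruning (for a transformer): keep only a subset of encoder blocks,
# e.g. every other block, shown here on a hypothetical `encoder.layers` ModuleList:
# encoder.layers = nn.ModuleList(list(encoder.layers)[::2])
```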
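A hedged sketch of the low-rank idea via truncated SVD: one `nn.Linear` weight is replaced by two thinner layers. The `low_rank_linear` helper and the choice of `rank` are hypothetical, for illustration only.

```python
import torch
import torch.nn as nn

def low_rank_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate `layer` with two thinner Linear layers of the given rank."""
    W = layer.weight.data                    # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]             # (out_features, rank), singular values folded in
    V_r = Vh[:rank, :]                       # (rank, in_features)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)

# Parameter count drops from out*in to rank*(out+in) when rank is small.
```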
Key Achievements and Findings
- Vision Transformer (ViT): Achieved a 76.08x reduction in parameter count and increased the student model's accuracy to 75.78% by combining knowledge distillation with structured pruning.
- BERT Model: Achieved a 59.7% compression rate and 85.3% accuracy using unstructured pruning and knowledge distillation on the GLUE SST-2 dataset.
- Low-Rank Compression: Achieved a 9x compression rate on ResNet18 with minimal accuracy loss, recovering performance through knowledge distillation.
- The layered approach to model compression, applying knowledge distillation followed by pruning, could help optimize neural networks for resource-constrained environments; a rough sketch of this pipeline follows below.
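A rough sketch of that layered workflow: distill the student, prune it, then fine-tune with the same distillation objective to recover accuracy. The `compress` function, its hyperparameters, and the 50% pruning amount are placeholders rather than the paper's settings, and it reuses the `distillation_loss` sketch from earlier.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def compress(teacher, student, dataloader, epochs=3):
    """Distill, prune, then distill again to recover lost accuracy."""
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
    teacher.eval()

    def distill(n_epochs):
        for _ in range(n_epochs):
            for inputs, labels in dataloader:
                with torch.no_grad():
                    teacher_logits = teacher(inputs)
                # distillation_loss is the sketch defined earlier in this section
                loss = distillation_loss(student(inputs), teacher_logits, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    distill(epochs)                            # stage 1: knowledge distillation
    for module in student.modules():           # stage 2: prune the distilled student
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)
    distill(epochs)                            # stage 3: recover accuracy via distillation
    return student
```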