Knowledge Distillation with Pruning and Low-Rank Representations Paper
Project Overview
- This project focuses on compressing large-scale machine learning models while maintaining their accuracy. It explores the effectiveness of knowledge distillation, pruning, and low-rank representations in neural networks, specifically for Vision Transformers (ViTs) in image classification and BERT in natural language processing.
Methodology and Key Techniques
- Knowledge Distillation: Transferring knowledge from a larger "teacher" model to a smaller "student" model by training the student to match the teacher's softened output distribution alongside the ground-truth labels (see the loss sketch after this list).
- Pruning: Reducing neural network complexity by removing less important parameters, via structured pruning (whole blocks of weights), global unstructured pruning (the smallest-magnitude weights across the network), and layer pruning (entire transformer blocks); a sketch of each style follows the list.
- Low-Rank Representations: Approximating a full weight matrix with the product of two smaller matrices, reducing both parameter count and compute (see the factorization sketch below).
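A minimal sketch of the soft-target distillation loss described above, assuming a PyTorch setup; the temperature `T` and weighting `alpha` are illustrative defaults, not values reported in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend the soft-target KL loss (teacher) with hard-label cross-entropy."""
    # Soft targets: teacher and student probabilities at temperature T
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard targets: standard cross-entropy against ground-truth labels
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```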
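The three pruning styles can be sketched with `torch.nn.utils.prune` (an assumption about tooling; the paper's exact implementation may differ). The 30% amounts and the toy `model` below are placeholders.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for one transformer MLP block
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

# Structured pruning: remove whole rows (output channels) of a weight matrix by L2 norm
prune.ln_structured(model[0], name="weight", amount=0.3, n=2, dim=0)

# Global unstructured pruning: drop the smallest-magnitude weights across all layers
parameters_to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(
    parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.3
)

# Layer pruning (for a transformer): keep only a subset of encoder blocks,
# e.g. every other block, shown here on a hypothetical `encoder.layers` ModuleList:
# encoder.layers = nn.ModuleList(list(encoder.layers)[::2])
```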
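A hedged sketch of the low-rank idea via truncated SVD: one `nn.Linear` weight is replaced by two thinner layers. The `low_rank_linear` helper and the choice of `rank` are hypothetical, for illustration only.

```python
import torch
import torch.nn as nn

def low_rank_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate `layer` with two thinner Linear layers of the given rank."""
    W = layer.weight.data                    # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]             # (out_features, rank), singular values folded in
    V_r = Vh[:rank, :]                       # (rank, in_features)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)

# Parameter count drops from out*in to rank*(out+in) when rank is small.
```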
Key Achievements and Findings
- Vision Transformer (ViT): Achieved a 76.08x reduction in parameter count and increased the student model's accuracy to 75.78% by combining knowledge distillation with structured pruning.
- BERT Model: Achieved a 59.7% compression rate and 85.3% accuracy using unstructured pruning and knowledge distillation on the GLUE SST-2 dataset.
- Low-Rank Compression: Achieved a 9x compression rate on ResNet18 with minimal accuracy loss, recovering performance through knowledge distillation.
- The layered approach to model compression, applying knowledge distillation followed by pruning, could help optimize neural networks for resource-constrained environments; a rough sketch of this pipeline follows below.
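A rough sketch of that layered workflow: distill the student, prune it, then fine-tune with the same distillation objective to recover accuracy. The `compress` function, its hyperparameters, and the 50% pruning amount are placeholders rather than the paper's settings, and it reuses the `distillation_loss` sketch from earlier.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def compress(teacher, student, dataloader, epochs=3):
    """Distill, prune, then distill again to recover lost accuracy."""
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
    teacher.eval()

    def distill(n_epochs):
        for _ in range(n_epochs):
            for inputs, labels in dataloader:
                with torch.no_grad():
                    teacher_logits = teacher(inputs)
                # distillation_loss is the sketch defined earlier in this section
                loss = distillation_loss(student(inputs), teacher_logits, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    distill(epochs)                            # stage 1: knowledge distillation
    for module in student.modules():           # stage 2: prune the distilled student
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)
    distill(epochs)                            # stage 3: recover accuracy via distillation
    return student
```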