Knowledge Distillation with Pruning and Low-Rank Representations Paper

Project Overview

  • This project focuses on compressing large-scale machine learning models while maintaining their accuracy. It explores the effectiveness of knowledge distillation, pruning, and low-rank representations in neural networks, specifically for Vision Transformers (ViTs) in image classification and BERT in natural language processing.

Methodology and Key Techniques

  • Knowledge Distillation: Transferring knowledge from a larger "teacher" model to a smaller "student" model, enhancing the student's performance by training it against the teacher's soft outputs (see the loss sketch after this list).
  • Pruning: Reducing neural network complexity by removing less important parameters through structured pruning (blocks of weights), global pruning (the smallest weights across the network), and layer pruning (entire transformer blocks); the second sketch below illustrates the first two.
  • Low-Rank Representations: Approximating a full weight matrix with the product of two smaller matrices, reducing model size and computational cost (see the factorization sketch below).
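
A minimal PyTorch sketch of the soft-target distillation loss commonly used for this setup; the temperature T and weighting alpha are illustrative defaults, not necessarily the settings used in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the teacher's softened outputs with the ground-truth labels."""
    # Soft targets: temperature-scaled probability distributions from the teacher.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Typical training step (teacher frozen, student trainable):
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
```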
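
The global and structured variants can be sketched with PyTorch's built-in pruning utilities; the toy layers, pruning amounts, and row-wise granularity below are illustrative assumptions rather than the paper's exact configuration.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a transformer feed-forward block.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

# Global pruning: zero the smallest 40% of weights across all listed layers.
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.4)

# Structured pruning: drop 30% of output rows (whole neurons) from one layer.
prune.ln_structured(model[0], name="weight", amount=0.3, n=2, dim=0)

# Make the masks permanent so the zeros persist without the pruning hooks.
for module, name in to_prune:
    prune.remove(module, name)

# Layer pruning works at the architecture level instead, e.g. by keeping
# only a subset of a transformer's encoder blocks.
```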
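
A sketch of the low-rank idea applied to a single linear layer, using truncated SVD to split one weight matrix into two thinner factors; the 768x768 layer and rank of 64 are illustrative numbers, not values from the paper.

```python
import torch
import torch.nn as nn

def low_rank_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace an (out x in) weight with two factors: (out x rank) @ (rank x in)."""
    W = layer.weight.data                                  # shape: (out, in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                           # absorb singular values
    V_r = Vh[:rank, :]

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)

# Example: a 768x768 layer has 589,824 weights; rank 64 needs 2 * 64 * 768 = 98,304.
compressed = low_rank_linear(nn.Linear(768, 768), rank=64)
```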

Key Achievements and Findings

  • Vision Transformer (ViT): Achieved a 76.08x reduction in parameter count and increased the student model's accuracy to 75.78% by combining knowledge distillation with structured pruning.
  • BERT Model: Achieved a 59.7% compression rate and 85.3% accuracy using unstructured pruning and knowledge distillation on the GLUE SST-2 dataset.
  • Low-Rank Compression: Achieved a 9x compression rate on ResNet18 with minimal accuracy loss, recovering performance through knowledge distillation.
  • The layered approach to model compression, applying knowledge distillation followed by pruning, is a promising way to fit neural networks into resource-constrained environments while preserving accuracy.