A collection of PyTorch-based computer vision projects spanning image classification, video action recognition, knowledge distillation, and model compression. Most repositories are re-implementations or extensions of published conference papers, developed for academic exploration and prototyping.
- Image Classification — ZCls → RotNet → ZCls2 → facenet → metric-learning
- Video Action Recognition — X3D → TSM → TSN → SlowFast → TRN → C3D → Non-local
- Knowledge Distillation & Model Compression — SSL → KnowledgeReview → overhaul → NetworkSlimming → ReviewKD → slimming
- Fine-Grained Visual Recognition — DCL
- Infrastructure & Utilities — ZTransforms
- ZCls — A modular image classification training framework supporting 20+ network architectures, including ResNet, MobileNet, Vision Transformer, ResNeXt, RepVGG, ShuffleNet, GhostNet, and MNasNet. Features LMDB-accelerated data loading, automatic mixed-precision (AMP) training, gradient clipping, CutMix and MixUp augmentation, and Distributed Data Parallel (DDP) training. It is the most comprehensive training infrastructure in this organization and serves as the backbone for many of the other projects.
- RotNet (archived) — Self-supervised pretraining by predicting image rotation angles as a pretext task. A simple yet effective method for learning visual representations without labels.
- ZCls2 (archived) — Experimental successor to ZCls with faster data pipelines and NVIDIA Apex AMP support. Focused on improving training throughput and efficiency.
- facenet — PyTorch re-implementation of FaceNet (CVPR 2015). Learns a unified embedding for face recognition and clustering with triplet loss, and supports multi-GPU and mixed-precision training.
- metric-learning — Deep metric learning training and evaluation pipeline, implementing contrastive loss, triplet loss, and related methods for embedding learning.
- X3D — Re-implementation of X3D: Expanding Architectures for Efficient Video Recognition (CVPR 2020). A family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple axes (temporal duration, frame rate, spatial resolution, width, depth) to reach favorable accuracy-to-complexity trade-offs.
- TSM — Re-implementation of Temporal Shift Module (ICCV 2019). A zero-parameter, zero-FLOPs temporal reasoning module that shifts part of the channels along the temporal dimension, letting 2D CNNs approach 3D CNN-level accuracy at 2D CNN cost.
- TSN — Re-implementation of Temporal Segment Networks (ECCV 2016). A framework for long-range temporal modeling using sparse frame sampling and segmental consensus, establishing a foundation for modern video recognition.
- SlowFast — Re-implementation of SlowFast Networks for Video Recognition (ICCV 2019). A dual-pathway architecture in which a slow pathway operating at a low frame rate captures spatial semantics and a lightweight fast pathway operating at a high frame rate captures motion dynamics.
- TRN — Re-implementation of Temporal Relational Reasoning in Videos (ECCV 2018). Learns temporal relations between frames at multiple time scales through a Temporal Relation Network module.
- C3D — Re-implementation of Learning Spatiotemporal Features with 3D Convolutional Networks (ICCV 2015). A classic 3D ConvNet architecture that learns spatiotemporal features directly from video frames.
- Non-local — Re-implementation of Non-local Neural Networks (CVPR 2018). Captures long-range dependencies in video by computing the response at each position as a weighted sum of the features at all positions in the feature map, a precursor to modern attention mechanisms.
- SSL — Re-implementation of Learning Structured Sparsity in Deep Neural Networks (NIPS 2016). Applies group Lasso regularization to learn structured sparsity at the filter, channel, and layer levels, enabling efficient network pruning.
- KnowledgeReview — Re-implementation of Distilling Knowledge via Knowledge Review (CVPR 2021). A distillation method whose "review" mechanism lets each student stage learn from multiple levels of teacher features rather than only the same-level one, improving feature-based knowledge transfer.
- overhaul — Re-implementation of A Comprehensive Overhaul of Feature Distillation (ICCV 2019). Improves feature-based knowledge distillation with a redesigned distillation loss that accounts for ReLU activation patterns and uses a margin ReLU.
- NetworkSlimming — Re-implementation of Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017). A channel pruning method that applies L1-norm regularization to scaling factors in Batch Normalization layers and removes channels with small scaling factors.
- ReviewKD — A variant implementation of knowledge review distillation.
- slimming — An alternative implementation of the Network Slimming paper (ICCV 2017), providing a simplified codebase for channel pruning.
- DCL — Re-implementation of Destruction and Construction Learning for Fine-grained Image Recognition (CVPR 2019). Shuffles local regions of an image to destroy its global structure, forcing the network to focus on fine-grained discriminative parts, then models the relationships among regions through a construction module.
- ZTransforms — A torchvision-like data augmentation library built on top of albumentations. Provides a familiar API for image transformations with the speed and variety of albumentations backends.
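
The temporal shift described in the TSM entry above can be illustrated with a small dependency-free sketch. It is not taken from the TSM repository: the function name `temporal_shift`, the nested-list layout `clip[t][c]` (spatial dimensions omitted), and the `fold_div=8` default are assumptions made for illustration, and a real implementation would operate on PyTorch tensors instead.

```python
def temporal_shift(clip, fold_div=8):
    """Shift a fraction of channels along the time axis.

    clip[t][c] holds one value per frame t and channel c (hypothetical layout).
    The first C // fold_div channels read from the next frame, the following
    C // fold_div channels read from the previous frame, and the remaining
    channels are left in place. Values shifted in from outside the clip are
    zero-filled. No weights and no multiplications are involved.
    """
    num_frames = len(clip)
    num_channels = len(clip[0])
    fold = num_channels // fold_div
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1  # take this channel from the next frame
            elif c < 2 * fold:
                src = t - 1  # take this channel from the previous frame
            else:
                src = t      # unshifted channel
            if 0 <= src < num_frames:
                out[t][c] = clip[src][c]
    return out


# Toy clip: 3 frames, 4 channels, with value 10 * t + c at frame t, channel c.
clip = [[10 * t + c for c in range(4)] for t in range(3)]
shifted = temporal_shift(clip, fold_div=4)  # one channel shifted each way
for frame in shifted:
    print(frame)
```

With `fold_div=4` on four channels, channel 0 is pulled from the following frame, channel 1 from the preceding frame, and channels 2 and 3 are unchanged, which is why the operation adds temporal mixing without adding parameters.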
All code was developed for academic exploration and prototyping. Use at your own risk.