ZJCV

A collection of PyTorch-based computer vision projects spanning image classification, video action recognition, knowledge distillation, and model compression. Most repositories are re-implementations or extensions of published conference papers, developed for academic exploration and prototyping.

Overview

Image Classification — ZCls, RotNet, ZCls2, facenet, metric-learning
Video Action Recognition — X3D, TSM, TSN, SlowFast, TRN, C3D, Non-local
Knowledge Distillation & Model Compression — SSL, KnowledgeReview, overhaul, NetworkSlimming, ReviewKD, slimming
Fine-Grained Visual Recognition — DCL
Infrastructure & Utilities — ZTransforms

Image Classification & Representation Learning

ZCls — A modular image classification training framework supporting 20+ network architectures including ResNet, MobileNet, Vision Transformer, ResNeXt, RepVGG, ShuffleNet, GhostNet, MNasNet, and more. Features LMDB-accelerated data loading, mixed-precision training (AMP), gradient clipping, CutMix-MixUp augmentation, and Distributed Data Parallel (DDP). The most comprehensive training infrastructure in this organization, serving as the backbone for many other projects.
RotNet (archived) — Self-supervised pretraining by predicting image rotation angles as a pretext task. A simple yet effective method for learning visual representations without labels.
ZCls2 (archived) — Experimental successor to ZCls with faster data pipelines and NVIDIA Apex AMP support. Focused on improving training throughput and efficiency.
facenet — PyTorch re-implementation of FaceNet (CVPR 2015). A unified embedding for face recognition and clustering using triplet loss, with multi-GPU and mixed-precision training support.
metric-learning — Deep metric learning training and evaluation pipeline, implementing contrastive loss, triplet loss, and related methods for embedding learning.
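A minimal, framework-free sketch of the two losses named above, assuming embeddings are plain Python lists. The triplet loss follows the FaceNet formulation: squared L2 distances between L2-normalized embeddings, with the margin set to 0.2 as in the paper. The contrastive loss uses one common form without the 1/2 factor. All function names are illustrative and are not the API of the facenet or metric-learning repositories.

```python
import math
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit hypersphere (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """FaceNet-style triplet loss: max(||a - p||^2 - ||a - n||^2 + margin, 0)."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_l2(a, p) - squared_l2(a, n) + margin, 0.0)


def contrastive_loss(x1, x2, same_identity: bool, margin: float = 1.0) -> float:
    """Contrastive loss: pull matching pairs together, push other pairs beyond `margin`."""
    d = math.sqrt(squared_l2(x1, x2))
    return d ** 2 if same_identity else max(margin - d, 0.0) ** 2


# Easy triplet: the negative is already far from the anchor, so the loss is zero.
print(triplet_loss([1, 0], [1, 0], [0, 1]))  # 0.0
# Hard triplet: the negative is closer to the anchor than the positive.
print(triplet_loss([1, 0], [0, 1], [1, 0]))  # 2.2
```

In practice both losses are averaged over a mini-batch, and the choice of which triplets or pairs to use (for example, semi-hard negative mining in FaceNet) matters as much as the loss itself.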

Video Action Recognition

X3D — Re-implementation of X3D: Expanding Architectures for Efficient Video Recognition (CVPR 2020). A family of 3D ConvNets that expand a base 2D architecture along multiple dimensions (frame rate, duration, resolution, width, depth) to achieve optimal speed-accuracy trade-offs.
TSM — Re-implementation of Temporal Shift Module (ICCV 2019). A zero-parameter, zero-FLOPs temporal reasoning module that shifts part of the channels along the temporal dimension, enabling 2D CNNs to achieve 3D CNN-level performance.
TSN — Re-implementation of Temporal Segment Networks (ECCV 2016). A framework for long-range temporal modeling using sparse frame sampling and segmental consensus, establishing a foundation for modern video recognition.
SlowFast — Re-implementation of SlowFast Networks for Video Recognition (CVPR 2019). A dual-pathway architecture with a slow pathway operating at low frame rate and a fast pathway at high frame rate, capturing both spatial semantics and motion dynamics.
TRN — Re-implementation of Temporal Relational Reasoning in Videos (ECCV 2018). Learns temporal relations between frames at multiple time scales through a temporal relational network module.
C3D — Re-implementation of Learning Spatiotemporal Features with 3D Convolutional Networks (ICCV 2015). A classic 3D ConvNet architecture that learns spatiotemporal features directly from video frames.
Non-local — Re-implementation of Non-local Neural Networks (CVPR 2018). Captures long-range dependencies in video by computing the response at each position as a weighted sum of the features at all positions in space and time, an operation closely related to self-attention.
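A minimal sketch of two ideas from the entries above: TSN-style sparse sampling (one frame per equal-length segment, followed by average consensus over segment scores) and the TSM channel shift. It is written in plain Python rather than PyTorch so the indexing is easy to follow. Each "frame" is a list of per-channel scalars standing in for feature maps; `fold_div=8` (shifting 1/8 of the channels in each direction) mirrors the default of the original TSM code but is an assumption here, as are all function names.

```python
import random
from typing import List, Optional, Sequence


def tsn_segment_indices(num_frames: int, num_segments: int,
                        rng: Optional[random.Random] = None) -> List[int]:
    """TSN-style sparse sampling: one frame index per equal-length segment.

    With `rng` (training), pick a random frame inside each segment;
    without it (testing), take the center frame of each segment.
    """
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start, end = int(seg_len * k), int(seg_len * (k + 1))
        if rng is None:
            indices.append(int(seg_len * k + seg_len / 2))
        else:
            indices.append(rng.randrange(start, max(start + 1, end)))
    return indices


def average_consensus(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Segmental consensus: average the class scores predicted for each segment."""
    n = len(segment_scores)
    return [sum(col) / n for col in zip(*segment_scores)]


def temporal_shift(x: Sequence[Sequence[float]], fold_div: int = 8) -> List[List[float]]:
    """TSM shift on a [T][C] feature array (each entry stands in for an H x W map).

    The first C // fold_div channels read from the next frame, the next
    C // fold_div read from the previous frame, and the rest are unchanged.
    Values shifted in from outside the clip are zero-padded; no weights are used.
    """
    T, C = len(x), len(x[0])
    fold = C // fold_div
    out = []
    for t in range(T):
        frame = []
        for c in range(C):
            if c < fold:  # read this channel from frame t + 1
                frame.append(x[t + 1][c] if t + 1 < T else 0.0)
            elif c < 2 * fold:  # read this channel from frame t - 1
                frame.append(x[t - 1][c] if t >= 1 else 0.0)
            else:  # untouched channels keep the current frame's value
                frame.append(x[t][c])
        out.append(frame)
    return out


print(tsn_segment_indices(90, 3))  # [15, 45, 75]
clip = [[10.0 * t + c for c in range(8)] for t in range(3)]
print(temporal_shift(clip)[1])  # [20.0, 1.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
```

Because the shift only moves existing values between neighboring frames, a 2D CNN applied to each frame afterwards sees a mix of past, present, and future information at no extra parameter cost.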

Knowledge Distillation & Model Compression

SSL — Re-implementation of Learning Structured Sparsity in Deep Neural Networks (NIPS 2016). Applies group Lasso regularization to learn structured sparsity at the filter, channel, and layer levels, enabling efficient network pruning.
KnowledgeReview — Re-implementation of Distilling Knowledge via Knowledge Review (CVPR 2021). A feature-based distillation method built on a "review" mechanism: cross-stage connection paths let multiple levels of the teacher's features guide the learning of each level of the student, improving feature-based knowledge transfer.
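overhaul — Re-implementation of A Comprehensive Overhaul of Feature Distillation (ICCV 2019). Redesigns feature-based knowledge distillation by distilling pre-ReLU features, transforming the teacher's features with a margin ReLU, and using a partial L2 distance that ignores positions where the student does not need to match the teacher.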
NetworkSlimming — Re-implementation of Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017). A channel pruning method that applies L1-norm regularization to scaling factors in Batch Normalization layers and removes channels with small scaling factors.
ReviewKD — A variant implementation of knowledge review distillation.
slimming — An alternative implementation of the Network Slimming paper (ICCV 2017), providing a simplified codebase for channel pruning.
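A minimal sketch of the Network Slimming recipe described above, in plain Python: during sparsity training, the subgradient λ·sign(γ) of an L1 penalty is added to the gradients of the Batch Normalization scaling factors γ; afterwards, one global threshold is taken from the sorted |γ| values so that roughly `prune_ratio` of all channels fall at or below it. The function names and example values are hypothetical. A real pipeline would also rebuild the convolution layers without the masked channels and fine-tune the result.

```python
from typing import List, Sequence


def bn_l1_subgradient(gammas: Sequence[float], lam: float) -> List[float]:
    """Subgradient of lam * sum(|gamma|); added to the BN scale gradients each step."""
    return [lam * ((g > 0) - (g < 0)) for g in gammas]


def global_threshold(layer_gammas: Sequence[Sequence[float]], prune_ratio: float) -> float:
    """Pick one threshold over all BN scaling factors in the network.

    Sorting |gamma| network-wide (not per layer) and indexing at prune_ratio
    means roughly that fraction of all channels falls at or below the threshold.
    """
    values = sorted(abs(g) for layer in layer_gammas for g in layer)
    idx = min(int(len(values) * prune_ratio), len(values) - 1)
    return values[idx]


def channel_masks(layer_gammas: Sequence[Sequence[float]], threshold: float) -> List[List[bool]]:
    """Keep a channel only if its |gamma| is strictly above the threshold."""
    return [[abs(g) > threshold for g in layer] for layer in layer_gammas]


# Hypothetical scaling factors of two BN layers after sparsity training.
gammas = [[0.9, 0.01, 0.5], [0.02, -0.7, 0.03, 0.8]]
thr = global_threshold(gammas, prune_ratio=0.4)
print(thr)  # 0.03
print(channel_masks(gammas, thr))  # [[True, False, True], [False, True, False, True]]
```

Using a single global threshold lets the method prune some layers much more aggressively than others; this sketch does not guard against a layer losing all of its channels, which a practical implementation would need to handle.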

Fine-Grained Visual Recognition

DCL — Re-implementation of Destruction and Construction Learning for Fine-grained Image Recognition (CVPR 2019). The destruction step partially shuffles local image regions (a region confusion mechanism) so the network must rely on discriminative local details; the construction step learns to predict each region's original location, modeling the semantic correlation between regions.
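A minimal sketch of the destruction step, modeled on the region confusion mechanism described in the DCL paper: the image is split into an N×N grid, then each row and each column is shuffled by sorting jittered indices i + r with r drawn uniformly from [−k, k), so every patch stays fewer than 2k cells from its original position along each axis. Grid cells are placeholders for image patches, the function names are hypothetical, and the adversarial and construction losses are omitted.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def local_permutation(n: int, k: float, rng: random.Random) -> List[int]:
    """Return perm where perm[p] is the original index now at position p.

    Sorting keys i + r, with r ~ U(-k, k), only reorders nearby indices,
    so |p - perm[p]| < 2k for every position p.
    """
    keys = [i + rng.uniform(-k, k) for i in range(n)]
    return sorted(range(n), key=lambda i: keys[i])


def region_confusion(grid: Sequence[Sequence[T]], k: float, rng: random.Random) -> List[List[T]]:
    """Locally shuffle an N x N grid of patches: first within rows, then within columns."""
    n = len(grid)
    # Row step: each row gets its own local permutation of columns.
    rows = []
    for j in range(n):
        perm = local_permutation(n, k, rng)
        rows.append([grid[j][perm[i]] for i in range(n)])
    # Column step: each column gets its own local permutation of rows.
    out = [[rows[0][0]] * n for _ in range(n)]
    for i in range(n):
        perm = local_permutation(n, k, rng)
        for j in range(n):
            out[j][i] = rows[perm[j]][i]
    return out


# Label each cell with its original (row, col) so the displacement is visible.
grid = [[(r, c) for c in range(7)] for r in range(7)]
shuffled = region_confusion(grid, k=2, rng=random.Random(0))
for row in shuffled:
    print(row)
```

Keeping the shuffle local is the point of the design: the global layout is disturbed enough that the classifier cannot rely on it, while each patch stays close enough to its origin that the construction module can still learn to recover the original arrangement.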

Infrastructure & Utilities

ZTransforms — A torchvision-like data augmentation library built on top of albumentations. Provides a familiar torchvision-style API for image transformations while using albumentations as the backend for its speed and wide range of augmentations.

All code was developed for academic exploration and prototyping. Use at your own risk.
