This organization was marked as archived by an administrator on Jun 6, 2026. It is no longer maintained.

ZJCV

computer vision & deep learning

A collection of PyTorch-based computer vision projects spanning image classification, video action recognition, knowledge distillation, and model compression. Most repositories are re-implementations or extensions of published conference papers, developed for academic exploration and prototyping.


Overview

  • Image Classification & Representation Learning — ZCls, RotNet, ZCls2, facenet, metric-learning
  • Video Action Recognition — X3D, TSM, TSN, SlowFast, TRN, C3D, Non-local
  • Knowledge Distillation & Model Compression — SSL, KnowledgeReview, overhaul, NetworkSlimming, ReviewKD, slimming
  • Fine-Grained Visual Recognition — DCL
  • Infrastructure & Utilities — ZTransforms

Image Classification & Representation Learning

  • ZCls — A modular image classification training framework supporting 20+ network architectures, including ResNet, ResNeXt, MobileNet, ShuffleNet, GhostNet, MnasNet, RepVGG, and Vision Transformer. Features LMDB-accelerated data loading, mixed-precision training (AMP), gradient clipping, CutMix and MixUp augmentation, and Distributed Data Parallel (DDP). It is the most comprehensive training framework in this organization and serves as the backbone for many of the other projects.

  • RotNet (archived) — Predicts the rotation angle of an image, which can be used to correct image orientation or as a self-supervised pretext task for learning visual representations without labels. A simple yet effective method.

  • ZCls2 (archived) — Experimental successor to ZCls with faster data pipelines and NVIDIA Apex AMP support. Focused on improving training throughput and efficiency.

  • facenet — PyTorch re-implementation of FaceNet (CVPR 2015). A unified embedding for face recognition and clustering using triplet loss, with multi-GPU and mixed-precision training support.

  • metric-learning — Deep metric learning training and evaluation pipeline, implementing contrastive loss, triplet loss, and related methods for embedding learning.
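The triplet loss behind facenet and metric-learning fits in a few lines of PyTorch. This is a minimal sketch, assuming L2-normalized embeddings, squared Euclidean distances, and the margin of 0.2 used in the FaceNet paper. `triplet_loss` is a hypothetical name, not either repository's API, and triplet mining (for example, semi-hard negative selection) is left out.

```python
import torch
import torch.nn.functional as F


def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss (hypothetical helper, not the repo's API).

    anchor, positive, negative: (N, D) embedding batches; row i of each
    tensor forms one (anchor, positive, negative) triplet.
    """
    # FaceNet constrains embeddings to the unit hypersphere
    anchor, positive, negative = (F.normalize(x, dim=1) for x in (anchor, positive, negative))
    d_ap = (anchor - positive).pow(2).sum(dim=1)  # squared L2 distance, anchor to positive
    d_an = (anchor - negative).pow(2).sum(dim=1)  # squared L2 distance, anchor to negative
    # Hinge: penalize triplets where the negative is not at least `margin` farther away
    return F.relu(d_ap - d_an + margin).mean()


a = torch.tensor([[1.0, 0.0]])
print(triplet_loss(a, a, -a))  # easy triplet, loss is 0
print(triplet_loss(a, -a, a))  # violating triplet, loss is 4 + 0.2
```

PyTorch also ships `torch.nn.TripletMarginLoss`, which uses plain (non-squared) p-norm distances.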
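Rotation prediction, as in RotNet, can be posed as a classification problem over discrete angles. A minimal sketch of batch construction, assuming the common four-way (0, 90, 180, 270 degree) formulation and square inputs; `make_rotation_batch` is a hypothetical helper, and the repository may use a different set of angles.

```python
import torch


def make_rotation_batch(images):
    """Build inputs and labels for four-way rotation prediction (hypothetical helper).

    images: (N, C, H, W) tensor with H == W, so every rotation keeps the same shape.
    Label k means the image was rotated by k * 90 degrees.
    """
    rotated = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    # Labels follow the concatenation order: N zeros, then N ones, and so on
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return torch.cat(rotated, dim=0), labels


images = torch.randn(2, 3, 32, 32)
inputs, labels = make_rotation_batch(images)
print(inputs.shape, labels.tolist())
```

A classifier trained with cross-entropy on `(inputs, labels)` learns to recognize orientation.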
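ZCls lists MixUp among its augmentations. The standard MixUp formulation blends each sample with a randomly chosen partner, using a weight drawn from a Beta distribution, and blends the one-hot labels with the same weight. The sketch below is illustrative only: `mixup` and `alpha=0.2` are assumptions, and ZCls's own implementation and defaults may differ.

```python
import torch
import torch.nn.functional as F


def mixup(images, targets, num_classes, alpha=0.2):
    """Standard MixUp (hypothetical helper): returns mixed images and soft targets."""
    # Mixing weight lambda ~ Beta(alpha, alpha), shared by the whole batch
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(images.size(0))  # random partner for each sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    one_hot = F.one_hot(targets, num_classes).float()
    mixed_targets = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed_images, mixed_targets
```

The soft targets pair with a soft cross-entropy such as `-(mixed_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()`.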

Video Action Recognition

  • X3D — Re-implementation of X3D: Expanding Architectures for Efficient Video Recognition (CVPR 2020). A family of 3D ConvNets obtained by progressively expanding a small 2D image architecture along multiple axes (frame rate, duration, spatial resolution, width, depth) to reach favorable speed-accuracy trade-offs.

  • TSM — Re-implementation of Temporal Shift Module (ICCV 2019). A zero-parameter, zero-FLOPs temporal reasoning module that shifts part of the channels along the temporal dimension, enabling 2D CNNs to achieve 3D CNN-level performance.

  • TSN — Re-implementation of Temporal Segment Networks (ECCV 2016). A framework for long-range temporal modeling using sparse frame sampling and segmental consensus, establishing a foundation for modern video recognition.

  • SlowFast — Re-implementation of SlowFast Networks for Video Recognition (ICCV 2019). A dual-pathway architecture: a slow pathway operating at a low frame rate captures spatial semantics, while a fast pathway operating at a high frame rate captures motion dynamics.

  • TRN — Re-implementation of Temporal Relational Reasoning in Videos (ECCV 2018). Learns temporal relations between frames at multiple time scales through a temporal relational network module.

  • C3D — Re-implementation of Learning Spatiotemporal Features with 3D Convolutional Networks (ICCV 2015). A classic 3D ConvNet architecture that learns spatiotemporal features directly from video frames.

  • Non-local — Re-implementation of Non-local Neural Networks (CVPR 2018). Captures long-range dependencies in video by computing the response at each position as a weighted sum of the features at all positions in space and time, an operation closely related to self-attention.
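The TSM shift operation is only tensor indexing, which is why it adds no parameters or FLOPs. A minimal sketch of the bidirectional shift, assuming features laid out as `(N * T, C, H, W)` and shifting 1/8 of the channels in each direction (1/4 in total, the proportion the TSM paper settles on); `temporal_shift` is a hypothetical name, and in-place or residual-branch variants are not shown.

```python
import torch


def temporal_shift(x, n_segment, fold_div=8):
    """Shift part of the channels along time (hypothetical helper).

    x: (N * T, C, H, W) features from a 2D CNN, with T = n_segment frames per clip.
    1/fold_div of the channels moves one step backward in time, another
    1/fold_div one step forward; the rest stays unchanged. Gaps are zero-padded.
    """
    nt, c, h, w = x.shape
    x = x.reshape(nt // n_segment, n_segment, c, h, w)
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # frame t receives channels from t + 1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # frame t receives channels from t - 1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels are not shifted
    return out.view(nt, c, h, w)
```

Because the shift only mixes information between neighboring frames, a stack of 2D convolutions with shifts inserted between them gains a temporal receptive field.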
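TSN's sparse sampling and segmental consensus can be sketched as follows. This assumes one frame per segment, a random frame inside each segment during training and the middle frame at evaluation, and averaging as the consensus function (one of several the paper considers); both function names are hypothetical.

```python
import random

import torch


def sample_segment_indices(num_frames, num_segments, training=True):
    """Split the video into equal segments and pick one frame index from each."""
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = max((k + 1) * num_frames // num_segments, start + 1)  # guard very short videos
        if training:
            indices.append(random.randrange(start, end))  # random frame in [start, end)
        else:
            indices.append((start + end - 1) // 2)  # middle frame of the segment
    return indices


def segmental_consensus(segment_scores):
    """Average per-segment class scores: (N, num_segments, num_classes) -> (N, num_classes)."""
    return segment_scores.mean(dim=1)


print(sample_segment_indices(30, 3, training=False))
```

Sparse sampling keeps the cost per clip fixed at `num_segments` frames regardless of video length.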
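A non-local block computes, for every position, a softmax-weighted sum over all space-time positions and adds it back through a residual connection. The sketch below uses the embedded-Gaussian form with 1×1×1 convolutions. The class name and the zero initialization of the output projection are assumptions (the paper instead zero-initializes the scale of a BatchNorm layer placed after that projection); either way, the block starts as an identity mapping.

```python
import torch
import torch.nn as nn


class NonLocalBlock3d(nn.Module):
    """Embedded-Gaussian non-local block for (N, C, T, H, W) features (hypothetical class)."""

    def __init__(self, in_channels, inter_channels=None):
        super().__init__()
        inter = inter_channels or in_channels // 2
        self.theta = nn.Conv3d(in_channels, inter, kernel_size=1)
        self.phi = nn.Conv3d(in_channels, inter, kernel_size=1)
        self.g = nn.Conv3d(in_channels, inter, kernel_size=1)
        self.out = nn.Conv3d(inter, in_channels, kernel_size=1)
        # Zero-init the output projection so the block starts as an identity mapping
        nn.init.zeros_(self.out.weight)
        nn.init.zeros_(self.out.bias)

    def forward(self, x):
        n, _, t, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (N, THW, C')
        phi = self.phi(x).flatten(2)                       # (N, C', THW)
        g = self.g(x).flatten(2).transpose(1, 2)           # (N, THW, C')
        attn = torch.softmax(theta @ phi, dim=-1)          # pairwise weights over all positions
        y = (attn @ g).transpose(1, 2).reshape(n, -1, t, h, w)  # weighted sum per position
        return x + self.out(y)                             # residual connection
```

The `(THW, THW)` attention matrix grows quadratically with clip size, which is why the paper inserts these blocks only at a few lower-resolution stages.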
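On the input side, SlowFast feeds the same clip to both pathways at different temporal rates. A minimal sketch, assuming `(C, T, H, W)` clips and a speed ratio `alpha=8` between the pathways; `pack_pathway` is a hypothetical helper, and the two-pathway network with its lateral connections is not shown.

```python
import torch


def pack_pathway(frames, alpha=8):
    """Split one clip into [slow, fast] inputs (hypothetical helper).

    frames: (C, T, H, W). The fast pathway sees all T frames; the slow pathway
    sees T // alpha frames spread evenly over the same time span.
    """
    fast = frames
    slow_idx = torch.linspace(0, frames.shape[1] - 1, frames.shape[1] // alpha).long()
    slow = torch.index_select(frames, 1, slow_idx)
    return [slow, fast]


slow, fast = pack_pathway(torch.randn(3, 32, 8, 8))
print(slow.shape, fast.shape)
```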

Knowledge Distillation & Model Compression

  • SSL — Re-implementation of Learning Structured Sparsity in Deep Neural Networks (NIPS 2016). Applies group Lasso regularization to learn structured sparsity at the filter, channel, and layer levels, enabling efficient network pruning.

  • KnowledgeReview — Re-implementation of Distilling Knowledge via Knowledge Review (CVPR 2021). A cross-stage distillation method in which the teacher's shallower-stage features also supervise the student's deeper stages (the "review" of earlier knowledge), implemented by progressively fusing multi-stage student features and matching them against the teacher's features at each level, which improves feature-based knowledge transfer.

  • overhaul — Re-implementation of A Comprehensive Overhaul of Feature Distillation (ICCV 2019). Improves feature-based knowledge distillation by distilling pre-ReLU features, transforming the teacher's features with a margin ReLU, and using a partial L2 distance that ignores negative responses the student does not need to match.

  • NetworkSlimming — Re-implementation of Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017). A channel pruning method that applies L1-norm regularization to scaling factors in Batch Normalization layers and removes channels with small scaling factors.

  • ReviewKD — A variant implementation of knowledge review distillation.

  • slimming — An alternative implementation of the Network Slimming paper (ICCV 2017), providing a simplified codebase for channel pruning.
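Network Slimming has two steps: train with an L1 penalty on the BatchNorm scaling factors (gamma), then prune channels whose |gamma| falls below a global threshold. A minimal sketch of both, assuming `BatchNorm2d` layers and `lam=1e-4`; the helpers are hypothetical, and only keep-masks are computed here, not the rebuilt, narrower network.

```python
import torch
import torch.nn as nn


def bn_l1_penalty(model, lam=1e-4):
    """Sparsity term for the training loss: lam * sum of |gamma| over all BatchNorm2d layers."""
    return lam * sum(m.weight.abs().sum() for m in model.modules()
                     if isinstance(m, nn.BatchNorm2d))


def slimming_masks(model, prune_ratio=0.5):
    """Global threshold on |gamma|: one boolean keep-mask per BatchNorm2d layer.

    Roughly the `prune_ratio` fraction of channels with the smallest scaling
    factors across the whole network is marked for removal (requires prune_ratio < 1).
    """
    bns = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    gammas = torch.cat([m.weight.detach().abs() for m in bns])
    threshold = gammas.sort().values[int(gammas.numel() * prune_ratio)]
    return [m.weight.detach().abs() >= threshold for m in bns]
```

During training the penalty is simply added to the task loss, e.g. `loss = criterion(model(x), y) + bn_l1_penalty(model)`.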
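SSL's group Lasso adds the L2 norm of each weight group to the loss, which pushes entire groups (not individual weights) to zero. A minimal sketch for filter-wise and channel-wise groups, assuming equal weighting of the two and `lam=1e-4`; `group_lasso_penalty` is hypothetical, and the paper's shape-wise and depth-wise groups are omitted.

```python
import torch
import torch.nn as nn


def group_lasso_penalty(model, lam=1e-4):
    """Group Lasso over Conv2d weights with filter-wise and channel-wise groups (hypothetical)."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            w = m.weight  # (out_channels, in_channels, kH, kW)
            # One group per output filter: L2 norm over (in_channels, kH, kW)
            penalty = penalty + w.flatten(1).norm(dim=1).sum()
            # One group per input channel: L2 norm over (out_channels, kH, kW)
            penalty = penalty + w.transpose(0, 1).flatten(1).norm(dim=1).sum()
    return lam * penalty
```

Filters or channels whose whole group reaches zero can then be removed without changing the network's output.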
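The overhaul loss can be sketched in two pieces: a per-channel margin derived from the teacher's BatchNorm parameters, and a partial L2 distance applied to pre-ReLU features after the margin-ReLU teacher transform. This sketch assumes each channel's teacher response is Gaussian with mean beta and standard deviation |gamma| (taken from the teacher's BatchNorm), so the margin is the expected value of its negative responses. The function names are hypothetical, and the student-side connector (a 1×1 convolution with BatchNorm) is omitted.

```python
import math

import torch
import torch.nn as nn


def margin_from_bn(bn):
    """Per-channel margin m = E[x | x < 0] for x ~ N(beta, gamma^2) (hypothetical helper)."""
    margins = []
    for gamma, beta in zip(bn.weight.detach().tolist(), bn.bias.detach().tolist()):
        s, mu = abs(gamma), beta
        cdf = 0.5 * (1.0 + math.erf(-mu / s / math.sqrt(2.0)))  # P(x < 0) = Phi(-mu / s)
        if cdf > 1e-3:
            pdf = math.exp(-0.5 * (mu / s) ** 2) / math.sqrt(2.0 * math.pi)
            margins.append(mu - s * pdf / cdf)  # mean of the truncated (negative) part
        else:
            margins.append(-3.0 * s)  # almost no negative mass: fall back to a fixed margin
    return torch.tensor(margins)


def overhaul_loss(student_feat, teacher_feat, margin):
    """Partial L2 between (N, C, H, W) pre-ReLU features after the margin-ReLU transform."""
    target = torch.max(teacher_feat, margin.view(1, -1, 1, 1))
    # Skip positions where the student is already at or below a non-positive target
    mask = ((student_feat > target) | (target > 0)).float()
    return ((student_feat - target).pow(2) * mask).sum()
```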
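KnowledgeReview compares student and teacher features at several spatial scales. A sketch in the spirit of the paper's hierarchical context loss, assuming pooled grids of 4×4, 2×2, and 1×1 with each coarser level weighted half as much as the previous one; `hierarchical_context_loss` is a hypothetical name, and the fusion module that produces the student features is not shown.

```python
import torch
import torch.nn.functional as F


def hierarchical_context_loss(student_feats, teacher_feats, levels=(4, 2, 1)):
    """Multi-scale feature L2 (hypothetical helper).

    For each (student, teacher) pair of (N, C, H, W) maps: full-resolution MSE plus
    MSE on average-pooled level x level grids, with halving weights, normalized to sum to 1.
    """
    total = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        loss = F.mse_loss(fs, ft)
        weight, weight_sum = 1.0, 1.0
        for level in levels:
            if level >= fs.shape[2]:
                continue  # pooling would not coarsen this feature map
            weight /= 2.0
            loss = loss + weight * F.mse_loss(F.adaptive_avg_pool2d(fs, level),
                                               F.adaptive_avg_pool2d(ft, level))
            weight_sum += weight
        total = total + loss / weight_sum
    return total
```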

Fine-Grained Visual Recognition

  • DCL — Re-implementation of Destruction and Construction Learning for Fine-grained Image Recognition (CVPR 2019). Shuffles local regions of an image (the destruction step) so the network must learn fine-grained discriminative parts, then uses a construction module to model the semantic correlation among regions and recover their original spatial layout.
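The destruction step of DCL can be sketched as a local shuffle of image patches: split the image into an n×n grid, jitter each patch's position within its row and then within its column by a random offset in [-k, k], and reorder patches by the jittered positions. The grid size `n=7`, neighborhood `k=1`, and both function names are assumptions for illustration; the adversarial loss and the construction module are not shown.

```python
import torch


def _jittered_order(n, k, generator):
    # Position i becomes i + r with r ~ U(-k, k); sorting yields a local permutation
    q = torch.arange(n, dtype=torch.float) + torch.empty(n).uniform_(-k, k, generator=generator)
    return torch.argsort(q)


def region_confusion(image, n=7, k=1, generator=None):
    """Shuffle an n x n grid of patches within a local neighborhood (hypothetical helper).

    image: (C, H, W) tensor with H and W divisible by n. The input is not modified.
    """
    c, h, w = image.shape
    ph, pw = h // n, w // n
    # (n_rows, n_cols, C, ph, pw) grid of patches; clone so the input stays untouched
    patches = image.reshape(c, n, ph, n, pw).permute(1, 3, 0, 2, 4).clone()
    for j in range(n):  # shuffle patches within each row
        patches[j] = patches[j, _jittered_order(n, k, generator)]
    for i in range(n):  # then within each column
        patches[:, i] = patches[_jittered_order(n, k, generator), i]
    return patches.permute(2, 0, 3, 1, 4).reshape(c, h, w)


g = torch.Generator().manual_seed(0)
shuffled = region_confusion(torch.rand(3, 224, 224), n=7, k=1, generator=g)
```

Keeping `k` small preserves the global layout roughly while destroying exact part positions, which is what pushes the classifier toward local detail.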

Infrastructure & Utilities

  • ZTransforms — A torchvision-like data augmentation library built on top of Albumentations. Provides a familiar torchvision-style API for image transformations, backed by the speed and variety of Albumentations.

All code was developed for academic exploration and prototyping. Use at your own risk.

Pinned

  1. overhaul (Public archive)

    [ICCV 2019] A Comprehensive Overhaul of Feature Distillation

    Python 7

  2. SSL (Public archive)

    [NIPS 2016] Learning Structured Sparsity in Deep Neural Networks

    Python 20 5

  3. RotNet (Public archive)

    Image rotation correction based on deep learning

    Python 28 7

  4. TSN (Public archive)

    [ECCV 2016] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

    Python 7 3

  5. X3D (Public archive)

    [CVPR 2020] X3D: Expanding Architectures for Efficient Video Recognition

    Python 23 4

  6. DCL (Public archive)

    [CVPR 2019] Destruction and Construction Learning for Fine-grained Image Recognition

    Python 1

Repositories

This organization has 20 repositories.
