1. Image/Video Captioning
- Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
- Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
⭐code
🏠project
- Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
⭐code
- MeaCap: Memory-Augmented Zero-shot Image Captioning
⭐code
- Sieve: Multimodal Dataset Pruning using Image Captioning Models
- EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
- Video captioning/description
- Streaming Dense Video Captioning
⭐code
⭐code
- Video ReCap: Recursive Captioning of Hour-Long Videos
⭐code
🏠project
🌻dataset
- Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
- VideoCon: Robust Video-Language Alignment via Contrast Captions
⭐code
🏠project
- Retrieval-Augmented Egocentric Video Captioning
- Dense captioning
- Generating illustrated instructions
2. Image/Video Compression
- Video compression
- Image compression
- Towards Backward-Compatible Continual Learning of Image Compression
⭐code
- Generative Latent Coding for Ultra-Low Bitrate Image Compression
- Dual Prior Unfolding for Snapshot Compressive Imaging
- Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
- SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image
⭐code
- JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients (JPEG decoding)
- Learned Lossless Image Compression based on Bit Plane Slicing
3. Image/Video Super-Resolution
- Image Processing GNN: Breaking Rigidity in Super-Resolution
- Learning Large-Factor EM Image Super-Resolution with Generative Priors
- Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
- Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World
- Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
⭐code
- Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution
- SinSR: Diffusion-Based Image Super-Resolution in a Single Step
⭐code
- CAMixerSR: Only Details Need More "Attention"
- Text-guided Explorable Image Super-resolution
- CFAT: Unleashing Triangular Windows for Image Super-resolution
- SeD: Semantic-Aware Discriminator for Image Super-Resolution
- Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
- Boosting Flow-based Generative Super-Resolution Models via Learned Prior
⭐code
- Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss
⭐code
- AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
⭐code
- Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
- DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF (NeRF super-resolution)
- Neural Super-Resolution for Real-time Rendering with Radiance Demodulation
- Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
⭐code
- Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
- CoSeR: Bridging Image and Language for Cognitive Super-Resolution
⭐code
🏠project
- Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution
- Bilateral Event Mining and Complementary for Event Stream Super-Resolution
- Blind image super-resolution
- Real-world super-resolution
- Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution
- Video super-resolution (VSR)
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
- Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
🏠project
- Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
⭐code
- Text image super-resolution
4. Image Classification
- Fair-VPT: Fair Visual Prompt Tuning for Image Classification
- Logarithmic Lenses: Exploring Log RGB Data for Image Classification
- SLICE: Stabilized LIME for Consistent Explanations for Image Classification
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
- MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
- SURE: SUrvey REcipes for building reliable and robust deep networks
⭐code
- A Bayesian Approach to OOD Robustness in Image Classification
- Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification
- Hyperspherical Classification with Dynamic Label-to-Prototype Assignment
⭐code
- Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
⭐code
- Deep Imbalanced Regression via Hierarchical Classification Adjustment
- Large Language Models are Good Prompt Learners for Low-Shot Image Classification
⭐code
- Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
- Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
- Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
⭐code
- In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification
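- Domain-generalized image classification
- Long-tailed recognition
- Few-shot image classification
- Zero-shot classification
- Fine-grained classification
- Open-set classification
- Few-shot recognition
- GCD (Generalized Category Discovery)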