1. Video Processing Roundup
- Learning from One Continuous Video Stream
- Deep Video Inverse Tone Mapping Based on Temporal Clues
- VTimeLLM: Empower LLM to Grasp Video Moments
- Combining Frame and GOP Embeddings for Neural Video Representation
- Learning to Predict Activity Progress by Self-Supervised Video Alignment
- CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
- vid-TLDR: Training Free Token Merging for Light-weight Video Transformer ⭐code
- Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video ⭐code
- Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
- Understanding Video Transformers via Universal Concept Discovery
- Video Recognition in Portrait Mode 🏠project
- VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams 🏠project
- Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living ⭐code
- A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
- Reliable Video Teller via Equal Distance to Visual Tokens
- Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens 🏠project
- Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
- Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
- Sleep Monitoring
- Video Understanding
- Compositional Video Understanding with Spatiotemporal Structure-based Transformers
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos
- HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
- A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives 🏠project
- Koala: Key Frame-Conditioned Long Video-LLM
- MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ⭐code
- Abductive Ego-View Accident Video Understanding for Safe Driving Perception 🏠project
- OmniVid: A Generative Framework for Universal Video Understanding ⭐code
- A Unified Framework for Human-centric Point Cloud Video Understanding
- Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
- MovieChat: From Dense Token to Sparse Memory for Long Video Understanding 🏠project
- TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ⭐code
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding ⭐code
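- Video Summarization
- Video Reconstruction
- Video Representation
- Video Interpretation
- Movie Description
- Video Surveillance
- Video Prediction
- Video Stabilization
- Video Recognition
- Video Dialogue
- Video Relighting
- Video Harmonization
- Video Frame Interpolation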
- Video Frame Interpolation via Direct Synthesis with the Event-based Reference
- IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
- EVS-assisted Joint Deblurring Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
- TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation
- Sparse Global Matching for Video Frame Interpolation with Large Motion ⭐code
- Perception-Oriented Video Frame Interpolation via Asymmetric Blending ⭐code
👍 A new breakthrough in the visual quality of video frame interpolation! Shanghai Jiao Tong University proposes PerVFI, a new paradigm for frame interpolation
- SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation 🏠project
- Video Subject Swapping
- Video Anomaly Detection
- Open-Vocabulary Video Anomaly Detection
- Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
- Harnessing Large Language Models for Training-free Video Anomaly Detection ⭐code
- Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline ⭐code
- Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
- MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
- PREGO: Online Mistake Detection in PRocedural EGOcentric Videos
- Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors ⭐code
- Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
- GlitchBench: Can Large Multimodal Models Detect Video Game Glitches? 🏠project
- Video Scene Detection
- Video Mirror Detection
- Automatic Movie Trailer Generation
- Conversational Music Recommendation for Videos
- Video Paragraph Grounding
- Video Grounding
- SnAG: Scalable and Accurate Video Grounding ⭐code
- Context-Guided Spatio-Temporal Video Grounding ⭐code
- Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
- What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions