CVPR 2024 Video Processing Roundup (Video Surveillance, Video Understanding, Video Recognition, Video Prediction, and More)

点云SLAM, 2025-01-17 2:35

1. Video Processing Overview

Learning from One Continuous Video Stream
Deep Video Inverse Tone Mapping Based on Temporal Clues
VTimeLLM: Empower LLM to Grasp Video Moments
Combining Frame and GOP Embeddings for Neural Video Representation
Learning to Predict Activity Progress by Self-Supervised Video Alignment
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
vid-TLDR: Training Free Token Merging for Light-weight Video Transformer
⭐code
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
⭐code
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Understanding Video Transformers via Universal Concept Discovery
Video Recognition in Portrait Mode
🏠project
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
🏠project
Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living
⭐code
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens
🏠project
Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
Sleep Monitoring
- SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Video Understanding
- Compositional Video Understanding with Spatiotemporal Structure-based Transformers
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos
- HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
- A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
  🏠project
- Koala: Key Frame-Conditioned Long Video-LLM
- MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
  ⭐code
- Abductive Ego-View Accident Video Understanding for Safe Driving Perception
  🏠project
- OmniVid: A Generative Framework for Universal Video Understanding
  ⭐code
- A Unified Framework for Human-centric Point Cloud Video Understanding
- Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
- MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
  🏠project
- TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
  ⭐code
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
  ⭐code
Video Summarization
- Previously on ... From Recaps to Story Summarization
  🏠project
- Scaling Up Video Summarization Pretraining with Large Language Models
- CSTA: CNN-based Spatiotemporal Attention for Video Summarization
  ⭐code
Video Reconstruction
- HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
  ⭐code
Video Representation
- DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
  🏠project
Video Interpretation
- Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Movie Description
- MICap: A Unified Model for Identity-Aware Movie Descriptions
  🏠project
Video Surveillance
- Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
  🌻dataset
Video Prediction
- Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
- ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction
  ⭐code
  🏠project
Video Stabilization
- Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
- 3D Multi-frame Fusion for Video Stabilization
Video Recognition
- OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
  ⭐code
  🏠project
Video Conversation
- BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
  ⭐code
Video Relighting
- Real-time 3D-aware Portrait Video Relighting
Video Harmonization
- Video Harmonization with Triplet Spatio-Temporal Variation Patterns
  👍VILP
Video Frame Interpolation
- Video Frame Interpolation via Direct Synthesis with the Event-based Reference
- IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
- EVS-assisted Joint Deblurring, Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
- TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation
- Sparse Global Matching for Video Frame Interpolation with Large Motion
  ⭐code
- Perception-Oriented Video Frame Interpolation via Asymmetric Blending
  ⭐code
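  👍A visual-quality breakthrough in video frame interpolation: Shanghai Jiao Tong University proposes PerVFI, a new paradigm for video frame interpolation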
- SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
  🏠project
Video Subject Swapping
- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
  🏠project
Video Anomaly Detection
- Open-Vocabulary Video Anomaly Detection
- Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
- Harnessing Large Language Models for Training-free Video Anomaly Detection
  ⭐code
- Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline
  ⭐code
- Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
- MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
- PREGO: Online Mistake Detection in PRocedural EGOcentric Videos
- Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
  ⭐code
- Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
- GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?
  🏠project
Video Scene Detection
- Neighbor Relations Matter in Video Scene Detection
Video Mirror Detection
- Effective Video Mirror Detection with Inconsistent Motion Cues
Automated Movie Trailer Generation
- Towards Automated Movie Trailer Generation
Conversational Music Recommendation for Videos
- MuseChat: A Conversational Music Recommendation System for Videos
Video Paragraph Grounding
- Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Video Grounding
- SnAG: Scalable and Accurate Video Grounding
  ⭐code
- Context-Guided Spatio-Temporal Video Grounding
  ⭐code
- Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
- What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
