Audio-Synchronized Visual Animation.- Expressive Whole-Body 3D Gaussian Avatar.- Canonical Shape Projection is All You Need for 3D Few-shot Class Incremental Learning.- Controllable Human-Object Interaction Synthesis.- High-Fidelity and Transferable NeRF Editing by Frequency Decomposition.- DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects.- PAV: Personalized Head Avatar from Unstructured Video Collection.- Strike a Balance in Continual Panoptic Segmentation.
- In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation.- MultiDelete for Multimodal Machine Unlearning.- Unified Local-Cloud Decision-Making via Reinforcement Learning.- UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model.- Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation.- Efficient Frequency-Domain Image Deraining with Contrastive Regularization.- Stitched ViTs are Flexible Vision Backbones.- TrajPrompt: Aligning Color Trajectory with Vision-Language Representations.
- SemReg: Semantics Constrained Point Cloud Registration.- Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views.- RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception.- ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer.- Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting.- AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation.- SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition.- R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding.
- Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors.- Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering.- DomainFusion: Generalizing To Unseen Domains with Latent Diffusion Models.