- OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
- Multistain Pretraining for Slide Representation Learning in Pathology
- T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
- Harmonizing knowledge Transfer in Neural Network with Unified Distillation
- Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
- Click Prompt Learning with Optimal Transport for Interactive Segmentation
- 3D Human Pose Estimation via Non-Causal Retentive Networks
- OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
- 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
- Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
- Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
- Enhancing Tampered Text Detection through Frequency Feature Fusion and Decomposition
- Modeling Label Correlations with Latent Context for Multi-Label Recognition
- LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
- Finding a needle in a haystack: A Black-Box Approach to Invisible Watermark Detection
- DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction
- MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
- ARoFace: Alignment Robustness to Improve Low-quality Face Recognition
- Learning Diffusion Models for Multi-View Anomaly Detection
- Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
- Multi-modal Relation Distillation for Unified 3D Representation Learning
- Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
- Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
- Distributionally Robust Loss for Long-Tailed Multi-Label Image Classification
- MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
- LongVLM: Efficient Long Video Understanding via Large Language Models
- The All-Seeing Project V2: Towards General Relation Comprehension of the Open World