Spline-based Transformers
Learning Pseudo 3D Guidance for View-consistent Texturing with 2D Diffusion
TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly
SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data
Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
Adversarial Diffusion Distillation
Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information
Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
LISO: Lidar-only Self-Supervised 3D Object Detection
Text-Conditioned Resampler For Long Form Video Understanding
Implicit Steganography Beyond the Constraints of Modality
Using My Artistic Style? You Must Obtain My Authorization
LookupViT: Compressing visual information to a limited number of tokens
Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
Non-transferable Pruning
A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations
Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Affine steerers for structured keypoint description