Research Focus
My doctoral research at the University of Trento focuses on multimodal AI systems, at the intersection of computer vision and natural language processing. Under the supervision of Professor Nicu Sebe, I develop new approaches to image generation, to the evaluation of generated images, and to the optimization of generative models.
Key Research Areas
- Image Generation Evaluation: Developing comprehensive evaluation frameworks for generative models, including:
  - ViCE (Visual Concept Evaluation): A novel framework that mimics human cognitive behavior to assess the consistency between generated images and their corresponding prompts. It combines Large Language Models (LLMs) and Visual Question Answering (VQA) in a unified pipeline.
  - DICE (DIfference Coherence Estimator): A multimodal approach that uses Multimodal Large Language Models to detect and evaluate instruction-guided image edits, producing scores that correlate strongly with human judgment.
- Diffusion Model Optimization: Developing methods to reduce the resource consumption of diffusion models through Hallucination Early Detection (HEaD), which identifies generations that are likely to fail early in the denoising process. The goal is to improve computational efficiency on complex generative tasks, reducing generation time while maintaining output quality.
- Multimodal AI Systems: Investigating how to integrate visual and textual information in AI systems, both to improve performance and to design evaluation metrics for generative models that better reflect human judgment. This includes new approaches to evaluating instruction-guided image edits and assessing their coherence.
Publications
My PhD research has led to several peer-reviewed publications at the ACM International Conference on Multimedia (ACM MM), the European Conference on Computer Vision (ECCV) Workshops, and the International Conference on Computer Vision (ICCV 2025). These papers cover generative AI, multimodal evaluation methodologies, and instruction-guided image editing.