Hi! I'm a PhD student in the Ropert (Robotics, Computer Vision and Artificial Intelligence) group at the University of Zaragoza (Unizar), Spain, supervised by Dr. Jose J. Guerrero since 2022.
My work revolves around Egocentric Vision, focusing on how it can enhance the way humans understand and interact with their surroundings. Specifically, a key aspect of my work is addressing the computational demands of multimodal models. I also have experience with Multi-Camera Systems, including managing their inherent challenges such as drastic viewpoint changes and camera calibration.
PhD in Computer Vision, 2022-Present
University of Zaragoza
MSc in Industrial Engineering, specializing in Industrial Automation and Robotics, 2019-2021
University of Zaragoza
BSc in Industrial Technologies Engineering, 2015-2019
University of Zaragoza
Predoctoral Researcher in Computer Vision:
Teaching Assistant:
Real-time simulator of prosthetic vision (SPV) based on communication between a Windows computer and an Ubuntu computer through a TCP/IP socket. Supervised by Dr. Jesús Bermúdez Cameo and Dr. Alejandro Pérez Yus.
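The cross-machine link described above can be sketched as a length-prefixed TCP exchange. This is a minimal, hypothetical illustration (host, port, and message format are stand-ins, not the actual SPV protocol):

```python
# Minimal sketch of a TCP/IP link between two machines (run here on
# localhost; in the real setup one endpoint is Windows, the other Ubuntu).
import socket
import struct
import threading

HOST, PORT = "127.0.0.1", 50007  # hypothetical stand-in address
ready = threading.Event()

def send_frame(conn: socket.socket, payload: bytes) -> None:
    # Length-prefixed message so the receiver knows where each frame ends.
    conn.sendall(struct.pack("!I", len(payload)) + payload)

def recv_frame(conn: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", conn.recv(4))
    data = b""
    while len(data) < length:
        data += conn.recv(length - len(data))
    return data

def server() -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        ready.set()  # tell the client the server is listening
        conn, _ = srv.accept()
        with conn:
            send_frame(conn, b"simulated phosphene frame")

threading.Thread(target=server, daemon=True).start()
ready.wait()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))
    frame = recv_frame(cli)
```

The length prefix avoids relying on TCP message boundaries, which do not exist: a single `recv` may return a partial frame, so the receiver loops until the announced length arrives.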
Existing methods for egocentric action recognition often rely solely on RGB videos, although additional modalities, e.g., audio, can improve accuracy in challenging scenarios. However, most multimodal approaches assume all modalities are available at inference, leading to significant accuracy drops, or even failure, when inputs are missing. To address this, we introduce KARMMA, a multimodal Knowledge distillation approach for egocentric Action Recognition robust to Missing ModAlities that requires no modality alignment across all samples during training or inference. KARMMA distills knowledge from a multimodal teacher into a multimodal student that benefits from all available modalities while remaining robust to missing ones, making it suitable for diverse scenarios without retraining. Our student uses approximately 50% fewer computational resources than our teacher, resulting in a lightweight and fast model. Experiments on EPIC-Kitchens and Something-Something show that our student achieves competitive accuracy while significantly reducing accuracy drops under missing modality conditions.
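The two core ingredients of the abstract, soft-label distillation from a multimodal teacher and training the student under randomly missing modalities, can be sketched as follows. This is an illustrative toy in NumPy, not the actual KARMMA implementation; all function names, shapes, and the averaging fusion are hypothetical:

```python
# Toy sketch: knowledge distillation + modality dropout for robustness
# to missing inputs (hypothetical names/shapes, not the KARMMA code).
import numpy as np

rng = np.random.default_rng(0)

def softmax(x: np.ndarray, t: float = 1.0) -> np.ndarray:
    z = np.exp((x - x.max(axis=-1, keepdims=True)) / t)
    return z / z.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, t: float = 2.0) -> float:
    # Soft-label distillation: KL(teacher || student) at temperature t.
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)))

def drop_modalities(features: dict, keep_prob: float = 0.7) -> dict:
    # Randomly zero out whole modalities during training so the student
    # learns to cope with missing inputs at inference time.
    return {m: f if rng.random() < keep_prob else np.zeros_like(f)
            for m, f in features.items()}

# Batch of 4 samples, two modalities with 8-dim features, 5 classes.
feats = {"rgb": rng.normal(size=(4, 8)), "audio": rng.normal(size=(4, 8))}
head = rng.normal(size=(8, 5))  # shared random classification head

masked = drop_modalities(feats)
student_logits = np.mean(list(masked.values()), axis=0) @ head  # sees dropouts
teacher_logits = np.mean(list(feats.values()), axis=0) @ head   # sees everything
loss = kd_loss(student_logits, teacher_logits)
```

Because the student is optimized against the teacher's soft predictions while its own inputs are randomly masked, it is pushed to recover the full-modality behaviour from whatever subset of modalities survives, which is the property the abstract claims at inference time.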