Publications

For the most up-to-date list, see my Google Scholar profile. Bold indicates I'm a first or co-first author.
2026
arXiv 2026
DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection
Y. El Kheir, A. Das, Y. Xiao, X. Wang, F. Kallel, E. E. Erdogan, N. T. Vu, T. Polzehl, S. Möller
A unified, configuration-driven open-source framework for deepfake audio detection research: 455+ pretrained models, 12 benchmark datasets, plug-and-play SSL frontends, classifier backends, losses, and augmentations, all via YAML.
arXiv 2026
IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for Modern Standard Arabic (MSA)
Y. El Kheir, A. Meghanani, M. Shahin, O. Ibrahim, S. A. Chowdhury, et al.
The second iteration of the Iqra'Eval shared task, accepted as an Interspeech 2026 Challenge, expanding the scope from Quranic recitation to general MSA pronunciation assessment.
ICASSP 2026
A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection
Y. El Kheir, F. Ritter-Gutiérrez, A. Das, T. Polzehl, S. Möller
A lightweight multi-scale convolutional adapter that adapts large SSL backbones to synthetic-speech detection with a tiny parameter footprint, while keeping cross-domain robustness.
ICASSP 2026
The DFKI-SLT System for ESDD 2026: BiCrossMamba-ST with Attentive SSL Fusion
Y. El Kheir, A. Das, E. E. Erdogan, F. Kallel, T. Polzehl, S. Möller
Our DFKI-SLT submission to the ESDD 2026 Challenge: BiCrossMamba-ST combined with an attentive SSL fusion module for end-to-end speech deepfake detection.
arXiv 2026
DFKI-Speech System for WildSpoof Challenge: A Robust Framework for SASV In-the-Wild
A. Das, Y. El Kheir, E. E. Erdogan, F. Kallel, T. Polzehl, S. Möller
Our DFKI-Speech entry to the WildSpoof Challenge: a robust framework for spoofing-aware automatic speaker verification (SASV) in unconstrained real-world conditions.
IWSDS 2026
The Complementary Role of Para-linguistic Cues for Robust Pronunciation Assessment
Y. El Kheir, S. A. Chowdhury, A. Ali
Shows that integrating para-linguistic cues (prosody, voice quality) alongside phonetic content improves the robustness of pronunciation assessment systems in noisy or mismatched conditions.
2025
WASPAA 2025
Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Y. El Kheir, A. Das, E. E. Erdogan, F. Ritter-Gutiérrez, T. Polzehl, S. Möller
A hybrid fusion framework integrating SSL-based representations with handcrafted spectral descriptors (MFCC, LFCC, CQCC) for robust speech deepfake detection across unseen domains.
Interspeech 2025
BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention
Y. El Kheir, T. Polzehl, S. Möller
A dual-branch spectro-temporal architecture with bidirectional Mamba blocks and mutual cross-attention. Achieves relative gains of 67.74% and 26.3% over AASIST on ASVspoof LA21 and DF21, respectively, with 28% fewer parameters.
Interspeech 2025
Towards a Unified Benchmark for Arabic Pronunciation Assessment: Quranic Recitation as Case Study
Y. El Kheir, O. Ibrahim, A. Meghanani, N. Almarwani, H. O. Toyin, S. Alharbi, et al.
First unified benchmark for Arabic pronunciation assessment using Quranic recitation.
NAACL Findings 2025
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
Y. El Kheir, Y. Samih, S. Maharjan, T. Polzehl, S. Möller
First large-scale layer-wise study of Wav2Vec 2.0, HuBERT, and WavLM for deepfake detection across multilingual, partial, song, and scene-based scenarios. Lower layers consistently provide the most discriminative features, so the upper layers can be dropped for roughly 3× faster inference at comparable EER.
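The link between using only lower layers and faster inference can be illustrated with a minimal sketch. It assumes a toy encoder modeled as a list of equally expensive layer functions. The names `make_toy_encoder`, `run_first_k`, and `layer_speedup`, and the choice of 8 out of 24 layers, are hypothetical and picked only so the arithmetic lands near 3×; they are not taken from the paper.

```python
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def make_toy_encoder(num_layers: int) -> List[Layer]:
    """Build a stand-in encoder: each 'layer' is a cheap elementwise map."""
    def make_layer(i: int) -> Layer:
        return lambda x: [0.5 * v + i for v in x]
    return [make_layer(i) for i in range(num_layers)]


def run_first_k(layers: List[Layer], x: List[float], k: int) -> List[float]:
    """Run only the first k layers and return the hidden state at layer k."""
    if not 1 <= k <= len(layers):
        raise ValueError("k must be between 1 and the number of layers")
    for layer in layers[:k]:
        x = layer(x)
    return x


def layer_speedup(total_layers: int, kept_layers: int) -> float:
    """Idealised speed-up, assuming every layer costs the same and
    everything outside the layer stack is free."""
    return total_layers / kept_layers


encoder = make_toy_encoder(24)                    # hypothetical 24-layer backbone
features = run_first_k(encoder, [1.0, 2.0], k=8)  # stop after layer 8
speedup = layer_speedup(24, 8)
```

In a real SSL model the feature extractor and the classifier head add fixed cost, so the measured speed-up would likely be lower than this idealised ratio.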
ArabicNLP 2025
Iqra'Eval: A Shared Task on Qur'anic Pronunciation Assessment
Y. El Kheir, A. Meghanani, H. O. Toyin, N. Almarwani, O. Ibrahim, et al.
The first shared task benchmarking Qur'anic pronunciation assessment, with task design, baselines, and lessons learned.
arXiv 2025
Fanar: An Arabic-Centric Multimodal Generative AI Platform
Fanar Team (incl. Y. El Kheir)
Arabic-centric multimodal generative AI platform supporting language, speech, and image generation with Islamic RAG.
arXiv 2025
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
E. Asgari, Y. El Kheir, M. A. S. Javaheri
A morphology-aware extension of BPE that integrates linguistic structure into subword tokenization while preserving statistical efficiency.
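One way to picture a morphology-aware BPE is a standard merge loop that never merges symbols across a morpheme boundary. The sketch below assumes exactly that constraint and a pre-segmented toy corpus; the function names and the segmentation are hypothetical, and this is not presented as the paper's actual algorithm.

```python
from collections import Counter
from typing import List, Tuple

Morpheme = List[str]    # a morpheme is a list of current symbols
Word = List[Morpheme]   # a word is a list of morphemes


def count_pairs(corpus: List[Word]) -> Counter:
    """Count adjacent symbol pairs inside morphemes only."""
    pairs: Counter = Counter()
    for word in corpus:
        for morpheme in word:
            for a, b in zip(morpheme, morpheme[1:]):
                pairs[(a, b)] += 1
    return pairs


def apply_merge(morpheme: Morpheme, pair: Tuple[str, str]) -> Morpheme:
    """Replace every occurrence of `pair` in one morpheme by its concatenation."""
    merged, i = [], 0
    while i < len(morpheme):
        if i + 1 < len(morpheme) and (morpheme[i], morpheme[i + 1]) == pair:
            merged.append(morpheme[i] + morpheme[i + 1])
            i += 2
        else:
            merged.append(morpheme[i])
            i += 1
    return merged


def train_morph_bpe(corpus: List[Word], num_merges: int):
    """Learn up to `num_merges` merges; ties go to the lexicographically smallest pair."""
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(corpus)
        if not pairs:
            break
        best = max(sorted(pairs), key=pairs.get)
        corpus = [[apply_merge(m, best) for m in word] for word in corpus]
        merges.append(best)
    return merges, corpus


# Hypothetical segmentation: walk|ed, talk|ed, walk|s
toy_corpus = [
    [list("walk"), list("ed")],
    [list("talk"), list("ed")],
    [list("walk"), list("s")],
]
merges, segmented = train_morph_bpe(toy_corpus, num_merges=4)
```

Because pairs are counted per morpheme, a pair such as ("k", "e") spanning the stem and the suffix is never a merge candidate, while the frequency-driven choice of merges is otherwise unchanged.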
arXiv 2025
Generalizable Audio Spoofing Detection using Non-Semantic Representations
A. Das, Y. El Kheir, C. Franzreb, T. Herzig, T. Polzehl, S. Möller
Data in Brief 2025
CAFE: Spontaneous code-switching speech dataset in Algerian dialect, French and English
H. E. O. Lachemat, A. Akli, N. Oukas, Y. El Kheir, S. Haboussi, S. A. Chowdhury
2024
ACL 2024
Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
Y. El Kheir, H. Mubarak, A. Ali, S. A. Chowdhury
A framework for dialectal sound and vowelization recovery in Arabic, tackling borrowed and dialectal sounds in phonologically diverse languages with limited data.
ICASSP 2024
L1-aware Multilingual Mispronunciation Detection Framework
Y. El Kheir, S. A. Chowdhury, A. Ali
L1-MultiMDD incorporates an L1-aware speech representation via an L1-L2 embedding and multi-task learning to improve multilingual MDD.
ICASSP 2024
Speech Representation Analysis Based on Inter- and Intra-Model Similarities
Y. El Kheir, A. Ali, S. A. Chowdhury
EACL 2024
LAraBench: Benchmarking Arabic AI with Large Language Models
A. Abdelali, H. Mubarak, S. Chowdhury, M. Hasanain, B. Mousi, S. Boughorbel, S. Abdaljalil, Y. El Kheir, et al.
2023
EMNLP Findings 2023
Automatic Pronunciation Assessment — A Review
Y. El Kheir, A. Ali, S. A. Chowdhury
A comprehensive review of recent advances in automatic pronunciation assessment for both phonemic and prosodic aspects, covering methods, challenges, resources, and future directions.
Interspeech 2023
QVoice: Arabic Speech Pronunciation Learning Application
Y. El Kheir, F. Khnaisser, S. A. Chowdhury, H. Mubarak, S. Afzal, A. Ali
First end-to-end mispronunciation detection system for Modern Standard Arabic.
SLaTE 2023
Multi-View Multi-Task Representation Learning for Mispronunciation Detection
Y. El Kheir, S. A. Chowdhury, A. Ali
Multiple input views with auxiliary tasks yield more distinctive phonetic representations in low-resource MDD settings.
SLaTE 2023
SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation
Y. El Kheir, S. A. Chowdhury, A. Ali, H. Mubarak, S. Afzal
A fine-grained data augmentation pipeline that generates mispronunciation errors via masked, mix-factor blending of phonetic units. Achieves state-of-the-art results on Speechocean762 (+2.0 PCC) and a +4.6 F1 gain on AraVoiceL2.
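The idea of masked, mix-factor blending can be sketched as a linear interpolation applied only where a mask is active. The example below is a minimal illustration under that assumption: `blend_units`, the per-frame scalar features, and the specific mask and mix factor are hypothetical and do not reproduce the SpeechBlender implementation.

```python
from typing import List


def blend_units(target: List[float], donor: List[float],
                mask: List[int], mix_factor: float) -> List[float]:
    """Blend a donor phonetic unit into a target unit.

    Where mask is 1, the output is mix_factor * target + (1 - mix_factor) * donor;
    where mask is 0, the target frame is kept unchanged.
    """
    if not (len(target) == len(donor) == len(mask)):
        raise ValueError("target, donor, and mask must have the same length")
    if not 0.0 <= mix_factor <= 1.0:
        raise ValueError("mix_factor must lie in [0, 1]")
    return [
        mix_factor * t + (1.0 - mix_factor) * d if m else t
        for t, d, m in zip(target, donor, mask)
    ]


# Toy frames: blend only the two middle frames, keeping 25% of the target.
blended = blend_units(
    target=[1.0, 1.0, 1.0, 1.0],
    donor=[0.0, 0.0, 0.0, 0.0],
    mask=[0, 1, 1, 0],
    mix_factor=0.25,
)
```

Varying the mask region and the mix factor would yield a range of partially corrupted units from a single pair of phonemes, which is the kind of control a fine-grained augmentation scheme relies on.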
Thesis
MSc Thesis
Mispronunciation Detection with SpeechBlender Data Augmentation Pipeline
Y. El Kheir, KTH Royal Institute of Technology (2023)
Supervisors: Dr. Ahmed Ali and Dr. Shammur Chowdhury.