Publications

For the most up-to-date list, see my Google Scholar profile. Bold indicates I'm a first or co-first author.

2026

Interspeech 2026

DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection

Y. El Kheir, A. Das, Y. Xiao, X. Wang, F. Kallel, E. E. Erdogan, N. T. Vu, T. Polzehl, S. Möller

Accepted as an Interspeech 2026 Long Track paper. A unified, configuration-driven open-source framework for deepfake audio detection research: 455+ pretrained models, 12 benchmark datasets, and plug-and-play SSL frontends, classifier backends, losses, and augmentations, all configured via YAML.

arXiv · website · code · 🤗 models

Interspeech 2026

IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for Modern Standard Arabic (MSA)

Y. El Kheir, A. Meghanani, M. Shahin, O. Ibrahim, S. A. Chowdhury, et al.

The second iteration of the Iqra'Eval shared task, accepted as an Interspeech 2026 Challenge paper (first author). It expands the scope from Quranic recitation to general MSA pronunciation assessment.

arXiv

Interspeech 2026

[Title] · Collaboration with University of Stuttgart

Y. El Kheir, et al. (University of Stuttgart)

Accepted at Interspeech 2026. Collaboration with University of Stuttgart.

ICASSP 2026

A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection

Y. El Kheir, F. Ritter-Gutiérrez, A. Das, T. Polzehl, S. Möller

A lightweight multi-scale convolutional adapter that adapts large SSL backbones to synthetic-speech detection with a tiny parameter footprint, while keeping cross-domain robustness.

arXiv

ICASSP 2026

The DFKI-SLT System for ESDD 2026: BiCrossMamba-ST with Attentive SSL Fusion

Y. El Kheir, A. Das, E. E. Erdogan, F. Kallel, T. Polzehl, S. Möller

DFKI-SLT entry to the ESDD 2026 Grand Challenge (ICASSP 2026, 97 teams): ranked 1st in Track 2 (Black-Box Low-Resource, EER 0.25%) and 2nd in Track 1 (Unseen Generators, EER 0.80%). The system combines BiCrossMamba-ST with an attentive SSL fusion module.

challenge results paper

arXiv 2026

DFKI-Speech System for WildSpoof Challenge: A Robust Framework for SASV In-the-Wild

A. Das, Y. El Kheir, E. E. Erdogan, F. Kallel, T. Polzehl, S. Möller

Our DFKI-Speech entry to the WildSpoof Challenge: a robust framework for spoofing-aware automatic speaker verification (SASV) in unconstrained real-world conditions.

arXiv

IWSDS 2026

The Complementary Role of Para-linguistic Cues for Robust Pronunciation Assessment

Y. El Kheir, S. A. Chowdhury, A. Ali

Shows that integrating para-linguistic cues (prosody, voice quality) alongside phonetic content improves the robustness of pronunciation assessment systems in noisy or mismatched conditions.

2025

WASPAA 2025

Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection

Y. El Kheir, A. Das, E. E. Erdogan, F. Ritter-Gutiérrez, T. Polzehl, S. Möller

A hybrid fusion framework integrating SSL-based representations with handcrafted spectral descriptors (MFCC, LFCC, CQCC) for robust speech deepfake detection across unseen domains.

arXiv · code

Interspeech 2025

BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention

Y. El Kheir, T. Polzehl, S. Möller

A dual-branch spectro-temporal architecture with bidirectional Mamba blocks and mutual cross-attention. Relative gains of 67.74% and 26.3% over AASIST on ASVspoof LA21 and DF21, respectively, with 28% fewer parameters.

arXiv

Interspeech 2025

Towards a Unified Benchmark for Arabic Pronunciation Assessment: Quranic Recitation as Case Study

Y. El Kheir, O. Ibrahim, A. Meghanani, N. Almarwani, H. O. Toyin, S. Alharbi, et al.

First unified benchmark for Arabic pronunciation assessment using Quranic recitation.

NAACL Findings 2025

Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection

Y. El Kheir, Y. Samih, S. Maharjan, T. Polzehl, S. Möller

First large-scale layer-wise study of Wav2Vec 2.0, HuBERT, and WavLM for deepfake detection across multilingual, partial, song, and scene-based scenarios. Lower layers consistently provide the most discriminative features, which allows roughly 3× faster inference at comparable EER.

arXiv

ArabicNLP 2025

Iqra'Eval: A Shared Task on Qur'anic Pronunciation Assessment

Y. El Kheir, A. Meghanani, H. O. Toyin, N. Almarwani, O. Ibrahim, et al.

The first shared task benchmarking Qur'anic pronunciation assessment, with task design, baselines, and lessons learned.

arXiv 2025

Fanar: An Arabic-Centric Multimodal Generative AI Platform

Fanar Team (incl. Y. El Kheir)

Arabic-centric multimodal generative AI platform supporting language, speech, and image generation with Islamic RAG.

arXiv

arXiv 2025

MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies

E. Asgari, Y. El Kheir, M. A. S. Javaheri

A morphology-aware extension of BPE that integrates linguistic structure into subword tokenization while preserving statistical efficiency.

arXiv

arXiv 2025

Generalizable Audio Spoofing Detection using Non-Semantic Representations

A. Das, Y. El Kheir, C. Franzreb, T. Herzig, T. Polzehl, S. Möller

arXiv

Data in Brief 2025

CAFE: Spontaneous code-switching speech dataset in Algerian dialect, French and English

H. E. O. Lachemat, A. Akli, N. Oukas, Y. El Kheir, S. Haboussi, S. A. Chowdhury

2024

ACL 2024

Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic

Y. El Kheir, H. Mubarak, A. Ali, S. A. Chowdhury

A framework for dialectal sound and vowelization recovery in Arabic, tackling borrowed and dialectal sounds in phonologically diverse languages with limited data.

arXiv

ICASSP 2024

L1-aware Multilingual Mispronunciation Detection Framework

Y. El Kheir, S. A. Chowdhury, A. Ali

L1-MultiMDD: incorporates L1-aware speech representation via an L1-L2 embedding and multi-task learning to improve multilingual MDD.

arXiv

ICASSP 2024

Speech Representation Analysis Based on Inter- and Intra-Model Similarities

Y. El Kheir, A. Ali, S. A. Chowdhury

EACL 2024

LAraBench: Benchmarking Arabic AI with Large Language Models

A. Abdelali, H. Mubarak, S. Chowdhury, M. Hasanain, B. Mousi, S. Boughorbel, S. Abdaljalil, Y. El Kheir, et al.

PDF

2023

EMNLP Findings 2023

Automatic Pronunciation Assessment — A Review

Y. El Kheir, A. Ali, S. A. Chowdhury

A comprehensive review of recent advances in automatic pronunciation assessment for both phonemic and prosodic aspects, covering methods, challenges, resources, and future directions.

arXiv

Interspeech 2023

QVoice: Arabic Speech Pronunciation Learning Application

Y. El Kheir, F. Khnaisser, S. A. Chowdhury, H. Mubarak, S. Afzal, A. Ali

First end-to-end mispronunciation detection system for Modern Standard Arabic.

arXiv

SLaTE 2023

Multi-View Multi-Task Representation Learning for Mispronunciation Detection

Y. El Kheir, S. A. Chowdhury, A. Ali

Multiple input views with auxiliary tasks yield more distinctive phonetic representations in low-resource MDD settings.

arXiv

SLaTE 2023

SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation

Y. El Kheir, S. A. Chowdhury, A. Ali, H. Mubarak, S. Afzal

A fine-grained data augmentation pipeline that generates mispronunciation errors via masked, mix-factor blending of phonetic units. State-of-the-art results on Speechocean762 (+2.0 PCC) and a +4.6 F1 gain on AraVoiceL2.

Thesis

MSc Thesis

Mispronunciation Detection with SpeechBlender Data Augmentation Pipeline

Y. El Kheir, KTH Royal Institute of Technology (2023)

Supervisors: Dr. Ahmed Ali and Dr. Shammur Chowdhury.

Yassine El Kheir
