Publications
For the most up-to-date list, see my Google Scholar profile. Bold indicates I'm a first or co-first author.
2026
arXiv 2026
DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection
A unified, configuration-driven open-source framework for deepfake audio detection research: 455+ pretrained models, 12 benchmark datasets, and plug-and-play SSL frontends, classifier backends, losses, and augmentations, all configured via YAML.
arXiv 2026
IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for Modern Standard Arabic (MSA)
The second iteration of the Iqra'Eval shared task, accepted as an Interspeech 2026 Challenge, which expands the scope beyond Quranic recitation to general MSA pronunciation assessment.
ICASSP 2026
A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection
A lightweight multi-scale convolutional adapter that adapts large SSL backbones to synthetic-speech detection with a small parameter footprint while preserving cross-domain robustness.
ICASSP 2026
The DFKI-SLT System for ESDD 2026: BiCrossMamba-ST with Attentive SSL Fusion
Our DFKI-SLT submission to the ESDD 2026 Challenge: BiCrossMamba-ST combined with an attentive SSL fusion module for end-to-end speech deepfake detection.
arXiv 2026
DFKI-Speech System for WildSpoof Challenge: A Robust Framework for SASV In-the-Wild
Our DFKI-Speech entry to the WildSpoof Challenge: a robust framework for spoofing-aware automatic speaker verification (SASV) in unconstrained real-world conditions.
IWSDS 2026
The Complementary Role of Para-linguistic Cues for Robust Pronunciation Assessment
Shows that integrating para-linguistic cues (prosody, voice quality) alongside phonetic content improves the robustness of pronunciation assessment systems in noisy or mismatched conditions.
2025
WASPAA 2025
Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
A hybrid fusion framework integrating SSL-based representations with handcrafted spectral descriptors (MFCC, LFCC, CQCC) for robust speech deepfake detection across unseen domains.
Interspeech 2025
BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention
A dual-branch spectro-temporal architecture with bidirectional Mamba blocks and mutual cross-attention. Relative gains of 67.74% and 26.3% over AASIST on ASVspoof LA21 and DF21, respectively, with 28% fewer parameters.
Interspeech 2025
Towards a Unified Benchmark for Arabic Pronunciation Assessment: Quranic Recitation as Case Study
First unified benchmark for Arabic pronunciation assessment using Quranic recitation.
NAACL Findings 2025
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
First large-scale layer-wise study of Wav2Vec 2.0, HuBERT, and WavLM for deepfake detection across multilingual, partial, song, and scene-based scenarios. Lower layers consistently provide the most discriminative features, enabling roughly 3× faster inference at comparable EER.
ArabicNLP 2025
Iqra'Eval: A Shared Task on Qur'anic Pronunciation Assessment
The first shared task benchmarking Qur'anic pronunciation assessment, with task design, baselines, and lessons learned.
arXiv 2025
Fanar: An Arabic-Centric Multimodal Generative AI Platform
Arabic-centric multimodal generative AI platform supporting language, speech, and image generation with Islamic RAG.
arXiv 2025
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
A morphology-aware extension of BPE that integrates linguistic structure into subword tokenization while preserving statistical efficiency.
Data in Brief 2025
CAFE: Spontaneous code-switching speech dataset in Algerian dialect, French and English
2024
ACL 2024
Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
A framework for dialectal sound and vowelization recovery in Arabic, tackling borrowed and dialectal sounds in phonologically diverse languages with limited data.
ICASSP 2024
L1-aware Multilingual Mispronunciation Detection Framework
L1-MultiMDD incorporates an L1-aware speech representation via an L1-L2 embedding and multi-task learning to improve multilingual MDD.
ICASSP 2024
Speech Representation Analysis Based on Inter- and Intra-Model Similarities
2023
EMNLP Findings 2023
Automatic Pronunciation Assessment — A Review
A comprehensive review of recent advances in automatic pronunciation assessment for both phonemic and prosodic aspects, covering methods, challenges, resources, and future directions.
Interspeech 2023
QVoice: Arabic Speech Pronunciation Learning Application
First end-to-end mispronunciation detection system for Modern Standard Arabic.
SLaTE 2023
Multi-View Multi-Task Representation Learning for Mispronunciation Detection
Multiple input views with auxiliary tasks yield more distinctive phonetic representations in low-resource MDD settings.
SLaTE 2023
SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation
A fine-grained data augmentation pipeline that generates mispronunciation errors via masked, mix-factor blending of phonetic units. State-of-the-art results on Speechocean762 (+2.0 PCC) and a +4.6 F1 gain on AraVoiceL2.
Thesis
MSc Thesis
Mispronunciation Detection with SpeechBlender Data Augmentation Pipeline
Supervisors: Dr. Ahmed Ali and Dr. Shammur Chowdhury.
