Yassine El Kheir
I'm a PhD researcher at the German Research Center for AI (DFKI) and TU Berlin, advised by Prof. Sebastian Möller and Dr. Tim Polzehl. I also lead R&D (part-time) at Gretchen AI in Berlin.
My research sits at the intersection of speech foundation models, trustworthy audio AI, and linguistic inclusivity. I build and study systems that detect AI-generated speech, interpret self-supervised representations, and extend the reach of modern speech AI to languages and dialects that mainstream models leave behind — with a focus on Arabic and its rich dialectal diversity.
Before Berlin, I spent two years at the Qatar Computing Research Institute, contributing to Fanar (Qatar's Arabic-centric LLM) and leading QVoice, the first end-to-end Arabic mispronunciation detection system.
I study the science of robust speech intelligence — how foundation models represent, reason about, and are fooled by audio — and how to make them work across the world's linguistic diversity.
- Speech and audio foundation models: SSL representations, audio LLMs, interpretability, efficient architectures
- Multilingual and low-resource speech: ASR, TTS, pronunciation assessment, dialectal modeling
- Robust and trustworthy audio AI: deepfake detection, anti-spoofing, generalization across unseen domains
- Speech-language models: joint reasoning over acoustic and semantic features, explainable predictions
- Tokenization for multilingual LLMs: morphology-aware methods (MorphBPE used in Fanar)
Listed roughly newest-first. See the Projects page for the full picture.

Open-source, configuration-driven framework for deepfake audio detection research. Mix SSL frontends (Wav2Vec, WavLM, HuBERT, MERT, EAT), backends (AASIST, ECAPA, Nes2Net, RawNet2), losses, and augmentations entirely via YAML. Ships with 455+ pretrained models on Hugging Face.

Leading the Speech Team to build open-source AI grounded in Moroccan linguistic identity. Built MoulSot, a curated 80-hour Moroccan Darija ASR corpus distilled from 1,500 hours of YouTube speech through a multi-stage pipeline (Silero VAD, SQUIM, Audiobox Aesthetics, DNS64, pyannote, Argilla, Gemini 2.5 Pro), and fine-tuned Qwen3-ASR-1.7B on it. Also building the first natural Text-to-Speech system and tokenizer for Moroccan Darija.
Co-leading R&D for generative-media detection at a Berlin startup. Built an image deepfake detector that ranked #1 on the Deepfake-Eval 2024 benchmark (~83% accuracy), and designed the production inference pipeline.

Speech contributions to News-Polygraph, a BMBF-funded multimodal disinformation platform. Includes BiCrossMamba-ST (one of the first Mamba-based anti-spoofing models, with 28% fewer parameters), spectral and SSL feature fusion, layer-wise SSL interpretability, and a parameter-efficient multi-scale adapter for synthetic speech detection (ICASSP 2026).

A multi-institutional initiative benchmarking Arabic pronunciation assessment via Quranic recitation. Organized the first shared task at ArabicNLP 2025; second iteration accepted as an Interspeech 2026 Challenge expanding to general MSA. Co-built QuranMB, the first expert-annotated Quranic mispronunciation dataset.
Contributed to Fanar, Qatar's Arabic-centric multimodal LLM. Designed MorphBPE, a morphology-aware tokenizer that improves fertility and downstream generation for morphologically rich languages. Helped run training pipelines from 300M to 3B parameters.
- 2026-05 [talk] Returning as Speech ML mentor and PhD Students panelist at HackAI Morocco 2026.
- 2026-05 [release] Released MoulSot: 80 hours of curated Moroccan Darija ASR data distilled from 1,500h of YouTube, plus a fine-tuned Qwen3-ASR model (atlasia/moulsot.v0.3).
- 2026-03 [paper] DeepFense framework paper is out: a unified, modular, extensible toolkit for robust deepfake audio detection.
- 2026-02 [organize] IQRA 2026, the second Iqra'Eval, accepted as an Interspeech 2026 Challenge on MSA pronunciation assessment.
- 2026-01 [paper] Two ICASSP 2026 papers accepted: a parameter-efficient multi-scale adapter for synthetic-speech detection, and the DFKI-SLT system for the ESDD 2026 challenge.
- 2025-12 [talk] Invited talk at Alexandria University on Speech AI and Arabic pronunciation. Video on YouTube.
- 2025-10-25 [code] Released code for Two Views, One Truth.
- 2025-10-22 [talk] Invited talk at Reality Defender on Generalizable Audio Deepfake Detection.
- 2025-10-15 [paper] Presented Two Views, One Truth at WASPAA 2025 (Tahoe City, USA).
- 2025-09 [grant] WASPAA 2025 Travel Grant ($1,000).
- 2025-08 [organize] Co-organized the ArabicSpeech Meetup at Interspeech 2025.
- 2025-06 [organize] Founded and led the Iqra'Eval Shared Task at ArabicNLP 2025.
- 2025-05 [paper] BiCrossMamba-ST (Interspeech 2025), among the first successful applications of Mamba state-space models to speech anti-spoofing.
- 2025-03 [role] Joined Gretchen AI as Research & Engineering Lead (part-time), co-leading R&D for generative-media detection.
- 2025-02 [paper] MorphBPE, the morpho-aware tokenizer used in Fanar, is out.
- 2025-01 [paper] Layer-wise Analysis of SSL Models for Audio Deepfake Detection accepted at Findings of NAACL 2025.
- 2024-12 [visit] Invited researcher at the SDAIA Winter School, leading a team of 15 researchers to build QuranMB, the first expert-annotated dataset for Quranic mispronunciation.
Full list: publications page and Google Scholar
- arXiv 2026 DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection
- arXiv 2026 IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for MSA
- ICASSP 2026 A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection
- ICASSP 2026 The DFKI-SLT System for ESDD 2026: BiCrossMamba-ST with Attentive SSL Fusion
- WASPAA 2025 Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
- Interspeech 2025 BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention
- NAACL Findings 2025 Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
- arXiv 2025 Fanar: An Arabic-Centric Multimodal Generative AI Platform
- arXiv 2025 MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
- ACL 2024 Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
- ICASSP 2024 L1-aware Multilingual Mispronunciation Detection Framework
- Interspeech 2023 QVoice: Arabic Speech Pronunciation Learning Application
- 2026-05 HackAI Morocco 2026, Speech ML Mentor (2nd year) and panelist on the PhD Students Panel.
- 2025-12 Alexandria University, invited talk on Speech AI and Arabic pronunciation. Video on YouTube.
- 2025-10 Reality Defender, invited talk on Generalizable Audio Deepfake Detection.
- 2025-10 WASPAA 2025, Tahoe City, paper presentation on Two Views, One Truth.
- 2025-08 Interspeech 2025, Rotterdam, co-organized the ArabicSpeech Meetup.
- 2025-05 HackAI Morocco 2025, Speech ML Mentor on ASR and forced alignment for low-resource dialects.
- 2024-12 SDAIA Winter School, Riyadh, invited researcher, led the QuranMB project team.
- 2024 SDAIA Summer School, invited researcher in the speech and language program.
- PhD Researcher, DFKI & TU Berlin
2024 to present
Speech foundation models, SSL interpretability, Mamba for audio, multimodal robustness. DeepFense framework, News-Polygraph contributor.
- Research & Engineering Lead (part-time), Gretchen AI, Berlin
Mar 2025 to present
Co-leading R&D for generative-media detection. #1 on the Deepfake-Eval 2024 benchmark for image deepfake detection.
- Research Associate, Qatar Computing Research Institute (QCRI)
2022 to 2024
Fanar LLM (MorphBPE tokenizer, large-scale training pipelines) and QVoice (Arabic pronunciation learning). 6+ papers in 2 years.
Education: PhD, TU Berlin / DFKI (2024 to exp. 2027). MSc Machine Learning, KTH Royal Institute of Technology (2021 to 2022). MSc Data Science, EURECOM & Télécom Paris (2020 to 2021). Diplôme d'Ingénieur, Télécom Paris (2019 to 2022, top 5%).
- 2025 WASPAA 2025 Travel Grant ($1,000)
- 2025 ArabicNLP 2025 Grant ($500), for leadership of the Iqra'Eval Shared Task
- 2022 Télécom Paris Scholarship, tuition waiver plus stipend for study at KTH, Sweden
- 2019 FIRSI Excellence Scholarship (€9,000/year)
- 2018 Prépa FIRSI Scholarship
I’m currently supervising Enes Erdogan and Feidi Kallel (MSc students at TU Berlin), and have mentored teams at the SDAIA Winter School, SDAIA Summer School, HackAI Morocco (2025 and 2026), and through the Iqra’Eval initiative.
If you’re a Master’s student in Germany or the EU interested in speech and audio ML, foundation models, ASR/TTS, audio LLMs, deepfake detection, or Arabic / low-resource speech, feel free to reach out. See the Supervision page for details.
Program Committee & Reviewing: ACL 2024, ACL 2025, EMNLP 2024, Interspeech 2025, ICASSP 2025, EACL 2025, COLING 2024/25, ArabicNLP 2025, SLaTE 2025. Full list on the Services page.
📧 yassine.el_kheir@dfki.de · 📧 elkheiryassine0@gmail.com
🎓 Google Scholar · 💻 GitHub · 🔗 LinkedIn

