Yassine El Kheir

I'm a final-year PhD student at the German Research Center for AI (DFKI) and TU Berlin, advised by Prof. Sebastian Möller and Dr. Tim Polzehl. My research focuses on robust audio deepfake detection, spanning SSL backbone analysis, efficient architectures, and Audio LLMs, with the goal of building systems that generalize across domains and real-world conditions. I recently interned at NII Japan (Dec 2025 – Apr 2026), working with Xin Wang and Junichi Yamagishi, co-founders of ASVspoof, on LLM-based speech reasoning. I also lead R&D (part-time) at Gretchen AI and consult on production speech AI systems. Before Berlin, I was a Research Associate at QCRI, building Arabic speech systems (ASR, TTS, pronunciation) and contributing to Fanar (Qatar's Arabic-centric LLM).

Detection is the core of my work, and understanding how speech is synthesized from the inside makes that detection stronger.

🔬 Research Interests

Robust and trustworthy audio AI: deepfake detection, anti-spoofing, generalization across unseen domains
Speech and audio foundation models: SSL representations, interpretability, efficient architectures
Audio LLMs for speech reasoning: LLM-based deepfake detection, speech authentication, multimodal reasoning
Multilingual and low-resource speech: ASR, TTS, pronunciation assessment, dialectal modeling
Tokenization for multilingual LLMs: morphology-aware methods (MorphBPE used in Fanar)

🚀 Flagship Projects

Listed roughly newest-first. See the Projects page for the full picture.

Robust Deepfake Detection Research

DFKI · ICASSP 2026 · Interspeech 2025 · WASPAA 2025 · NAACL Findings 2025

Core PhD research line: BiCrossMamba-ST (one of the first Mamba-based anti-spoofing models, 28% fewer params, Interspeech 2025), layer-wise SSL interpretability showing lower layers suffice for detection (~3× faster), spectral + SSL fusion (WASPAA 2025), and a parameter-efficient multi-scale adapter (ICASSP 2026).

BiCrossMamba · Layer-wise · Two Views · MultiConvAdapter

DeepFense Framework

DFKI · open-source · Interspeech 2026 Long Track

Open-source, configuration-driven framework for deepfake audio detection. Mix SSL frontends, backends, losses and augmentations via YAML. Ships with 455+ pretrained models across 12 benchmarks on HuggingFace.

website · 🤗 models · code · paper

Audio LLMs & Speech Reasoning

NII Japan · Dec 2025 – Apr 2026 · SLT 2026 (under review)

Visiting researcher at NII Japan with Xin Wang and Junichi Yamagishi (ASVspoof co-founders), extending deepfake detection toward LLM-based speech reasoning. Work led to Bridging the Modality Gap, using explicit textual grounding to teach LLMs to reason about speech authenticity — moving beyond classifiers.

Gretchen AI

Research & Engineering Lead (part-time) · 2025 to present

Co-leading R&D for generative-media detection. Built an image deepfake detector ranked #1 on Deepfake-Eval 2024 (~83% accuracy), and designed the production inference pipeline. Direct deployment of PhD research.

website

AtlasIA, Open-Source AI for Morocco

Speech Team Lead · 2025 to present

Leading ASR and TTS development for Moroccan Darija. Built MoulSot: 80 hours of curated speech distilled from 1,500 hours via a 9-stage pipeline, released alongside a fine-tuned Qwen3-ASR model. Also building the first natural TTS system for Darija.

website · 🤗 HuggingFace

Iqra'Eval, Quranic Pronunciation

Founder & Lead Organizer · ArabicNLP 2025 · Interspeech 2026 Challenge

Founded the first shared task for Quranic pronunciation assessment (ArabicNLP 2025). Follow-up accepted as an Interspeech 2026 Challenge — establishing a lasting community benchmark for Arabic pronunciation.

🤗 dataset · IQRA 2026

Fanar and MorphBPE

QCRI · 2022 to 2024

Contributed to Fanar, Qatar's Arabic-centric multimodal LLM. Designed MorphBPE, a morphology-aware tokenizer that improves fertility and generation quality for morphologically rich languages.

Fanar paper · MorphBPE

📰 News

2026-06 · paper · 3 papers accepted at Interspeech 2026: DeepFense (Long Track, first author), IQRA 2026 Challenge paper (first author), and a collaboration with University of Stuttgart.
2026-06 · award · DFKI-SLT system ranked 1st in Track 2 and 2nd in Track 1 at the ESDD 2026 Grand Challenge (ICASSP 2026), with 97 registered teams. Results paper.
2026-05 · talk · Returning as Speech ML mentor and PhD Students panelist at HackAI Morocco 2026.
2026-05 · release · Released MoulSot: 80 hours of curated Moroccan Darija ASR data distilled from 1,500h of YouTube, plus a fine-tuned Qwen3-ASR model (atlasia/moulsot.v0.3).
2026-04 · visit · Completed a 5-month visiting research stay at NII Japan (Tokyo), working with Xin Wang and Junichi Yamagishi on Audio LLMs for speech reasoning and deepfake detection.
2026-03 · paper · DeepFense framework paper is out: a unified, modular, extensible toolkit for robust deepfake audio detection.
2026-02 · organize · IQRA 2026, the second Iqra'Eval, accepted as an Interspeech 2026 Challenge on MSA pronunciation assessment.
2026-01 · paper · Two ICASSP 2026 papers accepted: a parameter-efficient multi-scale adapter for synthetic-speech detection, and the DFKI-SLT system for the ESDD 2026 challenge.
2025-12 · visit · Started a visiting research internship at NII Japan (National Institute of Informatics, Tokyo), working with Xin Wang and Junichi Yamagishi on Audio LLMs for speech understanding.
2025-12 · talk · Invited talk at Alexandria University on Speech AI and Arabic pronunciation. Video on YouTube.
2025-10-25 · code · Released code for Two Views, One Truth.
2025-10-22 · talk · Invited talk at Reality Defender on Generalizable Audio Deepfake Detection.
2025-10-15 · paper · Presented Two Views, One Truth at WASPAA 2025 (Tahoe City, USA).
2025-09 · grant · WASPAA 2025 Travel Grant ($1,000).
2025-08 · organize · Co-organized the ArabicSpeech Meetup at Interspeech 2025.
2025-06 · organize · Founded and led the Iqra'Eval Shared Task at ArabicNLP 2025.
2025-05 · paper · BiCrossMamba-ST, among the first successful applications of Mamba state-space models to speech anti-spoofing.
2025-03 · role · Joined Gretchen AI as Research & Engineering Lead (part-time), co-leading R&D for generative-media detection.
2025-02 · paper · MorphBPE, the morpho-aware tokenizer used in Fanar, is out.
2025-01 · paper · Layer-wise Analysis of SSL Models for Audio Deepfake Detection accepted at Findings of NAACL 2025.
2024-12 · visit · Invited researcher at the SDAIA Winter School, leading a team of 15 researchers to build QuranMB, the first expert-annotated dataset for Quranic mispronunciation.

📜 Selected Publications

Full list: publications page and Google Scholar

Interspeech 2026 DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection
Y. El Kheir, A. Das, Y. Xiao, X. Wang, F. Kallel, E. E. Erdogan, N. T. Vu, T. Polzehl, S. Möller
Interspeech 2026 IQRA 2026: Interspeech Challenge on Automatic Pronunciation Assessment for MSA
Y. El Kheir, A. Meghanani, M. Shahin, O. Ibrahim, S. A. Chowdhury, et al.
ICASSP 2026 A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection
Y. El Kheir, F. Ritter-Gutiérrez, A. Das, T. Polzehl, S. Möller
ICASSP 2026 The DFKI-SLT System for ESDD 2026: BiCrossMamba-ST with Attentive SSL Fusion
Y. El Kheir, A. Das, E. E. Erdogan, F. Kallel, T. Polzehl, S. Möller
WASPAA 2025 Two Views, One Truth: Spectral and Self-Supervised Features Fusion for Robust Speech Deepfake Detection
Y. El Kheir, A. Das, E. E. Erdogan, F. Ritter-Gutiérrez, T. Polzehl, S. Möller
Interspeech 2025 BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention
Y. El Kheir, T. Polzehl, S. Möller
NAACL Findings 2025 Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
Y. El Kheir, Y. Samih, S. Maharjan, T. Polzehl, S. Möller
arXiv 2025 Fanar: An Arabic-Centric Multimodal Generative AI Platform
Fanar Team (incl. Y. El Kheir)
arXiv 2025 MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
E. Asgari, Y. El Kheir, M. A. S. Javaheri
ACL 2024 Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic
Y. El Kheir, H. Mubarak, A. Ali, S. A. Chowdhury
ICASSP 2024 L1-aware Multilingual Mispronunciation Detection Framework
Y. El Kheir, S. A. Chowdhury, A. Ali
Interspeech 2023 QVoice: Arabic Speech Pronunciation Learning Application
Y. El Kheir, F. Khnaisser, S. A. Chowdhury, H. Mubarak, S. Afzal, A. Ali

🎤 Invited Talks & Selected Visits

2026-05 HackAI Morocco 2026, Speech ML Mentor (2nd year) and panelist on the PhD Students Panel.
2025-12 NII Japan, Tokyo, visiting researcher (Dec 2025 – Apr 2026) with Xin Wang and Junichi Yamagishi on Audio LLMs and speech deepfake detection.
2025-12 Alexandria University, invited talk on Speech AI and Arabic pronunciation. Video on YouTube.
2025-10 Reality Defender, invited talk on Generalizable Audio Deepfake Detection.
2025-10 WASPAA 2025, Tahoe City, paper presentation on Two Views, One Truth.
2025-08 Interspeech 2025, Rotterdam, co-organized the ArabicSpeech Meetup.
2025-05 HackAI Morocco 2025, Speech ML Mentor on ASR and forced alignment for low-resource dialects.
2024-12 SDAIA Winter School, Riyadh, invited researcher, led the QuranMB project team.
2024 SDAIA Summer School, invited researcher in the speech and language program.

💼 Experience

PhD Researcher, DFKI & TU Berlin
2024 to present
Speech foundation models, SSL interpretability, Mamba for audio, multimodal robustness. DeepFense framework, News-Polygraph contributor.
Visiting Researcher, National Institute of Informatics (NII), Tokyo, Japan
Dec 2025 – Apr 2026
Worked with Xin Wang and Junichi Yamagishi — co-founders of the ASVspoof challenge — on Audio LLMs for speech reasoning and deepfake detection. Explored textual grounding and LLM-based speech authenticity assessment.
Research & Engineering Lead (part-time), Gretchen AI, Berlin
Mar 2025 to present
Co-leading R&D for generative-media detection. #1 on Deepfake-Eval 2024 benchmark for image deepfake detection.
Research Associate, Qatar Computing Research Institute (QCRI)
2022 to 2024
Fanar LLM (MorphBPE tokenizer, large-scale training pipelines) and QVoice (Arabic pronunciation learning). 6+ papers in 2 years.

Education: PhD, TU Berlin / DFKI (started 2024, expected 2027). MSc Machine Learning, KTH Royal Institute of Technology (2021 to 2022). MSc Data Science, EURECOM & Télécom Paris (2020 to 2021). Diplôme d'Ingénieur, Télécom Paris (2019 to 2022, top 5%).

🏅 Awards & Grants

2026 1st place, Track 2 & 2nd place, Track 1, ESDD 2026 Grand Challenge at ICASSP 2026 (97 registered teams) — results paper
2024 1st place, Deepfake-Eval 2024 benchmark (~83% accuracy, image deepfake detection)
2025 WASPAA 2025 Travel Grant ($1,000)
2025 ArabicNLP 2025 Grant ($500), for leadership of the Iqra'Eval Shared Task
2022 Télécom Paris Scholarship, tuition waiver plus stipend for KTH Sweden
2019 FIRSI Excellence Scholarship (€9,000/year)
2018 Prépa FIRSI Scholarship

🤝 Mentorship & Service

I’m currently supervising Enes Erdogan and Feidi Kallel (MSc students at TU Berlin), and have mentored teams at the SDAIA Winter School, SDAIA Summer School, HackAI Morocco (2025 and 2026), and through the Iqra’Eval initiative.

If you’re a Master’s student in Germany or the EU interested in speech and audio ML, foundation models, ASR/TTS, audio LLMs, deepfake detection, or Arabic / low-resource speech, feel free to reach out. See the Supervision page for details.

Program Committee & Reviewing: ACL 2024, ACL 2025, EMNLP 2024, Interspeech 2025, ICASSP 2025, EACL 2025, COLING 2024/25, ArabicNLP 2025, SLaTE 2025. Full list on the Services page.

📬 Contact

📧 yassine.el_kheir@dfki.de · 📧 elkheiryassine0@gmail.com
🎓 Google Scholar · 💻 GitHub · 🔗 LinkedIn