Projects
A selection of research and engineering projects I've contributed to or led, spanning speech foundation models, audio language models, multilingual speech, and robust, trustworthy audio AI. Within each theme, projects are listed roughly newest-first.
🛡️ Robust & Trustworthy Speech AI

DeepFense, Modular Deepfake Audio Detection Framework
An open-source, configuration-driven framework I co-built for deepfake audio detection research. Mix and match SSL frontends, classifier backends, losses, and augmentations entirely via YAML, without touching code.
- Plug-and-play registry: Wav2Vec 2.0, WavLM, HuBERT, MERT, EAT frontends and AASIST, ECAPA-TDNN, Nes2Net, RawNet2 backends.
- 455+ pretrained models and 12 benchmark datasets (ASVspoof 2019, CompSpoof, DECRO, SONICS) on Hugging Face.
- Built-in augmentations (RawBoost, RIR, codec, noise) and standardized metrics (EER, minDCF, actDCF, F1).
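To make the standardized metrics concrete, here is a minimal equal-error-rate (EER) computation: the operating point where the false-acceptance and false-rejection rates cross. This is a generic sketch with a hypothetical function name, not DeepFense's actual implementation.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate: find the threshold where the false-rejection
    rate (bona fide scored too low) and false-acceptance rate (spoof
    scored too high) are closest, and return their midpoint.
    Convention: higher score = more likely bona fide."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

print(compute_eer([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2, 0.1]))  # → 0.25
```

Production toolkits interpolate the ROC curve rather than midpointing the nearest threshold, but the crossing-point idea is the same.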

News-Polygraph
A collaborative German research initiative building a multimodal platform for detecting and analyzing disinformation across speech, image, and text. My contribution is on the speech and audio side: robust, generalizable deepfake detection, anti-spoofing, and interpretability, feeding into the DeepFense framework above.

Mamba, SSL Interpretability, and Spectral Fusion
A connected research line on making speech anti-spoofing systems both stronger and more efficient:
- Parameter-Efficient Multi-Scale Adapter (ICASSP 2026): adapter-style fine-tuning of SSL backbones for synthetic-speech detection, with sharp efficiency gains.
- BiCrossMamba-ST (Interspeech 2025): among the first successful applications of Mamba state-space models to speech anti-spoofing. SOTA on ASVspoof 2021 LA and DF with 28% fewer parameters than Transformer baselines.
- Two Views, One Truth (WASPAA 2025): hybrid fusion of SSL embeddings with handcrafted spectral descriptors (MFCC, LFCC, CQCC) for cross-domain robustness.
- Layer-wise SSL analysis (NAACL Findings 2025): first large-scale study showing that lower layers of speech SSL models are sufficient for artifact detection, enabling ~3x faster inference.
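The layer-wise finding above (lower SSL layers suffice for artifact detection) can be sketched abstractly: run only the first k blocks of a deep encoder and classify from that intermediate representation. Everything below is a toy illustration with made-up dimensions, not the paper's code or a real SSL model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained SSL encoder: a stack of transformer-like
# blocks, reduced here to random linear maps + a nonlinearity.
DIM, N_LAYERS = 16, 12
weights = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(N_LAYERS)]

def encode(frames, n_layers):
    """Run only the first n_layers blocks and return the hidden states."""
    h = frames
    for w in weights[:n_layers]:
        h = np.tanh(h @ w)  # toy block: linear map + tanh
    return h

frames = rng.standard_normal((50, DIM))  # 50 feature frames
shallow = encode(frames, n_layers=4)     # 4 of 12 layers: ~3x fewer passes
full = encode(frames, n_layers=N_LAYERS)
print(shallow.shape, full.shape)
```

If the shallow representation already carries the spoofing artifacts, the remaining layers can simply be skipped at inference time, which is where the ~3x speedup in the study comes from.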
🌍 Multilingual & Low-Resource Speech

AtlasIA, Open-Source AI for Morocco
Leading the Speech Team in an open-source community building AI grounded in Moroccan linguistic identity:
- MoulSot (2026): curated 80 hours of high-quality Moroccan Darija speech distilled from 1,500 hours of YouTube content through a 9-stage pipeline (Silero VAD, SQUIM, Audiobox Aesthetics, DNS64 denoising, pyannote diarization, Argilla human-in-the-loop, Gemini 2.5 Pro transcription), then fine-tuned Qwen3-ASR-1.7B on top. Released as atlasia/MoulSot-Full and atlasia/moulsot.v0.3.
- Building the first natural Text-to-Speech system and tokenizer for Moroccan Darija.
- Mentoring HackAI Morocco teams on forced alignment and low-resource ASR.
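The shape of a staged curation pipeline like the one above can be sketched as stages that either transform a segment or drop it. All stage functions and thresholds below are hypothetical placeholders, not the actual MoulSot code.

```python
# Hypothetical sketch of a multi-stage speech-curation pipeline:
# segment -> filter by quality -> transcribe. Real stages would call
# a VAD model, a quality estimator (SQUIM-style), and an ASR/LLM.

def vad_split(audio):
    """Pretend VAD: turn raw input into speech segments with quality scores."""
    return [{"audio": chunk, "quality": q} for chunk, q in audio]

def quality_filter(segments, min_quality=0.7):
    """Keep only segments above an assumed quality threshold."""
    return [s for s in segments if s["quality"] >= min_quality]

def transcribe(segments):
    """Pretend transcription stage; attaches a placeholder transcript."""
    for s in segments:
        s["text"] = "<transcript>"
    return segments

def run_pipeline(audio):
    return transcribe(quality_filter(vad_split(audio)))

raw = [("chunk0", 0.9), ("chunk1", 0.3), ("chunk2", 0.8)]
curated = run_pipeline(raw)
print(len(curated))  # → 2 (one segment dropped by the quality gate)
```

The value of composing the pipeline this way is that each gate (VAD, quality estimation, diarization, human review) can be tuned or swapped independently while the overall distillation ratio stays measurable.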

Iqra'Eval, Quranic Pronunciation Assessment
A multi-institutional initiative to benchmark Arabic pronunciation assessment, using Quranic recitation as the case study:
- Founded and ran the Iqra'Eval Shared Task at ArabicNLP 2025, establishing the first standard benchmark for Quranic recitation analysis.
- Co-built QuranMB, the first expert-annotated dataset for Quranic mispronunciation (~2 h), with baselines, originally as lead of a 15-researcher team at the SDAIA Winter School.
- Next iteration accepted as an Interspeech 2026 Challenge (IQRA 2026).

QVoice and the Arabic Speech Stack
Led QVoice, the first end-to-end mispronunciation detection system for Modern Standard Arabic, and contributed a series of SOTA results on L2 speech assessment and multilingual ASR:
- SpeechBlender (SLaTE 2023), fine-grained augmentation for mispronunciation data scarcity.
- L1-aware Multilingual MDD (ICASSP 2024), incorporating native-language priors into multilingual mispronunciation detection.
- Beyond Orthography (ACL 2024), recovering short vowels and dialectal sounds in Arabic with limited data.
- AraVoiceL2, non-native Arabic speech dataset for phoneme-level evaluation.
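Phoneme-level mispronunciation detection, as in the systems above, ultimately reduces to aligning the canonical phoneme sequence against what the learner produced and flagging mismatches. A generic alignment sketch using Python's difflib (not QVoice's actual method; the example phonemes are illustrative):

```python
import difflib

def flag_mispronunciations(canonical, produced):
    """Align two phoneme sequences and report substitutions, deletions,
    and insertions. Generic sketch, not the QVoice pipeline."""
    ops = difflib.SequenceMatcher(a=canonical, b=produced).get_opcodes()
    errors = []
    for tag, i1, i2, j1, j2 in ops:
        if tag != "equal":
            errors.append((tag, canonical[i1:i2], produced[j1:j2]))
    return errors

# Toy Arabic example: learner realizes /q/ as /k/ in "qalb" (heart).
canonical = ["q", "a", "l", "b"]
produced  = ["k", "a", "l", "b"]
print(flag_mispronunciations(canonical, produced))
# → [('replace', ['q'], ['k'])]
```

Real systems replace the rule-based aligner with forced alignment or a phoneme recognizer, and score confidences rather than hard substitutions, but the canonical-vs-realized comparison is the core of the task.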
🎧 Speech & Audio Foundation Models
Fanar & MorphBPE
Contributed to Fanar, Qatar's Arabic-centric multimodal LLM, and designed MorphBPE, a morphology-aware tokenizer that integrates linguistic structure into subword segmentation. The work shows how foundation-model design choices can be tuned to morphologically rich languages without sacrificing efficiency.
- MorphBPE improves fertility rates and downstream generation quality for morphologically rich languages, with negligible overhead vs. plain BPE.
- Helped run large-scale data filtering and training pipelines (300M to 3B parameters) with MosaicML's LLM-Foundry.
💼 Industry
Gretchen AI, Research & Engineering Lead
Co-leading R&D for generative-media detection at a Berlin-based AI startup. Built a proprietary image deepfake detector that ranked #1 on the Deepfake-Eval 2024 benchmark (~83% accuracy), and designed the scalable inference pipeline used in production.
