Projects

A selection of research and engineering projects I've contributed to or led, spanning speech foundation models, audio language models, multilingual speech, and robust, trustworthy audio AI. Within each theme, projects are listed roughly newest-first.

🛡️ Robust & Trustworthy Speech AI

open-source

DeepFense, Modular Deepfake Audio Detection Framework

DFKI · Apache-2.0 · 2025 to present

An open-source, configuration-driven framework I co-built for deepfake audio detection research. Mix and match SSL frontends, classifier backends, losses, and augmentations entirely via YAML, without touching code.

  • Plug-and-play registry: Wav2Vec 2.0, WavLM, HuBERT, MERT, EAT frontends and AASIST, ECAPA-TDNN, Nes2Net, RawNet2 backends.
  • 455+ pretrained models and 12 benchmark datasets (ASVspoof 2019, CompSpoof, DECRO, SONICS) on Hugging Face.
  • Built-in augmentations (RawBoost, RIR, codec, noise) and standardized metrics (EER, minDCF, actDCF, F1).
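As an illustration of the configuration-driven design, an experiment might be specified entirely in YAML along these lines. All key names below are hypothetical stand-ins, not DeepFense's actual schema:

```yaml
# Hypothetical sketch of a DeepFense-style experiment config.
# Key names are illustrative; consult the repository for the real schema.
frontend:
  name: wavlm_base        # or wav2vec2, hubert, mert, eat
  freeze: false
backend:
  name: aasist            # or ecapa_tdnn, nes2net, rawnet2
loss:
  name: weighted_ce
augmentations: [rawboost, rir, codec, noise]
metrics: [eer, min_dcf, act_dcf, f1]
```

Swapping the frontend or backend then means editing one line of config rather than touching code.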

disinformation

News-Polygraph

DFKI · BMBF-funded research project · 2024 to present

A collaborative German research initiative building a multimodal platform for detecting and analyzing disinformation across speech, image, and text. My contribution is on the speech and audio side: robust, generalizable deepfake detection, anti-spoofing, and interpretability, feeding into the DeepFense framework above.

research line

Mamba, SSL Interpretability, and Spectral Fusion

DFKI · ICASSP 2026 · Interspeech 2025 · WASPAA 2025 · NAACL Findings 2025

A connected research line on making speech anti-spoofing systems both stronger and more efficient:

  • Parameter-Efficient Multi-Scale Adapter (ICASSP 2026): adapter-style fine-tuning of SSL backbones for synthetic-speech detection, with sharp efficiency gains.
  • BiCrossMamba-ST (Interspeech 2025): among the first successful applications of Mamba state-space models to speech anti-spoofing. SOTA on ASVspoof 2021 LA and DF with 28% fewer parameters than Transformer baselines.
  • Two Views, One Truth (WASPAA 2025): hybrid fusion of SSL embeddings with handcrafted spectral descriptors (MFCC, LFCC, CQCC) for cross-domain robustness.
  • Layer-wise SSL analysis (NAACL Findings 2025): first large-scale study showing that lower layers of speech SSL models are sufficient for artifact detection, enabling ~3x faster inference.
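The efficiency claim in the last bullet follows from simple arithmetic: encoder cost scales roughly linearly with the number of transformer layers run, so exiting after the lower layers of a 12-layer model yields the ~3x figure. The layer counts below are illustrative assumptions (12 layers is typical of base-size SSL models), not the paper's reported cut-off:

```python
# Back-of-the-envelope early-exit sketch. Layer counts are illustrative
# assumptions (12-layer base model, exit after layer 4), not the paper's setup.

def layers_needed(task: str, total_layers: int = 12) -> int:
    """Hypothetical policy: artifact detection can stop early; other tasks run the full stack."""
    return 4 if task == "artifact_detection" else total_layers

def relative_speedup(task: str, total_layers: int = 12) -> float:
    # Transformer encoder cost grows roughly linearly with layers executed.
    return total_layers / layers_needed(task, total_layers)

print(relative_speedup("artifact_detection"))  # 3.0
print(relative_speedup("asr"))                 # 1.0
```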

🌍 Multilingual & Low-Resource Speech

TTS · ASR · community

AtlasIA, Open-Source AI for Morocco

Speech Team Lead · 2025 to present

Leading the Speech Team in an open-source community building AI grounded in Moroccan linguistic identity:

  • MoulSot (2026): curated 80 hours of high-quality Moroccan Darija speech distilled from 1,500 hours of YouTube content through a 9-stage pipeline (Silero VAD, SQUIM, Audiobox Aesthetics, DNS64 denoising, pyannote diarization, Argilla human-in-the-loop, Gemini 2.5 Pro transcription), then fine-tuned Qwen3-ASR-1.7B on top. Released as atlasia/MoulSot-Full and atlasia/moulsot.v0.3.
  • Building the first natural Text-to-Speech system and tokenizer for Moroccan Darija.
  • Mentoring HackAI Morocco teams on forced alignment and low-resource ASR.
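The staged-filtering idea behind such a pipeline can be sketched generically. Everything below is a toy stand-in (the real stages call Silero VAD, SQUIM, denoising, diarization, and so on), not AtlasIA's actual code:

```python
# Generic staged data-filtering pipeline sketch. Stage names echo the
# description above, but the filters are hypothetical toy stand-ins.

def run_pipeline(clips, stages):
    """Apply each stage in order; each stage returns the surviving clips."""
    for name, stage in stages:
        clips = stage(clips)
        print(f"after {name}: {len(clips)} clips")
    return clips

stages = [
    ("vad", lambda cs: [c for c in cs if c["has_speech"]]),             # stand-in for Silero VAD
    ("quality_filter", lambda cs: [c for c in cs if c["mos"] >= 3.0]),  # stand-in for SQUIM scoring
]

clips = [
    {"has_speech": True, "mos": 4.2},
    {"has_speech": False, "mos": 4.8},
    {"has_speech": True, "mos": 2.1},
]
kept = run_pipeline(clips, stages)  # only the first clip survives both stages
```

Each stage only shrinks or transforms the clip set, which is how 1,500 hours of raw audio can distill down to 80 curated hours.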

benchmark · community

Iqra'Eval, Quranic Pronunciation Assessment

Founder & Lead Organizer · 2024 to present

A multi-institutional initiative to benchmark Arabic pronunciation assessment, using Quranic recitation as the case study:

  • Founded and ran the Iqra'Eval Shared Task at ArabicNLP 2025, establishing the first standard benchmark for Quranic recitation analysis.
  • Co-built QuranMB (~2 h), the first expert-annotated dataset for Quranic mispronunciation, with baselines; the work began when I led a 15-researcher team at the SDAIA Winter School.
  • Next iteration accepted as an Interspeech 2026 Challenge (IQRA 2026).

pronunciation · ASR

QVoice and the Arabic Speech Stack

QCRI · Interspeech 2023 · 6+ follow-on papers

Led QVoice, the first end-to-end mispronunciation detection system for Modern Standard Arabic, and contributed a series of SOTA results on L2 speech assessment and multilingual ASR:

  • SpeechBlender (SLaTE 2023), fine-grained data augmentation to address mispronunciation data scarcity.
  • L1-aware Multilingual MDD (ICASSP 2024), incorporating native-language priors into multilingual mispronunciation detection.
  • Beyond Orthography (ACL 2024), recovering short vowels and dialectal sounds in Arabic with limited data.
  • AraVoiceL2, non-native Arabic speech dataset for phoneme-level evaluation.

🧠 Speech & Audio Foundation Models

LLM · tokenization

Fanar & MorphBPE

QCRI · 2022 to 2024 · arXiv 2025

Contributed to Fanar, Qatar's Arabic-centric multimodal LLM, and designed MorphBPE, a morphology-aware tokenizer that integrates linguistic structure into subword segmentation. The work shows how foundation-model design choices can be tuned to morphologically rich languages without sacrificing efficiency.

  • MorphBPE lowers token fertility (subword tokens per word) and improves downstream generation quality for morphologically rich languages, with negligible overhead vs. plain BPE.
  • Helped run large-scale data filtering and training pipelines (300M to 3B parameters) with MosaicML's LLM-Foundry.
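Token fertility is just the average number of subword tokens per word, with lower values meaning more compact segmentation. A toy sketch of the idea; the tokenizations below are invented for illustration and are not actual BPE or MorphBPE output:

```python
# Toy fertility computation. Both tokenizations are invented examples,
# not actual BPE or MorphBPE output.

def fertility(words, tokenize):
    """Average subword tokens per word -- lower means more compact segmentation."""
    return sum(len(tokenize(w)) for w in words) / len(words)

words = ["walked", "unhelpful"]

# Frequency-driven BPE can split across morpheme boundaries:
plain_bpe = {"walked": ["wal", "ked"], "unhelpful": ["u", "nh", "el", "pful"]}.get
# A morphology-aware tokenizer keeps morphemes intact:
morph_aware = {"walked": ["walk", "ed"], "unhelpful": ["un", "help", "ful"]}.get

print(fertility(words, plain_bpe))    # 3.0
print(fertility(words, morph_aware))  # 2.5
```

The effect is amplified in morphologically rich languages like Arabic, where a single orthographic word can pack several morphemes.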

💼 Industry

startup · part-time

Gretchen AI, Research & Engineering Lead

Berlin · Mar 2025 to present · part-time

Co-leading R&D for generative-media detection at a Berlin-based AI startup. Built a proprietary image deepfake detector that ranked #1 on the Deepfake-Eval 2024 benchmark (~83% accuracy), and designed the scalable inference pipeline used in production.