Publications

You can also find my articles on my Google Scholar profile.

Research Papers

📜BiCrossMamba-ST: Speech Deepfake Detection with Bidirectional Mamba Spectro-Temporal Cross-Attention

Authors: Yassine El Kheir, Tim Polzehl, Sebastian Möller

Publication Date: 2025/5/21

Conference: Accepted to Interspeech 2025 arXiv:2502.00894

Description: We propose BiCrossMamba-ST, a robust framework for speech deepfake detection that leverages a dual-branch spectro-temporal architecture powered by bidirectional Mamba blocks and mutual cross-attention. By processing spectral sub-bands and temporal intervals separately and then integrating their representations, BiCrossMamba-ST effectively captures the subtle cues of synthetic speech. The framework also uses a convolution-based 2D attention map to focus on specific spectro-temporal regions. Operating directly on raw features, BiCrossMamba-ST achieves relative gains of 67.74% and 26.3% over the state-of-the-art AASIST on the ASVspoof LA21 and ASVspoof DF21 benchmarks, respectively, and a 6.80% improvement over RawBMamba on ASVspoof DF21. Code and models will be made publicly available.
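For readers unfamiliar with how relative gains like the ones quoted above are usually computed, here is a minimal sketch. It assumes the underlying metric is an error rate where lower is better (such as EER), which the description does not state explicitly. The function name `relative_gain` and the input values are hypothetical placeholders, not results from the paper.

```python
def relative_gain(baseline_error: float, new_error: float) -> float:
    """Relative reduction in an error metric, in percent.

    Assumes a lower-is-better metric (for example EER).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical placeholder values, NOT figures from the paper:
# a baseline error of 10.0 reduced to 4.0 is a 60% relative gain.
print(relative_gain(10.0, 4.0))
```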

📜Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection

Authors: Yassine El Kheir, Youness Samih, Suraj Maharjan, Tim Polzehl, Sebastian Möller

Publication Date: 2025/2/8

Conference: NAACL Findings 2025

Description: This paper conducts a comprehensive layer-wise analysis of self-supervised learning (SSL) models for audio deepfake detection across diverse contexts, including multilingual datasets (English, Chinese, Spanish), partial, song, and scene-based deepfake scenarios. By systematically evaluating the contributions of different transformer layers, we uncover critical insights into model behavior and performance. Our findings reveal that lower layers consistently provide the most discriminative features, while higher layers capture less relevant information. Notably, all models achieve competitive equal error rate (EER) scores even when employing a reduced number of layers. This indicates that we can reduce computational costs and increase the inference speed of detecting deepfakes by utilizing only a few lower layers. This work enhances our understanding of SSL models in deepfake detection, offering valuable insights applicable across varied linguistic and contextual settings.
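The equal error rate (EER) mentioned above is the operating point where the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of one simple way to estimate it from detection scores, assuming that a higher score means "more likely bona fide". The function name `compute_eer` and the toy scores are illustrative assumptions, not the evaluation code used in the paper.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the EER by scanning every observed score as a threshold.

    Assumes higher scores indicate bona fide speech.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # FRR: fraction of bona fide samples rejected (score below threshold).
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # FAR: fraction of spoofed samples accepted (score at or above threshold).
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores for illustration only.
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
print(eer, threshold)
```

Production toolkits typically interpolate between thresholds; this version keeps the nearest observed threshold for simplicity.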

📜MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies

Authors: Ehsaneddin Asgari, Yassine El Kheir, Mohammad Ali Sadraei Javaheri

Publication Date: 2025/2/2

Conference: Submitted to ACL 2025 arXiv:2502.00894

Description: We introduce MorphBPE, a morphology-aware extension of BPE that integrates linguistic structure into subword tokenization while preserving statistical efficiency, specifically for morphologically rich languages.
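To illustrate what a morphology-aware variant of BPE can look like, the sketch below trains BPE merges while forbidding any merge that would cross a morpheme boundary. This constraint is an assumption made for illustration and is not necessarily the exact procedure used in MorphBPE. The function names `train_morph_bpe` and `merge_pair`, and the pre-segmented toy words, are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_morph_bpe(words, num_merges):
    """Learn BPE merges from words that are already split into morphemes.

    `words` is a list of words, each given as a list of morpheme strings.
    Pairs are counted only inside a morpheme, so no learned token can
    span a morpheme boundary.
    """
    corpus = [[list(morpheme) for morpheme in word] for word in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word in corpus:
            for morpheme in word:
                pairs.update(zip(morpheme, morpheme[1:]))
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        for word in corpus:
            for i, morpheme in enumerate(word):
                word[i] = merge_pair(morpheme, best)
    return merges, corpus


# Toy, hand-segmented English words for illustration only.
toy_words = [["un", "lock", "ed"], ["lock", "s"], ["un", "do"]]
merges, segmented = train_morph_bpe(toy_words, num_merges=2)
print(merges)
print(segmented)
```

Plain BPE would count the pair ("n", "l") across "un" and "lock"; here it is never a candidate, because counting stops at each morpheme boundary.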

🚀Fanar: An Arabic-Centric Multimodal Generative AI Platform

Authors: Fanar Team, Ummar Abbas, Mohammad Shahmeer Ahmad, Firoj Alam, Enes Altinisik, Ehsaneddin Asgari, Yazan Boshmaf, Sabri Boughorbel, Sanjay Chawla, Shammur Chowdhury, Fahim Dalvi, Kareem Darwish, Nadir Durrani, Mohamed Elfeky, Ahmed Elmagarmid, Mohamed Eltabakh, Masoomali Fatehkia, Anastasios Fragkopoulos, Maram Hasanain, Majd Hawasly, Mus'ab Husaini, Soon-Gyo Jung, Ji Kim Lucas, Walid Magdy, Safa Messaoud, Abubakr Mohamed, Tasnim Mohiuddin, Basel Mousi, Hamdy Mubarak, Ahmad Musleh, Zan Naeem, Mourad Ouzzani, Dorde Popovic, Amin Sadeghi, Husrev Taha Sencar, Mohammed Shinoy, Omar Sinan, Yifan Zhang, Ahmed Ali, Yassine El Kheir, Xiaosong Ma, Chaoyi Ruan

Publication Date: 2025/1/18

Report: arXiv preprint arXiv:2501.13944

Description: Fanar is a platform for Arabic-centric multimodal generative AI systems that supports language, speech, and image generation tasks. Its key components include Fanar Star and Fanar Prime, and it offers state-of-the-art Arabic language models along with advanced capabilities such as Islamic Retrieval-Augmented Generation (RAG).

🔤Beyond Orthography: Automatic Recovery of Short Vowels and Dialectal Sounds in Arabic

Authors: Yassine El Kheir, Hamdy Mubarak, Ahmed Ali, Shammur Absar Chowdhury

Publication Date: 2024/8/5

Conference: ACL 2024

Description: This paper presents a novel framework for recovering dialectal sounds and short vowels (vowelization) in Arabic. It addresses the challenge of recognizing borrowed and dialectal sounds in phonologically diverse languages, and it improves performance using only limited data.

🔤LAraBench: Benchmarking Arabic AI with Large Language Models

Authors: Ahmed Abdelali, Hamdy Mubarak, Shammur Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Samir Abdaljalil, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Youssef Elshahawy, Ahmed Ali, Nadir Durrani, Nataša Milić-Frayling, Firoj Alam

Publication Date: 2024/3

Conference: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Pages: 487-520

🗣️Automatic Pronunciation Assessment - A Review

Authors: Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

Publication Date: 2023/10/21

Conference: Findings of EMNLP 2023

Description: A comprehensive review of recent advancements in automatic pronunciation assessment for both phonemic and prosodic aspects, discussing methods, challenges, and resources, with directions for future research.

📚L1-aware Multilingual Mispronunciation Detection Framework

Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

Publication Date: 2023/9/14

Conference: IEEE ICASSP 2024

Description: This paper introduces the L1-MultiMDD framework, which incorporates L1-aware speech representations to detect mispronunciations across multiple languages. It improves multilingual mispronunciation detection and diagnosis (MDD) by integrating an L1-L2 embedding with multi-task learning.

🎤Multi-View Multi-Task Representation Learning for Mispronunciation Detection

Authors: Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

Publication Date: 2023/6/2

Conference: Speech and Language Technology in Education Workshop (SLaTE 2023)

Description: This paper proposes a novel architecture for mispronunciation detection that combines multiple views of the input data with auxiliary tasks to learn more distinctive phonetic representations in low-resource settings, outperforming state-of-the-art models.

🎤QVoice: Arabic Speech Pronunciation Learning Application

Authors: Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury, Hamdy Mubarak, Shazia Afzal, Ahmed Ali

Publication Date: 2023/5/9

Conference: INTERSPEECH 2023