Modern Standard Arabic transcription
The formal pan-Arab register — used for press releases, government statements, scripted IVR, and formal training material.
Last updated: April 2026
Modern Standard Arabic (al-fuṣḥā al-muʿāṣira) is the formal pan-Arab register: the language of news broadcasts, scripted IVR menus, government press releases, formal corporate communications, and training material. No one speaks MSA at home or on a customer-service call, but it appears throughout the operational audio that does need transcribing. CallScribe's MSA model is the most accurate of its four Arabic models (MSA, Khaleeji, Levantine, Egyptian) because MSA training data is the most abundant.
Why MSA is the easiest Arabic for ASR
MSA has a fully specified, taught, and standardised phonology and orthography. Training data is abundant: news broadcasts, audiobooks, parliamentary recordings, and IVR voice talent all stay close to the standard. Speakers in formal settings hyperarticulate. Short vowels, which are written out only in tashkīl-marked text, are largely predictable from morphology. As a result, MSA word-error rates on clean broadcast audio can drop below 6%, well below the rates seen on any spontaneous dialect.
CallScribe's MSA model achieves 4-7% WER on clean studio recordings and 8-12% on telephony-grade MSA, which occurs in scripted IVR, recorded compliance disclosures, and formal interview audio. Dialect interference is the main failure mode: MSA "spoken with a Gulf accent" is technically still MSA, but the acoustic model has to cope with the speaker's native vowel system.
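For readers new to the metric, WER counts the word substitutions, deletions, and insertions needed to turn a transcript into its reference, divided by the number of reference words. The sketch below is a generic Python illustration, not CallScribe's scoring code; the function name `word_error_rate` and the plain whitespace tokenisation are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only two rows of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative five-word reference with one word changed in the hypothesis.
reference = "أعلنت الوزارة خطة جديدة اليوم"
hypothesis = reference.replace("خطة", "خطط")
print(word_error_rate(reference, hypothesis))  # 1 substitution / 5 words = 0.2
```

At 5% WER, roughly one word in twenty is wrong. Real scoring pipelines usually normalise punctuation and diacritics before comparing, which this sketch leaves out.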
When customers actually need MSA-only transcription
Three common cases: scripted IVR menu transcription for compliance ("press 1 for Arabic"), pre-recorded compliance disclosures attached to financial calls, and corporate training videos. In all three, the audio is clean, the speech is scripted, and accuracy expectations are high. MSA also dominates government broadcast monitoring and Arabic-language press-release transcription pipelines that some CallScribe customers run alongside their call-centre work.
Tashkīl, orthography, and number handling
MSA orthography is consonantal — short vowels are written as diacritics (tashkīl) only when needed for disambiguation. CallScribe transcribes MSA into unmarked orthography by default (matching how Arabs read most printed text), with an optional fully-marked output for educational and Quranic-recitation use cases. Numbers and dates are produced in either Arabic-Indic numerals (٠١٢٣) or European numerals (0123) per project setting; the default is Arabic-Indic.
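If a downstream system expects the other numeral format than the one a project was configured with, the conversion is a one-to-one character mapping. The following is a minimal Python sketch of that mapping; the helper names are hypothetical and are not part of any CallScribe API.

```python
# Arabic-Indic digits occupy U+0660..U+0669, in the same order as 0..9.
ARABIC_INDIC = "".join(chr(0x0660 + d) for d in range(10))
EUROPEAN = "0123456789"

TO_EUROPEAN = str.maketrans(ARABIC_INDIC, EUROPEAN)
TO_ARABIC_INDIC = str.maketrans(EUROPEAN, ARABIC_INDIC)


def to_european_digits(text: str) -> str:
    """Replace Arabic-Indic digits with European digits; other text is untouched."""
    return text.translate(TO_EUROPEAN)


def to_arabic_indic_digits(text: str) -> str:
    """Replace European digits with Arabic-Indic digits; other text is untouched."""
    return text.translate(TO_ARABIC_INDIC)


print(to_european_digits("\u0662\u0660\u0662\u0666"))  # 2026
```

One caveat with the second helper: applied to a code-switched transcript, it would also rewrite digits inside Latin-script product codes, so a real pipeline would probably restrict it to Arabic-script spans.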
Hamza placement (أ vs ا vs ء) follows standard Arabic Academy conventions. Tā' marbūṭa is preserved; alif maqṣūra is distinguished from yā'. These details matter for legal and contractual transcripts where downstream OCR/search needs canonical spelling.
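These distinctions correspond to separate Unicode code points, so an exact-match search treats the variants as different words. Some search pipelines therefore store the canonical transcript and additionally index a folded key that tolerates sloppy queries. The sketch below shows that common Arabic search-normalisation technique in Python; it is an assumption about how a downstream consumer might work, not a description of CallScribe's internals, and `loose_search_key` is a hypothetical name.

```python
# Fold variants that casual typists often interchange. The stored transcript
# keeps its canonical spelling; only the search key is folded.
FOLD = str.maketrans({
    "\u0623": "\u0627",  # alif with hamza above -> bare alif
    "\u0625": "\u0627",  # alif with hamza below -> bare alif
    "\u0622": "\u0627",  # alif with madda above -> bare alif
    "\u0629": "\u0647",  # ta' marbuta -> ha'
    "\u0649": "\u064A",  # alif maqsura -> ya'
})


def loose_search_key(text: str) -> str:
    """Return a folded copy of `text` for recall-oriented matching."""
    return text.translate(FOLD)


canonical = "\u0645\u062F\u0631\u0633\u0629"    # "school", spelled with ta' marbuta
typed_query = "\u0645\u062F\u0631\u0633\u0647"  # same word typed with final ha'
print(loose_search_key(canonical) == loose_search_key(typed_query))  # True
```

Folding loses information, which is why it belongs in the index rather than in the transcript itself.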
Where MSA fails: spontaneous speech misclassified as MSA
The most common MSA-related accuracy problem is not the MSA model itself but mislabelling. A spontaneous Saudi or Egyptian phone call routed to the MSA model produces visibly worse transcripts than the matching dialect-aware model would. CallScribe's dialect-ID probe routes spontaneous calls to the appropriate dialect-aware model and sends only audio that is acoustically MSA to the MSA model. If you know your audio is MSA (a recorded IVR prompt or a broadcast clip, for example), you can lock the project to MSA and skip the probe.
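The routing decision described here can be sketched in a few lines of Python. Everything named in the block (`Project`, `locked_model`, `choose_model`, and the `probe` callable) is hypothetical and exists only to illustrate the logic; this page does not document CallScribe's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# The four models named on this page.
MODELS = {"msa", "khaleeji", "levantine", "egyptian"}


@dataclass
class Project:
    # Hypothetical setting: None means "let the dialect-ID probe decide".
    locked_model: Optional[str] = None


def choose_model(project: Project, audio: bytes,
                 probe: Callable[[bytes], str]) -> str:
    """Pick a model name, calling the probe only when the project is not locked."""
    if project.locked_model is not None:
        if project.locked_model not in MODELS:
            raise ValueError(f"unknown model: {project.locked_model!r}")
        return project.locked_model  # probe skipped entirely

    label = probe(audio)
    if label not in MODELS:
        raise ValueError(f"probe returned unknown label: {label!r}")
    return label


# Stand-in probe for the example; a real one would inspect the audio.
def fake_probe(audio: bytes) -> str:
    return "egyptian"


print(choose_model(Project(locked_model="msa"), b"", fake_probe))  # msa
print(choose_model(Project(), b"", fake_probe))                    # egyptian
```

Locking is only safe when the audio is known to be scripted; a locked project never gets a second opinion from the probe.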
Code-switching and English in MSA contexts
Even formal MSA speakers code-switch, especially in business and technical contexts ("الذكاء الاصطناعي AI", "نموذج الـ machine learning"). The MSA model is bilingual-ready: English terms within an MSA stream are preserved in Latin script. Numbers, brand names, and product SKUs follow the same rule.
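Because English terms arrive in Latin script, a downstream consumer can separate the two languages by script alone. The Python sketch below tags each whitespace-separated token using the standard `unicodedata` module; the function names are hypothetical and the first-letter heuristic is an assumption that suits simple cases only.

```python
import unicodedata


def script_of(token: str) -> str:
    """Classify a token by the script of its first alphabetic character."""
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("ARABIC"):
                return "arabic"
            if name.startswith("LATIN"):
                return "latin"
    return "other"  # digits, punctuation, or an unhandled script


def tag_tokens(transcript: str) -> list[tuple[str, str]]:
    """Split on whitespace and pair every token with its script label."""
    return [(token, script_of(token)) for token in transcript.split()]


for token, script in tag_tokens("نموذج الـ machine learning"):
    print(script, token)
```

A tagger like this makes it straightforward to send the Latin-script spans to an English spell-checker or glossary while leaving the Arabic text alone.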
At a glance
- ✓ 4-7% WER on clean broadcast MSA
- ✓ Tashkīl-marked output available
- ✓ Arabic-Indic and European number formats
- ✓ Best for IVR, broadcast, scripted compliance audio
- ✓ Dialect-ID probe skipped when the project is locked to MSA
FAQs
Should I use the MSA model for customer calls?
Almost never. Real customer calls are spontaneous and dialectal, even when speakers think they are speaking MSA. Route customer audio to the auto-detect dialect router; reserve the MSA model for genuinely scripted content like IVR, recorded disclosures, training videos, and broadcast clips.
Can I get fully-vowelled MSA output?
Yes — enable tashkīl output in project settings. We use a separate diacritisation model that runs after transcription. Note that fully-marked output is not always desirable: most Arab readers prefer unmarked text, and downstream NLP tools may not expect tashkīl.
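If tashkīl output is enabled but one downstream tool expects unmarked text, the marks can be stripped again on the consumer side. This is a minimal Python sketch under the assumption that the marks of interest are the standard harakat in U+064B to U+0652 plus the superscript alif (U+0670); `strip_tashkil` is a hypothetical helper, not a CallScribe function.

```python
import re

# U+064B..U+0652 covers fathatan through sukun (including shadda);
# U+0670 is the superscript ("dagger") alif.
TASHKIL = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkil(text: str) -> str:
    """Remove short-vowel and related diacritics, leaving base letters intact."""
    return TASHKIL.sub("", text)


marked = "\u0643\u064E\u062A\u064E\u0628\u064E"  # fully vowelled "kataba"
print(strip_tashkil(marked) == "\u0643\u062A\u0628")  # True
```

Stripping is lossless in one direction only: the unmarked form cannot be turned back into the vowelled one without a diacritisation model.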
Does CallScribe handle Quranic recitation?
Quranic recitation is a separate domain — it follows tajwīd rules with its own phonological conventions. While our MSA model can transcribe it, accuracy and orthographic fidelity are not benchmarked for liturgical use. We do not recommend CallScribe as a primary tool for muṣḥaf-grade transcription.
How does MSA pricing differ from dialect pricing?
It does not. All four models (MSA, Khaleeji, Levantine, Egyptian) are included in the Business and Scale tiers at the same flat rate. The free Starter tier also includes MSA on the same terms as the dialect models.
Will the MSA model misread Quranic verses in a sermon?
Quranic insertion in a sermon is common. The MSA model recognises high-frequency Quranic phrasing reasonably well but does not apply tajwīd-specific orthography. For sermon transcription, accept that Quranic quotes will appear in standard MSA orthography rather than in muṣḥaf form.
How does the MSA model compare to vanilla Whisper out of the box?
OpenAI's open-source Whisper has known weaknesses on MSA, particularly on hamza placement and tā' marbūṭa. CallScribe is built on Whisper large-v3-turbo with MSA-specific fine-tuning and a post-processing orthography normaliser that brings WER down by 2-4 percentage points relative to vanilla Whisper.