How accurate is CallScribe for Gulf Arabic?

CallScribe uses Whisper large-v3-turbo optimized for Arabic dialects. Internal testing across 200+ Gulf Arabic call recordings (March 2026) shows 85-95% word-level accuracy for Khaleeji dialect with clear audio (SNR > 15 dB).

Is my call data private with CallScribe?

Yes. CallScribe processes everything on your own server. No audio, transcripts, or metadata are sent to external servers. Zero external API calls during transcription.

What file formats does CallScribe support?

CallScribe accepts MP3, WAV, M4A, FLAC, OGG, and WebM audio files. Export formats include PDF, CSV, TXT, DOCX, and SRT for subtitle workflows.

How much does CallScribe cost?

CallScribe offers a free Starter plan (5 min/month), Business plan at $29/month (500 min), and Scale plan at $79/month (3000 min). No per-minute charges.

Does CallScribe support Khaleeji dialect?

Yes. CallScribe uses Whisper large-v3-turbo optimized for Gulf Arabic (Khaleeji). Internal testing across 200+ call recordings (March 2026 benchmark) shows 85-95% word-level accuracy with clear audio (SNR > 15 dB).

Can CallScribe handle code-switching between Arabic and English?

Yes. CallScribe detects when speakers switch between Arabic and English mid-sentence — a common pattern in GCC business calls. Both languages are transcribed accurately.

Is CallScribe GDPR compliant?

Yes. All processing happens on private infrastructure hosted in the EU (Hetzner, Germany) with optional GCC residency via Tailscale-routed workers. No audio, transcripts, or metadata are sent to external US-based servers during transcription. CallScribe publishes a Data Processing Agreement (DPA) covering GDPR Article 28 processor obligations, sub-processor disclosures (Stripe for billing, Resend for transactional email, Sentry for error telemetry), 72-hour breach notification, and data deletion on termination. The platform is also aligned with UAE PDPL data sovereignty requirements. Data Subject Access Requests can be submitted to privacy@callscribe.ae and are answered within 30 days.

What dialects of Arabic does CallScribe transcribe most accurately?

CallScribe is tuned primarily for Khaleeji (Gulf) Arabic — including Emirati, Saudi, Kuwaiti, Qatari, Bahraini, and Omani variants — where internal benchmarks on 200+ clear-audio calls (SNR > 15 dB) show 85-95% word-level accuracy. Levantine Arabic (Lebanese, Syrian, Palestinian, Jordanian) lands around 84-88%. Egyptian Arabic reaches 86-90%. Modern Standard Arabic (MSA), common in news and formal recordings, reaches 91-94%. Maghrebi dialects (Moroccan, Algerian, Tunisian) are not officially supported in this release. Accuracy drops on heavily overlapping speakers, poor SNR below 10 dB, or long-form mumbling. See the /model-card page for the full methodology and per-dialect WER table.

How does CallScribe compare to AWS Transcribe and Google Speech-to-Text for Arabic?

AWS Transcribe and Google Speech-to-Text both support Arabic but are tuned primarily toward Modern Standard Arabic, with limited coverage of Gulf, Levantine, and Egyptian dialects. In internal comparisons on Khaleeji call center recordings, both vendors produced 15-25% higher word error rates than CallScribe's Whisper large-v3-turbo pipeline. CallScribe also processes audio on private infrastructure in the EU or on GCC-resident workers — AWS and Google route audio through US regions by default, which is a compliance blocker for UAE PDPL and many GCC enterprise procurement policies. Pricing is flat-rate per minute bucket instead of per-second billing, which tends to save 30-60% at call center volume.

Can I deploy CallScribe on my own infrastructure?

Yes. CallScribe offers a self-hosted deployment option for Scale-tier customers and enterprise buyers. The stack runs entirely on Docker Compose: a Fastify API, a Python worker running Whisper large-v3-turbo and pyannote.audio for diarization, PostgreSQL, Redis, and nginx. Typical hardware: a single GPU worker (RTX 4090 or L4) handles up to 10x realtime throughput. The API tier runs comfortably on a 4-core VPS. Tailscale is used to connect worker nodes back to the control plane over a private mesh, so the GPU host can live in your own rack while the API stays in the cloud. Contact sales@callscribe.ae for a self-host deployment guide and license terms.

What is the turnaround time for a 1-hour call?

On the shared Business tier, a 1-hour call typically completes in 4-8 minutes end-to-end — including upload, transcription with Whisper large-v3-turbo, speaker diarization via pyannote, sentiment analysis, and audio quality scoring. Scale tier customers get priority queue placement and usually see 2-4 minutes for the same file. Self-hosted deployments on an RTX 4090 consistently process 60 minutes of audio in under 3 minutes (over 20x realtime). Queue wait time is the largest variable during peak hours on the free tier. WebSocket progress updates stream live from the worker so users see per-file percent-complete rather than a silent spinner.
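The turnaround figures above can be read as an effective realtime factor: minutes of audio divided by minutes of wall-clock time. A minimal sketch of that arithmetic follows; `realtime_factor` is a hypothetical helper name, and the hosted-tier numbers include upload and queueing, so they are end-to-end factors rather than raw model throughput.

```python
def realtime_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Minutes of audio handled per minute of wall-clock time."""
    if processing_minutes <= 0:
        raise ValueError("processing_minutes must be positive")
    return audio_minutes / processing_minutes


if __name__ == "__main__":
    # Turnaround figures quoted in the text for a 60-minute call.
    quoted = [
        ("Business tier, slow end", 8),
        ("Business tier, fast end", 4),
        ("Scale tier, fast end", 2),
        ("Self-hosted RTX 4090, upper bound", 3),
    ]
    for label, minutes in quoted:
        print(f"{label}: {realtime_factor(60, minutes):.1f}x realtime")
```

Running the script prints one line per scenario, which makes it easy to compare the quoted tiers on a single scale.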

How does CallScribe handle noisy call center recordings?

Call center audio is rarely clean: hold music bleed, codec artifacts, echo, cross-talk, and background PA announcements are all common. CallScribe runs a pre-processing pipeline that analyzes signal-to-noise ratio (SNR), root-mean-square loudness, and speech activity before transcription, then reports a per-file audio quality score so users know how much to trust the transcript. For SNR above 15 dB, word-level accuracy stays in the 85-95% band. Between 10 and 15 dB it degrades to the high 70s (percent). Below 10 dB, CallScribe flags the file as low-confidence and recommends re-recording or applying an external denoiser before retrying. Overlapping speakers are split via pyannote diarization, not just channel separation, so mono call recordings still work.
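To make the SNR bands concrete, here is a rough sketch of how a pre-transcription check of this kind could be written. It is not CallScribe's pipeline: the frame-energy SNR estimate, the function names, the 400-sample frame length, and the band labels are all assumptions for illustration. Only the 15 dB and 10 dB thresholds come from the text.

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dBFS for samples scaled to the range [-1.0, 1.0]."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value so digital silence does not produce log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-12))


def estimate_snr_db(samples: Sequence[float], frame_len: int = 400,
                    noise_fraction: float = 0.2) -> float:
    """Crude SNR estimate (an assumption, not the vendor's method):
    treat the quietest frames as noise and the loudest as speech."""
    frames: List[Sequence[float]] = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if len(frames) < 2:
        raise ValueError("need at least two full frames")
    levels = sorted(rms_dbfs(frame) for frame in frames)
    k = max(1, int(len(levels) * noise_fraction))
    noise_db = sum(levels[:k]) / k
    speech_db = sum(levels[-k:]) / k
    return speech_db - noise_db


def confidence_band(snr_db: float) -> str:
    """Map SNR to the three bands described in the text."""
    if snr_db > 15.0:
        return "high"      # 85-95% accuracy band
    if snr_db >= 10.0:
        return "degraded"  # accuracy in the high 70s
    return "low"           # flagged as low-confidence
```

A production check would normally use a proper voice-activity detector rather than sorting frame energies, but the thresholding step would look much the same.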

What is CallScribe?

CallScribe is a platform that converts voice calls into written text, designed specifically for Arabic-speaking markets. It supports the Gulf, Levantine, and Egyptian dialects with 85-95% accuracy, in addition to English, Urdu, and Hindi.

Does it support the Gulf dialect?

Yes. CallScribe is optimized for the Gulf dialect. We use the Whisper large-v3-turbo model, adapted for real-world Arabic conversations. Internal testing on more than 200 Gulf calls showed 85-95% accuracy.

How much does the service cost?

Free plan: 30 minutes per month. Business plan: $29 per month with 500 minutes. Scale plan: $79 per month with 3,000 minutes.

Is my data secure?

Yes. All processing takes place on private infrastructure. No audio files or transcripts are sent to external servers. The service is compliant with GDPR and with data sovereignty requirements in the Gulf countries.

Does it support switching between Arabic and English?

Yes. CallScribe automatically detects when a speaker switches between Arabic and English within the same sentence, a common pattern in business calls in the Gulf countries.

Maghrebi Arabic call transcription

Moroccan, Algerian, Tunisian, and Libyan dialects — phonologically distant from MSA and underserved by mainstream ASR.

Last updated: April 2026

Maghrebi Arabic — the Darija cluster spoken across Morocco, Algeria, Tunisia, and Libya — is the dialect family that mainstream commercial ASR handles worst. The phonological distance from Modern Standard Arabic is large enough that linguists routinely debate whether Darija is one language or several. For GCC contact centres, Maghrebi shows up on the small but operationally meaningful slice of inbound calls from Maghrebi expatriates, North African travellers, and outbound calls into the Maghreb from regional businesses headquartered in the Gulf. CallScribe ships a dedicated Maghrebi model rather than letting a Khaleeji-tuned acoustic model collapse on Casablanca speech.

What makes Maghrebi acoustically distant from Khaleeji and MSA

The single most consequential Maghrebi feature for ASR is short-vowel deletion. Where MSA writes "kataba" (he wrote), Moroccan Darija reduces this to "ktib" or just "kteb" — collapsing the verb into a consonant cluster that mainstream Arabic acoustic models, trained on vowel-rich MSA and Mashriqi dialects, cannot decode. Algerian and Tunisian Arabic apply similar reductions. The result is dense word-initial and word-medial consonant clusters ("ktbt" for "I wrote", "shrbt" for "I drank") that are phonotactically illegal in Eastern Arabic varieties.

Phoneme inventory diverges further. Moroccan Darija realises MSA ق (qāf) variably as [q], [ɡ], or [ʔ] depending on lexical item and region — a single speaker may use all three within one utterance. The MSA ث (thāʼ) merges with [t] in urban speech and with [s] in some rural varieties; ذ (dhāl) merges with [d] or [z]. Tunisian Arabic preserves [q] more reliably than Moroccan but introduces its own innovations. Libyan dialect, sometimes treated as a transition between Maghrebi and Egyptian, sits acoustically closer to Egyptian than to Moroccan but retains Maghrebi vowel-deletion patterns.

Stress placement is the third axis of divergence. Maghrebi Arabic uses penultimate stress in many environments where Mashriqi varieties stress the antepenult; this reshapes the spectral envelope of multisyllabic words enough that an MSA-tuned model misaligns word boundaries. CallScribe's Maghrebi acoustic adapter is trained on telephony-grade Maghrebi speech rather than on broadcast MSA — see [/dialects/msa](/dialects/msa) for why MSA-tuned models fail on Maghrebi audio that they handle well in formal Eastern dialects.

Lexical strata: Berber/Tamazight, French, Spanish, Italian

Maghrebi Arabic carries a Berber/Tamazight substrate that no other Arabic variety has. Toponyms, plant and animal names, agricultural terminology, and culturally rooted vocabulary in Moroccan and Algerian Arabic come from Tamazight rather than Arabic. A Casablanca speaker discussing a domestic issue may use words like "ttsa" (twin) or place names with Berber roots that an Arabic-only language model cannot resolve. CallScribe's Maghrebi lexicon includes the most frequent Berber-origin terms in colloquial use.

Above the Berber substrate sits a heavy Romance-language superstrate. Moroccan and Algerian Darija borrow extensively from French — not just nouns ("la voiture", "le téléphone") but full functional phrases ("c'est bon", "d'accord") and code-switching at clause boundaries. Tunisian Arabic borrows from French and Italian. Libyan Arabic carries some Italian loans from the colonial period. A typical Maghrebi customer-service call alternates Arabic and French at a rate seen nowhere else in the Arab world — even Lebanese Arabic, with its famous French code-switching, switches less densely than Moroccan Darija. CallScribe handles Arabic-French code-switching at the segment level and preserves French in Latin script in the transcript.
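The output convention described above (Arabic kept in Arabic script, French kept in Latin script) can be illustrated on a finished transcript. The sketch below groups whitespace-separated tokens into same-script runs using Unicode character names. It is a text-side illustration only, not the acoustic code-switch detection itself, and `script_of` and `script_runs` are hypothetical helper names.

```python
import unicodedata
from typing import List, Tuple


def script_of(token: str) -> str:
    """Classify a token by the script of its first alphabetic character."""
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("ARABIC"):
                return "arabic"
            if name.startswith("LATIN"):
                return "latin"
    return "other"


def script_runs(text: str) -> List[Tuple[str, str]]:
    """Group consecutive tokens that share a script into runs."""
    runs: List[Tuple[str, List[str]]] = []
    for token in text.split():
        script = script_of(token)
        # Digits and punctuation-only tokens join the run they follow.
        if script == "other" and runs:
            script = runs[-1][0]
        if runs and runs[-1][0] == script:
            runs[-1][1].append(token)
        else:
            runs.append((script, [token]))
    return [(script, " ".join(tokens)) for script, tokens in runs]


if __name__ == "__main__":
    # Illustrative mixed Darija/French sentence, written in both scripts.
    for script, segment in script_runs("أنا غادي نشوف le médecin"):
        print(script, "|", segment)
```

Grouping by run rather than tagging each token keeps short French phrases such as "d'accord" or "c'est bon" together as one segment.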

Why mainstream ASR fails on Maghrebi

Maghrebi training data is scarce relative to Eastern Arabic varieties. The Whisper paper (Radford et al., 2022) reports Arabic word-error rate as a single aggregate number, but breakdowns from independent benchmarks consistently show 15-25 percentage points higher WER on Moroccan Darija than on Egyptian or Levantine for the same model. Commodity ASR APIs — Google, AWS, Azure — typically claim "Arabic" support and quietly mean "MSA-with-some-Egyptian-coverage". Operations leaders running calls into the Maghreb on those tools see transcripts that look like word salad and assume the audio is unusable. The audio is fine; the model is wrong for the dialect.

CallScribe's positioning here is direct: we transcribe the dialect family that mainstream Arabic ASR effectively cannot. Our Maghrebi WER on clear telephony audio is 16-22% — substantially worse than our 8-12% Khaleeji number, but substantially better than the 30-45% you typically see on commodity ASR with Maghrebi audio. We are honest about the gap because the alternative is unusable rather than imperfect.
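For readers comparing these figures, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch of that calculation follows. It assumes plain whitespace tokenisation with no text normalisation; the source does not describe CallScribe's own scoring setup, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "ana ghadi nshouf le médecin"
    hypothesis = "ana ghadi nchouf le médecin"
    print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

Because insertions count as errors, WER can exceed 100% on very poor transcripts, which is one reason it is not simply the complement of a word-level accuracy figure.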

Where Maghrebi shows up in GCC operations

Three patterns. First, Maghrebi expatriates in the GCC — particularly Moroccans and Tunisians in the UAE, Qatar, and Saudi Arabia — use Darija on personal calls but shift toward MSA or Khaleeji on formal calls; mixed-register speech is common. Second, GCC businesses with Maghreb operations (banks with Casablanca branches, telcos with Algerian or Tunisian subsidiaries, hotels with North African clientele) handle outbound and inbound calls in Maghrebi dialect. Third, regional brand campaigns and customer research occasionally surface Maghrebi voice content for sentiment and topic analysis — see our [/use-cases/coaching](/use-cases/coaching) workflow for how we structure dialect-tagged coaching evidence packs.

CallScribe accuracy and limits on Maghrebi

Word-error rate on our internal Maghrebi telephony benchmark is 16-22% for clear single-speaker Moroccan or Algerian audio, 14-20% for Tunisian (which sits closer to Mashriqi norms), and 18-26% for noisy multi-party calls with French code-switching. Libyan dialect, where coverage is sparsest, sits at 22-30%. We treat Maghrebi as a dialect family rather than a single homogeneous dialect: the country tag (MA, DZ, TN, LY) biases the lexicon and applies country-specific French-borrowing patterns. We do not currently support Hassaniya (Mauritanian Arabic); that is a separate engagement.

At a glance

✓ 16-22% WER on clear Maghrebi telephony
✓ Country-conditioned: MA, DZ, TN, LY
✓ Arabic-French code-switching at segment level
✓ Berber/Tamazight substrate vocabulary supported
✓ Honest WER reporting: we tell you the gap

FAQs

Why is Maghrebi WER higher than Khaleeji or Levantine?

Maghrebi training data is roughly an order of magnitude scarcer than Eastern Arabic data. Commercial ASR vendors do not invest heavily in Maghrebi because the market is smaller; academic corpora are limited. CallScribe collected Maghrebi telephony data specifically to close this gap, but the gap to Khaleeji remains roughly 6-10 percentage points of WER. We disclose this rather than aggregate-average it away.

Does CallScribe handle Moroccan-French code-switching automatically?

Yes — French is auto-detected at the segment level. Moroccan and Algerian Darija often switch mid-clause ("ana ghadi nshouf le médecin"); we transcribe Arabic in Arabic script and French in Latin script with no transliteration loss. Numbers in French ("vingt-cinq dirham") are preserved verbatim with optional downstream MSA normalisation.

Is Tunisian Arabic acoustically closer to Maghrebi or Mashriqi?

Tunisian sits in between. It preserves more of the MSA phoneme inventory than Moroccan or Algerian (qāf is more often [q], short vowels are less aggressively deleted) but shares the Maghrebi penultimate-stress preference and French/Italian code-switching norms. Our Tunisian (TN) variant performs slightly better than Moroccan because of these convergences with Mashriqi training data.

Can I lock a project to Maghrebi to skip dialect detection?

Yes — set the project dialect to "maghrebi" and optionally pin the country (MA, DZ, TN, LY). The dialect-ID probe is bypassed and audio is routed directly to the Maghrebi acoustic adapter. This is recommended for outbound campaigns into the Maghreb where you know all calls will be Maghrebi.
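The routing behaviour described above can be pictured with a short sketch. Everything in it is hypothetical: `ProjectSettings`, `Route`, `choose_route`, and the probe callback are illustrative names, not CallScribe's API. The only details taken from the text are the "maghrebi" dialect value, the four country tags, and the fact that pinning a dialect bypasses the dialect-ID probe.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAGHREBI_COUNTRIES = {"MA", "DZ", "TN", "LY"}


@dataclass(frozen=True)
class ProjectSettings:
    dialect: Optional[str] = None  # None means "auto-detect per call"
    country: Optional[str] = None  # optional country pin, e.g. "MA"


@dataclass(frozen=True)
class Route:
    adapter: str
    country: Optional[str]
    probe_used: bool


def choose_route(settings: ProjectSettings,
                 detect_dialect: Callable[[], str]) -> Route:
    """Pick an acoustic adapter, running the probe only when unpinned."""
    if settings.dialect is None:
        # No pin: fall back to the (hypothetical) dialect-ID probe.
        return Route(adapter=detect_dialect(), country=None, probe_used=True)
    if settings.dialect == "maghrebi" and settings.country is not None:
        if settings.country not in MAGHREBI_COUNTRIES:
            raise ValueError(f"unsupported Maghrebi country: {settings.country}")
    # Pinned: route directly and never call the probe.
    return Route(adapter=settings.dialect, country=settings.country,
                 probe_used=False)


if __name__ == "__main__":
    pinned = ProjectSettings(dialect="maghrebi", country="MA")
    print(choose_route(pinned, detect_dialect=lambda: "khaleeji"))
```

Passing the probe in as a callback makes the bypass explicit: when a dialect is pinned, the callback is simply never invoked.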

Does CallScribe support Hassaniya (Mauritanian) Arabic?

Not currently. Hassaniya is a separate Bedouin-rooted variety with limited training data and limited GCC operational relevance. Talk to sales@callscribe.ae if you have a specific Mauritanian use case — we evaluate dialect coverage requests based on volume.

How does Maghrebi compare to MSA for formal North African content?

Formal Maghrebi broadcast content (Moroccan or Algerian state TV, formal press releases) is closer to MSA than to spontaneous Darija and is best transcribed with our [/dialects/msa](/dialects/msa) model. Spontaneous customer calls, regardless of register, route to the Maghrebi adapter because even "formal" Maghrebi speech retains Maghrebi phonological reductions.


More dialects

Khaleeji (Gulf Arabic) · Levantine (Shami) · Egyptian Arabic (Masri) · Modern Standard Arabic (MSA) · Iraqi Arabic (Mesopotamian)