How accurate is CallScribe for Gulf Arabic?

CallScribe uses Whisper large-v3-turbo, optimized for Arabic dialects. Internal testing on 200+ Gulf Arabic call recordings (March 2026) shows 85-95% word-level accuracy for Khaleeji Arabic on clear audio (SNR above 15 dB).
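Word-level accuracy figures like these are conventionally 1 minus the word error rate (WER). WER counts the word substitutions, deletions, and insertions needed to turn the transcript into a human reference, divided by the number of reference words. Below is a minimal sketch that assumes that definition, since the FAQ does not spell out CallScribe's exact scoring. The helper names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and the
    # first j hypothesis words (word-level Levenshtein, one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy as 1 - WER, clamped at 0 because WER can exceed 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

Real Arabic scoring usually normalizes text before comparing it, for example by unifying alef variants and stripping diacritics, so that spelling variation is not counted as an error. This sketch only splits on whitespace.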

Is my call data private with CallScribe?

Yes. CallScribe processes everything on private infrastructure, or on your own servers if you use the self-hosted deployment. No audio, transcripts, or metadata are sent to external servers, and transcription makes zero external API calls.

What file formats does CallScribe support?

CallScribe accepts MP3, WAV, M4A, FLAC, OGG, and WebM audio files. Export formats include PDF, CSV, TXT, DOCX, and SRT for subtitle workflows.
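SRT, one of the export formats listed above, is a plain-text subtitle format. It consists of numbered cues, and each cue has an HH:MM:SS,mmm --> HH:MM:SS,mmm time range followed by the cue text. Cues are separated by blank lines. Below is a minimal sketch of rendering timed transcript segments as SRT. The (start_seconds, end_seconds, text) segment shape and the function names are assumptions, not CallScribe's export code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)
```

For example, to_srt([(0.0, 2.5, "مرحبا"), (2.5, 4.0, "okay, no problem")]) produces two cues, the first timed 00:00:00,000 --> 00:00:02,500.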

How much does CallScribe cost?

CallScribe offers a free Starter plan (5 min/month), a Business plan at $29/month (500 min), and a Scale plan at $79/month (3,000 min). There are no per-minute charges.
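The plans are flat monthly buckets, so the effective price per minute depends on how much of the bucket you use. Below is a small sketch of picking a plan for a given monthly volume. The prices and buckets come from the answer above. The selection rule is an assumption: it takes the cheapest bucket that covers the volume and assumes no overage.

```python
# Plan table copied from the pricing answer above: (name, USD per month, minutes per month).
PLANS = [
    ("Starter", 0, 5),
    ("Business", 29, 500),
    ("Scale", 79, 3000),
]


def cheapest_plan(monthly_minutes: int) -> tuple[str, float]:
    """Pick the cheapest plan whose bucket covers the volume.

    Returns the plan name and the effective USD cost per minute actually used.
    """
    if monthly_minutes < 0:
        raise ValueError("minutes must be non-negative")
    for name, price, included in sorted(PLANS, key=lambda plan: plan[1]):
        if monthly_minutes <= included:
            per_minute = price / monthly_minutes if monthly_minutes else 0.0
            return name, per_minute
    raise ValueError("volume exceeds the largest listed bucket")
```

At full utilization, this works out to $0.058 per minute on Business and about $0.026 per minute on Scale.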

Does CallScribe support Khaleeji dialect?

Yes. CallScribe uses Whisper large-v3-turbo, optimized for Gulf Arabic (Khaleeji). In a March 2026 internal benchmark on 200+ call recordings, it reached 85-95% word-level accuracy on clear audio (SNR above 15 dB).

Can CallScribe handle code-switching between Arabic and English?

Yes. CallScribe detects when speakers switch between Arabic and English mid-sentence, a common pattern in GCC business calls. It transcribes both languages as spoken, keeping English words in English rather than rendering them as phonetic Arabic.

Is CallScribe GDPR compliant?

Yes. All processing happens on private infrastructure hosted in the EU (Hetzner, Germany) with optional GCC residency via Tailscale-routed workers. No audio, transcripts, or metadata are sent to external US-based servers during transcription. CallScribe publishes a Data Processing Agreement (DPA) covering GDPR Article 28 processor obligations, sub-processor disclosures (Stripe for billing, Resend for transactional email, Sentry for error telemetry), 72-hour breach notification, and data deletion on termination. The platform is also aligned with UAE PDPL data sovereignty requirements. Data Subject Access Requests can be submitted to privacy@callscribe.ae and are answered within 30 days.

What dialects of Arabic does CallScribe transcribe most accurately?

CallScribe is tuned primarily for Khaleeji (Gulf) Arabic, including the Emirati, Saudi, Kuwaiti, Qatari, Bahraini, and Omani variants. On this dialect group, internal benchmarks on 200+ clear-audio calls (SNR above 15 dB) show 85-95% word-level accuracy. Levantine Arabic (Lebanese, Syrian, Palestinian, Jordanian) lands around 84-88%. Egyptian Arabic reaches 86-90%. Modern Standard Arabic (MSA), common in news and formal recordings, reaches 91-94%. Maghrebi dialects (Moroccan, Algerian, Tunisian) are not officially supported in this release. Accuracy drops on heavily overlapping speech, on audio with SNR below 10 dB, and on extended mumbled speech. See the /model-card page for the full methodology and per-dialect WER table.

How does CallScribe compare to AWS Transcribe and Google Speech-to-Text for Arabic?

AWS Transcribe and Google Speech-to-Text both support Arabic, but they are tuned primarily toward Modern Standard Arabic and have limited coverage of Gulf, Levantine, and Egyptian dialects. In internal comparisons on Khaleeji call center recordings, both vendors produced 15-25% higher word error rates than CallScribe's Whisper large-v3-turbo pipeline. CallScribe also processes audio on private infrastructure in the EU or on GCC-resident workers. AWS and Google, by contrast, route audio through US regions by default, which is a compliance blocker for UAE PDPL and for many GCC enterprise procurement policies. CallScribe pricing uses flat monthly minute buckets rather than per-second billing, which tends to save 30-60% at call center volume.

Can I deploy CallScribe on my own infrastructure?

Yes. CallScribe offers a self-hosted deployment option for Scale-tier customers and enterprise buyers. The stack runs entirely on Docker Compose: a Fastify API, a Python worker running Whisper large-v3-turbo and pyannote.audio for diarization, PostgreSQL, Redis, and nginx. Typical hardware: a single GPU worker (RTX 4090 or L4) handles at least 10x realtime throughput, and the API tier runs comfortably on a 4-core VPS. Tailscale connects worker nodes back to the control plane over a private mesh, so the GPU host can live in your own rack while the API stays in the cloud. Contact sales@callscribe.ae for a self-host deployment guide and license terms.
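Whisper and pyannote.audio produce two separate timelines: text segments with start and end times, and speaker turns with start and end times. A common way to join them is to label each text segment with the speaker whose turns overlap it the most. The sketch below shows that approach under assumptions; it is not CallScribe's actual worker code. Plain tuples stand in for the real Whisper and pyannote result objects, and the function names are hypothetical.

```python
from collections import defaultdict


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each (start, end, text) segment with the speaker of maximal total overlap.

    segments: list of (start, end, text) from the transcription step.
    turns: list of (start, end, speaker) from the diarization step.
    Segments that no turn overlaps get the label "UNKNOWN".
    """
    labelled = []
    for seg_start, seg_end, text in segments:
        totals = defaultdict(float)
        for turn_start, turn_end, speaker in turns:
            totals[speaker] += overlap(seg_start, seg_end, turn_start, turn_end)
        best = max(totals, key=totals.get, default=None)
        if best is None or totals[best] == 0.0:
            best = "UNKNOWN"
        labelled.append((seg_start, seg_end, best, text))
    return labelled
```

This approach uses the largest total overlap rather than the speaker active at the segment's midpoint. That makes it more forgiving when a diarization boundary falls slightly inside a text segment.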

What is the turnaround time for a 1-hour call?

On the shared Business tier, a 1-hour call typically completes in 4-8 minutes end-to-end — including upload, transcription with Whisper large-v3-turbo, speaker diarization via pyannote, sentiment analysis, and audio quality scoring. Scale tier customers get priority queue placement and usually see 2-4 minutes for the same file. Self-hosted deployments on an RTX 4090 consistently process 60 minutes of audio in under 3 minutes (over 20x realtime). Queue wait time is the largest variable during peak hours on the free tier. WebSocket progress updates stream live from the worker so users see per-file percent-complete rather than a silent spinner.
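"Nx realtime" here means audio duration divided by wall-clock processing time. Below is a quick sketch of that arithmetic. It also shows one way a worker could compute the per-file percent-complete mentioned above; the progress formula (seconds of audio processed divided by total seconds) is an assumption, and the helper names are hypothetical.

```python
def realtime_multiple(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than realtime a job ran, e.g. 3600 s of audio in 180 s -> 20.0."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


def percent_complete(processed_audio_seconds: float, total_audio_seconds: float) -> int:
    """Whole-number progress for a status update, capped at 100."""
    if total_audio_seconds <= 0:
        raise ValueError("total audio duration must be positive")
    return min(100, int(100 * processed_audio_seconds / total_audio_seconds))
```

By the same arithmetic, the Business-tier figure of 4-8 minutes for a 1-hour call is 7.5x to 15x realtime end to end. The Scale-tier figure of 2-4 minutes is 15x to 30x. Both figures include upload and queueing.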

How does CallScribe handle noisy call center recordings?

Call center audio is rarely clean: hold music bleed, codec artifacts, echo, cross-talk, and background PA announcements are all common. CallScribe runs a pre-processing pipeline that analyzes signal-to-noise ratio (SNR), root-mean-square (RMS) level, and speech activity before transcription. It then reports a per-file audio quality score so users know how much to trust the transcript. For SNR above 15 dB, accuracy stays in the 85-95% band. Between 10 and 15 dB, it degrades to the high 70s. Below 10 dB, CallScribe flags the file as low-confidence and recommends re-recording or applying an external denoiser before retrying. Overlapping speakers are separated via pyannote diarization rather than channel separation alone, so mono call recordings still work.
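The thresholds above map directly onto a tiering rule. The sketch below estimates SNR by comparing the RMS level of frames marked as speech with the RMS level of frames marked as non-speech. This is a common rough approximation; CallScribe's actual estimator is not documented here. The sketch then applies the 15 dB and 10 dB cut-offs. The speech mask is assumed to come from a voice-activity detector, and the tier names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def estimate_snr_db(frames, is_speech):
    """Rough SNR in dB from speech frames vs. non-speech (noise-only) frames.

    frames: list of frames, each a list of float samples.
    is_speech: one boolean per frame, e.g. from a voice-activity detector.
    Speech frames also contain noise, so this overstates the true SNR,
    most noticeably when the SNR is low.
    """
    speech = [x for frame, flag in zip(frames, is_speech) if flag for x in frame]
    noise = [x for frame, flag in zip(frames, is_speech) if not flag for x in frame]
    if not speech or not noise:
        raise ValueError("need both speech and non-speech frames")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return math.inf
    # RMS is an amplitude measure, so the ratio converts to decibels with 20*log10.
    return 20 * math.log10(rms(speech) / noise_rms)


def quality_tier(snr_db):
    """Map an SNR estimate onto the three bands described above."""
    if snr_db > 15:
        return "high"            # FAQ: 85-95% word-level accuracy
    if snr_db >= 10:
        return "degraded"        # FAQ: high-70s accuracy
    return "low-confidence"      # FAQ: flag, suggest re-recording or denoising
```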

What is CallScribe?

CallScribe is a platform that converts voice calls into written text, designed specifically for Arabic-speaking markets. It supports Gulf, Levantine, and Egyptian Arabic, as well as English, Urdu, and Hindi, with 85-95% word-level accuracy on clear Gulf Arabic audio.

Does it support Gulf Arabic?

Yes. CallScribe is optimized for Gulf Arabic. We use a Whisper large-v3-turbo model adapted for real-world Arabic conversations. Internal testing on more than 200 Gulf Arabic calls showed 85-95% accuracy.

How much does the service cost?

Free plan: 30 minutes per month. Business plan: $29 per month with 500 minutes. Scale plan: $79 per month with 3,000 minutes.

Is my data secure?

Yes. All processing happens on private infrastructure. No audio files or transcripts are sent to external servers. CallScribe is compliant with GDPR and with data sovereignty requirements in the Gulf states.

Does it support switching between Arabic and English?

Yes. CallScribe automatically detects when a speaker switches between Arabic and English within the same sentence, a common pattern in business calls in the Gulf states.

A Practical Guide to Arabic Dialect Coverage for ASR

"Arabic support" is a phrase with an enormous amount of hiding room. When a speech-to-text vendor tells you their model handles Arabic, the realistic question to ask is: which Arabic? The language as spoken in a Dubai call center sounds nothing like the language on Al Jazeera news. A Lebanese speaker and a Moroccan speaker, dropped into the same conversation, would not fully understand each other. If you are deploying ASR on Arabic audio, you need to understand this variation before you pick a model, because the model's training data baked in specific assumptions about which Arabic "counts". This post is a practical field guide to the dialect landscape from an ASR engineering perspective.

Khaleeji (Gulf Arabic)

Khaleeji is the cluster of dialects spoken across the Gulf Cooperation Council states — the UAE, Saudi Arabia, Kuwait, Qatar, Bahrain, and Oman. It is not a single dialect. An Emirati speaker and a Saudi Hejazi speaker will sound clearly distinct to a native ear, and an ASR model that only saw Saudi training data will underperform on Emirati audio in ways you can measure.

Major regional variants worth naming:

Emirati Arabic — UAE cities, especially Dubai and Abu Dhabi. Characterized by rapid delivery, heavy borrowing from English and Urdu in business contexts, and a specific set of demonstrative pronouns that differ from other Gulf variants.
Saudi Arabic — itself splits into Najdi (central, including Riyadh), Hejazi (western, including Mecca and Jeddah), and Southern Saudi. Najdi is what most ASR datasets label as "Saudi" and is the closest Gulf variant to MSA in phonology.
Kuwaiti Arabic — similar to Najdi but with distinctive vowel patterns and heavy borrowing from Persian and English.
Qatari and Bahraini Arabic — closely related, both close to the Emirati/Saudi Najdi spectrum.
Omani Arabic — has its own distinct vocabulary influenced by Swahili, Baluchi, and Persian due to Oman's historical trade networks. Often the hardest Gulf variant for models trained on standard Saudi data.

A well-trained Arabic ASR model should be evaluated on at least three of these variants before anyone claims "Gulf Arabic support". A model evaluated only on Riyadh studio recordings is going to fall over in Muscat or Salalah.

Levantine (Shami) Arabic

Levantine Arabic is the speech of Lebanon, Syria, Palestine, and Jordan. It is often the second-largest dialect bucket in multilingual Arabic datasets after MSA, partly because of the media footprint of Lebanese and Syrian drama production. Major sub-dialects:

Lebanese Arabic — Beirut and northward. Very high code-switching with French and English, softer consonants, distinctive "halla2"/"hayda" vocabulary.
Syrian Arabic — Damascus and Aleppo are the two main urban centers. Damascene is closest to the "generic Levantine" that most models were trained on.
Palestinian Arabic — internal variation is significant: Ramallah vs. Nablus vs. Gaza.
Jordanian Arabic — urban Amman is close to Palestinian; rural Jordanian diverges more.

For a GCC business, Levantine audio mostly shows up in two scenarios: customers from the Levant now living in the Gulf, and offshore call center agents in Jordan or Lebanon serving Gulf customers. Both scenarios matter for ASR coverage.

Egyptian Arabic

Egyptian Arabic is the single most widely understood Arabic dialect across the Arabic-speaking world, thanks to a century of Egyptian cinema and TV. Cairo urban speech dominates the dataset pool, and ASR models generally do well on it. Upper Egyptian (Sa'idi) diverges more and is underrepresented in training data. In GCC call centers, Egyptian speakers are very common on the agent side, so Egyptian Arabic shows up frequently in calls even when the customer is not Egyptian.

Maghrebi (Darija)

Moroccan, Algerian, and Tunisian dialects are collectively called Maghrebi or Darija, and most commercial Arabic ASR systems effectively do not support them. Their phonology and vocabulary, shaped by a Berber substrate and heavy French borrowing, differ enough from Eastern Arabic that a model trained on Gulf/Levantine/Egyptian/MSA will produce gibberish on Casablanca street speech. This is a real gap in the market, but it is not the target for a GCC-focused platform. We do not support Maghrebi in CallScribe today, and we flag this clearly in our model card rather than pretend otherwise.

Code-Switching in GCC Business Calls

The single most distinctive feature of real GCC call audio is code-switching between Arabic and English. It is not occasional. It is constant, and it happens at every granularity:

Lexical — individual English words dropped into Arabic sentences: "okay", "really", "meeting", "invoice", "account number".
Phrasal — English phrases as interjections: "no problem", "to be honest", "you know what I mean".
Clausal — entire clauses in English within an otherwise Arabic utterance: "I will call you back لأن the system is down".
Turn-level — one speaker fully in Arabic, the other fully in English, alternating turn by turn.
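A crude but useful first pass on your own transcripts is to count how often the script flips between Arabic and Latin tokens. The sketch below works on script alone. It treats any character in the basic Arabic Unicode block as Arabic and any ASCII letter as Latin, so it cannot catch English words that were transliterated into Arabic script. The function names are illustrative.

```python
def token_script(token: str) -> str:
    """Classify a token as 'ar', 'en', or 'other'; tokens mixing both scripts count as 'ar'."""
    if any("\u0600" <= ch <= "\u06FF" for ch in token):
        return "ar"
    if any(ch.isascii() and ch.isalpha() for ch in token):
        return "en"
    return "other"


def switch_points(utterance: str) -> int:
    """Count places where consecutive Arabic/Latin tokens change script (digits etc. are skipped)."""
    scripts = [s for s in map(token_script, utterance.split()) if s != "other"]
    return sum(1 for a, b in zip(scripts, scripts[1:]) if a != b)
```

Under this measure, the clausal example above ("I will call you back لأن the system is down") has two switch points. A fully Arabic utterance has none.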

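A model that cannot handle code-switching gracefully will drop the English words entirely, render them as phonetic Arabic, or skip the segment altogether. Whisper large-v3-turbo handles this well in practice, but not because of fine-grained language detection. In the reference openai-whisper transcribe() implementation, the language is detected once from the first 30 seconds of audio and then held fixed for the whole file. What lets it keep English words in Latin script inside Arabic speech is that its decoder uses a single multilingual byte-level BPE vocabulary covering both languages, and it was trained on a large amount of multilingual audio.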

What This Means for ASR Procurement

When you evaluate an Arabic ASR system, insist on a benchmark that covers the specific dialects in your audio. Do not accept a single "Arabic WER" number. Ask for per-dialect breakdowns, per-audio-quality breakdowns, and code-switching tests. Ask which variants were in the training set and which were only in the eval set. Require the vendor to run their model on a held-out sample of your actual calls before you sign a contract. A model that looks great on an MSA news benchmark will fail on a Khaleeji complaint line, and the only way to find out is to test it.
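Once a vendor hands over results on your sample, you can build a per-dialect, per-audio-quality table from per-file error counts. Corpus-level WER should be total errors divided by total reference words, not the mean of per-file WERs. Averaging per-file WERs over-weights short files: one file with 1 error in 2 words and another with 10 errors in 1,000 words average to 0.255, while their pooled WER is about 0.011. The record fields below are assumptions for illustration. The SNR bands reuse the thresholds quoted earlier in this document.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalResult:
    dialect: str      # e.g. "Emirati", "Najdi", "Omani"
    snr_db: float     # measured audio quality of the file
    errors: int       # substitutions + deletions + insertions
    ref_words: int    # words in the human reference transcript


def snr_band(snr_db: float) -> str:
    """Bucket a file by the SNR thresholds quoted earlier."""
    if snr_db > 15:
        return ">15 dB"
    if snr_db >= 10:
        return "10-15 dB"
    return "<10 dB"


def wer_breakdown(results):
    """Corpus-level WER per (dialect, SNR band): total errors / total reference words."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for r in results:
        key = (r.dialect, snr_band(r.snr_db))
        errors[key] += r.errors
        words[key] += r.ref_words
    return {key: errors[key] / words[key] for key in errors if words[key] > 0}
```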