Levantine (Shami) Arabic transcription

The Arabic of Lebanon, Syria, Jordan, and Palestine — and a large share of GCC expat workforces.

Last updated: April 2026

Levantine Arabic, the Shami cluster covering Lebanon, Syria, Jordan, and Palestine, shows up on GCC business calls more often than most operations leaders expect. Lebanese and Syrian agents staff a substantial fraction of Dubai and Riyadh contact centres; Jordanian customers call Saudi banks; Palestinian small-business owners speak Levantine with their UAE suppliers. CallScribe ships a dedicated Levantine model rather than relying on the Khaleeji model, which degrades on Beirut speech.

Phonology that distinguishes Levantine from Gulf Arabic

The single most diagnostic Levantine feature is the realisation of MSA ق (qāf) as a glottal stop [ʔ] in urban Levantine: Beirut, Damascus, Amman, and Jerusalem city speech all share this. A speaker saying "قال" produces [ʔaːl], not [ɡaːl] (Gulf) or [qaːl] (MSA). Rural Levantine, especially Druze and some Bedouin-rooted varieties, keeps [q] or shifts to [ɡ], which is why a single "Levantine" model needs to handle more than one qāf realisation.
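One way to picture the competing qāf realisations is as pronunciation variants attached to a single lexicon entry. The sketch below is illustrative only: the function name, the variety labels, and the simplified phone strings are assumptions made for this example and do not describe CallScribe's actual lexicon format.

```python
# Illustrative only: expand a broad phonemic form containing /q/ into the
# surface variants described above. Names and layout are hypothetical.
QAF_REALISATIONS = {
    "urban": ["ʔ"],       # Beirut, Damascus, Amman, Jerusalem city speech
    "rural": ["q", "ɡ"],  # Druze and some Bedouin-rooted varieties
}


def qaf_variants(phonemic: str) -> list[str]:
    """Return each surface form obtained by replacing /q/ with one realisation."""
    if "q" not in phonemic:
        return [phonemic]
    variants = []
    for phones in QAF_REALISATIONS.values():
        for phone in phones:
            variants.append(phonemic.replace("q", phone))
    return variants


print(qaf_variants("qaːl"))  # ['ʔaːl', 'qaːl', 'ɡaːl']
```

Listing all variants under one entry lets a decoder accept any of them for the same word instead of treating the rural forms as out-of-vocabulary.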

The Levantine ج (jīm) is typically [ʒ] (like the s in English "measure"), distinct from Gulf [j]/[d͡ʒ] and Egyptian [ɡ]. Vowel quality differs too: Levantine imāla raises /a/ toward [e] in many environments, and the MSA diphthong /aj/ is usually monophthongised to [eː], so MSA "bayt" (house) surfaces as "bēt". Both shifts dramatically change the spectral signature an acoustic model must recognise. Without dialect-specific training data, an MSA-tuned model misreads Levantine "bēt" as something closer to "biːt".

Lexicon and code-switching: French, English, Aramaic substrates

Lebanese Arabic in particular code-switches with French more heavily than any other Levantine variety. A single Beirut customer-service utterance may contain "merci ktīr, w bukra mn-shoufak". Syrian Arabic borrows from Aramaic substrate vocabulary (especially in rural areas around Maaloula and Saidnaya) and from Turkish in Aleppo. Jordanian and Palestinian Arabic maintain a more conservative Bedouin-rooted lexicon in rural speech but shift toward urban Levantine norms in city centres.

CallScribe's Levantine model is bilingual-ready for Arabic-English and Arabic-French at the lexical level. Numbers in particular often appear in French in Lebanese speech ("vingt-cinq" for 25) — we transcribe them as spoken rather than auto-converting, then normalise downstream so analytics still work.
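The "transcribe as spoken, normalise downstream" approach can be sketched as keeping the verbatim token and attaching a numeric value next to it. This is a minimal illustration under stated assumptions: the function names and the output layout are hypothetical, and the parser only covers simple French forms (0 to 16, round tens from 20 to 60, and tens plus a unit), returning None for anything else.

```python
from typing import Optional

# Partial French number vocabulary, enough for an illustration only.
UNITS = {
    "zéro": 0, "un": 1, "deux": 2, "trois": 3, "quatre": 4, "cinq": 5,
    "six": 6, "sept": 7, "huit": 8, "neuf": 9, "dix": 10, "onze": 11,
    "douze": 12, "treize": 13, "quatorze": 14, "quinze": 15, "seize": 16,
}
TENS = {"vingt": 20, "trente": 30, "quarante": 40, "cinquante": 50, "soixante": 60}


def french_number_value(token: str) -> Optional[int]:
    """Return the integer for a simple French number word, or None if unsupported."""
    parts = [p for p in token.lower().replace(" ", "-").split("-") if p != "et"]
    if len(parts) == 1:
        return UNITS.get(parts[0], TENS.get(parts[0]))
    if len(parts) == 2 and parts[0] in TENS and 1 <= UNITS.get(parts[1], 0) <= 9:
        return TENS[parts[0]] + UNITS[parts[1]]
    return None


def annotate(token: str) -> dict:
    """Keep the spoken form untouched and add a normalised value for analytics."""
    return {"spoken": token, "value": french_number_value(token)}


print(annotate("vingt-cinq"))  # {'spoken': 'vingt-cinq', 'value': 25}
```

Because the spoken form is never overwritten, the transcript stays faithful to the call while analytics can still aggregate on the numeric field.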

Morphology: future marker, negation, pronouns

Levantine future is marked with the particle رح (raḥ) or the prefix حـ (ḥa-): "raḥ ʔūl-lak" (I will tell you). Negation uses the circumfix مـا ... ـش (mā ... -š), or the suffix -š alone, in some Palestinian and South-Lebanese varieties, and plain مـا (mā) or مش (miš) in Damascus and Beirut. Object pronouns suffix to verbs and prepositions in patterns that diverge from MSA enough to confuse generic Arabic NLP: "shuft-ak-yāh" (I saw it on you) compresses three morphemes into one phonological word.

Pronoun systems also differ: Levantine "ana", "inta/inti", "huwwe/hiyye", "niḥna", "intū", "hinne" — slightly different vowels and stress from Gulf and very different from Egyptian "iḥna/intū/humma". An acoustic model that treats vowel quality as noise will produce the wrong pronoun on a transcript.

Why Levantine matters for GCC operations

GCC operations leaders frequently treat all Arabic as Khaleeji and end up with degraded transcripts on the substantial portion of their traffic that is Lebanese, Syrian, Jordanian, or Palestinian. Levantine speakers may be customers (Levantine-origin GCC residents, regional businesses), agents (a substantial share of contact-centre workforces), or third-party vendors. Auto-detecting dialect per call is not a luxury — it is required to keep CSAT scoring accurate, because sentiment lexicons differ across dialects.

CallScribe's Levantine accuracy and limits

Word-error rate on our internal Levantine telephony benchmark is 10-14% for clear single-speaker audio and 16-22% for noisy multi-party calls with French code-switching. Lebanese audio is generally easier than Syrian rural audio for the model; urban Damascus speech sits in the middle. The Levantine model uses a country-conditioned variant the same way the Gulf model does — you can tag a project as LB, SY, JO, or PS to bias the lexicon.
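For readers who want to reproduce a comparable figure on their own audio, the sketch below computes word-error rate in the standard way: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is an assumption that this matches the benchmark's scoring; the text normalisation applied before scoring is not specified here, and this example simply splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "merci ktīr w bukra mn-shoufak"
hypothesis = "merci ktīr bukra mn-shoufak"  # one word dropped
print(word_error_rate(reference, hypothesis))  # 0.2
```

One missing word out of five reference words gives a WER of 20%, which is the scale on which the ranges above should be read.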

At a glance

  • 10-14% WER on clear Levantine telephony
  • Country-conditioned: LB, SY, JO, PS
  • Arabic-French code-switching for Lebanese audio
  • Glottal-stop and uvular qāf realisations both handled
  • Levantine-specific sentiment cues

FAQs

Does CallScribe handle Lebanese-French code-switching automatically?

Yes — French is auto-detected at the segment level and transcribed as spoken in the original language. Numbers in French and short French phrases ("voilà", "exactement", "merci ktīr") are preserved verbatim rather than transliterated to Arabic script.

Can the Levantine model handle Syrian rural dialects?

Partially. The model is trained primarily on urban Levantine (Beirut, Damascus, Amman, Jerusalem city). Rural Druze, Alawite, and Bedouin-influenced varieties have higher WER because training data is scarcer. Word-error rate for these varieties is typically 18-25%.

How does CallScribe distinguish Levantine from Egyptian on a call?

A 5-second acoustic probe at the start of the call runs a dialect-ID classifier that distinguishes Khaleeji, Levantine, Egyptian, and MSA. If the probe is ambiguous (very short or noisy), the system uses the project's default dialect setting.
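The fallback behaviour can be sketched as a small decision function. Everything below is hypothetical: the ProbeResult structure, the function name, and both threshold values are assumptions chosen for illustration, not CallScribe's API or its real cut-offs.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    scores: dict[str, float]  # hypothetical classifier score per dialect
    speech_seconds: float     # usable speech found in the probe window


def choose_dialect(
    probe: ProbeResult,
    project_default: str,
    min_speech_seconds: float = 2.0,  # assumed threshold
    min_confidence: float = 0.6,      # assumed threshold
) -> str:
    """Use the classifier's top dialect unless the probe is ambiguous."""
    if probe.speech_seconds < min_speech_seconds:
        return project_default  # probe too short to trust
    best = max(probe.scores, key=probe.scores.get)
    if probe.scores[best] < min_confidence:
        return project_default  # no dialect is a clear winner
    return best


clear = ProbeResult(
    scores={"khaleeji": 0.10, "levantine": 0.80, "egyptian": 0.05, "msa": 0.05},
    speech_seconds=4.5,
)
print(choose_dialect(clear, project_default="khaleeji"))  # levantine
```

Setting the project default to the dialect that dominates a queue keeps ambiguous probes from routing calls to the wrong model.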

Does CallScribe convert Levantine into MSA?

No. By default, transcripts are produced in the dialect as spoken, using Arabic script. We do not automatically rewrite speech into MSA because that loses sentiment and intent signal. An optional MSA-normalisation pass is available for downstream search use cases.

Is Lebanese specifically tuned, or is it lumped under Levantine?

Lebanese is treated as a Levantine sub-variety with its own country tag (LB) so the lexicon and language model bias toward Lebanese-specific French borrowings and idioms. Acoustic features are shared across the Levantine cluster.

Will Palestinian Arabic work for non-GCC customers?

Yes, the Palestinian variant (PS) is supported. Note that some Gazan and rural West Bank speech overlaps with Bedouin features; the model handles the mainstream urban Palestinian dialect well, with higher WER on more conservative rural speech.

