CallScribe Features

Every capability in the platform, organized by what it actually does for your team.

Transcription

CallScribe uses Whisper large-v3-turbo as its core speech recognition model, tuned for Khaleeji (Gulf), Levantine, Egyptian, and Modern Standard Arabic. Each uploaded file runs through a pre-processing stage (loudness normalization, silence trimming, SNR measurement) before inference, then a post-processing stage that restores punctuation and cleans up Arabic diacritics. Internal benchmarks on 200+ GCC call recordings show 85-95% word-level accuracy for Khaleeji on audio with SNR above 15 dB; see the model card for the full methodology.
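
As a rough illustration of the SNR measurement step, here is a minimal sketch that estimates SNR from frame energies. It is not CallScribe's actual method: the function names, the 400-sample frame length, and the idea of treating the quietest 10% of frames as noise are all assumptions chosen for the example.

```python
import math

def frame_energies(samples, frame_len=400):
    """Mean power of each non-overlapping frame of `frame_len` samples."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def estimate_snr_db(samples, frame_len=400, noise_fraction=0.1):
    """Crude SNR estimate in dB.

    Assumption: the quietest `noise_fraction` of frames contain only
    background noise, and the remaining frames contain speech.
    """
    energies = sorted(frame_energies(samples, frame_len))
    if len(energies) < 2:
        raise ValueError("need at least two full frames")
    n_noise = max(1, int(len(energies) * noise_fraction))
    noise_power = sum(energies[:n_noise]) / n_noise
    speech_power = sum(energies[n_noise:]) / (len(energies) - n_noise)
    if noise_power == 0:
        return math.inf  # digitally silent noise floor
    return 10 * math.log10(speech_power / noise_power)
```

A caller could compare the result with the 15 dB figure quoted above to flag recordings that fall outside the benchmarked conditions.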

English, Urdu, and Hindi are also supported end-to-end. The language detector runs per file, so batches that mix languages across files work without manual tagging.

Speaker Diarization

Diarization (figuring out who said what) uses pyannote.audio's pretrained pipeline. CallScribe runs diarization independently of channel separation, so it works on mono recordings where all speakers share a single track. Typical call center recordings have two to four speakers; the diarizer identifies each one and assigns a stable label throughout the transcript. Overlapping speech is segmented rather than dropped.
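
To show how diarization output and a transcript can be combined, here is a minimal sketch that assigns each transcribed word to the speaker segment it overlaps most. The `Segment` and `Word` types, the `SPEAKER_00`-style labels, and the `UNKNOWN` fallback are hypothetical; the source does not describe how CallScribe merges the two streams.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # e.g. "SPEAKER_00" (hypothetical label format)

@dataclass
class Word:
    start: float
    end: float
    text: str

def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_words(words, segments):
    """Pair each word with the speaker whose segment overlaps it most.

    Words that overlap no segment get the placeholder label "UNKNOWN".
    Overlapping segments are allowed; the larger overlap wins.
    """
    labelled = []
    for word in words:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for seg in segments:
            shared = overlap(word.start, word.end, seg.start, seg.end)
            if shared > best_overlap:
                best_speaker, best_overlap = seg.speaker, shared
        labelled.append((best_speaker, word.text))
    return labelled
```

Because the assignment is purely time-based, it works the same way on mono audio, where there is no channel to separate speakers by.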

Sentiment Analysis

Every speaker turn is tagged with a sentiment label (positive, neutral, or negative) and an intensity score. Sentiment runs on the transcribed text, not the raw audio, so tone of voice is not part of the signal. Lexical sentiment on Gulf-specific phrases has, however, been tuned against an internal eval set. Sentiment is available on the Business and Scale tiers.

Analytics

The aggregate analytics dashboard surfaces 16 KPIs across your uploaded calls: average call duration, speaker distribution, sentiment trend over time, code-switching rate, silence ratio, interruption count, top keywords per speaker, and more. Filters let you slice by date range, speaker label, or sentiment category. Everything is queryable over the REST API on the Scale tier.
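
For a sense of how two of these KPIs could be derived from diarized turns, here is a minimal sketch. The definitions are assumptions made for illustration: silence ratio is taken as the share of the call not covered by any turn, and an interruption is counted when a turn starts before the previous turn by a different speaker has ended. CallScribe's own definitions may differ.

```python
def silence_ratio(turns, call_duration):
    """Fraction of the call during which nobody is speaking.

    `turns` is a list of (start, end, speaker) tuples in seconds.
    Overlapping turns are counted once.
    """
    if call_duration <= 0:
        raise ValueError("call_duration must be positive")
    covered = 0.0
    cursor = 0.0  # end of the speech already accounted for
    for start, end, _speaker in sorted(turns):
        start = max(start, cursor)
        if end > start:
            covered += end - start
            cursor = end
    return 1.0 - covered / call_duration

def interruption_count(turns):
    """Number of turns that begin while a different speaker is still talking."""
    ordered = sorted(turns)
    count = 0
    for previous, current in zip(ordered, ordered[1:]):
        different_speaker = current[2] != previous[2]
        starts_early = current[0] < previous[1]
        if different_speaker and starts_early:
            count += 1
    return count
```

For example, with turns `[(0, 4, "A"), (3, 6, "B"), (8, 10, "A")]` on a 10-second call, speech covers 8 seconds and speaker B cuts in once.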

Export Formats

Transcripts export to PDF (formatted with speaker labels and timestamps), CSV (per-turn rows for spreadsheet analysis), TXT (plain text), DOCX (Microsoft Word), and SRT (subtitle format for video workflows). Bulk export bundles an entire batch into a single zip.
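
As an illustration of the SRT layout, here is a minimal sketch that renders speaker turns as numbered subtitle cues. Prefixing each cue with the speaker label is an assumption; the source does not say whether CallScribe's SRT export includes labels, and the function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(turns):
    """Render (start, end, speaker, text) turns as an SRT document.

    Each cue is: index, time range, text, then a blank line.
    """
    cues = []
    for index, (start, end, speaker, text) in enumerate(turns, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(cues)
```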

REST API

The Scale tier ships with a full REST API for programmatic upload, polling, and transcript retrieval, plus webhook callbacks for async workflows. Authenticated calls are rate-limited to 100 requests per minute. The API is versioned at /api/v1/ with a clearly documented breaking-change policy. API keys are scoped per tenant and can be rotated on demand from the dashboard.
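
For the polling workflow, here is a minimal client-side sketch that stays within the stated rate limit. The `fetch_status` callable stands in for whatever HTTP request you make against an endpoint under /api/v1/, and the `state` field with its `completed` and `failed` values is hypothetical; check the API reference for the real response shape.

```python
import time

# 100 requests per minute (from the text above) means a single poller
# should wait at least 0.6 s between requests.
MIN_INTERVAL_S = 60 / 100

# Hypothetical terminal states; the real API may use different names.
TERMINAL_STATES = ("completed", "failed")

def poll_until_done(fetch_status, interval_s=2.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_status()` until it reports a terminal state.

    `fetch_status` is supplied by the caller and must return a dict
    with a "state" key. Raises TimeoutError if no terminal state is
    seen before `timeout_s` elapses.
    """
    if interval_s < MIN_INTERVAL_S:
        raise ValueError("interval_s would exceed 100 requests per minute")
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("transcription did not finish before the timeout")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real delays. For long-running jobs, the webhook callbacks avoid polling altogether.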

Security & Privacy

All audio and transcripts are encrypted at rest using AES-256 (PostgreSQL pgcrypto) and in transit over TLS 1.3. CallScribe runs on Hetzner EU infrastructure with optional GCC-resident GPU workers connected over a private Tailscale mesh. Row-level security (RLS) in PostgreSQL ensures tenant isolation at the database layer, not just the application layer. Read the security page and the DPA for the full posture.

Integrations

CallScribe integrates with common call center stacks via the REST API: Twilio Voice, Genesys Cloud, Amazon Connect, 3CX, and Zoho Desk. Webhook-based pipelines post audio URLs directly into CallScribe from your recording storage bucket.