CallScribe Features
Every capability in the platform, organized by what it actually does for your team.
Transcription
CallScribe uses Whisper large-v3-turbo as its core acoustic model, tuned for Khaleeji (Gulf), Levantine, Egyptian, and Modern Standard Arabic. Each uploaded file passes through a pre-processing stage (loudness normalization, silence trimming, SNR measurement) before inference, then a post-processing stage that restores punctuation and cleans up Arabic diacritics. Internal benchmarks on 200+ GCC call recordings show 85-95% word-level accuracy for Khaleeji audio with an SNR above 15 dB. See the model card for the full methodology.
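To give a feel for what an SNR measurement involves, here is a minimal sketch in Python. It is not CallScribe's actual pre-processing code: the function name, the frame length, and the "quietest frames are the noise floor" heuristic are all assumptions made for illustration.

```python
import math


def estimate_snr_db(samples, frame_len=400, noise_fraction=0.1):
    """Rough SNR estimate in dB from raw audio samples (illustrative heuristic).

    Assumption: the quietest `noise_fraction` of frames contain only
    background noise, and the remaining frames contain speech plus noise.
    """
    # Split the signal into non-overlapping frames, dropping a partial tail.
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if len(frames) < 2:
        raise ValueError("need at least two full frames")

    # Mean power per frame, sorted from quietest to loudest.
    powers = sorted(sum(x * x for x in frame) / frame_len for frame in frames)

    # Keep at least one noise frame and at least one speech frame.
    n_noise = min(max(1, int(len(powers) * noise_fraction)), len(powers) - 1)
    noise_power = sum(powers[:n_noise]) / n_noise
    speech_plus_noise = sum(powers[n_noise:]) / (len(powers) - n_noise)

    if noise_power == 0:
        return math.inf  # digitally silent noise floor
    signal_power = speech_plus_noise - noise_power
    if signal_power <= 0:
        return 0.0  # nothing measurable above the noise floor
    return 10 * math.log10(signal_power / noise_power)
```

A result above 15 dB would put a recording in the range the Khaleeji benchmark figures refer to; files below that threshold fall outside those benchmark conditions.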
English, Urdu, and Hindi are also supported end-to-end. The language detector runs per-file, so mixed-language batches work without manual tagging.
Speaker Diarization
Diarization — figuring out who said what — uses pyannote.audio's pretrained pipeline. CallScribe runs diarization independently of channel separation, which means it works on mono recordings where both speakers share a single track. Typical call center recordings have two to four speakers; the diarizer identifies each and assigns a stable label throughout the transcript. Overlapping speech is segmented rather than dropped.
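One common way to attach diarization labels to a transcript is to give each transcribed segment the speaker whose turn overlaps it the most. The sketch below illustrates that idea in plain Python. It is an assumption about how such a merge can work, not a description of CallScribe's internals, and the tuple layouts and the `assign_speakers` name are hypothetical.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the most-overlapping speaker turn.

    segments: list of (start, end, text) from the transcription step.
    turns:    list of (start, end, speaker) from the diarization step.
    Returns a list of (start, end, speaker, text); speaker is None when
    no turn overlaps the segment at all.
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the shared time span (negative when disjoint).
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((seg_start, seg_end, best_speaker, text))
    return labeled
```

Because the merge relies only on timestamps, it behaves the same on mono recordings as on channel-separated ones.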
Sentiment Analysis
Every speaker turn is tagged with a sentiment label (positive, neutral, negative) and an intensity score. Sentiment runs on the transcribed text, not the raw audio, so tone of voice is not part of the signal. Lexical sentiment on Gulf-specific phrases has, however, been tuned against an internal eval set. Sentiment is available on Business and Scale tiers.
Analytics
The aggregate analytics dashboard surfaces 16 KPIs across your uploaded calls: average call duration, speaker distribution, sentiment trend over time, code-switching rate, silence ratio, interruption count, top keywords per speaker, and more. Filters let you slice by date range, speaker label, or sentiment category. Everything is queryable over the REST API on the Scale tier.
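As a rough illustration of how a few of these KPIs can be derived from diarized turns, the Python sketch below computes speaker distribution, silence ratio, and interruption count for a single call. The definitions are assumptions chosen for the example (for instance, an interruption is counted when a different speaker starts before the previous turn ends); the dashboard's exact formulas may differ, and `call_kpis` is a hypothetical name.

```python
from collections import defaultdict


def call_kpis(turns, call_duration):
    """Compute illustrative per-call KPIs.

    turns: list of (start, end, speaker) in seconds.
    call_duration: total call length in seconds.
    """
    if call_duration <= 0:
        raise ValueError("call_duration must be positive")
    ordered = sorted(turns)

    # Speaker distribution: each speaker's share of total talk time.
    talk_time = defaultdict(float)
    for start, end, speaker in ordered:
        talk_time[speaker] += end - start
    total_talk = sum(talk_time.values())
    distribution = (
        {s: t / total_talk for s, t in talk_time.items()} if total_talk else {}
    )

    # Silence ratio: share of the call not covered by any turn.
    # Overlapping turns are merged so shared time is counted once.
    covered, cursor = 0.0, 0.0
    for start, end, _ in ordered:
        if end > cursor:
            covered += end - max(start, cursor)
            cursor = end
    silence_ratio = 1.0 - covered / call_duration

    # Interruptions: a different speaker starts before the previous turn ends.
    interruptions = sum(
        1
        for prev, cur in zip(ordered, ordered[1:])
        if cur[2] != prev[2] and cur[0] < prev[1]
    )

    return {
        "speaker_distribution": distribution,
        "silence_ratio": silence_ratio,
        "interruption_count": interruptions,
    }
```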
Export Formats
Transcripts export to PDF (formatted with speaker labels and timestamps), CSV (per-turn rows for spreadsheet analysis), TXT (plain text), DOCX (Microsoft Word), and SRT (subtitle format for video workflows). Bulk export bundles an entire batch into a single zip.
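For readers unfamiliar with SRT, each cue is a sequence number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the cue text, and a blank line. The Python sketch below renders speaker turns in that layout. The function names and the "speaker: text" cue style are assumptions for illustration; CallScribe's own SRT output may format cues differently.

```python
def srt_timestamp(seconds):
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def turns_to_srt(turns):
    """Render (start, end, speaker, text) turns as SRT cues.

    Cues are numbered from 1 and separated by a blank line.
    """
    cues = []
    for index, (start, end, speaker, text) in enumerate(turns, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(cues)
```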
REST API
The Scale tier ships with a full REST API for programmatic upload, polling, and transcript retrieval, plus webhook callbacks for async workflows. The rate limit is 100 requests per minute for authenticated calls. The API is versioned at /api/v1/ with a clearly documented breaking-change policy. API keys are scoped per tenant and can be rotated on demand from the dashboard.
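A client that uploads or polls in bulk may want to pace itself so it stays under the limit of 100 requests per minute. The Python sketch below shows one way to do that with a client-side sliding-window limiter. It is an illustrative helper, not part of any CallScribe SDK: the class name and parameters are hypothetical, and no specific endpoint is assumed.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls=100, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._stamps = deque()  # timestamps of calls inside the window

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._period:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            return 0.0
        return self._period - (now - self._stamps[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._stamps.append(self._clock())
```

Calling `limiter.acquire()` immediately before each HTTP request keeps a single client process within the window. Several processes sharing one API key would need a shared limiter or a lower per-process budget.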
Security & Privacy
All audio and transcripts are encrypted at rest using AES-256 (PostgreSQL pgcrypto) and in transit over TLS 1.3. CallScribe runs on Hetzner EU infrastructure with optional GCC-resident GPU workers connected over a private Tailscale mesh. Row-level security (RLS) in PostgreSQL ensures tenant isolation at the database layer, not just the application layer. Read the security page and the DPA for the full posture.
Integrations
CallScribe integrates with common call center stacks via the REST API: Twilio Voice, Genesys Cloud, Amazon Connect, 3CX, and Zoho Desk. Webhook-based pipelines post audio URLs directly into CallScribe from your recording storage bucket.