Voice to Screenplay: A Practical Guide to Dictating Your Script


Every screenwriter has had the experience of a scene arriving fully formed — dialogue, action, the exact way light falls through a window — while they’re doing something that has nothing to do with writing. Driving. Cooking. Falling asleep.

Voice dictation is the tool that lets you capture those moments. This guide is about using voice input well — not just pressing record and hoping for the best, but speaking in a way that produces clean, formatted output.

Why voice works differently than typing

When you type a screenplay, you switch mental modes constantly: you’re a writer, then a formatter, then a writer again. Voice removes the formatting step entirely. You speak the story; the software handles the structure.

But this only works if you speak in a way the software can parse. Murmuring a stream of consciousness is great for a journal. For a screenplay, you need to give the AI enough signals to know what kind of element you’re describing.

The good news: the conventions of spoken storytelling map closely to screenplay structure. “We’re outside, on Marina Beach, it’s evening” is a scene heading in disguise. “She says:” is a dialogue cue. You already know how to describe scenes — you just need to lean into that structure when you speak.

Speaking scene headings

A scene heading needs three pieces of information: interior or exterior, the location, and the time of day. When you dictate, state all three clearly and in order.

“Interior. A small, cluttered police station. Night.”

“Exterior. The old railway bridge. Early morning.”

The AI in ScriptDraft is trained to recognise this pattern even if you don’t say “interior” or “exterior” explicitly. Saying them still makes recognition more reliable, especially in noisy environments.

If you’re moving between scenes quickly, a brief pause after the heading helps the system classify it before you continue with action.
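To make the three-part pattern concrete, here is a minimal Python sketch of how a spoken heading could be turned into a written one. It is an illustration only, not ScriptDraft's actual classifier: the function name, the prefix table, and the rule of splitting on full stops are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical table mapping spoken prefixes to written heading prefixes.
SPOKEN_PREFIXES = {
    "interior": "INT.",
    "int": "INT.",
    "exterior": "EXT.",
    "ext": "EXT.",
}


def spoken_heading_to_slug(utterance: str) -> Optional[str]:
    """Return a written scene heading, or None if the utterance
    does not follow the prefix / location / time-of-day pattern."""
    # Split on full stops and drop empty pieces (e.g. after the last stop).
    parts = [p.strip() for p in utterance.split(".") if p.strip()]
    if len(parts) != 3:
        return None
    prefix, location, time_of_day = parts
    slug_prefix = SPOKEN_PREFIXES.get(prefix.lower())
    if slug_prefix is None:
        return None
    return f"{slug_prefix} {location.upper()} - {time_of_day.upper()}"


print(spoken_heading_to_slug("Interior. A small, cluttered police station. Night."))
print(spoken_heading_to_slug("Exterior. The old railway bridge. Early morning."))
```

A rule this simple breaks as soon as the location itself contains a full stop, which is one reason a trained classifier is more forgiving than a fixed pattern. It also shows why stating all three pieces in order, with clear pauses, gives any parser less to guess.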

Speaking action lines

Action lines describe what the audience sees. Speak them in present tense, as if narrating what’s on screen right now.

“A man in his thirties sits at a desk covered in papers. He hasn’t slept. You can tell by the way he holds the pen — too tight.”

Keep your sentences short while recording. Short, declarative sentences transcribe more cleanly than long, winding ones, and you can always combine lines when you edit.

What to avoid: internal states that have no visual expression. “He feels afraid” is not a screenplay action line. “He checks the door twice before sitting” is. When you dictate, think camera, not mind.

Speaking dialogue

Dialogue is where voice dictation has the most room for error. The system needs to know who is speaking before it can format the dialogue correctly. The most reliable approach is to say the character name clearly before each line.

“MARA says: I don’t think you understand what you’re asking.”

“JON replies: I understand perfectly. That’s why I’m asking you and not anyone else.”

For a scene with quick back-and-forth, you can batch record a few exchanges rather than pausing after each line. Just keep the character attribution clear.

Parentheticals, the small performance directions that sit in parentheses under the character name, can be spoken naturally:

“MARA, quietly, says: That’s not the answer I was looking for.”

The app will recognise “quietly” as a parenthetical if it appears between the character name and the dialogue.
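The spoken cue format above is regular enough to sketch as a pattern. The Python below is a rough illustration under stated assumptions, not the app's real logic: the list of speech verbs, the `DialogueCue` type, and both function names are invented for this example.

```python
import re
from typing import NamedTuple, Optional


class DialogueCue(NamedTuple):
    character: str
    parenthetical: Optional[str]
    line: str


# Name, optional ", direction,", a speech verb, a colon, then the line.
CUE_PATTERN = re.compile(
    r"^(?P<name>[^,:]+?)"                # character name
    r"(?:,\s*(?P<paren>[^,:]+?),)?"      # optional ", quietly,"
    r"\s+(?:says|replies|begins|asks)"   # assumed set of speech verbs
    r":\s*(?P<line>.+)$",
    flags=re.IGNORECASE,
)


def parse_cue(utterance: str) -> Optional[DialogueCue]:
    """Split a spoken cue into character, parenthetical, and dialogue."""
    match = CUE_PATTERN.match(utterance.strip())
    if match is None:
        return None  # not a dialogue cue; probably action
    paren = match.group("paren")
    return DialogueCue(
        character=match.group("name").strip().upper(),
        parenthetical=paren.strip() if paren else None,
        line=match.group("line").strip(),
    )


def cue_to_fountain(cue: DialogueCue) -> str:
    """Lay the cue out as character name, parenthetical, dialogue."""
    lines = [cue.character]
    if cue.parenthetical:
        lines.append(f"({cue.parenthetical})")
    lines.append(cue.line)
    return "\n".join(lines)


cue = parse_cue("MARA, quietly, says: That's not the answer I was looking for.")
if cue is not None:
    print(cue_to_fountain(cue))
```

Anything without a name, a speech verb, and a colon falls through as "not dialogue". That is the practical reason to keep the attribution explicit on every line.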

Handling long monologues

Long speeches are actually easier to dictate than multi-character scenes. You establish the character name once and speak through the entire monologue.

“ELIAS begins: I’ve been watching this street for thirty years. The tea stall has had four owners. The tailor’s shop became a phone repair shop became a biryani place. But the tree at the corner — that’s been there longer than me, longer than anyone I know. And every morning, without exception, someone ties a string of jasmine to its lowest branch. I’ve never seen who does it.”

Dictate the full speech in one take if you can. Multiple short takes with the same character name introduce more places for classification errors.

Reviewing and correcting after dictation

No voice system is perfect. After recording a scene, read through what was transcribed before moving to the next one. Most errors fall into predictable categories:

Misclassified elements — an action line that was recognised as dialogue, or a scene heading buried in action. These are easy to fix with a tap: select the block and change its type.

Homophones and mishearings — “reel” for “real”, “MAYA” for “MAIA”. Read the transcript against your memory of what you intended. Fix any that would confuse a reader.

Run-on action — when you speak quickly, consecutive action ideas sometimes merge into one long block. Break these into shorter paragraphs for readability.

A quick review pass — two to three minutes for a full scene — is much faster than trying to dictate perfectly.
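The run-on fix in particular is mechanical enough to sketch. The Python below splits a merged action block into shorter paragraphs; the function name and the default of two sentences per paragraph are assumptions for illustration, and the sentence splitter is deliberately naive.

```python
import re
from typing import List


def split_run_on(block: str, max_sentences: int = 2) -> List[str]:
    """Break one long action block into paragraphs of a few sentences."""
    # Naive split: whitespace that follows ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", block.strip()) if s]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


block = (
    "A man sits at a desk. He hasn't slept. "
    "He checks the door twice. He sits again."
)
for paragraph in split_run_on(block):
    print(paragraph)
    print()
```

Where the breaks belong is still a craft decision, so treat a split like this as a starting point for the review pass rather than a replacement for it.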

Working in your native language

If you think in Malayalam, Tamil, Telugu, Kannada, or Hindi, dictating in your mother tongue is significantly faster than dictating in English. The mental translation step disappears.

ScriptDraft processes voice on-device using the iOS or Android system speech recogniser. This means the audio stays on your device — it’s not sent to ScriptDraft’s servers. For Indian language writers, this combination of native language dictation and local processing matters both for speed and for privacy.

The formatted output follows standard Fountain conventions regardless of the language you dictate in — scene headings, action, character names, dialogue all get placed correctly.
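As a rough picture of what that output step involves, the Python sketch below assembles classified elements into Fountain text. The `Element` type and the function names are hypothetical, and how ScriptDraft actually handles this is not described here. The sketch relies on two documented Fountain features: a leading period forces a scene heading, and a leading `@` forces a character name. The `@` matters for scripts without upper case, such as Tamil or Malayalam, because Fountain normally recognises a character cue by its capitals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    kind: str                        # "scene", "action", or "dialogue"
    text: str
    character: Optional[str] = None  # only used when kind == "dialogue"


def scene_heading(text: str) -> str:
    # Headings that already start with INT/EXT are recognised by Fountain.
    if text.upper().startswith(("INT", "EXT")):
        return text.upper()
    # Otherwise force the heading with a leading period.
    return "." + text


def character_cue(name: str) -> str:
    # Scripts without letter case are unchanged by upper() and lower(),
    # so the cue is forced with Fountain's "@" prefix instead.
    if name.upper() == name.lower():
        return "@" + name
    return name.upper()


def to_fountain(elements: List[Element]) -> str:
    blocks = []
    for el in elements:
        if el.kind == "scene":
            blocks.append(scene_heading(el.text))
        elif el.kind == "dialogue":
            blocks.append(character_cue(el.character or "") + "\n" + el.text)
        else:
            blocks.append(el.text)
    # Fountain separates elements with a blank line.
    return "\n\n".join(blocks) + "\n"


print(to_fountain([
    Element("scene", "Ext. Marina Beach - Evening"),
    Element("action", "Waves roll in."),
    Element("dialogue", "Stay.", character="Mara"),
]))
```

Fountain parsers differ in how they treat non-Latin text, so it is worth opening an exported file in your usual screenwriting tool to confirm the headings and cues are recognised.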

Building a dictation habit

The writers who get the most out of voice-to-screenplay tools are the ones who use them consistently, not just in the office but everywhere. A voice note recorded while walking contains the energy of that moment in a way that a typed scene two hours later rarely does.

Keep the app accessible. Record the rough version first — messy, incomplete, full of “um”s. The AI can work with that. Then review and clean. That two-step rhythm — record raw, polish briefly — is faster than typing the scene from scratch and produces first drafts that feel more alive.

Your voice is the fastest way to get a story out of your head. The formatting can follow.