Guide | 10 min read

ElevenLabs Tutorial for Beginners: From Sign-Up to First Audio File

Everything a first-time ElevenLabs user needs to know: sign-up, voice selection, settings, generation, download, and long-form content with Projects.

Updated 11 April 2026

Step 1: Sign Up and Understand the Dashboard

Go to elevenlabs.io and click Sign Up. You can register with a Google account, GitHub, or a standard email and password. Once registered, you're automatically on the free plan — no credit card required. You get 10,000 characters per month to work with from day one.

The dashboard has four main sections in the left navigation. Speech is the basic text-to-speech generator: you type text, pick a voice, and generate audio. Projects is the long-form narration editor for content longer than a few paragraphs (this requires the Creator plan or above). Voices is your voice library, where you can browse pre-made voices and manage any custom voices you create. Studio contains additional tools, including Dubbing for multilingual content.

The character counter in the top area of the interface shows your remaining monthly allowance. Keep an eye on this as you learn — it's easy to spend characters quickly when experimenting and regenerating.
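To make the budgeting concrete, here is a minimal Python sketch of the arithmetic. It assumes that every character of submitted text (spaces and punctuation included) counts against the allowance and that each regeneration is billed again in full. The function name is illustrative and not part of any ElevenLabs tool.

```python
FREE_PLAN_MONTHLY_CHARACTERS = 10_000  # free-plan allowance quoted in this guide


def attempts_remaining(text: str, remaining_characters: int) -> int:
    """Return how many times `text` can be generated with the remaining allowance.

    Assumption: every character is billed, and each regeneration of the
    same text is billed again in full.
    """
    cost = len(text)
    if cost == 0:
        raise ValueError("text must not be empty")
    return remaining_characters // cost


script = "Welcome to the channel. " * 20  # a 480-character test script
print(len(script))
print(attempts_remaining(script, FREE_PLAN_MONTHLY_CHARACTERS))
```

A 480-character script fits 20 times into a fresh 10,000-character allowance, so a handful of regenerations per clip adds up quickly.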

Step 2: Choose Your Voice

Click the voice selector in the Speech interface, which typically shows a default voice name near the text input area. A panel opens showing the voice library. You can filter by gender, age, accent, use case (narration, news, conversational, etc.), and language. With over 1,000 voices in the library, filtering is essential unless you enjoy scrolling indefinitely.

Each voice has a preview button that plays a short sample. Listen to the preview, but don't rely on it too heavily — the preview clip is typically chosen to show the voice at its best, and your content may produce different results. The first rule of voice selection: test with your actual content, not with the preview.

For general-purpose narration, these voices consistently perform well: Rachel (American English, warm and natural, great for educational and conversational content), Josh (American English, authoritative, good for news-style and documentary), Charlotte (British English, professional, good for corporate and formal content), and Liam (American English, clear and energetic, good for marketing and upbeat content). Don't spend more than 15 minutes choosing on your first session — pick one and generate something. Voice selection is iterative.

Step 3: Generate and Download Your First Audio

Type or paste your text into the large text field in the Speech interface. The character counter updates as you type to show you the cost of generating this text. When you're ready, click Generate and wait. For texts under 500 characters, generation typically takes 2–4 seconds. Longer texts take proportionally longer — a 2,000-character passage might take 8–15 seconds.

When generation completes, the audio player appears below the text field. Hit play to listen. If it sounds good, click the download button (typically a downward arrow icon) to save the MP3 file. The default export is MP3 at 128 kbps. Higher bitrate options are available in the settings if you need studio-quality files.
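If you are wondering what the bitrate means for file size, the relationship is simple arithmetic. The sketch below assumes a constant-bitrate MP3 and ignores metadata tags. The 192 kbps figure is only an illustrative higher bitrate, not a statement about which options ElevenLabs offers.

```python
def mp3_size_bytes(duration_seconds: int, bitrate_kbps: int = 128) -> int:
    """Approximate size of a constant-bitrate MP3, ignoring metadata tags.

    bitrate_kbps is in kilobits per second (1 kbit = 1000 bits).
    """
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits // 8  # 8 bits per byte


one_minute_default = mp3_size_bytes(60)        # 128 kbps default
one_minute_higher = mp3_size_bytes(60, 192)    # illustrative higher bitrate
print(one_minute_default, one_minute_higher)
```

At the default 128 kbps, one minute of audio comes to roughly 0.96 MB, so a ten-minute narration is a little under 10 MB.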

If something sounds wrong (a word is mispronounced, the pacing feels odd, or the emotion doesn't match the content), click Generate again. The same text produces slightly different results each time due to the model's sampling approach. Two or three regenerations usually produce one version noticeably better than the others. On the free plan, be selective about regenerating, because each attempt costs characters. On a paid plan, regeneration is cheap enough that you should always generate 2–3 versions of anything you plan to publish.

Step 4: Understand Voice Settings

Click the gear or settings icon near the voice selector to access voice settings. Three sliders control the character of your generation: Stability, Similarity Boost, and Style Exaggeration.

Stability (0–100%) controls how consistent the voice is between different generations of the same text. Lower stability means more variation and potentially more expressiveness — the voice is more likely to emphasise or colour different parts of the text. Higher stability means more predictable, consistent output but can sound flat. For conversational content, 30–50% works well. For formal narration where consistency matters, 60–80% is safer.

Similarity Boost (0–100%) controls how closely the output adheres to the trained voice characteristics. For pre-made voices, keeping this at 75% or above prevents drift from the expected voice quality. For cloned voices, higher similarity boost helps preserve the specific characteristics that make the clone recognisable.

Style Exaggeration (0–100%) is turned off by default and should stay that way for most use cases. It amplifies the expressive characteristics of the voice, which can add punch to dramatic content but sounds unnatural on informational narration. Experiment with values of 10–30% if you're producing content that needs emotional range.
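If you later script your generations, it can help to keep the slider values from this step as named presets. The sketch below stores them as percentages and converts them to fractions between 0 and 1. The key names `stability`, `similarity_boost`, and `style` are assumed to match the voice settings fields of the ElevenLabs API, so check the current API reference before relying on them. The preset names, the specific values picked from each range, and the stability value in the dramatic preset are illustrative choices.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VoicePreset:
    """Slider positions as percentages, matching what the interface shows."""
    stability_pct: int
    similarity_boost_pct: int
    style_pct: int = 0  # Style Exaggeration is off by default

    def __post_init__(self):
        # Reject values outside the 0-100% slider range.
        for name, value in asdict(self).items():
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")

    def as_fractions(self) -> dict:
        """Convert percentages to 0-1 fractions (assumed API field names)."""
        return {
            "stability": self.stability_pct / 100,
            "similarity_boost": self.similarity_boost_pct / 100,
            "style": self.style_pct / 100,
        }


# Values chosen from the ranges suggested in this step.
PRESETS = {
    "conversational": VoicePreset(stability_pct=40, similarity_boost_pct=75),
    "formal_narration": VoicePreset(stability_pct=70, similarity_boost_pct=75),
    "dramatic": VoicePreset(stability_pct=40, similarity_boost_pct=75, style_pct=20),
}

print(PRESETS["formal_narration"].as_fractions())
```

Keeping presets in one place makes it easier to regenerate a clip later with the same settings you used the first time.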

Step 5: Long-Form Content with Projects

The Speech interface is designed for short to medium-length content: individual clips, paragraphs, short scripts. For anything longer than a few hundred words, such as a full video script, a podcast episode, or a book chapter, the Projects feature is the right tool. Note: Projects requires the Creator plan ($22/month) or above.

Navigate to Projects in the left sidebar and create a new project. Give it a name that corresponds to your piece of content. Paste your full text. ElevenLabs automatically breaks the text into paragraphs, each shown as a separate segment you can control individually.
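To preview how a script will break into segments before you paste it, you can approximate the paragraph split locally. This is a rough sketch that assumes paragraphs are separated by blank lines. It is not ElevenLabs' actual segmentation logic, which may differ, and the function name is hypothetical.

```python
import re


def split_into_segments(text: str) -> list[str]:
    """Split text into paragraphs on blank lines and tidy internal whitespace."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse line breaks and repeated spaces inside each paragraph.
    return [" ".join(block.split()) for block in blocks if block.strip()]


draft = "Intro line one.\nStill intro.\n\nSecond paragraph.\n\n\nThird."
for number, segment in enumerate(split_into_segments(draft), start=1):
    print(number, len(segment), segment)
```

Printing the character count next to each segment also shows what regenerating that one segment would cost.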

Listen through the segments. For any that don't sound right, click the regenerate button on that individual segment — only that segment is regenerated, and everything else stays intact. This surgical approach to regeneration is the key advantage of Projects over the standard Speech interface for long content. Once you're satisfied with all segments, use the Export button to download the full audio as a single stitched file. You can choose MP3 or WAV format and set the export quality.