ElevenLabs Tutorial for Beginners: From Signup to First Audio
Updated April 2026
ElevenLabs is one of the more intuitive AI tools available, but there are a few things that aren't obvious when you first sign up. This guide walks you through everything from account creation to your first professional-quality audio export.
Step 1: Create Your Account
Go to elevenlabs.io and click Sign Up. You can register with Google, GitHub, or an email address. Once registered, you're automatically on the free plan — 10,000 characters per month, no credit card required.
The first thing you'll see is the dashboard. The main navigation on the left has four key sections: Speech, which is the basic text-to-speech generator; Projects, which is the long-form narration editor; Voices, which is your voice library; and Studio, which has additional audio tools. For this tutorial, we're starting with Speech.
Step 2: Choose a Voice
Click on the voice selector dropdown in the Speech interface. You'll see the pre-made voice library, which contains over a thousand options. The interface lets you filter by gender, age, accent, and use case. Before you commit, use the preview button to hear a short sample of each voice.
For most general-purpose content, the following voices consistently perform well: Rachel (American English, warm and natural), Josh (American English, authoritative), Callum (British English, professional), and Sarah (American English, friendly). For voiceover work, Charlotte (British English) and Liam (American English) are strong choices.
Don't spend too long choosing on your first session — pick one and generate something. You can always come back and try different voices once you understand how the generation process works.
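If you'd rather browse voices from a script than the UI, the same filtering idea works against the ElevenLabs API. This is a sketch, not official sample code: it assumes the v1 "list voices" endpoint and a response where each voice carries a `labels` dictionary (accent, gender, and so on) — check the current API reference before relying on those names.

```python
# Sketch: browse and filter the voice library via the API instead of the UI.
# Assumes the v1 list-voices endpoint and a "labels" dict per voice; verify
# both against the current ElevenLabs API reference.
import json
import urllib.request

def list_voices(api_key: str) -> list[dict]:
    """Fetch all voices available to the account (requires a real API key)."""
    req = urllib.request.Request(
        "https://api.elevenlabs.io/v1/voices",
        headers={"xi-api-key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["voices"]

def filter_voices(voices: list[dict], **labels: str) -> list[dict]:
    """Keep voices whose labels match every given key/value, e.g. accent='british'."""
    return [
        v for v in voices
        if all(v.get("labels", {}).get(k) == want for k, want in labels.items())
    ]
```

For example, `filter_voices(list_voices(key), accent="british", gender="male")` narrows the library the same way the UI's filter panel does.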
Step 3: Generate Your First Audio
Type or paste your text into the text field. The character counter shows you how many characters you're about to use. Click Generate. For most texts under 500 characters, generation takes 2–4 seconds. Longer texts take proportionally longer.
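The same "type text, pick a voice, generate" step can be done over the public API, which is handy once you move past one-off clips. The sketch below follows the v1 text-to-speech endpoint as publicly documented at the time of writing; the voice ID and API key are placeholders you'd replace with your own.

```python
# Sketch: one generation via the ElevenLabs v1 text-to-speech API.
# Endpoint and payload shape follow the public API docs; voice_id and
# api_key are placeholders.
import json
import urllib.request

API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) the POST request for one generation."""
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def generate_mp3(text: str, voice_id: str, api_key: str, out_path: str) -> None:
    """Send the request and save the returned MP3 bytes to disk."""
    req = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(req, timeout=120) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

Splitting request construction from sending makes it easy to inspect exactly what you're about to spend characters on before committing.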
When generation completes, hit play in the audio player. If it sounds good, click the download button (arrow icon) to save the MP3. If something sounds off — a word is mispronounced, the pacing is wrong, the emotion doesn't match — don't download it yet. Regenerate. The same text produces slightly different results each time, and two or three regenerations usually produce one take that's clearly better than the others.
A note on character consumption: regeneration uses characters each time. If you're on the free tier, be selective about what you regenerate. On a paid plan, regeneration is cheap enough that you should always generate 2–3 versions of important content and pick the best.
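The budgeting in that note is easy to make concrete. Using the free tier's 10,000-character monthly quota mentioned earlier, here is the arithmetic for a "generate several takes, keep the best" workflow:

```python
# How many "generate N takes, keep the best" passes fit in a character quota.
# 10,000 is the free-tier monthly quota mentioned above.
FREE_TIER_CHARS = 10_000

def passes_available(text_len: int, takes: int = 3, quota: int = FREE_TIER_CHARS) -> int:
    """Each pass generates `takes` versions of the same text, costing text_len * takes."""
    return quota // (text_len * takes)

# A 500-character script at 3 takes per pass costs 1,500 characters,
# so the free tier covers 6 full passes (10,000 // 1,500 = 6).
```

In other words, the free tier supports the multi-take habit only for short, important clips; on a paid plan the quota is large enough that it becomes the default workflow.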
Step 4: Adjust Voice Settings
Click the settings icon next to the voice selector to see the voice settings panel. The key controls are Stability, Similarity Boost, and Style Exaggeration.
Stability controls how consistent the voice is between generations. Lower stability means more variation and expressiveness; higher stability means more consistent but potentially flat delivery. For conversational content, a stability setting around 30–50% tends to work well. For corporate or educational content where consistency matters, 60–80% is safer.
Similarity Boost controls how closely the output matches the original voice characteristics. For pre-made voices, keep this at 75% or above. For cloned voices, higher similarity boost helps preserve the voice's unique characteristics.
Style Exaggeration amplifies the expressive characteristics of the voice. This is off by default. For dramatic content, increasing it to 20–40% can add punch. For professional narration, leaving it at 0 usually produces cleaner results.
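If you later script generations through the API, these three sliders map onto a `voice_settings` object. One gotcha: the UI shows percentages, but the API takes floats from 0.0 to 1.0. The key names below (`stability`, `similarity_boost`, `style`) follow the public v1 API as documented at the time of writing — verify them against the current API reference.

```python
# Sketch: the three sliders from this step expressed as the API's
# voice_settings object. UI percentages -> 0.0-1.0 floats. Key names
# assume the public v1 API; verify against the current docs.

def voice_settings(stability_pct: float, similarity_pct: float, style_pct: float = 0.0) -> dict:
    """Convert UI-style percentages into the JSON voice_settings object."""
    for name, pct in [("stability", stability_pct),
                      ("similarity", similarity_pct),
                      ("style", style_pct)]:
        if not 0 <= pct <= 100:
            raise ValueError(f"{name} must be 0-100, got {pct}")
    return {
        "stability": stability_pct / 100,
        "similarity_boost": similarity_pct / 100,
        "style": style_pct / 100,
    }

# The conversational preset suggested above: stability ~40%, similarity 75%.
conversational = voice_settings(40, 75)
```

The same function covers the corporate preset (`voice_settings(70, 75)`) or a dramatic read with some style (`voice_settings(40, 75, 30)`).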
Step 5: Use Projects for Long-Form Content
The Speech interface is good for individual clips, but for anything longer than a few paragraphs — a video script, a course module, a chapter — you want the Projects feature. This is only available on Creator tier and above.
In Projects, paste your full text. ElevenLabs breaks it into paragraphs. You can assign different voices to different sections (useful for multi-speaker content), regenerate individual paragraphs without affecting others, and then export the whole thing as a single audio file. This workflow is substantially faster than managing individual clips from the Speech interface.
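Projects itself is a UI feature, but the core idea — split on paragraph breaks, then handle each chunk independently — is simple to sketch if you ever script long-form generation yourself. The splitting rule below (blank-line-separated paragraphs) and the round-robin voice assignment are illustrative assumptions, not how Projects works internally; the voice names are just the pre-made voices mentioned earlier.

```python
# Sketch of the Projects workflow outside the UI: split a long script on
# blank lines, then assign a voice per paragraph. The split rule and the
# round-robin mapping are illustrative, not ElevenLabs internals.

def split_paragraphs(script: str) -> list[str]:
    """Split on blank lines, dropping empty chunks."""
    return [p.strip() for p in script.split("\n\n") if p.strip()]

def assign_voices(paragraphs: list[str], voices: list[str]) -> list[tuple[str, str]]:
    """Round-robin one voice per paragraph, e.g. for two-speaker dialogue."""
    return [(voices[i % len(voices)], p) for i, p in enumerate(paragraphs)]

# Two-speaker dialogue: alternate Rachel and Josh paragraph by paragraph.
script = "Welcome to the show.\n\nThanks for having me.\n\nLet's get started."
lines = assign_voices(split_paragraphs(script), ["Rachel", "Josh"])
```

Each `(voice, paragraph)` pair would then become one generation call, and the resulting clips are concatenated into the final export — the same per-paragraph regeneration granularity Projects gives you in the UI.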