What is Dia AI by Nari Labs?
DIA-1.6B is a text-to-speech model. It’s designed to create highly realistic speech from a provided transcript. You can assign different speakers to different lines, making it feel like a real conversation.
Even more interestingly, it can also produce non-verbal sounds such as:
- Laughter
- Coughing
- Throat clearing

Overview of Dia AI
| Feature | Details |
|---|---|
| Model Name | DIA-1.6B |
| Developer | Nari Labs |
| Size | ~6.5 GB (model weights) |
| License | Apache 2.0 (open source) |
| Requirements | ~10 GB VRAM |
| Non-verbal Generation | Laughter, coughing, etc. |
| Interface | Gradio-based UI |
| Hosting | Hugging Face |
Requirements to Run DIA-1.6B
Before getting started, here’s what you’ll need:
- Around 10 GB of video RAM (VRAM); a quick way to check yours is shown below.
- A basic Python environment.
- Familiarity with the Gradio interface.
- Optional: Apple Silicon support (based on user reports).
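If you're not sure how much VRAM your GPU has, a quick check with PyTorch works. This is just a convenience snippet (it assumes a CUDA-capable GPU and that PyTorch is installed); it isn't part of DIA's own setup.

```python
# Quick VRAM check (assumes PyTorch with CUDA support is installed).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, total VRAM: {total_gb:.1f} GB")
else:
    print("No CUDA GPU detected.")
```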
Key Features of DIA-1.6B
Realistic Dialogue
Assign different speakers to different lines.
Non-verbal Sounds
Includes laughter, coughing, and throat clearing.
Simple Installation
Very beginner-friendly compared to many TTS models.
Open Source License
Apache 2.0 ensures flexibility and openness.
Cross-Platform Support
Some users even reported successful use on Apple Silicon devices.
Interactive Demo
A Gradio-based demo is hosted on Hugging Face.
Pros and Cons
Pros
- Highly realistic multi-speaker dialogue
- Non-verbal sounds like laughter and coughing
- Simple, beginner-friendly installation
- Open-source Apache 2.0 license
- Runs locally, with reported Apple Silicon support
Cons
- Needs around 10 GB of VRAM during generation
- Large model download (~6.5 GB)
- Assumes a Python environment and some Gradio familiarity
How to Use DIA-1.6B
Setting up DIA-1.6B turned out to be surprisingly easy. Here's how you can get started.
Step 1: Install Necessary Packages
Ensure you have Python installed. Then run:
```bash
git clone https://github.com/nari-labs/DIA-TTS.git
cd DIA-TTS
pip install -r requirements.txt
```
Step 2: Launch the Gradio UI
Run the script to start the Gradio interface:
```bash
python app.py
```
The script will automatically download model weights (~6.5 GB) and set up the Gradio server. You’ll see a link in the terminal once ready.
Step 3: Generate Dialogue
In the Gradio UI:
- Input your script (see the example below).
- Assign speakers as S1, S2, etc.
- Click Generate Audio.
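For reference, here is the kind of script I mean. Treat the exact tag syntax as an illustrative sketch: the [S1]/[S2] labels and parenthetical cues below are my shorthand for the speaker assignment and non-verbal sounds described earlier, and the UI may expect a slightly different format.

```
[S1] Did you catch the demo yesterday? (laughs)
[S2] I did. (clears throat) Honestly, it sounded like a real conversation.
```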
Local Installation Details
For local installation, you don't need to manually download the model files. Running the provided script will automatically fetch everything from Hugging Face.
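If you prefer to pre-fetch the weights yourself (for example, to cache them before going offline), the standard Hugging Face Hub client can do that. This is only a convenience sketch; the repo_id below is a placeholder and may not match the exact repository that app.py pulls from.

```python
# Optional: pre-download model files with the Hugging Face Hub client.
# Requires `pip install huggingface_hub`; the repo_id is a placeholder.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="nari-labs/Dia-1.6B")
print(f"Model files cached at: {local_dir}")
```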
Important notes during installation:
- Expect about 7.4 GB VRAM usage once the model is running.
- Full VRAM usage peaks around 10 GB during active generation.
- The model file downloads are automatic and straightforward.
I noticed the app even provides a shareable Gradio link if needed, though personally I prefer running it locally for better control.
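That share link comes from Gradio's built-in tunneling. If you want the same behaviour in your own Gradio script, it is typically just a launch flag; the snippet below is a minimal sketch, not DIA's actual app.py.

```python
# Minimal Gradio app exposing a temporary public share link (sketch only).
import gradio as gr

def echo(text: str) -> str:
    return text

demo = gr.Interface(fn=echo, inputs="text", outputs="text")
demo.launch(share=True)  # share=True creates a temporary public URL
```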
First Testing Experience
After launching the Gradio UI and loading the model, I was ready for my first test.

Here's what happened:
- Entered two lines assigned to Speaker 1 and Speaker 2.
- Clicked Generate Audio.
- VRAM spiked to around 10 GB.
- Within seconds, the model generated a fluid conversation.