What is Dia AI by Nari Labs?

DIA-1.6B is a 1.6-billion-parameter text-to-speech model designed to generate realistic speech from a transcript. You can assign different speakers to different lines, so the output sounds like a conversation rather than a single narrator.

Even more interesting, it’s capable of producing non-verbal sounds like:

  • Laughter
  • Coughing
  • Clearing throat

Overview of Dia AI

Feature                  Details
Model Name               DIA-1.6B
Developer                Nari Labs
Size                     ~6.5 GB
License                  Apache 2.0 (open-source)
Requirements             ~10 GB Video RAM (VRAM)
Non-verbal Generation    Laughter, coughing, etc.
Interface                Gradio-based UI
Hosting                  Hugging Face

Requirements to Run DIA-1.6B

Before getting started, here’s what you’ll need:

  • Around 10 GB of Video RAM (VRAM).
  • A basic Python environment.
  • Familiarity with the Gradio interface.
  • Optional: Apple Silicon devices may also work, based on user reports.

Key Features of DIA-1.6B

  • Realistic Dialogue

    Assign different speakers to different lines.

  • Non-verbal Sounds

    Includes laughter, coughing, and throat clearing.

  • Simple Installation

    Very beginner-friendly compared to many TTS models.

  • Open Source License

    Apache 2.0 permits commercial use, modification, and redistribution.

  • Cross-Platform Support

    Some users even reported successful use on Apple Silicon devices.

Pros and Cons

Pros

  • Multi-speaker dialogue from a single transcript
  • Non-verbal sounds such as laughter and coughing
  • Open source under Apache 2.0
  • Simple setup with automatic model download
  • Gradio-based UI

Cons

  • Needs around 10 GB of VRAM during generation
  • Model download is about 6.5 GB

How to Use DIA-1.6B

Setting up DIA-1.6B turned out to be surprisingly easy. Here's how you can get started.

Step 1: Install Necessary Packages

Ensure you have Python installed. Then run:

git clone https://github.com/nari-labs/DIA-TTS.git
cd DIA-TTS
pip install -r requirements.txt

Step 2: Launch the Gradio UI

Run the script to start the Gradio interface:

python app.py

The script will automatically download model weights (~6.5 GB) and set up the Gradio server. You’ll see a link in the terminal once ready.
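
Because the first launch pulls roughly 6.5 GB of weights, it can help to confirm there is enough free disk space beforehand. The following is a minimal sketch, not part of the DIA-1.6B project: the function name, the 1 GB safety margin, and the choice to check the home directory (where the Hugging Face cache lives by default) are all assumptions made for illustration.

```python
import shutil
from pathlib import Path

MODEL_SIZE_GB = 6.5  # approximate download size quoted in this article


def has_room_for_weights(free_bytes, needed_gb=MODEL_SIZE_GB, margin_gb=1.0):
    """Return True if free_bytes covers the download plus a safety margin.

    Hypothetical helper for illustration; margin_gb is an arbitrary cushion.
    """
    needed_bytes = (needed_gb + margin_gb) * 1024 ** 3
    return free_bytes >= needed_bytes


if __name__ == "__main__":
    # Assumption: weights land in the default cache under the home directory.
    free = shutil.disk_usage(Path.home()).free
    print(f"Free space: {free / 1024 ** 3:.1f} GiB")
    print("Enough room for the weights:", has_room_for_weights(free))
```

If your cache lives on a different drive, point `shutil.disk_usage` at that path instead.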

Step 3: Generate Dialogue

In the Gradio UI:

  • Input your script.
  • Assign speakers as S1, S2, etc.
  • Click Generate Audio.
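
If you keep your dialogue as structured data, the speaker-tagged script can be assembled programmatically before pasting it into the UI. The sketch below assumes that speakers are written as bracketed tags such as [S1] and [S2] and that non-verbal cues go in parentheses, for example (laughs); check the example text shown in the Gradio UI to confirm the exact syntax. The helper name `build_script` is hypothetical.

```python
def build_script(turns):
    """Join (speaker_number, text) pairs into one speaker-tagged transcript.

    Assumption: the model expects tags of the form [S1], [S2], ...
    """
    lines = []
    for speaker, text in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        lines.append(f"[S{speaker}] {text.strip()}")
    return "\n".join(lines)


script = build_script([
    (1, "Did you get the model running?"),
    (2, "Yes, it took about five minutes. (laughs)"),
])
print(script)
```

The printed text can be pasted directly into the script box in the Gradio UI.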

Local Installation Details

For local installation, you don't need to manually download the model files. Running the provided script will automatically fetch everything from Hugging Face.

Important notes during installation:

  • Expect about 7.4 GB of VRAM usage once the model is loaded.
  • VRAM usage peaks at around 10 GB during active generation.
  • The model file downloads are automatic and straightforward.
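
The two VRAM figures above suggest a quick way to judge whether a given GPU is likely to cope. This is a rough sketch using the numbers from this article as thresholds; the function name and verdict labels are made up for illustration, and the optional PyTorch query only runs if PyTorch with CUDA happens to be installed.

```python
LOADED_GB = 7.4  # approximate VRAM in use once the model is loaded
PEAK_GB = 10.0   # approximate peak VRAM during generation


def vram_verdict(total_gb):
    """Classify a GPU by total VRAM against the figures quoted above."""
    if total_gb >= PEAK_GB:
        return "ok"
    if total_gb >= LOADED_GB:
        # Enough to load the model, but generation may run out of memory.
        return "load-only"
    return "insufficient"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        total = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
        print(f"GPU 0: {total:.1f} GiB -> {vram_verdict(total)}")
    else:
        print("No CUDA GPU detected; pass a value to vram_verdict() manually.")
```

Treat the result as a rough guide only, since actual usage depends on script length and on other processes sharing the GPU.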

I noticed that the app can also provide a shareable Gradio link if needed, though I prefer running it locally for better control.

First Testing Experience

After launching the Gradio UI and loading the model, I was ready for my first test.

Here's what happened:

  • Entered two lines assigned to Speaker 1 and Speaker 2.
  • Clicked Generate Audio.
  • VRAM spiked to around 10 GB.
  • Within seconds, the model generated a fluid conversation.

DIA-1.6B FAQs