🚀 Vaja Early Adopter API

Welcome to the Vaja Early Adopter Program!

Access GPU-accelerated Text-to-Speech services through our production-ready API. All services run on dedicated GPU infrastructure for optimal performance.

🔑 Authentication

All API requests require authentication using your Vaja API key in the Authorization header:

Authorization: Bearer YOUR_VAJA_API_KEY

Getting Your API Key:
Contact your Vaja representative to receive your unique API key for early adopter access.
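
One way to keep the key out of scripts is to read it from the environment when building the header. The sketch below assumes the key is exported in a variable named VAJA_API_KEY; that variable name and the vaja_auth_header helper are illustrative and not part of the Vaja API.

```shell
# Hypothetical helper: builds the Authorization header from the
# VAJA_API_KEY environment variable (the variable name is an assumption).
vaja_auth_header() {
  if [ -z "${VAJA_API_KEY:-}" ]; then
    echo "VAJA_API_KEY is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s' "$VAJA_API_KEY"
}

# Example use with curl (makes a real request, so it is left commented out):
# curl -H "$(vaja_auth_header)" https://early.vaja.ai/api/tts/kokoro/voices
```

The helper fails fast when the variable is missing, so a misconfigured shell never sends an empty Bearer token.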

🎙️ Text-to-Speech Services

We provide four TTS engines, each optimized for a different use case:

1. Kokoro-82M TTS (New)

Lightweight & Fast - 82 Million Parameters

Available Voices (11 total)

GET https://early.vaja.ai/api/tts/kokoro/voices

Voice Options:

Female: af_bella (default), af_heart, af_nicole, af_sarah, af_sky, bf_emma, bf_isabella
Male: am_adam, am_michael, bm_george, bm_lewis
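
A voice ID can be checked locally before a request is sent. This is a minimal sketch: the list is copied from the Voice Options above, the is_kokoro_voice helper is hypothetical, and the voices endpoint remains the authoritative source if the two ever differ.

```shell
# Voice IDs copied from the list above; the live /voices endpoint is the
# authoritative source and may differ.
KOKORO_VOICES="af_bella af_heart af_nicole af_sarah af_sky bf_emma bf_isabella am_adam am_michael bm_george bm_lewis"

# Hypothetical helper: succeeds if $1 is one of the documented voice IDs.
is_kokoro_voice() {
  for v in $KOKORO_VOICES; do
    [ "$v" = "$1" ] && return 0
  done
  return 1
}

if is_kokoro_voice "am_adam"; then echo "am_adam: ok"; fi
if ! is_kokoro_voice "xx_nobody"; then echo "xx_nobody: unknown voice"; fi
```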

Synthesize Speech

POST https://early.vaja.ai/api/tts/kokoro

Example Request:

curl -X POST https://early.vaja.ai/api/tts/kokoro \
  -H "Authorization: Bearer YOUR_VAJA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! Welcome to Vaja AI services.",
    "voice": "af_bella"
  }' \
  --output output.wav

Request Parameters:

text (required): The text to synthesize
voice (optional): Voice ID (default: "af_bella")
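
When the text comes from user input, quotes and backslashes must be escaped before they are placed in the JSON body. The sketch below shows one way to build the two parameters above into a payload; json_escape and kokoro_payload are hypothetical helper names, and the escaping only covers single-line text without control characters.

```shell
# Escape backslashes and double quotes so the value is safe inside a JSON
# string. Limitation: assumes single-line input without control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# kokoro_payload TEXT [VOICE]
# Falls back to the documented default voice when VOICE is omitted.
kokoro_payload() {
  printf '{"text":"%s","voice":"%s"}' "$(json_escape "$1")" "${2:-af_bella}"
}

kokoro_payload 'She said "hello" to Vaja.' am_adam
echo
# prints: {"text":"She said \"hello\" to Vaja.","voice":"am_adam"}
```

The result can be passed straight to curl with -d "$(kokoro_payload "$text")".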

Key Features:

Very fast inference (~0.5-1s for 1-2 sentences)
Excellent audio quality
11 voices (7 female, 4 male)
Optimized for English

2. StyleTTS2 (Recommended)

SSML-Enabled, High-Performance TTS

Synthesize Speech

POST https://early.vaja.ai/api/tts/styletts2

Example Request:

curl -X POST https://early.vaja.ai/api/tts/styletts2 \
  -H "Authorization: Bearer YOUR_VAJA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from StyleTTS2!",
    "voice": "af_sky"
  }' \
  --output output.wav

Key Features:

SSML support for advanced control
6x faster than traditional TTS
Highest quality output of the four services
24 kHz sample rate
Best for production applications

3. Coqui XTTS v2

Multi-lingual Text-to-Speech

Synthesize Speech

POST https://early.vaja.ai/api/tts/coqui

Example Request:

curl -X POST https://early.vaja.ai/api/tts/coqui \
  -H "Authorization: Bearer YOUR_VAJA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from Coqui TTS!",
    "language": "en"
  }' \
  --output output.wav

Request Parameters:

text (required): The text to synthesize
language (required): Language code from the Supported Languages list below (e.g. en, es, fr, de)

Supported Languages:

English (en)
Spanish (es)
French (fr)
German (de)
Portuguese (pt)
Polish (pl)
Turkish (tr)
Russian (ru)
Dutch (nl)
Czech (cs)
Arabic (ar)
Chinese (zh)
Japanese (ja)
Hungarian (hu)
Korean (ko)

Key Features:

Voice cloning capability
15+ languages supported
Great audio quality

4. LiveTranslate TTS

Real-time Translation with TTS

Endpoint Structure

POST https://early.vaja.ai/api/tts/livetranslate/*

Contact your Vaja representative for specific LiveTranslate endpoint documentation and usage examples.

Key Features:

Combined translation and TTS
Multi-lingual support
Real-time processing

📊 Service Comparison

Service	Speed	Quality	Languages	Best For
Kokoro-82M	⚡⚡⚡ Very Fast	⭐⭐⭐⭐ Excellent	English	Fast English TTS with multiple voices
StyleTTS2	⚡⚡⚡ 6x Faster	⭐⭐⭐⭐⭐ Best	English	Production apps needing SSML
Coqui XTTS	⚡⚡ Moderate	⭐⭐⭐⭐ Great	15+ Languages	Multi-language support
LiveTranslate	⚡⚡ Moderate	⭐⭐⭐ Good	Multi-lingual	Translation + TTS combined
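
The comparison table can be turned into a simple routing rule. The following is one possible policy rather than an official recommendation: English goes to Kokoro-82M unless SSML is needed, and the other listed languages go to Coqui XTTS. The pick_tts_endpoint helper and its "ssml" flag are hypothetical.

```shell
# Hypothetical helper: prints the endpoint path suggested by the table above.
#   $1 = language code, $2 = "ssml" if SSML control is needed (optional)
pick_tts_endpoint() {
  case "$1" in
    en)
      if [ "${2:-}" = "ssml" ]; then
        echo "/api/tts/styletts2"
      else
        echo "/api/tts/kokoro"
      fi
      ;;
    es|fr|de|pt|pl|tr|ru|nl|cs|ar|zh|ja|hu|ko)
      echo "/api/tts/coqui"
      ;;
    *)
      echo "unsupported language: $1" >&2
      return 1
      ;;
  esac
}

pick_tts_endpoint en
pick_tts_endpoint en ssml
pick_tts_endpoint ja
```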

🔧 Technical Specifications

Audio Output Format

Format: WAV (RIFF)
Sample Rate: 24kHz
Channels: Mono/Stereo (varies by service)
Bit Depth: 16-bit PCM
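
A downloaded file can be checked against these values without an audio library. The sketch below assumes the canonical 44-byte WAV header, where the "fmt " chunk comes first and the sample rate is a little-endian 32-bit integer at byte offset 24. Files with extra chunks before "fmt " would need a real parser. The is_wav and wav_sample_rate helpers are hypothetical names.

```shell
# Succeeds if the file starts with a RIFF/WAVE header.
is_wav() {
  [ "$(dd if="$1" bs=1 count=4 2>/dev/null)" = "RIFF" ] &&
  [ "$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)" = "WAVE" ]
}

# Prints the sample rate in Hz, assuming the canonical header layout.
wav_sample_rate() {
  od -An -tu1 -j24 -N4 "$1" | {
    read -r b0 b1 b2 b3
    echo $(( b0 + b1 * 256 + b2 * 65536 + b3 * 16777216 ))
  }
}

# Offline demo: write a minimal header for 24 kHz, mono, 16-bit PCM.
demo=$(mktemp)
printf 'RIFF\044\000\000\000WAVEfmt \020\000\000\000\001\000\001\000\300\135\000\000\200\273\000\000\002\000\020\000data\000\000\000\000' > "$demo"
is_wav "$demo" && echo "sample rate: $(wav_sample_rate "$demo") Hz"
rm -f "$demo"
```

Running the same two checks on an API response catches the case where an error message was saved under a .wav name.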

API Specifications

Protocol: HTTPS (TLS 1.2+)
Infrastructure: GPU-accelerated dedicated hardware
CORS: Enabled for web applications
Max Request Size: 100MB
Rate Limiting: Contact your representative for limits

Response Times (Approximate)

Service	Short Text (1-2 sentences)	Medium Text (paragraph)
Kokoro-82M	~0.5-1s	~1-2s
StyleTTS2	~0.5-1.5s	~1.5-3s
Coqui XTTS	~1-2s	~2-4s

📝 Quick Start Example

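Save this script as test_vaja_tts.sh to test the Kokoro-82M, StyleTTS2, and Coqui XTTS services (LiveTranslate is not covered), then run it with bash test_vaja_tts.sh: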

#!/bin/bash
# Stop on the first failed command or unset variable.
set -euo pipefail

# Replace with your actual API key
API_KEY="YOUR_VAJA_API_KEY"
BASE_URL="https://early.vaja.ai/api/tts"

# synthesize SERVICE JSON_BODY OUTPUT_FILE
# --fail makes curl exit non-zero on HTTP errors (4xx/5xx) instead of
# saving the error response as a .wav file.
synthesize() {
  curl --fail --silent --show-error -X POST "$BASE_URL/$1" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$2" \
    -o "$3"
}

# Test Kokoro-82M (Female Voice)
echo "Testing Kokoro-82M with female voice..."
synthesize kokoro '{"text":"Testing Kokoro female voice","voice":"af_bella"}' kokoro_female.wav

# Test Kokoro-82M (Male Voice)
echo "Testing Kokoro-82M with male voice..."
synthesize kokoro '{"text":"Testing Kokoro male voice","voice":"am_adam"}' kokoro_male.wav

# Test StyleTTS2
echo "Testing StyleTTS2..."
synthesize styletts2 '{"text":"Testing StyleTTS2 synthesis","voice":"af_sky"}' styletts2.wav

# Test Coqui (English)
echo "Testing Coqui XTTS in English..."
synthesize coqui '{"text":"Testing Coqui English","language":"en"}' coqui_english.wav

# Test Coqui (Spanish)
echo "Testing Coqui XTTS in Spanish..."
synthesize coqui '{"text":"Hola, esto es una prueba","language":"es"}' coqui_spanish.wav

echo "✓ Done! Check the generated WAV files"

⚠️ Best Practices

Security

Never expose your API key in client-side code
Store API keys in environment variables or secure vaults
Rotate API keys regularly
Use HTTPS for all API requests

Performance

Cache generated audio when possible to reduce API calls
Choose the right service for your use case (see comparison table)
For English-only apps, use Kokoro-82M or StyleTTS2 for best performance
Keep text inputs reasonably sized (under 1000 characters recommended)
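
Caching needs a stable key for each request. The sketch below derives a file path from the service, voice, and text, so an identical request maps to the same file. The cache directory, the VAJA_CACHE_DIR variable, and both helper names are assumptions for illustration.

```shell
# Hypothetical cache layout: one WAV per (service, voice, text) combination.
CACHE_DIR="${VAJA_CACHE_DIR:-./tts-cache}"

# Prints a SHA-256 hex digest of stdin using whichever tool is installed.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# cache_path SERVICE VOICE TEXT
# Prints the file path where the audio for this request would be stored.
cache_path() {
  key=$(printf '%s\n%s\n%s' "$1" "$2" "$3" | sha256_hex)
  printf '%s/%s.wav\n' "$CACHE_DIR" "$key"
}

cache_path kokoro af_bella "Hello! Welcome to Vaja AI services."
```

Before calling the API, test whether the file at that path already exists and only send the request on a miss.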

Error Handling

Always check HTTP response codes
Implement retry logic with exponential backoff
Handle network timeouts gracefully
Validate audio output before using in production
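
Retry with exponential backoff can be sketched as a small wrapper around any command. The retry helper below is a hypothetical name; it retries on every non-zero exit status, so in practice it should wrap a curl call that uses --fail, and a stricter version would skip retries for client errors such as 401.

```shell
# retry MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, doubling the delay after each failure.
retry() {
  local max delay attempt
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Offline demo: a command that succeeds immediately needs no retries.
retry 3 0 true && echo "succeeded"

# Example against the API (makes a real request, so it is commented out):
# retry 4 1 curl --fail --silent --show-error --max-time 30 \
#   -X POST https://early.vaja.ai/api/tts/kokoro \
#   -H "Authorization: Bearer $VAJA_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d '{"text":"Hello","voice":"af_bella"}' -o hello.wav
```

With 4 attempts and a 1 second base delay, the waits are 1, 2, and 4 seconds; --max-time bounds each attempt so a stalled connection also counts as a failure.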

📚 Common Use Cases

1. Voice Assistant / Chatbot

Recommended: StyleTTS2 or Kokoro-82M

Fast response times and high quality make these ideal for interactive applications.

2. Multi-lingual Content

Recommended: Coqui XTTS v2

Support for 15+ languages with consistent voice quality.

3. Audiobook / Long-form Content

Recommended: StyleTTS2 with SSML

SSML support allows for natural pauses, emphasis, and prosody control.
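
As a rough illustration, a pause and an emphasis could be marked up with standard W3C SSML elements. This sketch assumes the SSML string is sent in the text field of the StyleTTS2 request and that the speak, break, and emphasis elements are honored; neither is confirmed here, so check with your Vaja representative before relying on it.

```shell
# Standard W3C SSML elements; which of them the StyleTTS2 endpoint honors,
# and whether SSML is sent in the "text" field, are assumptions to verify.
# Single-quoted attributes keep the string free of JSON escaping.
ssml="<speak>Chapter one.<break time='700ms'/>It was a <emphasis level='strong'>very</emphasis> long night.</speak>"

payload=$(printf '{"text":"%s","voice":"af_sky"}' "$ssml")
printf '%s\n' "$payload"
```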

4. Real-time Translation + Speech

Recommended: LiveTranslate TTS

Combined translation and speech synthesis in one step.

🆘 Support & Resources

Need Help?

For API keys, technical support, or questions about the Early Adopter program, contact your Vaja representative.

Useful Resources

API Base URL: https://early.vaja.ai
Support: Contact your Vaja representative
Status Updates: Early adopters receive direct notifications