🎤 AX TTS API Documentation

Turkish Text-to-Speech Service - REST & WebSocket API

Overview

AX TTS is a high-performance Turkish text-to-speech service that exposes both a REST API and a WebSocket streaming API. It is built on the Piper TTS engine with Turkish language support.

Features

Quick Start

1. REST API - Simple Synthesis

curl -X POST 'https://api.axtts.pixagor.net/api/v1/synthesize' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Merhaba dünya",
    "format": "wav"
  }' \
  -o output.wav

2. Health Check

curl https://api.axtts.pixagor.net/health

Base URL: https://api.axtts.pixagor.net/

REST API

POST /api/v1/synthesize

Synthesizes speech from the given text and returns the complete audio file in a single response.

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | required | Text to synthesize (max 5000 characters) |
| voice | string | optional | Voice model (default: "default") |
| speed | float | optional | Speech speed (0.5-2.0, default: 1.0) |
| format | string | optional | Output format: "wav" or "pcm16" (default: "wav") |
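
The limits in the parameter list can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: the helper name `build_synthesize_payload` is hypothetical, and it assumes the 5000-character limit counts characters the same way Python's `len` does.

```python
ALLOWED_FORMATS = {"wav", "pcm16"}
MAX_TEXT_CHARS = 5000  # documented limit; the server's counting method is an assumption


def build_synthesize_payload(text, voice="default", speed=1.0, fmt="wav"):
    """Validate the documented limits and return the JSON body as a dict."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    return {"text": text, "voice": voice, "speed": speed, "format": fmt}
```

The returned dict can be passed directly as the `json=` argument of `requests.post`, so invalid input fails locally instead of costing a round trip.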

Response

Content-Type: audio/wav or audio/l16

Binary audio data
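
A "pcm16" response carries raw samples with no header, so most players cannot open it directly. The sketch below wraps such a buffer in a WAV container using Python's standard `wave` module. It rests on assumptions this page does not confirm: the sample rate and channel count are not documented here, so both are passed in by the caller, and the samples are assumed to be little-endian as WAV requires.

```python
import io
import wave


def pcm16_to_wav(pcm_bytes, sample_rate, channels=1):
    """Wrap raw 16-bit PCM samples in a WAV container and return the bytes.

    sample_rate and channels are assumptions supplied by the caller;
    the samples are assumed to be little-endian.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 16-bit samples are 2 bytes each
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


def pcm16_duration_ms(pcm_bytes, sample_rate, channels=1):
    """Duration of a raw PCM16 buffer in milliseconds."""
    frames = len(pcm_bytes) // (2 * channels)
    return frames * 1000 / sample_rate
```

If the wrapped audio plays too fast or too slow, the assumed sample rate is the first thing to check.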

Example with Options

curl -X POST 'https://api.axtts.pixagor.net/api/v1/synthesize' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Bugün 24.10.2025 tarihinde ₺150,50 ödeme yaptım",
    "speed": 1.1,
    "format": "wav"
  }' \
  -o turkish.wav

WebSocket Streaming

WebSocket wss://api.axtts.pixagor.net/ws/stream

Real-time audio chunk streaming with low latency.

Connection

const ws = new WebSocket('wss://api.axtts.pixagor.net/ws/stream');

ws.onopen = () => {
  ws.send(JSON.stringify({
    text: "Merhaba dünya",
    format: "pcm16",
    speed: 1.0
  }));
};

ws.onmessage = (event) => {
  const chunk = JSON.parse(event.data);

  if (chunk.type === 'audio_chunk') {
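    // Decode the base64 payload into raw PCM bytes (atob alone returns a binary string)
    const audioData = Uint8Array.from(atob(chunk.payload), (c) => c.charCodeAt(0));
    // Use chunk.chunk_id to keep chunks in order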
  } else if (chunk.type === 'done') {
    // Synthesis complete
    console.log(`Duration: ${chunk.duration_ms}ms`);
    console.log(`Chunks: ${chunk.chunks}`);
  } else if (chunk.type === 'error') {
    console.error(chunk.message);
  }
};

Message Types

| Type | Direction | Description |
|---|---|---|
| request | Client → Server | Initial synthesis request with text and options |
| audio_chunk | Server → Client | Audio data chunk (base64 encoded PCM16) |
| done | Server → Client | Synthesis completed with stats |
| error | Server → Client | Error message |

💡 Performance: WebSocket streaming provides < 300 ms time to first audio (TTFA), ideal for real-time applications.
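
The server-to-client message types can be handled outside the browser as well. The sketch below takes the JSON text of each received message and reassembles the audio. It uses only the field names that appear in the JavaScript example (`type`, `payload`, `chunk_id`, `message`); the function name `assemble_audio` is hypothetical, and it assumes `chunk_id` values sort in playback order.

```python
import base64
import json


def assemble_audio(raw_messages):
    """Collect audio_chunk payloads in chunk_id order.

    Returns (pcm_bytes, stats), where stats is the parsed 'done' message,
    or None if no 'done' message was seen.
    """
    chunks = {}
    stats = None
    for raw in raw_messages:
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "audio_chunk":
            # Payload is base64 encoded PCM16
            chunks[message["chunk_id"]] = base64.b64decode(message["payload"])
        elif kind == "done":
            stats = message
            break
        elif kind == "error":
            raise RuntimeError(message.get("message", "synthesis failed"))
    # Assumption: chunk_id values sort in playback order
    pcm = b"".join(chunks[key] for key in sorted(chunks))
    return pcm, stats
```

Keying the chunks by `chunk_id` means a chunk that arrives out of order is still placed correctly. For real-time playback, chunks would be played as they arrive rather than buffered like this.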

Text Normalization

The service automatically normalizes Turkish text for natural speech:

| Input | Normalized Output |
|---|---|
| 1250 | bin iki yüz elli |
| ₺150,75 | yüz elli lira yetmiş beş kuruş |
| 24.10.2025 | yirmi dört Ekim iki bin yirmi beş |
| 14:30 | saat on dört otuz |
| %20 | yüzde yirmi |
| Dr. | Doktor |
| vb. | ve benzeri |
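
To make the number rows above concrete, the following sketch spells out integers in Turkish. It is an illustration only and not the service's normalizer, whose implementation is not described here. The function names are hypothetical and the range is limited to 0..999999.

```python
ONES = ["", "bir", "iki", "üç", "dört", "beş", "altı", "yedi", "sekiz", "dokuz"]
TENS = ["", "on", "yirmi", "otuz", "kırk", "elli", "altmış", "yetmiş", "seksen", "doksan"]


def _below_thousand(n):
    """Return the words for 0..999 as a list; 'yüz' takes no leading 'bir'."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        if hundreds > 1:
            words.append(ONES[hundreds])
        words.append("yüz")
    tens, ones = divmod(rest, 10)
    if tens:
        words.append(TENS[tens])
    if ones:
        words.append(ONES[ones])
    return words


def number_to_turkish(n):
    """Spell out an integer in the range 0..999999 in Turkish."""
    if not 0 <= n <= 999_999:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n == 0:
        return "sıfır"
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        if thousands > 1:  # 'bin' takes no leading 'bir' either
            words.extend(_below_thousand(thousands))
        words.append("bin")
    words.extend(_below_thousand(rest))
    return " ".join(words)


def normalize_percent(token):
    """Turn a token such as '%20' into 'yüzde yirmi'."""
    return "yüzde " + number_to_turkish(int(token.lstrip("%")))
```

For example, `number_to_turkish(1250)` gives "bin iki yüz elli" and `normalize_percent("%20")` gives "yüzde yirmi", matching the corresponding rows. Dates, currency, times, and abbreviations need additional rules that this sketch does not cover.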

Code Examples

Python

import requests

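# Send the synthesis request; the 30-second timeout is an arbitrary client-side choice
response = requests.post(
    'https://api.axtts.pixagor.net/api/v1/synthesize',
    json={
        'text': 'Merhaba dünya',
        'format': 'wav'
    },
    timeout=30
)
# Stop on HTTP errors instead of writing an error body to disk
response.raise_for_status()

# Save the returned audio bytes
with open('output.wav', 'wb') as f:
    f.write(response.content)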

JavaScript (Fetch)

fetch('https://api.axtts.pixagor.net/api/v1/synthesize', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Merhaba dünya',
    format: 'wav'
  })
})
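.then(res => {
  // Reject on HTTP errors so an error body is not played as audio
  if (!res.ok) throw new Error(`Synthesis failed: HTTP ${res.status}`);
  return res.blob();
})
.then(blob => {
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.play();
})
.catch(err => console.error(err));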

cURL with Turkish Text

curl -X POST 'https://api.axtts.pixagor.net/api/v1/synthesize' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Toplam 1250 kişi, 24.10.2025 tarihinde ₺150,50 ödeme yaptı.",
    "speed": 1.0,
    "format": "wav"
  }' \
  -o payment.wav

Interactive API Documentation

Explore and test the API interactively:

Rate Limits & Performance

Rate Limits

Performance Metrics

📊 Monitoring: View real-time metrics at grafana.axtts.pixagor.net

Support & Links