Speech to Text

POST

transcribe

Speech to Text processing is billed at a rate of 1 AI invocation per 25 seconds of processing time. For instance, if your file requires 50 seconds to process, you would be charged for 2 AI invocations.
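
The billing rule above can be sketched as a small calculation. This is only an estimate under one assumption the text does not state: that a partial 25-second block is rounded up to a full invocation. The helper name `estimate_invocations` is hypothetical and not part of the API.

```python
import math

SECONDS_PER_INVOCATION = 25  # from the billing rule above


def estimate_invocations(processing_seconds: float) -> int:
    """Estimate billed AI invocations for a given processing time.

    Assumption: a partial 25-second block is billed as a full invocation.
    """
    if processing_seconds < 0:
        raise ValueError("processing_seconds must be non-negative")
    return math.ceil(processing_seconds / SECONDS_PER_INVOCATION)


print(estimate_invocations(50))  # 2, matching the example above
```

Note that billing is based on processing time, not on the duration of the audio, so the input to this helper is only known after the request has run.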

Body

url

string

The video/audio url. Not required if file_store_key is specified.

file_store_key

string

The key used to store the video/audio file on JigsawStack File Storage. Not required if url is specified.

language

string

The language to translate the file into. If not specified, the model will automatically detect the language and transcribe accordingly.

translate

boolean

default: false

Translates the file into the given language.

by_speaker

boolean

default: false

Identifies and separates different speakers in the audio file.

webhook_url

string

Webhook URL to send result to.

batch_size

number

The batch size to return. Maximum value is 40.

Provide either url or file_store_key, but not both.
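
The body parameters above could be assembled and sent roughly as follows. This is a minimal sketch, not an official client: the endpoint URL is a placeholder, the JSON encoding of the body is an assumption, and the function names `build_transcribe_body` and `transcribe` are hypothetical. Only the parameter names, the one-of-two source rule, the batch_size limit of 40, and the x-api-key header come from this page.

```python
import json
import urllib.request
from typing import Optional

# Assumption: placeholder endpoint; substitute the real transcribe URL.
TRANSCRIBE_URL = "https://api.example.com/transcribe"
MAX_BATCH_SIZE = 40  # documented maximum for batch_size


def build_transcribe_body(
    url: Optional[str] = None,
    file_store_key: Optional[str] = None,
    language: Optional[str] = None,
    translate: bool = False,
    by_speaker: bool = False,
    webhook_url: Optional[str] = None,
    batch_size: Optional[int] = None,
) -> dict:
    """Validate the parameters and return the request body as a dict."""
    # Exactly one of url / file_store_key must be given.
    if (url is None) == (file_store_key is None):
        raise ValueError("Provide either url or file_store_key, but not both")
    if batch_size is not None and batch_size > MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must not exceed {MAX_BATCH_SIZE}")

    body = {"translate": translate, "by_speaker": by_speaker}
    optional = {
        "url": url,
        "file_store_key": file_store_key,
        "language": language,
        "webhook_url": webhook_url,
        "batch_size": batch_size,
    }
    # Include optional fields only when the caller supplied them.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def transcribe(api_key: str, **params) -> dict:
    """Send the POST request (assumes a JSON body and a JSON response)."""
    body = build_transcribe_body(**params)
    request = urllib.request.Request(
        TRANSCRIBE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Validating before sending keeps an invalid combination, such as both url and file_store_key, from costing a round trip.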

Header

x-api-key

string

required

Your JigsawStack API key.

Response

success

boolean

Indicates whether the call was successful.
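
A caller might check the success field before using the rest of the response. The sketch below assumes the response is JSON. It relies only on the documented success field and makes no assumption about other fields. The helper name `parse_transcribe_response` is hypothetical.

```python
import json


def parse_transcribe_response(raw: str) -> dict:
    """Parse a raw JSON response and raise if success is not true."""
    data = json.loads(raw)
    # A missing success field is treated as a failure.
    if not data.get("success", False):
        raise RuntimeError("Transcription request was not successful")
    return data


print(parse_transcribe_response('{"success": true}'))
```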
