Speech to Text
Transcribe video and audio files with ease, leveraging the Whisper large V3 AI model.
Speech to Text processing is billed at a rate of 1 AI invocation per 25 seconds of processing time. For instance, if your file requires 50 seconds to process, you would be charged for 2 AI invocations.
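The billing rule above can be sketched as a small calculation. This is only an illustration: the function name is hypothetical, and it assumes that a partial 25-second block is rounded up to a full invocation, which this page does not state explicitly.

```python
import math

# Billing unit stated on this page: 1 AI invocation per 25 seconds of processing.
SECONDS_PER_INVOCATION = 25


def invocations_billed(processing_seconds: float) -> int:
    """Estimate the AI invocations charged for a given processing time.

    Assumption (not confirmed by the page): partial blocks round up.
    """
    if processing_seconds < 0:
        raise ValueError("processing_seconds must be non-negative")
    return math.ceil(processing_seconds / SECONDS_PER_INVOCATION)


# The example from the text: 50 seconds of processing costs 2 invocations.
print(invocations_billed(50))
```

Note that the charge depends on processing time, not on the duration of the audio or video file itself.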
Body
The video/audio url. Not required if file_store_key is specified.
The key used to store the video/audio file on JigsawStack File Storage. Not required if url is specified.
The language to translate the file into. If not specified, the model will automatically detect the language and transcribe accordingly.
Translates the file into the given language.
Identifies and separates different speakers in the audio file.
Webhook URL to send the result to.
The batch size to return. Maximum value is 40.
Either url or file_store_key should be provided, not both.
Header
Your JigsawStack API key
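As a rough sketch of the request rules described above, the helper below assembles a request body and checks it before anything is sent. The function name is hypothetical, and `batch_size` is an assumed field name (this page gives only its description and the maximum of 40); only `url` and `file_store_key` are named on this page. The sketch deliberately stops short of making the HTTP call, since the endpoint and the API key header name are not given here.

```python
from typing import Optional

# Upper limit for the batch size, as stated on this page.
MAX_BATCH_SIZE = 40


def build_transcription_body(
    url: Optional[str] = None,
    file_store_key: Optional[str] = None,
    batch_size: Optional[int] = None,  # assumed field name, not confirmed here
) -> dict:
    """Build a request body, enforcing the constraints listed on this page."""
    # Exactly one of url / file_store_key must be supplied, never both.
    if (url is None) == (file_store_key is None):
        raise ValueError("Provide exactly one of url or file_store_key.")

    body: dict = {"url": url} if url is not None else {"file_store_key": file_store_key}

    if batch_size is not None:
        if batch_size > MAX_BATCH_SIZE:
            raise ValueError(f"batch_size must not exceed {MAX_BATCH_SIZE}.")
        body["batch_size"] = batch_size

    return body


# Usage with a placeholder URL:
body = build_transcription_body(url="https://example.com/audio.mp3", batch_size=30)
print(body)
```

Validating on the client side like this avoids spending a request on input the API would presumably reject anyway.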
Response
Indicates whether the call was successful.