Speech to Text Function Overview

This page contains a brief overview of what you can do with the audio editor's speech to text functions.

Analyzing Spoken Text

A number of speech to text services are available on the web. Most of them are commercial and charge for their use. You or your company must set up a contract with the respective service (or install the service on a local HTTP server) and configure the service for your use.

Load an audio file with spoken text into the clipboard, then right-click the file and click "Speech to text analysis". If this menu item appears, click the desired service name in its submenu to start the analysis.

While the analysis runs, the clipboard item displays an icon. The icon changes to a different one when the analysis is finished, or to an error icon if the analysis runs into an error. If you hover the mouse over the icon, a tool tip shows the current state.

[Icons and tool tips: while analyzing; an error occurred; with analyzed speech to text]

When the analysis finishes successfully, the resulting text is attached to the audio. When you load the audio into the SingleTrack timeline, the text appears in the speech to text pane. To show the speech to text pane, use the "View" menu.

Working with Speech to Text Data

In the SingleTrack screen, the speech to text pane is shown below the timeline. You can move this pane to any position you like.

The highlighted text in the speech to text pane (blue background) corresponds to the area between mark in and mark out on the timeline.

You can use the editing functions (cut inside / outside, cut inside / outside and move), and the text is edited along with the audio. Text editing works only at word level, however: if you remove part of a word from the timeline, the resulting text is probably not what you expect.
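The word-level behavior can be illustrated with a minimal sketch. It is not the audio editor's actual implementation: the `Word` type, the `cut_inside` function, the millisecond timestamps, and the rule that a partially cut word is dropped as a whole are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word record: text plus its time range in milliseconds."""
    text: str
    start_ms: int
    end_ms: int


def cut_inside(words: List[Word], mark_in_ms: int, mark_out_ms: int) -> List[Word]:
    """Remove the range [mark_in_ms, mark_out_ms) and shift later words left.

    Assumption: a word that overlaps the cut range even partially is
    dropped as a whole, because text can only be edited at word level.
    """
    cut_length = mark_out_ms - mark_in_ms
    result = []
    for word in words:
        if word.end_ms <= mark_in_ms:
            # Entirely before the cut: unchanged.
            result.append(word)
        elif word.start_ms >= mark_out_ms:
            # Entirely after the cut: moved earlier by the cut length.
            result.append(Word(word.text,
                               word.start_ms - cut_length,
                               word.end_ms - cut_length))
        # Otherwise the word overlaps the cut and is dropped.
    return result


words = [Word("hello", 0, 500), Word("big", 500, 800), Word("world", 800, 1300)]
clean_cut = cut_inside(words, 500, 800)    # removes exactly the word "big"
partial_cut = cut_inside(words, 600, 800)  # removes only part of "big"
```

In this sketch both cuts produce the text "hello world", but after the partial cut the audio still contains the first 100 ms of "big". This is the kind of mismatch between text and audio that a cut through the middle of a word can cause.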

When you start playback, a red highlight corresponding to the sound head position moves through the text.

Conversely, select a word in the text to move the sound head to that word.
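The two-way link between sound head and text can be sketched as a lookup over word start times. This is an illustration only: the function names and the sorted list of start times in milliseconds are assumptions, and the sketch ignores pauses between words.

```python
from bisect import bisect_right
from typing import List, Optional


def word_index_at(word_starts_ms: List[int], position_ms: int) -> Optional[int]:
    """Return the index of the last word starting at or before position_ms.

    word_starts_ms must be sorted in ascending order. Returns None if the
    position lies before the first word.
    """
    index = bisect_right(word_starts_ms, position_ms) - 1
    return index if index >= 0 else None


def position_for_word(word_starts_ms: List[int], index: int) -> int:
    """Return the sound head position for a selected word (its start time)."""
    return word_starts_ms[index]


starts = [0, 500, 800]  # hypothetical start times of three words
```

With these values, a sound head at 650 ms maps to the second word (index 1), and selecting the third word (index 2) places the sound head at 800 ms.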

To find text in the text area, type the search string into the "Search" field, and press the ENTER key or click the down-arrow button. All matches are marked in yellow, and you can cycle through the matches by repeatedly pressing the ENTER key or clicking the down-arrow button. SHIFT+ENTER or the up-arrow button cycles through the matches in the reverse direction.

To clear the search result and the yellow markers, click the X button next to the "Search" field.
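The cycling behavior can be sketched as follows. The names `find_matches` and `MatchCursor` are hypothetical, and two details are assumptions rather than documented behavior: the search is treated as case-insensitive, and stepping past the last match wraps around to the first.

```python
from typing import List, Optional


def find_matches(text: str, query: str) -> List[int]:
    """Return the start offsets of all occurrences of query in text.

    Assumption: matching ignores case.
    """
    if not query:
        return []
    haystack, needle = text.lower(), query.lower()
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)
    return offsets


class MatchCursor:
    """Steps forward (ENTER) or backward (SHIFT+ENTER) through the matches."""

    def __init__(self, matches: List[int]) -> None:
        self.matches = matches
        self.index: Optional[int] = None  # no match selected yet

    def step(self, forward: bool = True) -> Optional[int]:
        """Select the next or previous match and return its offset."""
        if not self.matches:
            return None
        if self.index is None:
            self.index = 0 if forward else len(self.matches) - 1
        else:
            delta = 1 if forward else -1
            # Assumption: the selection wraps around at both ends.
            self.index = (self.index + delta) % len(self.matches)
        return self.matches[self.index]


matches = find_matches("the cat and the dog and the bird", "The")
cursor = MatchCursor(matches)
visited = [cursor.step() for _ in range(4)]  # four times ENTER
back = cursor.step(forward=False)            # once SHIFT+ENTER
```

Clearing the search with the X button would correspond to discarding the cursor and the list of matches.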

Saving Speech to Text Data

When the audio file is saved to the file system, to DBM, or to DPE, the text file is saved with the audio. In the file system, the text file has the same name as the audio file, but its extension is .s2t. In DBM or DPE, the storage location is implicitly determined.

When an audio file with text information is included in a project, the text information is implicitly embedded in the project.

Loading Speech to Text Data

When speech to text information is available for an audio file that is loaded from DBM or DPE, this information is implicitly loaded. When an audio file is loaded from the file system, and a .s2t file exists with the same file name, then the audio editor attempts to read speech to text information from this file.
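For scripts that manage audio files outside the audio editor, the file naming rule can be sketched as below. The function names are hypothetical, and the sketch assumes that .s2t replaces the audio file's extension (for example, interview.wav and interview.s2t). It says nothing about the contents of the .s2t file.

```python
from pathlib import Path
from typing import Optional, Union


def s2t_path_for(audio_path: Union[str, Path]) -> Path:
    """Return the expected path of the .s2t file next to an audio file.

    Assumption: the .s2t extension replaces the audio file's extension.
    """
    return Path(audio_path).with_suffix(".s2t")


def find_s2t(audio_path: Union[str, Path]) -> Optional[Path]:
    """Return the .s2t file for an audio file if it exists, otherwise None."""
    candidate = s2t_path_for(audio_path)
    return candidate if candidate.is_file() else None
```

When moving or renaming audio files by hand, a helper like this can be used to keep the .s2t file together with its audio file so that the text is found again on loading.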

When a project is loaded that contains speech to text information, then this information is implicitly loaded.