Speech to Text (Transcription) Configuration

This page explains how to configure MTE, ETE, and STE to use the speech to text feature.

Enabling speech to text

Speech to text functionality is disabled by default. Administrators can, however, disable and enable speech to text globally, per workstation, or per user. As usual, a global setting is superseded by a per-workstation setting, which, in turn, is superseded by a per-user setting.

The setting UseSpeechToText, located at MultiTrack\Settings, EasyTrack\Settings, or SingleTrack\Settings, enables or disables speech to text. A value of 1 enables the feature, and a value of 0 disables it.
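The precedence rule can be illustrated with a short Python sketch. It only models the rule described above (per-user over per-workstation over global, disabled by default); the function name and the use of None for "not set at this level" are assumptions made for illustration, not part of the editor.

```python
from typing import Optional


def effective_use_speech_to_text(
    global_value: Optional[int] = None,
    workstation_value: Optional[int] = None,
    user_value: Optional[int] = None,
) -> bool:
    """Resolve UseSpeechToText; None means 'not set at this level'."""
    # Check the most specific level first: user, then workstation, then global.
    for value in (user_value, workstation_value, global_value):
        if value is not None:
            return value == 1
    # No level defines the setting: the feature is disabled by default.
    return False
```

For example, a global value of 1 combined with a per-workstation value of 0 leaves the feature disabled on that workstation, unless a per-user value of 1 enables it again.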

No additional settings are required to view speech to text data, to work with it, and to save it.

The following sections explain the preliminaries required to access a speech to text service on the web (or in your local network).

Using Transcription Services

The speech to text analysis is not performed by the audio editor. This task is delegated to specialized software which typically runs on a web server. Several commercial transcription services exist on the web, and some can be installed locally on a web server in your company. Which solution works best for you depends on availability, company policies, and other circumstances.

DAVID System can supply specialized client software for the service of your choice. This client software consists of one dll file, and, possibly, several additional files. Please consult DAVID support for details and installation instructions.

To prepare a service for use with the audio editor, perform the following configuration steps. It makes sense to enter the configuration data in the global settings so that they are available to all users who need them.

Open or create the Common Digas registry key. (This key must exist at top level; it is not a sub key of MultiTrack or MultiTrack\Settings.)
In Common, open or create the SpeechToText subkey.
In Common\SpeechToText, create one key for each service that you want to make available. The key can have a rather technical name (users will not have to see it).
In the service's key, enter configuration data for the service. This is explained below.

Service Configuration

As mentioned above, you must create a separate configuration set for each service, using the service's key under Common\SpeechToText. You should make the following entries:

Name	Content	Example	Notes
DLL (obsolete), DLL64	Name of the dll file that accesses the service	`SpeechmaticsInterface.dll`	The DLL entry exists for the legacy 32-bit editor. Use DLL64 to specify a 64-bit enabled DLL.
URI	Web address of the service		Provided by the supplier of the service
User	User name or account under which the service is accessed	`Testuser`	Provided by the supplier of the service
Password	Password for the user name or account		Provided by the supplier of the service, or defined by the user
Language	Assumed language of spoken text	`de`	List of valid language codes must be obtained from service provider
XmlConfig	Additional configuration values	`<retain_temp_file/><use_model />`	Depends on the dll file; please consult the installation instructions for valid entries. This entry must either be empty or contain a list of XML elements. For SpeechmaticsInterface.dll, use XmlConfig=<retain_temp_file/><use_model />. For the Speechmatics version 2 interface, specify <version>2</version>.

Only the DLL/DLL64 entry must be present. Whether one or more of the other entries are required depends on the transcription service and is typically evaluated by the dll.
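As a rough aid, the following Python sketch checks a set of service entries against the table above. It is not part of the product: the function name is hypothetical, and treating unlisted names as possible typos is an assumption, since a particular dll might accept entries not listed on this page.

```python
# Entry names taken from the table on this page.
KNOWN_ENTRIES = {"DLL", "DLL64", "URI", "User", "Password", "Language", "XmlConfig"}


def check_service_entries(entries: dict[str, str]) -> list[str]:
    """Return a list of problems found in one service's configuration set."""
    problems = []
    # Only the DLL/DLL64 entry is mandatory; it must not be empty.
    if not (entries.get("DLL") or entries.get("DLL64")):
        problems.append("missing DLL/DLL64 entry")
    # Assumption: names outside the table are flagged as possible typos.
    for name in sorted(entries):
        if name not in KNOWN_ENTRIES:
            problems.append(f"unknown entry: {name}")
    return problems
```

Whether URI, User, Password, Language, or XmlConfig are required is decided by the dll at run time, so the sketch deliberately does not check for them.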

The password is encrypted before it is stored. To enter the password using ADMIN:

Create the Password entry without a value
Right-click the Password entry
From the popup menu, click "Change crypted value"
In the dialog that opens, enter the clear-text password, then select "Broadcast Utility Server" under "Crypting method", and click "OK".

Speech to text services usually want to be told the language they are supposed to transcribe. To make the same service available for different languages, create one service key in Common\SpeechToText for each language.
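The one-key-per-language approach can be sketched in Python as follows. The key naming scheme (base name plus language code) and the function name are assumptions chosen for illustration; any key names work, and valid language codes must come from the service provider.

```python
def per_language_services(
    base_key: str, base_entries: dict[str, str], languages: list[str]
) -> dict[str, dict[str, str]]:
    """Build one configuration set per language from a shared set of entries."""
    services = {}
    for lang in languages:
        # Copy the shared entries so each service key gets its own set.
        entries = dict(base_entries)
        entries["Language"] = lang
        # Hypothetical naming scheme: "<base key>.<language code>".
        services[f"{base_key}.{lang}"] = entries
    return services
```

Each key of the returned dictionary corresponds to one service key to create under Common\SpeechToText, and the inner dictionary lists the entries to enter there.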

Speechmatics Version 2 Service

Starting with version 1.0.17.0, SpeechmaticsInterface.dll was extended to work with version 2 of the Speechmatics service. The following configuration changes are required:

In order to access the new interface version, <version>2</version> must be specified in the XmlConfig parameter.
The User parameter is no longer used.
The Password parameter must contain the access token supplied by the service provider.
The URI parameter must point to the v2 service endpoint as instructed by the service provider.
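The four changes above can be summarized in a small Python sketch that derives a version 2 configuration set from an existing one. The function name and the example URI are hypothetical. Removing the User entry is optional, since the entry is merely unused. The token is shown in clear text here, whereas the real Password entry must be stored encrypted as described under "Service Configuration".

```python
def to_speechmatics_v2(
    entries: dict[str, str], access_token: str, v2_uri: str
) -> dict[str, str]:
    """Return a copy of the entries adjusted for the version 2 interface."""
    updated = dict(entries)
    # The User parameter is no longer used; removing it is optional.
    updated.pop("User", None)
    # The Password parameter carries the access token (encrypt it before storing).
    updated["Password"] = access_token
    # The URI must point to the v2 endpoint named by the service provider.
    updated["URI"] = v2_uri
    # Select the new interface version through XmlConfig.
    xml = updated.get("XmlConfig", "")
    if "<version>" not in xml:
        xml += "<version>2</version>"
    updated["XmlConfig"] = xml
    return updated
```

Whether other XmlConfig values remain valid with the version 2 interface depends on the dll; the sketch simply keeps them and appends the version element.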

Making Services Available

To enable users to interactively call speech to text services, the audio editor must be configured. This can be done globally to enable all users, on a per-workstation basis to enable all users of a specific workstation, or on a per-user basis.

Two types of configuration are available: Simple configuration through the SpeechToTextServices parameter, or (since MTE 8.2.1843.0) complex configuration through several entries in the TranscriptionMenu configuration folder. If both configurations are present, only the complex one is used.

Simple Configuration

The simple configuration completely relies on the data stored in the Common|SpeechToText folder. The parameter SpeechToTextServices in MultiTrack|Settings (or SingleTrack|Settings, EasyTrack|Settings) contains a list of services from which the interactive user can select. Each entry in this list is the name of a folder in Common|SpeechToText. If this name is rather technical, you can specify a user-friendly name, followed by an equals sign (=) and the technical name. Entries are separated by vertical bars (|).

Example: Local (English)=srv3260.local.en-us|RemoteService

The user can select “Local (English)” or “RemoteService”, but the actual configuration folders are Common|SpeechToText|srv3260.local.en-us and Common|SpeechToText|RemoteService.
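The format of the SpeechToTextServices value can be made concrete with a short Python sketch that splits it into pairs of display name and technical name. This only illustrates the syntax described above and is not the editor's actual parser; the function name is hypothetical, and trimming surrounding whitespace and skipping empty entries are assumptions.

```python
def parse_services(value: str) -> list[tuple[str, str]]:
    """Split a SpeechToTextServices value into (display name, technical name) pairs."""
    services = []
    # Entries are separated by vertical bars.
    for item in value.split("|"):
        item = item.strip()  # assumption: surrounding whitespace is ignored
        if not item:
            continue  # assumption: empty entries are skipped
        # An optional user-friendly name precedes the first equals sign.
        display, separator, technical = item.partition("=")
        if not separator:
            # No equals sign: the technical name doubles as the display name.
            technical = display
        services.append((display.strip(), technical.strip()))
    return services


print(parse_services("Local (English)=srv3260.local.en-us|RemoteService"))
# prints [('Local (English)', 'srv3260.local.en-us'), ('RemoteService', 'RemoteService')]
```

The second element of each pair is the name of the configuration folder below SpeechToText.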

Complex Configuration

The complex configuration is what you need when your transcription service uses several options and you want to present users with all possible combinations of options without confusing them with a long menu to pick from. Complex configuration allows you to build a menu with several levels. This is described in the page Transcription Menu – Complex Menu Configuration.

Standalone Configuration

If you are using the audio editor in a standalone configuration, i.e. without DBM and DPE, then the configuration values mentioned above must be entered into the program's INI file. To locate this file, open Windows Explorer, click in the address field, type %PROGRAMDATA%\DIGASYSTEM, and press the ENTER key. Open the MULTITRACK, EASYTRACK, or SINGLETRACK folder (depending on the editor you are using), and look for a file named MULTITRACK.INI, EASYTRACK.INI, or SINGLETRACK.INI. Open this file in your favorite plain-text editor (by default, this is Windows' notepad.exe) and immediately try to save it. If that works, fine; otherwise, give yourself full access to the respective directory.

If you use a word processor like Microsoft Word to edit the INI file, be sure to save the file in its original text-only format; otherwise, it becomes unusable.

Entries in the INI file are grouped into sections which correspond to the keys in the Digas registry. Sections start with a line that contains the section name, enclosed in square brackets. Within a section each entry has the form <name>=<value>. It is an error to have more than one section with the same name, or to have more than one entry within a section with the same name.
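Duplicate sections or entries are easy to overlook in a long file. As a rough check, Python's standard configparser module rejects both when used in strict mode. The sketch below is an assumption-laden aid, not a DAVID tool: the function name is hypothetical, configparser's syntax rules may differ in detail from those of the editor (for example, it compares section names case-sensitively but entry names case-insensitively), and any other syntax problem raises a different configparser exception.

```python
import configparser
from typing import Optional


def ini_duplicate_error(text: str) -> Optional[str]:
    """Return a message if the INI text has a duplicate section or entry, else None."""
    parser = configparser.ConfigParser(
        strict=True,          # reject duplicate sections and duplicate entries
        interpolation=None,   # treat '%' in values as a literal character
        delimiters=("=",),    # entries have the form <name>=<value>
    )
    try:
        parser.read_string(text)
    except (configparser.DuplicateSectionError,
            configparser.DuplicateOptionError) as exc:
        return str(exc)
    return None
```

To check a real file, read its content into a string first and pass that string to the function.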

You must create or edit the following entries:

Section	Entry	Meaning	See also
`[Settings]`	`UseSpeechToText`	Set to 1 to enable or to 0 to disable the speech to text feature	See "Enabling speech to text" on this page
`[\Common\SpeechToText\<service>]`	`DLL`	This section configures the speech to text service "<service>". Be sure to type all three backslashes in the section name!	See "Service Configuration" on this page
	`URI`
	`User`
	`Password`
	`Language`
	`XmlConfig`
`[Settings]`	`SpeechToTextServices`	Define which services are made available to the user	See "Making Services Available" on this page
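To show how the entries in the table fit together, the following Python sketch assembles the corresponding INI text for one service. The function name, the service name, and all values are placeholders chosen for illustration; in particular, the Password value must be the encrypted string, not the clear-text password.

```python
def standalone_ini_fragment(
    service: str, entries: dict[str, str], services_value: str
) -> str:
    """Assemble the INI lines needed for one speech to text service."""
    lines = [
        "[Settings]",
        "UseSpeechToText=1",
        f"SpeechToTextServices={services_value}",
        "",
        # The section name contains three backslashes, including the leading one.
        f"[\\Common\\SpeechToText\\{service}]",
    ]
    # One <name>=<value> line per service entry (DLL, URI, User, ...).
    lines += [f"{name}={value}" for name, value in entries.items()]
    return "\n".join(lines) + "\n"


print(standalone_ini_fragment(
    "RemoteService",
    {"DLL": "SpeechmaticsInterface.dll", "Language": "de"},
    "RemoteService",
))
```

If the INI file already contains a [Settings] section, add the two entries to that existing section instead of pasting a second [Settings] header, because duplicate sections are an error.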

The password must be entered in encrypted form. Use the Digas Encryption Tool to encrypt the password. Start the tool, select encryption type "Broadcast Utility Server", and type the password into the input field. Then click "Encrypt" and "Copy" to copy the encrypted password to the Windows clipboard. Now you can paste the encrypted password into the text editor.