Speech to Text Configuration

This page explains how to configure the MultiTrack Editor (MTE), EasyTrack Editor (ETE), and SingleTrack Editor (STE) to use the speech to text feature.

Enabling speech to text

Speech to text functionality is disabled by default. Administrators can enable or disable it globally, per workstation, or per user. As usual, a per-workstation setting overrides the global setting, and a per-user setting overrides both.

The UseSpeechToText setting in the MultiTrack\Settings, EasyTrack\Settings, or SingleTrack\Settings key enables or disables speech to text. A value of 1 enables the feature; a value of 0 disables it.
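The precedence rule can be pictured as a lookup from the most specific level to the least specific one. The following Python sketch is purely illustrative: it models each level as an optional string (None meaning that the level does not define UseSpeechToText), and the function names are hypothetical, not part of any Digas API.

```python
from typing import Optional


def effective_setting(global_value: Optional[str],
                      workstation_value: Optional[str],
                      user_value: Optional[str],
                      default: str = "0") -> str:
    """Return the value from the most specific level that defines the setting."""
    # Most specific first: user, then workstation, then global.
    for value in (user_value, workstation_value, global_value):
        if value is not None:
            return value
    # No level defines the setting: speech to text stays disabled by default.
    return default


def speech_to_text_enabled(global_value: Optional[str] = None,
                           workstation_value: Optional[str] = None,
                           user_value: Optional[str] = None) -> bool:
    """1 enables the feature; treating every other value as 'disabled' is an assumption."""
    return effective_setting(global_value, workstation_value, user_value) == "1"
```

For example, `speech_to_text_enabled(global_value="1", user_value="0")` returns False: the feature is enabled globally but disabled for that user.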

No additional settings are required to view, edit, or save speech to text data.

The following sections explain the prerequisites for accessing a speech to text service on the web or in your local network.

Using Analysis Services

The audio editor does not perform the speech to text analysis itself. This task is delegated to specialized software, which typically runs on a web server. Several commercial speech to text services are available on the web, and some can be installed on a web server in your company's network. Which solution works best for you depends on availability, company policies, and other circumstances.

DAVID Systems can supply specialized client software for the service of your choice. This client software consists of one DLL file and, possibly, several additional files. Please consult DAVID support for details and installation instructions.

To make a service available to the audio editor, perform the following configuration steps. Enter the configuration data in the global settings so that it is available to all users who need it.

Open or create the Common key in the Digas registry. (This key must exist at the top level; it is not a subkey of MultiTrack or MultiTrack\Settings.)
In Common, open or create the SpeechToText subkey.
In Common\SpeechToText, create one subkey for each service that you want to make available. The subkey name can be technical; users do not need to see it.
In the service's key, enter configuration data for the service. This is explained below.

Service Configuration

As mentioned above, each service needs its own configuration set, stored in the service's subkey under Common\SpeechToText. Create the following entries:

Name	Content	Example	Notes
`DLL`	Name of the dll file that accesses the service	`SpeechmaticsInterface.dll`
`URI`	Web address of the service		Provided by the supplier of the service
`User`	User name or account under which the service is accessed	`Testuser`	Provided by the supplier of the service
`Password`	Password for the user name or account		Provided by the supplier of the service, or defined by the user
`Language`	Assumed language of spoken text	`de`	List of valid language codes must be obtained from service provider
`XmlConfig`	Additional configuration values	`<retain_temp_file/><use_model />`	Depends on the DLL file; please consult the installation instructions for valid entries. This entry must either be empty or contain a sequence of XML elements. For SpeechmaticsInterface.dll, use `XmlConfig=<retain_temp_file/><use_model />`. For the Speechmatics version 2 interface, specify `<version>2</version>`.
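Because XmlConfig must be empty or contain a sequence of XML elements, a value can be checked for well-formedness before it is entered. The sketch below wraps the value in a dummy root element and parses it with Python's standard xml.etree.ElementTree module. The helper names are hypothetical, and treating DLL, URI, and Language as mandatory is an assumption; the actual requirements depend on the DLL.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def xml_config_elements(xml_config: str) -> List[ET.Element]:
    """Parse an XmlConfig value (empty, or a sequence of XML elements).

    Raises ET.ParseError if the value is not well-formed.
    """
    if not xml_config.strip():
        return []
    # The value has no single root element, so wrap it in a dummy root first.
    root = ET.fromstring(f"<XmlConfig>{xml_config}</XmlConfig>")
    return list(root)


def check_service(entries: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one service configuration."""
    problems = []
    # Assumption: these entries are needed by every service DLL.
    for name in ("DLL", "URI", "Language"):
        if not entries.get(name, "").strip():
            problems.append(f"missing entry: {name}")
    try:
        xml_config_elements(entries.get("XmlConfig", ""))
    except ET.ParseError as exc:
        problems.append(f"XmlConfig is not well-formed XML: {exc}")
    return problems
```

With this check, a value such as `<version>2<version>` (slash missing in the closing tag) is reported as an error, while `<retain_temp_file/><use_model />` passes.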

The password is encrypted before it is stored. To enter the password using ADMIN:

Create the Password entry without a value
Right-click the Password entry
From the popup menu, click "Change crypted value"
In the dialog that opens, enter the clear-text password, then select "Broadcast Utility Server" under "Crypting method", and click "OK".

Speech to text services usually need to know which language they are supposed to transcribe. To make the same service available for several languages, create one service key in Common\SpeechToText for each language.
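One way to keep such per-language keys consistent is to derive them from a single base configuration. In the following sketch, the function name, the subkey naming pattern `<base>.<language>`, and the language code `en` used in the example are assumptions for illustration; valid language codes must come from the service provider.

```python
from typing import Dict, Iterable


def per_language_services(base_key: str,
                          base_entries: Dict[str, str],
                          languages: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Return one service configuration per language, keyed by subkey name.

    The '<base_key>.<language>' naming pattern is a convention chosen for
    this example only.
    """
    services = {}
    for lang in languages:
        entries = dict(base_entries)  # copy so the base configuration is not modified
        entries["Language"] = lang
        services[f"{base_key}.{lang}"] = entries
    return services
```

For example, `per_language_services("speechmatics", {"DLL": "SpeechmaticsInterface.dll"}, ["de", "en"])` yields the subkeys `speechmatics.de` and `speechmatics.en`, which differ only in their Language entry.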

Speechmatics Version 2 Service

Starting with version 1.0.17.0, SpeechmaticsInterface.dll can also work with version 2 of the Speechmatics service. The following configuration changes are required:

To access the new interface version, specify <version>2</version> in the XmlConfig parameter.
The User parameter is no longer used.
The Password parameter must contain the access token supplied by the service provider.
The URI parameter must point to the v2 service endpoint as instructed by the service provider.
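The changes listed above can be expressed as a transformation of a version 1 configuration. This Python sketch is hypothetical: the function names are placeholders, the endpoint and token must come from the service provider, and in a real configuration the access token in Password must be entered in encrypted form.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def _version_element(xml_config: str) -> Optional[ET.Element]:
    """Return the <version> element of an XmlConfig value, or None."""
    root = ET.fromstring(f"<XmlConfig>{xml_config}</XmlConfig>")
    return root.find("version")


def uses_v2_interface(xml_config: str) -> bool:
    version = _version_element(xml_config)
    return version is not None and (version.text or "").strip() == "2"


def to_speechmatics_v2(entries: Dict[str, str],
                       access_token: str,
                       v2_uri: str) -> Dict[str, str]:
    """Return an adapted copy of a version 1 service configuration."""
    v2 = dict(entries)              # leave the original configuration unchanged
    v2.pop("User", None)            # User is no longer used
    v2["Password"] = access_token   # must be stored encrypted in the real configuration
    v2["URI"] = v2_uri              # v2 endpoint as instructed by the service provider
    xml_config = v2.get("XmlConfig", "")
    version = _version_element(xml_config)
    if version is None:
        v2["XmlConfig"] = xml_config + "<version>2</version>"
    elif (version.text or "").strip() != "2":
        raise ValueError("XmlConfig already specifies a different interface version")
    return v2
```

Applying the function twice leaves the result unchanged, so it can safely be run on a configuration that has already been converted.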

Making Services Available

To let users call speech to text services interactively, configure the audio editor. You can do this globally for all users, per workstation, or per user. Open the MultiTrack\Settings, EasyTrack\Settings, or SingleTrack\Settings key of the Digas registry (depending on which audio editor you want to configure) and edit or create the SpeechToTextServices parameter. In this parameter, enter the name of the service that you configured in the previous step, i.e., the name of the service-specific subkey of Common\SpeechToText.

If this name is technical and you want users to see a name that is easier to remember, enter the user-friendly name, followed by an equals sign, followed by the subkey name. For example, with SpeechToTextServices=Local (English)=srv3260.local.en-us, the service configuration is read from Common\SpeechToText\srv3260.local.en-us, but the user sees the name "Local (English)".
To give the user a menu of services, enter several service names into this parameter, separated by vertical bars (|).
As usual, a per-workstation setting overrides a global setting, and a per-user setting overrides both global and per-workstation settings.
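The format of SpeechToTextServices can be illustrated with a small parser. The sketch assumes that each entry is split at its first equals sign and that an entry without an equals sign is shown under its subkey name; the function names and the second service name in the usage example are hypothetical.

```python
from typing import List, Tuple


def parse_services(value: str) -> List[Tuple[str, str]]:
    """Split a SpeechToTextServices value into (display name, subkey) pairs."""
    services = []
    for item in value.split("|"):  # services are separated by vertical bars
        item = item.strip()
        if not item:
            continue
        display, sep, subkey = item.partition("=")
        if not sep:
            # No user-friendly name: the user sees the subkey name itself.
            display = subkey = item
        services.append((display.strip(), subkey.strip()))
    return services


def service_key_path(subkey: str) -> str:
    """Registry path of the service configuration for a subkey."""
    return "Common\\SpeechToText\\" + subkey
```

For example, `parse_services("Local (English)=srv3260.local.en-us|speechmatics.de")` returns `[("Local (English)", "srv3260.local.en-us"), ("speechmatics.de", "speechmatics.de")]`, giving the user a menu with two entries.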

Standalone Configuration

If you use the audio editor in a standalone configuration, i.e., without DBM and DPE, you must enter the configuration values mentioned above into the program's INI file. To locate this file, open Windows Explorer, click in the address field, type %PROGRAMDATA%\DIGASYSTEM, and press ENTER. Open the MULTITRACK, EASYTRACK, or SINGLETRACK folder (depending on the editor you are using) and look for the file MULTITRACK.INI, EASYTRACK.INI, or SINGLETRACK.INI. Open this file in a plain-text editor (such as the Windows default, Notepad) and immediately try to save it. If saving succeeds, you have write access; otherwise, give yourself full access to the folder.
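If you prefer to locate and check the INI file with a script, the following Python sketch builds the path from the folder layout described above and tests write access by opening the file in append mode, which does not change its contents. The function names are hypothetical; on Windows, pass `os.environ["PROGRAMDATA"]` as the first argument.

```python
import os
from pathlib import PureWindowsPath
from typing import Union

# Folder and file base names per editor.
EDITOR_FOLDERS = {"MultiTrack": "MULTITRACK",
                  "EasyTrack": "EASYTRACK",
                  "SingleTrack": "SINGLETRACK"}


def ini_path(program_data: str, editor: str) -> PureWindowsPath:
    """Build %PROGRAMDATA%\\DIGASYSTEM\\<EDITOR>\\<EDITOR>.INI."""
    folder = EDITOR_FOLDERS[editor]
    return PureWindowsPath(program_data, "DIGASYSTEM", folder, folder + ".INI")


def is_writable(path: Union[str, "os.PathLike[str]"]) -> bool:
    """Return True if the existing file can be opened for writing."""
    if not os.path.isfile(path):
        return False  # do not create the file as a side effect
    try:
        # Append mode checks write access without modifying the contents.
        with open(path, "a", encoding="utf-8"):
            pass
        return True
    except OSError:
        return False
```

On a workstation, `is_writable(ini_path(os.environ["PROGRAMDATA"], "MultiTrack"))` returning False indicates that the folder permissions need to be adjusted.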

If you use a word processor like Microsoft Word to edit the INI file, be sure to save the file in its original text-only format; otherwise, it becomes unusable.

Entries in the INI file are grouped into sections, which correspond to the keys in the Digas registry. Each section starts with a line that contains the section name, enclosed in square brackets. Within a section, each entry has the form <name>=<value>. It is an error to have more than one section with the same name, or more than one entry with the same name within a section.
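Python's standard configparser module applies similar rules and can be used to detect duplicate sections or entries before you start the editor. In this sketch, strict mode raises an error on duplicates, interpolation is disabled so that `%` characters in encrypted values are read literally, and only `=` is accepted as the separator. The function name is hypothetical, and whether configparser behaves exactly like the editor's INI reader (for example, regarding case sensitivity) is an assumption.

```python
import configparser


def load_ini(text: str) -> configparser.ConfigParser:
    """Parse INI text, raising an error for duplicate sections or entries."""
    parser = configparser.ConfigParser(
        strict=True,         # duplicates raise DuplicateSectionError / DuplicateOptionError
        interpolation=None,  # read '%' in values (e.g. encrypted passwords) literally
        delimiters=("=",),   # entries have the form <name>=<value>
    )
    parser.read_string(text)
    return parser
```

For example, loading a file whose [Settings] section contains UseSpeechToText twice raises `configparser.DuplicateOptionError`, which names the section and the duplicated entry.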

You must create or edit the following entries:

Section	Entry	Meaning	Details
`[Settings]`	`UseSpeechToText`	Set to 1 to enable or to 0 to disable the speech to text feature	See "Enabling speech to text" on this page
`[\Common\SpeechToText\<service>]`	`DLL`	This section configures the speech to text service "<service>". Be sure to type all three backslashes in the section name!	See "Service Configuration" on this page
	`URI`
	`User`
	`Password`
	`Language`
	`XmlConfig`
`[Settings]`	`SpeechToTextServices`	Define which services are made available to the user	See "Making Services Available" on this page
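Putting the entries together, a minimal standalone INI file for one service might look like the SAMPLE_INI string below; the URI, the subkey name speechmatics.de, the display name, and the password placeholder are hypothetical. The accompanying Python check, also a sketch with a hypothetical function name, verifies that speech to text is enabled and that every service listed in SpeechToTextServices has a matching section with a DLL entry.

```python
import configparser
from typing import List

# Hypothetical standalone INI content; replace URI and Password with real values.
SAMPLE_INI = r"""
[Settings]
UseSpeechToText=1
SpeechToTextServices=Speechmatics (German)=speechmatics.de

[\Common\SpeechToText\speechmatics.de]
DLL=SpeechmaticsInterface.dll
URI=https://speech.example.invalid/
User=Testuser
Password=<encrypted password>
Language=de
XmlConfig=<retain_temp_file/><use_model />
"""


def check_standalone_ini(text: str) -> List[str]:
    """Return a list of problems found in the speech to text part of an INI file."""
    parser = configparser.ConfigParser(strict=True, interpolation=None, delimiters=("=",))
    parser.read_string(text)
    problems = []
    if parser.get("Settings", "UseSpeechToText", fallback="0").strip() != "1":
        problems.append("UseSpeechToText is not set to 1")
    services = parser.get("Settings", "SpeechToTextServices", fallback="")
    for item in services.split("|"):
        if not item.strip():
            continue
        # Entries are either "<subkey>" or "<display name>=<subkey>".
        display, sep, subkey = item.partition("=")
        if not sep:
            subkey = display
        section = "\\Common\\SpeechToText\\" + subkey.strip()
        if not parser.has_option(section, "DLL"):
            problems.append(f"missing DLL entry in section [{section}]")
    return problems
```

A typo in the subkey name, such as listing speechmatics.en while only the section for speechmatics.de exists, is reported as a missing DLL entry for the misspelled section.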

The password must be entered in encrypted form. Use the Digas Encryption Tool to encrypt it: start the tool, select the encryption type "Broadcast Utility Server", and type the password into the input field. Then click "Encrypt", and click "Copy" to copy the encrypted password to the Windows clipboard. Finally, paste the encrypted password into the Password entry in the text editor.