Speech to Text Configuration
This page explains how to configure MTE, ETE and STE for using the speech to text feature.
Enabling speech to text
Speech to text functionality is disabled by default. Administrators can, however, disable and enable speech to text globally, per workstation, or per user. As usual, a global setting is superseded by a per-workstation setting, which, in turn, is superseded by a per-user setting.
The setting UseSpeechToText, located at MultiTrack\Settings, EasyTrack\Settings, or SingleTrack\Settings, enables or disables speech to text. A value of 1 enables the feature, and a value of 0 disables it.
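The precedence rule described above can be sketched in Python as follows. This is only an illustration, not the editor's actual lookup code: the function name effective_setting and the dictionaries standing in for the global, per-workstation, and per-user scopes are hypothetical.

```python
from typing import Mapping, Optional


def effective_setting(name: str,
                      global_scope: Mapping[str, str],
                      workstation_scope: Mapping[str, str],
                      user_scope: Mapping[str, str],
                      default: Optional[str] = None) -> Optional[str]:
    """Return the value from the most specific scope that defines `name`."""
    # Most specific first: per-user beats per-workstation, which beats global.
    for scope in (user_scope, workstation_scope, global_scope):
        if name in scope:
            return scope[name]
    return default


# Hypothetical example: enabled globally, disabled on one workstation,
# and enabled again for one user.
value = effective_setting(
    "UseSpeechToText",
    global_scope={"UseSpeechToText": "1"},
    workstation_scope={"UseSpeechToText": "0"},
    user_scope={"UseSpeechToText": "1"},
    default="0",  # the feature is disabled by default
)
print(value)
```

In this example the per-user value wins, so the feature is enabled for that user even though the workstation setting disables it.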
No additional settings are required to view speech to text data, to work with it, and to save it.
The following sections explain the preliminaries required to access a speech to text service on the web (or in your local network).
Using Analysis Services
The speech to text analysis is not performed by the audio editor. This task is delegated to specialized software which typically runs on a web server. Several commercial speech to text services exist on the web, and some can be installed locally on one of your company's web servers. Which solution works best for you depends on availability, company policies, and other circumstances.
DAVID System can supply specialized client software for the service of your choice. This client software consists of one dll file and, possibly, several additional files. Please consult DAVID support for details and installation instructions.
To prepare a service for use with the audio editor, the following configuration steps are required. It makes sense to enter the configuration data in the global settings so that they are available to all users who need them.
- Open or create the Common Digas registry key. (This key must exist at top level; it is not a subkey of MultiTrack or MultiTrack\Settings.)
- In Common, open or create the SpeechToText subkey.
- In Common\SpeechToText, create one key for each service that you want to make available. The key can have a rather technical name (users will not have to see it).
- In the service's key, enter configuration data for the service. This is explained below.
Service Configuration
As mentioned above, you must create a separate configuration set for each service, using the service's key under Common\SpeechToText. You should make the following entries:
Name | Content | Example | Notes |
---|---|---|---|
DLL | Name of the dll file that accesses the service | SpeechmaticsInterface.dll | |
URI | Web address of the service | | Provided by the supplier of the service |
User | User name or account under which the service is accessed | Testuser | Provided by the supplier of the service |
Password | Password for the user name or account | | Provided by the supplier of the service, or defined by the user |
Language | Assumed language of spoken text | de | List of valid language codes must be obtained from service provider |
XmlConfig | Additional configuration values | <retain_temp_file/><use_model /> | Depends on dll file; please consult installation instructions for valid entries. This entry must either be empty, or contain a list of XML data. For SpeechmaticsInterface.dll, use XmlConfig=<retain_temp_file/><use_model />. For the Speechmatics version 2 interface, specify <version>2</version>. |
The password is encrypted before it is stored. To enter the password using ADMIN:
- Create the Password entry without a value.
- Right-click the Password entry.
- From the popup menu, click "Change crypted value".
- In the dialog that opens, enter the clear-text password, then select "Broadcast Utility Server" under "Crypting method", and click "OK".
Speech to text services usually need to be told the language they are supposed to transcribe. To make the same service available for different languages, create one service key in Common\SpeechToText for each language.
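Because the XmlConfig entry must be empty or contain a list of XML data, a quick well-formedness check before entering the value can catch typos such as an unclosed tag. The following Python sketch is one possible way to do this; the helper name xml_config_problem is hypothetical, and the check says nothing about which elements a particular dll file actually accepts.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_config_problem(xml_config: str) -> Optional[str]:
    """Return None if the value is empty or well-formed XML, else an error text."""
    if not xml_config.strip():
        return None  # an empty entry is allowed
    try:
        # Wrap the value in a dummy root element so that several sibling
        # elements can be parsed as one document.
        ET.fromstring(f"<root>{xml_config}</root>")
    except ET.ParseError as exc:
        return str(exc)
    return None


print(xml_config_problem("<retain_temp_file/><use_model />"))
print(xml_config_problem("<version>2</version>"))
print(xml_config_problem("<version>2<version>"))  # closing tag is missing its slash
```

The first two calls report no problem; the third returns the parser's error message because the element is never closed.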
Speechmatics Version 2 Service
Starting with version 1.0.17.0, SpeechmaticsInterface.dll was extended to work with version 2 of the Speechmatics service. The following configuration changes are required:
- In order to access the new interface version, <version>2</version> must be specified in the XmlConfig parameter.
- The User parameter is no longer used.
- The Password parameter must contain the access token supplied by the service provider.
- The URI parameter must point to the v2 service endpoint as instructed by the service provider.
Making Services Available
To enable users to call speech to text services interactively, the audio editor must be configured. This can be done globally for all users, per workstation for everyone working at a specific workstation, or per user. Open the MultiTrack\Settings, EasyTrack\Settings, or SingleTrack\Settings key of the Digas registry (depending on which audio editor program you want to configure) and edit or create the parameter SpeechToTextServices. In this parameter, enter the name of the service that you have configured in the previous step, i.e., the name of the service-specific subkey of Common\SpeechToText.
- If this name is rather technical, and you want users to see a name that is easier to remember, enter the user-friendly name, followed by an equals sign, followed by the name of the subkey. For example, SpeechToTextServices=Local (English)=srv3260.local.en-us; the service configuration, in this case, would be found in Common\SpeechToText\srv3260.local.en-us, but the user would see the name "Local (English)".
- To give the user a menu of services, enter several service names into this parameter, separated by vertical bars (|).
- As usual, a per-workstation setting overrides a global setting, and a per-user setting overrides both global and per-workstation settings.
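To illustrate how a SpeechToTextServices value breaks down into menu entries, here is a small Python sketch. It is not the editor's own parser. The function name parse_services is hypothetical, the second service (srv3260.local.de) is invented for the example, and the sketch assumes that the first equals sign separates the display name from the subkey and that surrounding whitespace is ignored.

```python
from typing import List, Tuple


def parse_services(value: str) -> List[Tuple[str, str]]:
    """Split a SpeechToTextServices value into (display name, subkey) pairs."""
    services = []
    for item in value.split("|"):  # vertical bars separate the services
        item = item.strip()
        if not item:
            continue
        # Assumption: the first "=" separates the display name from the subkey.
        display, sep, subkey = item.partition("=")
        if sep:
            services.append((display.strip(), subkey.strip()))
        else:
            # No user-friendly name given: the subkey name is shown as-is.
            services.append((item, item))
    return services


menu = parse_services(
    "Local (English)=srv3260.local.en-us|Local (German)=srv3260.local.de"
)
for display, subkey in menu:
    print(f"{display} -> Common\\SpeechToText\\{subkey}")
```

Each printed line pairs the name the user would see in the menu with the registry key that holds the corresponding service configuration.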
Standalone Configuration
If you are using the audio editor in a standalone configuration, i.e. without DBM and DPE, then the configuration values mentioned above must be entered into the program's INI file. To locate this file, open Windows Explorer, click in the address field, type %PROGRAMDATA%\DIGASYSTEM, and press the ENTER key. Open the MULTITRACK, EASYTRACK, or SINGLETRACK folder (depending on the editor you are using), and look for a file named MULTITRACK.INI, EASYTRACK.INI, or SINGLETRACK.INI. Open this file in your favorite plain-text editor (by default, this is Windows' notepad.exe) and immediately try to save it. If that works, you can proceed; otherwise, give yourself full access to the respective directory.
If you use a word processor like Microsoft Word to edit the INI file, be sure to save the file in its original text-only format; otherwise, it becomes unusable.
Entries in the INI file are grouped into sections which correspond to the keys in the Digas registry. Sections start with a line that contains the section name, enclosed in square brackets. Within a section, each entry has the form <name>=<value>. It is an error to have more than one section with the same name, or to have more than one entry within a section with the same name.
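Duplicate sections or entries are easy to introduce when editing by hand. As a rough sanity check, Python's standard configparser module can be used to detect them. This sketch is an assumption-laden aid, not the editor's own INI reader: the helper name ini_problem is hypothetical, and configparser compares entry names case-insensitively and section names case-sensitively, which may differ from how the audio editor reads the file.

```python
import configparser
from typing import Optional


def ini_problem(text: str) -> Optional[str]:
    """Return None if the INI text parses without duplicates, else an error text."""
    parser = configparser.ConfigParser(
        strict=True,         # reject duplicate sections and duplicate entries
        interpolation=None,  # treat "%" characters in values literally
        delimiters=("=",),   # entries have the form <name>=<value>
    )
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return str(exc)
    return None


good = (
    "[Settings]\n"
    "UseSpeechToText=1\n"
    "SpeechToTextServices=Local (English)=srv3260.local.en-us\n"
)
# The same entries, wrongly split across two [Settings] sections.
bad = (
    "[Settings]\n"
    "UseSpeechToText=1\n"
    "[Settings]\n"
    "SpeechToTextServices=Local (English)=srv3260.local.en-us\n"
)
print(ini_problem(good))
print(ini_problem(bad))
```

The first call reports no problem; the second returns an error message because the [Settings] section appears twice.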
You must create or edit the following entries:
Section | Entry | Meaning | Reference |
---|---|---|---|
[Settings] | UseSpeechToText | Set to 1 to enable or to 0 to disable the speech to text feature | See "Enabling speech to text" on this page |
[\Common\SpeechToText\<service>] | DLL, URI, User, Password, Language, XmlConfig | This section configures the speech to text service "<service>". Be sure to type all three backslashes in the section name! | See "Service Configuration" on this page |
[Settings] | SpeechToTextServices | Define which services are made available to the user | See "Making Services Available" on this page |
The password must be entered in encrypted form. Use the Digas Encryption Tool to encrypt the password. Start the tool, select encryption type "Broadcast Utility Server", and type the password into the input field. Then click "Encrypt" and "Copy" to copy the encrypted password to the Windows clipboard. Now you can paste the password into the text editor.
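To show how these entries fit together, the following Python sketch prints the INI sections for one service. It is only an illustration under stated assumptions: the helper name render_ini is hypothetical, the service key and display name are taken from the example on this page, and the values in angle brackets are placeholders that must be replaced with the data from your service provider and the encrypted password.

```python
from typing import Mapping


def render_ini(service_key: str,
               service_entries: Mapping[str, str],
               services_value: str) -> str:
    """Build the INI text for one speech to text service."""
    lines = [
        "[Settings]",
        "UseSpeechToText=1",
        f"SpeechToTextServices={services_value}",
        "",
        # The section name contains three backslashes.
        f"[\\Common\\SpeechToText\\{service_key}]",
    ]
    lines += [f"{name}={value}" for name, value in service_entries.items()]
    return "\n".join(lines) + "\n"


text = render_ini(
    "srv3260.local.en-us",
    {
        "DLL": "SpeechmaticsInterface.dll",
        "URI": "<web address provided by the service supplier>",
        "User": "Testuser",
        "Password": "<encrypted password from the Digas Encryption Tool>",
        "Language": "<language code from the service provider>",
        "XmlConfig": "<retain_temp_file/><use_model />",
    },
    "Local (English)=srv3260.local.en-us",
)
print(text)
```

If the INI file already contains a [Settings] section, add the two entries to that section rather than pasting a second [Settings] section, since duplicate sections are an error.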