The AUDIO_ConvertFile() functions inside Audio32.dll convert an audio file to an output file using a specified format.

This format is also used for Audio Analysis and Audio Conversion tasks within Workflows of the DPE Workflow System.

Section String

A section string contains [SECTIONS] and the VALUE for this section, e.g.
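The concrete example is not reproduced in this text. As a rough sketch only, assuming that each entry is written as a section name in square brackets immediately followed by its value (an assumption derived from the description above, not confirmed syntax), a section string could be assembled and split in Python as follows. The helper names build_section_string and parse_section_string are hypothetical and not part of Audio32.dll.

```python
import re

def build_section_string(sections):
    """Join name/value pairs as "[NAME]value" (assumed syntax)."""
    return "".join(f"[{name}]{value}" for name, value in sections.items())

def parse_section_string(text):
    """Split an assumed "[NAME]value" string back into a dict of strings."""
    return dict(re.findall(r"\[([^\]]+)\]([^\[]*)", text))

section_string = build_section_string(
    {"FORMAT": "LINEAR", "SAMPLERATE": 48000, "RESOLUTION": 24}
)
print(section_string)  # [FORMAT]LINEAR[SAMPLERATE]48000[RESOLUTION]24
```

This sketch does not handle values that themselves contain square brackets.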


Section Name




Container format of the output file.
RAW: raw audio file (no metadata)
MUS: DigaSystem ‘MusiFile’
WAV: RIFF/WAVE file (can also be RF64, see note 2 below)
BWF: RIFF/WAVE file with BWF metadata

DALET: File in proprietary format used by old DALET system

MPEG4: MPEG-4 file (only valid with audio format AAC)


Several combined container/essence formats must be specified as RAW files with the appropriate FORMAT value. This includes e.g. FLAC, OGG-VORBIS, REALAUDIO, REALG2, REAL9 and WMA.

If WAV or BWF is specified as target format, the output file will automatically be written as an RF64 file if the file size is larger than 4 GB.


Format of the audio essence data within the output file.

LINEAR: Uncompressed PCM

LINEAR(FP): Uncompressed PCM (floating-point format)
MPEG-LAYER2: MPEG Layer 2
WMA: Windows Media Audio

REALAUDIO: RealAudio Version 5 audio
REALG2: RealMedia G2
REAL9: RealMedia 9 (“Helix”)

OGG: Ogg-Vorbis
AAC: Advanced Audio Coding


Audio sampling rate in samples/second.


Only for some MPEG II formats: If set to 1, the actual sample rate is half the value given by SAMPLERATE.


Bit rate in kbit/second.


Only for PCM and WMA data: Width of one sample in bits.

For integer PCM (format LINEAR), it can be any value from 8 to 32. For floating-point PCM (format LINEAR(FP)), it can be 32 or 64. WMA supports 16 and 24 bit data.
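The permitted sample widths listed above can be checked before a conversion is started. The following Python sketch only restates those rules; the function name is_valid_resolution is hypothetical, and the DLL's own behaviour for invalid values is not described here.

```python
def is_valid_resolution(audio_format, bits):
    """Return True if 'bits' is a sample width the text allows for 'audio_format'."""
    if audio_format == "LINEAR":        # integer PCM: 8 to 32 bits
        return 8 <= bits <= 32
    if audio_format == "LINEAR(FP)":    # floating-point PCM: 32 or 64 bits
        return bits in (32, 64)
    if audio_format == "WMA":           # WMA: 16 or 24 bits
        return bits in (16, 24)
    return False                        # sample width applies to PCM and WMA only

print(is_valid_resolution("LINEAR", 24))      # True
print(is_valid_resolution("LINEAR(FP)", 24))  # False
```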


Information about the number of channels.

n-CHANNEL: (n = 3, 4, ...) Multichannel file with more than two channels.
JOINT: “Joint Stereo” MPEG
DUAL: “Dual channel” MPEG


The audio signal will be amplified by this value (in dB). Default is 0, meaning no amplification.
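To get a feeling for what a given dB value means, the usual relationship between a gain in dB and a linear amplitude factor is factor = 10^(dB/20). The Python sketch below applies this standard formula; how the DLL implements the amplification internally is not documented here, and the function name db_to_linear is hypothetical.

```python
def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear multiplication factor."""
    return 10.0 ** (gain_db / 20.0)

for gain_db in (0, 6, -6, 20):
    print(gain_db, "dB ->", round(db_to_linear(gain_db), 4))
```

A value of 0 dB gives a factor of 1.0 (no change), about 6 dB roughly doubles the amplitude, and negative values attenuate the signal.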


Method used for sampling rate conversion. The following options are supported:

0: A very fast, but not very exact algorithm
2: high-precision algorithm (obsolete)
3: SoX resampling library (default)

If SRCMODE=3 is selected, the quality of the SoX resampler can be influenced by several SOXR_... parameters, which are described below.


0 (default) or 1

If 1, a frame filter is used when decoding MPEG Layer 2 data. This means that “junk” data within the audio essence is skipped without generating an error.


0 (default) or 1
If 1, certain kinds of input file errors (e.g. corrupted header or trailer data) are ignored.

For details, see section 3.5.5 Error handling during conversion in the full Audio32.dll Technical Manual (available upon request).


0 (default) or 1

Normally, conversion between two MPEG Layer 2 formats uses a direct transcoding of the audio data without generating any intermediate uncompressed audio. This is very fast, but the “audio level” metadata generated for some file formats (e.g. “LEVL” chunk in BWF files) may be somewhat inexact.

If this option is set to 1, generation of intermediate PCM data can be enforced. Conversion will be slower, but will create absolutely exact level data.


0 (default) or 1

If WAV or BWF is specified as target file format, the converter reserves a bit of space in the file header, so that the file can be changed to RF64 format as soon as the size exceeds 4 GB. If the size remains below this threshold, the file will remain a 100% conformant WAV/BWF file. However, some poorly written third-party software may not accept the reserved header space (formatted as “JUNK” chunks).

If this option is set to 1 (default is 0), this space is not written. The drawback is that the conversion fails, if the target file size reaches 4 GB.
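Whether the 4 GB limit is a practical concern depends on the PCM format and the length of the material. The Python sketch below estimates how long an uncompressed recording can be before its audio data alone reaches the limit. It assumes that “4 GB” means 2^32 bytes, that each sample is stored in whole bytes, and it ignores header and metadata overhead; the function name seconds_until_4gb is hypothetical.

```python
LIMIT_BYTES = 2 ** 32  # assumption: "4 GB" is taken as 2^32 bytes

def seconds_until_4gb(sample_rate, channels, bits_per_sample):
    """Estimate the duration of PCM audio that fills LIMIT_BYTES."""
    bytes_per_sample = (bits_per_sample + 7) // 8   # assumption: whole bytes per sample
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return LIMIT_BYTES / bytes_per_second

hours = seconds_until_4gb(48000, 2, 24) / 3600
print(f"48 kHz / 24 bit / stereo: about {hours:.1f} hours")
```

For 48 kHz, 24-bit stereo material this gives a little over four hours, so long recordings or multichannel files can reach the limit quickly.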


0 (default) or 1
If this option is 1, any input file, for which no format information can be found by other means, is scanned for MP3 frames. By default, this is only done for files with extension “.mp3”.


0 (default) or 1

By default, an input file, for which no format information can be found by other means, is passed to the operating system’s DirectX filter to find out if it can be decoded. Because buggy DirectX filters may crash for some files, this DirectX test can be disabled by setting this option to 1.


Method used to decode MPEG Layer 3 input.

0: use proprietary internal decoder (has issues with a small subset of MP3 data)
1: use DirectX decoder
2: use DirectX decoder, if available; otherwise, use internal decoder (default)


0 (default) or 1

Normally, the AUDIO32.DLL doesn’t allow multiple parallel MP3 encodings for one process. Earlier tests revealed that the LAME encoder DLL is not thread-safe on multiprocessor machines. However, this might have been fixed in newer LAME versions. By setting the option to 1, the DLL can run several MP3 encodings in parallel.


When converting a multi-channel file to stereo, the two channels which are to be used as source data can be specified here as two comma-separated integers.

Channel numbers are zero-based, and indicate the storage position of the channels in the source file. E.g. 0,1 specifies the first two channels.

By default, the first two source channels are used, except when the source file is an RF64 file with two “stereo downmix” channels (e.g. “5.1+stereo”). In this case, the stereo downmix is used as source data.
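The meaning of the zero-based channel pair can be illustrated with a short Python sketch. It models the source audio as a list of sample frames (one tuple per frame, one entry per channel in storage order), which is a simplification for illustration and not the DLL's internal representation. Both function names are hypothetical.

```python
def parse_channel_pair(value):
    """Parse a string such as "0,1" into two zero-based channel indices."""
    first, second = (int(part) for part in value.split(","))
    return first, second

def select_stereo_pair(frames, pair=(0, 1)):
    """Keep only the two selected channels of every multichannel frame."""
    left, right = pair
    return [(frame[left], frame[right]) for frame in frames]

# Two frames of a six-channel file; the values are placeholders.
frames = [(10, 11, 12, 13, 14, 15), (20, 21, 22, 23, 24, 25)]
print(select_stereo_pair(frames, parse_channel_pair("4,5")))  # [(14, 15), (24, 25)]
```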

For BWF output only


Title of the clip (default: empty); will be written to the appropriate field in the ‘bext’ chunk of the BWF file.


Author of the clip (default: empty); will be written to the appropriate field in the ‘bext’ chunk of the BWF file.

For SoX sampling rate conversion only


Main quality parameter. Possible values are integer numbers in the range 0...7. Most values correspond to a specific setting for SoX’s “rate” effect.

0 = quick (SoX “-q” setting)
1 = low (SoX “-l” setting)
2 = medium (SoX “-m” setting)
3 = medium; 16-bit precision
4 = high (SoX “-h” setting); 20-bit precision
5 = high; 24-bit precision
6 = very high (SoX “-v” setting); 28-bit precision (default)
7 = highest possible; 32-bit precision

The default setting of 6 is generally good enough even for very high quality 24-bit PCM encodings.


Phase response of the SoX resampler, as an integer value in the range 0...100. Some values correspond to a specific setting of SoX’s “rate” effect:

0 = minimum (SoX “-M” setting)
25 = intermediate (SoX “-I” setting)
50 = linear (SoX “-L” setting) (default)


0 = standard bandwidth (95%) filter (default)
1 = high-bandwidth (99%) filter (SoX “-s” setting)

For RealAudio V5 output only


Bit rate in bit/s, or 0


If BITRATE is 0 (or not given at all), CODEC must contain the number of the encoding codec (see AUDIO_QueryRealAudioCodecs() in Audio32.dll Technical Manual)

For RealAudio G2 output only


0 (default), 1, 2 or 3 to define the type of source audio:

0 = Voice
1 = Voice with music
2 = Instrumental music
3 = Instrumental music in stereo


Decimal integer value, which is interpreted as a bit field to define the target data rates:

Bit 0 (0x0001): 28K modem
Bit 1 (0x0002): 56K modem
Bit 3 (0x0008): Dual ISDN (128K)
Bit 5 (0x0020): LAN
Bit 6 (0x0040): 256K DSL
Bit 7 (0x0080): 384K DSL
Bit 8 (0x0100): 512K DSL

(Options for bit 2 and 4 are obsolete)
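Because the value is passed as a decimal integer, the individual bits have to be combined with a bitwise OR and then written in decimal. A minimal Python sketch, using the bit values from the list above (the dictionary and function names are hypothetical):

```python
TARGET_BITS = {
    "28K modem": 0x0001,
    "56K modem": 0x0002,
    "Dual ISDN": 0x0008,
    "LAN": 0x0020,
    "256K DSL": 0x0040,
    "384K DSL": 0x0080,
    "512K DSL": 0x0100,
}

def target_bitfield(targets):
    """Combine the selected target data rates into one decimal integer."""
    value = 0
    for name in targets:
        value |= TARGET_BITS[name]
    return value

print(target_bitfield(["56K modem", "LAN"]))  # 34
```

Here 0x0002 and 0x0020 combine to 0x0022, i.e. the decimal value 34.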

For RealAudio 9 output only


Either VOICE or MUSIC to define the type of source audio


Decimal integer value, which is interpreted as a bit field to define the target “audiences”. Setting Bit #n to 1 will create a stream for target audience #n. The properties of each “audience” can be configured with a commercial RealMedia toolkit.

For any RealAudio format (V5, G2 or 9)


Title of the clip (default: empty)


Author of the clip (default: empty)


Copyright information for the clip (default: empty)


0 (default) or 1

The resulting RealAudio file can be recorded with RealPlayer Plus only if this option is set to 1.


0 (default) or 1

If set to 1, the resulting RealAudio file can be stored on the local hard drive.

For MPEG Layer 3 output only


Quality setting for the LAME encoder:

0 = low
1 = medium
2 = high (default)
5 = very high


0 (default) or 1

If raw MP3 output is created, an ID3 V1 metadata tag is appended, if this option is set to 1.


0 (default), 3 or 4

If raw MP3 output is created, an ID3 V2 metadata tag can be written at the beginning of the file.

0 = no ID3 V2 tag
3 = ID3 V2.3 tag
4 = ID3 V2.4 tag

Metadata for the tag can be passed in the metadata parameter (see below). Several DigaSystem standard fields (e.g. TITLE, COMPOSER, etc.) are filled into the appropriate ID3 V2 tags. Arbitrary ID3 V2 text tags (code “Txxx”) can be passed in field ID3V2/Txxx.


Character encoding used for ID3 V2 tag:

0 = ISO 8859-1 (“ANSI”) (default)
1 = UTF-16
2 = UTF-8
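One possible way to choose the encoding value is to check whether the metadata text fits into ISO 8859-1 and to fall back to a Unicode encoding otherwise. The Python sketch below does this. It additionally assumes that UTF-8 should only be chosen for ID3 V2.4 tags, because the ID3 V2.3 specification defines only ISO 8859-1 and UTF-16 text; whether the DLL enforces this is not stated here. The function name is hypothetical.

```python
def choose_id3v2_encoding(text, id3v2_version=3):
    """Return 0 (ISO 8859-1), 1 (UTF-16) or 2 (UTF-8) for the given tag text."""
    try:
        text.encode("iso-8859-1")
        return 0                      # plain ISO 8859-1 is sufficient
    except UnicodeEncodeError:
        # Assumption: prefer UTF-8 only for V2.4 tags, UTF-16 for V2.3 tags.
        return 2 if id3v2_version == 4 else 1

print(choose_id3v2_encoding("Café"))        # 0
print(choose_id3v2_encoding("Dvořák", 3))   # 1
print(choose_id3v2_encoding("Dvořák", 4))   # 2
```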

For AAC output only


High-frequency cut-off:

0 = use default for given bitrate and sampling rate (default)
1 = no cut-off


Variable bitrate mode quality:

0 = no VBR coding (constant bitrate) (default)
1 = low quality vbr mode 1
2 = low quality vbr mode 2
3 = low quality vbr mode 3
4 = medium quality vbr mode 1
5 = medium quality vbr mode 2
6 = medium quality vbr mode 3
7 = high quality vbr mode 1
8 = high quality vbr mode 2
9 = high quality vbr mode 3


High Efficiency AAC encoding:

0 = HE not used (default)
1 = HE v1 used and implicitly signaled in the bitstream
2 = HE v2 used and implicitly signaled in the bitstream

Time/pitch scaling options


A floating point number specifying the time-stretch factor.

Default is 1.0, meaning no time stretching.


A floating point number specifying the pitch-scaling factor (2.0 = one octave).

Default is 1.0, meaning no pitch scaling.
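Since a factor of 2.0 corresponds to one octave (12 semitones), a pitch shift given in semitones can be converted to a pitch-scaling factor with factor = 2^(semitones/12). A short Python sketch of this conversion (the function name is hypothetical):

```python
def semitones_to_pitch_factor(semitones):
    """Convert a pitch shift in equal-tempered semitones to a scaling factor."""
    return 2.0 ** (semitones / 12.0)

for semitones in (0, 12, -12, 1):
    print(semitones, "->", round(semitones_to_pitch_factor(semitones), 4))
```

A shift of 0 semitones gives 1.0, +12 gives 2.0, and -12 gives 0.5.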


Quality control for the time/pitch scaling algorithm:

200: fastest/preview mode
201: still fast but better
202: Mpex for single instruments/voice (fast)
203: Mpex for single instruments/voice (best) (default)
204: general purpose polyphonic instruments (fast)
205: general purpose polyphonic instruments (good)
206: general purpose polyphonic instruments (best)

Options to convert only a part of the file


Offset (in milliseconds) of the start of the cut (default: 0 = start of audio data)


Offset (in milliseconds) of the end of the cut (default: 0 = end of audio data)

NOTE: If non-PCM source and/or destination data is involved in the cut, 25ms are automatically added to ENDTIME for each non-PCM format (i.e., 50ms max.) to avoid sound loss due to codec filter delays.
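The automatic extension described in the note can be sketched in Python as follows. The function name is hypothetical; the sketch assumes that an ENDTIME of 0 (end of audio data) is left unchanged, and it does not model the exception for cuts from MPEG Layer 2 data described under “Audio cuts in MPEG data”.

```python
def effective_endtime_ms(endtime_ms, source_is_pcm, destination_is_pcm):
    """Add 25 ms per non-PCM format involved in the cut (50 ms at most)."""
    if endtime_ms == 0:
        return 0  # assumption: 0 means "end of audio data", nothing to extend
    non_pcm_formats = (not source_is_pcm) + (not destination_is_pcm)
    return endtime_ms + 25 * non_pcm_formats

print(effective_endtime_ms(60000, source_is_pcm=True, destination_is_pcm=True))    # 60000
print(effective_endtime_ms(60000, source_is_pcm=False, destination_is_pcm=True))   # 60025
print(effective_endtime_ms(60000, source_is_pcm=False, destination_is_pcm=False))  # 60050
```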


Immediately after the beginning and before the end of the cut, a linear fade is applied to avoid “sharp” changes in audio level. This parameter defines the length of this fade in milliseconds.

Default is 10.

Note: The fade is not applied at the beginning (or end) of the cut, if the start offset (or end offset) is 0. It is assumed that the start and end of the audio material are already properly faded, if necessary.

See also remarks on cuts from MPEG data below.
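The fade behaviour can be illustrated with a simplified Python sketch that works on a list of sample values representing the already extracted cut. The exact ramp used by the DLL is not documented here; the sketch assumes a ramp from 0 towards 1 over the fade length, and the function name is hypothetical.

```python
def apply_cut_fades(samples, sample_rate, start_ms, end_ms, fade_ms=10):
    """Apply a linear fade-in/fade-out to a cut, skipping offsets that are 0."""
    fade_len = min(int(sample_rate * fade_ms / 1000), len(samples) // 2)
    out = list(samples)
    if fade_len == 0:
        return out
    if start_ms != 0:                      # no fade-in at the start of the audio data
        for i in range(fade_len):
            out[i] *= i / fade_len
    if end_ms != 0:                        # no fade-out at the end of the audio data
        for i in range(fade_len):
            out[-1 - i] *= i / fade_len
    return out

cut = [1.0] * 10
print(apply_cut_fades(cut, sample_rate=1000, start_ms=100, end_ms=0, fade_ms=4))
```

In this example only the first four samples are faded in, because the end offset of 0 means the cut runs to the end of the audio data.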


0 (default) or 1

If set to 1, the audio cut is pre-selected at file level, i.e. the converter calculates which part of the file it needs to read and convert to generate the selected cut. This will be significantly faster if a small part of a very large source file is to be extracted. But there are also drawbacks:

  • It works only with linear and MPEG Layer II input data
  • Metadata of the source file is not carried over. Without a “raw cut”, some metadata of the source file is preserved in the cut: the metadata of BWF chunks and the marker points of all WAVE files (only those markers that lie inside the cut are transferred). With a “raw cut”, this metadata is lost.

See also remarks on cuts from MPEG data below.

Extracting channels as mono files


If this is set to 1, the call to AUDIO_ConvertFile() is rerouted to AUDIO_SplitFileIntoMonoFilesW().

The OUTPUTFILENAME parameter is passed as the OutputFilenameTemplate to AUDIO_SplitFileIntoMonoFilesW.

See section 3.5.3 RenderProject of the Audio32.dll Technical Manual for further information.


See parameter firstOutputFileNumber of function AUDIO_SplitFileIntoMonoFilesW().



If set to 1, the DLL will write some basic information about the conversion to a text file named “Audio32.DLL.log” in the same directory as the DLL itself.

NOTE: This is a preliminary implementation only. It is intended for special testing and debugging requirements, but not for general use.

General Notes

The sections SAMPLERATE, BITRATE, RESOLUTION and MODE can be omitted, if the respective value(s) should be taken from the input file.

Metadata: This string can contain a SectionString with arbitrary metadata fields. For some output formats (e.g. MP3 with ID3V2 tag), these metadata are written to the target file. For the names of the data fields (e.g. TITLE, etc.), the standard DigaSystem field names should be used.

Audio cuts in MPEG data

When cutting audio (by setting STARTTIME and/or ENDTIME parameters) from non-linear source data, the transcoding normally works like this: Decode the source data to linear, extract the range from STARTTIME to ENDTIME from the linear data (and apply fading, if FADETIME is non-zero), and encode the result to the output format.

When working with MPEG Layer 2 data, some special handling is applied. If the following conditions are all satisfied, the decoding/encoding steps are skipped:

  1. Source AND destination format is MPEG Layer 2, with the same sampling rate, and the same number of audio channels. Such a conversion is handled by a single MPEG-Transcoding step. Important: If the destination bitrate is higher than or equal to the source bitrate, no actual transcoding is done!
  2. RAWCUT is “1”. This is the default value of the parameter if condition 1 is true. Otherwise, the default for RAWCUT is “0”.
  3. ENFORCE_INTERMEDIATE_PCM is “0”. This is the default value of the parameter.
  4. FADETIME is 0. This is the default value if conditions 1 to 3 are all true. Otherwise, the default for FADETIME is 10.

Summary: If condition 1 is true, and the parameters RAWCUT, ENFORCE_INTERMEDIATE_PCM and FADETIME are not specified (i.e., the defaults apply), then the audio cut will be created without an intermediate MPEG decoding/encoding step.

If the MPEG decoding/encoding is skipped, the automatic extension of the cut length by 50ms (see note in description of ENDTIME parameter) is also omitted.
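The four conditions and their interdependent defaults can be summarised in a small Python sketch. It only restates the rules of this section; the function name and the dictionary keys "format", "samplerate" and "channels" are hypothetical, and parameters that are not specified are passed as None so that the documented defaults apply.

```python
def skips_mpeg_recode(src, dst, rawcut=None, enforce_intermediate_pcm=0, fadetime=None):
    """Return True if a cut is created without MPEG decoding/encoding."""
    # Condition 1: both MPEG Layer 2, same sampling rate and channel count.
    cond1 = (
        src["format"] == dst["format"] == "MPEG-LAYER2"
        and src["samplerate"] == dst["samplerate"]
        and src["channels"] == dst["channels"]
    )
    # Condition 2: RAWCUT defaults to 1 only if condition 1 holds.
    if rawcut is None:
        rawcut = 1 if cond1 else 0
    cond2 = rawcut == 1
    # Condition 3: no enforced intermediate PCM (0 is the default).
    cond3 = enforce_intermediate_pcm == 0
    # Condition 4: FADETIME defaults to 0 only if conditions 1 to 3 hold.
    if fadetime is None:
        fadetime = 0 if (cond1 and cond2 and cond3) else 10
    return cond1 and cond2 and cond3 and fadetime == 0

mp2 = {"format": "MPEG-LAYER2", "samplerate": 48000, "channels": 2}
pcm = {"format": "LINEAR", "samplerate": 48000, "channels": 2}
print(skips_mpeg_recode(mp2, mp2))               # True
print(skips_mpeg_recode(mp2, mp2, fadetime=10))  # False
print(skips_mpeg_recode(mp2, pcm))               # False
```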