Audio32 Section Strings for Audio File Conversion

The AUDIO_ConvertFile() functions inside Audio32.dll convert an audio file to an output file using a specified format.

This format is also used for Audio Analysis and Audio Conversion tasks within Workflows of the DPE Workflow System.

Section String

A section string is a sequence of [SECTION] names, each immediately followed by the VALUE for that section, e.g.

[FILETYPE]WAV[FORMAT]LINEAR[SAMPLERATE]44100
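As an illustration of this syntax, the following Python sketch composes and parses section strings. The helper names `build_section_string` and `parse_section_string` are hypothetical and not part of Audio32.dll; the parser also assumes that values never contain a `[` character, which the source does not state.

```python
import re

def build_section_string(sections):
    """Join name/value pairs into '[NAME]VALUE[NAME]VALUE...' form."""
    return "".join(f"[{name}]{value}" for name, value in sections.items())

def parse_section_string(text):
    """Split a section string into a dict of name -> value (values stay strings)."""
    # Each section is '[NAME]' followed by everything up to the next '['.
    # Assumption: values themselves never contain '['.
    return dict(re.findall(r"\[([^\]]+)\]([^\[]*)", text))

params = {"FILETYPE": "WAV", "FORMAT": "LINEAR", "SAMPLERATE": 44100}
section_string = build_section_string(params)
print(section_string)
print(parse_section_string(section_string))
```

Keeping the parameters in a dictionary and serializing them only at the call site makes it easier to add or drop optional sections without string editing.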

Section Name	Values
Common
FILETYPE	Container format of the output file. RAW: raw audio file (no metadata); MUS: DigaSystem ‘MusiFile’; WAV: RIFF/WAVE file (can also be RF64, see note 2 below); BWF: RIFF/WAVE file with BWF metadata; AIFF: AIFF file; DALET: file in the proprietary format used by the old DALET system; MPEG4: MPEG-4 file (only valid with audio format AAC). NOTES: 1. Several combined container/essence formats must be specified as RAW files with the appropriate FORMAT value. This includes e.g. FLAC, OGG-VORBIS, REALAUDIO, REALG2, REAL9 and WMA. 2. If WAV or BWF is specified as the target format, the output file will automatically be written as an RF64 file if the file size exceeds 4 GB.
FORMAT	Format of the audio essence data within the output file. LINEAR: uncompressed PCM; LINEAR(FP): uncompressed PCM (floating-point format); MPEG-LAYER2: MPEG Layer 2; MPEG-LAYER3: MPEG Layer 3; WMA: Windows Media Audio; REALAUDIO: RealAudio Version 5 audio; REALG2: RealMedia G2; REAL9: RealMedia 9 (“Helix”); FLAC: FLAC; OGG: Ogg-Vorbis; ALAW: A-Law; AAC: Advanced Audio Coding.
SAMPLERATE	Audio sampling rate in samples/second.
DOWNSAMPLING	Only for some MPEG II formats: If set to 1, the actual sample rate is half the value given by SAMPLERATE.
BITRATE	Bit rate in kbit/second.
RESOLUTION	Only for PCM and WMA data: Width of one sample in bits. For integer PCM (format LINEAR), it can be any value from 8 to 32. For floating-point PCM (format LINEAR(FP)), it can be 32 or 64. WMA supports 16 and 24 bit data.
MODE	Information about the number of channels. MONO; STEREO; n-CHANNEL (n = 3, 4, ...): multichannel file with more than two channels; JOINT: “Joint Stereo” MPEG; DUAL: “Dual channel” MPEG.
DBGAIN	The audio signal will be amplified by this value (in dB). Default is 0, meaning no amplification.
SRCMODE	Method used for sampling rate conversion. The following options are supported: 0: a very fast, but not very exact algorithm; 2: high-precision algorithm (obsolete); 3: SoX resampling library (default). If SRCMODE=3 is selected, the quality of the SoX resampler can be influenced by several SOXR_... parameters, q.v. below.
MP2FILTER	0 (default) or 1 If 1, a frame filter is used when decoding MPEG Layer 2 data. This means that “junk” data within the audio essence is skipped without generating an error.
RELAXED_ERROR_CHECK	0 (default) or 1 If 1, certain kinds of input file errors (e.g. corrupted header or trailer data) are ignored. For details, see section 3.5.5 Error handling during conversion in the full Audio32.dll Technical Manual (available upon request).
ENFORCE_INTERMEDIATE_PCM	0 (default) or 1. Normally, conversion between two MPEG Layer 2 formats uses direct transcoding of the audio data without generating any intermediate uncompressed audio. This is very fast, but the “audio level” metadata generated for some file formats (e.g. the “LEVL” chunk in BWF files) may be somewhat inexact. If this option is set to 1, generation of intermediate PCM data is enforced. Conversion will be slower, but the level data will be exact.
NEVER_CREATE_RF64	0 (default) or 1. If WAV or BWF is specified as the target file format, the converter reserves a small amount of space in the file header so that the file can be changed to RF64 format as soon as its size exceeds 4 GB. If the size remains below this threshold, the file remains a fully conformant WAV/BWF file. However, some faulty third-party software may not handle the reserved header space (formatted as “JUNK” chunks) correctly. If this option is set to 1, this space is not written. The drawback is that the conversion fails if the target file size reaches 4 GB.
ALWAYS_CHECK_FOR_MP3	0 (default) or 1 If this option is 1, any input file, for which no format information can be found by other means, is scanned for MP3 frames. By default, this is only done for files with extension “.mp3”.
OMIT_DIRECTX_TEST	0 (default) or 1 By default, an input file, for which no format information can be found by other means, is passed to the operating system’s DirectX filter to find out if it can be decoded. Because buggy DirectX filters may crash for some files, this DirectX test can be disabled by setting this option to 1.
MP3_DECODING_METHOD	Method used to decode MPEG Layer 3 input. 0: use proprietary internal decoder (has issues with a small subset of MP3 data) 1: use DirectX decoder 2: use DirectX decoder, if available; otherwise, use internal decoder (default)
ALLOW_PARALLEL_MP3_ENCODING	0 (default) or 1. Normally, Audio32.dll does not allow multiple parallel MP3 encodings within one process. Earlier tests revealed that the LAME encoder DLL is not thread-safe on multiprocessor machines. However, this might have been fixed in newer LAME versions. Setting this option to 1 allows the DLL to run several MP3 encodings in parallel.
SOURCECHANNELS	When converting a multi-channel file to stereo, the two channels to be used as source data can be specified here as two comma-separated integers. Channel numbers are zero-based and indicate the storage position of the channels in the source file. E.g. 0,1 specifies the first two channels. By default, the first two source channels are used, except when the source file is an RF64 file with two “stereo downmix” channels (e.g. “5.1+stereo”). In this case, the stereo downmix is used as source data.
For BWF output only
TITLE	Title of the clip (default: empty); will be written to the appropriate field in the ‘bext’ chunk of the BWF file.
AUTHOR	Author of the clip (default: empty); will be written to the appropriate field in the ‘bext’ chunk of the BWF file.
For SoX sampling rate conversion only
SOXR_QUALITY	Main quality parameter. Possible values are integer numbers in the range 0...7. Most values correspond to a specific setting for SoX’s “rate” effect. 0 = quick (SoX “-q” setting) 1 = low (SoX “-l” setting) 2 = medium (SoX “-m” setting) 3 = medium; 16-bit precision 4 = high (SoX “-h” setting); 20-bit precision 5 = high: 24-bit precision 6 = very high (SoX “-v” setting); 28-bit precision (default) 7 = highest possible; 32-bit precision The default setting of 6 is generally good enough even for very high quality 24-bit PCM encodings.
SOXR_PHASERESPONSE	Phase response of the SoX resampler, as an integer value in the range 0...100. Some values correspond to a specific setting of SoX’s “rate” effect: 0 = minimum (SoX “-M” setting) 25 = intermediate (SoX “-I” setting) 50 = linear (SoX “-L” setting) (default)
SOXR_STEEPFILTER	0 = standard bandwidth (95%) filter (default) 1 = high-bandwidth (99%) filter (SoX “-s” setting)
For RealAudio V5 output only
BITRATE	Bit rate in bit/s, or 0
CODEC	If BITRATE is 0 (or not given at all), CODEC must contain the number of the encoding codec (see AUDIO_QueryRealAudioCodecs() in Audio32.dll Technical Manual)
For RealAudio G2 output only
G2_SOURCE	0 (default), 1, 2 or 3 to define the type of source audio: 0 = Voice 1 = Voice with music 2 = Instrumental music 3 = Instrumental music in stereo
G2_TARGET	Decimal integer value, which is interpreted as a bit field to define the target data rates: Bit 0 (0x0001): 28K modem Bit 1 (0x0002): 56K modem Bit 3 (0x0008): Dual ISDN (128K) Bit 5 (0x0020): LAN Bit 6 (0x0040): 256K DSL Bit 7 (0x0080): 384K DSL Bit 8 (0x0100): 512K DSL (Options for bit 2 and 4 are obsolete)
For RealAudio 9 output only
RA9_SOURCE	Either VOICE or MUSIC to define the type of source audio
RA9_TARGET	Decimal integer value, which is interpreted as a bit field to define the target “audiences”. Setting Bit #n to 1 will create a stream for target audience #n. The properties of each “audience” can be configured with a commercial RealMedia toolkit.
For any RealAudio format (V5, G2 or 9)
TITLE	Title of the clip (default: empty)
AUTHOR	Author of the clip (default: empty)
COPYRIGHT	Copyright information for the clip (default: empty)
RA_SELECTIVERECORD	0 (default) or 1 Only if this option is 1, the resulting RealAudio file can be recorded with RealPlayer Plus.
RA_MOBILEPLAY	0 (default) or 1 If set to 1, the resulting RealAudio file can be stored on the local hard drive.
For MPEG Layer 3 output only
LAMEQUALITY	Quality setting for the LAME encoder: 0 = low 1 = medium 2 = high (default) 5 = very high
ID3V1	0 (default) or 1 If raw MP3 output is created, an ID3 V1 metadata tag is appended, if this option is set to 1.
ID3V2	0 (default), 3 or 4 If raw MP3 output is created, an ID3 V2 metadata tag can be written at the beginning of the file. 0 = no ID3 V2 tag 3 = ID3 V2.3 tag 4 = ID3 V2.4 tag Metadata for the tag can be passed in the metadata parameter (see below). Several DigaSystem standard fields (e.g. TITLE, COMPOSER, etc.) are filled into the appropriate ID3 V2 tags. Arbitrary ID3 V2 text tags (code “Txxx”) can be passed in field ID3V2/Txxx.
ID3V2_UNICODE	Character encoding used for ID3 V2 tag: 0 = ISO 8859-1 (“ANSI”) (default) 1 = UTF-16 2 = UTF-8
For AAC output only
AAC_HFCUTOFF	High-frequency cut-off: 0 = use default for given bitrate and sampling rate (default) 1 = no cut-off
AAC_VBR	Variable bitrate mode quality: 0 = no VBR coding (constant bitrate) (default) 1 = low quality vbr mode 1 2 = low quality vbr mode 2 3 = low quality vbr mode 3 4 = medium quality vbr mode 1 5 = medium quality vbr mode 2 6 = medium quality vbr mode 3 7 = high quality vbr mode 1 8 = high quality vbr mode 2 9 = high quality vbr mode 3
AAC_HE	High Efficiency AAC encoding: 0 = HE not used (default) 1 = HE v1 used and implicitly signaled in the bitstream 2 = HE v2 used and implicitly signaled in the bitstream
Time/pitch scaling options
TIMESTRETCH	A floating point number specifying the time-stretch factor. Default is 1.0, meaning no time stretching.
PITCHSCALE	A floating point number specifying the pitch-scaling factor (2.0 = one octave). Default is 1.0, meaning no pitch scaling.
TPSC_QUALITY	Quality control for the time/pitch scaling algorithm: 200: fastest/preview mode 201: still fast but better 202: Mpex for single instruments/voice (fast) 203: Mpex for single instruments/voice (best) (default) 204: general purpose polyphonic instruments (fast) 205: general purpose polyphonic instruments (good) 206: general purpose polyphonic instruments (best)
Options to convert only a part of the file
STARTTIME	Offset (in milliseconds) of the start of the cut (default: 0 = start of audio data)
ENDTIME	Offset (in milliseconds) of the end of the cut (default: 0 = end of audio data) NOTE: If non-PCM source and/or destination data is involved in the cut, 25ms are automatically added to ENDTIME for each non-PCM format (i.e., 50ms max.) to avoid sound loss due to codec filter delays.
FADETIME	Immediately after the beginning and before the end of the cut, a linear fade is applied to avoid “sharp” changes in audio level. This parameter defines the length of this fade in milliseconds. Default is 10. Note: The fade is not applied at the beginning (or end) of the cut, if the start offset (or end offset) is 0. It is assumed that the start and end of the audio material are already properly faded, if necessary. See also remarks on cuts from MPEG data below.
RAWCUT	0 (default) or 1. If set to 1, the audio cut is pre-selected at file level, i.e. the converter calculates which part of the file it needs to read and convert to generate the selected cut. This is significantly faster if a small part of a very large source file is to be extracted. But there are also drawbacks: (1) It works only with linear and MPEG Layer II input data. (2) Some metadata of the source file, which are preserved in the cut without a “raw cut”, are not carried over. This covers the metadata of BWF chunks and the marker points of all WAVE files (of course, only those markers that lie inside the cut are transferred). See also remarks on cuts from MPEG data below.
Extracting channels as mono files
SPLIT_INTO_MONO	If this is set to 1, the call to `AUDIO_ConvertFile()` is rerouted to `AUDIO_SplitFileIntoMonoFilesW()`. The OUTPUTFILENAME parameter is passed as the OutputFilenameTemplate to AUDIO_SplitFileIntoMonoFilesW. See section 3.5.3 RenderProject of the Audio32.dll Technical Manual for further information.
FIRST_OUTPUT_FILE_NUMBER	See parameter firstOutputFileNumber of function `AUDIO_SplitFileIntoMonoFilesW()`.
Logging
WRITELOG	If set to 1, the DLL will write some basic information about the conversion to a text file named “Audio32.DLL.log” in the same directory as the DLL itself. NOTE: This is a preliminary implementation only. It is intended for special testing and debugging requirements, but not for general use.
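Since G2_TARGET is given as a decimal integer but interpreted as a bit field, the value has to be computed from the individual bits. A minimal Python sketch, using the bit positions listed in the G2_TARGET row above; the dictionary and function names are illustrative only:

```python
# Bit positions as listed for G2_TARGET (bits 2 and 4 are obsolete and omitted).
G2_TARGET_BITS = {
    "28K modem": 0,
    "56K modem": 1,
    "Dual ISDN (128K)": 3,
    "LAN": 5,
    "256K DSL": 6,
    "384K DSL": 7,
    "512K DSL": 8,
}

def g2_target_value(targets):
    """Combine the selected target data rates into one decimal bit-field value."""
    value = 0
    for name in targets:
        value |= 1 << G2_TARGET_BITS[name]
    return value

value = g2_target_value(["56K modem", "LAN"])  # 0x0002 | 0x0020
# RealAudio G2 is specified as a RAW file with FORMAT=REALG2 (see FILETYPE notes).
print(f"[FILETYPE]RAW[FORMAT]REALG2[G2_SOURCE]1[G2_TARGET]{value}")
```

The same approach would apply to RA9_TARGET, where bit #n selects target audience #n.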

General Notes

The sections SAMPLERATE, BITRATE, RESOLUTION and MODE can be omitted, if the respective value(s) should be taken from the input file.
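To check which of these values a given section string leaves to the input file, a small helper can list the omitted sections. This is only a sketch; the function name `inherited_sections` is hypothetical, and the result says nothing about whether a section is actually relevant for the chosen format (e.g. BITRATE for PCM output).

```python
import re

# Sections that may be omitted so that the value is taken from the input file.
INHERITABLE = ("SAMPLERATE", "BITRATE", "RESOLUTION", "MODE")

def inherited_sections(section_string):
    """Return the inheritable sections that are NOT given in the section string."""
    given = set(re.findall(r"\[([^\]]+)\]", section_string))
    return [name for name in INHERITABLE if name not in given]

print(inherited_sections("[FILETYPE]WAV[FORMAT]LINEAR[SAMPLERATE]48000"))
```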

Metadata: The metadata parameter can contain a section string with arbitrary metadata fields. For some output formats (e.g. MP3 with ID3V2 tag), these metadata are written to the target file. The standard DigaSystem field names (e.g. TITLE) should be used as the names of the data fields.

Audio cuts in MPEG data

When cutting audio (by setting STARTTIME and/or ENDTIME parameters) from non-linear source data, the transcoding normally works like this: Decode the source data to linear, extract the range from STARTTIME to ENDTIME from the linear data (and apply fading, if FADETIME is non-zero), and encode the result to the output format.
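The middle step (extracting the range and applying the fade) can be pictured with the following Python sketch. It is an illustration on a plain list of mono samples, not the DLL's actual implementation; the function name, the integer-millisecond arithmetic and the limiting of the fade to half the cut length are assumptions made for this example.

```python
def cut_with_fade(samples, sample_rate, start_ms=0, end_ms=0, fade_ms=10):
    """Extract [start_ms, end_ms) from mono PCM samples and apply linear fades."""
    start = start_ms * sample_rate // 1000
    # ENDTIME 0 means "end of audio data".
    if end_ms == 0:
        end = len(samples)
    else:
        end = min(len(samples), end_ms * sample_rate // 1000)
    cut = list(samples[start:end])

    # Assumption: the fade never covers more than half of the cut.
    fade_len = min(fade_ms * sample_rate // 1000, len(cut) // 2)
    for i in range(fade_len):
        gain = i / fade_len
        if start_ms != 0:   # no fade-in if the cut starts at offset 0
            cut[i] *= gain
        if end_ms != 0:     # no fade-out if the cut runs to the end of the data
            cut[-1 - i] *= gain
    return cut

# One second of full-scale samples at 1 kHz, cut from 100 ms to 200 ms.
faded = cut_with_fade([1.0] * 1000, 1000, start_ms=100, end_ms=200, fade_ms=10)
print(len(faded), faded[0], faded[5], faded[50])
```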

When working with MPEG Layer 2 data, some special handling is applied. If the following conditions are all satisfied, the decoding/encoding steps are skipped:

a) Source AND destination format is MPEG Layer 2, with the same sampling rate and the same number of audio channels. Such a conversion is handled by a single MPEG transcoding step. Important: If the destination bitrate is higher than or equal to the source bitrate, no actual transcoding is done!
b) RAWCUT is “1”. “1” is the default value of the parameter if condition a) is true. Otherwise, the default for RAWCUT is “0”.
c) ENFORCE_INTERMEDIATE_PCM is “0”. This is the default value of the parameter.
d) FADETIME is 0. 0 is the default value if conditions a)-c) are all true. Otherwise, the default for FADETIME is 10.

Summary: If condition a) is true, and the parameters RAWCUT, ENFORCE_INTERMEDIATE_PCM and FADETIME are not specified (i.e., the defaults apply), then the audio cut will be created without an intermediate MPEG decoding/encoding step.
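The interplay between the conditions and their context-dependent defaults can be expressed as a short decision function. This is a sketch of the rules as described above, not code from Audio32.dll; the function name and the dictionary layout for the source and destination formats are assumptions for illustration.

```python
def skips_decode_encode(src, dst, params):
    """Return True if the cut is made without an intermediate decode/encode step.

    src and dst are dicts with 'format', 'samplerate' and 'channels';
    params holds only the options explicitly given in the section string.
    """
    # Condition a): MPEG Layer 2 on both sides, same rate and channel count.
    cond_a = (
        src["format"] == dst["format"] == "MPEG-LAYER2"
        and src["samplerate"] == dst["samplerate"]
        and src["channels"] == dst["channels"]
    )
    # Condition b): RAWCUT defaults to 1 only if condition a) holds.
    rawcut = int(params.get("RAWCUT", 1 if cond_a else 0))
    # Condition c): ENFORCE_INTERMEDIATE_PCM defaults to 0.
    enforce_pcm = int(params.get("ENFORCE_INTERMEDIATE_PCM", 0))
    cond_abc = cond_a and rawcut == 1 and enforce_pcm == 0
    # Condition d): FADETIME defaults to 0 if a)-c) hold, otherwise to 10.
    fadetime = int(params.get("FADETIME", 0 if cond_abc else 10))
    return cond_abc and fadetime == 0

mp2 = {"format": "MPEG-LAYER2", "samplerate": 48000, "channels": 2}
print(skips_decode_encode(mp2, mp2, {"STARTTIME": "1000", "ENDTIME": "5000"}))
print(skips_decode_encode(mp2, mp2, {"FADETIME": "10"}))
```

Note that explicitly setting any of the three parameters to a non-default value is enough to force the regular decode/cut/encode path.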

If the MPEG decoding/encoding is skipped, the automatic extension of the cut length by 50ms (see note in description of ENDTIME parameter) is also omitted.
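The resulting end offset can be worked out as follows. This Python sketch only restates the arithmetic from the ENDTIME note (25 ms per non-PCM format involved); the function name is hypothetical, and treating ENDTIME=0 (end of audio data) as "nothing to extend" is an assumption, since the source does not cover that case.

```python
CODEC_DELAY_MS = 25  # added once per non-PCM format involved in the cut

def effective_endtime(endtime_ms, src_is_pcm, dst_is_pcm, decode_encode_skipped=False):
    """Return the end offset actually used for the cut, in milliseconds."""
    # Assumption: ENDTIME 0 means "end of audio data", so there is nothing to extend.
    if endtime_ms == 0 or decode_encode_skipped:
        return endtime_ms
    non_pcm_formats = (not src_is_pcm) + (not dst_is_pcm)
    return endtime_ms + CODEC_DELAY_MS * non_pcm_formats

# MPEG Layer 2 to MPEG Layer 2 with regular decode/encode: 2 x 25 ms are added.
print(effective_endtime(10000, src_is_pcm=False, dst_is_pcm=False))
# Same conversion, but decode/encode skipped: no extension.
print(effective_endtime(10000, False, False, decode_encode_skipped=True))
```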
