Audio32 Section Strings for Audio File Conversion
The AUDIO_ConvertFile() functions inside Audio32.dll convert an audio file to an output file using a specified format.
This format is also used for Audio Analysis and Audio Conversion tasks within Workflows of the DPE Workflow System.
Section String
A section string is a sequence of [SECTION] names, each immediately followed by the VALUE for that section, e.g.
[FILETYPE]WAV[FORMAT]LINEAR[SAMPLERATE]44100
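The layout above can be assembled and taken apart with a few lines of Python. This is a minimal sketch, not part of Audio32.dll: the helper names `build_section_string` and `parse_section_string` are hypothetical, and the parser assumes that section names use only upper-case letters, digits and underscores and that values never contain a `[` character.

```python
import re

# Assumption: names are upper-case letters, digits and underscores;
# values never contain '['. Neither rule is stated by the DLL documentation.
_SECTION_RE = re.compile(r"\[([A-Z0-9_]+)\]([^\[]*)")


def build_section_string(sections):
    """Concatenate [NAME]VALUE pairs in the order given."""
    return "".join(f"[{name}]{value}" for name, value in sections.items())


def parse_section_string(text):
    """Split a section string into a dict mapping section name to value."""
    return {name: value for name, value in _SECTION_RE.findall(text)}


example = build_section_string(
    {"FILETYPE": "WAV", "FORMAT": "LINEAR", "SAMPLERATE": 44100}
)
print(example)
print(parse_section_string(example))
```

All parsed values are returned as strings; converting them to numbers is left to the caller.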
Section Name | Values |
---|---|
Common | |
FILETYPE | Container format of the output file. AIFF: AIFF file MPEG4: MPEG-4 file (only valid with audio format AAC) NOTES: Several combined container/essence formats must be specified as RAW files with the appropriate FORMAT value. This includes e.g. FLAC, OGG-VORBIS, REALAUDIO, REALG2, REAL9 and WMA. If WAV or BWF is specified as target format, the output file will automatically be written as an RF64 file if the file size is larger than 4 GB. |
FORMAT | Format of the audio essence data within the output file. LINEAR: Uncompressed PCM LINEAR(FP): Uncompressed PCM (floating-point format) MPEG-LAYER2: MPEG Layer 2 REALAUDIO: RealAudio Version 5 audio REALG2: RealMedia G2 OGG: Ogg-Vorbis |
SAMPLERATE | Audio sampling rate in samples/second. |
DOWNSAMPLING | Only for some MPEG II formats: If set to 1, the actual sample rate is half the value given by SAMPLERATE. |
BITRATE | Bit rate in kbit/second. |
RESOLUTION | Only for PCM and WMA data: Width of one sample in bits. For integer PCM (format LINEAR), it can be any value from 8 to 32. For floating-point PCM (format LINEAR(FP)), it can be 32 or 64. WMA supports 16 and 24 bit data. |
MODE | Information about the number of channels. MONO |
DBGAIN | The audio signal will be amplified by this value (in dB). Default is 0, meaning no amplification. |
SRCMODE | Method used for sampling rate conversion. The following options are supported: 0: A very fast, but not very exact algorithm. If SRCMODE=3 is selected, the quality of the SoX resampler can be influenced by several SOXR_... parameters (see below). |
MP2FILTER | 0 (default) or 1 If 1, a frame filter is used when decoding MPEG Layer 2 data. This means that “junk” data within the audio essence is skipped without generating an error. |
RELAXED_ERROR_CHECK | 0 (default) or 1 For details, see section 3.5.5 Error handling during conversion in the full Audio32.dll Technical Manual (available upon request). |
ENFORCE_INTERMEDIATE_PCM | 0 (default) or 1 Normally, conversion between two MPEG Layer 2 formats uses a direct transcoding of the audio data without generating any intermediate uncompressed audio. This is very fast, but the “audio level” metadata generated for some file formats (e.g. “LEVL” chunk in BWF files) may be somewhat inexact. If this option is set to 1, generation of intermediate PCM data is enforced. Conversion will be slower, but will create exact level data. |
NEVER_CREATE_RF64 | 0 (default) or 1 If WAV or BWF is specified as target file format, the converter reserves a bit of space in the file header, so that the file can be changed to RF64 format as soon as the size exceeds 4 GB. If the size remains below this threshold, the file will remain a 100% conformant WAV/BWF file. However, some malformed 3rd party software may not like the reserved header space (formatted as “JUNK” chunks). If this option is set to 1 (default is 0), this space is not written. The drawback is that the conversion fails, if the target file size reaches 4 GB. |
ALWAYS_CHECK_FOR_MP3 | 0 (default) or 1 If this option is 1, any input file, for which no format information can be found by other means, is scanned for MP3 frames. By default, this is only done for files with extension “.mp3”. |
OMIT_DIRECTX_TEST | 0 (default) or 1 By default, an input file, for which no format information can be found by other means, is passed to the operating system’s DirectX filter to find out if it can be decoded. Because buggy DirectX filters may crash for some files, this DirectX test can be disabled by setting this option to 1. |
MP3_DECODING_METHOD | Method used to decode MPEG Layer 3 input. 0: use proprietary internal decoder (has issues with a small subset of MP3 data) |
ALLOW_PARALLEL_MP3_ENCODING | 0 (default) or 1 Normally, AUDIO32.DLL doesn’t allow multiple parallel MP3 encodings within one process. Earlier tests revealed that the LAME encoder DLL is not thread-safe on multiprocessor machines. However, this might have been fixed in newer LAME versions. By setting the option to 1, the DLL can run several MP3 encodings in parallel. |
SOURCECHANNELS | When converting a multi-channel file to stereo, the two channels to be used as source data can be specified here as two comma-separated integers. Channel numbers are zero-based, and indicate the storage position of the channels in the source file. E.g. 0,1 specifies the first two channels. By default, the first two source channels are used, except when the source file is an RF64 file with two “stereo downmix” channels (e.g. “5.1+stereo”). In this case, the stereo downmix is used as source data. |
For BWF output only | |
TITLE | Title of the clip (default: empty); will be written to the appropriate field in the ‘bext’ chunk of the BWF file. |
AUTHOR | Author of the clip (default: empty); will be written to the appropriate field in the ‘bext’ chunk of the BWF file. |
For SoX sampling rate conversion only | |
SOXR_QUALITY | Main quality parameter. Possible values are integer numbers in the range 0...7. Most values correspond to a specific setting for SoX’s “rate” effect. 0 = quick (SoX “-q” setting) The default setting of 6 is generally good enough even for very high quality 24-bit PCM encodings. |
SOXR_PHASERESPONSE | Phase response of the SoX resampler, as an integer value in the range 0...100. Some values correspond to a specific setting of SoX’s “rate” effect: 0 = minimum (SoX “-M” setting) |
SOXR_STEEPFILTER | 0 = standard bandwidth (95%) filter (default) 1 = high-bandwidth (99%) filter (SoX “-s” setting) |
For RealAudio V5 output only | |
BITRATE | Bit rate in bit/s, or 0 |
CODEC | If BITRATE is 0 (or not given at all), CODEC must contain the number of the encoding codec (see AUDIO_QueryRealAudioCodecs() in Audio32.dll Technical Manual) |
For RealAudio G2 output only | |
G2_SOURCE | 0 (default), 1, 2 or 3 to define the type of source audio: 0 = Voice |
G2_TARGET | Decimal integer value, which is interpreted as a bit field to define the target data rates: Bit 0 (0x0001): 28K modem (Options for bit 2 and 4 are obsolete) |
For RealAudio 9 output only | |
RA9_SOURCE | Either VOICE or MUSIC to define the type of source audio |
RA9_TARGET | Decimal integer value, which is interpreted as a bit field to define the target “audiences”. Setting Bit #n to 1 will create a stream for target audience #n. The properties of each “audience” can be configured with a commercial RealMedia toolkit. |
For any RealAudio format (V5, G2 or 9) | |
TITLE | Title of the clip (default: empty) |
AUTHOR | Author of the clip (default: empty) |
COPYRIGHT | Copyright information for the clip (default: empty) |
RA_SELECTIVERECORD | 0 (default) or 1 Only if this option is 1, the resulting RealAudio file can be recorded with RealPlayer Plus. |
RA_MOBILEPLAY | 0 (default) or 1 If set to 1, the resulting RealAudio file can be stored on the local hard drive. |
For MPEG Layer 3 output only | |
LAMEQUALITY | Quality setting for the LAME encoder: 0 = low |
ID3V1 | 0 (default) or 1 If raw MP3 output is created, an ID3 V1 metadata tag is appended, if this option is set to 1. |
ID3V2 | 0 (default), 3 or 4 If raw MP3 output is created, an ID3 V2 metadata tag can be written at the beginning of the file. 0 = no ID3 V2 tag Metadata for the tag can be passed in the metadata parameter (see below). Several DigaSystem standard fields (e.g. TITLE, COMPOSER, etc.) are filled into the appropriate ID3 V2 tags. Arbitrary ID3 V2 text tags (code “Txxx”) can be passed in field ID3V2/Txxx. |
ID3V2_UNICODE | Character encoding used for ID3 V2 tag: 0 = ISO 8859-1 (“ANSI”) (default) |
For AAC output only | |
AAC_HFCUTOFF | High-frequency cut-off: 0 = use default for given bitrate and sampling rate (default) |
AAC_VBR | Variable bitrate mode quality: 0 = no VBR coding (constant bitrate) (default) |
AAC_HE | High Efficiency AAC encoding: 0 = HE not used (default) |
Time/pitch scaling options | |
TIMESTRETCH | A floating point number specifying the time-stretch factor. Default is 1.0, meaning no time stretching. |
PITCHSCALE | A floating point number specifying the pitch-scaling factor (2.0 = one octave). Default is 1.0, meaning no pitch scaling. |
TPSC_QUALITY | Quality control for the time/pitch scaling algorithm: 200: fastest/preview mode |
Options to convert only a part of the file | |
STARTTIME | Offset (in milliseconds) of the start of the cut (default: 0 = start of audio data) |
ENDTIME | Offset (in milliseconds) of the end of the cut (default: 0 = end of audio data) NOTE: If non-PCM source and/or destination data is involved in the cut, 25ms are automatically added to ENDTIME for each non-PCM format (i.e., 50ms max.) to avoid sound loss due to codec filter delays. |
FADETIME | Immediately after the beginning and before the end of the cut, a linear fade is applied to avoid “sharp” changes in audio level. This parameter defines the length of this fade in milliseconds. Default is 10. Note: The fade is not applied at the beginning (or end) of the cut, if the start offset (or end offset) is 0. It is assumed that the start and end of the audio material are already properly faded, if necessary. See also remarks on cuts from MPEG data below. |
RAWCUT | 0 (default) or 1 If set to 1, the audio cut is pre-selected at file level, i.e. the converter calculates which part of the file it needs to read and convert to generate the selected cut. This is significantly faster if a small part of a very large source file is to be extracted, but there are also drawbacks. See also remarks on cuts from MPEG data below. |
Extracting channels as mono files | |
SPLIT_INTO_MONO | If this is set to 1, the call is forwarded to AUDIO_SplitFileIntoMonoFilesW(), and the OUTPUTFILENAME parameter is passed as its OutputFilenameTemplate. See section 3.5.3 RenderProject of the Audio32.dll Technical Manual for further information. |
FIRST_OUTPUT_FILE_NUMBER | See parameter firstOutputFileNumber of function AUDIO_SplitFileIntoMonoFilesW(). |
Logging | |
WRITELOG | If set to 1, the DLL will write some basic information about the conversion to a text file named “Audio32.DLL.log” in the same directory as the DLL itself. NOTE: This is a preliminary implementation only. It is intended for special testing and debugging requirements, but not for general use. |
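Two of the numeric values in the table are easy to get wrong by hand: the decimal bit field expected by RA9_TARGET (and G2_TARGET), and the PITCHSCALE factor. The sketch below shows one way to compute them. The function names are hypothetical, and the semitone conversion assumes equal-tempered tuning, which the table does not mention; it only states that 2.0 equals one octave.

```python
def audience_bitfield(audience_numbers):
    """Return the decimal value with bit #n set for every listed audience n."""
    value = 0
    for n in audience_numbers:
        if n < 0:
            raise ValueError("audience numbers must be non-negative")
        value |= 1 << n
    return value


def pitchscale_from_semitones(semitones):
    """PITCHSCALE factor for a shift in semitones (assumes 12 per octave)."""
    return 2.0 ** (semitones / 12.0)


# Streams for audiences #0 and #3, pitch raised by one octave.
print(f"[RA9_TARGET]{audience_bitfield([0, 3])}")
print(f"[PITCHSCALE]{pitchscale_from_semitones(12)}")
```

Which audience or data rate each bit stands for depends on the encoder configuration and is not modelled here.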
General Notes
The sections SAMPLERATE, BITRATE, RESOLUTION and MODE can be omitted, if the respective value(s) should be taken from the input file.
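The inheritance rule can be modelled as a simple merge. This is only an illustrative sketch of the documented behaviour, not code from the DLL: `effective_values` and the shape of `source_properties` are assumptions made for the example.

```python
# The four sections that may be omitted according to the note above.
INHERITABLE = ("SAMPLERATE", "BITRATE", "RESOLUTION", "MODE")


def effective_values(requested, source_properties):
    """Fill omitted inheritable sections from the input file's properties."""
    result = dict(requested)
    for name in INHERITABLE:
        if name not in result and name in source_properties:
            result[name] = source_properties[name]
    return result


requested = {"FILETYPE": "WAV", "FORMAT": "LINEAR", "SAMPLERATE": "48000"}
source = {"SAMPLERATE": "44100", "RESOLUTION": "16", "MODE": "MONO"}
print(effective_values(requested, source))
```

Here SAMPLERATE keeps the requested value, while RESOLUTION and MODE are taken from the source properties.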
Metadata: This string can contain a SectionString with arbitrary metadata fields. For some output formats (e.g. MP3 with ID3V2 tag), these metadata are written to the target file. For the names of the data fields (e.g. TITLE, etc.), the standard DigaSystem field names should be used.
Audio cuts in MPEG data
When cutting audio (by setting STARTTIME and/or ENDTIME parameters) from non-linear source data, the transcoding normally works like this: Decode the source data to linear, extract the range from STARTTIME to ENDTIME from the linear data (and apply fading, if FADETIME is non-zero), and encode the result to the output format.
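The extract-and-fade step on linear data can be sketched as follows. This is a simplified model for mono samples held in a Python list, assuming the fade is a plain linear gain ramp; the exact ramp used by Audio32.dll is not documented here, and `cut_with_fade` is a hypothetical name.

```python
def cut_with_fade(samples, sample_rate, start_ms=0, end_ms=0, fade_ms=10):
    """Extract [start_ms, end_ms) from mono PCM samples and fade the edges.

    end_ms == 0 means "end of audio data". No fade is applied at an edge
    whose offset is 0, mirroring the note on FADETIME.
    """
    start = start_ms * sample_rate // 1000
    if end_ms == 0:
        end = len(samples)
    else:
        end = min(len(samples), end_ms * sample_rate // 1000)
    cut = [float(s) for s in samples[start:end]]

    fade_len = min(fade_ms * sample_rate // 1000, len(cut))
    for i in range(fade_len):
        gain = i / fade_len  # linear ramp starting at 0
        if start_ms != 0:
            cut[i] *= gain  # fade-in
        if end_ms != 0:
            cut[len(cut) - 1 - i] *= gain  # fade-out
    return cut


# One sample per millisecond keeps the arithmetic easy to follow.
tone = [1.0] * 1000
part = cut_with_fade(tone, sample_rate=1000, start_ms=100, end_ms=200)
print(len(part), part[:3], part[-3:])
```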
When working with MPEG Layer 2 data, some special handling is applied. If the following conditions are all satisfied, the decoding/encoding steps are skipped:
- a) Source and destination format are both MPEG Layer 2, with the same sampling rate and the same number of audio channels. Such a conversion is handled by a single MPEG transcoding step. Important: If the destination bitrate is higher than or equal to the source bitrate, no actual transcoding is done!
- b) RAWCUT is “1”. This is the default value of the parameter if condition a) is true. Otherwise, the default for RAWCUT is “0”.
- c) ENFORCE_INTERMEDIATE_PCM is “0”. This is the default value of the parameter.
- d) FADETIME is 0. This is the default value if conditions a)-c) are all true. Otherwise, the default for FADETIME is 10.
Summary: If condition a) is true, and the parameters RAWCUT, ENFORCE_INTERMEDIATE_PCM and FADETIME are not specified (i.e., the defaults apply), then the audio cut will be created without an intermediate MPEG decoding/encoding step.
If the MPEG decoding/encoding is skipped, the automatic extension of the cut length by 50ms (see note in description of ENDTIME parameter) is also omitted.
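The interplay of the conditions and their context-dependent defaults can be written down as a small decision function. This is a sketch of the rules as described in this section, not the DLL's implementation: `mpeg_cut_plan` and its arguments are hypothetical, it assumes a cut has actually been requested, and it treats every format other than LINEAR and LINEAR(FP) as non-PCM.

```python
def mpeg_cut_plan(src_format, dst_format, same_rate_and_channels, params):
    """Resolve defaults and report whether MPEG decoding/encoding is skipped.

    params holds only the explicitly specified sections, as strings.
    """
    mp2 = "MPEG-LAYER2"
    both_mp2 = src_format == mp2 and dst_format == mp2
    cond_a = both_mp2 and same_rate_and_channels

    # RAWCUT defaults to 1 only when the format condition holds.
    rawcut = int(params.get("RAWCUT", 1 if cond_a else 0))
    enforce_pcm = int(params.get("ENFORCE_INTERMEDIATE_PCM", 0))
    first_three = cond_a and rawcut == 1 and enforce_pcm == 0

    # FADETIME defaults to 0 only when the first three conditions hold.
    fadetime = int(params.get("FADETIME", 0 if first_three else 10))
    skipped = first_three and fadetime == 0

    # 25 ms per non-PCM format involved, omitted when the step is skipped.
    pcm_formats = ("LINEAR", "LINEAR(FP)")
    non_pcm = sum(fmt not in pcm_formats for fmt in (src_format, dst_format))
    extension_ms = 0 if skipped else 25 * non_pcm

    return {
        "RAWCUT": rawcut,
        "FADETIME": fadetime,
        "skipped": skipped,
        "endtime_extension_ms": extension_ms,
    }


print(mpeg_cut_plan("MPEG-LAYER2", "MPEG-LAYER2", True, {}))
print(mpeg_cut_plan("MPEG-LAYER2", "MPEG-LAYER2", True, {"FADETIME": "10"}))
```

Passing an explicit FADETIME of 10 is enough to bring the decoding/encoding step, and with it the ENDTIME extension, back into the conversion.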