In April, I blogged on the Digium site with a “primer” on Asterisk sound files, interviewing Rod Montgomery, Product Manager for Digium, about the basic nature of Asterisk files. Rod never ceases to amaze me with the depth of his knowledge and with how accessibly he explains things. Being able to discuss technical topics in depth with those who have a high tolerance for that, while adjusting his phraseology for those of us less inclined, is an amazing gift. I have many clients who struggle with the technical aspects of Asterisk files (and who really don’t want to get bogged down with too much technical stuff). These articles will hopefully help to demystify Asterisk sound files and explain how to install them and get them to play at their optimal level.
This article digs a little deeper into the characteristics of these files and offers some helpful conversion “tricks!”
A: Rod, lots of clients ask for A or U Law files. And .vox – that’s another one. Why do some systems require these telephony standards and others don’t?
RM: The µ-law algorithm (also written u-law) and its cousin A-law are dynamic range compression/expansion methods that make the human voice more intelligible over limited-bandwidth telephone lines. Just as our voices provide a wide frequency range (as shown in the chart above), they also provide a wide dynamic range: they can be very soft or very loud. This range is much wider than a telephone can reproduce, so these “companding” methods boost the quieter voices into the upper range of volume. Have you ever noticed when you’re listening to someone on the phone, that you can understand what they’re saying whether whispering or shouting? There are a number of reasons for this, but µ-law/A-law contribute to this.
It’s important to note that µ-law can quantize and compress 14 bits of linear audio (A-law: 13 bits) into 8 compressed bits. Therefore it’s best to convert to µ-law/A-law directly from higher-quality files, often 16-bit. If you first convert to an 8-bit source, the quiet detail is already gone before companding, so you don’t take full advantage of µ-law/A-law’s ability.
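To make Rod's point concrete, here is a minimal Python sketch of µ-law companding. It uses the textbook continuous µ-law curve with µ = 255, not the exact segmented bit layout of the G.711 standard, and all function names are made up for illustration. It compares encoding directly from a 16-bit sample with encoding after first truncating that sample to 8 bits.

```python
import math

MU = 255  # mu-law parameter; 255 is the value used in 8-bit telephony


def mulaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def to_8bit_code(y: float) -> int:
    """Quantize a companded value in [-1, 1] to a signed 8-bit code."""
    return max(-127, min(127, round(y * 127)))


def encode_from_16bit(sample: int) -> int:
    """Compand a signed 16-bit sample directly to an 8-bit code."""
    return to_8bit_code(mulaw_compress(sample / 32768))


def encode_via_8bit(sample: int) -> int:
    """Truncate to 8-bit linear first (losing quiet detail), then compand."""
    coarse = (sample >> 8) << 8  # keep only the top 8 bits
    return to_8bit_code(mulaw_compress(coarse / 32768))
```

In this sketch, two quiet 16-bit samples such as 100 and 200 receive different codes when encoded directly, but both collapse to 0 if they pass through 8-bit linear first. That is the loss Rod is warning about.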
VOX files are somewhat lower quality, being a 4-bit ADPCM format sampled at 6053 Hz. This format was popular with some Dialogic products and some PC audio cards and voice modems, and Asterisk can read/write it, but it’s rarely seen anymore.
A: We talked about downsampling last time – does it make sense to provide a high-resolution set of prompts as well as a downsampled version – much like a photographer providing clean, unaltered originals?
RM: As with digital images, down-converting a high-quality file can be useful, but up-converting a low-quality file yields poor results. The chart “demo-instruct” shows a frequency spectrum plot of the first Allison audio prompt that many Asterisk administrators hear from their own Asterisk system: instructions for the Asterisk demonstration. The shaded area shows the portion of the audio that can be reproduced by a typical telephone.
Recording at a high sampling rate (48 kHz, above the 44.1 kHz of CD audio) and bit depth (16-bit) provides the freedom to convert to other formats while retaining as much of the original quality as possible in the target format. This makes your custom prompts “future proof” as well, because higher-quality versions of the prompts can be substituted when more capable phones are used. And of course, Asterisk is smart enough to play the highest-quality prompt available for the type of phone in use.
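As a rough illustration of down-conversion, assume the telephone target is 8 kHz, the usual narrowband rate. A 48 kHz master then carries six samples for every one sample in the telephone file. The Python sketch below averages each block of six samples into one. This is a deliberately crude stand-in for the filtering a real audio editor or converter applies, and the function name is hypothetical.

```python
def downsample_by_averaging(samples: list[int], factor: int = 6) -> list[int]:
    """Average each block of `factor` samples into one output sample.

    factor=6 corresponds to 48 kHz -> 8 kHz. Any leftover samples that do
    not fill a complete block at the end are dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [
        round(sum(samples[i:i + factor]) / factor)
        for i in range(0, usable, factor)
    ]
```

One second of 48 kHz audio (48,000 samples) comes out as 8,000 samples. Going the other way cannot restore the detail that was averaged away, which is why it pays to keep the high-resolution originals.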
A: How much “head” and “tail” (blank space at the beginning and end of a prompt sound file) is optimal?
RM: This is subjective, to be answered only by the administrator. If you trim out all the silence for typical prompts but play them one right after the other, they’ll sound unnatural and rushed because speakers normally leave a little space after each sentence or take a breath. If you leave too much silence, the prompts can take on an unresponsive or robotic feel. I prefer to leave a half-second or so “tail of silence” at the end, but keep the head very short so the prompt begins immediately when called. Again, this is very subjective and is fortunately easy to tune.
Pro tip: Create audio files of silence in the lengths you require, then call them from the dialplan, with no need to edit the prompts to try different pauses. Also, Asterisk sounds include “silence” files in lengths from 1 to 10 seconds.
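If you need a pause shorter than the stock one-second file, a silence file is easy to generate yourself. The following is a minimal sketch using Python's standard `wave` module. The 8 kHz, mono, 16-bit format is an assumption chosen to suit a typical narrowband phone, and the function and file names are hypothetical.

```python
import wave


def write_silence(path: str, seconds: float, rate: int = 8000) -> int:
    """Write a mono 16-bit WAV file of digital silence; return the frame count."""
    frames = int(seconds * rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * frames)  # every sample is exactly zero
    return frames
```

For example, `write_silence("pause-250ms.wav", 0.25)` produces a quarter-second pause that can sit alongside your custom prompts and be played between them.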
A: Some people have noticed a slight “clicking” sound at the end of their sound file – do we know what that is, and what can be done to rectify it?
RM: Visualize a nice clean sine wave. A speaker reproducing that wave will push one way when the wave is above the zero line, and pull the other way when the wave is below. Now if an audio file begins or ends with data far away from zero, then the reproducing speaker does its best to immediately jump to that position, causing a noise. The easiest way to avoid this is to use your audio editor to fade-in at the beginning and fade-out at the end of each prompt to be sure it begins and ends on a zero-crossing. The fade time can be very short and unnoticeable: it just needs to bring the volume back to zero.
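For anyone curious what that fade does to the samples themselves, here is a small Python sketch. It assumes the prompt is already loaded as a list of signed integer samples, and the 5 ms fade length is only an illustrative default, not a recommendation from Rod. The function name is hypothetical.

```python
def apply_fades(samples: list[int], rate: int = 8000, fade_ms: float = 5.0) -> list[int]:
    """Linearly ramp the first and last `fade_ms` of a prompt up from and down to zero."""
    # Never let the two fades overlap on very short prompts.
    n = min(int(rate * fade_ms / 1000), len(samples) // 2)
    out = list(samples)
    for i in range(n):
        gain = i / n  # 0.0 at the very edge, approaching 1.0 inside the prompt
        out[i] = int(out[i] * gain)            # fade-in
        out[-1 - i] = int(out[-1 - i] * gain)  # fade-out, mirrored from the end
    return out
```

The first and last samples come out at exactly zero, so the speaker has no sudden jump to make, while everything outside the two short ramps is left untouched.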
It can’t be emphasized enough: Asterisk is known for its user-friendly nature and its straight-out-of-box usability, but when issues do arise, Digium has resources available to help troubleshoot – and The Support Center is always a good place to start!
Thanks for reading – and as always, your comments and feedback are welcome!
I’ll see everyone at Astricon this October in Denver, where I’ll be giving a talk based on my “15 Commandments of IVR” blog series!