Sunday, May 24, 2009

Press Pound(ing headache) for those pesky telephony formats

When you’re a voice talent, as you progress in your chosen profession, you will probably find yourself doing more of one type of work than another:

….and for those of us that choose to: Telephony.

First things first: How do you SAY “Telephony”?
It's pronounced tuh-LEF-uh-nee, with the stress on the second syllable.

Ahhh yes…telephony….that which is dreaded by the rest of humanity but loved by those of us that record for it! The “Press 1 for Customer Service” work.

Those of us that record these types of messages love them for one main reason: their repetition….it’s consistent work.

No, it’s not the highest-paying VO work, and most of it is non-union. But once you start working in the telephony field and have your voice chosen by a client for their phone system…you’re in like Flynn. YOU have become the voice of that system. And updates happen a lot in telephony.

Now. There are two main ways to get into telephony:

1) The first way is to get on the roster of an IVR or MOH company who will in effect do the selling for you.

Oh…IVR? MOH? Liz, what’s with all this alphabet soup?????


IVR = Interactive Voice Response – These are the “Press 1, Press 2” type systems that require an interaction with you, a response from you, in order to get you to the right person or department.

MOH = Message on Hold – These are in effect sales and informational messages that clients have us record so that you know about their services or latest company happenings while you’re waiting on hold.


An IVR or MOH company will present its roster of talent to its clients, and a client will (hopefully) choose you to record their messages. At this point the IVR/MOH company will send you a script; you record it and send it back, usually as an .mp3 or .wav file. The IVR/MOH company handles all the production.

In this case it’s pretty much like any other VO job most talents are used to.

2) The second way to get into telephony is to work with clients directly.

Here is where many voice talents’ eyes glaze over (or throats close up in fear).
There are so many different types of digital telephony formats out there that, frankly, unless you’re familiar with them, when a client says they need a Dialogic 8bit, 8K Mu-Law .vox file…all many talents hear is Ancient Etruscan.

I seem to have fallen into this niche and often get calls from other voice talents in a panic saying “My client says he needs a .vox file. What do I do?”

My first question is “What type of vox file?”
…at which point I get another panicked “There’s more than one type of vox file???”

One of the most important aspects of being successful in this part of the business is knowing what questions to ask.

You will NOT look stupid by asking these questions. You will look like you know what you’re doing, because you’re making sure you give your client exactly the type of audio file he needs.

A little history:
Back in the old days… 15 years ago…before the advent and widespread use of .mp3 files, people – mostly large institutions like banks – discovered that it was A LOT cheaper to have calls routed by a computer than by a human.

One study back in the day found that if a human answered and routed a call, it could cost a company over $4.00 per call.
If the computer did it via an IVR system, it only cost the company $0.25 per call.

Yeah, a no-brainer in the eyes of the companies. And a boon for voiceover talent.

Many companies went IVR crazy with 6, 7, 8, even 9 options that people had to wade through with no way out... and thus... voicemail jail was born.

Luckily, things have changed a bit with "usability studies" and "opt-out" features like pressing "0" for the operator... But back to history...

What these companies found was that all this audio was taking up a lot of space on the “primitive” servers of the time, so different audio file formats were created to take up less space. In the process, unfortunately, they often traded audio quality for file size.
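To put rough numbers on that trade-off: for fixed-rate formats like plain .wav, mu-law and 4bit ADPCM, file size is simply sample rate × bits per sample × channels. Here's a minimal Python sketch comparing a 30-second prompt in three formats; the settings are illustrative, not figures from any particular phone system, and file headers are ignored.

```python
# Rough size math for fixed-rate audio formats (headers ignored).

def bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Data rate in bytes per second: samples/sec x bits/sample x channels / 8."""
    return sample_rate_hz * bits_per_sample * channels // 8


def prompt_size_bytes(seconds: float, sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Approximate size of a prompt of the given length."""
    return int(seconds * bytes_per_second(sample_rate_hz, bits_per_sample, channels))


# Illustrative settings: (sample rate in Hz, bits per sample, channels)
formats = {
    "CD-quality stereo .wav (44.1K, 16bit)": (44100, 16, 2),
    "8bit 8K Mu-Law": (8000, 8, 1),
    "4bit 6K ADPCM": (6000, 4, 1),
}

for name, (rate, bits, ch) in formats.items():
    print(f"{name}: {prompt_size_bytes(30, rate, bits, ch):,} bytes for 30 seconds")
```

That works out to 5,292,000 bytes for CD-quality stereo, 240,000 bytes for 8bit 8K Mu-Law and 90,000 bytes for 4bit 6K ADPCM: nearly 60 times smaller, which is exactly why those old servers loved it (and why it sounds the way it does).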

Many different computer systems and many different file formats came to be.

And they don't play nicely together.
If you think Macs and PCs don't get along, try telephony: if you don’t provide the right file format for your client’s system, it’ll sound like the snow on an old TV set.

Just like there are different types of .wav files (mono, stereo, 48K, 44.1K, 16bit, 24bit…) and different types of .mp3 files (128K, 256K, 96K…), there are also different types of telephony audio files.
You’ve got:
  • Dialogic,
  • OKI,
  • InterVoice,
  • Natural Microsystems,
  • CCITT,
  • NeXT/Sun…just to name a few.
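Part of why these formats don't play nicely together is that some files carry a header describing what's inside and some don't. Below is a minimal Python sketch, assuming you have the first few dozen bytes of a file, that recognizes a .wav (RIFF) header and a NeXT/Sun .au header and reports the encoding code each one declares. The function name `identify` is hypothetical and the code tables list only a few well-known values.

```python
import struct

# A few well-known format codes (illustrative, not exhaustive).
WAV_FORMAT_TAGS = {1: "linear PCM", 6: "A-law", 7: "mu-law", 0x11: "IMA ADPCM"}
AU_ENCODINGS = {1: "8-bit mu-law", 2: "8-bit linear PCM", 3: "16-bit linear PCM", 27: "8-bit A-law"}


def identify(header: bytes) -> str:
    """Guess an audio file's type from its first bytes (read 64 or more)."""
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        pos = 12
        # Walk the RIFF chunks until we reach "fmt "; its first field is the format tag.
        while pos + 8 <= len(header):
            chunk_id, size = struct.unpack_from("<4sI", header, pos)
            if chunk_id == b"fmt " and pos + 10 <= len(header):
                (tag,) = struct.unpack_from("<H", header, pos + 8)
                label = WAV_FORMAT_TAGS.get(tag, f"format tag {tag}")
                return f"WAV ({label})"
            pos += 8 + size + (size & 1)  # chunks are padded to an even length
        return "WAV (fmt chunk not within the bytes given)"
    if len(header) >= 16 and header[:4] == b".snd":
        # NeXT/Sun .au headers are big-endian; the encoding code is at byte 12.
        (encoding,) = struct.unpack_from(">I", header, 12)
        label = AU_ENCODINGS.get(encoding, f"encoding {encoding}")
        return f"Sun/NeXT .au ({label})"
    # Dialogic .vox files are raw samples with no header at all, so nothing in
    # the bytes says whether they hold 4bit ADPCM or 8bit Mu-Law, at 6K or 8K.
    return "headerless/raw (possibly .vox: ask the client)"
```

For example, `identify(open("greeting.wav", "rb").read(64))` (file name hypothetical). The last branch is the real lesson: a .vox file can't describe itself, so the only way to know what's inside is to ask.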

Now, don’t panic.
You cannot, and you SHOULD not have to guess as to the correct format the client needs.

It is the client’s responsibility to tell you what they need.
Let me repeat that: It is the client’s responsibility to tell you what they need.

But frankly sometimes they don’t have a clue.

So what do you do? This is where you become your client's problem solver.
Basically, when a client says he needs a telephony file format you need to ask 3 questions:

  1. What’s the format or sound family?
  2. What’s the bit rate/sound type?
  3. What’s the sampling rate?
If they don’t know the answers, you can tell them they should be able to find that information by:
  • contacting their hardware vendor,
  • asking their IT person, or
  • actually looking at the instruction book that came with the system!
The other option is to Google the name of their hardware and see if you can find the information yourself. That's up to you. If the client is big enough, it might make you look really good!

But again, unless you give them the right file type, it just won’t play, and they’ll blame you for that! Better to get the answer upfront than by trial and error.

But let’s say a client says: “I need a .vox file.”

What’s the format or sound family?
The format is “.vox”. Nine times out of ten that will be a Dialogic .vox file, so you can pretty safely go with that assumption.

What’s the bit rate/sound type?
Is it ADPCM or Mu-Law? (Mu-Law is often written and said simply as “u-law,” because the “mu” is the Greek letter μ.)
For Dialogic files, ADPCM is always 4bit and Mu-Law is always 8bit.
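So what does "8bit Mu-Law" actually do to your audio? Each 16bit sample from your recording gets squeezed into a single byte on a logarithmic scale, so quiet sounds keep more detail than loud ones. Here's a minimal Python sketch of the textbook G.711 mu-law encoding step (the standard companding used in North American telephony). It's the classic bias-and-segment method, not any particular program's code, and it assumes your audio has already been resampled to 8K.

```python
from collections.abc import Iterable

BIAS = 0x84   # 132: standard G.711 bias added before finding the segment
CLIP = 32635  # largest magnitude that still fits in 16 bits once the bias is added


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16bit PCM sample as one 8bit G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Exponent: which of 8 logarithmic segments the value falls in (0-7).
    exponent = (magnitude >> 7).bit_length() - 1
    # Mantissa: 4 bits of detail within that segment.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 stores the bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_ulaw(samples: Iterable[int]) -> bytes:
    """One mu-law byte out for every 16bit sample in."""
    return bytes(linear_to_ulaw(s) for s in samples)
```

Silence encodes as 0xFF, and one second of 8K Mu-Law is exactly 8,000 bytes. Dialogic's 4bit ADPCM is a different, adaptive scheme that packs two samples into each byte; it isn't shown here.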

What’s the sampling rate?
6000 Hz (6K), 8000 Hz (8K)? Another rate?

With Dialogic, for example, the two most common are:
4bit 6K ADPCM and 8bit 8K Mu-Law.
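If you like keeping notes per client, the three questions map neatly onto a small record. This is a hypothetical Python sketch (the `TelephonySpec` name and its fields are made up for illustration): it lists which questions are still unanswered and spells out a finished spec. It only knows the Dialogic rule that ADPCM is 4bit and Mu-Law is 8bit, so don't lean on it for other sound families.

```python
from dataclasses import dataclass
from typing import Optional

# Dialogic rule: the sound type fixes the bit depth.
BITS_FOR_ENCODING = {"ADPCM": 4, "Mu-Law": 8}


@dataclass
class TelephonySpec:
    family: Optional[str] = None          # Q1: format or sound family, e.g. "Dialogic .vox"
    encoding: Optional[str] = None        # Q2: bit rate/sound type, e.g. "Mu-Law"
    sample_rate_hz: Optional[int] = None  # Q3: sampling rate, e.g. 8000

    def open_questions(self) -> list[str]:
        """Return the questions the client still needs to answer."""
        missing = []
        if self.family is None:
            missing.append("What's the format or sound family?")
        if self.encoding is None:
            missing.append("What's the bit rate/sound type?")
        if self.sample_rate_hz is None:
            missing.append("What's the sampling rate?")
        return missing

    def describe(self) -> str:
        """Spell out a complete spec, e.g. 'Dialogic .vox, 8bit 8K Mu-Law'."""
        if self.open_questions():
            raise ValueError("spec incomplete: " + " ".join(self.open_questions()))
        bits = BITS_FOR_ENCODING.get(self.encoding)
        bit_text = f"{bits}bit " if bits else ""
        # Sample rate shown in whole kHz (6000 -> 6K).
        return f"{self.family}, {bit_text}{self.sample_rate_hz // 1000}K {self.encoding}"


# The two common Dialogic combinations mentioned above.
COMMON_DIALOGIC = [
    TelephonySpec("Dialogic .vox", "ADPCM", 6000),
    TelephonySpec("Dialogic .vox", "Mu-Law", 8000),
]
```

A client who only says "I need a .vox file" gives you `TelephonySpec(family="Dialogic .vox")`, and `open_questions()` hands back the two questions you still need to ask.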

But the bottom line is that with so many formats out there, you can never assume anything.
Just ask.

So how do you create all these formats?

Here’s the bad news:
Most professional audio programs that we use for voiceover do NOT convert to telephony formats.

They will convert to some.
Sound Forge, for example, can convert to 4bit ADPCM .vox and to 8bit 8K CCITT mu-Law .wav, but not to 8bit mu-Law .vox.
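If your editor gets you as far as an 8bit 8K mu-Law .wav, there's a possible workaround for that last gap. As Dialogic 8bit mu-Law .vox is commonly described, it is the same mu-law samples with no header at all. The minimal Python sketch below assumes that's true for your client's system (confirm it with them first!) and pulls the raw sample data out of such a .wav. It parses the RIFF chunks by hand because Python's standard `wave` module only reads uncompressed PCM. The function name `mulaw_wav_to_vox` is hypothetical.

```python
import struct

WAVE_FORMAT_MULAW = 7  # format tag for G.711 mu-law in a WAV "fmt " chunk


def mulaw_wav_to_vox(wav_bytes: bytes) -> bytes:
    """Return the headerless sample bytes of a mono, 8bit, 8K mu-law WAV."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt = None
    pos = 12
    while pos + 8 <= len(wav_bytes):
        chunk_id, size = struct.unpack_from("<4sI", wav_bytes, pos)
        body = wav_bytes[pos + 8 : pos + 8 + size]
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits per sample
            fmt = struct.unpack_from("<HHIIHH", body)
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk appears before fmt chunk")
            tag, channels, rate, _, _, bits = fmt
            if (tag, channels, rate, bits) != (WAVE_FORMAT_MULAW, 1, 8000, 8):
                raise ValueError(f"expected mono 8bit 8K mu-law, got tag={tag}, "
                                 f"channels={channels}, rate={rate}, bits={bits}")
            return body  # raw mu-law samples, one byte per sample
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to an even length
    raise ValueError("no data chunk found")
```

You'd then write the result to a file with a .vox extension, e.g. `open("prompt.vox", "wb").write(mulaw_wav_to_vox(open("prompt.wav", "rb").read()))` (file names hypothetical). It refuses anything that isn't mono 8bit 8K mu-law, so it won't quietly hand your client the wrong format.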

There is pretty much only one program out there that will convert to all of them: VoxStudio. It was developed by a company in Belgium and has pretty much cornered the market on telephony file conversion. Once you get used to formatting a voiceover script and importing it into the software, it’s a powerful and useful program.

However, at over $550 (399 Euros) it’s not a cheap program, and it only works on the PC platform. I would only suggest you get it if you know you’ll be doing a lot of telephony work directly for clients.


If your eyes haven’t completely glazed over and turned to mush yet, take a break, read over this post a few more times and if you still have questions feel free to email me. I’ll be happy to help!

Like I said, it’s not a glamorous part of the voiceover biz, but it certainly is one of the most consistent.