Voice and Video

In technology today, and especially on the web, there is a constant push for the new shiny thing. Lately it seems like that new shiny thing comes in two flavors: Voice Recognition and Video. In my (not-uncontroversial) opinion those are two of the most overrated technologies in the business right now.


They say that a picture is worth a thousand words. Undoubtedly true, but honestly…don’t use a thousand words when 56 words will do. It seems like today every website is trying to video-enable itself, and recently I even saw a pitch for video e-mail! That’s fine when the subject is complicated and best explained in pictures or video, but usually it’s not.

I get annoyed when I click on a headline like “Three Sectors Showing Growth In First Quarter” only to be taken to a video of some talking head news person READING the story to me. I can read, thanks. The information could have been conveyed to me in 3-4 paragraphs of text. I don’t want to sit through 4 minutes of video (plus commercials) to get the same information.

It would be one thing if it were a complicated topic and the video included, in addition to the over-coiffed talking head, some useful graphics, charts, animations or other media that helped to explain the topic – but too often video adds sizzle but no steak.

 Likewise with the video e-mail – in this demonstration the presenter showed how an auto dealership can send a video e-mail to remind a customer that they’re due for an oil change. Why on earth would I want to sit through a two minute video for something that can be explained in a few sentences of text? I’m a busy person, I have things to do. Give me the information as clearly and concisely as possible so I can get on with my day.

 I don’t mind if a site gives me the OPTION to see a video. Give me the information in text form and a link where I can click to see a video that goes into the story in more depth; that’s fine. I can choose to watch the video or not. But don’t force me to watch somebody else read the story to me. That’s just painful. 

 Video is sometimes useful, but it’s often a shiny thing and right now it’s being dramatically overused. 


Voice recognition has been a holy grail of technology for decades. We all watched “Star Trek” where Captain Kirk would just tell the computer what he wanted. In “Iron Man” we saw a utopian computing vision where Tony Stark carried on a conversation with his computer and interacted with it three-dimensionally using gestures and holograms. That’s pretty cool and not as far from reality as we might imagine (if you have the budget and a private bunker workshop full of robots and sports cars), but even so…it has a fairly niche market.

It’s not that you can’t draft a will by standing in a room grabbing holographic clauses out of the air and dictating the client information. It’s that the facility needed to make that happen is increasingly rare for most knowledge workers (like lawyers). You may well dictate your next memo, but are you going to do that at a crowded airport gate? In Starbucks? In the reception area of a client’s office? The trend in technology is to make us increasingly mobile. That’s what iPads and Androids and netbooks and 4G (and to some extent “The Cloud”) are all about. Getting us out from behind our big oak desks and out into the world (or at least out into the coffee shop). And the requirement for good voice recognition is not only a good microphone but privacy.

There are so many conditions that can impede voice recognition that I really question how effective it will be for the masses in the short term. How many users really spend their workdays in a quiet, private environment where they’re not listening to music and are willing to wear a headset while they work?

Don’t get me wrong, there ARE people successfully using voice recognition today and the technology for voice recognition is steadily improving. Someday we’ll be able to talk to our computer well without the headset and our computer will have the sophisticated algorithms needed to pick out our voice among background noise and understand what we want from context so it won’t have to precisely hear every syllable. But we’ll probably never be at a place where we’re willing to dictate things out loud on an airplane (especially in coach). Besides, if you had to sit next to somebody on a plane dictating a real estate purchase agreement, you’d want to dump your drink over their head before the movie even started.

Voice recognition and video technology are getting better and better, but I still don’t believe they will ever be the way that most of us do most of our computing tasks. It’s not that the tech isn’t ready or won’t get better. It’s that most of the carbon-based technology (us) just doesn’t live/work in an environment that makes voice and video the best way to do all of the interaction with our systems.


  1. G Blair McCune

    “Speech recognition” rather than “voice recognition” is becoming the more-accepted term these days. (For example, Microsoft calls its product “Windows Speech Recognition”.) “Speech recognition” is probably a better term because “voice recognition” could describe the process of identifying individuals by their voices or manner of speaking.
    I use Dragon NaturallySpeaking (“DNS” Ver. 11 Professional on a Win 7 PC) for everything – briefs, motions, letters, e-mail, and this comment. It took a long time to learn how to dictate properly. But it has been worth it. I am a solo practitioner specializing in appeals, so I spend a great deal of time dictating long documents. Using DNS has made my document production quicker and more efficient.
    Also, DNS has saved a lot of wear and tear on my aging hands and wrists. I have been using DNS since Ver. 6 came out in 2002. I was prompted to begin using DNS because I was getting some slight osteoarthritis in my fingers. Needless to say, speech recognition is a godsend for people with more serious disabilities.
    I agree with your comments on problems with using speech recognition in public places. I only use DNS in the quiet of my office. But I know of other people using mobile solutions. DNS can be used with a small hand-held digital recorder. You can dictate into the recorder and download your dictation to your computer later. Also, DNS has an iPhone app, DragonDictate. Maybe it will not seem so unusual to see people dictating in public places in the near future – although I can’t believe dictation will ever be acceptable on an airplane. Noise-canceling microphones are getting better, so dictating in noisy environments is becoming more and more possible. Also, using a headset is not a requirement anymore. I use a desktop microphone and get good accuracy with it.
    A final note: if you use DNS, it is really (!) important to proofread carefully. Perfectly-spelled misrecognitions are hard to detect and can be embarrassing if they are not caught and corrected.
    The following public forum is a great resource for learning more about DNS: