The gadget blogs may work themselves into a frenzy over megapixels and processor speed. But if you want to know what really dazzles the masses, consider a feature that’s rarely called out by name: machine recognition of real-world sights and sounds. The success stories in this category represent triumphs of computation and software. Speech transcription on laptops and desktop computers is awesomely accurate. Optical character recognition (OCR), however, still needs a lot of improvement when it comes to recognizing handwritten material.
Some advanced OCR software, such as ABBYY FineReader, can be trained and does a pretty decent job. Gestures on touch screens are generally reliable (there are, after all, a limited number of movements to recognize). The Xbox Kinect and some Samsung television sets have brought us body-movement recognition. The handwriting recognition in Windows 7 and 8 is a hidden gem, whether you print or write in cursive.
Phone apps such as Shazam and SoundHound can recognize pop songs playing in the background—and display their titles, performers and album names. Google Goggles, one of Google’s apps for Android phones and the iPhone, attempts visual recognition: snap a picture of a book cover, DVD box, wine label or painting, and the program instantly shows you the Google search results for that item.
Software can even pick out faces in a video, and YouTube’s copyright protection algorithms can compare your videos against known copyrighted material to make sure you’re not posting a video that originated from some TV network. That’s all fantastic. When they work, sound, image and motion recognition really seem like magic. Unfortunately, the marketers realize that. They tempt us with myriad other computer-based recognition features that work about as well as cold fusion.
You buy something, drawn by its promised ability to recognize human commands, and it just doesn’t work well enough to bother with. Remember the Clapper? Sometimes your two claps turned the lamp on, and sometimes it took a few attempts. What about a Whistle Switch? It could turn on your appliances by recognizing sound: in this case, a high-pitched, squeezable whistle. Oh, it turned the lights on, all right, but so did teakettles, squeaky hamster wheels and sharp sneezes.
Even a $700 handwriting-recognition device worked maybe two out of five times. More recently, Samsung has been promising that its Galaxy S4 and even the newly launched Galaxy S5 phone can translate speech into another language, Star Trek style. Hold it up to a French speaker saying, “Où sont les toilettes?” and the phone is supposed to say, out loud, “Where is the bathroom?” In fact, Samsung has just added one not-there-yet recognition technology on top of another. The S Translator app can’t even recognize foreign-language speakers’ utterances, let alone convert them into spoken English.
How many times will we get our hopes up before we start giving up on these features altogether? How many products will we return before manufacturers start to polish these technologies a little more before advertising their “miraculous” abilities?
We do have to realize that software-based recognition is no easy task. It’s not a crisp problem with one correct outcome, like a spreadsheet adding numbers together. You are asking the software to process fuzzy, vague, variable inputs: sounds, pictures, movements, scrawls. That’s why recognition isn’t 100 percent. It’s not consistent. No wonder it so often disappoints us.
Maybe a few more decades of better sensors, faster processors, bigger data sets and experimentation will finally bring us relief from continuous RFHS. In the meantime, perhaps both electronics companies and their customers should do a little recognizing of their own: machine recognition of our world is exciting but still evolving.
- Scientific American, David Pogue