As Moore’s law pushes the capabilities of technology ever harder, has voice recognition finally got the power it needs to come of age? With sceptics still outnumbering converts, I thought I’d write a post about my experiences with it over the last few years to clear away the myths. As a special aside for translators, I’ll also cover how voice recognition (VR) works with CAT tools (translation industry-specific software).
Myth 1 - Voice recognition makes too many mistakes to be worthwhile
I have used Dragon NaturallySpeaking 9 for two years now and find the number of mistakes to be minimal. Those mistakes that appear over and over can be trained away. Common mistakes I find with my accent, a non-region-specific blend of Southern/Midlands UK with hints of other influences, are “we’re” being recognised as “were”, an unstressed “for” being missed by the software (e.g. “he’d be there fuh three days”), and names of people the software hasn’t heard before. These errors were trained away in minutes though, and I always keep an eye out for them now, just in case. The software is set to recognise me as a British English-speaking male. I have read the training texts for approximately one hour in total.
I don’t use the software on every translation I do as it often isn’t appropriate. VR software seems better suited to dialogue or magazine style texts than, for example, context-less software translations. As I translate comic books, this is where VR really comes into its own. The bottom line, as they say, is that my productivity increases.
Myth 2 - You need the latest computer to run VR software
My computer is a trusty Thinkpad T42 with 1GB of RAM and a 32GB HD. It can be slow, but not so slow that it cancels out the productivity increase mentioned above. Obviously a faster computer would make the process a lot more satisfying to work with; however, I have found that my low spec has presented no obstacles in this respect.
Myth 3 - Voice recognition software costs too much for too little return
The latest version of the Dragon software, version 10 (non-affiliate direct link, amazon.co.uk), is only £40 in the UK and $40 in the US. Reviews and descriptions say it does not require training, with over 99% accuracy out of the box. I haven’t tried it yet, but look forward to doing so. I’ll import all of the training I’ve accrued over the last few years at the same time.
I find this excellent value for money, given the speed increase involved.
Fact 1 - Working with others in the room is an issue
Switching the software to “Off” in order to talk to someone in the room can be impolite, as if you’re finishing a conversation that the other person wasn’t aware you were having. Also, translating intimate scenes between cartoon characters can be slightly embarrassing, although perhaps I’m in the minority of translators having to deal with that particular issue. General unavailability to talk and being a distraction to others make VR an antisocial piece of software.
A word of advice: don’t leave the room with the VR software and TV/radio on at the same time. It’s not fair on your computer.
Fact 2 - Typical VR errors are hard to detect
Homophones, words that sound similar, are the standard errors with VR. The software relies on context to choose between them, for example “wants” vs “once”, which, incidentally, is the kind of error I start to make myself when tired. On a slight tangent, this has started me thinking and googling about words being stored as sounds rather than letters, with spellings being a guide and mnemonic for how to write out the sound, but I haven’t got very far with that research.
Homophones are harder to detect than typos. They aren’t picked up by spell-checkers, and even proofreading for them can be tricky at times. I proofread every sentence after settling on a translation, and then the whole text again in context. I do this anyway, when not using VR, but keeping an eye out for these specific errors is an additional requirement to consider.
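Since a spell-checker won’t catch these, one way to support the proofreading pass is to have a small script flag every word that belongs to a known confusable pair, so each one gets a second look. The following is only a minimal sketch in Python, not a feature of any VR software: the word pairs are the examples from this post, and the names `CONFUSABLES` and `flag_confusables` are made up for illustration.

```python
import re

# Hypothetical, hand-maintained sets of words that VR tends to confuse.
# These examples are taken from this post; extend the list with your own.
CONFUSABLES = [
    {"we're", "were"},
    {"wants", "once"},
]

# Map each word to the other members of its set.
LOOKUP = {word: sorted(group - {word}) for group in CONFUSABLES for word in group}


def flag_confusables(text):
    """Return (line_number, word, alternatives) for every word worth a second look."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Normalise curly apostrophes so that a typeset "we’re" still matches.
        normalised = line.lower().replace("\u2019", "'")
        for word in re.findall(r"[a-z']+", normalised):
            if word in LOOKUP:
                hits.append((line_number, word, LOOKUP[word]))
    return hits


if __name__ == "__main__":
    sample = "He wants to go.\nWe were there once."
    for line_number, word, alternatives in flag_confusables(sample):
        print(f"line {line_number}: '{word}' (could it be {', '.join(alternatives)}?)")
```

The script cannot tell whether a flagged word is actually wrong. It only points at the places where the decision still has to be made by reading the sentence in context.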
Fact 3 - VR is fully compatible with CAT tools
Wherever you can input text on your computer, you can use VR. Wherever the cursor blinks, the text is entered. All punctuation must be pronounced. This can take some practice. The software can be set to punctuate automatically, but in my case I’d rather have control of that.
As translation requires a stop-go method of text input, with pauses while our brains search the memory banks for fuzzy matches, the productivity increases are less marked than when reading from a book into the software. However, the speed gains are still present, especially in larger segments, and it is for this reason that I’d still recommend its use.
Examples online
Plenty of examples of typing performance increases can be found through Google et al. Here’s the first one I saw, including a video: Typing vs VR. The author found an increase in speed from 73wpm to 126wpm and a reduction in errors, although, as stated in the comments and above, VR errors are of a different nature.
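To put those two figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch under simplifying assumptions: the 1,000-word text length is a made-up example, both speeds are treated as steady rates, and no time is counted for correcting errors or for the stop-go pauses of translation work.

```python
# Figures quoted in the linked example: typing vs dictation speed.
TYPING_WPM = 73
DICTATION_WPM = 126


def minutes_for(words, wpm):
    """Minutes needed to enter `words` words at a steady `wpm`."""
    return words / wpm


def percent_faster(old_wpm, new_wpm):
    """Relative speed increase, as a percentage of the old speed."""
    return (new_wpm - old_wpm) / old_wpm * 100


if __name__ == "__main__":
    words = 1000  # hypothetical text length, chosen only for illustration
    typed = minutes_for(words, TYPING_WPM)
    dictated = minutes_for(words, DICTATION_WPM)
    print(f"Speed increase: {percent_faster(TYPING_WPM, DICTATION_WPM):.0f}%")
    print(f"{words} words typed: {typed:.1f} min, dictated: {dictated:.1f} min")
```

On those assumptions, dictation comes out roughly 73% faster, or about 8 minutes instead of 14 for 1,000 words.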
If you have any questions, I’m happy to answer them in the comments section below.
PS Contrary to most voice recognition related blog posts, this one was not dictated with VR.