Speech is the primary means of communication between people. Speech technology refers to technologies designed to reproduce and respond to the human voice.
Ever since computers were invented, researchers have pursued the mechanical realization of human speech capabilities, the automation of tasks through human-machine interaction, and automatic speech recognition by machines.
During the first generation of speech technology, in the 1950s and 1960s, computers could recognize vowels, consonants, and monosyllabic words, largely from a single speaker, because recognizing the differing speech patterns of many speakers was still a long way off. During the second generation, from the late 1960s through the 1970s, researchers made some progress on the problem of variability in human speech. Using template-based speech technology, computers were able to recognize words and even respond to simple database queries.
Beginning in the late 1980s, a methodological shift from a template-based approach to a statistical modeling framework ushered in the third generation of speech technology, and computers became able to recognize a fluently spoken string of connected words. Progress accelerated in the 1990s: human-computer speech became more conversational and spontaneous, and computers were less likely to be tripped up by the partial words, hesitations, and word repairs that are common in human speech. The 2000s brought more accurate detection of sentence boundaries, fillers, and disfluencies, and computers became able to recognize natural, unconstrained human-to-human speech, such as radio and television broadcasts, and even conversational speech in multiple foreign languages. Computers are even being taught to recognize the facial cues that accompany human speech.
Computers have also become far better at reproducing human speech, although most people can still tell synthesized speech from a human voice. That line is rapidly blurring, however.
Automatic speech recognition systems are used for call processing in telephone networks and in query-based information systems. The technology also assists people who are speech-impaired, hearing-impaired, or blind, and lets anyone communicate with a computer without a keyboard. Apple's Siri and Amazon's Alexa are good examples of the technology. Game software has also been greatly enhanced by speech technology.
Just as there are numerous uses for speech technology, there are several subfields, including speech synthesis, speech recognition, speaker recognition, speaker verification, speech encoding, and multimodal interaction.
Speech synthesis is the process by which a machine generates spoken language from written input. The first computer-based speech-synthesis system was created in the late 1950s, and the first English text-to-speech system was developed in Japan in 1968. Modern uses include utilities for the visually impaired that read the text of emails and webpages aloud, as well as the read-aloud feature of Amazon Kindle e-book readers.
Speech recognition is the ability of a computer to identify and respond to human speech. Some of these systems require training, in which the user reads text or isolated vocabulary aloud so the system can analyze that person's specific voice. Siri and Alexa also use this technology.
Speaker recognition is the technology that allows a computer to identify a person from the characteristics of that individual's voice. Speaker verification builds on the technologies developed for speaker recognition and uses them to accept or reject the identity claimed by the speaker. Some banks use speaker verification in their customer call centers.
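The accept/reject decision in speaker verification can be illustrated with a minimal sketch. It assumes the system has already turned each voice sample into a fixed-length numeric "voiceprint" vector; that embedding step, the sample vectors, and the 0.7 threshold are all hypothetical. The caller's voiceprint is compared with the one enrolled for the claimed identity, and the claim is accepted only if their cosine similarity clears the threshold.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("voiceprints must be non-zero")
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled: list[float], test: list[float], threshold: float = 0.7) -> bool:
    """Accept the claimed identity if the test voiceprint is close enough to the enrolled one.

    The default threshold is a hypothetical value; a real system would tune it
    to balance false accepts against false rejects.
    """
    return cosine_similarity(enrolled, test) >= threshold


# Hypothetical voiceprints: one for the enrolled customer, one from the same
# person on a later call, and one from an impostor.
enrolled = [0.9, 0.1, 0.3]
same_caller = [0.85, 0.15, 0.28]
impostor = [0.1, 0.9, -0.2]
print(verify_speaker(enrolled, same_caller))  # True
print(verify_speaker(enrolled, impostor))     # False
```

Raising the threshold rejects more impostors but also more genuine callers, which is the central trade-off when a call center deploys verification.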
Speech encoding is the compression of speech into a compact code for transmission; speech codecs perform this compression using audio signal processing and speech processing techniques. Mobile phones rely on speech encoding.
Through multimodal interaction, users are given multiple ways of interacting with a computer system. For example, a multimodal question-answering system might accept either text or a photo as the question and return either text or a photo as the answer.
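Speech codecs vary widely, but one of the simplest ideas behind them is companding: spending more of a small code on quiet sounds, to which the ear is sensitive, than on loud ones. The following is a minimal sketch, assuming the continuous mu-law curve with μ = 255 (the value used in ITU-T G.711) rather than the segmented bit layout that real G.711 codecs use. It maps each audio sample in [-1, 1] to a single byte, which halves the size of 16-bit audio. The function names are illustrative, and mobile-phone codecs use considerably more elaborate, model-based techniques.

```python
import math

MU = 255  # mu value used by the G.711 mu-law variant


def mu_law_encode(sample: float) -> int:
    """Compress one sample in [-1.0, 1.0] to an 8-bit code (0..255) with continuous mu-law."""
    x = max(-1.0, min(1.0, sample))  # clip out-of-range input
    # Logarithmic curve: small amplitudes get proportionally more of the code range.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)  # y in [-1, 1]
    return int(round((y + 1) / 2 * 255))  # quantize to 256 levels


def mu_law_decode(code: int) -> float:
    """Expand an 8-bit code back to an approximate sample in [-1.0, 1.0]."""
    y = code / 255 * 2 - 1
    return math.copysign((math.pow(1 + MU, abs(y)) - 1) / MU, y)


for sample in (0.001, 0.2, -0.5, 1.0):
    code = mu_law_encode(sample)
    print(sample, code, round(mu_law_decode(code), 4))
```

Because the curve is logarithmic, the round-trip error for a quiet sample such as 0.001 is a small fraction of a percent of full scale, while loud samples are reproduced more coarsely; that is the trade-off companding makes.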
Speech technology involves both hardware and software, and advances in both areas have produced the computer-human speech systems we have today and will drive further improvements in the future. This category focuses on hardware and software designed specifically for speech technology. Websites that discuss the technology itself are also appropriate here.
 
 
Recommended Resources
Developed at Carnegie Mellon University, CMUSphinx refers to a group of speech recognition systems resulting from more than twenty years of CMU research. CMUSphinx tools are designed for low-resource platforms, are flexible in design, and focus on practical development rather than research. Available under a BSD-like license, the toolkit includes support for several languages. Commercial support is available, along with an active development community.
https://cmusphinx.github.io/
Customers of this service can receive their emails over the phone. Emails received by the service are converted to voice messages and delivered to the specified telephone number. Using any phone line, customers can dial into their account, choose the messages they want to hear, and reply to the sender with a recorded message sent as a WAV file. There are three monthly pricing plans, based on anticipated usage.
https://www.email2phone.net/
eSpeak is a compact open-source (GNU General Public License) software speech synthesizer for English and other languages, hosted on SourceForge. The program was originally known as Speak and was written for Acorn/RISC_OS computers in 1995; the current version is a rewrite for Linux and Windows machines, and it has been ported to Android, Mac OS X, and Solaris. The site lists its features and supported languages and provides samples and documentation.
http://espeak.sourceforge.net/
Founded in 2005, Speech Recognition Solutions assists users of speech recognition software by providing high-quality microphones and other hardware, as well as software, training, and support services. The site profiles the family-owned business, including its business philosophy, featured products sorted by category, and clearance items. A list of manufacturers is included, along with a microphone buying guide, training programs, and an online support forum.
https://www.speechrecsolutions.com/
Published quarterly by Information Today, the magazine covers advances and other industry news in print and on its website, which carries content from the magazine along with subscription and advertising information. The current issue may be downloaded from the site, and previous issues are also available. White papers, webinars, and a reference guide to speech technology are provided. The magazine also sponsors SpeechTEK, an annual conference.
https://www.speechtechmag.com/
Known as SpeechPro in the United States, STC is a Russian voice recognition technology company. Founded in 1990, the company evolved out of KGB programs in partnership with the Soviet Ministry of Communications’ scientific development center. Now a commercial development company, it develops facial recognition, voice, and multimodal biometric systems, as well as solutions for audio and video recording, processing, and analysis. Its products and services are highlighted.
https://speechpro.com/
The Seattle company develops speech technology for assessing pronunciation and fluency, with the goal of making it possible to practice and improve speaking skills without intensive one-on-one instruction. Focused solely on education, its speech API is used by publishers, language learning providers, universities, and K-12 schools. Available in Basic, Pro, and Premium pricing plans, the API covers both US and UK English. Its features are explained in a video.
https://www.speechace.com/
Founded in 2006, TTMT offers a variety of products, including speech devices, communication boards, switches, alternative access devices, and mounts, as well as software based on eye-gaze technology. Its speech-generating devices include those for individuals with communication difficulties, type-to-talk devices, and eye-tracking speech-generating devices. The company also offers communication boards for parks, playgrounds, and schools.
https://www.talktometechnologies.com/
Tobii Dynavox specializes in devices and software that help individuals with a range of physical and cognitive limitations. Its products include devices for controlling computers with the eyes, and its software includes programs for eye tracking and eye control, communication software that converts text and symbols to speech, literacy training applications for non-verbal students, and others. Support and training services are also offered.
https://www.tobiidynavox.com/
A US-based supplier of speech recognition software and dictation and transcription equipment, the company has been in business since 1990. Its products include dictation equipment, digital voice recorders, speech recognition software, dictation microphones, mobile dictation equipment, transcription equipment, conference recording equipment, and accessories, from a variety of manufacturers, sorted by product category, manufacturer, or industry. Technical support is available.
https://www.totalvoicetech.com/
Specializing in speech synthesis, Voicery offers businesses custom voices with accents and emotions. Deployment can be offline, on-premise, cloud-based, or hybrid, and features include real-time streaming audio, audio adjustments through SSML markup, and synthesized content seamlessly embedded in prerecorded audio. The process is outlined on the site. A starter package provides access to its voice library at per-character rates, and an enterprise package covers custom orders.
https://www.voicery.com/