Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech to text (STT).

Posted by

Rupali Jagtap

Description

Speech recognition applications include voice user interfaces such as voice dialing (e.g., "call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, keyword search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), determining speaker characteristics,^[2] speech-to-text processing (e.g., word processors or emails), and voice control in aircraft (usually termed direct voice input).

The term voice recognition or speaker identification refers to identifying the speaker rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process.

Description

Dynamic time warping (DTW) is an approach that was historically used for speech recognition but has now largely been displaced by the more successful approach based on hidden Markov models (HMMs).

Dynamic time warping is an algorithm for measuring similarity between two sequences that may vary in time or speed. For instance, similarities in walking patterns would be detected even if the person was walking slowly in one video and more quickly in another, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data that can be turned into a linear sequence can be analyzed with DTW.

A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g., time series) with certain restrictions. That is, the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
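The non-linear "warping" described above can be illustrated with a minimal sketch. It assumes one-dimensional numeric sequences, absolute difference as the local cost, and no window constraint; the function name `dtw_distance` and the toy sequences are illustrative only and do not come from any particular speech toolkit.

```python
def dtw_distance(a, b):
    """Return the cumulative DTW alignment cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance measure
            # Extend the cheapest of the three allowed predecessor alignments:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# The same contour "spoken" at two speeds, plus an unrelated contour.
slow = [0, 0, 1, 1, 2, 2, 3, 3]
fast = [0, 1, 2, 3]
other = [3, 2, 1, 0]

same_shape_cost = dtw_distance(slow, fast)
different_shape_cost = dtw_distance(fast, other)
```

In this toy case the slow and fast versions align at zero cost because every sample in `fast` can be matched to two consecutive samples in `slow`, while the reversed contour cannot be aligned without cost. A real recognizer would typically compare sequences of feature vectors rather than single numbers.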

Description

Neural networks emerged as an attractive acoustic modeling approach in ASR in the late 1980s. Since then, neural networks have been used in many aspects of speech recognition, such as phoneme classification, phoneme classification through multi-objective evolutionary algorithms, isolated word recognition, audiovisual speech recognition, audiovisual speaker recognition, and speaker adaptation.

Neural networks make fewer explicit assumptions about the statistical properties of features than HMMs do, and they have several qualities that make them attractive models for speech recognition. When used to estimate the probabilities of a speech feature segment, neural networks allow discriminative training in a natural and efficient manner. However, in spite of their effectiveness in classifying short-time units such as individual phonemes and isolated words, early neural networks were rarely successful for continuous recognition tasks because of their limited ability to model temporal dependencies.
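As a rough illustration of discriminative training for short-time units, the sketch below trains a single-layer softmax classifier with a cross-entropy loss to estimate class probabilities for a feature vector. Everything here is an assumption made for illustration: the two-dimensional features, the two phoneme classes, the learning rate, and the function names are hypothetical, and practical acoustic models are far larger and operate on real acoustic features.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def posteriors(weights, biases, x):
    """Estimate class probabilities for one feature vector x."""
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def train(samples, n_classes, epochs=200, lr=0.5, seed=0):
    """Fit weights by stochastic gradient descent on the cross-entropy loss."""
    rng = random.Random(seed)
    dim = len(samples[0][0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in samples:
            probs = posteriors(weights, biases, x)
            for k in range(n_classes):
                # Gradient of the cross-entropy w.r.t. score k: p_k - 1[k == label].
                grad = probs[k] - (1.0 if k == label else 0.0)
                for d in range(dim):
                    weights[k][d] -= lr * grad * x[d]
                biases[k] -= lr * grad
    return weights, biases


# Hypothetical 2-D feature vectors for two phoneme classes (0 and 1).
samples = [([0.9, 0.1], 0), ([0.8, 0.2], 0), ([0.2, 0.9], 1), ([0.1, 0.8], 1)]
weights, biases = train(samples, n_classes=2)
probs = posteriors(weights, biases, [0.85, 0.15])
```

The training is discriminative in the sense that each update raises the probability of the correct class while lowering the probabilities of the competing classes. The model scores each feature vector in isolation, which mirrors the limitation noted above: nothing in it captures dependencies across time.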

Badge Description

Some speech recognition systems require "training" (also called "enrollment"), in which an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems; systems that use training are called "speaker-dependent" systems.