DESIGN AND IMPLEMENTATION OF TEXT TO SPEECH AUDIO SYSTEM

Dept: COMPUTER SCIENCE Chapters: 1-5

Abstract

A text-to-speech (TTS) audio system generates synthesized speech from text. The purpose of this study is to make synthesized speech as intelligible, natural and pleasant to listen to as human speech. Speech is the primary means of communication between people. During synthesis, very small segments of recorded human speech are concatenated to produce the synthesized speech. The quality of a speech synthesizer is judged by its similarity to the human voice and by how easily it is understood. The main aim of this study is to develop and implement a text-to-speech audio system for people with disabilities. The system was developed using the Java programming language and the MySQL database. The methodology adopted for this research work is the Structured Systems Analysis and Design Method (SSADM), which was chosen by the researcher for its numerous benefits. The program was completed and tested, and it works well, correcting the errors in the existing system. As a

CHAPTER ONE

INTRODUCTION

1.1 Background of the Study

Text-to-speech (TTS) synthesis is the automatic conversion of text into speech that resembles, as closely as possible, a native speaker of the language reading that text. A text-to-speech synthesizer is the technology that lets a computer speak to its user. The TTS system takes text as input; a computer algorithm called the TTS engine then analyses and pre-processes the text and synthesizes the speech using mathematical models. The TTS engine usually outputs sound data in an audio format (Dutoit, 2013).
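The pre-processing step mentioned above can be illustrated with a minimal Java sketch. It is not the implementation used in this study; the class name `TextNormalizer`, the three-entry abbreviation table and the digit-by-digit spelling rule are all assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative text pre-processing for a TTS front end (hypothetical names throughout). */
final class TextNormalizer {

    // Assumed, tiny abbreviation table; a real engine would need a far larger one.
    private static final Map<String, String> ABBREVIATIONS = Map.of(
            "dr.", "doctor",
            "mr.", "mister",
            "st.", "street");

    private static final String[] DIGITS = {
            "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"};

    /** Lower-cases the input, expands known abbreviations, and spells out digits one by one. */
    static List<String> normalize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String expanded = ABBREVIATIONS.get(token);
            if (expanded != null) {
                words.add(expanded);
                continue;
            }
            StringBuilder letters = new StringBuilder();
            for (char c : token.toCharArray()) {
                if (c >= '0' && c <= '9') {
                    flush(letters, words);
                    words.add(DIGITS[c - '0']);
                } else if (Character.isLetter(c)) {
                    letters.append(c);
                }
                // Punctuation is simply dropped in this sketch.
            }
            flush(letters, words);
        }
        return words;
    }

    /** Moves any letters collected so far into the word list. */
    private static void flush(StringBuilder letters, List<String> words) {
        if (letters.length() > 0) {
            words.add(letters.toString());
            letters.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints [doctor, smith, lives, at, four, two, main, street]
        System.out.println(normalize("Dr. Smith lives at 42 Main St."));
    }
}
```

Spelling numbers digit by digit is a deliberate simplification; a fuller normalizer would read "42" as "forty-two" and would use context to decide whether "St." means "street" or "saint".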

The text-to-speech (TTS) synthesis procedure consists of two main phases. The first is text analysis, in which the input text is transcribed into a phonetic or other linguistic representation. The second is the generation of speech waveforms, in which the output is produced from this phonetic and prosodic information. These two phases are usually called high-level and low-level synthesis (Suendermann & Black, 2010). A simplified version of this procedure is presented in Figure 1 below. The input text might be, for example, data from a word processor, standard ASCII from e-mail, a mobile text message, or scanned text from a newspaper. The character string is pre-processed and analyzed into a phonetic representation, which is usually a string of phonemes with additional information for correct intonation, duration, and stress. Finally, the low-level synthesizer generates the speech sound from the information supplied by the high-level one. The artificial production of speech-like sounds has a long history, with documented mechanical attempts dating to the eighteenth century (Allen & Klatt, 2017).
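The division into high-level and low-level synthesis can be sketched in Java as two separate methods joined by a phonetic representation. This is only an illustration under stated assumptions: the class `TwoPhaseTtsSketch`, the one-word lexicon, and the duration and stress values are invented for the example, and the low-level step emits plain sine tones as a stand-in for real speech waveforms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of the two synthesis phases; all names and values are illustrative assumptions. */
final class TwoPhaseTtsSketch {

    /** One element of the phonetic representation: a phoneme plus prosodic hints. */
    static final class Phone {
        final String symbol;
        final int durationMs;
        final boolean stressed;

        Phone(String symbol, int durationMs, boolean stressed) {
            this.symbol = symbol;
            this.durationMs = durationMs;
            this.stressed = stressed;
        }
    }

    // Hypothetical one-word lexicon; durations and stress marks are made up.
    private static final Map<String, List<Phone>> LEXICON = Map.of(
            "hello", List.of(
                    new Phone("HH", 60, false),
                    new Phone("AH", 70, false),
                    new Phone("L", 60, false),
                    new Phone("OW", 140, true)));

    /** High-level synthesis: words in, phonemes with duration and stress out. */
    static List<Phone> analyze(List<String> words) {
        List<Phone> phones = new ArrayList<>();
        for (String word : words) {
            List<Phone> entry = LEXICON.get(word);
            if (entry == null) {
                throw new IllegalArgumentException("Word not in sketch lexicon: " + word);
            }
            phones.addAll(entry);
        }
        return phones;
    }

    /**
     * Low-level synthesis stand-in: writes one sine tone per phoneme so that
     * duration and stress shape the output. A real engine would generate
     * speech here, for example by concatenating recorded units.
     */
    static short[] generate(List<Phone> phones, int sampleRate) {
        int total = 0;
        for (Phone p : phones) {
            total += sampleRate * p.durationMs / 1000;
        }
        short[] samples = new short[total];
        int offset = 0;
        for (Phone p : phones) {
            int count = sampleRate * p.durationMs / 1000;
            double frequency = p.stressed ? 220.0 : 180.0; // stressed phonemes get a higher pitch
            for (int i = 0; i < count; i++) {
                double t = (double) i / sampleRate;
                samples[offset + i] = (short) (8000 * Math.sin(2 * Math.PI * frequency * t));
            }
            offset += count;
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Phone> phones = analyze(List.of("hello"));
        short[] audio = generate(phones, 16000);
        // prints "4 phonemes, 5280 samples"
        System.out.println(phones.size() + " phonemes, " + audio.length + " samples");
    }
}
```

Keeping the phonetic representation as the only link between the two methods mirrors the separation described above: the text analysis could be replaced without touching the waveform generator, and vice versa.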

Voice/speech synthesis is a field of computer science that deals with designing computer systems that synthesize speech from written text. It is a technology that allows a computer to convert written text into speech delivered through a loudspeaker or telephone (Allen & Klatt, 2017). Because speech technology is still emerging, not all developers are familiar with it. While the basic functions of both speech synthesis and speech recognition take only minutes to understand, computerized speech provides subtle and powerful capabilities that developers will want to understand and use (Rubin & Baer, 2011).

Automatic speech synthesis is one of the fastest-developing fields in speech science and engineering. As a new generation of computing technology, it is the next major innovation in man-machine interaction after speech recognition, and it supports Interactive Voice Response (IVR) systems.

The basic idea of text-to-speech (TTS) technology is to convert written input to spoken output by generating synthetic speech. There are several ways of performing speech synthesis:

1. Simple voice recording and playing on demand;

2. Splitting of speech into 30-50 phonemes (basic linguistic units) and their re-assembly in a fluent speech pattern;

3. The use of approximately 400 diphones (units obtained by splitting speech at the centre of the phonemes rather than at the transitions between them).
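The diphone approach in item 3 can be illustrated with a short Java sketch that turns a phoneme sequence into the list of diphone units a concatenative synthesizer would look up. The class name `DiphoneSketch`, the `_` silence marker and the `A-B` unit naming are assumptions made for this example, not part of the system described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative conversion of a phoneme sequence into diphone unit names. */
final class DiphoneSketch {

    // Assumed marker for the silence before and after an utterance.
    static final String SILENCE = "_";

    /**
     * Each diphone runs from the centre of one phoneme to the centre of the
     * next, so every unit contains one complete transition between sounds.
     */
    static List<String> toDiphones(List<String> phonemes) {
        List<String> units = new ArrayList<>();
        String previous = SILENCE;
        for (String phoneme : phonemes) {
            units.add(previous + "-" + phoneme);
            previous = phoneme;
        }
        if (!phonemes.isEmpty()) {
            units.add(previous + "-" + SILENCE); // closing transition into silence
        }
        return units;
    }

    public static void main(String[] args) {
        // prints [_-HH, HH-AH, AH-L, L-OW, OW-_]
        System.out.println(toDiphones(List.of("HH", "AH", "L", "OW")));
    }
}
```

A sequence of n phonemes yields n + 1 diphones. Because the joins fall in the relatively stable centre of each phoneme, the recorded transitions between sounds stay intact, which is the reason for cutting at the centre rather than at the transition.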

The most important qualities of a modern speech synthesis system are its naturalness and intelligibility. Naturalness is how closely the synthesized speech resembles real human speech. Intelligibility, on the other hand, describes the ease with which the speech is understood. Maximizing these two criteria is the main development goal in the TTS field (Suendermann & Black, 2010).


