3.0 Speech Synthesis
[Figure: A E I O U]
Von Kempelen’s speaking machine (1791) (Wheatstone’s reconstruction)
Voder (Voice Operation Demonstrator) 1939
Audio clips of synthetic speech illustrating the history of the art and technology of synthetically produced human speech:
http://www.cs.indiana.edu/rhythmsp/ASA/Contents.html
http://www.cs.indiana.edu/rhythmsp/ASA/highlights.html
http://www.humnet.ucla.edu/humnet/linguistics/faciliti/demos/vocalfolds/vocalfolds.htm
 
Formant Synthesis
This may also be referred to as synthesis-by-rule, although rules of one sort or another are common to all synthesis systems. For a formant synthesis system the output of the high-level component typically consists of a sequence of allophones together with their duration and pitch, e.g.
DH 7 34
I  5 34
S  8 33
Duration is measured in 10 ms frames; pitch is coded into the range 1-63.
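To make the (allophone, duration, pitch) format concrete, here is a minimal sketch that parses such a sequence and converts the frame counts to milliseconds. The function name `parse_allophones` is hypothetical; it assumes whitespace-separated triples and only checks that the pitch code lies in the stated range 1-63.

```python
from typing import List, Tuple

FRAME_MS = 10  # durations are counted in 10 ms frames


def parse_allophones(text: str) -> List[Tuple[str, int, int]]:
    """Turn 'DH 7 34 I 5 34 ...' into (allophone, duration_ms, pitch_code) tuples."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected (allophone, duration, pitch) triples")
    result = []
    for i in range(0, len(tokens), 3):
        name, frames, pitch = tokens[i], int(tokens[i + 1]), int(tokens[i + 2])
        if not 1 <= pitch <= 63:
            raise ValueError(f"pitch code {pitch} is outside the range 1-63")
        result.append((name, frames * FRAME_MS, pitch))
    return result


print(parse_allophones("DH 7 34 I 5 34 S 8 33"))
# [('DH', 70, 34), ('I', 50, 34), ('S', 80, 33)]
```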
The low-level component uses this input to produce a sequence of frames, each frame containing a set of parameters referring to formant frequencies, formant amplitudes, voicing, fundamental pitch, etc., e.g.
Fn   alf  f1   a1  f2    a2  f3    a3  ahf  s   f0
250  33   280  33  1300  32  2680  34  41   36  34
250  37   280  37  1300  36  2680  38  45   36  34
250  40   280  40  1300  39  2680  41  48   36  34
250  42   280  42  1300  41  2680  43  50   36  34
This information is then fed into a formant synthesiser, which uses it to generate the appropriate audio output. The formant synthesiser may be implemented in hardware or software. An example of a formant synthesis system is DECTALK.
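A small sketch of how such parameter frames might be held in a program: each row becomes a named record whose fields follow the column headings above, and row i is taken to start at i × 10 ms. Only the column names come from the slide; the `Frame` record and `parse_frames` helper are hypothetical, and the meaning of columns such as `alf`, `ahf` and `s` is not stated, so they are kept as opaque integers.

```python
from collections import namedtuple

COLUMNS = "fn alf f1 a1 f2 a2 f3 a3 ahf s f0".split()
Frame = namedtuple("Frame", COLUMNS)
FRAME_MS = 10  # one parameter frame every 10 ms


def parse_frames(block: str):
    """Parse whitespace-separated rows of 11 integers into Frame records."""
    frames = []
    for line in block.strip().splitlines():
        values = [int(v) for v in line.split()]
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
        frames.append(Frame(*values))
    return frames


EXAMPLE = """
250 33 280 33 1300 32 2680 34 41 36 34
250 37 280 37 1300 36 2680 38 45 36 34
250 40 280 40 1300 39 2680 41 48 36 34
250 42 280 42 1300 41 2680 43 50 36 34
"""

frames = parse_frames(EXAMPLE)
for i, fr in enumerate(frames):
    # f1-f3 are formant frequencies; a1-a3 the corresponding amplitudes
    print(f"t={i * FRAME_MS} ms  F1={fr.f1}  F2={fr.f2}  F3={fr.f3}  A1={fr.a1}")
```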
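[Figure: Klatt synthesiser, showing an impulse train with glottal filter and a random-number source with LP filter feeding a cascade branch (resonators RN, RZ, R1-R5) and a parallel branch (pre-emphasis, resonators R1-R6 with amplitudes A1-A6), summed (+) to give the synthetic speech output]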
Klatt synthesiser
A combined serial/parallel formant synthesiser. A serial, or cascade, synthesiser is a better model for the production of vowels and vowel-like sounds, whereas a parallel synthesiser is better suited to producing nasals, fricatives and stops. The serial synthesiser specifies the formant centre frequencies and bandwidths; the parallel synthesiser also specifies the formant levels (peak amplitudes).
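The cascade branch can be sketched as a chain of second-order digital resonators, one per formant, driven by an impulse train. The resonator below uses the standard difference equation y[n] = A·x[n] + B·y[n-1] + C·y[n-2], with A, B, C derived from the formant frequency and bandwidth so that the gain at 0 Hz is 1. The 10 kHz sampling rate and the bandwidths are assumed values, the formant frequencies are taken from the example frame on the previous slide, and the glottal filter, nasal resonators and parallel branch are omitted.

```python
import math
from typing import Iterable, List, Sequence, Tuple

FS = 10_000  # sampling rate in Hz (assumed)


def resonator(x: Iterable[float], freq: float, bw: float, fs: int = FS) -> List[float]:
    """Second-order digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw * T)
    B = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * freq * T)
    A = 1.0 - B - C  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = A * sample + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(x: Sequence[float], formants: Sequence[Tuple[float, float]],
            fs: int = FS) -> List[float]:
    """Pass the source through one resonator per (frequency, bandwidth) pair in series."""
    y = list(x)
    for freq, bw in formants:
        y = resonator(y, freq, bw, fs)
    return y


def impulse_train(f0: float, n_samples: int, fs: int = FS) -> List[float]:
    """Unit impulses every fs/f0 samples: a crude voiced source."""
    period = round(fs / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


# F1-F3 from the example frame; the bandwidths are assumed values.
formants = [(280, 60), (1300, 90), (2680, 150)]
voiced = cascade(impulse_train(100, FS // 10), formants)  # 100 ms at f0 = 100 Hz
```

Because each resonator has unity gain at 0 Hz, a constant input eventually passes through the whole cascade unchanged, which is a convenient sanity check.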
Waveform Concatenation Synthesis
With this system the low-level component generates a speech output file by concatenating units of previously recorded speech. Information about the duration and pitch of these units is again supplied by the high-level component. The size of the units is clearly an important consideration, both in terms of the amount of storage required and the difficulty of joining them together (more about this later). An example of a waveform concatenation synthesis system is the Lernout and Hauspie TTS system.
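The slide defers the joining problem, but a minimal sketch helps show where it arises. The version below simply appends recorded units and smooths each join with a short linear cross-fade over `overlap` samples; the cross-fade is an assumed, deliberately simple technique, not the method of any particular system, and it does no pitch or duration modification.

```python
from typing import List, Sequence


def concatenate(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join recorded units end to end, cross-fading `overlap` samples at each join."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises linearly
            out[start + i] = (1.0 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


joined = concatenate([[0.0] * 5, [2.0] * 5], overlap=3)
# [0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.0]
```

Each join shortens the output by `overlap` samples, which is one reason the durations supplied by the high-level component have to be respected when the units are chosen.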
cer-    -tain-   -ly (ser-)  (-tan-)  (-lee)
Stored Speech Units
words
How many? ~300,000; perhaps 2000 for the most frequently used words.
Formant synthesis: 1 word ~ 0.5 s; 12 bytes per 10 ms frame; 1 word = 0.5/0.010 frames = 50 frames = 600 bytes; 300,000 words ~ 180 Mbyte
Waveform concatenation: 1 word ~ 0.5 s; sampling rate = 16 kHz; 1 word = 8K samples = 16 Kbytes at 2 bytes per sample; 300,000 words = 4.8 Gbyte
morphemes
Basic meaningful units that make up words, essentially roots, prefixes and suffixes, e.g. sail, travel, -ed, -s => sail, travel, sailed, travelled, sails, travels
~10,000-30,000 entries
Formant synthesis: 30,000 entries ~ 10-15 Mbyte
Waveform concatenation: 30,000 entries ~ 120 Mbyte
syllables
How many? ~10,000?
Formant synthesis: 1 syllable ~ 0.3 s; 12 bytes per 10 ms frame; 1 syllable = 0.3/0.010 frames = 30 frames = 360 bytes; 10,000 syllables ~ 3.6 Mbyte
Waveform concatenation: 1 syllable ~ 0.3 s; sampling rate = 16 kHz; 1 syllable = 4.8K samples = 9.6 Kbytes at 2 bytes per sample; 10,000 syllables = 96 Mbyte
phonemes
How many? About 40 for English.
Formant synthesis: 1 phoneme ~ 10 frames = 120 bytes; 40 phonemes ~ 5 Kbytes. In practice about 70-80 allophones are needed, giving ~10 Kbytes.
Waveform concatenation: 1 phoneme ~ 100 ms = 0.1 s = 1.6K samples = 3.2 Kbytes; total for about 80 allophones ~ 256 Kbytes
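The storage estimates for words, syllables and phonemes all follow the same two formulas, so they can be reproduced in a few lines. The sketch below uses only the figures quoted above (12 bytes per 10 ms frame; 16 kHz sampling at 2 bytes per sample; the typical unit durations and counts); morphemes are left out because no unit duration is given for them.

```python
FRAME_SECONDS = 0.010   # one formant-parameter frame every 10 ms
FRAME_BYTES = 12        # bytes per frame
SAMPLE_RATE = 16_000    # Hz, for waveform storage
BYTES_PER_SAMPLE = 2


def formant_bytes(unit_seconds: float, count: int) -> int:
    """Storage for `count` units held as formant-parameter frames."""
    frames = round(unit_seconds / FRAME_SECONDS)
    return frames * FRAME_BYTES * count


def waveform_bytes(unit_seconds: float, count: int) -> int:
    """Storage for `count` units held as 16-bit samples."""
    samples = round(unit_seconds * SAMPLE_RATE)
    return samples * BYTES_PER_SAMPLE * count


# (unit, typical duration in seconds, number of units) as estimated above
UNITS = [("word", 0.5, 300_000), ("syllable", 0.3, 10_000),
         ("phoneme", 0.1, 40), ("allophone", 0.1, 80)]

for name, secs, count in UNITS:
    print(f"{name:10s} formant {formant_bytes(secs, count):>13,d} B"
          f"   waveform {waveform_bytes(secs, count):>15,d} B")
```

The factor between the two columns is the same for every unit (16 kHz × 2 bytes against 100 frames × 12 bytes per second, roughly 27:1), which is why the choice of unit size matters far more for waveform concatenation.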
demi-syllables
s u m - t ie m z (sometimes)
These are units of speech obtained by making cuts in the middle of the vowel part of the syllable. The reason for doing this is that coarticulation effects are minimal in the middle of the vowel. There are about 4000-5000 demi-syllables, made up of about 1500 initial demi-syllables and 3000 final demi-syllables.
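A minimal sketch of the idea, assuming the word has already been divided into syllables of the form (onset consonants, vowel, coda consonants). Cutting in the middle of each vowel gives an initial demi-syllable (onset + first half of the vowel) and a final one (second half of the vowel + coda); here the vowel symbol is simply repeated in both halves. The `demisyllables` function and the syllable representation are hypothetical.

```python
from typing import List, Tuple

Syllable = Tuple[List[str], str, List[str]]  # (onset, vowel, coda)


def demisyllables(syllables: List[Syllable]) -> List[str]:
    """Split each syllable in the middle of its vowel."""
    units = []
    for onset, vowel, coda in syllables:
        units.append(" ".join(onset + [vowel]))  # initial demi-syllable
        units.append(" ".join([vowel] + coda))   # final demi-syllable
    return units


sometimes = [(["s"], "u", ["m"]), (["t"], "ie", ["m", "z"])]
print(demisyllables(sometimes))
# ['s u', 'u m', 't ie', 'ie m z']
```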
[Figure: "certainly" (cer- -tain- -ly, pronounced ser- -tan- -lee) divided into demi-syllables]
diphones
k u n uu (canoe)
These are units of speech obtained by going from the middle of one phone to the middle of the next. The reason for doing this is that coarticulation effects are minimal in the middle of the sound. Theoretically there are about 40 × 40 = 1600 diphones, but in practice the number is about 1200.
qk | ku | un | nuu | uuq (q = silence)
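Generating the diphone sequence for a word is straightforward once the phone string is known: pad it with silence at both ends and take every adjacent pair. This sketch reproduces the canoe example; using `q` as the silence symbol follows the slide, while the function name is hypothetical.

```python
from typing import List


def diphones(phones: List[str], silence: str = "q") -> List[str]:
    """Adjacent phone pairs, with silence added at the start and end."""
    padded = [silence] + phones + [silence]
    return [a + b for a, b in zip(padded, padded[1:])]


print(" | ".join(diphones(["k", "u", "n", "uu"])))
# qk | ku | un | nuu | uuq
```

A word of n phones always needs n + 1 diphones, which is why the inventory (rather than the per-word cost) is what drives storage.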
[Figure: "certainly" (cer- -tain- -ly, pronounced ser- -tan- -lee) divided into diphones]
Words versus phonemes
There is a trade-off between storage and processing: the larger the unit, the more storage space it requires, but in compensation less effort is required to join the units together.
Conversion Task: Input Text -> Restricted Text (uses the Exceptions Dictionary)
Pronunciation Task: Restricted Text -> Phonemic Text (uses the Affix Tables and Pronunciation Rules)
Allophone Task: Phonemic Text -> Broad Phonetic Text (uses the Allophonic Rules)
Prosody Task: Broad Phonetic Text -> Narrow Phonetic Representation (uses the Prosody Table)
Lower Phonetic Task: Narrow Phonetic Representation -> Control Parameters -> Speech (uses the Lower Phonetic Table)
Also shown: Phonotactic Tables
The JSRU TTS System
Input text -> Conversion Task -> Pronunciation Task -> Prosody Task -> Lower Phonetic Task -> Speech Output
Resources: Exceptions Dictionary, Phonotactic Tables, Affix Tables, Pronunciation Rules, Prosody Table, Lower Phonetic Table
CONVERSION TASK
This task converts unrestricted text to restricted text. Unrestricted text may contain non-English words, abbreviations (e.g. Dr.), unusual pronunciations, words to be spelt out (e.g. BBC), etc. This task also gets the phonetic form of words that are in the dictionary and deletes any redundant white space.
The date is 1/10/97. -> ...the first of October, nineteen ninety seven.
I can’t do it. -> ...I cant do it. or ...I cannot do it.
The price is £23.99. -> ...twenty three pounds ninety nine pence
St. George St. -> ...Saint George Street
well-behaved -> ...well behaved
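The examples above can be sketched as a handful of regular-expression rewrites. This is a toy illustration, not the JSRU conversion task: it assumes British day/month/two-digit-year dates in the 1900s, prices under £100, a two-entry contraction table, and the heuristic that "St." before a capitalised word means Saint and otherwise Street. All names are hypothetical.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
ORDINALS = ["", "first", "second", "third", "fourth", "fifth", "sixth",
            "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth",
            "thirteenth", "fourteenth", "fifteenth", "sixteenth",
            "seventeenth", "eighteenth", "nineteenth"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
CONTRACTIONS = {"can't": "cannot", "won't": "will not"}  # hypothetical, tiny table


def two_digits(n: int) -> str:
    """0-99 as words, e.g. 97 -> 'ninety seven'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])


def ordinal(n: int) -> str:
    """1-31 as an ordinal, e.g. 1 -> 'first', 30 -> 'thirtieth'."""
    if n < 20:
        return ORDINALS[n]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return TENS[tens][:-1] + "ieth"
    return TENS[tens] + " " + ORDINALS[ones]


def _date(m) -> str:
    day, month, yy = (int(g) for g in m.groups())
    if not (1 <= day <= 31 and 1 <= month <= 12):
        return m.group(0)  # not a plausible d/m/yy date: leave unchanged
    if yy == 0:
        year = "nineteen hundred"
    elif yy < 10:
        year = "nineteen oh " + ONES[yy]
    else:
        year = "nineteen " + two_digits(yy)  # assumes a 19xx year
    return f"the {ordinal(day)} of {MONTHS[month - 1]}, {year}"


def _price(m) -> str:
    pounds, pence = int(m.group(1)), int(m.group(2))
    words = two_digits(pounds) + (" pound" if pounds == 1 else " pounds")
    if pence:
        words += " " + two_digits(pence) + " pence"
    return words


def normalise(text: str) -> str:
    text = text.replace("\u2019", "'")                       # curly apostrophe
    text = re.sub(r"\b(\d{1,2})/(\d{1,2})/(\d{2})\b", _date, text)
    text = re.sub(r"£(\d{1,2})\.(\d{2})\b", _price, text)
    text = re.sub(r"\bDr\.", "Doctor", text)
    text = re.sub(r"\bSt\.(?=\s+[A-Z])", "Saint", text)      # St. before a name
    text = re.sub(r"\bSt\.", "Street", text)                 # any other St.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", " ", text)  # well-behaved
    return re.sub(r"\s+", " ", text).strip()                 # redundant white space
```

Usage: `normalise("The price is £23.99.")` gives `"The price is twenty three pounds ninety nine pence."`. Real systems need far richer context to disambiguate abbreviations such as St., which is exactly why the slide treats this as a separate task.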
PRONUNCIATION TASK
1. Look up the entire word in the dictionary.
4. Perform pronunciation rules on the stem.
5. Replace the prefixes and suffixes.
6. Perform stress assignment by applying the stress rules.
7. Divide the word into syllables.
1. Look up the entire word in the dictionary. If it is found, exit from the pronunciation task.
sentence    "sen-tans
sensitive   "sen'si-tiv
transport   "traans-poat
cough       "kof
Conversion of words to phonemes is far from regular in English and is very context dependent. Some languages are better than others in this respect. George Bernard Shaw is quoted as saying that fish could be spelt GHOTI (gh as in enough, o as in women, ti as in nation).
The advantage of using a dictionary is that it can include information on syllables, stress, syntactic types, etc. This is much more easily obtained from a lookup table than by using algorithms; for example, stress markers and syllable boundaries are more easily identified. But, of course, there will always be some words that are not in the dictionary, however large one makes it.
Example of a function for -s removal (not definitive):
lastchar = S
if (prevchar = S) then return                           SS: loss
else if (prevtwochars = IO) then remove S               IOS: folios
else if (prevchar = vowel but not E) then return        AS: alias
else if (prevchar = E) then
    remove S
    if (prevprevchar = I) then replace IE by Y          IES -> Y: parties
    else if (prevprevchar = H) and (prevprevprevchar ≠ T) then delete E
                                                        HES but not THES: batches/bathes
    else if ((prevprevchar = S) and (prevprevprevchar = S)) or
            ((prevprevchar = Z) and (prevprevprevchar = Z)) then delete E
                                                        SSES/ZZES: losses/buzzes
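A runnable version of the -s removal rules, written to follow the branches above one for one. The only addition is the final branch: the slide does not say what happens for a plain consonant + S (e.g. cats), so this sketch assumes the S is simply dropped, and that assumption is marked in the code.

```python
VOWELS = set("AEIOU")


def remove_plural_s(word: str) -> str:
    """Strip a final -s following the slide's rules (not definitive)."""
    w = word.upper()
    if len(w) < 3 or not w.endswith("S"):
        return w
    prev = w[-2]
    if prev == "S":                              # SS: loss
        return w
    if w[-3:-1] == "IO":                         # IOS: folios -> FOLIO
        return w[:-1]
    if prev in VOWELS and prev != "E":           # AS etc.: alias
        return w
    if prev == "E":
        stem = w[:-1]                            # remove S; stem ends in E
        pp = w[-3]                               # prevprevchar
        ppp = w[-4] if len(w) >= 4 else ""       # prevprevprevchar
        if pp == "I":                            # IES -> Y: parties -> PARTY
            return stem[:-2] + "Y"
        if pp == "H" and ppp != "T":             # HES but not THES: batches
            return stem[:-1]
        if (pp, ppp) in {("S", "S"), ("Z", "Z")}:  # SSES/ZZES: losses, buzzes
            return stem[:-1]
        return stem                              # e.g. bathes -> BATHE
    # Assumption (not on the slide): consonant + S, e.g. cats, just drops the S.
    return w[:-1]


for w in ["loss", "folios", "parties", "batches", "bathes", "losses", "buzzes", "alias"]:
    print(w, "->", remove_plural_s(w))
```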
The JSRU system has a list of 39 suffixes, e.g. ED, NT, FUL, OUS, ALIC, IBLE, EN, etc.
After a suffix has been removed, the stem is checked to see whether it is long enough (at least 3 letters) and whether its final consonant cluster is pronounceable. If the suffix could have replaced a final 'e', then this 'e' must be added back. The algorithm for deciding whether or not an 'e' has been removed is not simple, e.g.
alternation -> alternate + ion
interaction -> interact + ion
The rule here is that if the stem ends in vowel + consonant, an 'e' should be added (alternat -> alternate, but interact is left alone). What about taxable? It works if 'x' is treated as 'ks', so that the stem ends in two consonants and no 'e' is added.
Having removed a suffix, the reduced word is looked up in the dictionary. If it is not found, an attempt is made to remove another suffix, e.g. wond-er-ful-ly.
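The loop described above (strip a suffix, restore a final 'e' if needed, look the stem up, and try again if it is not found) can be sketched as follows. The suffix list is a hypothetical six-item subset, not the JSRU list of 39; which suffixes may replace a final 'e' is also an assumption; and the phonotactic check on the final consonant cluster is not implemented.

```python
VOWELS = set("AEIOU")
# Hypothetical subset of suffixes; the JSRU list has 39.
SUFFIXES = ["ABLE", "ION", "FUL", "LY", "ED", "ER"]
E_REPLACING = {"ABLE", "ION"}   # suffixes assumed able to replace a final 'e'
MIN_STEM = 3                    # stem must keep at least 3 letters


def needs_final_e(stem: str) -> bool:
    """Vowel + single consonant at the end -> restore 'e' (X counts as KS)."""
    s = stem.replace("X", "KS")
    return len(s) >= 2 and s[-2] in VOWELS and s[-1] not in VOWELS


def strip_suffixes(word: str, dictionary: set) -> tuple:
    """Remove suffixes until the stem is in the dictionary or nothing applies."""
    stem, removed = word.upper(), []
    while stem not in dictionary:
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                if suffix in E_REPLACING and needs_final_e(stem):
                    stem += "E"
                removed.insert(0, suffix)
                break
        else:
            break  # nothing more to strip: leave the stem to the letter rules
    return stem, removed


lexicon = {"WONDER", "ALTERNATE", "INTERACT", "TAX"}
print(strip_suffixes("wonderfully", lexicon))  # ('WONDER', ['FUL', 'LY'])
```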
For example:
{"A", "", "consY>", "ai"}    lazy
{"OUGH", "", "", "ou"}      bough
{"AUGH", "", "", "aw"}      daughter
{"C", "", "E", "S"}         lace
{"PH", "", "", "F"}         phase
What about laughter? What about rough, cough, though?
The order of the rules is important, e.g.
{"Q", "", "", "KW"}        freeqent -> freekwent
{"E", "cons", "Q", "EE"}   freqent -> freeqent
{"QU", "", "", "Q"}        frequent -> freqent
N.B. Every rule is tried at every position in the word; hence the time taken depends very much on the number of rules.
Special procedures need to be called in some cases:
{"vowel", "", "consE", "@001"}   magic 'e' procedure
{"E", "", ">", "@002"}          final 'e' procedure
{"@003", "", "", ""}            double letter procedure
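The rule examples can be read as {target, left context, right context, replacement}, and "every rule is tried at every position" suggests each rule being applied in turn across the whole word. The sketch below implements that reading. The tuple layout is inferred from the examples; the context tokens `cons`, `vowel` and `>` (end of word) are assumptions based on rules such as {"A", "", "consY>", "ai"}; the special `@` procedures are not modelled. The rules are run in the order that reproduces the frequent example (QU first, Q last); running them in the listed order gives a different result, which illustrates why order matters.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str, str, str]  # (target, left context, right context, replacement)
VOWELS = set("AEIOU")
TOKEN = re.compile(r"cons|vowel|[<>]|[A-Z]")  # assumed context vocabulary


def _matches(token: str, ch: Optional[str]) -> bool:
    if token in ("<", ">"):
        return ch is None                       # word boundary
    if ch is None:
        return False
    if token == "cons":
        return ch.isalpha() and ch not in VOWELS
    if token == "vowel":
        return ch in VOWELS
    return ch == token                          # literal letter


def _context_ok(context: str, word: str, start: int, step: int) -> bool:
    """Match context tokens outward from `start` (step +1 rightwards, -1 leftwards)."""
    tokens = TOKEN.findall(context)
    if step < 0:
        tokens.reverse()
    for i, tok in enumerate(tokens):
        j = start + step * i
        ch = word[j] if 0 <= j < len(word) else None
        if not _matches(tok, ch):
            return False
    return True


def apply_rule(word: str, rule: Rule) -> str:
    """Try one rule at every position of the word, left to right."""
    target, left, right, repl = rule
    out, pos = [], 0
    while pos < len(word):
        if (word.startswith(target, pos)
                and _context_ok(left, word, pos - 1, -1)
                and _context_ok(right, word, pos + len(target), +1)):
            out.append(repl)
            pos += len(target)
        else:
            out.append(word[pos])
            pos += 1
    return "".join(out)


def apply_rules(word: str, rules: List[Rule]) -> str:
    for rule in rules:
        word = apply_rule(word, rule)
    return word


Q_RULES = [("QU", "", "", "Q"), ("E", "cons", "Q", "EE"), ("Q", "", "", "KW")]
print(apply_rules("FREQUENT", Q_RULES))                  # FREEKWENT
print(apply_rules("FREQUENT", list(reversed(Q_RULES))))  # FREKWUENT
```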
5. Replace the prefixes and suffixes, in their phonetic form of course, which can be obtained from lookup tables. This may result in some adjustments, which are handled by a second set of rules, e.g.
{"c", "", "i", "s"}      precious
{"c", "", "", "k"}       practically
{"ig", "", "n-", "ie"}   assignment
6. Perform stress assignment by applying the stress rules, which can be quite complicated, and make adjustments to the final pronunciation, e.g. reduction of some unstressed vowels.
The stress rules are based on the MIT rules and are quite complicated (From Text to Speech: The MITalk System, Allen et al.).
In the first (cyclic) phase, several rules are applied in sequence, first to the stem and then to the stem + suffixes, taken one at a time. Prefixes are considered part of the stem. Some affixes (e.g. -ING) do not affect the stress pattern, so the rules are omitted for them, while others (e.g. -ION) force stress onto a particular vowel, usually the one before the affix.
In the second (non-cyclic) phase, one vowel is selected for main stress and all others are reduced to secondary or no stress.
Stress Rules
Main Stress Rule (cyclic)
1.
$$V \rightarrow [\text{1-stress}] \;/\; X \,\_\_\, C_0 \,\{[\text{short } V]\, C_0^1 \;/\; V\}\, \{[\text{short } V]\, C_0 \;/\; V\}$$
where X contains all prefixes and the blank __ indicates the position of the vowel to be stressed;
$C_0^1$ matches zero or one consonant;
$C_0$ matches any number of consonants (including none);
{..} denotes a list of alternative patterns separated by slashes /.
Assign 1-stress (primary stress) to the vowel in a syllable that precedes a weak syllable followed by a morph-final syllable containing a short vowel and zero or more consonants, e.g. difficult -> d"ifikalt
The same rule also assigns 1-stress to the vowel in a syllable preceding a vowel followed by a morph-final syllable containing a short vowel and zero or more consonants, e.g. secretariat -> sekret"eir ee aat,
and to the vowel in a syllable preceding a vowel followed by a morph-final vowel, e.g. oratorio -> orat"oar ee oa.
2.
$$V \rightarrow [\text{1-stress}] \;/\; X \,\_\_\, C_0 \,\{[\text{short } V]\, C_0 \;/\; V\}$$
where X contains all prefixes. Assign 1-stress to the vowel in a syllable preceding a short vowel and zero or more consonants, e.g. edit -> "ed it, bitumen -> bity"uum en, agenda -> aj"en da
3.
$$V \rightarrow [\text{1-stress}] \;/\; X \,\_\_\, C_0$$
where X contains all prefixes. Assign 1-stress to the vowel in the last syllable, e.g. stand -> st"aand, parole -> paar"oal
Stressed Syllable Rule (cyclic)
Alternating Stress Rule (cyclic)
Destressing Rule (non-cyclic)
Compound Stress Rule (non-cyclic)
Strong First Syllable Rule (non-cyclic)
Cursory Rule (non-cyclic)
Vowel Reduction Rule (non-cyclic)
After stress assignment a third set of rules is applied, e.g.
{"41r", "", "cons", "41"}   far gone versus far away (141 = 'schwa')
{"ir", "", "cons", "er"}    dirt versus direct
and some unstressed vowels are reduced, e.g.
aa -> a
ai -> i
u -> a
o -> a
bottom => "bot-am
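A minimal sketch of the vowel-reduction step, assuming words are held as hyphen-separated syllables with a leading " on the stressed syllable (as in "bot-am), and assuming the unreduced form of bottom is "bot-om. Each unstressed syllable has its first reducible vowel replaced using the pairs listed above, tried in that order. The function name and the input form are hypothetical.

```python
# Reduction pairs as listed on the slide, tried in this order.
REDUCTIONS = [("aa", "a"), ("ai", "i"), ("u", "a"), ("o", "a")]


def reduce_unstressed(word: str) -> str:
    """Reduce the first reducible vowel in each syllable lacking a '"' stress mark."""
    syllables = []
    for syl in word.split("-"):
        if not syl.startswith('"'):
            for full, reduced in REDUCTIONS:
                if full in syl:
                    syl = syl.replace(full, reduced, 1)
                    break
        syllables.append(syl)
    return "-".join(syllables)


print(reduce_unstressed('"bot-om'))  # "bot-am
```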
PROSODY TASK
This task adds intonation and timing to the phonetic text. It processes complete breath groups. The output consists of a list of phonemes with their corresponding pitch and duration.
In November the region’s weather was unusually dry.
Do you want to travel to London?
Who is the Prime Minister of the Bahamas?
Lift the safety cover and press the red button.
Word stress
One syllable in a polysyllabic word is given more prominence than the others. All words have one syllable with primary stress, e.g.
elec-"tri-city
uni-"ver-sity
to-"mor-row
"con-troversy
"ice cream (or ice "cream?)
Sentence stress
This refers to the way in which one word is singled out in a sentence as the focus, or nuclear stress. Examples:
What TIME is the next train to London?
What time is the next train to LONDON?
What time is the next TRAIN to London?
There’s John cycling down the road.
Rhythm
Rhythm is about how we time the delivery of a sentence. In English, speakers are supposed to time the stressed syllables so that they occur at equal intervals; English is said to be a stress-timed language:
The stressed syll-a-bles in Eng-lish o-ccur e-qui-dis-tant-ly in time
French is said to be a syllable-timed language:
Les encyclopédies électroniques sont pleines d’informations
(D. Crystal, 1995)
[Figure: beat marks (#) aligned with the syllables of each sentence]
JSRU PROSODY TASK
An utterance is divided into tone groups (tone group 1, tone group 2, ..., tone group N). Each tone group has the structure:
Pre-head: / u u u /
Head: / S u u S u u ... /
Nucleus: / S max /
Tail: / u S u / u /
Projects represent a substantial part of your marks and you should spend some time choosing a project which will show you in your best light.
Projects represent // a substantial part // of your marks // and you should spend // some time choosing // a project // which will show // you in your best light.
"Pro-jects rep-res-"ent // a sub-"stant-ial "part // of your "marks // and you should "spend // some "time "choos-ing // a "pro-ject // which will "show // you in your "best "light.
Unstressed syllable: weight 1
Secondary stress: weight 2
Primary stress: weight 3
Emphatic stress: weight 4
Tone groups are amalgamated such that all tone groups have a weight >= 9. With secondary stresses (') marked as well:
"Pro-jects 'rep-res-"ent // a 'sub-"stant-ial "part // of your "marks // and you should "spend // some "time "choos-ing // a "pro-ject // which will "show // you in your "best "light.
Tone group weights: 10 10 5 6 8 5 5 9
Pitch contours are worked out for syllables according to their type and the tone group type.
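The tone group weights can be recomputed from the marked text, which also confirms how the marks are read: " counts as primary stress (3), ' as secondary stress (2) and any other syllable as unstressed (1), giving the printed weights 10 10 5 6 8 5 5 9. The slide does not say how light tone groups are amalgamated; this sketch assumes a greedy left-to-right merge until each group reaches 9, folding any light remainder into the last group. The "!" mark for emphatic stress is a hypothetical notation.

```python
import re
from typing import List

WEIGHTS = {'"': 3, "'": 2, "!": 4}  # primary, secondary, emphatic ("!" is hypothetical)
MIN_WEIGHT = 9


def syllable_weight(syl: str) -> int:
    return WEIGHTS.get(syl[0], 1)  # unmarked syllables are unstressed


def tone_group_weights(text: str) -> List[int]:
    """Sum syllable weights within each '//'-separated tone group."""
    weights = []
    for group in text.split("//"):
        syllables = [s for s in re.split(r"[\s-]+", group.strip()) if s]
        weights.append(sum(syllable_weight(s) for s in syllables))
    return weights


def amalgamate(weights: List[int], minimum: int = MIN_WEIGHT) -> List[int]:
    """Assumed greedy merge: accumulate groups until the total reaches `minimum`."""
    merged, acc = [], 0
    for w in weights:
        acc += w
        if acc >= minimum:
            merged.append(acc)
            acc = 0
    if acc:  # light leftover at the end: fold it into the previous group
        if merged:
            merged[-1] += acc
        else:
            merged.append(acc)
    return merged


TEXT = ('"Pro-jects \'rep-res-"ent // a \'sub-"stant-ial "part // of your "marks // '
        'and you should "spend // some "time "choos-ing // a "pro-ject // '
        'which will "show // you in your "best "light')

print(tone_group_weights(TEXT))              # [10, 10, 5, 6, 8, 5, 5, 9]
print(amalgamate(tone_group_weights(TEXT)))  # [10, 10, 11, 13, 14]
```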
Baseline
When all the syllable patterns have been calculated, they are superimposed on a falling baseline. The value at the end of a phrase (indicated by a comma) is half that at the beginning, and the starting value for the next phrase is 1.2 times the final value of the previous one.
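The two baseline constants determine the start and end value of every phrase, which this sketch computes directly. The slide does not give the shape of the fall within a phrase, so `baseline_contour` assumes a linear decline across the phrase's 10 ms frames; the starting value of 100 is arbitrary, since no units are given.

```python
from typing import List, Tuple

END_RATIO = 0.5    # end of phrase = half the starting value
RESET_RATIO = 1.2  # next phrase starts at 1.2 x previous final value


def phrase_baselines(start: float, n_phrases: int) -> List[Tuple[float, float]]:
    """(start, end) baseline values for consecutive phrases."""
    spans, value = [], start
    for _ in range(n_phrases):
        end = value * END_RATIO
        spans.append((value, end))
        value = end * RESET_RATIO
    return spans


def baseline_contour(start: float, end: float, n_frames: int) -> List[float]:
    """Assumed linear fall from start to end over n_frames frames."""
    if n_frames == 1:
        return [start]
    step = (end - start) / (n_frames - 1)
    return [start + i * step for i in range(n_frames)]


for s, e in phrase_baselines(100.0, 3):
    print(f"phrase baseline {s:.1f} -> {e:.1f}")
# 100.0 -> 50.0, 60.0 -> 30.0, 36.0 -> 18.0
```

Note that each phrase starts lower than the previous one (100, 60, 36), so the overall pitch drifts down across an utterance.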
LOWER PHONETIC TASK
This task converts the output from the PROSODY TASK into control parameters for the hardware synthesiser.
The table for each phonetic element contains information relating to how transitions between target values are calculated. For each parameter in the table there are entries for:
its target value;
the proportion of the target of the dominated element used in deriving the boundary value;
a fixed contribution to the boundary value;
the transition duration within the dominant element;
the transition duration within the dominated element.
Element  Target  Fixed Contribution  Proportion (%)  Internal Duration  External Duration  Rank
E        1900    950                 50              3                  3                  2
W        760     380                 50              4                  4                  8
It sends a frame of parameters to the synthesiser every 10 ms. The boundary value is calculated as
Fixed Contribution of dominant element + Target of dominated element × Proportion of dominant element
i.e. 380 + 1900 × 0.5 = 1330
[Figure: parameter track from the W target value (760 Hz) through the boundary value (1330 Hz) to the E target value (1900 Hz); internal duration 4 frames, external duration 4 frames]
[Figure: parameter track across DH, A, W, E, showing target values 1300, 1420, 760 and boundary values 1360, 1090, 1330]
Boundary values:
DH|A   650 + 1420 × 0.5 = 1360
A|W    380 + 1420 × 0.5 = 1090
W|E    380 + 1900 × 0.5 = 1330
Target values: DH 1300, A 1420, W 760, E 1900
Element  Target  Fixed Contribution  Proportion (%)  Internal Duration  External Duration
DH       1300    650                 50              2                  5
A        1420    710                 50              3                  3
W        760     380                 50              4                  4
E        1900    950                 50              3                  3
L        880     440                 50              0                  6
Z        1720    860                 50              2                  5
Rank: 7 4 8 12 6 11 2 8 2 6
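The boundary-value formula and the worked examples can be checked with a short sketch. It assumes that the higher-ranked element dominates (consistent with W, rank 8, dominating E, rank 2), that the dominant element's internal and external durations set the transition lengths on either side of the boundary, and that values change linearly from frame to frame; none of these interpolation details is stated on the slides. The ranks of DH and A are not recoverable, so they are left at the default.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    name: str
    target: int          # target value, Hz
    fixed: int           # fixed contribution to the boundary value, Hz
    proportion: int      # per cent of the dominated element's target
    internal: int = 0    # transition frames inside the dominant element
    external: int = 0    # transition frames inside the dominated element
    rank: int = 0        # assumed: the higher rank dominates


def boundary_value(dominant: Element, dominated: Element) -> int:
    return dominant.fixed + dominated.target * dominant.proportion // 100


def ramp(start: float, end: float, n: int) -> List[float]:
    """n linearly spaced frame values, the last one equal to end (assumed linear)."""
    return [start + (end - start) * (i + 1) / n for i in range(n)]


def transition(left: Element, right: Element) -> Tuple[List[float], List[float]]:
    """Frame values approaching and leaving the left|right boundary."""
    dom, sub = (left, right) if left.rank >= right.rank else (right, left)
    b = boundary_value(dom, sub)
    n_left = dom.internal if left is dom else dom.external
    n_right = dom.internal if right is dom else dom.external
    return ramp(left.target, b, n_left), ramp(b, right.target, n_right)


W = Element("W", 760, 380, 50, internal=4, external=4, rank=8)
E = Element("E", 1900, 950, 50, internal=3, external=3, rank=2)
DH = Element("DH", 1300, 650, 50)
A = Element("A", 1420, 710, 50)

print(boundary_value(DH, A), boundary_value(W, A), boundary_value(W, E))  # 1360 1090 1330
into_w, into_e = transition(W, E)  # 4 frames rising to 1330, then 4 frames to 1900
```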
Tomorrow will be starting off grey and rather murky. :"to-ma-rou  /wil  /bey  :"star-ting  /of  :"grai  /and  :"raa-dha  :"mer-kee.
Allophone  Duration  Pitch
T          9         37
TY         2         37
TZ         3         37
O          10        37
M          9         35
A          4         35
R          4         34
OU         7         34
OB         7         33
W          4         34
I          5         34
LP         4         32
B          5         33
BY         1         33
EY         5         33
S          11        36
T          6         35
TY         2         35
AR         14        35
T          6         31
TY         2         31
I          5         31
NG         9         30
O          6         32
F          6         31
G          6         34
GY         2         34
R          2         34
AI         12        34
AJ         9         32
A          4         32
N          6         31
D          2         30
DY         1         30
R          7         33
AA         10        32
DH         7         31
A          5         30
M          11        32
ER         14        32
K          6         25
KY         2         23
I          10        23
QQ         51        22
Q          42        21
* Output from SYNTH:
* fn   alf  f1   a1  f2    a2  f3    a3  ahf  s  f0
* T  9  37  0  Frame  0
250   6  190   6  1780   6  2680   6   6  1  37
250  10  190  10  1780  10  2680  10  10  1  37
250  13  190  13  1780  13  2680  13  13  1  37
250  15  190  15  1780  15  2680  15  15  1  37
250  15  190  15  1780  15  2680  15  15  1  37
250  15  190  15  1780  15  2680  15  15  1  37
250  13  190  13  1780  13  2680  13  13  1  37
250  10  190  10  1780  10  2680  10  10  1  37
250   6  190   6  1780   6  2680   6   6  1  37
* TY  2  37  0  Frame  9
250  15  190  15  1780  40  2680  51  58  1  37
250  15  190  15  1780  40  2680  51  58  1  37
* TZ  3  37  0  Frame  11
250   6  190   6  1780  23  2680  28  40  1  37
250  10  190  10  1780  27  2680  32  44  1  37
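The "Frame" numbers in the SYNTH output are simply running totals of the allophone durations (T lasts 9 frames, so TY starts at frame 9; TY lasts 2, so TZ starts at frame 11). A minimal sketch of that bookkeeping, using the first few allophones from the list above; the helper name is hypothetical and the unexplained "0" field in the SYNTH headers is not reproduced.

```python
from itertools import accumulate
from typing import List


def start_frames(durations: List[int]) -> List[int]:
    """Frame index at which each allophone starts (durations in 10 ms frames)."""
    return list(accumulate(durations, initial=0))[:-1]


allophones = [("T", 9, 37), ("TY", 2, 37), ("TZ", 3, 37), ("O", 10, 37)]
starts = start_frames([d for _, d, _ in allophones])
for (name, dur, pitch), start in zip(allophones, starts):
    print(f"* {name} {dur} {pitch} Frame {start}")
# * T 9 37 Frame 0
# * TY 2 37 Frame 9
# * TZ 3 37 Frame 11
# * O 10 37 Frame 14
```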
[Audio examples: BT 1, BT 2, Spruce 1, Spruce 2, L&H 1, L&H 2, AT&T 1, AT&T 2, Festival 1, Festival 2, Mbrola 1, Mbrola 2]

presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Dernier (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Coms30123 Synthesis 3 Projector

  • 1.
  • 2. Von Kempelen’s speaking machine (1791) (Wheatstone’s reconstruction)
  • 3. Voder (Voice Operation Demonstrator) 1939
  • 4.
  • 5. Audio clips of synthetic speech illustrating the history of the art and technology of synthetically produced human speech:
    http://www.cs.indiana.edu/rhythmsp/ASA/Contents.html
    http://www.cs.indiana.edu/rhythmsp/ASA/highlights.html
    http://www.humnet.ucla.edu/humnet/linguistics/faciliti/demos/vocalfolds/vocalfolds.htm
  • 6.  
  • 7. Formant Synthesis. This may also be referred to as synthesis-by-rule, although rules of one sort or another are common to all synthesis systems. For a formant synthesis system, the output of the high-level component typically consists of a sequence of allophones together with their duration and pitch, e.g. DH 7 34, I 5 34, S 8 33 (duration measured in 10 ms frames; pitch coded into the range 1-63).
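The high-level output on this slide is a flat run of (allophone, duration, pitch) triples. A minimal sketch of reading it, assuming whitespace-separated tokens exactly as shown; `parse_allophones`, `total_duration_ms` and `FRAME_MS` are hypothetical names:

```python
FRAME_MS = 10  # durations are counted in 10 ms frames


def parse_allophones(text):
    """Split 'DH 7 34 I 5 34 ...' into (allophone, frames, pitch_code) triples."""
    tokens = text.split()
    if len(tokens) % 3:
        raise ValueError("expected allophone/duration/pitch triples")
    triples = []
    for i in range(0, len(tokens), 3):
        name, frames, pitch = tokens[i], int(tokens[i + 1]), int(tokens[i + 2])
        if not 1 <= pitch <= 63:
            raise ValueError(f"pitch code {pitch} outside 1-63")
        triples.append((name, frames, pitch))
    return triples


def total_duration_ms(triples):
    """Total duration of the allophone sequence in milliseconds."""
    return sum(frames for _, frames, _ in triples) * FRAME_MS


phones = parse_allophones("DH 7 34 I 5 34 S 8 33")
print(phones)                     # [('DH', 7, 34), ('I', 5, 34), ('S', 8, 33)]
print(total_duration_ms(phones))  # 200
```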
  • 8. The low-level component uses this input to produce a sequence of frames, each frame containing a set of parameters for formant frequencies, formant amplitudes, voicing, fundamental pitch, etc., e.g.
       Fn  alf   f1   a1   f2   a2   f3   a3  ahf    s   f0
      250   33  280   33 1300   32 2680   34   41   36   34
      250   37  280   37 1300   36 2680   38   45   36   34
      250   40  280   40 1300   39 2680   41   48   36   34
      250   42  280   42 1300   41 2680   43   50   36   34
    This information is then fed into a formant synthesiser, which uses it to generate the appropriate audio output. The formant synthesiser may be implemented in hardware or software. An example of a formant synthesis system is DECtalk.
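Each row of that table is one 10 ms frame. A sketch of turning the flat numbers into named frames, assuming the 11-column order of the header; the slides do not define every field (e.g. alf, ahf, s), so the names serve only as keys. `parse_frames` and `SLIDE_FRAMES` are hypothetical:

```python
FIELDS = ["fn", "alf", "f1", "a1", "f2", "a2", "f3", "a3", "ahf", "s", "f0"]
FRAME_MS = 10  # one parameter frame per 10 ms


def parse_frames(text):
    """Group a flat run of integers into 11-field parameter frames (dicts)."""
    values = [int(v) for v in text.split()]
    if len(values) % len(FIELDS):
        raise ValueError("trailing incomplete frame")
    return [dict(zip(FIELDS, values[i:i + len(FIELDS)]))
            for i in range(0, len(values), len(FIELDS))]


SLIDE_FRAMES = """
250 33 280 33 1300 32 2680 34 41 36 34
250 37 280 37 1300 36 2680 38 45 36 34
250 40 280 40 1300 39 2680 41 48 36 34
250 42 280 42 1300 41 2680 43 50 36 34
"""

frames = parse_frames(SLIDE_FRAMES)
print(len(frames) * FRAME_MS, "ms")  # 40 ms
print([f["a1"] for f in frames])    # [33, 37, 40, 42]
```

The rising a1 column shows the kind of frame-to-frame amplitude ramp the synthesiser receives.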
  • 9. [Figure: Klatt Synthesiser block diagram. Sources: impulse train, glottal filter; random numbers, LP filter. Cascade branch: resonators RN, RZ, R1-R5. Parallel branch: resonators R1-R6 with amplitudes A1-A6. Pre-emphasis, summation (+), synthetic speech output.]
  • 10. Klatt synthesiser: a combined serial/parallel formant synthesiser. A serial (cascade) synthesiser is a better model for the production of vowels and vowel-like sounds, whereas a parallel synthesiser is better suited to producing nasals, fricatives and stops. The serial synthesiser specifies the formant centre frequencies and bandwidths; the parallel synthesiser also specifies the formant levels (peak amplitudes).
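A minimal sketch of the two branches, assuming the second-order digital resonator from Klatt's 1980 software synthesiser (y[n] = A·x[n] + B·y[n-1] + C·y[n-2], with C = -exp(-2πBW·T), B = 2·exp(-πBW·T)·cos(2πF·T), A = 1 - B - C). The formant frequencies come from the frame example on slide 8; the bandwidths, amplitudes and the plain impulse-train source are illustrative assumptions, and the glottal and LP filters, RN/RZ, noise source and pre-emphasis from the block diagram are left out:

```python
import math


def resonator(x, f, bw, fs):
    """Second-order digital resonator (Klatt 1980 form), unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * f * t)
    a = 1 - b - c
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def cascade(x, formants, fs):
    """Serial branch: (centre frequency, bandwidth) in Hz, applied in turn."""
    for f, bw in formants:
        x = resonator(x, f, bw, fs)
    return x


def parallel(x, formants, fs):
    """Parallel branch: (frequency, bandwidth, amplitude) per formant, outputs summed."""
    out = [0.0] * len(x)
    for f, bw, amp in formants:
        for i, y in enumerate(resonator(x, f, bw, fs)):
            out[i] += amp * y
    return out


def impulse_train(f0, n, fs):
    """Crude voicing source: one unit impulse per pitch period."""
    period = round(fs / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


fs = 16_000
source = impulse_train(100, fs // 10, fs)  # 100 ms at F0 = 100 Hz
vowel = cascade(source, [(280, 60), (1300, 90), (2680, 150)], fs)
```

In the cascade, levels follow automatically from the frequencies and bandwidths; in the parallel branch each formant's level is set directly through `amp`, matching the slide's point that the parallel synthesiser specifies formant levels as well.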
  • 11. Waveform Concatenation Synthesis. With this system the low-level component generates a speech output file by concatenating units of previously recorded speech. Information about the duration and pitch of these units is again supplied by the high-level component. The size of the units is clearly an important consideration, both in terms of the amount of storage required and the difficulty of joining them together (more about this later). An example of a waveform concatenation synthesis system is the Lernout and Hauspie TTS system.
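The slide leaves the joining problem for later. As one simple illustration (our assumption, not the method of the Lernout and Hauspie system), units can be overlapped by a few samples with a linear cross-fade so that the joins do not produce abrupt steps. `concatenate` is a hypothetical name, and pitch and duration modification are not modelled:

```python
def concatenate(units, overlap):
    """Join recorded units, linearly cross-fading `overlap` samples at each join."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming unit rises towards 1
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out


joined = concatenate([[1.0] * 4, [0.0] * 4], overlap=2)
print(len(joined))  # 6: two samples are shared at the join
```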
  • 12. cer- -tain- -ly (ser-) (-tan-) (-lee)
  • 13.
  • 14. Words. How many? ~300,000; perhaps 2,000 for the most frequently used words.
    Formant synthesis: 1 word ~ 0.5 s; 12 bytes per 10 ms frame; 1 word = 0.5/0.010 frames = 50 frames = 600 bytes; 300,000 words ~ 180 Mbyte.
    Waveform concatenation: 1 word ~ 0.5 s; sampling rate = 16 kHz; 1 word = 8K samples = 16 Kbytes at 2 bytes per sample; 300,000 words = 4.8 Gbyte.
  • 15. Morphemes. Basic meaningful units that make up words, essentially roots, prefixes and suffixes, e.g. sail, travel, -ed, -s give sail, travel, sailed, travelled, sails, travels. ~10,000-30,000 entries.
    Formant synthesis: 30,000 entries ~ 10-15 Mbyte.
    Waveform concatenation: 30,000 entries ~ 120 Mbyte.
  • 16. Syllables. How many? ~10,000 (a rough estimate).
    Formant synthesis: 1 syllable ~ 0.3 s; 12 bytes per 10 ms frame; 1 syllable = 0.3/0.010 frames = 30 frames = 360 bytes; 10,000 syllables ~ 3.6 Mbyte.
    Waveform concatenation: 1 syllable ~ 0.3 s; sampling rate = 16 kHz; 1 syllable = 4.8K samples = 9.6 Kbytes at 2 bytes per sample; 10,000 syllables = 96 Mbyte.
  • 17. Phonemes. How many? About 40 for English.
    Formant synthesis: 1 phoneme ~ 10 frames = 120 bytes; 40 phonemes ~ 5 Kbytes. In practice about 70-80 allophones are needed, giving ~10 Kbytes.
    Waveform concatenation: 1 phoneme ~ 100 ms = 0.1 s = 1.6K samples = 3.2 Kbytes; total for 80 allophones ~ 256 Kbytes.
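The storage arithmetic on slides 14 to 17 can be reproduced with two small functions. A sketch using the slides' own assumptions (12 bytes per 10 ms formant frame; 16 kHz, 2-byte samples for waveforms) and their rough unit durations and counts; morphemes are omitted because no unit duration is given for them. Function names are hypothetical:

```python
FRAME_S = 0.010        # formant synthesis: one parameter frame per 10 ms
BYTES_PER_FRAME = 12
SAMPLE_RATE = 16_000   # waveform concatenation: 16 kHz ...
BYTES_PER_SAMPLE = 2   # ... 16-bit samples


def formant_bytes(unit_s, n_units):
    """Storage for n_units stored as formant-parameter frames."""
    return round(unit_s / FRAME_S) * BYTES_PER_FRAME * n_units


def waveform_bytes(unit_s, n_units):
    """Storage for n_units stored as raw samples."""
    return round(unit_s * SAMPLE_RATE) * BYTES_PER_SAMPLE * n_units


# (unit, typical duration in seconds, number of units) as estimated on the slides
UNITS = [("words", 0.5, 300_000), ("syllables", 0.3, 10_000), ("allophones", 0.1, 80)]

for name, dur, count in UNITS:
    print(f"{name:>10}: formant {formant_bytes(dur, count):>13,} B, "
          f"waveform {waveform_bytes(dur, count):>13,} B")
```

`round` guards against floating-point results such as 0.3 / 0.01 evaluating to 29.999999999999996.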
  • 18. Demi-syllables, e.g. s u m t ie m z (sometimes), syllabified as s u m - t ie m z. These are units of speech obtained by cutting each syllable in the middle of its vowel. The reason for doing this is that coarticulation effects are minimal in the middle of the vowel. There are about 4,000-5,000 demi-syllables, made up of about 1,500 initial and 3,000 final demi-syllables.
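A sketch of the cut, assuming the word has already been syllabified by hand into (onset, vowel, coda) parts; the `-` marks the cut point in the vowel. `demisyllables` is a hypothetical helper:

```python
def demisyllables(syllables):
    """Cut each (onset, vowel, coda) syllable in the middle of its vowel.

    's u-' is an initial demi-syllable, '-u m' a final one.
    """
    units = []
    for onset, vowel, coda in syllables:
        units.append(" ".join(onset + [vowel]) + "-")
        units.append("-" + " ".join([vowel] + coda))
    return units


# "sometimes" syllabified by hand as s u m | t ie m z
print(demisyllables([(["s"], "u", ["m"]), (["t"], "ie", ["m", "z"])]))
# ['s u-', '-u m', 't ie-', '-ie m z']
```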
  • 19. cer- -tain- -ly (ser-) (-tan-) (-lee) Demi-syllables
  • 20. Diphones, e.g. k u n uu (canoe). These are units of speech obtained by going from the middle of one phone to the middle of the next. The reason for doing this is that coarticulation effects are minimal in the middle of the sound. Theoretically there are about 40 x 40 = 1,600 diphones, but in practice the number is about 1,200. canoe: qk | ku | un | nuu | uuq (q = silence).
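Listing the diphones of a word is just a matter of pairing each phone with its successor after padding with silence. A sketch, assuming the slide's convention that `q` denotes silence; `diphones` is a hypothetical name:

```python
def diphones(phones, silence="q"):
    """Pair each phone with its successor, padding the word with silence."""
    padded = [silence, *phones, silence]
    return [a + b for a, b in zip(padded, padded[1:])]


print(diphones(["k", "u", "n", "uu"]))  # ['qk', 'ku', 'un', 'nuu', 'uuq']
print(40 * 40)                          # 1600 possible ordered pairs of 40 phonemes
```

The gap between 1,600 and the practical figure of about 1,200 reflects pairs that never occur in the language; this sketch does not model that.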
  • 21. cer- -tain- -ly (ser-) (-tan-) (-lee) Diphones
  • 22. Words versus phonemes: a trade-off between storage and processing. The larger the unit, the more storage space it requires, but in compensation less effort is needed to join the units together.
  • 23. [Figure: data-flow diagram of the text-to-speech tasks. Tasks: Conversion, Pronunciation, Prosody, Allophone, Lower Phonetic. Representations: input text, restricted text, phonemic text, broad phonetic text, narrow phonetic representation, control parameters, speech. Knowledge sources: exceptions dictionary, affix tables, pronunciation rules, phonotactic tables, prosody table, allophonic rules, lower phonetic table.]
  • 24. The JSRU TTS System. [Figure: input text → Conversion Task → Pronunciation Task → Prosody Task → Lower Phonetic Task → speech output, drawing on the Exceptions Dictionary, Phonotactic Tables, Affix Tables, Pronunciation Rules, Prosody Table and Lower Phonetic Table.]
  • 25. CONVERSION TASK. This task converts unrestricted text to restricted text. Unrestricted text may contain non-English words, abbreviations (e.g. Dr.), unusual pronunciations, words to be spelt out (e.g. BBC), etc. This task also retrieves the phonetic form of words that are in the dictionary and deletes any redundant white space. Examples:
    The date is 1/10/97. … the first of October, nineteen ninety seven.
    I can’t do it. … I cant do it, or I cannot do it.
    The price is £23.99. … twenty three pounds ninety nine pence.
    St. George St. … Saint George Street.
    well-behaved … well behaved.
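A toy sketch of a few of these conversions using regular expressions. It assumes UK-style DD/MM/YY dates in the 1900s, prices below £100, and that "St." before a capitalised word means Saint; a real conversion task needs far more rules plus the exceptions dictionary. All names are hypothetical:

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
IRREGULAR_ORDINALS = {1: "first", 2: "second", 3: "third", 5: "fifth",
                      8: "eighth", 9: "ninth", 12: "twelfth"}


def cardinal(n):
    """Spell out 0-99."""
    if n < 20:
        return ONES[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] if unit == 0 else f"{TENS[tens]} {ONES[unit]}"


def ordinal(n):
    """Spell out ordinals 1-99 (enough for days of the month)."""
    if n in IRREGULAR_ORDINALS:
        return IRREGULAR_ORDINALS[n]
    if n < 20:
        return ONES[n] + "th"
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens][:-1] + "ieth"  # twenty -> twentieth
    return f"{TENS[tens]} {ordinal(unit)}"


def _date(m):
    day, month, year = (int(g) for g in m.groups())
    yy = cardinal(year) if year >= 10 else "oh " + ONES[year]
    return f"the {ordinal(day)} of {MONTHS[month - 1]}, nineteen {yy}"


def normalise(text):
    # Dates as DD/MM/YY, assumed to be in the 1900s.
    text = re.sub(r"\b(\d{1,2})/(\d{1,2})/(\d{2})\b", _date, text)
    # Prices in pounds and pence, up to £99.99.
    text = re.sub(r"£(\d{1,2})\.(\d{2})",
                  lambda m: f"{cardinal(int(m[1]))} pounds {cardinal(int(m[2]))} pence",
                  text)
    # "St." before a capitalised word is Saint; otherwise Street.
    text = re.sub(r"\bSt\.(?=\s+[A-Z])", "Saint", text)
    text = re.sub(r"\bSt\.", "Street", text)
    # Hyphenated words become separate words.
    return re.sub(r"(?<=[a-z])-(?=[a-z])", " ", text)


print(normalise("The date is 1/10/97."))
# The date is the first of October, nineteen ninety seven.
```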
  • 26.
  • 27. 1. Look up the entire word in the dictionary. If it is found, exit from the pronunciation task. Example entries: sentence "sen-tans; sensitive "sen’si-tiv; transport "traans-poat; cough "kof.
    Conversion of words to phonemes is far from regular in English and is very context dependent. Some languages are better than others in this respect. George Bernard Shaw is quoted as saying that fish could be spelt GHOTI (gh as in enough, o as in women, ti as in nation).
    The advantage of using a dictionary is that it can include information on syllables, stress, syntactic types, etc. This information is much more easily obtained from a lookup table than by algorithm; for example, stress markers and syllable boundaries are more easily identified. But there will always be some words that are not in the dictionary, however large one makes it.
  • 28.
  • 29. Example of a function for -s removal (not definitive):
      lastchar = S:
        if prevchar = S then return                                  (SS: loss)
        else if prevtwochars = IO then remove S                      (IOS: folios)
        else if prevchar = vowel but not E then return               (AS: alias)
        else if prevchar = E then
          remove S
          if prevprevchar = I then replace IE by Y                   (IES -> Y: parties)
          else if prevprevchar = H and prevprevprevchar ≠ T then delete E
                                                                     (HES but not THES: batches / bathes)
          else if (prevprevchar = S and prevprevprevchar = S)
               or (prevprevchar = Z and prevprevprevchar = Z) then delete E
                                                                     (SSES / ZZES: losses / buzzes)
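The pseudocode above translates almost line for line into Python. A sketch that follows it, with one added assumption: a final S after any other consonant (as in CATS) is simply removed, a case the slide does not cover. `remove_s` and `_back` are hypothetical names:

```python
VOWELS = set("AEIOU")


def _back(w, k):
    """k-th letter from the end (1 = last), or '' if the word is too short."""
    return w[-k] if len(w) >= k else ""


def remove_s(word):
    """Strip a final -S following the slide's (non-definitive) rules."""
    w = word.upper()
    if _back(w, 1) != "S":
        return w
    prev, prev2, prev3 = _back(w, 2), _back(w, 3), _back(w, 4)
    if prev == "S":                           # SS: LOSS
        return w
    if prev2 + prev == "IO":                  # IOS: FOLIOS -> FOLIO
        return w[:-1]
    if prev in VOWELS and prev != "E":        # AS: ALIAS
        return w
    if prev == "E":
        stem = w[:-1]                         # remove S
        if prev2 == "I":                      # IES -> Y: PARTIES -> PARTY
            return stem[:-2] + "Y"
        if prev2 == "H" and prev3 != "T":     # HES but not THES: BATCHES -> BATCH
            return stem[:-1]
        if (prev2, prev3) in (("S", "S"), ("Z", "Z")):  # SSES/ZZES: LOSSES -> LOSS
            return stem[:-1]
        return stem                           # e.g. BATHES -> BATHE
    return w[:-1]                             # assumption: CATS -> CAT


for w in ["loss", "folios", "alias", "parties", "batches", "bathes", "losses", "buzzes"]:
    print(w, "->", remove_s(w))
```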
  • 30.
  • 31. The JSRU system has a list of 39 suffixes, e.g. ED, NT, FUL, OUS, ALIC, IBLE, EN, etc. After a suffix has been removed, the stem is checked to see whether it is long enough (at least 3 letters) and whether its final consonant cluster is pronounceable. If the suffix could have replaced a final 'e', then this 'e' must be restored. The algorithm for deciding whether or not an 'e' has been removed is not simple, e.g. alternation -> alternate + ion, but interaction -> interact + ion. The rule here is that if the stem ends in vowel + consonant, an 'e' should be added. What about taxable? This works when 'x' is replaced by 'ks'. Having removed a suffix, the reduced word is looked up in the dictionary. If it is not found, an attempt is made to remove another suffix, e.g. wond-er-ful-ly.
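A sketch of the strip-and-look-up loop with the 'e'-restoration rule, under stated assumptions: a hypothetical five-suffix subset of the JSRU list, a toy dictionary, 'e' restored only after vowel-initial suffixes (FUL and LY cannot have replaced an 'e'), and no pronounceability check on the final consonant cluster:

```python
VOWELS = set("AEIOU")
# A hypothetical subset of the 39 JSRU suffixes, longest first.
SUFFIXES = ("ABLE", "ION", "FUL", "ED", "LY")


def restore_e(stem, suffix):
    """Re-add a final E if a vowel-initial suffix may have replaced it.

    Slide rule: add E when the stem ends in vowel + consonant, reading X as KS.
    """
    if suffix[0] not in VOWELS:
        return stem  # assumption: consonant-initial suffixes never replace E
    spelt = stem.replace("X", "KS")
    if len(spelt) >= 2 and spelt[-2] in VOWELS and spelt[-1] not in VOWELS:
        return stem + "E"
    return stem


def strip_suffixes(word, dictionary, max_suffixes=4):
    """Remove suffixes until the stem is found; returns (stem or None, suffixes)."""
    word = word.upper()
    removed = []
    while word not in dictionary and len(removed) < max_suffixes:
        for suffix in SUFFIXES:
            stem = word[:-len(suffix)]
            if word.endswith(suffix) and len(stem) >= 3:  # stem of at least 3 letters
                word = restore_e(stem, suffix)
                removed.insert(0, suffix)
                break
        else:
            return None, removed  # nothing to strip: fall back to letter-to-sound rules
    return (word if word in dictionary else None), removed


LEXICON = {"ALTERNATE", "INTERACT", "TAX", "WONDER"}  # toy dictionary
for w in ["alternation", "interaction", "taxable", "wonderfully"]:
    print(w, strip_suffixes(w, LEXICON))
```

Without the X to KS substitution, TAX would be treated as vowel + consonant and wrongly become TAXE, which is the taxable problem the slide raises.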
  • 32.
  • 33.
  • 34. For example, {"A", "", "consY>", "ai"} lazy {"OUGH", "", "", "ou"} bough {"AUGH", "", "", "aw"} daughter {"C", "", "E", "S"} lace {"PH", "", "", "F"} phase What about laughter? What about rough, cough, though?
  • 35. The order of the rules is important, e.g.
    {"Q", "", "", "KW"}: freeqent -> freekwent
    {"E", "cons", "Q", "EE"}: freqent -> freeqent
    {"QU", "", "", "Q"}: frequent -> freqent
    N.B. Every rule is tried at every position in the word, so the time taken depends very much on the number of rules. Special procedures need to be called in some cases:
    {"vowel", "", "consE", "@001"} magic 'e' procedure
    {"E", "", ">", "@002"} final 'e' procedure
    {"@003", "", "", ""} double letter procedure
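A minimal sketch of an ordered, context-sensitive rule scan built from the slide 34 examples. It assumes rules are written {letters, left context, right context, phonemes}, as the examples suggest; that `cons` matches one consonant (Y excluded) and `>` the end of the word; and that a letter with no matching rule stands for itself, a placeholder for the full rule set. The `@` procedures are not modelled. All names are hypothetical:

```python
import re

VOWELS = set("AEIOU")
CONSONANTS = set("BCDFGHJKLMNPQRSTVWXZ")

# (letters, left context, right context, phonemes), tried in this order
RULES = [
    ("A", "", "consY>", "ai"),  # lazy
    ("OUGH", "", "", "ou"),     # bough
    ("AUGH", "", "", "aw"),     # daughter
    ("C", "", "E", "S"),        # lace
    ("PH", "", "", "F"),        # phase
]
TOKEN = re.compile(r"cons|vowel|>|[A-Z]")


def _letter_ok(token, ch):
    if token == "cons":
        return ch in CONSONANTS
    if token == "vowel":
        return ch in VOWELS
    return ch == token


def _right_ok(context, word, i):
    """Match context tokens forwards from word[i]; '>' means end of word."""
    for token in TOKEN.findall(context):
        if token == ">":
            if i != len(word):
                return False
        elif i >= len(word) or not _letter_ok(token, word[i]):
            return False
        else:
            i += 1
    return True


def _left_ok(context, word, i):
    """Match context tokens backwards from word[i - 1] (no boundary marker)."""
    for token in reversed(TOKEN.findall(context)):
        i -= 1
        if i < 0 or not _letter_ok(token, word[i]):
            return False
    return True


def letters_to_sounds(word):
    word = word.upper()
    out, i = [], 0
    while i < len(word):
        for letters, left, right, sounds in RULES:
            end = i + len(letters)
            if (word.startswith(letters, i) and _left_ok(left, word, i)
                    and _right_ok(right, word, end)):
                out.append(sounds)
                i = end
                break
        else:
            out.append(word[i].lower())  # placeholder default: letter stands for itself
            i += 1
    return out


for w in ["lazy", "lace", "bough", "phase", "daughter", "laughter"]:
    print(w, letters_to_sounds(w))
```

With only these rules, laughter comes out with the vowel of daughter, which is exactly the difficulty slide 34 points to.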
  • 36. 5. Replace the prefixes and suffixes, in their phonetic form of course, which can be obtained from lookup tables. This may result in some adjustments, which are handled by the second set of rules, e.g. {"c", "", "i", "s"} precious; {"c", "", "", "k"} practically; {"ig", "", "n-", "ie"} assignment.
  • 37. 6. Perform stress assignment by applying the stress rules, which can be quite complicated. Make adjustments to the final pronunciation, e.g. reduction of some unstressed vowels. The stress rules are based on the MIT rules (From Text to Speech: The MITalk System, Allen et al.). In the first (cyclic) phase, several rules are applied in sequence, first to the stem and then to the stem + suffixes taken one at a time. Prefixes are considered part of the stem. Some affixes (e.g. -ING) do not affect the stress pattern, so the rules are omitted for them, while others (e.g. -ION) force stress onto a particular vowel, usually the one before the affix. In the second (non-cyclic) phase, one vowel is selected for main stress and all others are reduced to secondary or no stress.
  • 38. Stress Rules. Main Stress Rule (cyclic):
    1. $V \rightarrow [\text{1-stress}] \;/\; X \,\text{--}\, C_0 \,\{[\text{short } V]\, C_0^1 \,/\, V\}\, \{[\text{short } V]\, C_0 \,/\, V\}$
    where X contains all prefixes and '--' indicates the position of the vowel to be stressed; $C_0^1$ matches zero or one consonant; $C_0$ matches any number of consonants (including none); {..} denotes a list of alternative patterns separated by slashes.
    Assign 1-stress (primary stress) to the vowel in a syllable which precedes a weak syllable followed by a morph-final syllable containing a short vowel and zero or more consonants, e.g. difficult -> d"ifikalt.
  • 39. $V \rightarrow [\text{1-stress}] \;/\; X \,\text{--}\, C_0 \,\{[\text{short } V]\, C_0^1 \,/\, V\}\, \{[\text{short } V]\, C_0 \,/\, V\}$, where X contains all prefixes and '--' indicates the position of the vowel to be stressed.
    Assign 1-stress to the vowel in a syllable preceding a vowel followed by a morph-final syllable containing a short vowel and zero or more consonants, e.g. secretariat -> sekret"eir ee aat.
    Assign 1-stress to the vowel in a syllable preceding a vowel followed by a morph-final vowel, e.g. oratorio -> orat"oar ee oa.
  • 40. 2. $V \rightarrow [\text{1-stress}] \;/\; X \,\text{--}\, C_0 \,\{[\text{short } V]\, C_0 \,/\, V\}$, where X contains all prefixes.
    Assign 1-stress to the vowel in a syllable preceding a short vowel and zero or more consonants, e.g. edit -> "ed it, bitumen -> bity"uum en, agenda -> aj"en da.
    3. $V \rightarrow [\text{1-stress}] \;/\; X \,\text{--}\, C_0$, where X contains all prefixes.
    Assign 1-stress to the vowel in the last syllable, e.g. stand -> st"aand, parole -> paar"oal.
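The three cases of the Main Stress Rule are tried in order, and the first one that matches wins. A much-simplified sketch of that ordering, assuming each morph is hand-encoded as a list of vowels with a short/long flag and the number of consonants that follow each vowel; prefixes (X), cyclic application and the other stress rules are ignored. The encoding and all names are our own:

```python
from typing import List, Tuple

# One entry per vowel of the morph: (symbol, is_short, consonants that follow it
# before the next vowel or the end of the morph). Prefixes are assumed stripped.
Vowel = Tuple[str, bool, int]


def _weak(v: Vowel) -> bool:
    """Middle slot of rule 1: [short V] C_0^1, or a vowel followed directly by a vowel."""
    _, short, cons = v
    return (short and cons <= 1) or cons == 0


def _final(v: Vowel) -> bool:
    """Last slot of rules 1 and 2: [short V] C_0, or a morph-final vowel."""
    _, short, cons = v
    return short or cons == 0


def main_stress(vowels: List[Vowel]) -> int:
    """Index of the vowel that receives 1-stress; rules are tried in order 1, 2, 3."""
    n = len(vowels)
    if n >= 3 and _weak(vowels[-2]) and _final(vowels[-1]):
        return n - 3  # rule 1: antepenultimate vowel
    if n >= 2 and _final(vowels[-1]):
        return n - 2  # rule 2: penultimate vowel
    return n - 1      # rule 3: last vowel


difficult = [("i", True, 1), ("i", True, 1), ("a", True, 2)]
bitumen = [("i", True, 1), ("uu", False, 1), ("e", True, 1)]
parole = [("aa", True, 1), ("oa", False, 1)]
print(main_stress(difficult), main_stress(bitumen), main_stress(parole))  # 0 1 1
```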
  • 41. Stressed Syllable Rule (cyclic); Alternating Stress Rule (cyclic); Destressing Rule (non-cyclic); Compound Stress Rule (non-cyclic); Strong First Syllable Rule (non-cyclic); Cursory Rule (non-cyclic); Vowel Reduction Rule (non-cyclic).
  • 42. After stress assignment, a third set of rules is applied, e.g. {"41r", "", "cons", "41"}: far gone versus far away (41 = schwa); {"ir", "", "cons", "er"}: dirt versus direct. Some unstressed vowels are also reduced, e.g. aa -> a, ai -> i, u -> a, o -> a: bottom => "bot-am.
  • 43.
  • 44. PROSODY TASK. This task adds intonation and timing to the phonetic text. It processes complete breath groups. The output consists of a list of phonemes with their corresponding pitch and duration.
  • 45. In No vem ber the reg ion’s wea ther was un us ually dry . Do you want to tra vel to Lon don? Who is the Prime Min ister of the Ba ha mas? Lift the safe ty cov er and press the red but ton.
  • 46.
  • 47. Sentence stress. This refers to the way in which one word in a sentence is singled out as the focus, or nuclear stress. Examples:
    What time is the next train to London?
    What time is the next train to London?
    What time is the next train to London?
    There’s John cycling down the road.
  • 48. Rhythm. Rhythm is about how we time the delivery of a sentence. In English, stressed syllables are supposed to occur at equal intervals of time, so English is said to be a stress-timed language:
    The stressed syll-a-bles in Eng-lish o-ccur e-qui-dis-tant-ly in time.
    French is said to be a syllable-timed language:
    Les encyclopédies électroniques sont pleines d’informations. ("Electronic encyclopedias are full of information.")
    (D. Crystal, 1995)
  • 49.
  • 50. Projects represent a substantial part of your marks and you should spend some time choosing a project which will show you in your best light.
    Phrased: Projects represent // a substantial part // of your marks // and you should spend // some time choosing // a project // which will show // you in your best light.
    Stress-marked: "Pro-jects rep-res-"ent // a sub-"stant-ial "part // of your "marks // and you should "spend // some "time "choos-ing // a "pro-ject // which will "show // you in your "best "light.
    Syllable weights: unstressed 1; secondary stress 2; primary stress 3; emphatic stress 4.
  • 51.
  • 53. Baseline. When all syllable patterns have been calculated, they are superimposed on a falling baseline. The value at the end of each phrase (marked by a comma) is half that at the beginning, and the starting value for the next phrase is 1.2 times the final value of the previous one.
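The two numbers on this slide (fall to one half, restart at 1.2 times) fully determine the baseline's end points. A sketch that samples it, assuming a linear fall within each phrase (the slide does not give the shape) and an arbitrary starting value; `falling_baseline` is a hypothetical name:

```python
def falling_baseline(start, n_phrases, points=5):
    """One list of `points` baseline values per phrase.

    Each phrase falls linearly to half its starting value; the next phrase
    starts at 1.2 times the previous phrase's final value.
    """
    if points < 2:
        raise ValueError("need at least two points per phrase")
    phrases = []
    for _ in range(n_phrases):
        end = start / 2
        step = (end - start) / (points - 1)
        phrases.append([start + i * step for i in range(points)])
        start = 1.2 * end
    return phrases


for phrase in falling_baseline(100.0, 3):
    print([round(v, 1) for v in phrase])
```

Each phrase therefore starts at 0.6 times the previous phrase's starting value, so the whole utterance drifts downwards.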
  • 54.
  • 55. LOWER PHONETIC TASK. This task converts the output from the PROSODY TASK into control parameters for the hardware synthesiser. The table for each phonetic element contains information on how transitions between target values are calculated. For each parameter in the table there are entries for:
    its target value;
    the proportion of the target of the dominated element used in deriving the boundary value;
    a fixed contribution to the boundary value;
    the transition duration within the dominant element;
    the transition duration within the dominated element.
    Element  Target  Fixed  Proportion  Internal  External  Rank
    W           760    380          50         4         4     8
    E          1900    950          50         3         3     2
    (Fixed = fixed contribution; Proportion in %; Internal and External = internal and external durations in frames.)
  • 56. The task sends a frame of parameters to the synthesiser every 10 ms. The boundary value is calculated as: fixed contribution of the dominant element + target of the dominated element * proportion of the dominant element, i.e. 380 + 1900 * 0.5 = 1330.
    [Figure: transition from the W target (760 Hz) through the W|E boundary value (1330 Hz) to the E target (1900 Hz); internal duration 4 frames, external duration 4 frames; table as on slide 55.]
  • 57. DH A W E (one parameter, values in Hz):
    Element  Target  Fixed  Proportion  Internal  External
    DH         1300    650          50         2         5
    A          1420    710          50         3         3
    W           760    380          50         4         4
    E          1900    950          50         3         3
    L           880    440          50         0         6
    Z          1720    860          50         2         5
    Ranks: DH 6, A 2, W 8, E 2.
    Target values: DH 1300, A 1420, W 760, E 1900.
    Boundary values: DH|A = 650 + 1420 * 0.5 = 1360; A|W = 380 + 1420 * 0.5 = 1090; W|E = 380 + 1900 * 0.5 = 1330.
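The boundary-value rule can be checked against the DH A W E example. A sketch, assuming (as the worked numbers imply) that the higher-ranked element of each pair dominates, with ranks read from the slide's table as DH 6, A 2, W 8, E 2, and a tie-break to the left-hand element that the slides do not specify. The `Element` class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    target: float      # target value for this parameter (Hz)
    fixed: float       # fixed contribution to a boundary value
    proportion: float  # % of the dominated element's target that is added
    rank: int          # assumption: the higher-ranked element dominates


def boundary_value(left, right):
    """Boundary value between two adjacent elements for one parameter."""
    # Assumption: on equal rank the left-hand element is taken as dominant.
    dominant, dominated = (left, right) if left.rank >= right.rank else (right, left)
    return dominant.fixed + dominated.target * dominant.proportion / 100


def boundaries(elements):
    return [boundary_value(a, b) for a, b in zip(elements, elements[1:])]


DH = Element("DH", 1300, 650, 50, 6)
A = Element("A", 1420, 710, 50, 2)
W = Element("W", 760, 380, 50, 8)
E = Element("E", 1900, 950, 50, 2)

print(boundaries([DH, A, W, E]))  # [1360.0, 1090.0, 1330.0]
```

Interpolating between these boundary values and the targets over the internal and external durations would give the per-frame values sent every 10 ms; that step is not sketched here.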
  • 58. Tomorrow will be starting off grey and rather murky. :"to-ma-rou /wil /bey :"star-ting /of :"grai /and :"raa-dha :"mer-kee.
  • 59. T 9 37 TY 2 37 TZ 3 37 O 10 37 M 9 35 A 4 35 R 4 34 OU 7 34 OB 7 33 W 4 34 I 5 34 LP 4 32 B 5 33 BY 1 33 EY 5 33 S 11 36 T 6 35 TY 2 35 AR 14 35 T 6 31 TY 2 31 I 5 31 NG 9 30 O 6 32 F 6 31 G 6 34 GY 2 34 R 2 34 AI 12 34 AJ 9 32 A 4 32 N 6 31 D 2 30 DY 1 30 R 7 33 AA 10 32 DH 7 31 A 5 30 M 11 32 ER 14 32 K 6 25 KY 2 23 I 10 23 QQ 51 22 Q 42 21
  • 60. * Output from SYNTH:
    *    fn  alf   f1   a1   f2   a2   f3   a3  ahf    s   f0
    * T 9 37 0   Frame 0
        250    6  190    6 1780    6 2680    6    6    1   37
        250   10  190   10 1780   10 2680   10   10    1   37
        250   13  190   13 1780   13 2680   13   13    1   37
        250   15  190   15 1780   15 2680   15   15    1   37
        250   15  190   15 1780   15 2680   15   15    1   37
        250   15  190   15 1780   15 2680   15   15    1   37
        250   13  190   13 1780   13 2680   13   13    1   37
        250   10  190   10 1780   10 2680   10   10    1   37
        250    6  190    6 1780    6 2680    6    6    1   37
    * TY 2 37 0   Frame 9
        250   15  190   15 1780   40 2680   51   58    1   37
        250   15  190   15 1780   40 2680   51   58    1   37
    * TZ 3 37 0   Frame 11
        250    6  190    6 1780   23 2680   28   40    1   37
        250   10  190   10 1780   27 2680   32   44    1   37
  • 61. Audio examples: BT 1, BT 2; Spruce 1, Spruce 2; L&H 1, L&H 2; AT&T 1, AT&T 2; Festival 1, Festival 2; Mbrola 1, Mbrola 2.