MAJOR PROJECT
Supervisor: K. Rajalakshmi
Submitted by: Tanya Saxena (10503894), Abhinav Mathur (10503858)
Video has become one of the most popular multimedia formats on PCs and
the Internet. In most videos, the sound carries an important part of the
content. It is therefore essential to make the audio of a video
understandable to people with hearing impairments as well as to people
with a limited command of the spoken language. The most natural way to do
this is through subtitles.
However, creating subtitles manually is a long and tedious activity that
requires the user's presence throughout. Consequently, automatic subtitle
generation is a valid subject of research.
PROBLEM STATEMENT...
The system should take a video file as input and generate a subtitle file (srt/txt) as
output. The three modules are:
Audio Extraction:
The audio extraction routine is expected to return audio in a format
that the speech recognition module can use directly. It must handle a defined
list of video and audio formats. It has to verify the input file to determine
whether extraction is feasible. The audio track has to be returned in the most
reliable format.
INTRODUCTION...
Speech Recognition:
The speech recognition routine is the key part of the system: it
directly affects performance and the evaluation of results. First, it must
determine the type of the input file; if the type is provided, an appropriate
processing method is chosen. Otherwise, the routine uses a default configuration.
It must be able to recognize silences so that the text can be split into
delimited utterances.
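The slides do not say how silences are detected. As a minimal sketch, assuming simple energy thresholding over fixed-size frames of 16-bit PCM samples, the delimiting step could look like the following. The class name, the frame size and the RMS threshold are all hypothetical choices made for illustration, not the project's actual method.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: energy-based silence detection is an assumption,
// the source does not describe the technique the project uses.
final class SilenceSegmenter {

    /**
     * Splits PCM samples into runs of non-silent frames.
     * Returns pairs {startFrame, endFrameExclusive}.
     */
    static List<int[]> speechSegments(short[] samples, int frameSize, double rmsThreshold) {
        List<int[]> segments = new ArrayList<>();
        int frames = samples.length / frameSize; // trailing partial frame is ignored
        int start = -1;                          // -1 means "currently in silence"
        for (int f = 0; f < frames; f++) {
            double sumSquares = 0.0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                sumSquares += (double) samples[i] * samples[i];
            }
            double rms = Math.sqrt(sumSquares / frameSize);
            boolean voiced = rms >= rmsThreshold;
            if (voiced && start < 0) {
                start = f;                           // an utterance begins
            } else if (!voiced && start >= 0) {
                segments.add(new int[] {start, f});  // an utterance ends at a silence
                start = -1;
            }
        }
        if (start >= 0) {
            segments.add(new int[] {start, frames}); // utterance runs to the end
        }
        return segments;
    }
}
```

A frame index converts to a time in seconds as frame * frameSize / sampleRate, which gives the start and end times needed for each utterance.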
Subtitle Generation:
The subtitle generation routine creates a file and writes to it the
successive chunks of text, each corresponding to an utterance delimited by
silences, together with its start and end times. Time synchronization
is of primary importance.
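To make the output concrete, here is a small sketch of how one subtitle entry could be written in the SubRip (.srt) layout: a sequence number, a line with the start and end times as hours:minutes:seconds,milliseconds separated by " --> ", the text, and a blank line. The class and method names are hypothetical and are not taken from the project.

```java
import java.util.Locale;

// Illustrative sketch only: names are hypothetical, not the project's code.
final class SrtWriterSketch {

    /** Formats a time in milliseconds as HH:MM:SS,mmm, as used in .srt files. */
    static String formatTimestamp(long millis) {
        long hours = millis / 3_600_000;
        long minutes = (millis / 60_000) % 60;
        long seconds = (millis / 1_000) % 60;
        long ms = millis % 1_000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", hours, minutes, seconds, ms);
    }

    /** Builds one subtitle entry: index, time range, text, blank line. */
    static String formatCue(int index, long startMs, long endMs, String text) {
        return index + "\n"
                + formatTimestamp(startMs) + " --> " + formatTimestamp(endMs) + "\n"
                + text + "\n\n";
    }
}
```

Appending the result of formatCue for each recognized utterance, with indices starting at 1, would produce a file that common video players can load alongside the video.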
BENEFITS OF USING SUBTITLES....
• The major benefit is that viewers do not need to download subtitles from the
Internet in order to watch a video with subtitles.
• Captions help children with word identification, meaning, acquisition, and
retention.
• Captions can help children establish a systematic link between the written word
and the spoken word.
• Captioning has been linked to higher comprehension than watching the same
media without captions.
• Captions provide missing information for individuals who have difficulty
processing speech and the auditory components of visual media (regardless of
whether this difficulty is due to a hearing loss).
• Captioning is essential for children who are deaf or hard of hearing, can be
very beneficial to those learning English as a second language, can help those
with reading and literacy problems, and can help those who are learning to
read.
CONTINUED....
AUDIO EXTRACTION…
SPEECH RECOGNITION…
SUBTITLE GENERATION…
FFMPEG…
The FFmpeg libraries handle most of our multimedia tasks quickly and
easily, for example audio compression, audio/video format conversion, and
extracting images from a video. Developers can use FFmpeg for transcoding,
streaming, and playback. It is a very stable framework for transcoding video
and audio.
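As a rough illustration of the audio extraction step, the sketch below builds a command line for the ffmpeg command-line tool that drops the video stream and writes uncompressed PCM audio to a WAV file. The slides describe using the FFmpeg libraries, so calling the command-line program from Java is an assumption made here for brevity. The class and method names are hypothetical, and the 16 kHz mono target is an assumed setting, not one the source states.

```java
import java.util.List;

// Illustrative sketch only: the project uses the FFmpeg libraries; driving the
// ffmpeg command-line tool and the 16 kHz mono target are assumptions.
final class FfmpegCommandSketch {

    /** Builds an ffmpeg command that extracts the audio track as 16-bit PCM WAV. */
    static List<String> extractAudioCommand(String inputVideo, String outputWav) {
        return List.of(
                "ffmpeg",
                "-y",                   // overwrite the output file if it exists
                "-i", inputVideo,       // input video file
                "-vn",                  // discard the video stream
                "-acodec", "pcm_s16le", // 16-bit little-endian PCM audio
                "-ar", "16000",         // sample rate in Hz (assumed value)
                "-ac", "1",             // number of audio channels (assumed mono)
                outputWav);
    }
}
```

The returned list can be passed to new ProcessBuilder(command).inheritIO().start(), provided that the ffmpeg executable is installed and on the PATH.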
JAVA SPEECH API…
The Java Speech API allows developers to incorporate speech technology
into the user interfaces of their Java applets and applications. The
API specifies a cross-platform interface to support command-and-control
recognizers, dictation systems, and speech synthesizers. Sun has also
developed JSGF (Java Speech Grammar Format) to provide a cross-platform
grammar format for speech recognizers.
CURRENT PROBLEMS…
• Robustness.
• Automatic generation of word lexicons.
• Finding the theoretical limit for FSM implementations of ASR systems.
• Optimal utterance verification-rejection algorithms.
• Accuracy and Word Error Rate.
• Filling up missing offset samples with silence.
• Synchronization between tracks.
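Word Error Rate, named in the list above, is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the recognized text and a reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition. The class name and the whitespace tokenization are assumptions, and the slides do not say how the project measured it.

```java
import java.util.Locale;

// Illustrative sketch of the standard WER definition; not the project's code.
final class WordErrorRate {

    /** Returns (substitutions + deletions + insertions) / reference word count. */
    static double wer(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // d[i][j] = edit distance between the first i reference words
        // and the first j hypothesis words.
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1);
                d[i][j] = Math.min(best, d[i - 1][j - 1] + cost);
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }

    /** Lower-cases and splits on whitespace (a simplifying assumption). */
    private static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```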
• All MPEG standard formats, such as MP2 and MP3, are supported for
audio/video.
• Audio of any format can be extracted, but speech recognition is done
only in English.
• The text extracted from the audio/video is written in the .srt format. The
displayed text is in a readable form.
• Captions appear on-screen long enough to be read. It is preferable to
limit on-screen captions to no more than two lines. Captions are
synchronized with the spoken words.
• The user can convert the extracted audio to any suitable format supported
under the MPEG standards.
• System Requirements – The software is compatible with all operating
systems. The user needs to install the software's .exe file on their
PC.
• Security – The system has no security constraints.
• Performance – The text is synchronized with the audio.
• Maintainability – The software is easy to maintain.
• Reliability – The software will provide a good level of precision.
• Modifiability – The software cannot be modified by an external user.
• Scalability – The software is scalable, as a number of users can use it
simultaneously.
MP3 ALGORITHM…
1. Initialize i = 0 and j = 1.
2. Set tincr = 1.0 / sample_rate.
3. Set dstp = dst and c = 2 * M_PI * 440.0.
4. Generate a sine tone with a frequency of 440 Hz, duplicated across the channels, as follows.
5. Check whether i < nb_samples. If true, generate the sine wave sample and store it:
*dstp = sin(c * *t).
6. Check whether j < nb_channels.
7. If true, store the sample in the destination buffer for channel j.
8. Increment dstp += nb_channels and t += tincr.
9. Repeat until the dst buffer is filled with nb_samples samples, generated starting from t.
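The steps above can be sketched in Java as follows. This is an illustrative translation, not the project's code: the pointer dstp becomes an array index, the time t is passed in and returned because Java has no pointer arguments, and the class and method names are hypothetical.

```java
// Illustrative sketch only: a Java rendering of the numbered steps above.
final class SineToneFiller {

    static final double FREQUENCY_HZ = 440.0;

    /**
     * Fills dst with nbSamples samples of a 440 Hz sine tone, interleaved and
     * duplicated across nbChannels channels, starting at time t (in seconds).
     * Returns the time just after the last generated sample.
     */
    static double fillSamples(double[] dst, int nbSamples, int nbChannels,
                              int sampleRate, double t) {
        double tincr = 1.0 / sampleRate;           // time step between samples
        double c = 2 * Math.PI * FREQUENCY_HZ;     // angular frequency
        int pos = 0;                               // plays the role of dstp
        for (int i = 0; i < nbSamples; i++) {
            double sample = Math.sin(c * t);
            for (int j = 0; j < nbChannels; j++) {
                dst[pos + j] = sample;             // same value in every channel
            }
            pos += nbChannels;
            t += tincr;
        }
        return t;
    }
}
```

The destination array must hold at least nbSamples * nbChannels values. Passing the returned time into the next call keeps the tone continuous across buffers.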
MFCC (MEL FREQUENCY CEPSTRAL COEFFICIENT)
Check the delta frequency, which is the ratio between the sample rate and the
number of FFT points.
if (deltaFreq == 0) {
    System.err.println("deltaFreq has zero value");
}
Check whether the left and right boundaries of the filter are too close.
if ((Math.round(rightEdge - leftEdge) == 0)
        || (Math.round(centerFreq - leftEdge) == 0)
        || (Math.round(rightEdge - centerFreq) == 0)) {
    throw new IllegalArgumentException("Filter boundaries too close");
}
Find how many frequency bins fit in the current frequency range.
numberElementsWeightField = (int) Math.round((rightEdge - leftEdge) / deltaFreq + 1);
Initialize the weight field.
if (numberElementsWeightField == 0) {
    throw new IllegalArgumentException("Number of elements in mel is zero.");
}
weight = new double[numberElementsWeightField];
CONTINUED…
Compute the height of the triangular filter.
filterHeight = 2.0f / (rightEdge - leftEdge);
Now compute the slopes based on the height.
leftSlope = filterHeight / (centerFreq - leftEdge);
rightSlope = filterHeight / (centerFreq - rightEdge);
Now compute the weight for each frequency bin.
for (currentFreq = initialFreq, indexFilterWeight = 0;
        currentFreq <= rightEdge;
        currentFreq += deltaFreq, indexFilterWeight++) {
    if (currentFreq < centerFreq) {
        weight[indexFilterWeight] = leftSlope * (currentFreq - leftEdge);
    } else {
        weight[indexFilterWeight] = filterHeight + rightSlope * (currentFreq - centerFreq);
    }
}
Convert linear frequency to mel frequency.
private double linToMelFreq(double inputFreq) {
    return 2595.0 * (Math.log(1.0 + inputFreq / 700.0) / Math.log(10.0));
}
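The fragments above can be gathered into one self-contained sketch. The arithmetic follows the slide code, but the class name and the method signature are assumptions, and initialFreq is treated as the frequency of the first bin because the slides do not define it. The loop is indexed by bin number rather than by accumulating currentFreq, a small change that avoids writing past the end of the array.

```java
// Illustrative sketch only: the structure and names are assumptions built
// around the fragments shown on the slides.
final class MelFilterSketch {

    /** Converts a linear frequency in Hz to the mel scale. */
    static double linToMelFreq(double inputFreq) {
        return 2595.0 * (Math.log(1.0 + inputFreq / 700.0) / Math.log(10.0));
    }

    /** Computes the weights of one triangular filter, one weight per frequency bin. */
    static double[] triangularWeights(double leftEdge, double centerFreq, double rightEdge,
                                      double initialFreq, double deltaFreq) {
        if (deltaFreq == 0) {
            throw new IllegalArgumentException("deltaFreq has zero value");
        }
        if (Math.round(rightEdge - leftEdge) == 0
                || Math.round(centerFreq - leftEdge) == 0
                || Math.round(rightEdge - centerFreq) == 0) {
            throw new IllegalArgumentException("Filter boundaries too close");
        }
        int count = (int) Math.round((rightEdge - leftEdge) / deltaFreq + 1);
        double[] weight = new double[count];

        double filterHeight = 2.0 / (rightEdge - leftEdge);           // triangle of unit area
        double leftSlope = filterHeight / (centerFreq - leftEdge);    // rising side
        double rightSlope = filterHeight / (centerFreq - rightEdge);  // falling side, negative

        for (int k = 0; k < count; k++) {
            double currentFreq = initialFreq + k * deltaFreq;
            if (currentFreq > rightEdge) {
                break; // remaining bins lie outside the filter and stay at zero
            }
            if (currentFreq < centerFreq) {
                weight[k] = leftSlope * (currentFreq - leftEdge);
            } else {
                weight[k] = filterHeight + rightSlope * (currentFreq - centerFreq);
            }
        }
        return weight;
    }
}
```

For a filter spanning 0 to 200 Hz with its center at 100 Hz and bins every 50 Hz, the weights rise linearly to the filter height at the center and fall back to zero at the right edge.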
Risk ID | Classification | Description of Risk | Risk Area | Probability | Impact | RE (P*I)
1 | Product Engineering | Word Error Rate | Performance | L | H | M
2 | Product Engineering | Aliasing | Performance | M | M | M
3 | Development Environment | Bitrate of extracted audio more than that of input audio | Testing Environment | L | L | L
4 | Product Engineering | Accuracy and Speed | Performance | L | H | M
5 | Program Constraint | Format not recognized | External Input | L | H | M
Risk ID | Description of Risk | Risk Area | Mitigation
1 | Word Error Rate | Performance | Having an efficient database (training set).
2 | Aliasing | Performance | Resampling the samples at a fixed frequency.
3 | Bitrate of extracted audio more than that of input audio | Testing Environment | Encode and decode audio at the bitrate of the input audio.
4 | Accuracy and Speed | Performance | Synchronization
5 | Format not recognized | External Input | Input audio/video supported by MPEG standard formats.
Test Case ID | Input | Expected Output | Status
1.1 | File.mp3 | File.mp3 | Pass
1.2 | File.mp4 | File.mp3 | Pass
1.3 | File.mp2 | File.mp3 | Pass
1.4 | File.au | File.au | Pass
1.5 | File.aac | File.aac | Pass
1.6 | File.wav | File.wav | Pass
1.7 | File.flac | File.flac | Pass
1.8 | File.wma (format not supported by MPEG standards) | File.wma | Fail
1.9 | File.als (format not supported by MPEG standards) | File.als | Fail
2.1 | File.wav (Words present in the dictionary) | Speech Recognized. Text Printed. | Pass
2.2 | File.mp3 (not a .wav file) | Speech Recognized. Text Printed. | Fail
2.3 | File.au (not a .wav file) | Speech Recognized. Text Printed. | Fail
2.4 | File.flac (not a .wav file) | Speech Recognized. Text Printed. | Fail
2.5 | File.wav (Words not found in the Dictionary) | Speech Recognized. Text Printed. | Fail
3.1 | File.srt (Incorrect Timecode) | Subtitles generated but synchronized with the video | Fail
3.2 | File.srt (Correct Timecode), File.avi | Subtitles generated and synchronized with the video file File.avi | Pass
3.3 | File.txt (not containing the Timecode) | Subtitles generated and synchronized with the video | Fail
3.4 | File.srt (Correct Timecode), File.mp4 | Subtitles generated and synchronized with the video file File.mp4 | Pass
3.5 | File.srt (Correct Timecode), File.wma | Subtitles generated and synchronized with the video file | Pass
CYCLOMATIC COMPLEXITY…
AUDIO EXTRACTION…
CC = E - N + 2
where
E = number of edges (80)
N = number of nodes (72)
CC = 80 - 72 + 2 = 10
CYCLOMATIC COMPLEXITY…
SPEECH RECOGNITION…
CC = E - N + 2
where
E = number of edges (98)
N = number of nodes (91)
CC = 98 - 91 + 2 = 9
Test Case ID | Components | Debugging Technique
1.8 | Audio Extraction | Backtracking Debugging
1.9 | Audio Extraction | Backtracking Debugging
2.2 | Speech Recognition | Backtracking Debugging
2.3 | Speech Recognition | Backtracking Debugging
2.4 | Speech Recognition | Backtracking Debugging
2.5 | Speech Recognition | Print Debugging
3.1 | Subtitles Generation | Print Debugging
3.3 | Subtitles Generation | Backtracking Debugging
Test Case ID | Input | Expected Output | Status
1.8 | File.au (format supported by MPEG standards) | File.au | Pass
1.9 | File.mp4 (format supported by MPEG standards) | File.mp3 | Pass
2.2 | File.wav | Speech Recognized. Text Printed. | Pass
2.3 | File.wav | Speech Recognized. Text Printed. | Pass
2.4 | File.wav | Speech Recognized. Text Printed. | Pass
2.5 | File.wav (Words found in the Dictionary) | Speech Recognized. Text Printed. | Pass
3.1 | File.srt (Correct Timecode) | Subtitles generated and synchronized with the video | Pass
3.3 | File.srt | Subtitles generated and synchronized with the video | Pass
DETAILED STUDY OF INPUT AND EXTRACTED FILES…
S. No. | Input File | Size Before Audio Extraction (MB) | Bitrate Before Audio Extraction (kbps) | Size After Audio Extraction (MB) | Bitrate After Audio Extraction (kbps) | Length of the input/output file (min:sec) | Time Taken for Extraction (in ms) | Reduction Rate
1 | Despicable.avi | 10.8 | 1628 | 8.24 | 1411 | 00:49 | 0.6 | 24%
2 | Time.mp4 | 48.1 | 1663 | 44.4 | 1536 | 04:02 | 3.12 | 8%
3 | Florida.mp4 | 76 | 2723 | 39.3 | 1411 | 03:54 | 1.08 | 48%
4 | International.mp4 | 79.1 | 2673 | 41.7 | 1411 | 04:08 | 1.3 | 47%
5 | Justin.mp4 | 43.2 | 1615 | 41 | 1536 | 03:44 | 1.54 | 5%
6 | Love.mp4 | 67.1 | 2112 | 44.8 | 1411 | 04:26 | 1.98 | 33%
7 | Jojo.avi | 61.8 | 2183 | 39.9 | 1411 | 03:57 | 1.86 | 35%
8 | Baby.mp4 | 43.2 | 1615 | 41 | 1536 | 03:44 | 3.34 | 5%
9 | Never.mp4 | 52.5 | 1657 | 48.5 | 1536 | 04:25 | 2.15 | 8%
10 | Beep.avi | 51.4 | 1628 | 38.4 | 1411 | 03:48 | 01:58 | 25%
Average | | 53.3 | 1950 | 38.7 | 1461 | 03:41 | 1.71 | 24%
COMPARISON BETWEEN THE SIZE OF THE INPUT FILE AND THE
EXTRACTED FILE
[Bar chart: size of file (in MB) for each input file (.mp4/.avi); series: Size Before Extraction (MB) and Size After Extraction (MB)]
From the above graph we can observe that the size of each input file is reduced once
the audio has been extracted from the input video. The maximum reduction in file
size is 48% and the minimum is 5%, giving an average reduction rate of 24%.
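The slides do not give the formula for the reduction rate. The tabulated values are consistent with (size before - size after) / size before, so the sketch below assumes that definition. The class and method names are hypothetical.

```java
// Illustrative sketch only: the formula is inferred from the table values,
// it is not stated in the source.
final class ReductionRate {

    /** Returns the size reduction as a percentage of the input size. */
    static double reductionPercent(double inputSizeMb, double extractedSizeMb) {
        if (inputSizeMb <= 0) {
            throw new IllegalArgumentException("input size must be positive");
        }
        return 100.0 * (inputSizeMb - extractedSizeMb) / inputSizeMb;
    }
}
```

For example, an input of 10.8 MB and an extracted file of 8.24 MB give about 23.7%, which rounds to the 24% listed for the first file.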
COMPARISON BETWEEN THE BITRATE OF THE INPUT FILE AND THE
EXTRACTED FILE
[Bar chart: bitrate (in kbps) for each input file (.mp4/.avi); series: Bitrate Before Extraction (kbps) and Bitrate After Extraction (kbps)]
The bitrates of the input files range from 1615 kbps to 2723 kbps. The bitrates
of the extracted files are lower, ranging from 1411 kbps to 1536 kbps, with an
average of 1461 kbps.
TIME TAKEN FOR EXTRACTION OF INPUT FILE
[Bar chart: time taken for extraction (in ms) for each input file (.mp4/.avi)]
The time taken to extract each file varies from 0.6 ms to 3.34 ms, with an average
extraction time of 1.71 ms.
• The ASG (automatic subtitle generation) system aims to generate the text for
the input audio/video automatically.
• It supports all the MPEG standards.
• The video and subtitles are synchronized.
• The user can extract audio in any MPEG standard format.
• Audio of any format can be extracted, but speech recognition is done only in English.
[1] B. H. Juang; L. R. Rabiner, “Hidden Markov Models for Speech Recognition” Journal of
Technometrics, Vol.33, No. 3. Aug., 1991.
[2] Hong Zhou and Changhui Yu , “Research and design of the audio coding scheme ,” IEEE
Transactions on Consumer Electronics, International Conference on Multimedia
Technology(ICMT) 2011.
[3] Seymour Shlien,”Guide to MPEG-1 Audio Standard”, Broadcast Technology, IEEE
Transactions on Broadcasting, December 1994.
[4] Justin Burdick, “Building a Regionally Inclusive Dictionary for Speech Recognition”,
Computer Science and Linguistics, Spring 2004.
[5] Anand Vardhan Bhalla, Shailesh Khaparkar, “Performance Improvement of Speaker
Recognition System”,International Journal of Advanced Research in Computer Science
and Software Engineering, Volume 2, Issue 3, March 2012.
[6] Petr Pollak, Martin Behunek, “Accuracy of MP3 Speech Recognition Under Real-World
Conditions”, Electrical Engineering, Czech Technical University in Prague, Technick´a 2.
REFERENCES…
[7] Yu Li, LingHua Zhang, “Implementation and Research of Streaming Media System and
AV Codec Based on Handheld Devices” 12th IEEE International Conference on
Communication Technology (ICCT), 2010.
[8] Ibrahim Patel1 Dr. Y. Srinivas Rao, “Speech Recognition Using HMM with MFCC- An
Analysis using Frequency Spectral Decomposition Technique”, Signal & Image
Processing: An International Journal(SIPIJ), Vol.1, No.2, December 2010.
[9] Jorge Martinez, Hector Perez, Enrique Escamilla, Masahisa Mabo Suzuki,” Speaker
recognition using Mel Frequency Cepstral Coefficients (MFCC) and Vector Quantization
(VQ) Techniques”, 22nd International Conference on Electrical Communications and
Computers (CONIELECOMP), 2012.
[10] Sadaoki Furui, Li Deng, Mark Gales,Hermann Ney, and Keiichi Tokuda,, ” Fundamental
Technologies in Modern Speech Recognition”, Signal Processing, IEEE Signal Processing
Society, November 2012.
[11] Youhao Yu “Research on Speech Recognition Technology and Its Application”,
Electronics and Information Engineering, International Conference on Computer
Science and Electronics Engineering, 2012.
CONTINUED…
PUBLICATION…
Abhinav Mathur, Tanya Saxena, “Generating Subtitles Automatically using
Audio Extraction and Speech Recognition”, 7th International Conference on
Contemporary Computing (IC3), 2014. (Under Review).
Automatic subtitle generation

%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 

Automatic subtitle generation

  • 1. MAJOR PROJECT. Supervisor: K. Rajalakshmi. Submitted by: Tanya Saxena (10503894), Abhinav Mathur (10503858).
  • 2. PROBLEM STATEMENT… Video has become one of the most popular multimedia formats on PCs and the Internet, and in most videos the sound carries essential information. It is therefore important to make the audio content of a video accessible to people with hearing impairments and to people who are not fluent in the spoken language. The most natural way to do so is with subtitles. However, creating subtitles manually is slow, tedious work that requires a person to be present throughout. Automatic subtitle generation is therefore a worthwhile subject of research.
  • 3. INTRODUCTION… The system takes a video file as input and generates a subtitle file (.srt/.txt) as output. It has three modules. Audio Extraction: the audio extraction routine returns audio in a format that the speech recognition module can use. It must handle a defined list of video and audio formats, verify the input file to determine whether extraction is feasible, and return the audio track in the most reliable format.
  • 4. Speech Recognition: the speech recognition routine is the key part of the system, since it directly affects performance and the evaluation of results. It first determines the type of the input file; if the type is known, it chooses an appropriate processing method, and otherwise it falls back to a default configuration. It must be able to recognize silences so that text boundaries can be established. Subtitle Generation: the subtitle generation routine creates a file and writes to it the chunks of text that correspond to utterances delimited by silences, each with its start and end time. Time synchronization is of primary importance.
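The subtitle generation step described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the `Cue` class and the method names are hypothetical, and it assumes the recognizer has already produced each utterance with start and end times in milliseconds. The output follows the usual SubRip (.srt) layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, the text, and a blank line.

```java
import java.util.List;
import java.util.Locale;

final class SrtSketch {
    /** Hypothetical holder for one utterance; times are milliseconds from the start of the video. */
    static final class Cue {
        final long startMs;
        final long endMs;
        final String text;

        Cue(long startMs, long endMs, String text) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.text = text;
        }
    }

    /** Formats a millisecond offset as an SRT timestamp, e.g. 01:02:03,004. */
    static String timestamp(long ms) {
        long hours = ms / 3_600_000;
        long minutes = (ms / 60_000) % 60;
        long seconds = (ms / 1_000) % 60;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", hours, minutes, seconds, ms % 1_000);
    }

    /** Renders the cues as the text of an .srt file, numbering them from 1. */
    static String toSrt(List<Cue> cues) {
        StringBuilder sb = new StringBuilder();
        int index = 1;
        for (Cue cue : cues) {
            sb.append(index++).append('\n')
              .append(timestamp(cue.startMs)).append(" --> ").append(timestamp(cue.endMs)).append('\n')
              .append(cue.text).append("\n\n");
        }
        return sb.toString();
    }
}
```

The returned string can then be written to a file with any standard Java file API; silence detection and recognition are outside the scope of this sketch.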
  • 5. BENEFITS OF USING SUBTITLES… The major benefit is that viewers do not need to download subtitles from the Internet to watch a video with subtitles. Captions help children with word identification, meaning, acquisition, and retention. Captions can help children establish a systematic link between the written word and the spoken word. Captioning has been linked to higher comprehension compared with watching the same media without captions.
  • 6. CONTINUED… Captions provide missing information for individuals who have difficulty processing the speech and auditory components of visual media, whether or not the difficulty is due to hearing loss. Captioning is essential for children who are deaf or hard of hearing, and it can be very beneficial to those learning English as a second language, those with reading and literacy problems, and those who are learning to read.
  • 7. H E R E C O M E S Y O U R F O O T E R  P A G E 7
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. AUDIO EXTRACTION…
  • 14. SPEECH RECOGNITION…
  • 15. SUBTITLE GENERATION…
  • 16.
  • 17. FFMPEG… The FFmpeg libraries handle most of our multimedia tasks quickly and easily, for example audio compression, audio/video format conversion, and extracting images from a video. Developers can use them for transcoding, streaming, and playback. FFmpeg is a very stable framework for transcoding video and audio.
  • 18. JAVA SPEECH API… The Java Speech API allows developers to incorporate speech technology into the user interfaces of their Java applets and applications. It specifies a cross-platform interface to support command-and-control recognizers, dictation systems, and speech synthesizers. Sun also developed JSGF (Java Speech Grammar Format) to provide a cross-platform grammar format for speech recognizers.
  • 19. CURRENT PROBLEMS… Robustness. Automatic generation of word lexicons. Finding the theoretical limit for FSM implementations of ASR systems. Optimal utterance verification-rejection algorithms. Accuracy and Word Error Rate. Filling missing offset samples with silence. Synchronization between tracks.
  • 20.
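Word Error Rate, listed among the problems above, is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below shows that standard definition; the class and method names are illustrative and are not taken from the project, and it assumes a non-empty reference with words separated by whitespace.

```java
final class WerSketch {
    /**
     * Word error rate of a hypothesis against a reference transcript.
     * Assumes the reference contains at least one word.
     */
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");

        // d[i][j] = edit distance between the first i reference words
        // and the first j hypothesis words.
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // i deletions
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // j insertions

        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }
}
```

For the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", there is one substitution and one deletion over six reference words, so the rate is 2/6.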
  • 21. All MPEG standard formats, such as MP2 and MP3, are supported for audio/video. Audio of any format can be extracted, but speech recognition is done only in English. The text extracted from the audio/video is written in the .srt format and displayed in a readable form. Captions appear on screen long enough to be read, are preferably limited to no more than two lines, and are synchronized with the spoken words. The user can convert the extracted audio to any suitable format supported under the MPEG standards.
  • 22.
  • 23. System Requirements: the software is compatible with all operating systems; the user installs it on their PC from the .exe file. Security: the system has no security constraints. Performance: the text is synchronized with the song. Maintainability: the software is easy to maintain. Reliability: the software provides a good level of precision. Modifiability: the software cannot be modified by external users. Scalability: the software is scalable, as multiple users can use it simultaneously.
  • 24.
  • 25. MP3 ALGORITHM… 1. Initialize i = 0 and j = 1. 2. Set tincr = 1.0 / sample_rate. 3. Set dstp = dst and c = 2 * M_PI * 440.0. 4. Generate a sine tone with a frequency of 440 Hz and duplicated channels, as follows. 5. While i < nb_samples, generate the sine wave sample and store it: *dstp = sin(c * *t). 6. While j < nb_channels, duplicate the sample into channel j. 7. Store the samples in the destination buffer. 8. Increment dstp += nb_channels and *t += tincr. 9. Repeat until the dst buffer is filled with nb_samples samples, generated starting from t.
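The steps above can be sketched in Java as follows. The slide's pseudocode uses C-style pointers (`dstp`, `*t`); this version is an illustrative translation that uses an array index instead and returns the updated time, so the names and signature are assumptions rather than the project's code.

```java
final class SineToneSketch {
    /**
     * Fills dst with nbSamples interleaved frames of a 440 Hz sine tone,
     * writing the same sample to every channel. Returns the time (in seconds)
     * at which the next call should continue.
     */
    static double fillSamples(double[] dst, int nbSamples, int nbChannels, int sampleRate, double t) {
        double tincr = 1.0 / sampleRate;   // time step between samples
        double c = 2 * Math.PI * 440.0;    // angular frequency of the tone
        int pos = 0;                       // plays the role of the dstp pointer
        for (int i = 0; i < nbSamples; i++) {
            double sample = Math.sin(c * t);
            for (int j = 0; j < nbChannels; j++) {
                dst[pos + j] = sample;     // duplicate the sample on each channel
            }
            pos += nbChannels;
            t += tincr;
        }
        return t;
    }
}
```

The destination array must hold at least `nbSamples * nbChannels` values; passing the returned time into the next call keeps the tone continuous across buffers.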
  • 26. MFCC (MEL FREQUENCY CEPSTRAL COEFFICIENT)
      // deltaFreq is the ratio between the sample rate and the number of FFT points; it must not be zero.
      if (deltaFreq == 0) {
          throw new IllegalArgumentException("deltaFreq has zero value");
      }
      // Check whether the left and right boundaries of the filter are too close.
      if ((Math.round(rightEdge - leftEdge) == 0)
              || (Math.round(centerFreq - leftEdge) == 0)
              || (Math.round(rightEdge - centerFreq) == 0)) {
          throw new IllegalArgumentException("Filter boundaries too close");
      }
      // Find how many frequency bins fit in the current frequency range.
      numberElementsWeightField = (int) Math.round((rightEdge - leftEdge) / deltaFreq + 1);
      // Initialize the weight field.
      if (numberElementsWeightField == 0) {
          throw new IllegalArgumentException("Number of elements in mel" + " is zero.");
      }
      weight = new double[numberElementsWeightField];
  • 27. CONTINUED…
      filterHeight = 2.0f / (rightEdge - leftEdge);
      // Compute the slopes based on the height.
      leftSlope = filterHeight / (centerFreq - leftEdge);
      rightSlope = filterHeight / (centerFreq - rightEdge);
      // Compute the weight for each frequency bin.
      for (currentFreq = initialFreq, indexFilterWeight = 0;
              currentFreq <= rightEdge;
              currentFreq += deltaFreq, indexFilterWeight++) {
          if (currentFreq < centerFreq) {
              weight[indexFilterWeight] = leftSlope * (currentFreq - leftEdge);
          } else {
              weight[indexFilterWeight] = filterHeight + rightSlope * (currentFreq - centerFreq);
          }
      }
      // Convert linear frequency to mel frequency.
      private double linToMelFreq(double inputFreq) {
          return (2595.0 * (Math.log(1.0 + inputFreq / 700.0) / Math.log(10.0)));
      }
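The mel conversion above is usually paired with its inverse so that filter edges can be spaced evenly on the mel scale and mapped back to Hz. The following sketch illustrates that common construction; it is an assumption about how `leftEdge`, `centerFreq`, and `rightEdge` might be chosen, not something the slides state, and the class and method names are hypothetical.

```java
final class MelScaleSketch {
    /** Same formula as linToMelFreq on the slide: Hz to mel. */
    static double linToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Algebraic inverse of linToMel: mel to Hz. */
    static double melToLin(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /**
     * Returns numFilters + 2 frequencies in Hz, evenly spaced on the mel scale
     * between minHz and maxHz. Filter k would use edges[k], edges[k + 1], and
     * edges[k + 2] as its left edge, centre, and right edge.
     */
    static double[] filterEdges(double minHz, double maxHz, int numFilters) {
        double minMel = linToMel(minHz);
        double maxMel = linToMel(maxHz);
        double step = (maxMel - minMel) / (numFilters + 1);
        double[] edges = new double[numFilters + 2];
        for (int i = 0; i < edges.length; i++) {
            edges[i] = melToLin(minMel + i * step);
        }
        return edges;
    }
}
```

Because the spacing is uniform in mel rather than in Hz, the resulting triangular filters are narrow at low frequencies and wider at high frequencies.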
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
      Risk ID | Classification | Description of Risk | Risk Area | Probability | Impact | RE (P*I)
      1 | Product Engineering | Word Error Rate | Performance | L | H | M
      2 | Product Engineering | Aliasing | Performance | M | M | M
      3 | Development Environment | Bitrate of extracted audio more than that of input audio | Testing Environment | L | L | L
      4 | Product Engineering | Accuracy and Speed | Performance | L | H | M
      5 | Program Constraint | Format not recognized | External Input | L | H | M
  • 37.
  • 38.
      Risk ID | Description of Risk | Risk Area | Mitigation
      1 | Word Error Rate | Performance | Having an efficient database (training set).
      2 | Aliasing | Performance | Resampling the samples at a fixed frequency.
      3 | Bitrate of extracted audio more than that of input audio | Testing Environment | Encode and decode audio at the bitrate of the input audio.
      4 | Accuracy and Speed | Performance | Synchronization.
      5 | Format not recognized | External Input | Input audio/video supported by MPEG standard formats.
  • 39.
  • 40. Test group 1:
      Test Case ID | Input | Expected Output | Status
      1.1 | File.mp3 | File.mp3 | Pass
      1.2 | File.mp4 | File.mp3 | Pass
      1.3 | File.mp2 | File.mp3 | Pass
      1.4 | File.au | File.au | Pass
      1.5 | File.aac | File.aac | Pass
      1.6 | File.wav | File.wav | Pass
      1.7 | File.flac | File.flac | Pass
      1.8 | File.wma (format not supported by MPEG standards) | File.wma | Fail
      1.9 | File.als (format not supported by MPEG standards) | File.als | Fail
  • 41. Test groups 2 and 3:
      Test Case ID | Input | Expected Output | Status
      2.1 | File.wav (words present in the dictionary) | Speech recognized. Text printed. | Pass
      2.2 | File.mp3 (not a .wav file) | Speech recognized. Text printed. | Fail
      2.3 | File.au (not a .wav file) | Speech recognized. Text printed. | Fail
      2.4 | File.flac (not a .wav file) | Speech recognized. Text printed. | Fail
      2.5 | File.wav (words not found in the dictionary) | Speech recognized. Text printed. | Fail
      3.1 | File.srt (incorrect timecode) | Subtitles generated but not synchronized with the video | Fail
      3.2 | File.srt (correct timecode), File.avi | Subtitles generated and synchronized with the video file File.avi | Pass
      3.3 | File.txt (not containing the timecode) | Subtitles generated and synchronized with the video | Fail
      3.4 | File.srt (correct timecode), File.mp4 | Subtitles generated and synchronized with the video file File.mp4 | Pass
      3.5 | File.srt (correct timecode), File.wma | Subtitles generated and synchronized with the video file | Pass
  • 42.
  • 43. AUDIO EXTRACTION…
  • 44. CYCLOMATIC COMPLEXITY… CC = E - N + 2, where E = number of edges (80) and N = number of nodes (72). CC = 80 - 72 + 2 = 10.
  • 46. CYCLOMATIC COMPLEXITY… CC = E - N + 2, where E = number of edges (98) and N = number of nodes (91). CC = 98 - 91 + 2 = 9.
  • 47.
  • 48.
      Test Case ID | Component | Debugging Technique
      1.8 | Audio Extraction | Backtracking Debugging
      1.9 | Audio Extraction | Backtracking Debugging
      2.2 | Speech Recognition | Backtracking Debugging
      2.3 | Speech Recognition | Backtracking Debugging
      2.4 | Speech Recognition | Backtracking Debugging
      2.5 | Speech Recognition | Print Debugging
      3.1 | Subtitles Generation | Print Debugging
      3.3 | Subtitles Generation | Backtracking Debugging
  • 49.
      Test Case ID | Input | Expected Output | Status
      1.8 | File.au (format supported by MPEG standards) | File.au | Pass
      1.9 | File.mp4 (format supported by MPEG standards) | File.mp3 | Pass
      2.2 | File.wav | Speech recognized. Text printed. | Pass
      2.3 | File.wav | Speech recognized. Text printed. | Pass
      2.4 | File.wav | Speech recognized. Text printed. | Pass
      2.5 | File.wav (words found in the dictionary) | Speech recognized. Text printed. | Pass
      3.1 | File.srt (correct timecode) | Subtitles generated and synchronized with the video | Pass
      3.3 | File.srt | Subtitles generated and synchronized with the video | Pass
  • 50.
  • 51. DETAILED STUDY OF INPUT AND EXTRACTED FILES…
      S. No. | Input File | Size before extraction (MB) | Bitrate before extraction (kbps) | Size after extraction (MB) | Bitrate after extraction (kbps) | Length of the input/output file (min:sec) | Time taken for extraction (in ms) | Reduction rate
      1 | Despicable.avi | 10.8 | 1628 | 8.24 | 1411 | 00:49 | 0.6 | 24%
      2 | Time.mp4 | 48.1 | 1663 | 44.4 | 1536 | 04:02 | 3.12 | 8%
      3 | Florida.mp4 | 76 | 2723 | 39.3 | 1411 | 03:54 | 1.08 | 48%
      4 | International.mp4 | 79.1 | 2673 | 41.7 | 1411 | 04:08 | 1.3 | 47%
      5 | Justin.mp4 | 43.2 | 1615 | 41 | 1536 | 03:44 | 1.54 | 5%
      6 | Love.mp4 | 67.1 | 2112 | 44.8 | 1411 | 04:26 | 1.98 | 33%
      7 | Jojo.avi | 61.8 | 2183 | 39.9 | 1411 | 03:57 | 1.86 | 35%
      8 | Baby.mp4 | 43.2 | 1615 | 41 | 1536 | 03:44 | 3.34 | 5%
      9 | Never.mp4 | 52.5 | 1657 | 48.5 | 1536 | 04:25 | 2.15 | 8%
      10 | Beep.avi | 51.4 | 1628 | 38.4 | 1411 | 03:48 | 01:58 | 25%
      Average | | 53.3 | 1950 | 38.7 | 1461 | 03:41 | 1.71 | 24%
  • 52. COMPARISON BETWEEN THE SIZE OF THE INPUT FILE AND THE EXTRACTED FILE. [Bar chart: size of file (in MB) for each input file (.mp4/.avi), before and after extraction.] The graph shows that the extracted audio file is smaller than the input video in every case. The maximum reduction rate is 48% and the minimum is 5%, giving an average reduction rate of 24%.
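The slides do not state how the reduction rate is calculated, but the reported figures are consistent with the relative decrease in file size, (size before - size after) / size before. The sketch below assumes that definition; the class and method names are illustrative.

```java
final class ReductionRateSketch {
    /**
     * Relative size reduction after audio extraction, as a fraction in [0, 1].
     * Assumed definition: (before - after) / before, with sizes in the same unit.
     */
    static double reductionRate(double sizeBeforeMb, double sizeAfterMb) {
        return (sizeBeforeMb - sizeAfterMb) / sizeBeforeMb;
    }

    /** The same rate rounded to a whole percentage, as shown in the table. */
    static long reductionPercent(double sizeBeforeMb, double sizeAfterMb) {
        return Math.round(100.0 * reductionRate(sizeBeforeMb, sizeAfterMb));
    }
}
```

With the values reported for Florida.mp4 (76 MB before, 39.3 MB after), this gives about 48%, and for Justin.mp4 (43.2 MB before, 41 MB after) about 5%, which matches the maximum and minimum quoted above.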
  • 53. COMPARISON BETWEEN THE BITRATE OF THE INPUT FILE AND THE EXTRACTED FILE. [Bar chart: bitrate (in kbps) for each input file (.mp4/.avi), before and after extraction.] The bitrates of the input files range from 1615 kbps to 2723 kbps. The bitrates of the extracted files range from 1411 kbps to 1536 kbps, with an average of 1461 kbps.
  • 54. TIME TAKEN FOR EXTRACTION OF INPUT FILE. [Chart: time taken for extraction (in ms) for each input file (.mp4/.avi).] The time taken to extract each file varies from 0.6 ms to 3.34 ms, with an average extraction time of 1.71 ms.
  • 55.
  • 56. The ASG aims to generate the text for the input audio/video automatically. It supports all the MPEG standards. The video and subtitles are synchronized. The user can extract audio in any MPEG standard format. Audio of any format can be extracted, but speech recognition is done only in English.
  • 59. PUBLICATION… Abhinav Mathur, Tanya Saxena, “Generating Subtitles Automatically using Audio Extraction and Speech Recognition”, 7th International Conference on Contemporary Computing (IC3), 2014 (under review).