Veermata Jijabai Technological Institute
132011005
QUERY BY HUMMING...
Seminar Report
Shital Katkar
SEMINARS
OF
SEMESTER – II
[ YEAR 2013-2014 ]
NAME: SHITAL KATKAR
TOPIC: Query By Humming
SIGNATURE: ________________
INDEX
1 Introduction
1.1 Query By Humming
2 Basic Architecture
2.1 Extraction
2.2 Transcription
2.3 Comparison
3 Applications
3.1 Shazam
3.2 Sound-Hound
3.3 Midomi
3.4 Musipedia
4 The art of Singing
4.1 Challenges
5 File Formats
5.1 WAV File Format
5.2 MIDI File Format
6 System Architecture
6.1 WAV to MIDI Conversion
7 Parsons Code Algorithm
7.1 Rules
7.2 Advantages
8 Benchmarking MIR System
8.1 Online MIR System
8.1.1 CatFind
8.1.2 MelDex
8.1.3 MelodyHound
8.1.4 ThemeFinder
8.1.5 Music Retrieval Demo
8.2 Comparison of MIR System
8.3 Evaluation Issues
8.4 Subjective and objective testing
9 Conclusion
1. INTRODUCTION
Many people remember a short tidbit of a song but fail to recall the song's name. If
you can remember lyrics that correspond to the song you are trying to recall, finding the
song is as easy as performing a text query on a web search engine. A query by humming
system allows a user to find a song even if he merely knows part of the melody.
"I don't know the name. I don't know who does it. But I can't get this song out of my
head." Well, why not just hum it?
Query by Humming System
It is a music retrieval technology in which users can hum or sing a melody to retrieve a
song. The user simply sings or hums the tune into a computer microphone, and the system
searches through a database of songs for melodies containing the tune and returns a ranked
list of search results. The user can then find the desired song by listening to the results.
A Query by Humming (QBH) system enables a user to hum a melody into a microphone
connected to a computer in order to retrieve a list of possible song titles that match the
query melody. The system analyzes the melodic and rhythmic information of the input
signal. The extracted data set is used as a database query. The result is presented as a list of
e.g. ten best matching results.
Generally, a QBH system is a kind of Music Information Retrieval (MIR) system. An MIR
system provides several means for music retrieval: the query can be a hummed audio signal,
but also a music genre classification or text information about the artist or title.
2. BASIC ARCHITECTURE
Fig- Basic System Architecture
The basic architecture of the system is depicted in the figure above. A microphone takes
the hummed input and sends it as a PCM signal to the extraction block. The extracted
information is passed to the transcription block, which forms a melody contour to be
compared with all contours residing in the database. A result list is finally presented
to the user.
Extraction
The extraction block is also referred to as the acoustic front end. After the signal is
recorded with a computer sound card, it is band-pass filtered to reduce environmental noise
and distortion. In this system a sampling rate of 8000 Hz is used. The signal is band-limited
to 80–800 Hz, which is sufficient for sung input; this frequency range corresponds
approximately to a musical note range of D2–G5.
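The quoted band limits can be sanity-checked against the standard equal-temperament tuning formula f = 440 * 2^((n - 69)/12), where n is the MIDI note number. A small sketch (note names follow the common MIDI convention, where D2 is note 38 and G5 is note 79):

```python
def midi_to_hz(note: int) -> float:
    """Frequency of a MIDI note number under A4 = 440 Hz equal temperament."""
    return 440.0 * 2.0 ** ((note - 69) / 12)

print(round(midi_to_hz(38), 1))  # D2: 73.4 Hz, just below the 80 Hz cutoff
print(round(midi_to_hz(79), 1))  # G5: 784.0 Hz, inside the 800 Hz cutoff
```

Note that D2 actually lies slightly below 80 Hz, so the stated correspondence between the filter band and the note range is approximate.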
Transcription
The transcription block transcribes the extracted information into the representation that is
needed for comparison. The main task is to segment the input stream into single notes. The
segmented notes can then be encoded using the Parsons code algorithm.
Comparison
The transcription result is used as the database query. Several distance measures can be
used to find a similar piece of music. The database contains a collection of already
transcribed melodies formatted according to the MelodyContourType.
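One common choice of distance measure, sketched here under the assumption that the stored contours are Parsons-style strings, is the Levenshtein edit distance, which counts the insertions, deletions and substitutions needed to turn one contour into another:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# A query with one dropped note is only one edit away from the stored contour.
print(edit_distance("*RURURD", "*RURUD"))  # 1
```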
The result is finally presented to the user.
3. APPLICATIONS
These are some examples of QBH Systems.
Shazam
Shazam is a commercial mobile phone-based music identification service. The company was
founded in 1999 by Chris Barton, Philip Inghelbrecht, Avery Wang and Dhiraj Mukherjee.
Shazam uses a mobile phone's built-in microphone to gather a brief sample of music being
played. An acoustic fingerprint is created based on the sample, and is compared against a
central database for a match. If a match is found, information such as the artist, song title,
and album are relayed back to the user.
Shazam can identify prerecorded music being broadcast from any source, such as a radio,
television, cinema or club, provided that the background noise level is not high enough to
prevent an acoustic fingerprint being taken, and that the song is present in the software's
database.
SoundHound
SoundHound (known as Midomi until December 2009) is a mobile device service that allows
users to identify music by humming, singing or playing a recorded track. The service was
launched by Melodis Corporation (now SoundHound Inc), under Chief Executive Keyvan
Mohajer in 2007 and has received funding from Global Catalyst Partners, TransLink Capital
and Walden Venture Capital.
SoundHound is a music search engine available on the Apple App Store, Google Play, and
Windows Phone Store and, as of June 5, 2013, on the BlackBerry 10 platform. It
enables users to identify music by playing, singing or humming a piece. It is also possible to
speak or type the name of the artist, composer, song and piece. Unlike competitor Shazam,
SoundHound can recognise tracks from singing, humming, speaking, or typing, as well as
from a recording. Sound matching is achieved through the company's 'Sound2Sound'
technology, which can match even poorly-hummed performances to professional
recordings.
Midomi
Midomi is a music search service: users sing, hum, or whistle to instantly find music
and connect with a community that shares their musical interests.
At midomi you can create your own profile, sing your favorite songs and share them with
your friends and get discovered by other midomi users. You can listen to and rate other
users' musical performances, see their pictures, send them messages, buy original music,
and more.
midomi features an extensive digital music store with a growing collection of more than two
million legal music tracks. You can listen to samples of original recordings, buy the full studio
versions directly from midomi, and play them on your Windows computer or compatible
music players.
Musipedia
Musipedia is a search engine for identifying pieces of music. This can be done by whistling a
theme, playing it on a virtual piano keyboard, tapping the rhythm on the computer
keyboard, or entering the Parsons code. Anybody can modify the collection of melodies and
enter MIDI files, bitmaps with sheet music, lyrics or some text about the piece, or the
melodic contours as Parsons Code.
Musipedia's search engine works differently from that of search engines such as Shazam.
The latter can identify short snippets of audio (a few seconds taken from a recording), even
if it is transmitted over a phone connection. Shazam uses Audio Fingerprinting for that, a
technique that makes it possible to identify recordings. Musipedia, on the other hand, can
identify pieces of music that contain a given melody. Shazam finds exactly the recording that
contains a given snippet, but no other recordings of the same piece.
4. THE ART OF SINGING
It is obvious that people have imperfect memories for melodies or may lack any formal
singing practice.
1. People sing any part of the melody. A repetitive melodic passage in a song may represent
the 'hook line' of a song that 'gets stuck in people's heads'.
2. People sing in the wrong key. People choose a random pitch to start their singing. Only
for their most favorite songs are people thought to have a latent ability of absolute pitch.
3. People sing at a reasonably correct global tempo. From previous hearings, people know
or have a feeling for what the correct tempo is and can approach it reasonably accurately,
but they still do not sing at exactly the correct tempo.
4. People sing too many or too few notes. Human memory is too imperfect to recall all
pitches in the right order. People sing just the line they remember. They also add all kinds
of ornaments (e.g., grace notes, filler notes, or thinner notes) to beautify their singing
or to ease the muscular motor processes involved in singing.
5. People sing the wrong intervals or confuse some with others. People sing about 59% of
the intervals correctly, though there are differences due to singing experience, song
familiarity and recent song exposure. Interval confusion seems to be symmetric;
interchanging an interval with another was found to be equally likely as the other way
around. A large interval (thirds and larger) tends to be more easily interchanged for another.
6. People sing the contour reasonably accurately. People largely know when to go up and
when to go down in pitch when singing; they do so correctly about 80% of the time.
7. People with singing experience sing better in some respects than people without singing
experience. The non-experienced and experienced singers did not differ in singing the
contour of a melody accurately. However, experienced singers reproduced proportionally
more correct intervals and sang with better timing.
8. People sing familiar melodies better than less familiar ones. Less familiar melodies were
reproduced with fewer notes and had proportionally fewer correct intervals than familiar
melodies. Also, both experienced and non-experienced singers improved their singing of
intervals when they had heard the melody very recently.
4.1 CHALLENGES
Building such a system, however, presents some significantly greater challenges than
creating a conventional text-based search engine. Unlike lyrical content, there exists no
intuitively obvious way to represent and store melodic content in a database. The chosen
representation must be indexable for efficient searching. Furthermore, several issues
unique to query by humming systems pose significant challenges to creating an efficient and
accurate music search system.
1. Users may not make perfect queries. Even if a user has a perfect memory of a particular
tune, he may start at the wrong key, or he may hum a few notes off-pitch throughout the
course of the tune. Sometimes he may even drop some notes entirely or add notes that did
not exist in the original melody. Additionally, no user is expected to be able to perfectly
hum at the same tempo as the songs stored in the database. Finally, since none of these
errors are mutually exclusive, a humming query may contain any combination of these
errors.
2. Accurately capturing pitches and notes from user hums is difficult, even if the user
manages to submit a perfect query. Currently existing software for converting raw audio
data into discrete pitch information is mediocre at best and oftentimes will introduce a
great deal of noise when extracting the pitches from a user’s hum.
3. Similarly, accurately capturing melodic information from a pre-recorded music file is
difficult. Properly extracting the melody from a given song is a field of study on its own,
but it is absolutely critical: a query by humming system would be of little use if the
database contains inaccurate representations of the target songs.
5. FILE FORMATS
Wav File Format
WAVE or WAV is short for the Waveform Audio File Format (rarely referred to as Audio for
Windows). The WAV format is compatible with Windows, Macintosh and Linux. Although a WAV
file can hold compressed audio, its most common use is to store uncompressed audio in
linear PCM (LPCM). The standard Audio CD format, for example, is LPCM audio with 2
channels, a sampling frequency of 44,100 Hz and 16 bits per sample.
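As an illustration, the CD-style LPCM layout described above can be produced with Python's standard wave module; a minimal sketch (the file name tone.wav is arbitrary):

```python
import math
import struct
import wave

# Write one second of a 440 Hz sine tone as CD-style LPCM:
# 2 channels, 16 bits (2 bytes) per sample, 44,100 Hz sampling rate.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    for n in range(44100):
        sample = int(20000 * math.sin(2 * math.pi * 440 * n / 44100))
        w.writeframes(struct.pack("<hh", sample, sample))  # left, right
```

Because the samples are stored uncompressed, this one-second stereo file already occupies 44,100 × 2 × 2 ≈ 176 KB, which illustrates why WAV files grow large quickly.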
As a format derived from the Resource Interchange File Format (RIFF), WAV files can carry
metadata (tags) in the INFO chunk. In addition, WAV files can contain metadata following
the Extensible Metadata Platform (XMP) standard.
Uncompressed WAV files are quite large in size, so, as file sharing over the Internet has
become popular, the WAV format has declined in popularity. However, it is still a widely
used, relatively "pure", i.e. lossless, file type, suitable for retaining "first generation"
archived files of high quality, or use on a system where high fidelity sound is required and
disk space is not restricted.
MIDI File Format
The term MIDI stands for Musical Instrument Digital Interface and is essentially a
communications protocol for computers and electronic musical instruments.
Although the produced MIDI files are not exactly the same as the typical digital audio
formats we use (like MP3, AAC, WMA, etc.) to listen to music, MIDI files can still be thought
of as digital music.
Rather than an actual audio recording stored as binary data, a MIDI file in its simplest
form is made up of information that describes which musical notes are to be played, along
with the types of instruments to be used.
MIDI files therefore do not contain any 'real world' recordings such as voice (e.g.,
audiobooks) or live performances. However, MIDI files are very small and can be played on
a wide range of devices that support the MIDI protocol, including cell phones, smartphones,
and computers with the right software. Monophonic and polyphonic ringtones are common
examples of the MIDI file format in use.
In a QBH system, the song database is typically created from songs in the MIDI file format,
because the MIDI representation already discretizes the notes, making it easy to extract
the pitch and timing information necessary for song matching. Alternate music file formats
such as WAV, MP3, AIFF, etc. would require complicated waveform and signal processing that
could introduce many inaccuracies. Each song is also mapped to a set of metadata attributes,
such as song name and artist, for eventual display in the GUI result list.
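As a sketch of why MIDI simplifies database construction, assume each note has already been decoded into a (pitch, onset, duration) record; the melody, field layout, and metadata here are invented for illustration:

```python
# Hypothetical note records as they might be extracted from one MIDI track:
# (MIDI pitch, onset time in beats, duration in beats).
melody = [
    (72, 0.0, 1.0),  # C5
    (72, 1.0, 1.0),  # C5
    (79, 2.0, 1.0),  # G5
    (79, 3.0, 1.0),  # G5
]

# Metadata attached to the song for display in the result list.
metadata = {"name": "Example Song", "artist": "Example Artist"}

# Because MIDI already discretizes notes, obtaining the pitch sequence
# needed for matching is a simple projection, not signal processing.
pitches = [pitch for pitch, onset, duration in sorted(melody, key=lambda n: n[1])]
print(pitches)  # [72, 72, 79, 79]
```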
6. SYSTEM ARCHITECTURE
The architecture is illustrated in the figure above, and operation of the system is
straightforward. Queries are hummed into a microphone, digitized, and fed into a
pitch-tracking module. The result, a contour representation of the hummed melody, is fed
into the query engine, which produces a ranked list of matching melodies. The database of
melodies is acquired by processing public-domain MIDI songs and is stored as a flat-file
database. Hummed queries may be recorded in a variety of formats. The query engine
uses an approximate pattern matching algorithm, in order to tolerate humming errors. The
melody database is essentially an indexed set of soundtracks. The acoustic query, which is
typically a few notes hummed by the user, is processed to detect its melody line. The
database is searched to find those songs that best match the query.
While the overall task is one that is easily performed by humans, many challenging
problems arise in the implementation of an automatic system. These include the signal
processing needed for extracting the melody from the stored audio and from the acoustic
query, and the pattern matching algorithms to achieve proper ranked retrieval. Further, a
robust system must be able to account for inaccuracies in the user's singing.
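A minimal sketch of such a query engine, assuming melodies are stored as Parsons contour strings in a flat dictionary (the song titles and contours here are invented), ranks the whole database by Levenshtein edit distance so that dropped, added, or wrong notes are tolerated:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each dropped, added, or wrong note costs one edit."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Hypothetical flat-file database of contour strings.
database = {"Song A": "*RURURD", "Song B": "*UUDDUU", "Song C": "*RRUUDD"}

def rank(query: str) -> list:
    """Return song titles ordered from best to worst match."""
    return sorted(database, key=lambda title: edit_distance(query, database[title]))

# A hummed query with one dropped note still ranks Song A first.
print(rank("*RURUD"))  # ['Song A', 'Song C', 'Song B']
```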
6.1 WAV TO MIDI CONVERSION
To create a MIDI file for a song recorded in WAV format, a musician must determine the
pitch, velocity and duration of each note being played and record these parameters as a
sequence of MIDI events. The MIDI file created represents the basic melody and chords of
the recognized music. The difference between the WAV and MIDI formats lies in the
representation of sound and music: WAV is a digital recording of any sound (including
speech), while MIDI is principally a sequence of notes (MIDI events). Here an output file
(.mid) is produced from an input file (.wav) that contains musical data, together with a
tone file (.wav) that consists of monotone data. An advantage of this structure is that
the query is prepared on the client side of the system, so the query is very short.
Besides, its quality can be evaluated before sending it to the server: the system provides
playback of the recognized melody notes in MIDI format, which allows the user to listen to
the query and decide either to send it to the server or to sing it once again.
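One simple way to obtain the note pitches needed for such a conversion is autocorrelation-based pitch detection. This is a minimal sketch (not the system's actual implementation) at the 8000 Hz sampling rate quoted earlier, restricted to the 80-800 Hz vocal range:

```python
import math

SR = 8000  # sampling rate used by the system (Hz)

def estimate_pitch(frame, lo=80, hi=800):
    """Estimate the fundamental frequency of one frame by picking the
    autocorrelation peak whose lag falls inside the 80-800 Hz range."""
    best_lag, best_r = None, 0.0
    for lag in range(SR // hi, SR // lo + 1):
        r = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if r > best_r:
            best_lag, best_r = lag, r
    return SR / best_lag

# A clean 200 Hz tone has a period of SR / 200 = 40 samples,
# so the detector should report exactly 200 Hz.
frame = [math.sin(2 * math.pi * 200 * n / SR) for n in range(400)]
print(estimate_pitch(frame))  # 200.0
```

Real hummed input is far noisier than a pure tone, which is why the report calls existing pitch-extraction software "mediocre at best"; a production system would add windowing, smoothing and voicing detection on top of this idea.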
7. PARSONS CODE ALGORITHM
The Parsons code, formally named the Parsons Code for Melodic Contours, is a simple
notation used to identify a piece of music through melodic motion: the motion of
the pitch up and down. Denys Parsons developed this system for his 1975 book, The
Directory of Tunes and Musical Themes. Representing a melody in this manner makes it easy
to index or search for particular pieces.
User input to the system (humming) is converted into a sequence of relative pitch
transitions.
Each note in the input is classified in one of the following ways:
1. * = the first tone, used as the reference
2. U = "up," if the note is higher than the previous note
3. D = "down," if the note is lower than the previous note
4. R = "repeat," if the note is the same pitch as the previous note
For example, suppose the first note is C (MIDI note 72). It is the reference note, so we
write *. The second note is also C; since it repeats, we write R. The next note is G; it
is higher than C, so we write U. For the second G we write R, and so on.
The resulting textual pattern is stored in the database for comparison.
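The rules above can be implemented in a few lines; the MIDI note numbers below correspond to the opening of "Twinkle, Twinkle, Little Star" (C C G G A A G):

```python
def parsons_code(pitches):
    """Build the Parsons contour string from a sequence of MIDI pitches."""
    code = "*"  # first tone is the reference
    for prev, cur in zip(pitches, pitches[1:]):
        code += "U" if cur > prev else "D" if cur < prev else "R"
    return code

print(parsons_code([72, 72, 79, 79, 81, 81, 79]))  # *RURURD
print(parsons_code([60, 60, 67, 67, 69, 69, 67]))  # *RURURD (same tune, lower key)
```

The second call transposes the tune down a full octave yet yields the same string, which is exactly the key-invariance property listed under Advantages below.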
Advantages
1. The pattern remains the same even if the user hums the tune in a different key, and it
tolerates some notes hummed off key.
2. It requires little storage space, since it is stored as a short text string.
8. BENCHMARKING MUSIC INFORMATION RETRIEVAL SYSTEMS
Research Paper
Benchmarking Music Information Retrieval Systems
Josh Reiss, Department of Electronic Engineering, Queen Mary, University of London, Mile
End Road, London E1 4NS, UK. +44-207-882-5528, josh.reiss@elec.qmul.ac.uk
Mark Sandler, Department of Electronic Engineering, Queen Mary, University of London, Mile
End Road, London E1 4NS, UK. +44-207-882-7680, mark.sandler@elec.qmul.ac.uk
The goal of this research paper is to create an accurate and effective benchmarking system
for music information retrieval (MIR) systems. This serves the multiple purposes of
inspiring the MIR community to add features and speed to existing projects, to measure the
performance of their work, and to incorporate the ideas of other works. To date, there has
been no systematic, rigorous review of the field, and thus there is little knowledge of
when an MIR implementation might fail in a real-world setting.
ONLINE MIR SYSTEMS
For the purposes of this work, we considered five online MIR systems. The systems
considered all have certain properties in common. They may all be used online via the World
Wide Web. They all are used by entering a query concerning a piece of music, and all may
return information about music that matches that query. However, these systems differ
greatly in their features, goals and implementation. These differences are discussed in detail
below.
CatFind
CatFind allows one to search MIDI files using either a musical transcription or a melodic
profile based on the Parson’s Code. It has minimal features, and was intended primarily for
demonstration. Although it seems unlikely that this system will be extended, it is still useful
here as a system for comparison.
MelDex
This allows searching of the New Zealand Digital Library. The MELody inDEX system is
designed to retrieve melodies from a database on the basis of a few notes sung into a
microphone. It accepts acoustic input from the user, transcribes it into common music
notation, then searches a database for tunes that contain the sung pattern, or patterns
similar to it. Thus the query is audio although the retrieved files are in symbolic
representation. Retrieval is ranked according to the closeness of the match. A variety of
different mechanisms are provided to control the search, depending on the precision of the
input.
MelodyHound
This melody recognition system was developed by Rainer Typke in 1997. It was originally
known as "Tuneserver" and hosted by the University of Karlsruhe. It searches directly on
the Parsons code and was designed initially for query by whistling; that is, it returns
the song in the database that most closely matches a whistled query.
ThemeFinder
Themefinder, created by David Huron et al., allows one to identify common themes in
Western classical music, folk songs, and Latin motets of the sixteenth century. Themefinder
provides a web-based interface to the Humdrum thema command, which in turn allows
searching of databases containing musical themes or incipits (opening note sequences).
Themes and incipits available through Themefinder are first encoded in the kern music data
format. Groups of incipits are assembled into databases. Currently there are three
databases: Classical Instrumental Music, European Folksongs, and Latin Motets from the
sixteenth century. Matched themes are displayed on-screen in graphical notation.
Music Retrieval Demo
The Music Retrieval Demo is notably different from the other MIR systems considered
herein. The Music Retrieval Demo performs similarity searches on raw audio data (WAV
files). No transcription of any kind is applied. It works by calculating the distance between
the selected file and all other files in the database. The other files can then be displayed in a
list ranked by their similarity, such that the more similar files are nearer the top. Distances
are computed between templates, which are representations of the audio files, not the
audio itself. The waveform is Hamming-windowed into overlapping segments; each segment
is processed into a spectral representation of Mel-frequency cepstral coefficients. This is a
data-reducing transformation that replaces each 20 ms window with 12 cepstral coefficients
plus an energy term, yielding a 13-valued vector. The next step is to quantize each vector
using a specially designed quantization tree. This recursively divides the vector space into
bins, each of which corresponds to a leaf of the tree. Any MFCC vector will fall into one and
only one bin. Given a segment of audio, the distribution of the vectors in the various bins
characterize that audio. Counting how many vectors fall into each bin yields a histogram
template that is used in the distance measure. For this demonstration, the distance
between audio files is the simple Euclidean distance between their corresponding templates
(or rather 1 minus the distance, so closer files have larger scores). Once scores have been
computed for each audio clip, they are sorted by magnitude to produce a ranked list like
other search engines.
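A toy sketch of the template-and-distance idea described above; the bin-index streams are invented stand-ins for the outputs of the quantization tree (real templates come from quantized MFCC vectors):

```python
import math
from collections import Counter

def template(bin_ids, n_bins):
    """Normalized histogram of quantization-tree bin hits for one audio clip."""
    counts = Counter(bin_ids)
    total = len(bin_ids)
    return [counts[b] / total for b in range(n_bins)]

def similarity(t1, t2):
    """1 minus the Euclidean distance, so closer templates score higher."""
    return 1.0 - math.sqrt(sum((a - b) ** 2 for a, b in zip(t1, t2)))

# Hypothetical bin-index streams from three clips, with a 4-leaf tree.
query  = template([0, 0, 1, 2, 0, 1], 4)
clip_a = template([0, 1, 1, 2, 0, 1], 4)  # similar distribution of bins
clip_b = template([3, 3, 3, 2, 3, 3], 4)  # very different distribution

print(similarity(query, clip_a) > similarity(query, clip_b))  # True
```

Sorting all clips by this score, highest first, yields the ranked list the demo presents.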
COMPARISON OF MIR SYSTEMS
In Table 1, we present a comparison of the features of the various MIR systems under
investigation. Note first that each of these systems was designed for a different purpose,
and none of them can be considered a finished product. This table allows one to get an
overview of the state of the MIR systems available, the features that one may wish to
include in an MIR system, and the areas where improvement is most necessary. It also
highlights the need for a standardized testbed. Each of the MIR systems uses a different
database of files for audio retrieval. Both CatFind and the Music Retrieval Demo have
databases with fewer than 500 files; thus, any benchmarking estimates, such as retrieval
times and efficiency, are rendered useless. MelDex, MelodyHound and ThemeFinder have
databases containing over 10,000 files, which should be sufficient for estimating search
efficiency and scalability.
EVALUATION ISSUES
Table 1 listed and compared the features available in existing online MIR systems. However,
this is not sufficient for effective benchmarking and evaluation of music information
retrieval systems that may appear in the near future and be used with large file
collections. The question of what features to evaluate is determined by what we can
measure that will reflect the ability of the system to satisfy the user. In a landmark
paper, Cleverdon [21] listed six main measurable quantities. This has become known as the
Cranfield model of information retrieval evaluation. Here, those properties are listed and
modified as applicable to MIR.
1. The coverage of the collection, that is, the extent to which the system includes relevant
matter.
2. The time lag, that is, the average interval between the time the search request is made
and the time an answer is given. Consideration should also be given to worst-case or
close-to-worst-case scenarios. Certain genres or formats of music, as well as certain
types of queries (e.g., query and retrieval of polyphonic transcription-based audio),
may require far more time than others. Furthermore, if the testbed is particularly
large, dispersed or unindexed, as with peer-to-peer internet systems, then bandwidth
limitations and scalability may greatly reduce efficiency while maximizing the
collection size.
3. The form of presentation of the output. For MIR systems this not only means having the
option of retrieving various formats, symbolic and audio, but it also implies identifying
multiple performances of the same composition.
4. The effort involved on the part of the user in obtaining answers to his search requests. So
far, MIR research has been dominated by audio engineers, computer scientists,
musicologists and librarians. As the field expands to include developers and user
interface experts this issue will acquire more significance.
5. The recall of the system, that is, the proportion of relevant material actually
retrieved in answer to a search request.
6. The precision of the system, that is, the proportion of retrieved material that is
actually relevant.
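Recall and precision can be computed directly from the retrieved and relevant sets; a small sketch with invented song IDs:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

# A query has 4 relevant songs; the system returns 5, of which 3 are relevant.
p, r = precision_recall(["s1", "s2", "s3", "x1", "x2"], ["s1", "s2", "s3", "s4"])
print(p, r)  # 0.6 0.75
```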
9. CONCLUSION
Music retrieval is becoming more natural, simple and user-friendly with the advancement of
QBH, and this technology will open broader application prospects for music retrieval.
Using the Parsons code algorithm, a query-matching system becomes easy to implement.
In this work, we have laid down a framework for benchmarking of future MIR systems. At
the moment, this field is in its infancy. There are only a handful of MIR systems available
online, each of which is quite limited in scope. Still, these benchmarking techniques were
applied to five online systems. Proposals were made concerning future benchmarking of full
online audio retrieval systems. It is hoped that these recommendations will be considered
and expanded upon as such systems become available.
10. REFERENCES
1. Josh Reiss and Mark Sandler, "Benchmarking Music Information Retrieval Systems,"
Department of Electronic Engineering, Queen Mary, University of London, Mile End Road,
London E1 4NS, UK. josh.reiss@elec.qmul.ac.uk, mark.sandler@elec.qmul.ac.uk
2. Jan-Mark Batke, Gunnar Eisenberg, Philipp Weishaupt, and Thomas Sikora, "A Query by
Humming System Using MPEG-7 Descriptors," Communication Systems Group, Technical
University of Berlin. Correspondence: Jan-Mark Batke (batke@nue.tu-berlin.de)
3. Edmond Lau, Annie Ding, and Calvin On, "MusicDB: A Query by Humming System," 6.830:
Database Systems Final Project Report, Massachusetts Institute of Technology.
{edmond, annie_d, calvinon}@mit.edu

 
web based music genre classification.pptx
web based music genre classification.pptxweb based music genre classification.pptx
web based music genre classification.pptxUmaMahesh786960
 
IRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET Journal
 
MLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to MusicMLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to MusicEric Battenberg
 
Ml conf2013 teaching_computers_share
Ml conf2013 teaching_computers_shareMl conf2013 teaching_computers_share
Ml conf2013 teaching_computers_shareMLconf
 
Project presentation.pptx
Project presentation.pptxProject presentation.pptx
Project presentation.pptxSundaresanB5
 

Similaire à Query By Humming - Music Retrieval Technique (20)

Nithin Xavier research_proposal
Nithin Xavier research_proposalNithin Xavier research_proposal
Nithin Xavier research_proposal
 
groman_shin_finalreport
groman_shin_finalreportgroman_shin_finalreport
groman_shin_finalreport
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
 
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
 
AI THROUGH THE EYES OF ORGANISE SOUND
AI THROUGH THE EYES OF ORGANISE SOUNDAI THROUGH THE EYES OF ORGANISE SOUND
AI THROUGH THE EYES OF ORGANISE SOUND
 
Application of Recurrent Neural Networks paired with LSTM - Music Generation
Application of Recurrent Neural Networks paired with LSTM - Music GenerationApplication of Recurrent Neural Networks paired with LSTM - Music Generation
Application of Recurrent Neural Networks paired with LSTM - Music Generation
 
auto_playlist
auto_playlistauto_playlist
auto_playlist
 
Emofy
Emofy Emofy
Emofy
 
Ism2011
Ism2011Ism2011
Ism2011
 
How Can The Essen Associative Code Be Used
How Can The Essen Associative Code Be UsedHow Can The Essen Associative Code Be Used
How Can The Essen Associative Code Be Used
 
How Can The Essen Associative Code Be Used
How Can The Essen Associative Code Be UsedHow Can The Essen Associative Code Be Used
How Can The Essen Associative Code Be Used
 
Jordan smith ig2 task 1 revisited v2
Jordan smith ig2 task 1 revisited v2Jordan smith ig2 task 1 revisited v2
Jordan smith ig2 task 1 revisited v2
 
Ig2 task 1 work sheet (glossary) steph hawkins
Ig2 task 1 work sheet (glossary) steph hawkinsIg2 task 1 work sheet (glossary) steph hawkins
Ig2 task 1 work sheet (glossary) steph hawkins
 
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...
Toward an Understanding of Lyrics-viewing Behavior While Listening to Music o...
 
web based music genre classification.pptx
web based music genre classification.pptxweb based music genre classification.pptx
web based music genre classification.pptx
 
Ig2 task 1 work sheet
Ig2 task 1 work sheetIg2 task 1 work sheet
Ig2 task 1 work sheet
 
IRJET- The Complete Music Player
IRJET- The Complete Music PlayerIRJET- The Complete Music Player
IRJET- The Complete Music Player
 
MLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to MusicMLConf2013: Teaching Computer to Listen to Music
MLConf2013: Teaching Computer to Listen to Music
 
Ml conf2013 teaching_computers_share
Ml conf2013 teaching_computers_shareMl conf2013 teaching_computers_share
Ml conf2013 teaching_computers_share
 
Project presentation.pptx
Project presentation.pptxProject presentation.pptx
Project presentation.pptx
 

Plus de Shital Kat

Opinion Mining
Opinion MiningOpinion Mining
Opinion MiningShital Kat
 
Introduction to HADOOP
Introduction to HADOOPIntroduction to HADOOP
Introduction to HADOOPShital Kat
 
Big data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyBig data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyShital Kat
 
School admission process management system (Documention)
School admission process management system (Documention)School admission process management system (Documention)
School admission process management system (Documention)Shital Kat
 
WiFi technology Writeup
WiFi technology WriteupWiFi technology Writeup
WiFi technology WriteupShital Kat
 
WIFI Introduction (PART I)
WIFI Introduction (PART I)WIFI Introduction (PART I)
WIFI Introduction (PART I)Shital Kat
 

Plus de Shital Kat (8)

Opinion Mining
Opinion MiningOpinion Mining
Opinion Mining
 
Introduction to HADOOP
Introduction to HADOOPIntroduction to HADOOP
Introduction to HADOOP
 
Big data processing using - Hadoop Technology
Big data processing using - Hadoop TechnologyBig data processing using - Hadoop Technology
Big data processing using - Hadoop Technology
 
School admission process management system (Documention)
School admission process management system (Documention)School admission process management system (Documention)
School admission process management system (Documention)
 
WiFi technology Writeup
WiFi technology WriteupWiFi technology Writeup
WiFi technology Writeup
 
Wifi Security
Wifi SecurityWifi Security
Wifi Security
 
WiFi part II
WiFi part IIWiFi part II
WiFi part II
 
WIFI Introduction (PART I)
WIFI Introduction (PART I)WIFI Introduction (PART I)
WIFI Introduction (PART I)
 

Dernier

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Query By Humming - Music Retrieval Technique

signal. The extracted data set is used as a database query, and the result is presented as a
ranked list of, for example, the ten best matching songs.
Generally, a QBH system is a kind of Music Information Retrieval (MIR) system. An MIR
system provides several means of music retrieval: the query may be a hummed audio
signal, but it may also be a music genre classification or text information about the artist
or title.
7
2. BASIC ARCHITECTURE
Fig. Basic system architecture
The basic architecture of the system is depicted in the figure above. A microphone captures
the hummed input and sends it as a PCM signal to the extraction block. The extracted
information is passed to the transcription block, which forms a melody contour to be
compared with all contours residing in the database. A result list is finally presented to the
user.
2.1 Extraction
The extraction block is also referred to as the acoustic front end. After the signal is
recorded with a computer sound card, it is band-pass filtered to reduce environmental
noise and distortion. The system uses a sampling rate of 8000 Hz, and the signal is
band-limited to 80-800 Hz, which is sufficient for sung input; this frequency range
corresponds to the musical note range D2-G5.
2.2 Transcription
The transcription block converts the extracted information into the representation needed
for comparison. Its main task is to segment the input stream into single notes, which can
be done using the Parsons code algorithm.
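The 80-800 Hz band and the D2-G5 note range quoted above are related by the standard
MIDI pitch mapping (A4 = 440 Hz = MIDI note 69). A small illustrative sketch; the helper
names are my own:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_midi(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def midi_to_name(midi_note):
    """Render a MIDI note number as a name like 'D2' or 'G5'."""
    return NOTE_NAMES[midi_note % 12] + str(midi_note // 12 - 1)

# The band limits of the front end map onto the note range stated above:
print(midi_to_name(freq_to_midi(73.42)))   # D2, just below the 80 Hz lower limit
print(midi_to_name(freq_to_midi(784.0)))   # G5, just below the 800 Hz upper limit
```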
8
2.3 Comparison
The transcription result is used as the database query. Several distance measures can be
used to find a similar piece of music. The database contains a collection of already
transcribed melodies formatted according to the MelodyContourType. The result is finally
presented to the user.
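The report does not name a specific distance measure; one common choice for comparing
contour strings is the Levenshtein edit distance, sketched below with made-up
Parsons-style strings:

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

# A hummed contour with one wrong transition is still close to the stored one:
print(edit_distance("*RURURD", "*RURUUD"))  # 1
```

Ranking the database by this distance tolerates the small humming errors discussed later
in the report.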
9
3. APPLICATIONS
The following are some examples of QBH systems.
3.1 Shazam
Shazam is a commercial mobile-phone-based music identification service. The company
was founded in 1999 by Chris Barton, Philip Inghelbrecht, Avery Wang and Dhiraj
Mukherjee.
Shazam uses a mobile phone's built-in microphone to gather a brief sample of the music
being played. An acoustic fingerprint is created from the sample and compared against a
central database for a match. If a match is found, information such as the artist, song title
and album is relayed back to the user. Shazam can identify prerecorded music being
broadcast from any source, such as a radio, television, cinema or club, provided that the
background noise level is not high enough to prevent an acoustic fingerprint from being
taken, and that the song is present in the software's database.
10
3.2 SoundHound
SoundHound (known as Midomi until December 2009) is a mobile service that allows
users to identify music by humming, singing or playing a recorded track. The service was
launched by Melodis Corporation (now SoundHound Inc.) under Chief Executive Keyvan
Mohajer in 2007 and has received funding from Global Catalyst Partners, TransLink
Capital and Walden Venture Capital.
SoundHound is a music search engine available on the Apple App Store, Google Play and
the Windows Phone Store, and on June 5, 2013, it became available on the BlackBerry 10
platform. It enables users to identify music by playing, singing or humming a piece; it is
also possible to speak or type the name of the artist, composer, song or piece. Unlike its
competitor Shazam, SoundHound can recognise tracks from singing, humming, speaking
or typing, as well as from a recording. Sound matching is achieved through the company's
'Sound2Sound' technology, which can match even poorly hummed performances to
professional recordings.
11
3.3 Midomi
Midomi is a music search tool: users can sing, hum, or whistle to find music and connect
with a community that shares their musical interests. At Midomi, users can create their
own profile, sing their favorite songs, share them with friends, and get discovered by
other Midomi users. They can listen to and rate other users' musical performances, see
their pictures, send them messages, buy original music, and more.
Midomi also features an extensive digital music store with a growing collection of more
than two million legal music tracks. Users can listen to samples of original recordings, buy
the full studio versions directly from Midomi, and play them on a Windows computer or
compatible music players.
12
3.4 Musipedia
Musipedia is a search engine for identifying pieces of music. This can be done by whistling
a theme, playing it on a virtual piano keyboard, tapping the rhythm on the computer
keyboard, or entering the Parsons code. Anybody can modify the collection of melodies
and enter MIDI files, bitmaps with sheet music, lyrics or some text about the piece, or
melodic contours as Parsons code.
Musipedia's search engine works differently from search engines such as Shazam. The
latter can identify short snippets of audio (a few seconds taken from a recording), even if
they are transmitted over a phone connection; Shazam uses audio fingerprinting, a
technique that makes it possible to identify particular recordings. Musipedia, on the other
hand, can identify pieces of music that contain a given melody. Shazam finds exactly the
recording that contains a given snippet, but no other recordings of the same piece.
13
4. THE ART OF SINGING
People have imperfect memories for melodies and may lack any formal singing practice.
Studies of sung queries show the following:
1. People sing any part of the melody. A repetitive melodic passage may represent the
'hook line' of a song that 'gets stuck in people's heads'.
2. People sing in the wrong key. People choose a random pitch to start their singing; only
for their most favorite songs are people thought to have a latent ability of absolute pitch.
3. People sing at a reasonably correct global tempo. People knew or had a feeling, from
previous hearings, what the correct tempo would be and were able to approach it
reasonably accurately, but they still could not sing at exactly the correct tempo.
4. People sing too many or too few notes. Human memory is too imperfect to recall all
pitches in the right order; people sang just the line they remembered. They also added all
kinds of ornaments (e.g., grace notes or filler notes) to beautify their singing or to ease
the muscular motor processes involved in singing.
5. People sing the wrong intervals or confuse some with others. People sang about 59% of
the intervals correctly, though there were differences due to singing experience, song
familiarity and recent song exposure. Interval confusion seems to be symmetric:
interchanging one interval with another was found to be equally likely as the reverse. A
large interval (thirds and larger) tends to be more easily interchanged with another.
6. People sing the contour reasonably accurately. People largely knew when to go up and
when to go down in pitch when singing; they did so correctly about 80% of the time.
14
7. People with singing experience sing better on some aspects than people without it.
Non-experienced and experienced singers did not differ in how accurately they sang the
contour of a melody; however, experienced singers reproduced proportionally more
correct intervals and sang with better timing.
8. People sing familiar melodies better than less familiar ones. Less familiar melodies
were reproduced with fewer notes and had proportionally fewer correct intervals than
familiar melodies. Both experienced and non-experienced singers also improved their
singing of intervals when they had heard the melody very recently.
15
4.1 CHALLENGES
Building such a system presents significantly greater challenges than creating a
conventional text-based search engine. Unlike lyrical content, there is no intuitively
obvious way to represent and store melodic content in a database, and the chosen
representation must be indexable for efficient searching. Furthermore, several issues
unique to query-by-humming systems pose significant challenges to creating an efficient
and accurate music search system.
1. Users may not make perfect queries. Even if a user has a perfect memory of a
particular tune, he may start in the wrong key, or he may hum a few notes off-pitch
throughout the course of the tune. Sometimes he may even drop notes entirely or add
notes that did not exist in the original melody. Additionally, no user can be expected to
hum at exactly the same tempo as the songs stored in the database. Finally, since none of
these errors are mutually exclusive, a hummed query may contain any combination of
them.
2. Accurately capturing pitches and notes from a user's humming is difficult, even if the
user manages to submit a perfect query. Currently existing software for converting raw
audio data into discrete pitch information is mediocre at best and often introduces a great
deal of noise when extracting the pitches from a user's hum.
3. Similarly, accurately capturing melodic information from a pre-recorded music file is
difficult. Properly extracting the melody from a given song is a field of study on its own,
but it is absolutely critical for an accurate query-by-humming system, which would be of
little use if the database contained inaccurate representations of the target songs.
16
5. FILE FORMATS
5.1 WAV File Format
WAVE or WAV is short for the Waveform Audio File Format (rarely referred to as Audio
for Windows). The WAV format is compatible with Windows, Macintosh and Linux.
Although a WAV file can hold compressed audio, the most common use is to store
uncompressed audio as linear PCM (LPCM). The standard Audio CD format, for example,
is LPCM audio with 2 channels, a sampling frequency of 44,100 Hz and 16 bits per
sample. As a format derived from the Resource Interchange File Format (RIFF), WAV
files can carry metadata (tags) in the INFO chunk. In addition, WAV files can contain
metadata in the Extensible Metadata Platform (XMP) standard.
Uncompressed WAV files are quite large, so as file sharing over the Internet became
popular, the WAV format declined in popularity. However, it is still widely used as a
relatively "pure", i.e. lossless, file type, suitable for retaining "first generation" archived
files of high quality, or for use on systems where high-fidelity sound is required and disk
space is not restricted.
5.2 MIDI File Format
The term MIDI stands for Musical Instrument Digital Interface and is essentially a
communications protocol for computers and electronic musical instruments. Although
MIDI files are not the same as the typical digital audio formats we use to listen to music
(such as MP3, AAC or WMA), MIDI files can still be thought of as digital music. Rather
than an actual audio recording stored as binary data, a MIDI file in its simplest form is
made up of information that describes which musical notes are to be played, along with
the types of instruments that are to be used.
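The LPCM layout described above can be demonstrated with Python's standard wave
module. The settings below (8000 Hz, mono, 16-bit, matching the front end's sampling
rate) and the file name are purely illustrative:

```python
import math
import struct
import wave

RATE = 8000  # sampling rate used by the extraction front end

# Write one second of a 440 Hz tone as uncompressed 16-bit mono LPCM.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)                     # 2 bytes = 16 bits per sample
    w.setframerate(RATE)
    w.writeframes(b"".join(
        struct.pack("<h", int(20000 * math.sin(2 * math.pi * 440 * t / RATE)))
        for t in range(RATE)))

# Read the header back: channel count, bit depth, sampling rate, frame count.
with wave.open("tone.wav", "rb") as w:
    print(w.getnchannels(), w.getsampwidth() * 8,
          w.getframerate(), w.getnframes())  # 1 16 8000 8000
```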
17
MIDI files therefore do not contain any 'real world' recordings such as voice (e.g.
audiobooks) or live performances. However, MIDI files are very small and can be played
on a wide range of devices that support the MIDI protocol, including cell phones, smart
phones, and computers with the right software. Monophonic and polyphonic ringtones
are examples of the MIDI file format in use.
The QBH system's database of songs is built from songs in the MIDI file format, because
the MIDI representation already discretizes the notes, making it easy to extract the pitch
and timing information necessary for song matching. Alternative music file formats such
as WAV, MP3 or AIFF would require complicated waveform and signal processing that
could introduce many inaccuracies. Each song is also mapped to a set of metadata
attributes, such as song name and artist, for eventual display in the GUI result list.
18
6. SYSTEM ARCHITECTURE
The architecture is illustrated in the figure above, and operation of the system is
straightforward. Queries are hummed into a microphone, digitized, and fed into a
pitch-tracking module. The result, a contour representation of the hummed melody, is fed
into the query engine, which produces a ranked list of matching melodies. The database
of melodies is acquired by processing public-domain MIDI songs and is stored as a
flat-file database. Hummed queries may be recorded in a variety of formats. The query
engine uses an approximate pattern-matching algorithm in order to tolerate humming
errors.
The melody database is essentially an indexed set of soundtracks. The acoustic query,
typically a few notes hummed by the user, is processed to detect its melody line, and the
database is searched to find the songs that best match the query. While the overall task is
easily performed by humans, many challenging problems arise in the implementation of
an automatic system. These include the signal processing needed to extract the melody
from the stored audio and from the acoustic query, and the pattern-matching algorithms
needed to achieve properly ranked retrieval. Further, a robust system must be able to
account for inaccuracies in the user's singing.
19
6.1 WAV TO MIDI CONVERSION
To create a MIDI file from a song recorded in WAV format, a musician must determine the pitch, velocity, and duration of each note being played and record these parameters as a sequence of MIDI events. The resulting MIDI file represents the basic melody and chords of the recognized music. The difference between the WAV and MIDI formats lies in how they represent sound and music: WAV is a digital recording of any sound (including speech), whereas MIDI is principally a sequence of notes (MIDI events). Here the converter produces an output file (.mid) from an input file (.wav) that contains musical data, together with a tone file (.wav) consisting of monotone data. An advantage of this structure is that the query is prepared on the client side of the system, so the query itself is very short. It is also possible to evaluate the query's quality before sending it to the server: the system plays back the recognized melody notes in MIDI format, allowing the user to listen to the query and decide either to send it to the server or to sing it again.
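As a minimal sketch of the pitch-detection step, the fundamental frequency of a tone can be estimated by autocorrelation. This is a toy under stated assumptions: it runs on a generated sine wave rather than a real recording, and a real WAV-to-MIDI converter must also segment individual notes and estimate their velocity and duration.

```python
import math

def detect_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a short frame by
    finding the lag with the strongest autocorrelation."""
    n = len(samples)
    lag_min = int(sample_rate / fmax)              # shortest period considered
    lag_max = min(int(sample_rate / fmin), n - 1)  # longest period considered
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(1024)]
print(round(detect_pitch(tone, rate)))  # near 440 Hz (limited by integer-lag resolution)
```

The integer-lag grid limits accuracy; practical systems interpolate around the correlation peak (or use more robust pitch trackers) before mapping the frequency to the nearest MIDI note.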
20
7. PARSONS CODE ALGORITHM
The Parsons code, formally the Parsons Code for Melodic Contours, is a simple notation that identifies a piece of music through its melodic motion: the movement of the pitch up and down. Denys Parsons developed this system for his 1975 book, The Directory of Tunes and Musical Themes. Representing a melody in this manner makes it easy to index or search for particular pieces. User input to the system (humming) is converted into a sequence of relative pitch transitions. Each note in the input is classified in one of the following ways:
1. U = "up," if the note is higher than the previous note
2. D = "down," if the note is lower than the previous note
3. R = "repeat," if the note is the same pitch as the previous note
4. * = the first tone, used as the reference
21
The first note is C (MIDI note 72). We take it as the reference note and write *. The second note is also C; since it repeats, we write R. The next note is G; G is higher than C, so we write U. For the second G we write R, and so on. This textual pattern is stored in the database for comparison.
Advantages
1. The pattern remains the same even if the user hums the tune in a different scale, or hums some notes off key.
2. It requires little storage space, since it is stored as text.
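The conversion illustrated above follows directly from the four rules. In this sketch the input is assumed to be a list of MIDI note numbers (72 = the C used in the example):

```python
def parsons_code(midi_notes):
    """Convert a sequence of MIDI note numbers to a Parsons code string."""
    if not midi_notes:
        return ""
    code = ["*"]  # the first tone serves only as a reference
    for prev, cur in zip(midi_notes, midi_notes[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)

# C C G G A A G (the opening of "Twinkle, Twinkle, Little Star")
print(parsons_code([72, 72, 79, 79, 81, 81, 79]))  # → *RURURD
```

Only the direction of each step matters, which is exactly why the code is invariant to the scale in which the user hums.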
22
8. BENCHMARKING MUSIC INFORMATION RETRIEVAL SYSTEMS
Research paper: "Benchmarking Music Information Retrieval Systems"
Josh Reiss, Department of Electronic Engineering, Queen Mary, University of London, Mile End Road, London E1 4NS, UK, josh.reiss@elec.qmul.ac.uk
Mark Sandler, Department of Electronic Engineering, Queen Mary, University of London, Mile End Road, London E1 4NS, UK, mark.sandler@elec.qmul.ac.uk
The goal of this research paper is to create an accurate and effective benchmarking system for music information retrieval (MIR) systems. This serves multiple purposes: inspiring the MIR community to add features and speed to existing projects, and enabling researchers to measure the performance of their work and incorporate the ideas of others. To date, there has been no systematic, rigorous review of the field, and thus there is little knowledge of when an MIR implementation might fail in a real-world setting.
8.1 ONLINE MIR SYSTEMS
For the purposes of this work, five online MIR systems were considered. The systems considered all have certain properties in common: all may be used online via the World Wide Web, all are used by entering a query concerning a piece of music, and all may return information about music that matches that query. However, these systems differ greatly in their features, goals, and implementation. These differences are discussed in detail below.
8.1.1 CatFind
CatFind allows one to search MIDI files using either a musical transcription or a melodic profile based on the Parsons code. It has minimal features and was intended primarily for demonstration. Although it seems unlikely that this system will be extended, it is still useful here as a system for comparison.
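A melodic-profile search of the kind CatFind offers can be imagined as a simple substring match over stored Parsons codes. This is a guess at the mechanism for illustration only; the paper does not describe CatFind's internals, and the song entries below are invented.

```python
def profile_search(query, database):
    """Return titles whose Parsons code contains the query profile.
    Hummed queries typically omit the leading '*' reference symbol."""
    return [title for title, code in database if query in code]

songs = [("Twinkle, Twinkle", "*RURURD"), ("Ode to Joy", "*RUURDD")]
print(profile_search("URUR", songs))  # → ['Twinkle, Twinkle']
```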
23
8.1.2 MelDex
MelDex allows searching of the New Zealand Digital Library. The MELody inDEX system is designed to retrieve melodies from a database on the basis of a few notes sung into a microphone. It accepts acoustic input from the user, transcribes it into common music notation, and then searches a database for tunes that contain the sung pattern or patterns similar to it. Thus the query is audio, although the retrieved files are in symbolic representation. Retrieval is ranked according to the closeness of the match, and a variety of mechanisms are provided to control the search, depending on the precision of the input.
8.1.3 MelodyHound
This melody recognition system was developed by Rainer Typke in 1997. It was originally known as "Tuneserver" and hosted by the University of Karlsruhe. It searches directly on the Parsons code and was designed initially for query by whistling: it returns the song in the database that most closely matches a whistled query.
8.1.4 ThemeFinder
Themefinder, created by David Huron et al., allows one to identify common themes in Western classical music, folksongs, and Latin motets of the sixteenth century. Themefinder provides a web-based interface to the Humdrum thema command, which in turn allows searching of databases containing musical themes or incipits (opening note sequences). Themes and incipits available through Themefinder are first encoded in the kern music data format, and groups of incipits are assembled into databases. Currently there are three databases: classical instrumental music, European folksongs, and Latin motets from the sixteenth century. Matched themes are displayed on screen in graphical notation.
8.1.5 Music Retrieval Demo
The Music Retrieval Demo is notably different from the other MIR systems considered here: it performs similarity searches on raw audio data (WAV files), with no transcription of any kind.
It works by calculating the distance between the selected file and all other files in the database. The other files can then be displayed in a list ranked by their similarity, such that the more similar files are nearer the top.
24
Distances are computed between templates, which are representations of the audio files, not between the audio itself. The waveform is Hamming-windowed into overlapping segments, and each segment is processed into a spectral representation of Mel-frequency cepstral coefficients (MFCCs). This data-reducing transformation replaces each 20 ms window with 12 cepstral coefficients plus an energy term, yielding a 13-valued vector. The next step is to quantize each vector using a specially designed quantization tree, which recursively divides the vector space into bins, each corresponding to a leaf of the tree. Any MFCC vector falls into one and only one bin. Given a segment of audio, the distribution of its vectors across the bins characterizes that audio: counting how many vectors fall into each bin yields a histogram template that is used in the distance measure. For this demonstration, the distance between audio files is the simple Euclidean distance between their corresponding templates (or rather, 1 minus the distance, so that closer files have larger scores). Once scores have been computed for each audio clip, they are sorted by magnitude to produce a ranked list, as in other search engines.
8.2 COMPARISON OF MIR SYSTEMS
In Table 1, we present a comparison of the features of the various MIR systems under investigation. Note first that each of these systems was designed for a different purpose,
25
and none of them can be considered a finished product. This table gives an overview of the state of the MIR systems available, the features that one may wish to include in an MIR system, and the areas where improvement is most needed. It also highlights the need for a standardized testbed. Each of the MIR systems uses a different database of files for audio retrieval. Both CatFind and the Music Retrieval Demo have databases with fewer than 500 files, so any benchmarking estimates for them, such as retrieval times and efficiency, are of little value. MelDex, MelodyHound, and ThemeFinder have databases containing over 10,000 files, which should be sufficient for estimating search efficiency and scalability.
8.3 EVALUATION ISSUES
Table 1 listed and compared the features available in existing online MIR systems. However, this is not sufficient for effective benchmarking and evaluation of the music information retrieval systems that may appear in the near future and be used with large file collections. The question of what features to evaluate is determined by what we can measure that will reflect the ability of the system to satisfy the user. In a landmark paper, Cleverdon [21] listed six main measurable quantities; this has become known as the Cranfield model of information retrieval evaluation. Here, those properties are listed, modified as applicable for MIR.
1. The coverage of the collection, that is, the extent to which the system includes relevant matter.
2. The time lag, that is, the average interval between the time the search request is made and the time an answer is given. Consideration should also be given to worst-case or near-worst-case scenarios: certain genres or formats of music, as well as certain types of queries (e.g., query and retrieval of polyphonic transcription-based audio), may require far more time than others.
Furthermore, if the testbed is particularly large, dispersed, or unindexed, as with peer-to-peer networks, then bandwidth limitations and scalability constraints may greatly reduce efficiency even as the collection size grows.
26
3. The form of presentation of the output. For MIR systems this means not only having the option of retrieving various formats, symbolic and audio, but also identifying multiple performances of the same composition.
4. The effort involved on the part of the user in obtaining answers to a search request. So far, MIR research has been dominated by audio engineers, computer scientists, musicologists, and librarians; as the field expands to include developers and user-interface experts, this issue will acquire more significance.
5. The recall of the system, that is, the proportion of relevant material actually retrieved in answer to a search request.
6. The precision of the system, that is, the proportion of retrieved material that is actually relevant.
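Recall and precision, as defined in items 5 and 6, can be computed for a single query as follows (the retrieved and relevant identifiers below are made-up examples):

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant items retrieved / all items retrieved;
    recall = relevant items retrieved / all relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 results returned, 2 of which are among the 3 relevant songs
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(p, round(r, 3))  # → 0.5 0.667
```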
27
9. CONCLUSION
Music retrieval is becoming more natural, simple, and user-friendly with the advancement of QBH, and this technology offers broad application prospects for music retrieval. Using the Parsons code algorithm, it becomes easy to implement a query-matching system. In this work, a framework has been laid down for benchmarking future MIR systems. At the moment, this field is in its infancy: there are only a handful of MIR systems available online, each of which is quite limited in scope. Still, these benchmarking techniques were applied to five online systems, and proposals were made concerning future benchmarking of full online audio retrieval systems. It is hoped that these recommendations will be considered and expanded upon as such systems become available.
28
10. REFERENCES
1. J. Reiss and M. Sandler, "Benchmarking Music Information Retrieval Systems," Department of Electronic Engineering, Queen Mary, University of London.
2. J.-M. Batke, G. Eisenberg, P. Weishaupt, and T. Sikora, "A Query by Humming System Using MPEG-7 Descriptors," Communication Systems Group, Technical University of Berlin.
3. E. Lau, A. Ding, and C. On, "MusicDB: A Query by Humming System," 6.830: Database Systems Final Project Report, Massachusetts Institute of Technology.