Volume 3, Number 4, December 2012                                                                                       Journal of Convergence



      Novel Query-by-Humming/Singing Method with
                Fuzzy Inference System

      Yo-Ping Huang                                  Shin-Liang Lai
      Department of Electrical Engineering           Department of Computer Science and Engineering
      National Taipei University of Technology       Tatung University
      Taipei, Taiwan 10608                           Taipei, Taiwan 10451
      yphuang@ntut.edu.tw                            sinla.lai@gmail.com



Abstract—Music Information Retrieval (MIR) is a crucial topic in the
domain of information retrieval. Exploiting the major characteristics
of music, a Query-by-Humming system retrieves music of interest by
finding melodies that are similar or identical to the hummed query.
Based on the fuzzy inference model designed in this paper, a novel
Query-by-Humming/Singing system is proposed to extract pitch contour
information from WAV and MIDI files. To verify the effectiveness of
the presented work, the MIREX QBSH Database is employed as our
experimental database, and a large amount of human vocal data is used
as queries to test the robustness of the MIR. Then, the Longest
Common Subsequence (LCS) is used as an approximate matching algorithm
to identify the top 5 music samples as an evaluation standard for the
system. Experimental results show that the proposed system achieves
85% accuracy in the top 5 retrievals.

    Keywords-MIDI; query-by-humming; pitch contour; fuzzy inference
system.

                       I. INTRODUCTION

    With its extensive resources, the Internet pervades every area,
including the music market. The Internet is something like a huge
database that satisfies all kinds of needs of its clients at all
times. For instance, many researchers study Music Information
Retrieval (MIR) for the purpose of quickly and correctly finding the
pieces a user needs in a vast music database. To achieve this goal,
researchers have probed into all types of music formats, including
MIDI, MP3, WAV, and voice. In recent years, more and more researchers
have focused on MIDI (Musical Instrument Digital Interface) because
it has advantages over other music formats. For example, MIDI can be
played on electronic synthesizers, and the pitch or length can be
changed according to a user's needs. With its smaller file size, MIDI
is easier to record and is thus widely applied in karaoke. With these
advantages, MIDI saves storage space, speeds up queries, and raises
the accuracy of query results.
    However, the introduction of MIR is accompanied by questions
about query methods, in particular how to extract and represent the
features of a query melody. Previous studies have developed many
query models, including Query-by-Humming (Singing) [1]–[7],
Query-by-Tapping [8], Query-by-Example [9], Query-by-Tag [10], and
Query-by-Description [11]. Many researchers are interested in
Query-by-Humming (QBH), because humming is the simplest and most
direct way for people to express music. Using QBH as the means of MIR
inquiry can obtain a desirable effect without other apparatus.
However, compared with other methods, it is more difficult to
retrieve related music and the accuracy is usually lower, especially
when singing is used as the query [6].
    To translate the hummed melody into the contrasting MIDI format,
many researchers have described Query-by-Humming systems with melody
comparison [12]. MIDI can be regarded as a music format expressed in
words or numbers. Therefore, most researchers change the melody into
a series of symbolic representations to be compared with the MIDI.
Once a query has been changed into a symbolic representation, it can
be compared with melodies that have been processed in the same way.
Several methods have been proposed for comparison, such as granular
events [13], hidden Markov models [14][15], string matching [16][17],
dynamic programming [18][19], dynamic time warping (DTW) [20][21],
and tree-based searching [22][23]. At the end of the comparison
procedure, the MIR system can then select the item that best matches
the guiding rules.
    The remainder of this paper is organised as follows. Section 2
starts with some basic definitions and notations used throughout the
paper, and provides a brief look at previous work on this subject.
Section 3 presents the details of our approach. In Section 4, a novel
MIR is presented and test results are discussed. Finally, Section 5
concludes the paper and gives a perspective on further study of this
subject.

                      II. RELATED WORK

    Ghias et al. [25] built a MIR system to process MIDI files in
1995. They used three symbols (U, D, and S) to depict the three
different levels of pitch contour: U represents up, D is down, and S
is the same. In any time series graph these are the only three
possibilities: the graph can grow, shrink, or remain the same.
    Many researchers later improved on this, such as Typke et al.
[26], who studied the Parsons code. They also use the pitch contour
from humming, and after acquiring the pitch contour represent it as
U, D, and R strings. McNab et al. [27] incorporated rhythm into the
idea of pitch contours. Sonoda et al. [28], Li et al. [7], and Mo et
al. [29] considered the characteristic of duration, and used L
(longer), R (repeat), and S (same) to represent the changes.
    Tom et al. [30] used a dynamically calculated threshold, applied
it to the contour of the signal to segment notes, and used
autocorrelation to detect pitches. Then notes are

      Copyright ⓒ 2010 Future Technology Research Association International                                                                       1

represented as a string sequence of U, D, and R. Raju et al. [31]
proposed a similar approach to represent the melody. The melodies in
the database are indexed by the U, D, and S strings. They used a
time-domain autocorrelation function for pitch extraction and a
dynamic programming-based edit distance as a similarity metric. But
these techniques are time-consuming.
    Therefore, in this paper we propose different pitch contour
coding methods that use a large amount of human vocal data as the
query, combined with a Fuzzy Inference System (FIS) for searching the
MIDI database, to test the accuracy of the MIR.

                 III. THE PROPOSED APPROACH

    Since the user's query is an important factor that influences
the accuracy of the results, massive and diverse query data are
experimented with to test the accuracy of the system. To address
this, the MIREX QBSH Database is employed as our experimental
database; it is unique in containing a large amount of query data
recorded from different individuals of both sexes. In general, there
are several steps in establishing a Query-by-Humming system:

    1. Building a music database: whatever the music format, the
       first step is to collect a large number of music pieces,
       carry out pre-processing or format conversion for comparison,
       and save the processed results into the database.
    2. Inputting the query: as discussed above, the query may have
       several modes; the features of the query are extracted after
       the user inputs it and then transformed into pre-defined
       formats for comparison.
    3. Comparing procedures: an algorithm compares the query with
       the melodies in the database. Most systems transform both
       into a symbolic representation for convenient comparison, but
       this takes more time and occupies more database storage
       space.
    4. Returning results: based on the comparison results, the
       Query-by-Humming system returns a list sorted by similarity.
       Usually it returns the top five or top ten results so the
       user can judge whether they fit the request. This is an
       important index for determining the accuracy of a
       Query-by-Humming system.

    According to the above discussion, the system framework is shown
in Fig. 1, where the blue dotted rectangle contains the
pre-processing steps and the red rectangle indicates the query
processing steps. The comparison stage, which employs the Longest
Common Subsequence (LCS), is detailed in the following.

    Figure 1. System framework (WAV file, translation to MIDI,
    melody extraction, pitch contour database, fuzzy inference
    system, pitch contour, LCS string matching, ranked list).

A. WAV to MIDI
    Zhu and Shasha [32] mentioned that rather than employing some
unreliable note-segmentation algorithm, they would rather use the
best commercial software – Akoff Music Composer v2.0 – to record and
transcribe notes from a user's query. The Akoff Music Composer
software by Akoff Sound Labs is designed for the recognition of
polyphonic music from audio sources and its conversion to a MIDI
score. Recognition is performed from pre-recorded WAV files or
directly from audio input in real time, by tracking note dynamics and
pitch bends and using different harmonic models to improve the
recognition of the appropriate instruments. Meanwhile, v3.0 has
improved considerably in terms of identification rate. Hence we also
use the Akoff Music Composer as the tool for transforming WAV to
MIDI, as shown in Fig. 2.

        Figure 2. Example of transforming WAV to MIDI.

B. Pitch Contour
    After translating WAV to MIDI, we then analyse the pitch
interval of each note. Different from [25], which divided the pitch
interval into U, D, and S, here the pitch intervals between
neighbouring notes are divided into six symbols as given in Table 1.

                 Table 1: Pitch Contour Symbols

                   Symbol       Pitch Interval
                      H              > 8
                      R             5 ~ 8
                      U             1 ~ 4
                      D            -1 ~ -4


                      B            -5 ~ -8
                      L             < -8

    If a user's query includes several consecutive notes of the same
pitch, segmentation errors may occur during the conversion from WAV
to MIDI; this is particularly obvious in melodies with faster tempos.
To avoid this problem, we do not record the melody when the pitch
interval is zero, but record only the variations in pitch interval,
which also shortens the later comparison of symbol strings. An
example is given in Fig. 3.

                  Figure 3. Pitch contour example.

          Figure 4. Input membership functions for query.
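As an illustration, the six-symbol coding of Table 1 together with the zero-interval skip described above can be sketched as follows. This is a crisp (non-fuzzy) sketch for clarity; the function name and the list-of-MIDI-pitches input format are our own choices, not from the paper:

```python
def pitch_contour(pitches):
    """Encode a note sequence (MIDI pitch numbers) as contour symbols.

    Intervals follow Table 1: H (> 8), R (5~8), U (1~4), D (-1~-4),
    B (-5~-8), L (< -8). Zero intervals are skipped, mirroring the
    paper's rule for consecutive notes of the same pitch.
    """
    symbols = []
    for prev, curr in zip(pitches, pitches[1:]):
        d = curr - prev
        if d == 0:            # consecutive same pitch: not recorded
            continue
        if d > 8:
            symbols.append('H')
        elif d >= 5:
            symbols.append('R')
        elif d >= 1:
            symbols.append('U')
        elif d >= -4:
            symbols.append('D')
        elif d >= -8:
            symbols.append('B')
        else:
            symbols.append('L')
    return symbols

# e.g. pitch_contour([60, 60, 62, 69, 57]) -> ['U', 'R', 'L']
```

Note how the repeated 60 contributes no symbol, so a query with many repeated notes cannot accumulate segmentation errors at those positions.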



C. Fuzzy Inference System
    In general, fuzzy inference systems are based on four major
modules:

    1. Fuzzification process: transforms the system inputs, which
       are crisp numbers, into fuzzy sets. This is done by applying
       the membership functions to calculate the membership degrees.
    2. Knowledge base: stores fuzzy operations, fuzzy rule bases,
       etc.
    3. Conflict resolution process: more than one rule may be fired
       by an input datum, and this module resolves conflicts by
       predefined operations.
    4. Defuzzification process: uses a predefined defuzzification
       method, such as the centre of gravity, to transform the fuzzy
       regions obtained by the inference procedure into a crisp
       output value.

    In this paper, we use an FIS to transform the pitch intervals in
the query into a symbolic representation. When a WAV file is
transformed to MIDI, no software or algorithm can be 100% accurate.
Hence, we utilise the blurring characteristics of fuzzy sets in
coding the pitch contour so as to offset the errors made during the
transformation, to the benefit of the subsequent comparison
procedures.
    Frequently used fuzzy membership functions are triangular,
Gaussian, and trapezoidal. Among them, the curves of Gaussian
membership functions are smoother and therefore have better nonlinear
traits. The general formula is shown below:

          μA(x) = exp(−((x − m)/σ)²)                        (1)

    Here, m is the mean and is the centre point of the Gaussian
membership function, and σ is the standard deviation, which
corresponds to the width of the Gaussian membership function. This
study chose Gaussian membership functions to process the pitch
contour transform for the subsequent comparison procedures. Fig. 4
and Fig. 5 show the membership functions for query and symbol,
respectively, and the parametric values of the membership functions
are given in Table 2.

         Figure 5. Output membership functions for symbol.

        Table 2: Parametric Values of Membership Functions

               Pitch Interval        Parametric Value
                 large up                  9.5
                 medium up                 6.5
                 little up                 2.5
                 little down              -2.5
                 medium down              -6.5
                 large down               -9.5

    According to Table 2, six membership functions are defined for
the input query variable. These six terms are labelled mf1 to mf6 and
correspond to the output membership functions L, B, D, U, R, and H,
respectively. We define six fuzzy rules accordingly, and the users'
queries can be transformed into pitch contours by these rules.
Finally, the pitch contours of the users' queries and the repeating
patterns are compared in the matching process. Fig. 6 illustrates how
the proposed fuzzy system infers the output from the corresponding
input.

    Rule 1: IF pitch interval is large down,
            THEN pitch contour symbol is L.
    Rule 2: IF pitch interval is medium down,
            THEN pitch contour symbol is B.
    Rule 3: IF pitch interval is little down,
            THEN pitch contour symbol is D.
    Rule 4: IF pitch interval is little up,
            THEN pitch contour symbol is U.
    Rule 5: IF pitch interval is medium up,
            THEN pitch contour symbol is R.
    Rule 6: IF pitch interval is large up,
            THEN pitch contour symbol is H.
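The fuzzification of Eq. (1) with the Table 2 centres, and the firing of the six rules, can be sketched as below. Two caveats: the paper does not list σ in Table 2, so the width used here is our assumption, and as a simplification we return the symbol with the highest membership degree rather than reproducing the paper's MATLAB defuzzification (Fig. 6):

```python
import math

# Table 2 centres of the six Gaussian input membership functions,
# paired with the output symbols of Rules 1-6.
MF = [(-9.5, 'L'), (-6.5, 'B'), (-2.5, 'D'),
      (2.5, 'U'), (6.5, 'R'), (9.5, 'H')]
SIGMA = 1.5  # assumed width; not given in Table 2

def gaussian(x, m, sigma=SIGMA):
    """Gaussian membership function, Eq. (1): exp(-((x - m)/sigma)^2)."""
    return math.exp(-((x - m) / sigma) ** 2)

def fuzzy_symbol(interval):
    """Fire the six rules on a pitch interval; return the symbol of
    the rule with the highest membership degree."""
    degrees = [(gaussian(interval, m), sym) for m, sym in MF]
    return max(degrees)[1]
```

Because each membership function has a nonzero degree everywhere, a pitch interval distorted by the WAV-to-MIDI step (say, 3 semitones instead of 4) still maps to the intended symbol, which is the blurring effect the paper exploits.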
          Figure 6. Illustration of defuzzification process.

D. Longest Common Subsequence
    When the query translation and the MIDI database representation
are finished, the next task is to match the symbolic string of the
query against the strings in the database. The Longest Common
Subsequence (LCS) algorithm is a well-known dynamic programming
algorithm for approximate matching. Like other dynamic programming
methods, LCS solves the problem by a recurrence relation. For
example, let X(1, 2, ..., m) and Y(1, 2, ..., n) be string sequences
of lengths m and n, respectively. Xi represents the prefix of X of
length i, and xi represents the ith element of the sequence. The LCS
of X and Y is described by the following formula:

                  ∅                                    if i = 0 or j = 0
  LCS(Xi, Yj) =   (LCS(Xi−1, Yj−1), xi)                if xi = yj        (2)
                  longest(LCS(Xi, Yj−1), LCS(Xi−1, Yj)) if xi ≠ yj

    Then, we can use the following pseudocode to calculate the
length of the LCS. Lastly, we compare the query with each melody in
the database, using the length of the LCS as a basis for sorting and
for selecting the top five results from the system as the candidate
set.

    Function LCSLen(X[1, ..., m], Y[1, ..., n])
        Initialize C[0..m][0..n] to zeroes
        for i = 0 to m
            C[i][0] = 0
        for j = 0 to n
            C[0][j] = 0
        for i = 1 to m
            for j = 1 to n
                if X[i] = Y[j]
                    C[i][j] = C[i-1][j-1] + 1
                else
                    C[i][j] = max(C[i][j-1], C[i-1][j])
        Return C[m][n]

    For example, comparing the two strings "U, R, R, D, R, D, R, B"
and "U, D, U, R, B", the length of the LCS is 4, as shown in Fig. 7.

                     Figure 7. LCS example.

          IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

    The MIREX QBSH Database is used as our experimental database; it
includes 48 MIDI files and 718 WAV files. The WAV files contain vocal
fragments recorded by 35 subjects, including one subject who recorded
20 clips by humming, with the remaining subjects singing. Each WAV
file lasts 7 seconds, and both male and female voices are included to
test the accuracy of the query system. In addition, we employ the 48
MIDI files, which include Chinese and Western folk songs, as the
music database. The original WAV files are translated to MIDI via the
Akoff Music Composer, and the FIS of this system then translates them
into pitch contours. After this transformation, the query lengths
fall between 15 and 24. The statistics over all query lengths are
shown in Fig. 8, where the horizontal axis is the length of the pitch
contour and the vertical axis is its accumulated number.

          Figure 8. Length statistics of pitch contour.
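For reference, the LCSLen pseudocode of Section III-D corresponds directly to the following runnable Python (identifier names are ours):

```python
def lcs_length(x, y):
    """Dynamic-programming length of the longest common subsequence
    of two symbol sequences, as in the LCSLen pseudocode."""
    m, n = len(x), len(y)
    # c[i][j] holds the LCS length of x[:i] and y[:j]; row 0 and
    # column 0 stay zero, covering the i = 0 or j = 0 base case.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[m][n]

# The example of Fig. 7:
query  = ['U', 'R', 'R', 'D', 'R', 'D', 'R', 'B']
melody = ['U', 'D', 'U', 'R', 'B']
# lcs_length(query, melody) -> 4
```

Ranking candidates then amounts to sorting the database by this score against the query's contour, highest first.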




    The FIS is implemented in MATLAB. There are six membership
functions, as previously described, for each of the input and output
variables. The system output is divided into six groups (Top 1,
Top 2, Top 3, Top 4, Top 5, and Top 5+) based on the LCS computing
results, in order to calculate their recall rates for evaluation.
    To evaluate the effectiveness of the FIS in the proposed system,
we contrast the results obtained with and without the FIS. The
experimental results for the top five and beyond retrieval levels are
summarised in Table 3, and the histograms for each retrieval level
are compared in Fig. 9.
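The percentages reported in Table 4 follow directly from the per-group counts of Table 3 over the 718 queries; the accumulation can be checked with a short sketch (the counts are copied from Table 3, the helper name is ours):

```python
# Per-group retrieval counts from Table 3 (Top 1..Top 5, then Top 5+).
normal = [117, 137, 158, 90, 86, 130]
fis    = [106, 226, 137, 72, 73, 104]

total = sum(fis)                    # 718 queries in both settings
assert sum(normal) == total == 718

def cumulative_recall(groups, k):
    """Fraction of queries whose target appears within the top k."""
    return sum(groups[:k]) / total

top3 = cumulative_recall(fis, 3)    # 469/718, about 65%
top5 = cumulative_recall(fis, 5)    # 614/718, about 85%
```

The 65% Top-3 and 85% Top-5 figures quoted in the text are exactly these accumulated fractions.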




    Figure 9. Retrieval results from the top five and beyond levels.

    We can see from Table 3 that although 11 fewer records are
successfully retrieved by the FIS than by the normal method at the
Top 1 level, the FIS figure at the Top 2 level is nearly twice that
of the normal retrieval. Apparently, the MIR effect can indeed be
improved by applying the FIS to the system. Moreover, the accumulated
results for each retrieval level are given in Fig. 10: the
accumulated result over the Top 1 and Top 2 levels with the FIS is
332 records, which indicates that almost half of the queries can be
successfully retrieved. Table 4 shows that the accumulated percentage
from the Top 1 to the Top 3 level is 65%, and this rises to 85% if
the Top 1 to Top 5 levels are considered. Consequently, this result
verifies that using the FIS can indeed raise the system accuracy and
fit the needs of the user query.

      Figure 10. Histograms of accumulated retrieval results.

              V. CONCLUSION AND FUTURE WORK

    Unlike past research, in this paper we employ a massive set of
singing queries as the experimental data, drawn from diverse vocal
records, to test the robustness of the proposed system. In the pitch
interval coding, we change the past U, S, and D coding to L, B, D, U,
R, and H to improve the accuracy of the coding. Besides, we also
abandon the coding of repeated pitches to reduce the note
segmentation errors that occur with consecutive notes of the same
pitch. In the coding process, a fuzzy inference model is used as the
coding tool, whose blurring characteristics reduce the errors
produced when WAV files are translated to MIDI files. LCS is also
applied as an approximate matching algorithm to locate the Top 5
retrieval results of the system as a standard for evaluating the
system's performance.
    During the experiments, we compare the results with and without
the addition of the FIS. From the results, we can see that the number
of retrievals increases markedly in the Top 2 group after the FIS is
added, which means that the FIS has indeed fulfilled its function of
raising the system's accuracy. It reaches 65% within the Top 3
retrievals, which indicates that the proposed system works well for
Query-by-Humming/Singing.
    We are not satisfied with the current performance. More effort
is needed in the future to increase the size of the melody database
to test the system's scalability, to improve the WAV (voice) to MIDI
(string) translation process, and to incorporate new features as a
basis for analysis to improve the system's accuracy.
                                                                              improve the WAV (voice) to MIDI (string) translation process
           Table 3: Total Number of Queries in Each Group

             Top 1    Top 2    Top 3    Top 4    Top 5    Top 5+
  Normal      117      137      158       90       86       130
  FIS         106      226      137       72       73       104

            Table 4: Retrieval Percentage of Each Group

             Top 1    Top 2    Top 3    Top 4    Top 5    Top 5+
  Normal      16%      19%      22%      13%      12%       18%
  FIS         15%      31%      19%      10%      10%       15%

                        ACKNOWLEDGMENTS

    This work was supported in part by the National Science Council,
Taiwan under Grant NSC100-2221-E-027-110, and in part by the joint
project between the National Taipei University of Technology and
Mackay Memorial Hospital under Grant NTUT-MMH-99-03 and Grant
NTUT-MMH-100-09.

                          REFERENCES

[1] N. Ben Salem and J. P. Hubaux, "Securing wireless mesh networks,"
    IEEE Comm. Mag., vol. 13, no. 2, pp. 50–55, April 2006.
[2] N. H. Adams, M. A. Bartsch, and G. H. Wakefield, "Note
    segmentation and quantization for music information retrieval,"



      Copyright ⓒ 2010 Future Technology Research Association International                                                                              5
Journal of Convergence                                                                                     Volume 3, Number 4, December 2012

    IEEE Trans. on Audio, Speech, and Language Processing, vol.              [19] J.-S. R. Jang, and H.-U. Lee, “A general framework of
    14, no. 1, pp. 131–141, January 2006.                                         progressive filtering and its application to query by
                                                                                  singing/humming,” IEEE Trans. on Audio, Speech, and
[3] B. D. Roger, P. B. William, P. Bryan, H. Ning, M. Colin, and T.               Language Processing, vol. 16, no. 2, pp. 350–358, February 2008.
    George, “A comparative evaluation of search techniques for
    query-by-humming using the MUSART testbed,” J. Am. Soc.                  [20] T. Nishimura, H. Hashiguchi, J. Takita, J. X. Zhang, M. Goto,
    Info. Sci. Technol., vol. 58, no. 5, pp. 687–701, March 2007.                 and R. Oka, “Music signal spotting retrieval by a humming query
                                                                                  using start frame feature dependent continuous dynamic
[4] E. Unal, E. Chew, P. G. Georgiou, and S. S. Narayanan,                        programming,” Proc. Int. Symp. on Music Information Retrieval,
    “Challenging uncertainty in query by humming systems: a                       Bloomington, Indiana, USA, pp. 211–218, October 2001.
    fingerprinting approach,” IEEE Trans. on Audio, Speech, and
    Language Processing, vol. 16, no. 2, pp. 359–371, February 2008.         [21] H.-M. Yu, W.-H. Tsai, and H.-M. Wang, “A query-by-singing
                                                                                  system for retrieving karaoke music,” IEEE Trans. on
[5] Y. Jinhee, P. Sanghyun, and K. Inbum, “An efficient frequent                  Multimedia, vol. 10, no. 8, pp. 1626–1637, December 2008.
    melody indexing method to improve the performance of query-
    by-humming systems,” J. Info. Sci., vol. 34, no. 6, pp. 777–798,         [22] A. N. Myna, V. Chaitra, and K. S. Smitha, “Melody information
    December 2008.                                                                retrieval system using dynamic time warping,” Proc. 2009 WRI
                                                                                  World Congress on Computer Science and Information
[6] S. Rho, B. Han, E. Hwang, and M. Kim, “MUSEMBLE: A novel                      Engineering, vol. 5, Los Angeles, California, USA, pp. 266–270,
    music retrieval system with automatic voice query transcription               March-April 2009.
    and reformulation,” J. Syst. Softw., vol. 81, no. 7, pp. 1065–1080,
    July 2008.                                                               [23] U. Bagci, and E. Erzin, “Automatic classification of musical
                                                                                  genres using inter-genre similarity,” IEEE Signal Process. Lett.,
[7] P. Li, M. Zhou, X. Wang, X. Wang, N. Li, and L. Xie, “A novel                 vol. 14, no. 8, pp. 521–524, August 2007.
    MIR framework and application with automatic voice processing,
    database construction and fuzzy matching,” Proc. 2nd Int. Conf.          [24] J. Shen, D. Tao, and X. Li, “QUC-Tree: Integrating query
    Computer and Automation Engineering, Singapore, vol. 1, pp.                   context information for efficient music retrieval,” IEEE Trans.
    20–24, February 2010.                                                         on Multimedia, vol. 11, no. 2, pp. 313–323, February 2009.
[8] K. Kichul, R. P. Kang, S.-J. Park, S.-P. Lee, and Y. K. Moo,             [25] A. Ghias, J. Logan, D. Chamberlain, and B. Smith, “Query by
    “Robust query-by-singing/humming system against background                    humming – Musical information retrieval in an audio database,”
    noise environments,” IEEE Trans. on Consumer Electronics, vol.                Proc. ACM Multimedia, San Francisco, California, USA, pp.
    57, no. 2, pp. 720–725, May 2011.                                             231–236, November 1995.
[9] H. Pierre, and R. Matthias, “Query by tapping system based on            [26] R. Typke, and L. Prechelt, “An interface for melody input,”
    alignment algorithm,” Proc. IEEE Int. Conf. Acoustics, Speech,                ACM Trans. on Computer-Human Interaction, vol. 8, no. 2, pp.
    and Signal Processing, Taipei, Taiwan, pp. 1881–1884, April                   133–149, June 2001.
    2009.
                                                                             [27] R. J. McNab, L. A. Smith, I. H. Witten, and C. L. Henderson,
[10] F. Pereira, A. Vetro, and T. Sikora, “Multimedia retrieval and               “Tune retrieval in the multimedia library,” Multimedia Tools
    delivery: Essential metadata challenges and standards,” Proc.                 Appl., vol. 10, pp. 113–132, April 2000.
    IEEE, vol. 96, no. 4, pp. 721–744, April 2008.
                                                                             [28] T. Sonoda, and Y. Muraoka, “A www-based melody retrieval
[11] J.-C. Wang, M.-S. Wu, H.-M. Wang, and S.-K. Jeng, “A content-                system - An indexing method for a large melody database,” Proc.
    based music search system using query by multi-tags with multi-               of Int. Computer Music Conference, Berlin, Germany, pp. 170–
    levels of preference,” Proc. IEEE Int. Conf. Multimedia and                   173, August–September 2000.
    Expo, Suntec City, Singapore, pp. 1–6, July 2010.
                                                                             [29] J.-S. Mo, C. H. Han, and Y.-S. Kim, “A melody-based similarity
[12] B. Whitman, and R. Rifkin, “Musical query-by-description as a                computation algorithm for musical information,” Proc.
    multiclass learning problem multimedia signal processing,” Proc.              Workshop on Knowledge and Data Engineering Exchange,
    IEEE Workshop on Multimedia Signal Processing, St. Thomas,                    Illinois, Chicago, USA, pp. 114–121, November 1999.
    USA, pp. 153–156, December 2002.
                                                                             [30] B. Tom, A. Søren, F. Brian, H. Christian, K. Jimmy, W. N. Lau,
[13] N. Hu, and R. B. Dannenberg, “A comparison of melodic                        and R. Thomas, “A system for recognition of hummed tunes,”
    database retrieval techniques using sung queries,” Proc. 2nd                  Proc. COST G-6 Conf. on Digital Audio Effects, Limerick,
    ACM/IEEE-CS Joint Conf. Digital Libraries, Portland, Oregon,                  Ireland, pp. 203–206, December 2001.
    USA, pp. 301–307, July 2002.
                                                                             [31] M. A. Raju, B. Sundaram, and P. Rao, “TANSEN: A query-by-
[14] J. T. Yao, and Y. Y. Yao, “Information granulation for web                   humming based music retrieval system,” Proc. of National Conf.
    based information retrieval support systems,” Proc. Int. Society              on Communications, Madras, India, pp. 75–79, January-February
    for Optical Engineering, Orlando, Florida, USA, pp. 138–146,                  2003.
    April 2003.
                                                                             [32] Y. Zhu and D. Shasha, “Query by humming: A time series
[15] J.-S. R. Jang, C.-L. Hsu, and H.-R. Lee, “Continuous HMM and                 database approach,” Proc. ACM Special Interest Group on
    its enhancement for singing/humming query retrieval,” Proc. Int.              Management of Data, San Diego, California, USA, June 2003.
    Symp. Music Information Retrieval, London, UK, pp. 546–551,
    September 2005.
                                                                                                         BIOGRAPHIES
[16] H. Takeda, N. Saito, T. Otsuki, M. Nakai, H. Shimodaira, and S.
    Sagayama, “Hidden Markov model for automatic transcription of                                            Yo-Ping Huang received his PhD
    MIDI signals,” Proc. IEEE Workshop on Multimedia Signal
    Processing, St. Thomas, Virgin Islands, USA, pp. 428–431,                                                in Electrical Engineering from
    December 2002.                                                                                           Texas Tech University, Lubbock,
[17] K. Lemström, String Matching Techniques for Music Retrieval,                                            TX, USA. He is currently a
    PhD Thesis, Department of Computer Science, Faculty of                                                   Professor in the Department of
    Science, University of Helsinki, 2000.                                                                   Electrical Engineering at National
[18] C. Parker, “Towards intelligent string matching in query-by-                                            Taipei University of Technology
    humming systems,” Proc. IEEE Int. Conf. Multimedia and Expo,                                             (NTUT), Taiwan. He also serves
    vol. 2, Baltimore, Maryland, USA, pp. 25–28, July 2003.                                                  as CEO of the Joint Commission
                                                                                                             of Technological and Vocational


6                                                                         Copyright ⓒ 2010 Future Technology Research Association International
Volume 3, Number 4, December 2012                                             Journal of Convergence


College Admission Committee in Taiwan. He was Secretary
General at NTUT, Chairman of IEEE CIS Taipei Chapter, and
Vice Chairman of IEEE SMC Taipei Chapter. He was
Professor and Dean of the College of Electrical Engineering
and Computer Science, Tatung University, Taipei, before
joining NTUT. His research interests include medical
knowledge mining, intelligent control systems, and handheld
device application systems design. Prof. Huang is a senior
member of the IEEE and a fellow of the IET.

                            Shin-Liang Lai received BS and
                            MS degrees in Mathematics from
                            the National Taipei University of
                            Education, Taipei, Taiwan, in 1998
                            and 2002, respectively. In 2012, he
                            received his PhD in Computer
                            Science from the Department of
                            Computer Science and Engineering,
                            Tatung University, Taipei, Taiwan.
                            His research interests include
                            content-based music information
retrieval, data mining, and fuzzy inference systems.
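The cumulative figures quoted in the experimental discussion can be checked directly against the raw counts of Table 3; the following minimal sketch is a consistency check only (the variable names are illustrative, and the 718-query total is inferred from the row sums):

```python
# Counts from Table 3: queries answered at each retrieval level
# (Top 1 .. Top 5, Top 5+); each row sums to 718 test queries.
normal = [117, 137, 158, 90, 86, 130]
fis = [106, 226, 137, 72, 73, 104]

total = sum(fis)
assert total == sum(normal) == 718  # same query set for both methods

# Figures discussed in the text for the FIS configuration:
top1_2_records = sum(fis[:2])          # 332 records within Top 1-2
top3_pct = 100 * sum(fis[:3]) / total  # about 65.3% within Top 3
top5_pct = 100 * sum(fis[:5]) / total  # about 85.5% within Top 5

print(top1_2_records, round(top3_pct, 1), round(top5_pct, 1))
```

Summing the rounded per-level percentages of Table 4 reproduces the 65% and 85% reported in the text; the exact cumulative fractions are 65.3% and 85.5%.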





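As background for the LCS-based approximate matching named in the abstract and conclusion, the sketch below scores Parsons-style U/D/S pitch-contour strings with the standard LCS dynamic-programming recurrence. The note numbers, song names, and ranking rule are illustrative assumptions, not the paper's actual data or implementation:

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings,
    via the standard dynamic-programming recurrence."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def contour(pitches):
    """Parsons-style pitch contour: U (up), D (down), S (same)."""
    return "".join(
        "U" if b > a else "D" if b < a else "S"
        for a, b in zip(pitches, pitches[1:])
    )

# Toy example: rank two hypothetical candidate melodies against a
# hummed query given as MIDI note numbers.
query = contour([60, 62, 62, 59, 61])  # -> "USDU"
candidates = {"song_a": "USDUUD", "song_b": "DDSSU"}
ranked = sorted(candidates, key=lambda k: -lcs_length(query, candidates[k]))
print(ranked)
```

Because LCS tolerates insertions and deletions, a hummed query with a few extra or missing notes can still match the stored contour, which is why it serves here as an approximate rather than exact matcher.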
Volume 3, Number 4, December 2012                                Journal of Convergence

Novel Query-by-Humming/Singing Method with Fuzzy Inference System

Yo-Ping Huang, Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan 10608, yphuang@ntut.edu.tw
Shin-Liang Lai, Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan 10451, sinla.lai@gmail.com

Copyright ⓒ 2010 Future Technology Research Association International

Abstract—Music Information Retrieval (MIR) is a crucial topic in the domain of information retrieval. Exploiting the major characteristics of music, a Query-by-Humming system retrieves music of interest by finding melodies that are similar or identical to the hummed query. Based on the fuzzy inference model designed in this paper, a novel Query-by-Humming/Singing system is proposed that extracts pitch contour information from WAV and MIDI files. To verify the effectiveness of the presented work, the MIREX QBSH Database is employed as our experimental database, and a large amount of human vocal data is used as queries to test the robustness of the MIR. The Longest Common Subsequence (LCS) is then used as an approximate matching algorithm to identify the top 5 music samples as an evaluation standard for the system. Experimental results show that the proposed system achieves 85% accuracy in the top 5 retrievals.

Keywords—MIDI; query-by-humming; pitch contour; fuzzy inference system.

I. INTRODUCTION

With its extensive resources, the Internet reaches into every area, including the music market. The Internet acts like a huge database that satisfies all kinds of client needs at all times. For instance, many researchers study Music Information Retrieval (MIR) so that users can quickly and correctly find the pieces they need in a vast music database. To achieve this goal, researchers have investigated all types of music formats, including MIDI, MP3, WAV, and voice. In recent years, more and more researchers have focused on MIDI (Musical Instrument Digital Interface) because it has advantages over other music formats. For example, MIDI can be played on electronic synthesizers, and the pitch or note length can be changed according to a user's needs. With its smaller file size, MIDI is easier to record and is thus widely applied in karaoke. These advantages mean that MIDI saves storage space, speeds up queries, and raises the accuracy of query results.

However, the introduction of MIR raises questions about query methods: how should the features of a query melody be extracted and represented? Previous studies have developed many query models, including Query-by-Humming (Singing) [1]–[7], Query-by-Tapping [8], Query-by-Example [9], Query-by-Tag [10], and Query-by-Description [11]. Many researchers are interested in Query-by-Humming (QBH), because humming is the simplest and most direct way for people to express music. Using QBH as the means for an MIR query can obtain a desirable result without additional apparatus. However, compared with other methods, it is more difficult to retrieve related music and the accuracy is typically lower, especially when singing is used as the query mode [6].

To translate the hummed melody into the corresponding MIDI format, many researchers have described Query-by-Humming systems based on melody comparison [12]. MIDI can be regarded as a music format expressed in symbols and numbers. Therefore, most researchers convert the melody into a series of symbolic representations to be compared against the MIDI data. Once a query has been converted into a symbolic representation, it can be matched against melodies that have been preprocessed in the same way. Several methods have been proposed for the comparison, such as granular events [13], hidden Markov models [14][15], string matching [16][17], dynamic programming [18][19], dynamic time warping (DTW) [20][21], and tree-based searching [22][23]. At the end of the matching procedure, the MIR system selects the items that best satisfy the matching rules.

The remainder of this paper is organised as follows. Section 2 starts with some basic definitions and notations used throughout the paper and provides a brief look at previous work on this subject. Section 3 presents the details of our approach. In Section 4, a novel MIR system is presented and test results are discussed. Finally, Section 5 concludes the paper and gives a perspective on further study of this subject.

II. RELATED WORK

Ghias et al. [25] built an MIR system to process MIDI files in 1995. They used three symbols (U, D, and S) to depict the three levels of pitch contour: U represents up, D down, and S the same. In any time series graph these are the only three possibilities, since the curve can only grow, shrink, or remain the same.

Many researchers later improved on this idea, such as Typke et al. [26], who studied the Parsons code. They also extract the pitch contour from humming and represent it as U, D, and R strings. McNab et al. [27] incorporated rhythm into the idea of pitch contours. Sonoda et al. [28], Li et al. [7], and Mo et al. [29] considered note duration and used L (longer), R (repeat), and S (same) to represent the changes.

Tom et al. [30] used a dynamically calculated threshold, applied to the contour of the signal, to segment notes, and used autocorrelation to detect pitches.
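The three-symbol contour coding of Ghias et al. [25] described above can be sketched in a few lines. This is a minimal illustration in Python (not the authors' implementation), assuming the melody is already available as a list of MIDI note numbers:

```python
def parsons_contour(pitches):
    """Code a pitch sequence as U (up), D (down), S (same),
    comparing each note with its predecessor."""
    symbols = []
    for prev, curr in zip(pitches, pitches[1:]):
        if curr > prev:
            symbols.append("U")
        elif curr < prev:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

# MIDI note numbers for the opening of "Twinkle, Twinkle":
# C4 C4 G4 G4 A4 A4 G4
print(parsons_contour([60, 60, 67, 67, 69, 69, 67]))  # -> SUSUSD
```

A query coded this way can be matched against database melodies coded with the same function, which is the basic idea the refinements below build on.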
The notes are then represented as a string sequence of U, D, and R. Raju et al. [31] proposed a similar approach to represent the melody: the melodies in the database are indexed by U, D, and S strings. They used a time-domain autocorrelation function for pitch extraction and a dynamic programming-based edit distance algorithm as the similarity metric. These techniques, however, are time-consuming.

Therefore, in this paper we propose a different pitch contour coding method, use a large amount of human vocal data as queries, and combine a Fuzzy Inference System (FIS) with the MIDI database search to test the accuracy of the MIR.

III. THE PROPOSED APPROACH

Since the user's query is an important factor influencing the accuracy of the results, massive and diverse query data are used to test the accuracy of the system. To this end, the MIREX QBSH Database is employed as our experimental database; it is unique in that it contains a large amount of query data recorded by different individuals of both sexes. In general, there are several steps in establishing a Query-by-Humming system:

1. Building a music database: whatever the music format, the first step is to collect a large number of music pieces, pre-process them or convert their format for comparison, and save the processed results into the database.
2. Inputting the query: as discussed above, the query may take several modes. The system extracts the features of the query after the user inputs it and transforms them into pre-defined formats for comparison.
3. Comparison: an algorithm compares the query with the melodies in the database. Most systems transform both into symbolic representations for convenient matching, but this takes more time and occupies more database storage space.
4. Returning results: based on the comparison, the Query-by-Humming system returns results sorted by similarity. Usually it returns the top five or top ten results so that the user can judge whether they fit the request. This is an important index for determining the accuracy of a Query-by-Humming system.

According to the above discussion, the system framework is shown in Fig. 1, where the blue dotted rectangle contains the pre-processing steps and the red rectangle indicates the query processing steps. The comparison stage, which employs the Longest Common Subsequence (LCS), is described in the following.

Figure 1. System framework. (The WAV query is translated to MIDI, and both it and the MIDI database melodies pass through pitch contour extraction and the fuzzy inference system; the two pitch contours are matched by LCS string matching to produce a ranked list.)

A. WAV to MIDI

Zhu and Shasha [32] mentioned that rather than employing an unreliable note-segmentation algorithm, they would rather use commercial software, Akoff Music Composer v2.0, to record and transcribe notes from a user's query. The Akoff Music Composer software by Akoff Sound Labs is designed to recognise polyphonic music from audio sources and convert it to a MIDI score. Recognition is performed from pre-recorded WAV files or directly from audio input in real time, by tracking note dynamics and pitch bends and by using different harmonic models to improve the recognition of the appropriate instruments. Moreover, v3.0 has considerably improved the identification rate. Hence we also use the Akoff Music Composer as the tool for transforming WAV to MIDI, as shown in Fig. 2.

Figure 2. Example of transforming WAV to MIDI.

B. Pitch Contour

After translating WAV to MIDI, we analyse the pitch interval between notes. Unlike [25], which divided the pitch interval into U, D, and S, here each interval between two neighbouring pitches is mapped to one of six symbols, as given in Table 1.

Table 1: Pitch Contour Symbols

Symbol | Pitch Interval
H      | > 8
R      | 5 ~ 8
U      | 1 ~ 4
D      | -1 ~ -4
B      | -5 ~ -8
L      | < -8

If a user's query includes several consecutive identical pitches, note-segmentation errors may occur during the WAV-to-MIDI conversion; this is particularly obvious in melodies with faster tempos. To avoid this problem, we do not record a symbol when the pitch interval is zero, but record only the variations in pitch interval, which also shortens the subsequent string comparison. An example is given in Fig. 3.

Figure 3. Pitch contour example.

C. Fuzzy Inference System

In general, fuzzy inference systems are built from four major modules:

1. Fuzzification: transforms the system inputs, which are crisp numbers, into fuzzy sets by applying the membership functions to calculate the membership degrees.
2. Knowledge base: stores the fuzzy operations, fuzzy rule bases, etc.
3. Conflict resolution: more than one rule may be fired by an input datum; this module resolves the conflicts by predefined operations.
4. Defuzzification: uses a predefined defuzzification method, such as the centre of gravity, to transform the fuzzy regions obtained by the inference procedure into a crisp output value.

In this paper, we use an FIS to transform the pitch intervals of the query into a symbolic representation. When a WAV file is transformed to MIDI, no software or algorithm can be 100% accurate. Hence, we exploit the characteristics of fuzzy sets to blur the pitch contour, offsetting the errors made during the transformation and thereby helping the subsequent matching procedure.

Frequently used membership functions include triangular, Gaussian, and trapezoidal functions. Among them, Gaussian membership functions have smoother curves and therefore better nonlinear traits. The general formula is:

    μA(x) = exp( -((x - m) / σ)² )    (1)

Here, m is the mean, which is the centre point of the Gaussian membership function, and σ is the standard deviation, which corresponds to the width of the Gaussian membership function. This study chose Gaussian membership functions to transform pitch contours for the subsequent comparison procedures. Fig. 4 and Fig. 5 show the membership functions for query and symbol, respectively, and the parametric values of the membership functions are given in Table 2.

Figure 4. Input membership functions for query.

Figure 5. Output membership functions for symbol.

Table 2: Parametric Values of Membership Functions

Pitch Interval | Parametric Value (m)
large up       |  9.5
medium up      |  6.5
little up      |  2.5
little down    | -2.5
medium down    | -6.5
large down     | -9.5

According to Table 2, six membership functions are defined for the input query variable. The six terms are labelled mf1 to mf6 and correspond to the output membership functions L, B, D, U, R, and H, respectively. We define six fuzzy rules accordingly, and the users' queries can be transformed into pitch contours by these rules. Finally, the pitch contours of the users' queries and of the database melodies are compared in the matching process. Fig. 6 illustrates how the proposed fuzzy system infers the output from the corresponding input.

Rule 1: IF pitch interval is large down, THEN pitch contour symbol is L.
Rule 2: IF pitch interval is medium down, THEN pitch contour symbol is B.
Rule 3: IF pitch interval is little down, THEN pitch contour symbol is D.
Rule 4: IF pitch interval is little up, THEN pitch contour symbol is U.
Rule 5: IF pitch interval is medium up, THEN pitch contour symbol is R.
Rule 6: IF pitch interval is large up, THEN pitch contour symbol is H.

Figure 6. Illustration of the defuzzification process.

D. Longest Common Subsequence

Once the query translation and the MIDI database representation are finished, the next task is to match the query string against the symbol strings in the database. The Longest Common Subsequence (LCS) algorithm is a widely used dynamic programming algorithm for approximate matching. Like other dynamic programming methods, LCS solves the problem by a recurrence relation. Let X(1, 2, ..., m) and Y(1, 2, ..., n) be string sequences of length m and n, respectively, let Xi denote the prefix of X of length i, and let xi denote the ith element of the sequence. The LCS of X and Y is described by the following recurrence:

    LCS(Xi, Yj) =
        ∅                                            if i = 0 or j = 0
        (LCS(Xi-1, Yj-1), xi)                        if xi = yj
        longest(LCS(Xi, Yj-1), LCS(Xi-1, Yj))        if xi ≠ yj        (2)

The length of the LCS can then be computed with the following pseudocode:

    Function LCSLen(X[1, ..., m], Y[1, ..., n])
        initialize C[0..m][0..n] to zeroes
        for i = 0 to m
            C[i][0] = 0
        for j = 0 to n
            C[0][j] = 0
        for i = 1 to m
            for j = 1 to n
                if X[i] = Y[j]
                    C[i][j] = C[i-1][j-1] + 1
                else
                    C[i][j] = max(C[i][j-1], C[i-1][j])
        return C[m][n]

For example, comparing the two strings "U, R, R, D, R, D, R, B" and "U, D, U, R, B" gives an LCS of length 4, as shown in Fig. 7. Finally, the LCS length between the query and each melody in the database is used as the basis for sorting and for selecting the top five results from the system as the candidate set.

Figure 7. LCS example.

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

The MIREX QBSH Database is used as our experimental database; it includes 48 MIDI and 718 WAV files. The WAV files contain vocal fragments recorded by 35 subjects: one subject recorded 20 hummed clips, and the remaining subjects sang. Each WAV file lasts 7 seconds, and both male and female voices are included to test the accuracy of the query system. In addition, we employ the 48 MIDI files, comprising Chinese and Western folk songs, as the music database.

The original WAV files, translated to MIDI via the Akoff Music Composer, are then translated into pitch contours by the FIS. After the original queries are transformed into pitch contours by the FIS of this system, their lengths fall between 15 and 24. The length statistics of all queries are shown in Fig. 8, where the horizontal axis is the length of the pitch contour and the vertical axis is its accumulated count.

Figure 8. Length statistics of pitch contour.
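To make the pipeline concrete, the fuzzy coding of Section III-C and the LCSLen pseudocode above can be combined into a short sketch. Python is used here rather than the authors' MATLAB implementation; the Gaussian means come from Table 2, but the standard deviation is not reported in the paper, so σ = 1.5 is an assumed value, and the max-membership defuzzifier stands in for the full FIS:

```python
import math

# Means from Table 2; sigma is an assumption (not given in the paper).
MEANS = {"L": -9.5, "B": -6.5, "D": -2.5, "U": 2.5, "R": 6.5, "H": 9.5}
SIGMA = 1.5

def gaussian(x, m, sigma):
    """Gaussian membership function, Eq. (1)."""
    return math.exp(-(((x - m) / sigma) ** 2))

def interval_to_symbol(interval):
    """Fire the six rules and return the symbol with the highest
    membership degree (a simple max defuzzifier)."""
    return max(MEANS, key=lambda s: gaussian(interval, MEANS[s], SIGMA))

def pitch_contour(midi_notes):
    """Code the non-zero pitch intervals between neighbouring notes;
    zero intervals are skipped, as described in Section III-B."""
    return [interval_to_symbol(b - a)
            for a, b in zip(midi_notes, midi_notes[1:]) if b != a]

def lcs_len(x, y):
    """Length of the longest common subsequence, following the
    LCSLen pseudocode and Eq. (2)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[m][n]

# The example of Fig. 7: the LCS of the two symbol strings has length 4.
print(lcs_len(list("URRDRDRB"), list("UDURB")))  # -> 4
```

Ranking the database melodies by this LCS length and keeping the top five reproduces the candidate-set selection described above.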
The FIS is implemented in MATLAB. There are six membership functions, as previously described, for both the input and output variables. The system output is divided into six groups, Top 1, Top 2, Top 3, Top 4, Top 5, and Top 5+, based on the LCS results, and the recall rate of each group is calculated for evaluation. To assess the effectiveness of the FIS in the proposed system, we contrast the results with and without the FIS. The experimental results for the top five and beyond are summarised in Table 3, and the histograms for each retrieval level are compared in Fig. 9.

Figure 9. Retrieval results from the top five and beyond levels.

We can see from Table 3 that although 11 fewer records are successfully retrieved at the Top 1 level with the FIS than with the normal method, the figure at the Top 2 level is nearly twice that of the normal retrieval. Apparently the MIR performance can indeed be improved by applying the FIS. Moreover, Fig. 10 shows the accumulated results of each retrieval level: the Top 1 and Top 2 levels together in the FIS find 332 records (106 + 226), which indicates that almost half of the 718 queries can be successfully retrieved. Table 4 shows that the accumulated percentage from Top 1 to Top 3 is 65%, rising to 85% if the Top 1 to Top 5 levels are considered. Consequently, these results verify that using the FIS can indeed raise the system accuracy and meet the needs of the user query.

Table 3: Total Number of Queries in Each Group

        Top 1  Top 2  Top 3  Top 4  Top 5  Top 5+
Normal  117    137    158    90     86     130
FIS     106    226    137    72     73     104

Table 4: Retrieval Percentage of Each Group

        Top 1  Top 2  Top 3  Top 4  Top 5  Top 5+
Normal  16%    19%    22%    13%    12%    18%
FIS     15%    31%    19%    10%    10%    15%

Figure 10. Histograms of accumulated retrieval results.

V. CONCLUSION AND FUTURE WORK

Unlike past research, in this paper we employ a massive set of sung queries, recorded by diverse vocalists, as the experimental data to test the robustness of the proposed system. In the pitch interval coding, we replace the earlier U, S, and D coding with L, B, D, U, R, and H to improve the accuracy of the coding. We also discard the same-pitch code to reduce the note-segmentation errors caused by consecutive identical pitches. In the coding process, a fuzzy inference model is used as the coding tool; its blurring characteristics reduce the errors produced when WAV files are translated to MIDI files. LCS is applied as an approximate matching algorithm to locate the Top 5 retrieval results as the standard for evaluating the system's performance.

In the experiments, we compare the results with and without the FIS. The results show that the count increases sharply in the Top 2 group after the FIS is added, which means that the FIS has indeed fulfilled its function of raising the system's accuracy. The accumulated accuracy reaches 65% in the Top 3 retrievals, which indicates that the proposed system is highly effective for Query-by-Humming/Singing.

We are not satisfied with the current performance. More effort is needed in the future to increase the size of the melody database to test the system's scalability, to improve the WAV (voice) to MIDI (string) translation process, and to incorporate new features as the basis of analysis to improve the system's accuracy.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Council, Taiwan under Grant NSC100-2221-E-027-110, and in part by the joint project between the National Taipei University of Technology and Mackay Memorial Hospital under Grant NTUT-MMH-99-03 and Grant NTUT-MMH-100-09.

REFERENCES

[1] N. Ben Salem and J. P. Hubaux, "Securing wireless mesh networks," IEEE Comm. Mag., vol. 13, no. 2, pp. 50–55, April 2006.
[2] N. H. Adams, M. A. Bartsch, and G. H. Wakefield, "Note segmentation and quantization for music information retrieval,"
IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 131–141, January 2006.
[3] B. D. Roger, P. B. William, P. Bryan, H. Ning, M. Colin, and T. George, "A comparative evaluation of search techniques for query-by-humming using the MUSART testbed," J. Am. Soc. Info. Sci. Technol., vol. 58, no. 5, pp. 687–701, March 2007.
[4] E. Unal, E. Chew, P. G. Georgiou, and S. S. Narayanan, "Challenging uncertainty in query by humming systems: a fingerprinting approach," IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 359–371, February 2008.
[5] Y. Jinhee, P. Sanghyun, and K. Inbum, "An efficient frequent melody indexing method to improve the performance of query-by-humming systems," J. Info. Sci., vol. 34, no. 6, pp. 777–798, December 2008.
[6] S. Rho, B. Han, E. Hwang, and M. Kim, "MUSEMBLE: A novel music retrieval system with automatic voice query transcription and reformulation," J. Syst. Softw., vol. 81, no. 7, pp. 1065–1080, July 2008.
[7] P. Li, M. Zhou, X. Wang, X. Wang, N. Li, and L. Xie, "A novel MIR framework and application with automatic voice processing, database construction and fuzzy matching," Proc. 2nd Int. Conf. Computer and Automation Engineering, Singapore, vol. 1, pp. 20–24, February 2010.
[8] K. Kichul, R. P. Kang, S.-J. Park, S.-P. Lee, and Y. K. Moo, "Robust query-by-singing/humming system against background noise environments," IEEE Trans. on Consumer Electronics, vol. 57, no. 2, pp. 720–725, May 2011.
[9] H. Pierre and R. Matthias, "Query by tapping system based on alignment algorithm," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Taipei, Taiwan, pp. 1881–1884, April 2009.
[10] F. Pereira, A. Vetro, and T. Sikora, "Multimedia retrieval and delivery: Essential metadata challenges and standards," Proc. IEEE, vol. 96, no. 4, pp. 721–744, April 2008.
[11] J.-C. Wang, M.-S. Wu, H.-M. Wang, and S.-K. Jeng, "A content-based music search system using query by multi-tags with multi-levels of preference," Proc. IEEE Int. Conf. Multimedia and Expo, Suntec City, Singapore, pp. 1–6, July 2010.
[12] B. Whitman and R. Rifkin, "Musical query-by-description as a multiclass learning problem," Proc. IEEE Workshop on Multimedia Signal Processing, St. Thomas, USA, pp. 153–156, December 2002.
[13] N. Hu and R. B. Dannenberg, "A comparison of melodic database retrieval techniques using sung queries," Proc. 2nd ACM/IEEE-CS Joint Conf. Digital Libraries, Portland, Oregon, USA, pp. 301–307, July 2002.
[14] J. T. Yao and Y. Y. Yao, "Information granulation for web based information retrieval support systems," Proc. Int. Society for Optical Engineering, Orlando, Florida, USA, pp. 138–146, April 2003.
[15] J.-S. R. Jang, C.-L. Hsu, and H.-R. Lee, "Continuous HMM and its enhancement for singing/humming query retrieval," Proc. Int. Symp. Music Information Retrieval, London, UK, pp. 546–551, September 2005.
[16] H. Takeda, N. Saito, T. Otsuki, M. Nakai, H. Shimodaira, and S. Sagayama, "Hidden Markov model for automatic transcription of MIDI signals," Proc. IEEE Workshop on Multimedia Signal Processing, St. Thomas, Virgin Islands, USA, pp. 428–431, December 2002.
[17] K. Lemström, String Matching Techniques for Music Retrieval, PhD Thesis, Department of Computer Science, Faculty of Science, University of Helsinki, 2000.
[18] C. Parker, "Towards intelligent string matching in query-by-humming systems," Proc. IEEE Int. Conf. Multimedia and Expo, vol. 2, Baltimore, Maryland, USA, pp. 25–28, July 2003.
[19] J.-S. R. Jang and H.-U. Lee, "A general framework of progressive filtering and its application to query by singing/humming," IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 350–358, February 2008.
[20] T. Nishimura, H. Hashiguchi, J. Takita, J. X. Zhang, M. Goto, and R. Oka, "Music signal spotting retrieval by a humming query using start frame feature dependent continuous dynamic programming," Proc. Int. Symp. on Music Information Retrieval, Bloomington, Indiana, USA, pp. 211–218, October 2001.
[21] H.-M. Yu, W.-H. Tsai, and H.-M. Wang, "A query-by-singing system for retrieving karaoke music," IEEE Trans. on Multimedia, vol. 10, no. 8, pp. 1626–1637, December 2008.
[22] A. N. Myna, V. Chaitra, and K. S. Smitha, "Melody information retrieval system using dynamic time warping," Proc. 2009 WRI World Congress on Computer Science and Information Engineering, vol. 5, Los Angeles, California, USA, pp. 266–270, March–April 2009.
[23] U. Bagci and E. Erzin, "Automatic classification of musical genres using inter-genre similarity," IEEE Signal Process. Lett., vol. 14, no. 8, pp. 521–524, August 2007.
[24] J. Shen, D. Tao, and X. Li, "QUC-Tree: Integrating query context information for efficient music retrieval," IEEE Trans. on Multimedia, vol. 11, no. 2, pp. 313–323, February 2009.
[25] A. Ghias, J. Logan, D. Chamberlain, and B. Smith, "Query by humming – Musical information retrieval in an audio database," Proc. ACM Multimedia, San Francisco, California, USA, pp. 231–236, November 1995.
[26] R. Typke and L. Prechelt, "An interface for melody input," ACM Trans. on Computer-Human Interaction, vol. 8, no. 2, pp. 133–149, June 2001.
[27] R. J. McNab, L. A. Smith, I. H. Witten, and C. L. Henderson, "Tune retrieval in the multimedia library," Multimedia Tools Appl., vol. 10, pp. 113–132, April 2000.
[28] T. Sonoda and Y. Muraoka, "A www-based melody retrieval system – An indexing method for a large melody database," Proc. of Int. Computer Music Conference, Berlin, Germany, pp. 170–173, August–September 2000.
[29] J.-S. Mo, C. H. Han, and Y.-S. Kim, "A melody-based similarity computation algorithm for musical information," Proc. Workshop on Knowledge and Data Engineering Exchange, Chicago, Illinois, USA, pp. 114–121, November 1999.
[30] B. Tom, A. Søren, F. Brian, H. Christian, K. Jimmy, W. N. Lau, and R. Thomas, "A system for recognition of hummed tunes," Proc. COST G-6 Conf. on Digital Audio Effects, Limerick, Ireland, pp. 203–206, December 2001.
[31] M. A. Raju, B. Sundaram, and P. Rao, "TANSEN: A query-by-humming based music retrieval system," Proc. of National Conf. on Communications, Madras, India, pp. 75–79, January–February 2003.
[32] Y. Zhu and D. Shasha, "Query by humming: A time series database approach," Proc. ACM Special Interest Group on Management of Data, San Diego, California, USA, June 2003.

BIOGRAPHIES

Yo-Ping Huang received his PhD in Electrical Engineering from Texas Tech University, Lubbock, TX, USA. He is currently a Professor in the Department of Electrical Engineering at National Taipei University of Technology (NTUT), Taiwan. He also serves as CEO of the Joint Commission of Technological and Vocational
College Admission Committee in Taiwan. He was Secretary General at NTUT, Chairman of the IEEE CIS Taipei Chapter, and Vice Chairman of the IEEE SMC Taipei Chapter. He was Professor and Dean of the College of Electrical Engineering and Computer Science, Tatung University, Taipei, before joining NTUT. His research interests include medical knowledge mining, intelligent control systems, and handheld device application systems design. Prof. Huang is a senior member of the IEEE and a fellow of the IET.

Shin-Liang Lai received BS and MS degrees in Mathematics from the National Taipei University of Education, Taipei, Taiwan, in 1998 and 2002, respectively. In 2012, he received his PhD in Computer Science from the Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan. His research interests include content-based music information retrieval, data mining, and fuzzy inference systems.