Journal of Convergence, Volume 3, Number 4, December 2012
Novel Query-by-Humming/Singing Method with Fuzzy Inference System
Yo-Ping Huang
Department of Electrical Engineering
National Taipei University of Technology
Taipei, Taiwan 10608
yphuang@ntut.edu.tw

Shin-Liang Lai
Department of Computer Science and Engineering
Tatung University
Taipei, Taiwan 10451
sinla.lai@gmail.com
Abstract—Music Information Retrieval (MIR) is a crucial topic in the domain of information retrieval. According to the major characteristics of music, a Query-by-Humming system retrieves interesting music by finding melodies similar or identical to the hummed query. Based on the fuzzy inference model designed in this paper, a novel Query-by-Humming/Singing system is proposed to extract pitch contour information from WAV and MIDI files. To verify the effectiveness of the presented work, the MIREX QBSH Database is employed as our experimental database and a large amount of human vocal data is used as queries to test the robustness of the MIR. Then, the Longest Common Subsequence (LCS) is used as an approximate matching algorithm to identify the top 5 music samples as an evaluation standard for the system. Experimental results show that the proposed system achieves 85% accuracy in the top 5 retrievals.

Keywords-MIDI; query-by-humming; pitch contour; fuzzy inference system.

I. INTRODUCTION

With its extensive resources, Internet services prevail over whole areas, including the music market. The Internet is something like a huge database that satisfies all kinds of needs of its clients at all times. For instance, many researchers study Music Information Retrieval (MIR) for the purpose of finding quickly and correctly the programs a user needs from a vast music database. To achieve this goal, researchers have probed into all types of music formats, including MIDI, MP3, Wave, and Voice. In recent years, more and more researchers rely on MIDI (Musical Instrument Digital Interface) as their research focus due to the fact that MIDI has superiority over other music formats. For example, MIDI can be played with electronic synthesizers, and the pitch or length can be changed according to a user's needs. With a smaller file size, MIDI is easier to record and thus is widely applied in karaoke. With these advantages, MIDI saves storage space, speeds up enquiries, and raises the accuracy of enquiry results.

However, the introduction of MIR is accompanied by questions on query methods when considering how to extract and represent features of a query melody. Previous studies have developed many query models, including Query-by-Humming (Singing) [1]–[7], Query-by-Tapping [8], Query-by-Example [9], Query-by-Tag [10], and Query-by-Description [11]. Many researchers are interested in Query-by-Humming (QBH), because humming is the simplest and most direct way for people to express music. Using QBH as the means for a MIR inquiry can obtain a desirable effect without other apparatus. However, compared with other methods, it is more difficult to retrieve related music, and queries are often returned with lower accuracy, especially when singing is used as the way of query [6].

To translate the hummed melody into the contrasting MIDI format, many researchers have described Query-by-Humming systems with melody comparison [12]. MIDI can be regarded as a music format expressed in words or numbers. Therefore, most researchers change the melody into a series of symbolic representations to be compared with the MIDI. When a query has been changed into symbolic representation, it can then be compared with melodies that have been processed in advance. Several methods have been proposed for comparison, such as granular events [13], hidden Markov models [14][15], string matching [16][17], dynamic programming [18][19], dynamic time warping (DTW) [20][21], and tree-based searching [22][23]. At the end of the contrasting procedures, the MIR can then select the item that matches the guiding rules.

The remainder of this paper is organised as follows. Section 2 starts with some basic definitions and notations used throughout the paper, and provides a brief look at previous work on this subject. Section 3 presents the details of our approach. In Section 4, a novel MIR is presented and test results are discussed. Finally, Section 5 concludes the paper and gives a perspective on further study of this subject.

II. RELATED WORK

Ghias et al. [25] built a MIR to process MIDI files in 1995. They used three symbols (U, D, and S) to depict the three different levels of pitch contour: U represents up, D represents down, and S represents the same. In any time series graph, these are the only three possibilities: to grow, to shrink, or to remain the same.

Many researchers have since improved on this, such as Typke et al. [26], who researched the Parsons Code. They also use the pitch contour from humming and, after acquiring the pitch contour, represent it as U, D, and R strings. McNab et al. [27] incorporated rhythm with the idea of pitch contours. Sonoda et al. [28], Li et al. [7], and Mo et al. [29] considered the characteristic of duration, and used L (longer), R (repeat), and S (same) to represent the changes.

Tom et al. [30] used a dynamically calculated threshold, applied it to the contour of the signal to segment notes, and used autocorrelation to detect pitches. Then notes are
Copyright © 2010 Future Technology Research Association International
represented as a string sequence of U, D, and R. Raju et al. [31] proposed a similar approach to represent the melody. The melodies in the database are indexed by the U, D, and S strings. They used a time-domain autocorrelation function for pitch extraction and gave a dynamic programming-based edit distance algorithm as a similarity metric. However, these techniques are time-consuming.

Therefore, in this paper we propose different pitch contour coding methods that use a large amount of human vocal data as the query, and combine a Fuzzy Inference System (FIS) to search the MIDI database for testing the accuracy of the MIR.

III. THE PROPOSED APPROACH

Since the user's query is an important factor that will influence the accuracy of the results, massive and diverse query data are experimented with to test the accuracy of the system. To solve the above problems, the MIREX QBSH Database is employed as our experimental database, which is unique in that it contains a large amount of query data from individuals of different sexes. In general, there are several steps to establish a Query-by-Humming system:

1. Building a music database: whatever the music format, the first step is to collect massive music programs in the database, proceed to pre-processing or format transference after comparison, and save the processed results into the database.
2. Inputting the query: based on the above discussions, the query may have multiple modes, for example picking up the features of the query after the user inputs it and then transforming them into pre-defined formats for comparison.
3. Comparing procedures: utilise the algorithm to compare the query and the melodies in the database. Most systems transform both into symbolic representation for a handy contrast, but this takes more time as well as occupying more database storage space.
4. Returning results: based on the results of the comparison, the Query-by-Humming system can return results that are sorted by similarity. Usually it will return the top five or top ten results to the user to differentiate whether the results fit the request. This is an important index to determine the accuracy of the Query-by-Humming system.

According to the above discussions, the system framework is shown in Fig. 1, where the blue dotted rectangle contains the pre-processing steps and the red rectangle indicates the query processing steps. The comparing section, which employs the Longest Common Subsequence (LCS), will be expatiated in the following.

Figure 1. System framework.

A. WAV to MIDI

Zhu and Shasha [32] mentioned that rather than employing some unreliable note-segmentation algorithm, they would rather use the commercial software Akoff Music Composer v2.0 to record and transcribe notes from a user's query. The Akoff Music Composer software by Akoff Sound Labs is designed for the recognition of polyphonic music from audio sources and its conversion to a MIDI score. Recognition is performed from pre-recorded WAV files or directly from audio input in real time, by tracking note dynamics and pitch bends, and using different harmonic models to improve the recognition of appropriate instruments. In the meantime, v3.0 has improved considerably in terms of identification rate. Hence we also use the Akoff Music Composer as a tool for transforming WAV to MIDI, as shown in Fig. 2.

Figure 2. Example of transforming WAV to MIDI.

B. Pitch Contour

After translating WAV to MIDI, we then analyse the pitch interval of each note. Different from [25], which divided the pitch interval into U, D, and S, the intervals between two neighbouring pitches are divided into six symbols, as given in Table 1.

Table 1: Pitch Contour Symbol

Symbol | Pitch Interval
H      | > 8
R      | 5 ~ 8
U      | 1 ~ 4
D      | -1 ~ -4
B      | -5 ~ -8
L      | < -8
If a user's query includes several consecutive identical pitches, it may produce note-segmentation errors during the processing from WAV to MIDI, which is particularly obvious in melodies with faster tempos. To exclude this problem, we do not record the melody when the pitch interval is zero, but record only the variations in pitch interval, which also reduces the time needed later to compare symbol strings. An example is shown in Fig. 3.
Figure 3. Pitch contour example.

Figure 4. Input membership functions for query.
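The coding just described can be sketched in a few lines (an illustrative sketch only; the function names and example note numbers are ours, not the paper's implementation):

```python
# Sketch of the pitch-contour coding above (Table 1). Intervals of
# 0 semitones are skipped, as the paper does to avoid segmentation
# errors on consecutive identical pitches.

def contour_symbol(interval):
    """Map a pitch interval (in semitones) to a Table 1 symbol."""
    if interval > 8:
        return "H"
    if 5 <= interval <= 8:
        return "R"
    if 1 <= interval <= 4:
        return "U"
    if -4 <= interval <= -1:
        return "D"
    if -8 <= interval <= -5:
        return "B"
    return "L"  # interval < -8

def pitch_contour(midi_notes):
    """Encode consecutive MIDI note numbers as contour symbols."""
    symbols = []
    for prev, curr in zip(midi_notes, midi_notes[1:]):
        interval = curr - prev
        if interval == 0:  # ignore repeated pitches
            continue
        symbols.append(contour_symbol(interval))
    return symbols

print(pitch_contour([60, 62, 62, 67, 57]))  # -> ['U', 'R', 'L']
```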
C. Fuzzy Inference System

In general, fuzzy inference systems are based on four major modules:

1. Fuzzification process: transforms the system inputs, which are crisp numbers, into fuzzy sets. This is done by applying the membership functions to calculate the membership degrees.
2. Knowledge base: stores fuzzy operations, fuzzy rule bases, etc.
3. Conflict resolution process: more than one rule may be fired by an input datum, and this module resolves conflicts by predefined operations.
4. Defuzzification process: uses the predefined defuzzification method, such as the centre of gravity, to transform the fuzzy regions obtained by the inference procedure into a crisp output value.

Figure 5. Output membership functions for symbol.

Table 2: Parametric Values of Membership Functions

Pitch Interval | Parametric Value
large up       | 9.5
medium up      | 6.5
little up      | 2.5
little down    | -2.5
medium down    | -6.5
large down     | -9.5

In this paper, we use the FIS to transform the pitch intervals of the query into symbolic representation. After a WAV file is transformed to MIDI, no software or algorithm can be 100% accurate in retrieval. Hence, we utilise the characteristics of fuzzy sets to blur the pitch contour so as to offset the errors made during the transformation, to the benefit of the subsequent matching procedures.
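The four modules above can be sketched for this paper's six-symbol coding. This is a hedged illustration only: it assumes Gaussian membership functions with the Table 2 means and an assumed common width σ = 1.5 (the paper does not state its σ values), and resolves conflicts by picking the rule with the highest membership degree.

```python
import math

# Sketch of the FIS coding step: fuzzify a crisp pitch interval with
# Gaussian membership functions (means from Table 2), fire the six
# one-to-one rules, and resolve conflicts by taking the rule with the
# highest membership degree. SIGMA is an assumption, not from the paper.
SIGMA = 1.5
RULES = {  # membership-function mean -> output contour symbol
    9.5: "H", 6.5: "R", 2.5: "U",
    -2.5: "D", -6.5: "B", -9.5: "L",
}

def gaussian(x, mean, sigma=SIGMA):
    """Gaussian membership degree, as in Eq. (1)."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

def fuzzy_symbol(interval):
    """Return the contour symbol whose membership degree is highest."""
    best_mean = max(RULES, key=lambda mean: gaussian(interval, mean))
    return RULES[best_mean]

print(fuzzy_symbol(7))   # 7 is closest to mean 6.5, prints 'R'
print(fuzzy_symbol(-9))  # -9 is closest to mean -9.5, prints 'L'
```

With overlapping Gaussian sets, intervals near a boundary (e.g. between 4 and 5 semitones) receive non-zero degrees from two rules, which is exactly the "blurring" the paper relies on to absorb WAV-to-MIDI transcription errors.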
In fuzzy membership function settings, frequently used functions are triangular, Gaussian, or trapezoidal membership functions. Among them, the curves of Gaussian membership functions are smoother and therefore have better nonlinear traits. The general formula is shown below:

    μ_A(x) = exp(−(x − m)² / (2σ²))    (1)

Here, m is the mean and the centre point of the Gaussian membership function, and σ is the standard deviation, which corresponds to the width of the Gaussian membership function. This study chose Gaussian membership functions to process the pitch contour transform for the underlying comparison procedures. Fig. 4 and Fig. 5 show the membership functions for query and symbol, respectively, and the parametric values of the membership functions are given in Table 2.

According to Table 2, six membership functions are defined for the input query variable. The six terms are labelled mf1 to mf6 and correspond to the output membership functions L, B, D, U, R, and H, respectively. We define six fuzzy rules accordingly, and the users' queries can be transformed into pitch contours by these rules. Finally, the pitch contours of the users' queries and repeating patterns are compared in the matching process. Fig. 6 illustrates how the proposed fuzzy system infers the output from the corresponding input.

Rule 1: IF pitch interval is large down, THEN pitch contour symbol is L.
Rule 2: IF pitch interval is medium down, THEN pitch contour symbol is B.
Rule 3: IF pitch interval is little down,
THEN pitch contour symbol is D.
Rule 4: IF pitch interval is little up, THEN pitch contour symbol is U.
Rule 5: IF pitch interval is medium up, THEN pitch contour symbol is R.
Rule 6: IF pitch interval is large up, THEN pitch contour symbol is H.

Figure 6. Illustration of defuzzification process.

D. Longest Common Subsequence

When we have finished the query translation and the MIDI database representation, the next task is to match the symbolic strings, i.e. the query string against the strings in the database. The Longest Common Subsequence (LCS) algorithm is a widely used dynamic programming algorithm for approximate matching. Like other dynamic programming methods, LCS resolves the problem by a recurrence relation. For example, let X(1, 2, ..., m) and Y(1, 2, ..., n) be string sequences of length m and n, respectively. X_i represents a subsequence of X(1, 2, ..., i), i.e. the prefix of the sequence X of length i, and x_i represents the ith element of the sequence. The LCS of X and Y is described by the following formula:

    LCS(X_i, Y_j) = ∅                                              if i = 0 or j = 0
                  = (LCS(X_{i-1}, Y_{j-1}), x_i)                   if x_i = y_j
                  = longest(LCS(X_i, Y_{j-1}), LCS(X_{i-1}, Y_j))  if x_i ≠ y_j    (2)

Then, we can use the following pseudocode to calculate the length of the LCS. Lastly, we can use the LCS length between the query and each melody in the database as a basis for sorting and selecting the top five from the system as candidate sets.

Function LCSLen(X[1, ..., m], Y[1, ..., n])
    create table C[0..m][0..n]
    for i = 0 to m
        C[i][0] = 0
    for j = 0 to n
        C[0][j] = 0
    for i = 1 to m
        for j = 1 to n
            if X[i] = Y[j]
                C[i][j] = C[i-1][j-1] + 1
            else
                C[i][j] = max(C[i][j-1], C[i-1][j])
    return C[m][n]

For example, we can compare the two strings "U, R, R, D, R, D, R, B" and "U, D, U, R, B"; the length of their LCS is 4, as shown in Fig. 7. We use this algorithm to calculate the length of the LCS, and then compare the query with each melody in the database using the LCS length as a basis for sorting and selecting the top candidates from the system.

Figure 7. LCS example.

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

The MIREX QBSH Database is used as our experimental database; it includes 48 MIDI and 718 WAV files. The WAV files contain recorded vocal fragments from 35 subjects, including one subject who recorded 20 hummed queries, with the remaining subjects singing. Each WAV file lasts 7 seconds and includes male and female voices to test the accuracy of the query system. In addition, we employ the 48 MIDI files as the music database, whose programs include Chinese and western folk songs.

The original WAV files, translated to MIDI via the Akoff Music Composer, are then translated into pitch contours by the FIS. After the original queries are transformed into pitch contours by the FIS of this system, their length distribution falls between 15 and 24. The statistics over all query lengths are shown in Fig. 8, where the horizontal axis is the length of the pitch contour and the vertical axis is its accumulated number.

Figure 8. Length statistics of pitch contour.
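The LCSLen pseudocode above can be written as a short runnable sketch; applied to the example strings, it confirms an LCS length of 4 (an illustration only; the function name is ours):

```python
# Runnable sketch of the LCSLen pseudocode: a standard bottom-up
# dynamic programming table over the two symbol strings.

def lcs_len(x, y):
    """Length of the longest common subsequence of sequences x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # c[i][j]: LCS of x[:i], y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[m][n]

melody = ["U", "R", "R", "D", "R", "D", "R", "B"]
query = ["U", "D", "U", "R", "B"]
print(lcs_len(melody, query))  # -> 4  (common subsequence U, D, R, B)
```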
The FIS is implemented in MATLAB. There are six membership functions, as previously mentioned, for both the input and output variables. The system output is divided into six groups, Top 1, Top 2, Top 3, Top 4, Top 5, and Top 5+, based on the LCS computing results, so as to calculate their recall rates for evaluation.

To compare the effectiveness of the FIS in the proposed system with other methods, we contrast the results with and without using the FIS. The experimental results from the top five and beyond retrieval levels are summarised in Table 3, and the histograms for each retrieval level are compared in Fig. 9.
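As a sanity check, the per-level FIS counts in Table 3 reproduce the percentages and the accumulated 65% and 85% figures discussed in this section (a small illustrative script; variable names are ours):

```python
# Reproduce the Table 4 percentages and the accumulated accuracy
# figures from the FIS per-group retrieval counts in Table 3.
fis_counts = [106, 226, 137, 72, 73, 104]  # Top 1 .. Top 5, Top 5+
total = sum(fis_counts)
print(total)  # -> 718 queries in all

# Per-level percentages for Top 1 .. Top 5 (rounded, as in Table 4),
# then the accumulated accuracies quoted in the text.
percent = [round(100 * c / total) for c in fis_counts[:5]]
print(percent)           # -> [15, 31, 19, 10, 10]
print(sum(percent[:3]))  # Top 1-3 accumulated -> 65
print(sum(percent))      # Top 1-5 accumulated -> 85
```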
Figure 9. Retrieval results from the top five and beyond levels.

We can see that although 11 fewer records are successfully retrieved by the FIS than by the normal method at the Top 1 level in Table 3, the figure at the Top 2 level reaches nearly twice the normal retrieval. Apparently the MIR effect can indeed be improved after applying the FIS to the system. Moreover, we can see the accumulated results for each retrieval level in Fig. 10, where the Top 1 and Top 2 levels together in the FIS account for 332 records. This figure indicates that almost half the queries can be successfully retrieved. Table 4 shows that the accumulated percentage is 65% from the Top 1 to Top 3 levels, and this rises to 85% if the Top 1 to Top 5 levels are considered. Consequently, this result verifies that using the FIS can indeed raise the system accuracy and fit the needs of the user query.

Table 3: Total Number of Queries in Each Group

        Top 1  Top 2  Top 3  Top 4  Top 5  Top 5+
Normal  117    137    158    90     86     130
FIS     106    226    137    72     73     104

Table 4: Retrieval Percentage of Each Group

        Top 1  Top 2  Top 3  Top 4  Top 5  Top 5+
Normal  16%    19%    22%    13%    12%    18%
FIS     15%    31%    19%    10%    10%    15%

Figure 10. Histograms of accumulated retrieval results.

V. CONCLUSION AND FUTURE WORK

Unlike past research, in this paper we employ a massive set of singing queries as the experimental data, via diverse vocal records, to test the robustness of the proposed system. In the pitch interval coding, we change the past U, S, and D coding method to L, B, D, U, R, and H to improve the accuracy of the coding. Besides, we also abandon the coding of repeated pitches to reduce the errors that occur due to the segmentation of consecutive same-pitch notes. In the coding process, a fuzzy inference model is used as the coding tool; its blurring characteristics reduce the errors produced when WAV files are translated to MIDI files. The LCS is also applied as an approximate matching algorithm to locate the Top 5 retrieval results in the system as a standard for evaluating the system's performance.

During the experiment, we compare the differences with and without the addition of the FIS. From the results, we can see that the number of retrievals increases sharply in the Top 2 samples after the FIS is added, which means that the FIS has indeed fulfilled its function of raising the system's accuracy. Accuracy reaches 65% in the Top 3 retrievals, which indicates that the proposed system has a great effect on Query-by-Humming/Singing.

We are not satisfied with the current performance. More efforts are needed in the future to increase the size of the melody database to test the system's scalability, to improve the WAV (voice) to MIDI (string) translation process, and to incorporate new features as the basis of analysis to improve the system's accuracy.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Council, Taiwan under Grant NSC100-2221-E-027-110, and in part by the joint project between the National Taipei University of Technology and Mackay Memorial Hospital under Grant NTUT-MMH-99-03 and Grant NTUT-MMH-100-09.

REFERENCES

[1] N. Ben Salem, and J. P. Hubaux, "Securing wireless mesh networks," IEEE Comm. Mag., vol. 13, no. 2, pp. 50–55, April 2006.

[2] N. H. Adams, M. A. Bartsch, and G. H. Wakefield, "Note segmentation and quantization for music information retrieval,"
IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 131–141, January 2006.

[3] B. D. Roger, P. B. William, P. Bryan, H. Ning, M. Colin, and T. George, "A comparative evaluation of search techniques for query-by-humming using the MUSART testbed," J. Am. Soc. Info. Sci. Technol., vol. 58, no. 5, pp. 687–701, March 2007.

[4] E. Unal, E. Chew, P. G. Georgiou, and S. S. Narayanan, "Challenging uncertainty in query by humming systems: a fingerprinting approach," IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 359–371, February 2008.

[5] Y. Jinhee, P. Sanghyun, and K. Inbum, "An efficient frequent melody indexing method to improve the performance of query-by-humming systems," J. Info. Sci., vol. 34, no. 6, pp. 777–798, December 2008.

[6] S. Rho, B. Han, E. Hwang, and M. Kim, "MUSEMBLE: A novel music retrieval system with automatic voice query transcription and reformulation," J. Syst. Softw., vol. 81, no. 7, pp. 1065–1080, July 2008.

[7] P. Li, M. Zhou, X. Wang, X. Wang, N. Li, and L. Xie, "A novel MIR framework and application with automatic voice processing, database construction and fuzzy matching," Proc. 2nd Int. Conf. Computer and Automation Engineering, Singapore, vol. 1, pp. 20–24, February 2010.

[8] K. Kichul, R. P. Kang, S.-J. Park, S.-P. Lee, and Y. K. Moo, "Robust query-by-singing/humming system against background noise environments," IEEE Trans. on Consumer Electronics, vol. 57, no. 2, pp. 720–725, May 2011.

[9] H. Pierre, and R. Matthias, "Query by tapping system based on alignment algorithm," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Taipei, Taiwan, pp. 1881–1884, April 2009.

[10] F. Pereira, A. Vetro, and T. Sikora, "Multimedia retrieval and delivery: Essential metadata challenges and standards," Proc. IEEE, vol. 96, no. 4, pp. 721–744, April 2008.

[11] J.-C. Wang, M.-S. Wu, H.-M. Wang, and S.-K. Jeng, "A content-based music search system using query by multi-tags with multi-levels of preference," Proc. IEEE Int. Conf. Multimedia and Expo, Suntec City, Singapore, pp. 1–6, July 2010.

[12] B. Whitman, and R. Rifkin, "Musical query-by-description as a multiclass learning problem," Proc. IEEE Workshop on Multimedia Signal Processing, St. Thomas, USA, pp. 153–156, December 2002.

[13] N. Hu, and R. B. Dannenberg, "A comparison of melodic database retrieval techniques using sung queries," Proc. 2nd ACM/IEEE-CS Joint Conf. Digital Libraries, Portland, Oregon, USA, pp. 301–307, July 2002.

[14] J. T. Yao, and Y. Y. Yao, "Information granulation for web based information retrieval support systems," Proc. Int. Society for Optical Engineering, Orlando, Florida, USA, pp. 138–146, April 2003.

[15] J.-S. R. Jang, C.-L. Hsu, and H.-R. Lee, "Continuous HMM and its enhancement for singing/humming query retrieval," Proc. Int. Symp. Music Information Retrieval, London, UK, pp. 546–551, September 2005.

[16] H. Takeda, N. Saito, T. Otsuki, M. Nakai, H. Shimodaira, and S. Sagayama, "Hidden Markov model for automatic transcription of MIDI signals," Proc. IEEE Workshop on Multimedia Signal Processing, St. Thomas, Virgin Islands, USA, pp. 428–431, December 2002.

[17] K. Lemström, String Matching Techniques for Music Retrieval, PhD Thesis, Department of Computer Science, Faculty of Science, University of Helsinki, 2000.

[18] C. Parker, "Towards intelligent string matching in query-by-humming systems," Proc. IEEE Int. Conf. Multimedia and Expo, vol. 2, Baltimore, Maryland, USA, pp. 25–28, July 2003.

[19] J.-S. R. Jang, and H.-U. Lee, "A general framework of progressive filtering and its application to query by singing/humming," IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 350–358, February 2008.

[20] T. Nishimura, H. Hashiguchi, J. Takita, J. X. Zhang, M. Goto, and R. Oka, "Music signal spotting retrieval by a humming query using start frame feature dependent continuous dynamic programming," Proc. Int. Symp. on Music Information Retrieval, Bloomington, Indiana, USA, pp. 211–218, October 2001.

[21] H.-M. Yu, W.-H. Tsai, and H.-M. Wang, "A query-by-singing system for retrieving karaoke music," IEEE Trans. on Multimedia, vol. 10, no. 8, pp. 1626–1637, December 2008.

[22] A. N. Myna, V. Chaitra, and K. S. Smitha, "Melody information retrieval system using dynamic time warping," Proc. 2009 WRI World Congress on Computer Science and Information Engineering, vol. 5, Los Angeles, California, USA, pp. 266–270, March–April 2009.

[23] U. Bagci, and E. Erzin, "Automatic classification of musical genres using inter-genre similarity," IEEE Signal Process. Lett., vol. 14, no. 8, pp. 521–524, August 2007.

[24] J. Shen, D. Tao, and X. Li, "QUC-Tree: Integrating query context information for efficient music retrieval," IEEE Trans. on Multimedia, vol. 11, no. 2, pp. 313–323, February 2009.

[25] A. Ghias, J. Logan, D. Chamberlain, and B. Smith, "Query by humming: Musical information retrieval in an audio database," Proc. ACM Multimedia, San Francisco, California, USA, pp. 231–236, November 1995.

[26] R. Typke, and L. Prechelt, "An interface for melody input," ACM Trans. on Computer-Human Interaction, vol. 8, no. 2, pp. 133–149, June 2001.

[27] R. J. McNab, L. A. Smith, I. H. Witten, and C. L. Henderson, "Tune retrieval in the multimedia library," Multimedia Tools Appl., vol. 10, pp. 113–132, April 2000.

[28] T. Sonoda, and Y. Muraoka, "A WWW-based melody retrieval system: An indexing method for a large melody database," Proc. of Int. Computer Music Conference, Berlin, Germany, pp. 170–173, August–September 2000.

[29] J.-S. Mo, C. H. Han, and Y.-S. Kim, "A melody-based similarity computation algorithm for musical information," Proc. Workshop on Knowledge and Data Engineering Exchange, Chicago, Illinois, USA, pp. 114–121, November 1999.

[30] B. Tom, A. Søren, F. Brian, H. Christian, K. Jimmy, W. N. Lau, and R. Thomas, "A system for recognition of hummed tunes," Proc. COST G-6 Conf. on Digital Audio Effects, Limerick, Ireland, pp. 203–206, December 2001.

[31] M. A. Raju, B. Sundaram, and P. Rao, "TANSEN: A query-by-humming based music retrieval system," Proc. of National Conf. on Communications, Madras, India, pp. 75–79, January–February 2003.

[32] Y. Zhu, and D. Shasha, "Query by humming: A time series database approach," Proc. ACM Special Interest Group on Management of Data, San Diego, California, USA, June 2003.

BIOGRAPHIES

Yo-Ping Huang received his PhD in Electrical Engineering from Texas Tech University, Lubbock, TX, USA. He is currently a Professor in the Department of Electrical Engineering at National Taipei University of Technology (NTUT), Taiwan. He also serves as CEO of the Joint Commission of Technological and Vocational
College Admission Committee in Taiwan. He was Secretary
General at NTUT, Chairman of IEEE CIS Taipei Chapter, and
Vice Chairman of IEEE SMC Taipei Chapter. He was
Professor and Dean of the College of Electrical Engineering
and Computer Science, Tatung University, Taipei, before
joining NTUT. His research interests include medical
knowledge mining, intelligent control systems, and handheld
device application systems design. Prof. Huang is a senior
member of the IEEE and a fellow of the IET.
Shin-Liang Lai received BS and
MS degrees in Mathematics from
the National Taipei University of
Education, Taipei, Taiwan, in 1998
and 2002, respectively. In 2012, he
received his PhD in Computer
Science from the Department of
Computer Science and Engineering,
Tatung University, Taipei, Taiwan.
His research interests include
content-based music information
retrieval, data mining, and fuzzy inference systems.