London Information Retrieval Meetup
19 Feb 2019
Introduction to Music Information Retrieval
Thoughts from a former bass player
Andrea Gazzarini, Software Engineer
19th February 2019
Who I am
▪ Software Engineer (1999-)
▪ “Hermit” Software Engineer (2010-)
▪ Java & Information Retrieval Passionate
▪ Apache Qpid (past) Committer
▪ Husband & Father
▪ Bass Player
Andrea Gazzarini, “Gazza”
Sease
Search Services
● Open Source Enthusiasts
● Apache Lucene/Solr experts
● Community Contributors
● Active Researchers
● Hot Trends : Learning To Rank, Document Similarity,
Search Quality Evaluation, Relevancy Tuning
✓Music Information Retrieval (MIR)?
➢ Music Essentials
➢ Audio Processing
➢ Q&A
Agenda
“MIR is concerned with the extraction, analysis and usage of information about any kind of music
entity (e.g. a song or a music artist) on any representation level (for example, audio signal, symbolic MIDI
representation of a piece of music, or name of a music artist).”
Schedl, M.: Automatically extracting, analyzing and visualizing information on music artists from the world wide web.
Dissertation, Johannes Kepler University, Wien (2003)
Music information retrieval (MIR) is the interdisciplinary science of retrieving information from
music. MIR is a small but growing field of research with many real-world applications. Those involved in
MIR may have a background in musicology, psychoacoustics, psychology, academic music study,
signal processing, informatics, machine learning, optical music recognition, computational intelligence or
some combination of these.
https://en.wikipedia.org/wiki/Music_information_retrieval
Music Information Retrieval (MIR)
AUDIO IDENTIFICATION
GENRE IDENTIFICATION
TRANSCRIPTION
RECOMMENDATION
COVER SONG DETECTION
SYMBOLIC SIMILARITY
MOOD
SOURCE SEPARATION
INSTRUMENT RECOGNITION
TEMPO ESTIMATION
SCORE ALIGNMENT
SONG STRUCTURE
BEAT TRACKING
KEY DETECTION
QUERY BY HUMMING
Music Information Retrieval (MIR)
[Diagram labels: Computational Factors, Context, State, Music Content, Music Context]
▪ Music Content: all the low-level things we can extract from the audio signal (e.g. time, frequencies, loudness)
▪ Music Context: additional metadata that cannot be extracted from the audio signal (e.g. lyrics, tags, artists, feedback, posts)
▪ Listener State: the state of the listener at a given moment (e.g. mood, musical knowledge, preferences)
▪ Listener Context: the environment the listener is in at a given moment (e.g. political, geographical, social)
Factors in Music Perception
➢ Music Information Retrieval (MIR)
✓Music Essentials
‣ Essentials
‣ Score Music Representation
‣ Symbolic Representations
‣ Audio Representation
➢ Audio Processing
➢ Q&A
Agenda
A note denotes a sound, its pitch and its duration
A sound is the audio signal produced by a vibrating body
Notes are associated with graphical symbols (indicating the pitch and the duration)
Two notes whose fundamental frequencies are in a ratio of any integer power of two are perceived as similar. As
a consequence, we say they belong to the same pitch class
A note is also used for denoting a pitch class. Traditional music theory identifies 12 pitch classes
Notes and pitch classes are associated with mnemonic codes (e.g. C, D, E, F, G, A, B or DO, RE, MI, FA, SOL, LA, SI)
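The octave relationship can be made concrete with a small sketch. It assumes twelve-tone equal temperament and the common A4 = 440 Hz reference, neither of which is stated in the slides; the function name `pitch_class` is purely illustrative.

```python
import math

# Sharp-based names for the 12 pitch classes, starting from C.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(frequency_hz, a4_hz=440.0):
    """Return the name of the pitch class nearest to a fundamental frequency.

    Assumes equal temperament tuned to A4 = 440 Hz (an assumption of this sketch).
    """
    # Distance from A4 in semitones; doubling the frequency adds exactly 12.
    semitones_from_a4 = round(12 * math.log2(frequency_hz / a4_hz))
    # A sits at index 9 of PITCH_CLASSES; the modulo discards the octave.
    return PITCH_CLASSES[(9 + semitones_from_a4) % 12]


# Frequencies related by a power of two share the same pitch class.
print(pitch_class(110.0), pitch_class(220.0), pitch_class(440.0), pitch_class(880.0))
print(pitch_class(261.63))
```

The first line prints the same name four times, because each frequency is the previous one doubled.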
[Figure: note names ascending (C D E F G A B C) and descending (C B A G F E D C), with the sharps C# D# F# G# A# and the flats Bb Ab Gb Eb Db]
Music Language Essentials
Text: Letter, Word, Phrase, Sentence
Music: Note, Chord, Ghost Note, Phrase
Text vs Music
[Figure: annotated score showing Time Signature, Key Signature, Clef, Tempo, Note, Reference Chord and Chord]
Score music representation
Symbolic music representations comprise any
kind of score representation with an explicit
encoding of notes or other musical events.
Piano Roll: originally a roll of paper with holes
that controlled the execution of a melody on a
self-playing device, the term nowadays refers to a
digital visualisation which shows pitches over time.
Musical Instrument Digital Interface (MIDI) is
another widely adopted representation for
musical events (e.g. pitch, velocity,
duration, intensity)
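As a rough illustration of how note events relate to a piano roll, the sketch below turns a list of events into a pitch-by-time grid. The `NoteEvent` fields and the grid-step time unit are simplifying assumptions for this example; they are not the actual MIDI message format.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    pitch: int          # MIDI note number, 0-127 (60 is middle C)
    start: int          # onset, in grid steps (hypothetical time unit)
    duration: int       # length, in grid steps
    velocity: int = 64  # how hard the note is played, 1-127


def piano_roll(events, n_steps):
    """Build a 128 x n_steps grid; a cell holds the velocity of the sounding note, or 0."""
    roll = [[0] * n_steps for _ in range(128)]
    for event in events:
        end = min(event.start + event.duration, n_steps)
        for step in range(event.start, end):
            roll[event.pitch][step] = event.velocity
    return roll


# A hypothetical three-note melody: C4, E4, G4, two steps each.
melody = [NoteEvent(60, 0, 2), NoteEvent(64, 2, 2), NoteEvent(67, 4, 2)]
roll = piano_roll(melody, n_steps=6)

# Print the three used pitches, highest first, as a tiny text piano roll.
for pitch in (67, 64, 60):
    print(pitch, "".join("#" if cell else "." for cell in roll[pitch]))
```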
Piano Roll & MIDI
Symbolic music representation
MusicXML [1] is an XML dialect for expressing music.
As you can imagine from the example on the right,
encoding a whole song results in a huge and
verbose textual representation (that’s XML!).
For that reason MusicXML 2.0 introduced a
compressed format with a .mxl suffix
• Widely supported (scorewriting, OCR, sequencer)
• Easy to understand
• Full support of music features
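To give a feel for the format, here is a minimal sketch that reads the notes out of a tiny MusicXML document using only the Python standard library. The embedded document is hand-written for this example (it is not the one shown on the slide) and covers only a small subset of the format; `extract_notes` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hand-written, minimal score-partwise document: one part, one measure, two notes.
MUSICXML = """<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Bass</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <time><beats>4</beats><beat-type>4</beat-type></time>
        <clef><sign>F</sign><line>4</line></clef>
      </attributes>
      <note><pitch><step>E</step><octave>2</octave></pitch><duration>2</duration></note>
      <note><pitch><step>G</step><octave>2</octave></pitch><duration>2</duration></note>
    </measure>
  </part>
</score-partwise>
"""


def extract_notes(document):
    """Return (step, octave, duration) for every pitched note in the document."""
    root = ET.fromstring(document)
    notes = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:
            continue  # rests have no <pitch> element
        notes.append((pitch.findtext("step"),
                      int(pitch.findtext("octave")),
                      int(note.findtext("duration"))))
    return notes


print(extract_notes(MUSICXML))
```

Even this two-note fragment takes more than a dozen lines of XML, which illustrates the verbosity mentioned above.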
[Figure: MusicXML example with its Part, Time, Clef and Note(s) elements highlighted]
[1] https://www.musicxml.com
MusicXML
The Parsons code, formally named the Parsons
code for melodic contours, is a simple notation
used to identify a piece of music through melodic
motion — movements of the pitch up and down.
(https://en.wikipedia.org/wiki/Parsons_code)
The encoding focuses on the pitch relation between
consecutive notes. The main points about this method are:
• Simplicity
• Being a textual encoding, it poses interesting
challenges for text search engines
• Limited: it ignores important features such as
time, intervals, rests and ghost notes
Parsons Code
Symbol    Description
*         First note of a sequence
u, /      “up”: the note is higher than the previous one
d, \      “down”: the note is lower than the previous one
r, -      “repeat”: the note is the same as the previous one
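The encoding is simple enough to sketch in a few lines. The function below is an illustrative implementation that uses the letter symbols (`*`, `u`, `d`, `r`); representing the melody as MIDI note numbers is an assumption of this example.

```python
def parsons_code(pitches):
    """Encode a pitch sequence (e.g. MIDI note numbers) as a Parsons code string."""
    if not pitches:
        return ""
    symbols = ["*"]  # the first note is only a reference point
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("u")
        elif current < previous:
            symbols.append("d")
        else:
            symbols.append("r")
    return "".join(symbols)


# Opening of "Twinkle, Twinkle, Little Star": C C G G A A G
print(parsons_code([60, 60, 67, 67, 69, 69, 67]))  # *rururd

# The same melody transposed up a tone yields the same code.
print(parsons_code([62, 62, 69, 69, 71, 71, 69]))  # *rururd
```

Because only the direction of each step is kept, transposition leaves the code unchanged. The size of each interval and the rhythm are lost.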
Parsons Code (1/4)
Parsons Code (2/4)
[Figure: score excerpts annotated note by note with their Parsons code symbols (*, u, d, r); the last example is Money, Pink Floyd]
Parsons Code (3/4)
Tempo (Time)
Intervals
Rests
Ghost Notes
Parsons Code (4/4)
An audio signal is a representation of sound: it describes the fluctuation in air pressure
caused by the vibration as a function of time. Unlike sheet music or symbolic representations,
audio representations encode everything that is necessary to reproduce an acoustic realization
of a piece of music.
Digital computers can only capture this data at discrete moments in time. The rate at which a
computer captures audio data is called the sampling frequency or sampling rate.
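A minimal sketch of what sampling means in practice: a continuous sine tone is evaluated only at evenly spaced instants. The 440 Hz tone and the 8 kHz sampling rate are arbitrary values chosen for the example.

```python
import math


def sample_sine(frequency_hz, sampling_rate_hz, duration_s, amplitude=1.0):
    """Evaluate a pure sine tone at discrete instants spaced 1 / sampling_rate_hz apart."""
    n_samples = round(sampling_rate_hz * duration_s)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sampling_rate_hz)
            for n in range(n_samples)]


# 10 ms of a 440 Hz tone captured at 8000 samples per second.
samples = sample_sine(440.0, 8000, 0.01)
print(len(samples))  # 80
```

Everything that happens between two consecutive instants is lost, which is why the sampling rate limits the frequencies a digital recording can represent.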
Audio Representation: Time Domain
The Frequency Domain (FD) representation
decomposes the audio signal into a number of
waves oscillating at different frequencies.
The FD plot shows the frequencies on the
horizontal axis against their corresponding
magnitude (power) on the vertical axis.
This representation, among other things, can
be used for highlighting the dominant
frequencies of a musical tone.
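The idea of finding the dominant frequency can be sketched with a naive discrete Fourier transform written in plain Python. This is for illustration only: a real pipeline would normally rely on an FFT routine, and the test tone, its length and the sampling rate are assumptions chosen so that the frequencies fall exactly on DFT bins.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(N^2) discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum.append(abs(total))
    return spectrum


def dominant_frequency(samples, sampling_rate_hz):
    """Return the frequency (Hz) of the bin with the largest magnitude."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    # Bin k corresponds to k * sampling_rate / N hertz.
    return peak_bin * sampling_rate_hz / len(samples)


# A 440 Hz tone plus a weaker overtone at 880 Hz, sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * i / sr) + 0.3 * math.sin(2 * math.pi * 880 * i / sr)
        for i in range(400)]
print(dominant_frequency(tone, sr))  # 440.0
```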
Frequency Domain
➢ Music Information Retrieval (MIR)
➢ Music Essentials
✓ Audio Processing
‣ Basic Pipeline
‣ Time Domain Features
‣ Frequency Domain Features
‣ Chroma Features
➢ Q&A
Agenda
Analog Signal → Sampling / Quantization → Framing → Time Domain Features Extraction
Framing → Windowing → FFT → Frequency Domain Features Extraction
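The framing and windowing stages can be sketched as follows. The frame size, the hop size and the choice of a Hann window are assumptions made for this example; the slides do not specify them.

```python
import math


def frames(samples, frame_size, hop_size):
    """Split a signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [samples[start:start + frame_size]
            for start in range(0, len(samples) - frame_size + 1, hop_size)]


def hann_window(size):
    """Symmetric Hann window: tapers both frame edges to zero to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]


def windowed_frames(samples, frame_size=8, hop_size=4):
    """Frame the signal, then multiply each frame by the window, sample by sample."""
    window = hann_window(frame_size)
    return [[x * w for x, w in zip(frame, window)]
            for frame in frames(samples, frame_size, hop_size)]


signal = [1.0] * 20  # a constant signal makes the window shape visible
result = windowed_frames(signal)
print(len(result), len(result[0]))  # 4 8
print([round(v, 2) for v in result[0]])
```

Time domain features would be computed on the output of `frames`, while the windowed frames are what would be passed to the FFT.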
Basic Audio Processing Pipeline
Feature
▪ Amplitude Envelope (AE): max amplitude within a frame
▪ Root-Mean-Square Energy (RMS): perceived sound intensity
▪ Zero Crossing Rate (ZCR): number of times the amplitude changes its sign within a frame
Usage
▪ Loudness Estimation
▪ Timbre Analysis
▪ Speech Recognition
▪ Audio Segmentation
▪ Onset Detection
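The three features can each be computed on a single frame in a line or two. This is a minimal sketch: the zero crossing count ignores samples that are exactly zero, which is a simplification of this example.

```python
import math


def amplitude_envelope(frame):
    """Largest absolute amplitude in the frame."""
    return max(abs(x) for x in frame)


def rms_energy(frame):
    """Square root of the mean of the squared amplitudes."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Number of sign changes between consecutive samples (zeros are ignored)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


frame = [0.5, -0.5, 0.5, -0.5]
print(amplitude_envelope(frame), rms_energy(frame), zero_crossing_rate(frame))  # 0.5 0.5 3
```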
Time Domain Features
Feature
▪ Band Energy Ratio (BER): ratio between the energy in the lower and higher frequency bands
▪ Spectral Centroid: frequency band where most of the energy is concentrated
▪ Bandwidth (BW): spectral range of the interesting part of a signal
▪ Spectral Flux: how much the spectrum changes between consecutive frames
Usage
▪ Timbre Analysis
▪ Speech Recognition
▪ Onset Detection
▪ Speech/Music Discrimination
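A minimal sketch of three of these features, computed from magnitude spectra such as those produced by the FFT stage. Exact definitions vary between libraries. Here the centroid is the magnitude-weighted mean frequency, and spectral flux is taken in its usual sense, as the change in the spectrum between two consecutive frames. The split bin, the bin width and the toy spectra are assumptions of this example.

```python
def band_energy_ratio(magnitudes, split_bin):
    """Energy below split_bin divided by the energy at or above it (assumes the latter is non-zero)."""
    low = sum(m * m for m in magnitudes[:split_bin])
    high = sum(m * m for m in magnitudes[split_bin:])
    return low / high


def spectral_centroid(magnitudes, bin_width_hz):
    """Magnitude-weighted mean frequency of the spectrum, in hertz."""
    weighted = sum(k * bin_width_hz * m for k, m in enumerate(magnitudes))
    return weighted / sum(magnitudes)


def spectral_flux(previous, current):
    """Sum of squared differences between the spectra of two consecutive frames."""
    return sum((c - p) ** 2 for p, c in zip(previous, current))


# Two toy spectra with 100 Hz wide bins (hypothetical values).
frame_a = [0.0, 4.0, 2.0, 0.0]
frame_b = [0.0, 2.0, 4.0, 0.0]

print(band_energy_ratio(frame_a, split_bin=2))        # 4.0
print(spectral_centroid(frame_a, bin_width_hz=100.0))
print(spectral_centroid(frame_b, bin_width_hz=100.0))
print(spectral_flux(frame_a, frame_b))                # 8.0
```

The centroid moves up from the first frame to the second, since the energy shifts toward the higher bin.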
Frequency Domain Features
Chroma features are a powerful representation for
music audio in which the entire spectrum is
projected onto 12 bins representing the 12 distinct
semitones (or chroma) of the musical octave.
This kind of analysis bridges low-level
and mid-level features, moving the audio signal
representation toward something which is more
readable from a functional perspective.
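The projection onto 12 bins can be sketched by assigning every spectrum bin to its nearest pitch class and summing the magnitudes. This is a simplified illustration that assumes equal temperament with A4 = 440 Hz; production libraries typically use smoother filter banks. The toy spectrum and the 110 Hz bin width are hypothetical.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(magnitudes, bin_width_hz, a4_hz=440.0):
    """Fold a magnitude spectrum into 12 pitch-class bins (index 0 is C)."""
    chroma = [0.0] * 12
    for k, magnitude in enumerate(magnitudes):
        frequency = k * bin_width_hz
        if frequency <= 0:
            continue  # the DC bin has no pitch
        # Nearest MIDI note number; 69 is A4, and note % 12 == 0 is a C.
        midi_note = round(69 + 12 * math.log2(frequency / a4_hz))
        chroma[midi_note % 12] += magnitude
    return chroma


# Energy at 220, 440 and 880 Hz (all A) and a little at 330 Hz (E).
spectrum = [0.0, 0.0, 1.0, 0.5, 1.0, 0.0, 0.0, 0.0, 1.0]
chroma = chroma_vector(spectrum, bin_width_hz=110.0)
print(PITCH_CLASSES[chroma.index(max(chroma))])  # A
```

The three octaves of A end up in the same bin, which is what makes the representation insensitive to the octave in which a note is played.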
Chroma Features (1/2)
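[Figure: chromagram with Time on the horizontal axis and the 12 pitch classes (C, C#, D, D#, E, F, F#, G, G#, A, A#, B) on the vertical axis; note labels along the time axis: A A A A A A C A F F F F F F F G C C C C C C D C B B B B B B C B; one region is marked NOISE]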
Chroma Features (2/2)
FALCON: FAst Lucene-based Cover sOng identification | Chromaprint (part of AcoustID)
Interesting Projects
➢ Music Information Retrieval (MIR)
➢ Music Representation
➢ Audio Processing
✓ Q&A
Agenda
Thank you!

Contenu connexe

Tendances

Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overviewVarun Jain
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech RecognitionAhmed Moawad
 
Signal flow and Audio Consoles
Signal flow and Audio ConsolesSignal flow and Audio Consoles
Signal flow and Audio ConsolesAjoi Dzulhafidz
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentationhimanshubhatti
 
Digital Signal Processing Tutorial: Chapt 4 design of digital filters (FIR)
Digital Signal Processing Tutorial: Chapt 4 design of digital filters (FIR) Digital Signal Processing Tutorial: Chapt 4 design of digital filters (FIR)
Digital Signal Processing Tutorial: Chapt 4 design of digital filters (FIR) Chandrashekhar Padole
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
machine learning x music
machine learning x musicmachine learning x music
machine learning x musicYi-Hsuan Yang
 
音源分離 ~DNN音源分離の基礎から最新技術まで~ Tokyo bishbash #3
音源分離 ~DNN音源分離の基礎から最新技術まで~ Tokyo bishbash #3音源分離 ~DNN音源分離の基礎から最新技術まで~ Tokyo bishbash #3
音源分離 ~DNN音源分離の基礎から最新技術まで~ Tokyo bishbash #3Naoya Takahashi
 
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離Kitamura Laboratory
 
Unit 1 speech processing
Unit 1 speech processingUnit 1 speech processing
Unit 1 speech processingazhagujaisudhan
 
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...ssuserf54db1
 

Tendances (20)

Lip reading Project
Lip reading ProjectLip reading Project
Lip reading Project
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overview
 
Cohen v1
Cohen v1Cohen v1
Cohen v1
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Audio compression
Audio compressionAudio compression
Audio compression
 
Noise Models
Noise ModelsNoise Models
Noise Models
 
Signal flow and Audio Consoles
Signal flow and Audio ConsolesSignal flow and Audio Consoles
Signal flow and Audio Consoles
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
Speech Recognition System
Speech Recognition SystemSpeech Recognition System
Speech Recognition System
 
Digital Signal Processing Tutorial: Chapt 4 design of digital filters (FIR)
Digital Signal Processing Tutorial: Chapt 4 design of digital filters (FIR) Digital Signal Processing Tutorial: Chapt 4 design of digital filters (FIR)
Digital Signal Processing Tutorial: Chapt 4 design of digital filters (FIR)
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
 
machine learning x music
machine learning x musicmachine learning x music
machine learning x music
 
音源分離 ~DNN音源分離の基礎から最新技術まで~ Tokyo bishbash #3
音源分離 ~DNN音源分離の基礎から最新技術まで~ Tokyo bishbash #3音源分離 ~DNN音源分離の基礎から最新技術まで~ Tokyo bishbash #3
音源分離 ~DNN音源分離の基礎から最新技術まで~ Tokyo bishbash #3
 
Speech processing
Speech processingSpeech processing
Speech processing
 
Seminar
SeminarSeminar
Seminar
 
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離調波打撃音モデルに基づく線形多チャネルブラインド音源分離
調波打撃音モデルに基づく線形多チャネルブラインド音源分離
 
Unit 1 speech processing
Unit 1 speech processingUnit 1 speech processing
Unit 1 speech processing
 
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...
Interspeech2020 paper reading workshop "Similarity-and-Independence-Aware-Bea...
 
digital filter design
digital filter designdigital filter design
digital filter design
 

Similaire à Introduction to Music Information Retrieval

Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based RankingAndrea Gazzarini
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingSease
 
RDA for music cataloguers
RDA for music cataloguersRDA for music cataloguers
RDA for music cataloguersPeter Sime
 
Alpan Aytekin-Game Audio Essentials
Alpan Aytekin-Game Audio EssentialsAlpan Aytekin-Game Audio Essentials
Alpan Aytekin-Game Audio Essentialsgamedevelopersturkey
 
Semantic Linking of Information, Content and Metadata for Early Music (SLICKM...
Semantic Linking of Information, Content and Metadata for Early Music (SLICKM...Semantic Linking of Information, Content and Metadata for Early Music (SLICKM...
Semantic Linking of Information, Content and Metadata for Early Music (SLICKM...sebastianewert
 
Towards a musical Semantic Web
Towards a musical Semantic WebTowards a musical Semantic Web
Towards a musical Semantic WebYves Raimond
 
Poster vega north
Poster vega northPoster vega north
Poster vega northAcxelVega
 
Denktank 2010
Denktank 2010Denktank 2010
Denktank 2010ocor203
 
Introduction of my research histroy: From instrument recognition to support o...
Introduction of my research histroy: From instrument recognition to support o...Introduction of my research histroy: From instrument recognition to support o...
Introduction of my research histroy: From instrument recognition to support o...kthrlab
 
The kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingThe kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingijma
 
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine...
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine...Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine...
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine...I MT
 
Notating pop music
Notating pop musicNotating pop music
Notating pop musicxjkoboe
 
Audio Processing and Music Recognition
Audio Processing and Music RecognitionAudio Processing and Music Recognition
Audio Processing and Music RecognitionMrinmoy Dalal
 
Music Cataloging Basics Workshop - Slides
Music Cataloging Basics Workshop - SlidesMusic Cataloging Basics Workshop - Slides
Music Cataloging Basics Workshop - SlidesALATechSource
 

Similaire à Introduction to Music Information Retrieval (20)

Interval Hashing Based Ranking
Interval Hashing Based RankingInterval Hashing Based Ranking
Interval Hashing Based Ranking
 
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based RankingMusical Information Retrieval Take 2: Interval Hashing Based Ranking
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
 
RDA for music cataloguers
RDA for music cataloguersRDA for music cataloguers
RDA for music cataloguers
 
Alpan Aytekin-Game Audio Essentials
Alpan Aytekin-Game Audio EssentialsAlpan Aytekin-Game Audio Essentials
Alpan Aytekin-Game Audio Essentials
 
Semantic Linking of Information, Content and Metadata for Early Music (SLICKM...
Semantic Linking of Information, Content and Metadata for Early Music (SLICKM...Semantic Linking of Information, Content and Metadata for Early Music (SLICKM...
Semantic Linking of Information, Content and Metadata for Early Music (SLICKM...
 
Towards a musical Semantic Web
Towards a musical Semantic WebTowards a musical Semantic Web
Towards a musical Semantic Web
 
Poster vega north
Poster vega northPoster vega north
Poster vega north
 
Denktank 2010
Denktank 2010Denktank 2010
Denktank 2010
 
Music mobile
Music mobileMusic mobile
Music mobile
 
Sound
SoundSound
Sound
 
楊奕軒/音樂資料檢索
楊奕軒/音樂資料檢索楊奕軒/音樂資料檢索
楊奕軒/音樂資料檢索
 
Introduction of my research histroy: From instrument recognition to support o...
Introduction of my research histroy: From instrument recognition to support o...Introduction of my research histroy: From instrument recognition to support o...
Introduction of my research histroy: From instrument recognition to support o...
 
MIR
MIRMIR
MIR
 
The kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingThe kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key finding
 
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine...
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine...Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine...
Colloque IMT -04/04/2019- L'IA au cœur des mutations industrielles - "Machine...
 
Understang Music with Machine Learning by Jimena Royo-Letelier
Understang Music with Machine Learning by Jimena Royo-LetelierUnderstang Music with Machine Learning by Jimena Royo-Letelier
Understang Music with Machine Learning by Jimena Royo-Letelier
 
Ism2011
Ism2011Ism2011
Ism2011
 
Notating pop music
Notating pop musicNotating pop music
Notating pop music
 
Audio Processing and Music Recognition
Audio Processing and Music RecognitionAudio Processing and Music Recognition
Audio Processing and Music Recognition
 
Music Cataloging Basics Workshop - Slides
Music Cataloging Basics Workshop - SlidesMusic Cataloging Basics Workshop - Slides
Music Cataloging Basics Workshop - Slides
 

Dernier

10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 

Dernier (20)

10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 

Introduction to Music Information Retrieval

  • 1. London Information Retrieval Meetup 19 Feb 2019 Introduction to Music Information Retrieval Thoughts from a former bass player Andrea Gazzarini, Software Engineer 19th February 2019
  • 2. London Information Retrieval Meetup Who I am ▪ Software Engineer (1999-) ▪ “Hermit” Software Engineer (2010-) ▪ Java & Information Retrieval Passionate ▪ Apache Qpid (past) Committer ▪ Husband & Father ▪ Bass Player Andrea Gazzarini, “Gazza”
  • 3. London Information Retrieval Meetup Sease Search Services ● Open Source Enthusiasts ● Apache Lucene/Solr experts ! Community Contributors ● Active Researchers ● Hot Trends : Learning To Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning
  • 4. London Information Retrieval Meetup ✓Music Information Retrieval (MIR)? ➢ Music Essentials ➢ Audio Processing ➢ Q&A Agenda
  • 5. London Information Retrieval Meetup MIR is concerned with the extraction, analysis and usage of information about any kind of music entity (e.g. a song or a music artist) on any representation level (for example, audio signal, symbolic MIDI representation of a piece of music, or name of a music artist).” Schedl, M.: Automatically extracting, analyzing and visualizing information on music artists from the world wide web. Dissertation, Johannes Kepler University, Wien (2003) Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. MIR is a small but growing field of research with many real-world applications. Those involved in MIR may have a background in in musicology, psychoacoustics, psychology, academic music study, signal processing, informatics, machine learning, optical music recognition, computational intelligence or some combination of these. https://en.wikipedia.org/wiki/Music_information_retrieval Music Information Retrieval (MIR)
  • 6. London Information Retrieval Meetup AUDIO IDENTIFICATION GENRE IDENTIFICATION TRANSCRIPTION RECOMMENDATION COVER SONG DETECTION SYMBOLIC SIMILARITY MOOD SOURCE SEPARATION INSTRUMENT RECOGNITION TEMPO ESTIMATION SCORE ALIGNMENT SONG STRUCTURE BEAT TRACKING KEY DETECTION QUERY BY HUMMINGQUERY BY HUMMING AUDIO IDENTIFICATION INSTRUMENT RECOGNITION GENRE IDENTIFICATION TRANSCRIPTION RECOMMENDATION TEMPO ESTIMATION SONG STRUCTURE SCORE ALIGNMENT COVER SONG DETECTION SYMBOLIC SIMILARITY KEY DETECTION BEAT TRACKING MOOD SOURCE SEPARATION Music Information Retrieval (MIR)
  • 7. London Information Retrieval Meetup Music Content includes all the low-level features we can extract from the audio signal (e.g. time, frequencies, loudness). Music Context defines additional metadata that cannot be extracted from the audio signal (e.g. lyrics, tags, artists, feedback, posts). Listener State includes the user state in a given moment (e.g. mood, musical knowledge, preferences). Listener Context relates to the environment where the listener is in a given moment (e.g. political, geographical, social). Factors in Music Perception
  • 8. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ✓Music Essentials ‣ Essentials ‣ Score Music Representation ‣ Symbolic Representations ‣ Audio Representation ➢ Audio Processing ➢ Q&A Agenda
  • 9. London Information Retrieval Meetup A note denotes a sound, its pitch and its duration. A sound is the audio signal produced by a vibrating body. Notes are associated with graphical symbols (indicating the pitch and the duration). Two notes whose fundamental frequencies are in a ratio of any integer power of two are perceived as similar; as a consequence, we say they belong to the same pitch class. A note name is also used to denote a pitch class. Traditional music theory identifies 12 pitch classes. Notes and pitch classes are associated with mnemonic codes (e.g. C, D, E, F, G, A, B or DO, RE, MI, FA, SOL, LA, SI). (Figure: the notes C D E F G A B C ascending, with the sharps C#, D#, F#, G#, A#, then descending, with the flats Bb, Ab, Gb, Eb, Db.) Music Language Essentials
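The octave relationship described on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes 12-tone equal temperament and a reference tuning of A4 = 440 Hz, neither of which is stated in the slides, and the function name `pitch_class` is made up for the example.

```python
import math

# Pitch class names, using sharps only (an arbitrary choice for this sketch).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(frequency_hz, a4_hz=440.0):
    """Return the name of the pitch class closest to the given frequency.

    Assumes equal temperament and A4 = 440 Hz (assumptions of this sketch).
    """
    # Distance from A4 in semitones: 12 semitones per doubling of frequency.
    semitones_from_a4 = round(12 * math.log2(frequency_hz / a4_hz))
    # A sits at index 9; the modulo folds every octave onto the same 12 names.
    return NOTE_NAMES[(9 + semitones_from_a4) % 12]


if __name__ == "__main__":
    for freq in (220.0, 440.0, 880.0, 261.63):
        print(freq, pitch_class(freq))
```

Frequencies of 220 Hz, 440 Hz and 880 Hz differ by powers of two, so they all fall into the same pitch class.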
  • 10. London Information Retrieval Meetup Text Music Letter Note Word Phrase Sentence Chord Ghost Note Phrase Text vs Music
  • 11. London Information Retrieval Meetup Time Signature Key Signature Clef Tempo Note Reference Chord Chord Score music representation
  • 12. London Information Retrieval Meetup Symbolic music representations comprise any kind of score representation with an explicit encoding of notes or other musical events. The term Piano Roll originally denoted rolls of paper with holes that controlled the performance of a melody on a self-playing device; nowadays it refers to a digital visualisation that shows pitches over time. Musical Instrument Digital Interface (MIDI) is another widely adopted representation of music events (e.g. pitch, velocity, duration, intensity). Piano Roll & MIDI Symbolic music representation
  • 13. London Information Retrieval Meetup MusicXML [1] is an XML dialect for representing music. As you can imagine from the example on the right, encoding a whole song results in a huge and verbose textual representation (that’s XML!). For that reason MusicXML 2.0 introduced a compressed format with a .mxl suffix. • Widely supported (scorewriting, OCR, sequencer) • Easy to understand • Full support of music features MusicXML Part Time Clef Note(s) [1] https://www.musicxml.com MusicXML
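To give a feel for the verbosity mentioned above, here is a small sketch that reads notes from a MusicXML-like fragment with Python's standard `xml.etree.ElementTree`. The fragment is hand-written for this example (it is not the one shown on the slide and is not a complete, schema-valid score), and `read_notes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment using MusicXML element names.
SCORE = """<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Bass</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <note>
        <pitch><step>C</step><octave>4</octave></pitch>
        <duration>2</duration>
      </note>
      <note>
        <rest/>
        <duration>1</duration>
      </note>
      <note>
        <pitch><step>E</step><octave>2</octave></pitch>
        <duration>1</duration>
      </note>
    </measure>
  </part>
</score-partwise>"""


def read_notes(xml_text):
    """Return (step, octave, duration) tuples for every pitched note."""
    root = ET.fromstring(xml_text)
    notes = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:
            # Rests carry no <pitch> element, so they are skipped here.
            continue
        notes.append((
            pitch.findtext("step"),
            int(pitch.findtext("octave")),
            int(note.findtext("duration")),
        ))
    return notes


if __name__ == "__main__":
    print(read_notes(SCORE))
```

Three events already take around twenty lines of XML, which is the motivation for the compressed format.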
  • 14. London Information Retrieval Meetup The Parsons code, formally named the Parsons code for melodic contours, is a simple notation used to identify a piece of music through melodic motion: movements of the pitch up and down. (https://en.wikipedia.org/wiki/Parsons_code) The encoding focuses on the pitch relation between subsequent notes. Main points about this method are: • Simplicity • As a textual encoding, it poses interesting challenges for text search engines • Limited: it ignores important features such as time, intervals, rests and ghost notes. Parsons Code symbols: * = first note of a sequence; u or / = “up”, the note is higher than the previous one; d or \ = “down”, the note is lower than the previous one; r or - = “repeat”, the note is the same as the previous one. Parsons Code (1/4)
  • 15. London Information Retrieval Meetup Parsons Code (2/4)
  • 16. London Information Retrieval Meetup * * r u u rr u r u r d r d r d r d r u r u r u r u r * u d d d u u uX u d d d u u uXd Money, Pink Floyd Parsons Code (3/4)
  • 17. London Information Retrieval Meetup Tempo (Time) Intervals Rests Ghost Notes Parsons Code (4/4)
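The Parsons encoding from the slides above is simple enough to sketch directly. The example below represents pitches as MIDI note numbers, which is an assumption made for illustration (any comparable pitch values would do); `parsons_code` is a made-up function name.

```python
def parsons_code(pitches):
    """Encode a sequence of pitches as a Parsons code string.

    '*' marks the first note; 'u', 'd' and 'r' mean up, down and repeat.
    """
    if not pitches:
        return ""
    symbols = ["*"]
    # Compare each note with the one immediately before it.
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("u")
        elif current < previous:
            symbols.append("d")
        else:
            symbols.append("r")
    return "".join(symbols)


if __name__ == "__main__":
    # Opening of "Twinkle, Twinkle, Little Star": C C G G A A G.
    print(parsons_code([60, 60, 67, 67, 69, 69, 67]))
    # Two different intervals, same contour: the limitation noted on the slides.
    print(parsons_code([60, 62]), parsons_code([60, 72]))
```

Because only the direction of motion is kept, a step of two semitones and a leap of an octave produce the same code, and durations and rests are lost entirely.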
  • 18. London Information Retrieval Meetup An audio signal is a representation of sound: it describes the fluctuation in air pressure caused by the vibration as a function of time. Unlike sheet music or symbolic representations, audio representations encode everything that is necessary to reproduce an acoustic realization of a piece of music. Digital computers can only capture this data at discrete moments in time; the rate at which a computer captures audio data is called the sampling frequency or sampling rate. Audio Representation: Time Domain
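As a rough illustration of sampling, the sketch below evaluates a sine wave only at the discrete instants n / sampling_rate. The pure sine tone, the parameter values and the function name `sample_sine` are all assumptions chosen for the example.

```python
import math


def sample_sine(frequency_hz, sampling_rate_hz, duration_s, amplitude=1.0):
    """Return the samples of a sine tone taken at a fixed sampling rate."""
    n_samples = int(round(sampling_rate_hz * duration_s))
    return [
        # The continuous signal is only observed at t = n / sampling_rate_hz.
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sampling_rate_hz)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    samples = sample_sine(440.0, 8000, 0.01)
    print(len(samples), samples[:4])
```

Ten milliseconds at 8000 samples per second yields 80 values; a higher sampling rate captures the same time span with more points.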
  • 19. London Information Retrieval Meetup The Frequency Domain representation decomposes the audio signal into a number of waves oscillating at different frequencies. The frequency-domain plot shows the frequencies on the horizontal axis and their corresponding magnitude (power) on the vertical axis. This representation, among other things, can be used for highlighting the dominant frequencies of a musical tone. Frequency Domain
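A minimal way to see the frequency-domain idea in code is a naive discrete Fourier transform. Note that this is the plain O(n²) DFT written out for readability, not an FFT, and the helper names are made up for this sketch.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Return DFT magnitudes for bins 0..n/2 of a real-valued signal."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin k.
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(samples, sampling_rate_hz):
    """Return the frequency (Hz) of the bin with the largest magnitude."""
    magnitudes = magnitude_spectrum(samples)
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    # Bin k corresponds to k * sampling_rate / n Hz.
    return peak_bin * sampling_rate_hz / len(samples)


if __name__ == "__main__":
    rate = 800
    tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(80)]
    print(dominant_frequency(tone, rate))
```

With 80 samples at 800 Hz each bin is 10 Hz wide, so a 100 Hz tone lands exactly on one bin.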
  • 20. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ➢ Music Essentials ✓ Audio Processing ‣ Basic Pipeline ‣ Time Domain Features ‣ Frequency Domain Features ‣ Chroma Features ➢ Q&A Agenda
  • 21. London Information Retrieval Meetup Time Domain Features Extraction Frequency Domain Features Extraction Sampling / Quantization Framing Windowing FFT Analog Signal Basic Audio Processing Pipeline
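The framing and windowing stages of the pipeline can be sketched as follows. The slide does not say which window function is used; the Hann window here is an assumption, as are the frame and hop sizes and the function names.

```python
import math


def frames(samples, frame_size, hop_size):
    """Split a signal into (possibly overlapping) frames of equal length.

    Trailing samples that do not fill a whole frame are dropped.
    """
    last_start = len(samples) - frame_size
    return [
        samples[start:start + frame_size]
        for start in range(0, last_start + 1, hop_size)
    ]


def hann_window(frame):
    """Apply a symmetric Hann window, tapering the frame edges to zero."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [
        x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]


if __name__ == "__main__":
    signal = list(range(10))
    for frame in frames(signal, frame_size=4, hop_size=2):
        print(frame, hann_window(frame))
```

Tapering each frame reduces the discontinuities at its edges before the frame is passed to the FFT stage.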
  • 22. London Information Retrieval Meetup Features: Amplitude Envelope (AE): max amplitude within a frame; Root-Mean-Square Energy (RMS): perceived sound intensity; Zero Crossing Rate (ZCR): number of times the amplitude changes its sign within a frame. Example usage: Loudness Estimation, Timbre Analysis, Speech Recognition, Audio Segmentation, Onset Detection. Time Domain Features
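The three time-domain features listed on this slide can each be computed on a single frame in a line or two. The sketch below follows the slide's wording (ZCR as a count of sign changes); treating a sample of exactly zero as non-negative is a convention picked for this example.

```python
import math


def amplitude_envelope(frame):
    """Maximum absolute amplitude within the frame."""
    return max(abs(x) for x in frame)


def rms_energy(frame):
    """Root of the mean of the squared samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Number of sign changes between consecutive samples.

    A sample equal to zero is treated as non-negative (sketch convention).
    """
    return sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )


if __name__ == "__main__":
    frame = [0.5, -0.5, 0.25, -1.0]
    print(amplitude_envelope(frame), rms_energy(frame), zero_crossing_rate(frame))
```

Some definitions divide the crossing count by the frame length to obtain a rate; the count is kept here to match the slide.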
  • 23. London Information Retrieval Meetup Features: Band Energy Ratio (BER): ratio between lower and higher frequency bands energy; Spectral Centroid: frequency band where most of the energy is concentrated; Bandwidth (BW): spectral range of the interesting part of a signal; Spectral Flux: how much the spectrum changes from one frame to the next. Example usage: Timbre Analysis, Speech Recognition, Onset Detection, Speech/Music Discrimination. Frequency Domain Features
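A few of the frequency-domain features can be sketched on top of a magnitude spectrum. The definitions below are common textbook forms chosen for illustration, not necessarily the ones the talk had in mind: the centroid is the magnitude-weighted mean frequency, the band energy ratio compares squared magnitudes below and above a split bin, and the flux is the sum of squared differences between two consecutive spectra.

```python
def spectral_centroid(magnitudes, frequencies):
    """Magnitude-weighted mean frequency of one spectrum."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(frequencies, magnitudes)) / total


def band_energy_ratio(magnitudes, split_bin):
    """Energy below split_bin divided by energy at or above it."""
    low = sum(m * m for m in magnitudes[:split_bin])
    high = sum(m * m for m in magnitudes[split_bin:])
    return low / high if high else float("inf")


def spectral_flux(previous_magnitudes, current_magnitudes):
    """Sum of squared bin-wise differences between consecutive spectra."""
    return sum(
        (c - p) ** 2
        for p, c in zip(previous_magnitudes, current_magnitudes)
    )


if __name__ == "__main__":
    freqs = [0.0, 100.0, 200.0, 300.0]
    first = [0.0, 2.0, 2.0, 0.0]
    second = [0.0, 2.0, 0.0, 2.0]
    print(spectral_centroid(first, freqs))
    print(band_energy_ratio(first, split_bin=2))
    print(spectral_flux(first, second))
```

The inputs are plain lists of magnitudes and bin frequencies, so any spectrum computation can feed these functions.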
  • 24. London Information Retrieval Meetup Chroma features are a powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave. This kind of analysis bridges low-level and mid-level features, moving the audio signal representation toward something more readable from a functional perspective. Chroma Features (1/2)
  • 25. London Information Retrieval Meetup (Figure: chroma plot over time for the pitch classes C, C#, D, D#, E, F, F#, G, G#, A, A#, B.) Chroma Features (2/2)
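The projection of a spectrum onto 12 chroma bins can be sketched as below. This is a deliberately crude version: each spectral bin is assigned to its nearest equal-tempered pitch class, assuming A4 = 440 Hz, whereas real chroma extractors typically use finer filter banks and tuning estimation. The function name and the input format are assumptions of the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(magnitudes, frequencies, a4_hz=440.0):
    """Fold a magnitude spectrum onto 12 pitch-class bins (index 0 = C)."""
    chroma = [0.0] * 12
    for frequency, magnitude in zip(frequencies, magnitudes):
        if frequency <= 0:
            # The 0 Hz bin has no pitch and log2 is undefined there.
            continue
        semitones_from_a4 = round(12 * math.log2(frequency / a4_hz))
        # A sits at index 9; modulo 12 merges all octaves.
        chroma[(9 + semitones_from_a4) % 12] += magnitude
    return chroma


if __name__ == "__main__":
    freqs = [0.0, 220.0, 261.63, 440.0]
    mags = [5.0, 1.0, 3.0, 2.0]
    for name, value in zip(NOTE_NAMES, chroma_vector(mags, freqs)):
        print(name, value)
```

Energy at 220 Hz and 440 Hz accumulates in the same bin (A), which is exactly the octave folding that makes the representation readable in musical terms.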
  • 26. London Information Retrieval Meetup FALCON: FAst Lucene-based Cover sOng identification | chromaprint (part of AcoustID) Interesting Projects
  • 27. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ➢ Music Representation ➢ Audio Processing ✓ Q&A Agenda
  • 28. London Information Retrieval Meetup 19 Feb 2019 Thank you! Introduction to Music Information Retrieval Thoughts from a former bass player Andrea Gazzarini, Software Engineer 19th February 2019