Concept Extraction with Convolutional Neural Networks
Andreas Waldis, Luca Mazzola, and Michael Kaufmann
HSLU - Lucerne University of Applied Sciences,
School of Information Technology,
6343 - Rotkreuz,
Switzerland
7th International Conference on Data Science, Technology and Applications
DATA 2018 27/07/2018
Slide 2, 27-Jul-18
- XMAS: Cross-platform Mediation, Association and Search engine
- Knowledge Management Tool
- Automatic document tagging
- Recognition of Concepts
- Represented as N-Grams (sequences of words)
- Objective: create an index-based model for key-concept extraction
Context
• XMAS
• Knowledge Management Tool
• Automatic keyword extraction
Slide 3, 27-Jul-18
XMAS example
• Concepts extracted
• Automatic summarization (keywords)
Slide 4, 27-Jul-18
- Part-of-Speech (POS) tagging (NLP)
- Based on syntactic characteristics of the language and the frequency of typical constructs
- Requires exhaustively generating word n-gram combinations (more than linear growth) and filtering them by frequency
- POS limitations
- Language dependent
- Designing the acceptable patterns manually is laborious
- Including longer n-grams significantly reduces precision (even though it increases coverage)
POS solution
• POS limitations
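To make the POS baseline concrete, here is a minimal Python sketch of pattern-based candidate extraction: every n-gram up to a maximum length is enumerated, kept only if its tag sequence matches a pattern, and then filtered by frequency. The single pattern, the tag names, and the `max_n`/`min_freq` values are illustrative assumptions; real POS extractors use hand-designed, language-specific pattern sets, which is exactly the limitation listed on the slide.

```python
import re
from collections import Counter

# Hypothetical pattern: zero or more adjectives/nouns followed by a noun,
# e.g. "Spring Grove Cemetery". Tags are assumed to come from an external tagger.
PATTERN = re.compile(r"^(?:(?:ADJ|NOUN|PROPN) )*(?:NOUN|PROPN)$")

def candidate_ngrams(tagged_sentences, max_n=4, min_freq=2):
    """Enumerate every n-gram up to max_n, keep those whose tag sequence
    matches PATTERN, and drop those seen fewer than min_freq times."""
    counts = Counter()
    for sent in tagged_sentences:  # sent: list of (word, tag) pairs
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                window = sent[i:i + n]
                tags = " ".join(tag for _, tag in window)
                if PATTERN.match(tags):
                    counts[" ".join(word for word, _ in window)] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}
```

Every window of every sentence is visited for each n, so the candidate set grows with both text length and maximum n before any filtering happens.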
Slide 5, 27-Jul-18
Examples
• POS performance
TP + FP = predicted positive
TN + FN = predicted negative
Positive/Negative: predicted class
True/False: whether the prediction matches the ground truth
Concepts classified by POS
True Positive (TP): Rutgers Preparatory School; Watts 103rd Street Rhythm Band; Twinkle Twinkle Little Star; Accademia di Belle Arti di Roma
False Positive (FP): Oricon Weekly Albums Chart; Grand Forks-ND-MN Metropolitan Statistical; Apple CEO Steve Jobs
True Negative (TN): the rims of the; in consonance with the; in which they were written; was interred in Spring Grove Cemetery
False Negative (FN): Los Angeles Film Critics Association Awards; United States Citizenship and Immigration Services; State of North Carolina; 1917 October Revolution
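The TP/FP/TN/FN convention above can be sketched in a few lines of Python. Each item is a pair (predicted as concept, actually a concept); the gold labels in the example are read off the table cells (a TP or FN entry is a real concept, an FP or TN entry is not). The function name is hypothetical.

```python
from collections import Counter

def confusion_counts(pairs):
    """Count TP, FP, TN, FN from (predicted_is_concept, gold_is_concept) pairs."""
    counts = Counter()
    for predicted, gold in pairs:
        if predicted:
            counts["TP" if gold else "FP"] += 1
        else:
            counts["FN" if gold else "TN"] += 1
    return counts

# (predicted by POS, gold) for one entry from each cell of the table above.
examples = {
    "Rutgers Preparatory School": (True, True),    # TP
    "Apple CEO Steve Jobs": (True, False),         # FP
    "in consonance with the": (False, False),      # TN
    "State of North Carolina": (False, True),      # FN
}
counts = confusion_counts(examples.values())
predicted_positive = counts["TP"] + counts["FP"]  # TP + FP = predicted positive
```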
Slide 6, 27-Jul-18
Neural Network
Slide 7, 27-Jul-18
- Capability of automatically identifying:
- Regularities in the data
- Meaning of particular constructs
- Possibility of adding non-linearity by means of ReLU activation units
- Deep models allow comparatively compact networks to model very complex problems
- Can use any encoding of data
- We relied on Word2Vec-plus by Google
Neural Network motivation
• Automatic knowledge extraction
• Multiple hidden layers
• Compatible with any available data encoding
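Why the ReLU non-linearity matters can be shown with a toy two-layer network in plain Python. The slide gives no layer sizes or weights; the 2x2 and 1x2 weight matrices below are arbitrary toy values chosen only to illustrate that stacked linear layers collapse into a single linear map unless an activation sits between them.

```python
def relu(x):
    """ReLU activation: max(0, v), applied element-wise."""
    return [max(0.0, v) for v in x]

def linear(weights, x):
    """Dense layer without bias: y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Toy weights (assumption, not from the talk).
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]

x = [2.0, 1.0]
# Without an activation, the two layers equal the single linear map W2*W1,
# which here is the zero map, so the output is always 0.
no_act = linear(W2, linear(W1, x))
# With ReLU in between, the network computes |x0 - x1|, which is not linear.
with_act = linear(W2, relu(linear(W1, x)))
```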
Slide 8, 27-Jul-18
Data preprocessing
Slide 9, 27-Jul-18
- Use Word2Vec-plus
- Holds a vector for each word that also captures some contextual information
- Can provide a representation for unseen words:
a) Computation based on the 4 surrounding words
b) Vector update
Word Embedding
• Vector representation of words
• Also holds some context
• Can also represent unseen words
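The slide only names the two steps for unseen words. One plausible reading, sketched below in plain Python, is: (a) build a vector for an out-of-vocabulary word as the mean of the known vectors among its 4 surrounding words (2 on each side), and (b) update that vector as a running mean over later occurrences. Both the averaging and the running-mean update are assumptions, and `context_vector`/`update_unseen` are hypothetical names, not the Word2Vec-plus API.

```python
def context_vector(tokens, i, embeddings, window=2):
    """Average the vectors of the known words among the `window` tokens on each
    side of tokens[i] (4 surrounding words for window=2). None if none is known."""
    neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    vectors = [embeddings[w] for w in neighbours if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def update_unseen(word, tokens, i, embeddings, seen_counts):
    """Assumed 'vector update': keep a running mean of the context vectors
    observed for an unseen word across its occurrences."""
    ctx = context_vector(tokens, i, embeddings)
    if ctx is None:
        return
    k = seen_counts.get(word, 0)
    if k == 0:
        embeddings[word] = ctx
    else:
        old = embeddings[word]
        embeddings[word] = [(o * k + c) / (k + 1) for o, c in zip(old, ctx)]
    seen_counts[word] = k + 1
```

For example, with toy 2-dimensional vectors for "a", "b", "c", "d", calling `update_unseen("xyz", ["a", "b", "xyz", "c", "d"], 2, embeddings, counts)` assigns "xyz" the mean of those four neighbours.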
Slide 10, 27-Jul-18
Training Pipeline
Slide 11, 27-Jul-18
Vertical vs. Horizontal layers
• Types of convolutions
• Vertical vs. Horizontal
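The slide does not define the two convolution types. Assuming an n-gram is encoded as an n x d matrix (one word vector per row), one plausible reading is that a "vertical" kernel slides along the word axis, spanning several consecutive words across the full embedding, while a "horizontal" kernel slides along the embedding dimensions within a single word. The plain-Python sketch below computes both with a stride-1, unpadded cross-correlation (what CNN layers compute); the kernel shapes and toy values are assumptions, not the configuration used in the talk.

```python
def conv2d_valid(matrix, kernel):
    """2-D 'valid' cross-correlation: no padding, stride 1."""
    rows, cols = len(matrix), len(matrix[0])
    kr, kc = len(kernel), len(kernel[0])
    out = []
    for r in range(rows - kr + 1):
        row = []
        for c in range(cols - kc + 1):
            row.append(sum(kernel[i][j] * matrix[r + i][c + j]
                           for i in range(kr) for j in range(kc)))
        out.append(row)
    return out

# A 3-word n-gram, each word embedded in d = 4 dimensions (toy values).
ngram = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
]

# Assumed "vertical" kernel: 2 consecutive words x the whole embedding.
vertical = [[1.0] * 4, [1.0] * 4]
# Assumed "horizontal" kernel: 2 embedding dimensions within one word.
horizontal = [[1.0, -1.0]]

v_out = conv2d_valid(ngram, vertical)    # shape (n - 1) x 1
h_out = conv2d_valid(ngram, horizontal)  # shape n x (d - 1)
```

Under this reading, a vertical filter mixes information across neighbouring words, while a horizontal filter looks at patterns inside each word vector and keeps the words separate.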
Slide 12, 27-Jul-18
Configurations
Slide 13, 27-Jul-18
Hyperparameterization
• Network configurations
• Parameter settings/limits
Slide 14, 27-Jul-18
Evaluation
• Evaluation Procedure
Slide 15, 27-Jul-18
- The length of the N-gram influences the results
- The percentage of valid concepts differs per class
Dataset distribution
• Dataset characterization
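The per-length share of valid concepts can be computed as below, a minimal sketch assuming each labelled n-gram is a (text, is_concept) pair and that length is the number of whitespace-separated words. The example labels are read off the POS example table (FN entries are real concepts, TN entries are not); they are not the dataset statistics from the talk.

```python
from collections import defaultdict

def valid_share_by_length(labelled_ngrams):
    """Percentage of valid concepts among the n-grams of each length n."""
    totals = defaultdict(int)
    valid = defaultdict(int)
    for ngram, is_concept in labelled_ngrams:
        n = len(ngram.split())
        totals[n] += 1
        valid[n] += int(is_concept)
    return {n: 100.0 * valid[n] / totals[n] for n in sorted(totals)}

data = [
    ("State of North Carolina", True),
    ("in consonance with the", False),
    ("1917 October Revolution", True),
    ("the rims of the", False),
]
shares = valid_share_by_length(data)  # one percentage per n-gram length
```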
Slide 16, 27-Jul-18
Results
Slide 17, 27-Jul-18
K-fold evaluation
• Cross-validation
• 4-fold, 2 runs per configuration, 100-epoch training limit
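The evaluation protocol (4 folds, 2 runs per configuration, at most 100 epochs) can be sketched as follows. `train_and_score` is a hypothetical callback standing in for training and scoring the CNN; the slide does not say whether the two runs reuse the same split, so reshuffling with a different seed per run is an assumption of this sketch.

```python
import random

def k_fold_indices(n_items, k=4, seed=0):
    """Shuffle item indices and split them into k near-equal folds.
    Yields (train_indices, test_indices); every item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

def cross_validate(n_items, train_and_score, k=4, runs=2, max_epochs=100):
    """Average score over k folds x `runs` repetitions.
    train_and_score(train, test, max_epochs) is a hypothetical callback."""
    scores = []
    for run in range(runs):
        for train, test in k_fold_indices(n_items, k, seed=run):
            scores.append(train_and_score(train, test, max_epochs))
    return sum(scores) / len(scores)
```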
F1 = 2 * (Precision * Recall) / (Precision + Recall)
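The F1 formula above can be computed directly from confusion counts, a minimal sketch using the standard definitions Precision = TP / (TP + FP) and Recall = TP / (TP + FN). The counts in the example are toy values, not results from the talk.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = their harmonic mean.
    Empty denominators are treated as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1

# Toy counts: 8 TP, 2 FP, 4 FN gives P = 0.8, R = 2/3, F1 = 8/11.
p, r, f = precision_recall_f1(8, 2, 4)
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), which is why true negatives play no role in this metric.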
Figure: V6H3 precision along epochs (one outlier marked)
Slide 18, 27-Jul-18
Word embedding comprehension
Slide 19, 27-Jul-18
Examples
True False
Positive
American Educational Research Journal
Tianjin Medical University
carry out
Bono and The Edge
Sons of the San Joaquin
Glastonbury Lake Village
Earl of Darnley
Regiment Hussars
University of Theoretical Science
Inland Aircraft Fuel Depot
NHL and
Mexican State Senate
University of
Ireland Station
In process
Negative
to the start of World War II
must complete their
just a small part
a citizen of Afghanistan who
itself include
NFL and the
a Sky
therefore it is
use by
in conversation with
Council of the Isles of Scilly
Xiahou Dun
The Tenant of Wildfell Hall
Slide 20, 27-Jul-18
Cross-checking POS vs. CNN
Concepts
True (CNN) False (CNN)
Positive
True
Rutgers Preparatory School
Watts 103rd Street Rhythm Band
Twinkle Twinkle Little Star
Accademia di Belle Arti di Roma
Capitanes de Arecibo
Fort Belknap Indian Reservation
False
Republican President Richard
Senator Ted
East Stroudsburg Senior High School North
Charles Bender High School
The New York Times Guide
Zombie Movie Encyclopedia
Negative
True
Toronto was the
in which they were written
are a family of passerine birds which
the Art Center College of Design
the NWA World Middleweight Championship
language novel
False
Legislative Council of New South Wales
1917 October Revolution
EAFF East Asian Cup
West Surrey College of Art and Design
Federal University of Rio Grande do Sul
Los Angeles Film Critics Association Awards
United States Citizenship and Immigration Services
State of North Carolina
1917 October Revolution
Slide 21, 27-Jul-18
Averaged Performance
Slide 22, 27-Jul-18
Learning Curves
Slide 23, 27-Jul-18
Performance w.r.t. the N-Gram length
• Dependency on length (n)
AUC = Area Under the Curve
• Global comparison metric
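The slide does not state which curve the AUC is taken under. Assuming the usual ROC curve, the AUC equals the probability that a randomly chosen positive n-gram receives a higher score than a randomly chosen negative one (ties counting 1/2), which matches the trapezoidal area under the ROC curve. A minimal plain-Python sketch of that pairwise formulation, quadratic in the number of examples and therefore only suitable for small inputs:

```python
def roc_auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative),
    with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because it summarizes the ranking over all decision thresholds, a single AUC value can compare models without fixing one precision/recall trade-off.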
Slide 24, 27-Jul-18
- We presented a CNN approach for automatic concept extraction
- We demonstrated that it is competitive with POS, achieving a slightly better F1 score
- As N-gram length increases, recall increases at the cost of precision
- Possible next steps:
- Adopt other word embedding models
- Use different n-gram sources, extracting them from real-world documents
- Use a different architecture (RNN, e.g. LSTM) to try to capture latent and long-range relationships
- Train individual models for different n and then aggregate their results
Conclusions
• Results achieved
• Limits still existing
• Next research possibilities
Research
Dr. Luca Mazzola
Research Associate
T direct +41 41 757 68 90
luca.mazzola@hslu.ch
Rotkreuz
Questions
DATA 2018 27/07/2018

Contenu connexe

Similaire à Concept extraction with convolutional neural networks

Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisOlga Scrivner
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)Piet J.H. Daas
 
Application layer
Application layerApplication layer
Application layerSohag Babu
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleDr. Radhey Shyam
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningAbcdDcba12
 
Using R for Classification of Large Social Network Data
Using R for Classification of Large Social Network DataUsing R for Classification of Large Social Network Data
Using R for Classification of Large Social Network DataIJCSIS Research Publications
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfDr. Radhey Shyam
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfRAKESHG79
 
Seminaire bigdata23102014
Seminaire bigdata23102014Seminaire bigdata23102014
Seminaire bigdata23102014Raja Chiky
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataIJSTA
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problemsinside-BigData.com
 
Berlin 6 Open Access Conference: Tony Hey
Berlin 6 Open Access Conference: Tony HeyBerlin 6 Open Access Conference: Tony Hey
Berlin 6 Open Access Conference: Tony HeyCornelius Puschmann
 
Do Better ImageNet Models Transfer Better... for Image Recommendation?
Do Better ImageNet Models Transfer Better... for Image Recommendation?Do Better ImageNet Models Transfer Better... for Image Recommendation?
Do Better ImageNet Models Transfer Better... for Image Recommendation?Denis Parra Santander
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...Piet J.H. Daas
 

Similaire à Concept extraction with convolutional neural networks (20)

resume_MH
resume_MHresume_MH
resume_MH
 
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisWorkshop nwav 47 - LVS - Tool for Quantitative Data Analysis
Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Application layer
Application layerApplication layer
Application layer
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycle
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Using R for Classification of Large Social Network Data
Using R for Classification of Large Social Network DataUsing R for Classification of Large Social Network Data
Using R for Classification of Large Social Network Data
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdf
 
Seminaire bigdata23102014
Seminaire bigdata23102014Seminaire bigdata23102014
Seminaire bigdata23102014
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problems
 
Cs501 dm intro
Cs501 dm introCs501 dm intro
Cs501 dm intro
 
Big data storage
Big data storageBig data storage
Big data storage
 
Berlin 6 Open Access Conference: Tony Hey
Berlin 6 Open Access Conference: Tony HeyBerlin 6 Open Access Conference: Tony Hey
Berlin 6 Open Access Conference: Tony Hey
 
Do Better ImageNet Models Transfer Better... for Image Recommendation?
Do Better ImageNet Models Transfer Better... for Image Recommendation?Do Better ImageNet Models Transfer Better... for Image Recommendation?
Do Better ImageNet Models Transfer Better... for Image Recommendation?
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 

Plus de Luca Mazzola

Document semantic characterization
Document semantic characterizationDocument semantic characterization
Document semantic characterizationLuca Mazzola
 
DLP: a Web-based Facility for Exploration and Basic Modification of Ontologie...
DLP: a Web-based Facility for Exploration and Basic Modification of Ontologie...DLP: a Web-based Facility for Exploration and Basic Modification of Ontologie...
DLP: a Web-based Facility for Exploration and Basic Modification of Ontologie...Luca Mazzola
 
Pattern-Based Semantic Composition of Optimal Process Service Plans with ODERU
Pattern-Based Semantic Composition of Optimal Process Service Plans with ODERUPattern-Based Semantic Composition of Optimal Process Service Plans with ODERU
Pattern-Based Semantic Composition of Optimal Process Service Plans with ODERULuca Mazzola
 
ODERU: Optimisation of Semantic Service-Based Processes in Manufacturing
ODERU: Optimisation of Semantic Service-Based Processes in ManufacturingODERU: Optimisation of Semantic Service-Based Processes in Manufacturing
ODERU: Optimisation of Semantic Service-Based Processes in ManufacturingLuca Mazzola
 
Phd defence: Learner Models in Online Personalized Educational Experiences: a...
Phd defence: Learner Models in Online Personalized Educational Experiences: a...Phd defence: Learner Models in Online Personalized Educational Experiences: a...
Phd defence: Learner Models in Online Personalized Educational Experiences: a...Luca Mazzola
 
MRC12_120915_MOCLog
MRC12_120915_MOCLogMRC12_120915_MOCLog
MRC12_120915_MOCLogLuca Mazzola
 
Icalt2012 presentation
Icalt2012 presentationIcalt2012 presentation
Icalt2012 presentationLuca Mazzola
 
Presentazione moodle notification_moodlemoot2011_trieste
Presentazione  moodle notification_moodlemoot2011_triestePresentazione  moodle notification_moodlemoot2011_trieste
Presentazione moodle notification_moodlemoot2011_triesteLuca Mazzola
 
Presentazione Gvis MoodleMoot 2010
Presentazione Gvis MoodleMoot 2010Presentazione Gvis MoodleMoot 2010
Presentazione Gvis MoodleMoot 2010Luca Mazzola
 
Presentazione GISMO moodlemoot2010 - Bari
Presentazione GISMO moodlemoot2010 - BariPresentazione GISMO moodlemoot2010 - Bari
Presentazione GISMO moodlemoot2010 - BariLuca Mazzola
 
GVIS: a framework for graphical mashups of heterogeneous sources to support d...
GVIS: a framework for graphical mashups of heterogeneous sources to support d...GVIS: a framework for graphical mashups of heterogeneous sources to support d...
GVIS: a framework for graphical mashups of heterogeneous sources to support d...Luca Mazzola
 
Protezione Dati Ambito Biomedico Intro
Protezione Dati Ambito Biomedico IntroProtezione Dati Ambito Biomedico Intro
Protezione Dati Ambito Biomedico IntroLuca Mazzola
 
Supporting Learners in Adaptive Learning Environments through the enhancement...
Supporting Learners in Adaptive Learning Environments through the enhancement...Supporting Learners in Adaptive Learning Environments through the enhancement...
Supporting Learners in Adaptive Learning Environments through the enhancement...Luca Mazzola
 
Toward adaptive presentations of student models in eLearning environments
Toward adaptive presentations
 of student models
in eLearning environmentsToward adaptive presentations
 of student models
in eLearning environments
Toward adaptive presentations of student models in eLearning environmentsLuca Mazzola
 
Towards Home Healthcare Informatics
Towards Home Healthcare InformaticsTowards Home Healthcare Informatics
Towards Home Healthcare InformaticsLuca Mazzola
 
Moodle e la verifica dell'uso delle risorse
Moodle e la verifica dell'uso delle risorseMoodle e la verifica dell'uso delle risorse
Moodle e la verifica dell'uso delle risorseLuca Mazzola
 
Presentazione per MIC 2008
Presentazione per MIC 2008Presentazione per MIC 2008
Presentazione per MIC 2008Luca Mazzola
 
Verso il ritorno della oralita? Una esperienza di radio online nella scuola ...
Verso il ritorno della oralita? Una esperienza di radio online  nella scuola ...Verso il ritorno della oralita? Una esperienza di radio online  nella scuola ...
Verso il ritorno della oralita? Una esperienza di radio online nella scuola ...Luca Mazzola
 

Plus de Luca Mazzola (19)

Document semantic characterization
Document semantic characterizationDocument semantic characterization
Document semantic characterization
 
DLP: a Web-based Facility for Exploration and Basic Modification of Ontologie...
DLP: a Web-based Facility for Exploration and Basic Modification of Ontologie...DLP: a Web-based Facility for Exploration and Basic Modification of Ontologie...
DLP: a Web-based Facility for Exploration and Basic Modification of Ontologie...
 
Pattern-Based Semantic Composition of Optimal Process Service Plans with ODERU
Pattern-Based Semantic Composition of Optimal Process Service Plans with ODERUPattern-Based Semantic Composition of Optimal Process Service Plans with ODERU
Pattern-Based Semantic Composition of Optimal Process Service Plans with ODERU
 
ODERU: Optimisation of Semantic Service-Based Processes in Manufacturing
ODERU: Optimisation of Semantic Service-Based Processes in ManufacturingODERU: Optimisation of Semantic Service-Based Processes in Manufacturing
ODERU: Optimisation of Semantic Service-Based Processes in Manufacturing
 
Phd defence: Learner Models in Online Personalized Educational Experiences: a...
Phd defence: Learner Models in Online Personalized Educational Experiences: a...Phd defence: Learner Models in Online Personalized Educational Experiences: a...
Phd defence: Learner Models in Online Personalized Educational Experiences: a...
 
MRC12_120915_MOCLog
MRC12_120915_MOCLogMRC12_120915_MOCLog
MRC12_120915_MOCLog
 
Icalt2012 presentation
Icalt2012 presentationIcalt2012 presentation
Icalt2012 presentation
 
Presentazione moodle notification_moodlemoot2011_trieste
Presentazione  moodle notification_moodlemoot2011_triestePresentazione  moodle notification_moodlemoot2011_trieste
Presentazione moodle notification_moodlemoot2011_trieste
 
Ifhro2010
Ifhro2010Ifhro2010
Ifhro2010
 
Presentazione Gvis MoodleMoot 2010
Presentazione Gvis MoodleMoot 2010Presentazione Gvis MoodleMoot 2010
Presentazione Gvis MoodleMoot 2010
 
Presentazione GISMO moodlemoot2010 - Bari
Presentazione GISMO moodlemoot2010 - BariPresentazione GISMO moodlemoot2010 - Bari
Presentazione GISMO moodlemoot2010 - Bari
 
GVIS: a framework for graphical mashups of heterogeneous sources to support d...
GVIS: a framework for graphical mashups of heterogeneous sources to support d...GVIS: a framework for graphical mashups of heterogeneous sources to support d...
GVIS: a framework for graphical mashups of heterogeneous sources to support d...
 
Protezione Dati Ambito Biomedico Intro
Protezione Dati Ambito Biomedico IntroProtezione Dati Ambito Biomedico Intro
Protezione Dati Ambito Biomedico Intro
 
Supporting Learners in Adaptive Learning Environments through the enhancement...
Supporting Learners in Adaptive Learning Environments through the enhancement...Supporting Learners in Adaptive Learning Environments through the enhancement...
Supporting Learners in Adaptive Learning Environments through the enhancement...
 
Toward adaptive presentations of student models in eLearning environments
Toward adaptive presentations
 of student models
in eLearning environmentsToward adaptive presentations
 of student models
in eLearning environments
Toward adaptive presentations of student models in eLearning environments
 
Towards Home Healthcare Informatics
Towards Home Healthcare InformaticsTowards Home Healthcare Informatics
Towards Home Healthcare Informatics
 
Moodle e la verifica dell'uso delle risorse
Moodle e la verifica dell'uso delle risorseMoodle e la verifica dell'uso delle risorse
Moodle e la verifica dell'uso delle risorse
 
Presentazione per MIC 2008
Presentazione per MIC 2008Presentazione per MIC 2008
Presentazione per MIC 2008
 
Verso il ritorno della oralita? Una esperienza di radio online nella scuola ...
Verso il ritorno della oralita? Una esperienza di radio online  nella scuola ...Verso il ritorno della oralita? Una esperienza di radio online  nella scuola ...
Verso il ritorno della oralita? Una esperienza di radio online nella scuola ...
 

Dernier

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 

Dernier (20)

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 

Concept extraction with convolutional neural networks

  • 1. Concept Extraction with Convolutional Neural Networks Andreas Waldis, Luca Mazzola, and Michael Kaufmann HSLU - Lucerne University of Applied Sciences, School of Information Technology, 6343 - Rotkreuz, Switzerland 7th International Conference on Data Science, Technology and Applications DATA 2018 27/07/2018
  • 2. Slide 2, 27-Jul-18 - XMAS: Cross-platform Mediation, Association and Search engine - Knowledge Management Tool - Automatic document tagging - Recognition of Concepts - Represented as N-Grams (sequences of words) - Objective: create an index based model for Keyconcept extraction Context • XMAS • Knowledge Management Tool • Automatic Keywords extraction DATA 2018 27/07/2018
  • 3. Slide 3, 27-Jul-18 X-MAS example • Concepts extracted • Automatic summarization (KW) DATA 2018 27/07/2018
  • 4. POS solution and its limitations. The part-of-speech (POS) approach relies on the syntactic characteristics of the language and on the frequency of typical constructs. It requires exhaustively generating word n-gram combinations (a more-than-linear number of candidates) followed by frequency filtering. Limitations: it is language dependent; designing the acceptable patterns by hand is laborious; and including longer n-grams significantly reduces precision, even though it increases coverage.
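The POS pipeline on slide 4 can be sketched as pattern matching over tag sequences followed by frequency filtering. This assumes the tokens were already tagged by an external tagger using Universal POS tags; the acceptable pattern, the threshold and the function name `pos_candidates` are hypothetical, chosen only to show why such patterns are hand-written and language-specific.

```python
import re
from collections import Counter

# Hypothetical acceptable pattern over Universal POS tags: any run of
# adjectives/nouns/proper nouns that ends in a noun or proper noun.
PATTERN = re.compile(r"(?:(?:ADJ|NOUN|PROPN) )*(?:NOUN|PROPN)")


def pos_candidates(tagged_sentences, max_n, min_freq):
    """Enumerate all n-grams up to max_n words, keep those whose tag
    sequence matches PATTERN, then drop those seen fewer than min_freq times."""
    counts = Counter()
    for sent in tagged_sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                window = sent[i:i + n]
                tags = " ".join(tag for _, tag in window)
                if PATTERN.fullmatch(tags):
                    counts[" ".join(word for word, _ in window)] += 1
    return {ngram: c for ngram, c in counts.items() if c >= min_freq}


# Toy corpus, already tagged (the tagger itself is out of scope here).
tagged = [
    [("the", "DET"), ("Rhythm", "PROPN"), ("Band", "PROPN"), ("played", "VERB")],
    [("a", "DET"), ("Rhythm", "PROPN"), ("Band", "PROPN")],
]
print(pos_candidates(tagged, max_n=3, min_freq=2))
# {'Rhythm': 2, 'Band': 2, 'Rhythm Band': 2}
```

For another language, both the pattern and the tag conventions would need to be redesigned by hand, which is the language dependence listed on the slide.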
  • 5. Examples of POS performance. Pos/Neg is the class assigned by POS and True/False says whether it matches the gold label, so TP + FP = predicted positives and TN + FN = predicted negatives.
      True Positive: Rutgers Preparatory School; Watts 103rd Street Rhythm Band; Twinkle Twinkle Little Star; Accademia di Belle Arti di Roma
      False Positive: Oricon Weekly Albums Chart; Grand Forks-ND-MN Metropolitan Statistical; Apple CEO Steve Jobs
      True Negative: the rims of the; in consonance with the; in which they were written; was interred in Spring Grove Cemetery
      False Negative: Los Angeles Film Critics Association Awards; United States Citizenship and Immigration Services; State of North Carolina; 1917 October Revolution
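The four outcomes on slide 5 can be counted as follows: `predicted` holds the extractor's Pos/Neg decision, `actual` the gold label, and the letter T/F records whether the two agree. The function name and the boolean encoding are illustrative assumptions; the usage example takes one phrase from each cell of slide 5.

```python
from collections import Counter


def confusion_counts(predicted, actual):
    """Count TP, FP, TN and FN.

    predicted[i]: True if the extractor labels n-gram i as a concept (Pos/Neg).
    actual[i]:    True if n-gram i is a concept in the gold standard.
    """
    counts = Counter({"TP": 0, "FP": 0, "TN": 0, "FN": 0})
    for p, a in zip(predicted, actual):
        match = "T" if p == a else "F"  # does the prediction match the gold label?
        cls = "P" if p else "N"         # which class was predicted?
        counts[match + cls] += 1
    return dict(counts)


# (phrase, predicted as concept, really a concept), one per cell of slide 5
examples = [
    ("Rutgers Preparatory School", True, True),   # TP
    ("Apple CEO Steve Jobs", True, False),        # FP
    ("the rims of the", False, False),            # TN
    ("State of North Carolina", False, True),     # FN
]
predicted = [p for _, p, _ in examples]
actual = [a for _, _, a in examples]
print(confusion_counts(predicted, actual))
# {'TP': 1, 'FP': 1, 'TN': 1, 'FN': 1}
```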
  • 7. Neural network motivation: automatic knowledge extraction, multiple hidden layers, compatibility with every available data encoding. A neural network can automatically identify regularities in the data and the meaning of particular constructs. ReLU activation units make it possible to add non-linearity, and a deep model allows an extremely compact network to handle very complex problems. Any data encoding can be used; we relied on Word2Vec-plus by Google.
  • 8. Data preprocessing
  • 9. Word embedding: a vector representation of each word that also holds some context and can represent unseen words. We use Word2Vec-plus, whose word vectors include some contextual information. It can provide a representation for unseen words by (a) computing it from the 4 surrounding words and (b) updating the vector.
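Slide 9 does not specify how Word2Vec-plus derives a vector for an unseen word. The sketch below assumes the simplest reading of steps (a) and (b): average the vectors of up to four known surrounding words (two on each side) and store the result in the vocabulary. The function name `embed_unseen`, the averaging rule and the dictionary-of-lists storage are all assumptions.

```python
def embed_unseen(tokens, index, vectors, window=2):
    """Hypothetical OOV handling for tokens[index].

    (a) average the vectors of the known words within `window` positions on
        each side (up to 4 words for window=2);
    (b) store the estimate in `vectors`, so later lookups reuse it.
    Returns None if no surrounding word is known.
    """
    word = tokens[index]
    if word in vectors:
        return vectors[word]
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    context = [vectors[t] for j, t in enumerate(tokens[lo:hi], start=lo)
               if j != index and t in vectors]
    if not context:
        return None
    dim = len(context[0])
    estimate = [sum(v[d] for v in context) / len(context) for d in range(dim)]
    vectors[word] = estimate  # step (b): vector update
    return estimate


vectors = {"the": [1.0, 0.0], "band": [0.0, 1.0],
           "played": [1.0, 1.0], "loud": [0.0, 0.0]}
print(embed_unseen(["the", "band", "zydeco", "played", "loud"], 2, vectors))
# [0.5, 0.5]
```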
  • 10. Training Pipeline
  • 11. Types of convolutions: vertical vs. horizontal layers
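Slide 11 does not define the two orientations. Assuming an n-gram is encoded as a matrix with one row per word and one column per embedding dimension, a vertical kernel spans several consecutive words within one dimension, while a horizontal kernel spans several dimensions of a single word. The plain-Python sketch below applies both (valid cross-correlation, as most CNN libraries compute it) followed by the ReLU activation mentioned on slide 7; the kernel values and the toy input are made up.

```python
def conv2d_valid(x, kernel):
    """Valid (no padding) 2-D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(x) - kh + 1, len(x[0]) - kw + 1
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            s = sum(kernel[a][b] * x[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))  # ReLU non-linearity
        out.append(row)
    return out


# Toy input: 3 words (rows) x 4 embedding dimensions (columns).
x = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0, 1.0],
     [2.0, 0.0, 1.0, 1.0]]

vertical = [[1.0], [1.0]]    # 2x1: two consecutive words, one dimension
horizontal = [[1.0, -1.0]]   # 1x2: two dimensions of a single word

print(conv2d_valid(x, vertical))
# [[1.0, 3.0, 3.0, 2.0], [2.0, 1.0, 4.0, 2.0]]
print(conv2d_valid(x, horizontal))
# [[0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
```

The output shapes show the difference: the vertical kernel shortens the word axis (3 to 2 rows), the horizontal kernel shortens the embedding axis (4 to 3 columns).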
  • 13. Hyperparametrization: network configurations and parameter settings/limits
  • 14. Evaluation procedure
  • 15. Dataset characterization (data set distribution): the length of the n-gram influences the results, and the percentage of valid concepts differs per class.
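Reading "class" on slide 15 as the n-gram length n (an assumption), the per-class share of valid concepts can be computed as below. The labels in the usage example are the gold labels implied by slide 5 (its true positives and false negatives are real concepts); the function name is hypothetical.

```python
from collections import defaultdict


def positive_share_by_length(samples):
    """samples: iterable of (ngram_text, is_concept) pairs.

    Returns {n: percentage of n-grams with n words that are valid concepts}.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for text, is_concept in samples:
        n = len(text.split())
        totals[n] += 1
        positives[n] += int(is_concept)
    return {n: 100.0 * positives[n] / totals[n] for n in sorted(totals)}


samples = [
    ("Rutgers Preparatory School", True),
    ("1917 October Revolution", True),
    ("Apple CEO Steve Jobs", False),
    ("State of North Carolina", True),
    ("in which they were written", False),
]
print(positive_share_by_length(samples))
# {3: 100.0, 4: 50.0, 5: 0.0}
```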
  • 17. K-fold evaluation (cross-evaluation): 4 folds, 2 runs per configuration, and a limit of 100 training epochs. F1 = 2 * (Recall * Precision) / (Recall + Precision). Figure: V6H3 precision along epochs (outlier marked).
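A sketch of the evaluation loop on slide 17: 4 folds, 2 runs per configuration, and the F1 formula from the slide. Training itself (up to 100 epochs) is abstracted as a hypothetical `train_and_eval` callback; reshuffling the folds with a different seed on each run is an assumption, since the slide does not say whether runs reuse the same split.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def f1_score(tp, fp, fn):
    """F1 = 2 * (Recall * Precision) / (Recall + Precision)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def cross_validate(n_samples, train_and_eval, k=4, runs=2):
    """Average score over `runs` repetitions of k-fold cross-validation.

    train_and_eval(train_idx, test_idx) is a hypothetical callback that
    trains a model (e.g. for at most 100 epochs) and returns its test score.
    """
    scores = []
    for run in range(runs):
        folds = k_fold_indices(n_samples, k, seed=run)
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(train_and_eval(train, test))
    return sum(scores) / len(scores)


print(f1_score(tp=6, fp=2, fn=6))  # precision 0.75, recall 0.5
# 0.6
```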
  • 18. Word embedding comprehension
  • 19. Examples (columns: True, False).
      Positive: American Educational Research Journal Tianjin Medical University carry out Bono and The Edge Sons of the San Joaquin Glastonbury Lake Village Earl of Darnley Regiment Hussars University of Theoretical Science Inland Aircraft Fuel Depot NHL and Mexican State Senate University of Ireland Station In process
      Negative: to the start of World War II must complete their just a small part a citizen of Afghanistan who itself include NFL and the a Sky therefore it is use by in conversation with Council of the Isles of Scilly Xiahou Dun The Tenant of Wildfell Hall
  • 20. Cross-checking POS vs. CNN (rows: POS outcome; columns: True (CNN), False (CNN)).
      POS Positive, True: Rutgers Preparatory School Watts 103rd Street Rhythm Band Twinkle Twinkle Little Star Accademia di Belle Arti di Roma Capitanes de Arecibo Fort Belknap Indian Reservation
      POS Positive, False: Republican President Richard Senator Ted East Stroudsburg Senior High School North Charles Bender High School The New York Times Guide Zombie Movie Encyclopedia
      POS Negative, True: Toronto was the in which they were written are a family of passerine birds which the Art Center College of Design the NWA World Middleweight Championship language novel
      POS Negative, False: Legislative Council of New South Wales 1917 October Revolution EAFF East Asian Cup West Surrey College of Art and Design Federal University of Rio Grande do Sul Los Angeles Film Critics Association Awards United States Citizenship and Immigration Services State of North Carolina 1917 October Revolution
  • 21. Averaged Performances
  • 22. Learning Curves
  • 23. Performance with respect to the n-gram length: dependency on the length n. AUC (area under the curve) serves as the global comparison metric.
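Slide 23 does not state how the AUC was computed. Assuming a ROC curve, the area equals the probability that a randomly chosen positive n-gram receives a higher score than a randomly chosen negative one (ties count one half), which gives the short pairwise sketch below. `roc_auc` is a hypothetical helper; on real data a library routine would be preferable.

```python
def roc_auc(scores, labels):
    """ROC AUC via its rank interpretation: the fraction of (positive, negative)
    pairs in which the positive scores higher, ties counting one half.
    Runs in O(P * N), which is fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


# Classifier scores for four n-grams and their gold labels.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [True, False, True, False]))
# 0.75
```

Because AUC summarizes ranking quality over all thresholds, one such value per n-gram length gives a single comparable number across lengths.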
  • 24. Conclusions (results achieved, remaining limits, next research possibilities). We presented a CNN approach for automatic concept extraction and demonstrated that it is competitive with POS, achieving a slightly better F1 measure. As the n-gram length increases, recall increases at the cost of precision. Possible next steps: adopt other word embedding models; use different n-gram sources, extracting them from real-world documents; use a different architecture (an RNN such as LSTM) to try to capture latent and long-running relationships; train individual instances for different n and then aggregate their results.
  • 25. Questions? Dr. Luca Mazzola, Research Associate, Rotkreuz; T direct +41 41 757 68 90; luca.mazzola@hslu.ch