Data Mining Specialization Capstone Project - Task 2



University of Illinois at Urbana-Champaign
Data Mining Specialization: Capstone Project
Marco Antonio Gonzalez Junior
September, 2015
Task 2 Report
Cuisine Clustering and Map Construction
1. Visualization of the Cuisine Map
The purpose of this section is to compute and visualise the similarity between cuisines,
based on their review texts. The output is a similarity matrix in which each cell holds the
similarity between a pair of cuisines. The opacity of each cell indicates the level of
similarity: the higher the opacity, the higher the similarity.
A subset of the data was used: only the reviews of country-specific cuisines were
processed. The full dataset contains over one hundred subjects, and it is not feasible to
compare all of them in a single matrix, so only the files named after a country-specific
cuisine were selected. Examples include American, Argentine, Brazilian, Greek, Chinese,
French, German, Italian, Mexican and Japanese, among others.
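The selection step can be sketched in Python as a filter on file names. This is a minimal sketch, assuming one review file per category named after that category (for example `Italian.txt`); the `COUNTRY_CUISINES` set is hypothetical and holds only the examples listed above, not the full list used in this report.

```python
from pathlib import Path

# Hypothetical whitelist: only the examples named in the report, not the full set used.
COUNTRY_CUISINES = {
    "American", "Argentine", "Brazilian", "Greek", "Chinese",
    "French", "German", "Italian", "Mexican", "Japanese",
}

def select_country_files(filenames):
    """Keep only files whose name (without extension) is a country-specific cuisine."""
    return sorted(f for f in filenames if Path(f).stem in COUNTRY_CUISINES)

files = ["Italian.txt", "Pizza.txt", "Mexican.txt", "Sushi Bars.txt", "Greek.txt"]
print(select_country_files(files))  # ['Greek.txt', 'Italian.txt', 'Mexican.txt']
```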
The similarity was obtained by using Python to perform topic modelling with LDA (Latent
Dirichlet Allocation), extracting the 10 most important topics of each cuisine. Each
cuisine's file was processed into a new file with the same name (the name of the cuisine)
in a separate folder; this new file contains the topic model for that cuisine. These files
were then compared against each other, pair by pair, to compute the similarity between
cuisines using cosine similarity.
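The comparison step can be illustrated with a small pure-Python sketch. The report does not describe the format of the topic files, so this assumes each cuisine's LDA output is available as a list of topics, each a list of `(word, weight)` pairs, and that a cuisine is represented by summing word weights across its topics (`topic_vector` is a hypothetical helper). The LDA step itself, which could be done with a library such as gensim or scikit-learn, is not shown, and the input is toy data rather than results from this project.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norms = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norms if norms else 0.0

def topic_vector(topics):
    """Collapse a cuisine's LDA topics (lists of (word, weight) pairs) into one
    {word: summed weight} vector; this representation is an assumption."""
    vec = {}
    for topic in topics:
        for word, weight in topic:
            vec[word] = vec.get(word, 0.0) + weight
    return vec

def similarity_matrix(cuisine_topics):
    """Return (names, matrix) where matrix[i][j] = cosine(cuisine i, cuisine j)."""
    names = sorted(cuisine_topics)
    vecs = [topic_vector(cuisine_topics[name]) for name in names]
    matrix = [[cosine(a, b) for b in vecs] for a in vecs]
    return names, matrix

# Toy input for illustration only (not real LDA output).
toy = {
    "Italian": [[("pasta", 0.4), ("pizza", 0.3)], [("wine", 0.2)]],
    "Greek": [[("gyro", 0.5), ("feta", 0.2)], [("wine", 0.2)]],
    "Mexican": [[("taco", 0.6), ("salsa", 0.3)]],
}
names, matrix = similarity_matrix(toy)
print(names)
```

Because the topic weights are non-negative, every entry lies between 0 and 1, with (approximately) 1 on the diagonal; cuisines that share no topic words score 0.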
The results are shown in Figure 1.

Figure 1: Visualisation of sample cuisines
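One simple way to turn a similarity matrix into cell opacities, assuming a plotting tool that takes an alpha value between 0 and 1 per cell, is min-max scaling. The report does not say how the opacity was derived, so `to_opacity` is a hypothetical sketch, not the method behind Figure 1.

```python
def to_opacity(matrix):
    """Min-max scale similarity values to [0, 1] for use as cell opacity (alpha):
    the least similar pair becomes fully transparent, the most similar fully opaque."""
    values = [v for row in matrix for v in row]
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # All cells equal: draw every cell fully opaque.
        return [[1.0 for _ in row] for row in matrix]
    return [[(v - low) / span for v in row] for row in matrix]

print(to_opacity([[1.0, 0.2], [0.2, 1.0]]))  # [[1.0, 0.0], [0.0, 1.0]]
```

The diagonal (each cuisine compared with itself) is usually the maximum and therefore always fully opaque; excluding it before scaling would stretch the contrast among the off-diagonal cells.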

2. Improving the Cuisine Map
Changing the similarity function, so that it first computes the similarity between
individual reviews and then aggregates those values, improved the accuracy of the
similarity map, as shown in Figure 2.
Figure 2: Improved visualisation of sample cuisines
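The per-review variant can be sketched as follows. The report does not name the review representation or the aggregation function, so this assumes bag-of-words term-frequency vectors and takes the mean over all cross-cuisine review pairs; `cuisine_similarity` is a hypothetical name, and the reviews are toy examples.

```python
import math
from collections import Counter
from statistics import mean

def tf_vector(review):
    """Term-frequency vector of one review (lower-cased, whitespace-separated tokens)."""
    return Counter(review.lower().split())

def cosine(u, v):
    """Cosine similarity between two Counter vectors (missing terms count as 0)."""
    dot = sum(count * v[term] for term, count in u.items())
    norms = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norms if norms else 0.0

def cuisine_similarity(reviews_a, reviews_b):
    """Mean cosine similarity over every (review from A, review from B) pair;
    the mean is an assumed aggregate."""
    vecs_a = [tf_vector(r) for r in reviews_a]
    vecs_b = [tf_vector(r) for r in reviews_b]
    return mean(cosine(a, b) for a in vecs_a for b in vecs_b)

# Toy reviews for illustration only.
italian = ["great pasta and wine", "the pizza was great"]
greek = ["great gyro and wine", "fresh feta salad"]
print(cuisine_similarity(italian, greek))  # 0.25
```

Comparing every pair costs |A| x |B| cosine computations, so for cuisines with many reviews a random sample of reviews may be needed in practice.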

3. Incorporating Clustering in Cuisine Map
Figure 3: Clustering cuisines

Figure 4: Improved clustering
