SlideShare une entreprise Scribd logo
1  sur  22
WEB ANALYTICS -
ONLINE NEWS POPULARITY
TEAM – 11
KRUTIKA DEDHIA
KINJAL GADA
ANKUR VORA
ADVANCES IN DATA SCIENCES AND ARCHITECTURE
- PROF. SRIKANTH KRISHNAMURTHY
INTRODUCTION
• The dataset summarizes a set of features about articles published by Mashable,
a well-known news website over a period of two years.
• The objective is to predict the number of shares depending on the features if the
article to be published would be popular on the internet or no.
GOALS
• Create and evaluate regression, classification and clustering models in Microsoft
Azure Machine Learning Studio.
• Deploy the models as a web service to generate a REST API.
• Build the interactive web interface to predict the results.
DATASET
• Data Source : UCI ML Repository
https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity
• Number of attributes: 61
• Number of records: 39,645
• Dependent variable: Number of shares
DATA MODIFICATION
• Type of Data : 1 – business, 2 – lifestyle, 3 – entertainment, 4 - social media, 5 –
technology, 6 – world
• Extracted the date from the URL column.
• Day of week : 0 – Sunday, 1 – Monday, 2 – Tuesday, 3 – Wednesday, 4 –
Thursday, 5 – Friday, 6 – Saturday
• Web Scraping : Topics, Channel, Author
PROCESS
• Created training models for regression, classification and clustering in Azure ML.
• Created predictive experiment for the above trained models.
• Deployed the models as a web service and generated a REST API.
• Designed UI using Java Spring MVC, HTML, Bootstrap, Ajax along with user
validations.
MACHINE LEARNING
ALGORITMHS
REGRESSION MODELS
• Used Azure ML regression modules
• Decision Forest, Neural Network, Poisson Regression and Boosted Decision Tree
• Best Model: Random Forest based on lowest RMSE value
RANDOM FOREST
CLASSIFICATION MODELS
• Used Azure ML classification components Two Class Decision Forest, Two Class
Neural Network and Two Class Boosted Decision Tree
• Added attribute isPopular :
• Shares <= 1400 : high popular
• Shares > 1400 : less popular
• Best Model : Two Class Boosted Decision Tree Based on the high Accuracy and
AUC value
TWO CLASS BOOSTED DECISION CLASSIFICATION
CLUSTERING MODELS
• Used K-means Clustering
• No of clusters used is 3 (k = 3).
• Determines the distance of articles based on a few parameters from the centroid
of clusters.
DEMO
• Web User Interface
ANALYSIS
TABLEAU ANALYSIS
TABLEAU ANALYSIS
TABLEAU ANALYSIS
CHALLENGES
• Formatting data after Web Scraping.
• Understanding the variables like keywords, subjectivity.
• Finding relation between variables and feature selection for modelling.
LINKS
• URL – http://sample-env-1.xhmp4ynr7g.us-east-1.elasticbeanstalk.com/
• Github – https://github.com/voraankur/ADS/tree/master/Final%20Project
REFERENCES
• https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity
• https://repositorium.sdum.uminho.pt/bitstream/1822/39169/1/main.pdf
CONTRIBUTION
• Ankur – Regression Models, Web Interface
• Kinjal – Data cleaning, Web Scraping, Clustering, Report
• Krutika – Classification Models, Presentation, Tableau Analysis
THANK YOU

Contenu connexe

Similaire à Online news popularity analysis

Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure applicationCodecamp Romania
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in AzureValdas Maksimavičius
 
Soumya kambhapati resume_mar2016
Soumya kambhapati resume_mar2016Soumya kambhapati resume_mar2016
Soumya kambhapati resume_mar2016Soumya Kambhampati
 
Community Resource Portal for the Healthcare Sector
Community Resource Portal for the Healthcare SectorCommunity Resource Portal for the Healthcare Sector
Community Resource Portal for the Healthcare SectorMike Taylor
 
Introduction to Azure monitor
Introduction to Azure monitorIntroduction to Azure monitor
Introduction to Azure monitorPraveen Nair
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated MLMark Tabladillo
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning Jesus Rodriguez
 
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15MLconf
 
Atlanta MLConf
Atlanta MLConfAtlanta MLConf
Atlanta MLConfQubole
 
Developing scalable enterprise serverless applications on azure with .net
Developing scalable enterprise serverless applications on azure with .netDeveloping scalable enterprise serverless applications on azure with .net
Developing scalable enterprise serverless applications on azure with .netCallon Campbell
 
Phishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdfPhishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdfVaralakshmiKC
 

Similaire à Online news popularity analysis (20)

Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure application
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
moharnab-ft
moharnab-ftmoharnab-ft
moharnab-ft
 
Trinada pabolu profile
Trinada pabolu profileTrinada pabolu profile
Trinada pabolu profile
 
Trinada pabolu profile
Trinada pabolu profileTrinada pabolu profile
Trinada pabolu profile
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in Azure
 
Soumya kambhapati resume_mar2016
Soumya kambhapati resume_mar2016Soumya kambhapati resume_mar2016
Soumya kambhapati resume_mar2016
 
Community Resource Portal for the Healthcare Sector
Community Resource Portal for the Healthcare SectorCommunity Resource Portal for the Healthcare Sector
Community Resource Portal for the Healthcare Sector
 
Aakanksha_Agnani_j2016
Aakanksha_Agnani_j2016Aakanksha_Agnani_j2016
Aakanksha_Agnani_j2016
 
Introduction to Azure monitor
Introduction to Azure monitorIntroduction to Azure monitor
Introduction to Azure monitor
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning A practical guidance of the enterprise machine learning
A practical guidance of the enterprise machine learning
 
LinkedinResume
LinkedinResumeLinkedinResume
LinkedinResume
 
NikulChauhan-Resume
NikulChauhan-ResumeNikulChauhan-Resume
NikulChauhan-Resume
 
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
 
Atlanta MLConf
Atlanta MLConfAtlanta MLConf
Atlanta MLConf
 
Developing scalable enterprise serverless applications on azure with .net
Developing scalable enterprise serverless applications on azure with .netDeveloping scalable enterprise serverless applications on azure with .net
Developing scalable enterprise serverless applications on azure with .net
 
Monitor Cloud Resources using Alerts & Insights
Monitor Cloud Resources using Alerts & InsightsMonitor Cloud Resources using Alerts & Insights
Monitor Cloud Resources using Alerts & Insights
 
Phishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdfPhishing Website Detection by Machine Learning Techniques Presentation.pdf
Phishing Website Detection by Machine Learning Techniques Presentation.pdf
 

Dernier

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...gragchanchal546
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themeitharjee
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 

Dernier (20)

Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 

Online news popularity analysis

  • 1. WEB ANALYTICS - ONLINE NEWS POPULARITY TEAM – 11 KRUTIKA DEDHIA KINJAL GADA ANKUR VORA ADVANCES IN DATA SCIENCES AND ARCHITECTURE - PROF. SRIKANTH KRISHNAMURTHY
  • 2. INTRODUCTION • The dataset summarizes a set of features about articles published by Mashable, a well-known news website over a period of two years. • The objective is to predict the number of shares depending on the features if the article to be published would be popular on the internet or no.
  • 3. GOALS • Create and evaluate regression, classification and clustering models in Microsoft Azure Machine Learning Studio. • Deploy the models as a web service to generate a REST API. • Build the interactive web interface to predict the results.
  • 4. DATASET • Data Source : UCI ML Repository https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity • Number of attributes: 61 • Number of records: 39,645 • Dependent variable: Number of shares
  • 5. DATA MODIFICATION • Type of Data : 1 – business, 2 – lifestyle, 3 – entertainment, 4 - social media, 5 – technology, 6 – world • Extracted the date from the URL column. • Day of week : 0 – Sunday, 1 – Monday, 2 – Tuesday, 3 – Wednesday, 4 – Thursday, 5 – Friday, 6 – Saturday • Web Scraping : Topics, Channel, Author
  • 6. PROCESS • Created training models for regression, classification and clustering in Azure ML. • Created predictive experiment for the above trained models. • Deployed the models as a web service and generated a REST API. • Designed UI using Java Spring MVC, HTML, Bootstrap, Ajax along with user validations.
  • 8. REGRESSION MODELS • Used Azure ML regression modules • Decision Forest, Neural Network, Poisson Regression and Boosted Decision Tree • Best Model: Random Forest based on lowest RMSE value
  • 10. CLASSIFICATION MODELS • Used Azure ML classification components Two Class Decision Forest, Two Class Neural Network and Two Class Boosted Decision Tree • Added attribute isPopular : • Shares <= 1400 : high popular • Shares > 1400 : less popular • Best Model : Two Class Boosted Decision Tree Based on the high Accuracy and AUC value
  • 11. TWO CLASS BOOSTED DECISION CLASSIFICATION
  • 12. CLUSTERING MODELS • Used K-means Clustering • No of clusters used is 3 (k = 3). • Determines the distance of articles based on a few parameters from the centroid of clusters.
  • 13. DEMO • Web User Interface
  • 18. CHALLENGES • Formatting data after Web Scraping. • Understanding the variables like keywords, subjectivity. • Finding relation between variables and feature selection for modelling.
  • 19. LINKS • URL – http://sample-env-1.xhmp4ynr7g.us-east-1.elasticbeanstalk.com/ • Github – https://github.com/voraankur/ADS/tree/master/Final%20Project
  • 21. CONTRIBUTION • Ankur – Regression Models, Web Interface • Kinjal – Data cleaning, Web Scraping, Clustering, Report • Krutika – Classification Models, Presentation, Tableau Analysis