Online news popularity analysis


As part of the course 'Advances in Data Sciences and Architecture', we analyzed the Online News Popularity dataset, which covers articles published on mashable.com.


WEB ANALYTICS -
ONLINE NEWS POPULARITY
TEAM – 11
KRUTIKA DEDHIA
KINJAL GADA
ANKUR VORA
ADVANCES IN DATA SCIENCES AND ARCHITECTURE
- PROF. SRIKANTH KRISHNAMURTHY

INTRODUCTION
• The dataset summarizes a set of features about articles published by Mashable,
a well-known news website, over a period of two years.
• The objective is to predict an article's number of shares from these features, and
from that whether an article about to be published will be popular on the internet.

GOALS
• Create and evaluate regression, classification and clustering models in Microsoft
Azure Machine Learning Studio.
• Deploy the models as a web service to generate a REST API.
• Build the interactive web interface to predict the results.

DATASET
• Data Source : UCI ML Repository
https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity
• Number of attributes: 61
• Number of records: 39,645
• Dependent variable: Number of shares

DATA MODIFICATION
• Type of Data : 1 – business, 2 – lifestyle, 3 – entertainment, 4 - social media, 5 –
technology, 6 – world
• Extracted the date from the URL column.
• Day of week : 0 – Sunday, 1 – Monday, 2 – Tuesday, 3 – Wednesday, 4 –
Thursday, 5 – Friday, 6 – Saturday
• Web Scraping : Topics, Channel, Author
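
The date extraction and day-of-week encoding above can be sketched in Python as follows. This is a minimal sketch, not the team's actual code: it assumes the URL carries the publication date as /YYYY/MM/DD/, and the example URL and the helper names date_from_url and day_of_week_code are hypothetical.

```python
import re
from datetime import date

# Encoding from the slide: 0 = Sunday ... 6 = Saturday.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

# Assumption: the publication date appears in the URL path as /YYYY/MM/DD/.
_DATE_IN_URL = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_from_url(url):
    """Return the date embedded in the URL, or None if no date is found."""
    match = _DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def day_of_week_code(d):
    """Map a date to the slide's encoding (0 = Sunday ... 6 = Saturday)."""
    # date.weekday() uses 0 = Monday ... 6 = Sunday, so shift by one.
    return (d.weekday() + 1) % 7


example_url = "http://mashable.com/2013/01/07/example-article/"  # hypothetical URL
published = date_from_url(example_url)
code = day_of_week_code(published)
print(published.isoformat(), code, DAY_NAMES[code])
```

Returning None for a URL without a date lets the caller decide whether to drop or flag such rows.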

PROCESS
• Created training experiments for regression, classification and clustering in Azure ML.
• Created predictive experiments for the trained models.
• Deployed the models as a web service and generated a REST API.
• Designed the UI using Java Spring MVC, HTML, Bootstrap and Ajax, with user input
validation.
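
A client call to the deployed REST API might be assembled as in the sketch below. It assumes the JSON layout commonly shown for Azure ML Studio (classic) request-response services ("Inputs", "ColumnNames", "Values", "GlobalParameters") with a bearer API key; the endpoint URL, the key, the input name input1 and the chosen columns are placeholders, not values from the project. The request is only built here, not sent.

```python
import json
import urllib.request

# Placeholders (assumptions): copy the real endpoint and key from the
# web service dashboard before sending anything.
SCORING_URL = "https://example.invalid/workspaces/WORKSPACE_ID/services/SERVICE_ID/execute"
API_KEY = "YOUR_API_KEY"


def build_scoring_request(column_names, rows, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a POST request carrying one batch of feature rows."""
    payload = {
        "Inputs": {
            "input1": {  # input name is an assumption
                "ColumnNames": list(column_names),
                "Values": [[str(value) for value in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Illustrative columns only; the deployed service defines the real schema.
request = build_scoring_request(["n_tokens_title", "num_imgs", "day_of_week"], [[12, 3, 1]])
print(request.get_method(), json.loads(request.data)["Inputs"]["input1"]["Values"])
# With real values, urllib.request.urlopen(request) would submit the call.
```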

REGRESSION MODELS
• Used Azure ML regression modules: Decision Forest, Neural Network, Poisson
Regression and Boosted Decision Tree
• Best model: Decision Forest (a random forest), based on the lowest RMSE value
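
Selecting the model with the lowest RMSE can be illustrated with a short Python sketch. The share counts, predictions and model names below are made up for illustration and are not results from this project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up numbers for illustration only.
actual_shares = [1200, 800, 3000, 1500]
predictions = {
    "model_a": [1000, 900, 2500, 1600],
    "model_b": [1500, 500, 4000, 1000],
}

scores = {name: rmse(actual_shares, preds) for name, preds in predictions.items()}
best = min(scores, key=scores.get)  # lowest RMSE wins
print(best, round(scores[best], 2))
```

RMSE is expressed in the same unit as the target (shares), so a few articles with very high share counts can dominate the score.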

CLASSIFICATION MODELS
• Used Azure ML classification modules: Two Class Decision Forest, Two Class
Neural Network and Two Class Boosted Decision Tree
• Added attribute isPopular :
• Shares > 1400 : highly popular
• Shares <= 1400 : less popular
• Best Model : Two Class Boosted Decision Tree, based on the highest accuracy and
AUC values
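
The isPopular label and the two evaluation metrics can be sketched in Python as below. The sketch assumes that articles with more than 1400 shares form the popular class; the share counts and model scores are made up, and the AUC here is a simple pairwise computation rather than Azure ML's own implementation.

```python
POPULARITY_THRESHOLD = 1400  # share count used on the slide


def is_popular(shares, threshold=POPULARITY_THRESHOLD):
    """Return 1 when shares exceed the threshold, else 0 (assumed direction)."""
    return 1 if shares > threshold else 0


def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up data for illustration only.
shares = [500, 1400, 1401, 9000]
labels = [is_popular(s) for s in shares]
model_scores = [0.2, 0.6, 0.4, 0.9]  # hypothetical predicted probabilities
predicted = [1 if s >= 0.5 else 0 for s in model_scores]
print(labels, accuracy(labels, predicted), auc(labels, model_scores))
```

Accuracy depends on the 0.5 cut-off applied to the scores, while AUC measures ranking quality across all cut-offs, which is one reason to report both.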

TWO CLASS BOOSTED DECISION TREE CLASSIFICATION

CLUSTERING MODELS
• Used K-means clustering
• Number of clusters: 3 (k = 3)
• Assigns each article to a cluster based on its distance, computed over a few
selected features, from the cluster centroids.
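
The centroid-distance idea behind K-means with k = 3 can be sketched in plain Python. This is an illustrative implementation on made-up two-feature points with hand-picked starting centroids, not the Azure ML module used in the project.

```python
import math


def assign(points, centroids):
    """For each point, return the index of the nearest centroid (Euclidean)."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def update(points, assignment, centroids):
    """Move each centroid to the mean of its members; keep it if it has none."""
    updated = []
    for index, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignment) if a == index]
        if members:
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            updated.append(old)
    return updated


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        updated = update(points, assign(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Made-up points with two features each; k = 3 as on the slide.
points = [(1, 1), (1, 2), (8, 8), (9, 8), (1, 9), (2, 9)]
centroids, cluster_ids = k_means(points, [(1, 1), (8, 8), (1, 9)])
print(centroids, cluster_ids)
```

In practice the features would usually be scaled first, since Euclidean distance otherwise favours features with large numeric ranges.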

CHALLENGES
• Formatting the data after web scraping.
• Understanding variables such as keywords and subjectivity.
• Finding relationships between variables and selecting features for modelling.

LINKS
• URL – http://sample-env-1.xhmp4ynr7g.us-east-1.elasticbeanstalk.com/
• Github – https://github.com/voraankur/ADS/tree/master/Final%20Project

REFERENCES
• https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity
• https://repositorium.sdum.uminho.pt/bitstream/1822/39169/1/main.pdf

CONTRIBUTION
• Ankur – Regression Models, Web Interface
• Kinjal – Data cleaning, Web Scraping, Clustering, Report
• Krutika – Classification Models, Presentation, Tableau Analysis
