Online news popularity analysis

  1. WEB ANALYTICS - ONLINE NEWS POPULARITY
     Team 11: Krutika Dedhia, Kinjal Gada, Ankur Vora
     Advances in Data Sciences and Architecture, Prof. Srikanth Krishnamurthy
  2. INTRODUCTION
     • The dataset summarizes a set of features about articles published over a period of two years by Mashable, a well-known news website.
     • The objective is to use these features to predict the number of shares an article will receive, and hence whether an article about to be published will be popular on the internet.
  3. GOALS
     • Create and evaluate regression, classification and clustering models in Microsoft Azure Machine Learning Studio.
     • Deploy the models as web services that expose a REST API.
     • Build an interactive web interface that displays the predictions.
  4. DATASET
     • Data source: UCI ML Repository, https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity
     • Number of attributes: 61
     • Number of records: 39,645
     • Dependent variable: number of shares
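A minimal sketch of reading the dataset with Python's standard library, assuming the CSV file distributed on the UCI page (`OnlineNewsPopularity.csv`), whose header and rows separate values with a comma followed by a space. The function names `load_news_csv` and `share_counts` are hypothetical; the 61-column check only mirrors the attribute count stated on this slide.

```python
import csv
from typing import Dict, List, TextIO

EXPECTED_COLUMNS = 61  # attribute count stated on the DATASET slide


def load_news_csv(handle: TextIO) -> List[Dict[str, str]]:
    """Read the CSV into a list of dicts keyed by whitespace-stripped column names."""
    # skipinitialspace drops the blank that follows each comma (assumed file layout)
    reader = csv.reader(handle, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    if len(header) != EXPECTED_COLUMNS:
        raise ValueError(f"expected {EXPECTED_COLUMNS} columns, got {len(header)}")
    # Skip blank trailing lines, which csv.reader yields as empty lists
    return [dict(zip(header, row)) for row in reader if row]


def share_counts(rows: List[Dict[str, str]]) -> List[int]:
    """Extract the dependent variable (number of shares) as integers."""
    return [int(float(row["shares"])) for row in rows]
```

For example, `with open("OnlineNewsPopularity.csv", newline="") as f: rows = load_news_csv(f)`. Feature values stay as strings until a later step converts the columns chosen for modelling.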
  5. DATA MODIFICATION
     • Data channel (type of article): 1 – business, 2 – lifestyle, 3 – entertainment, 4 – social media, 5 – technology, 6 – world
     • Extracted the publication date from the URL column.
     • Day of week: 0 – Sunday, 1 – Monday, 2 – Tuesday, 3 – Wednesday, 4 – Thursday, 5 – Friday, 6 – Saturday
     • Web scraping: topics, channel, author
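The date and day-of-week steps on this slide can be sketched as follows, assuming Mashable URLs embed the publication date as `/YYYY/MM/DD/` (for instance a hypothetical `http://mashable.com/2013/01/07/example-article/`). The helper names are illustrative; only the codes themselves come from the slide: channels 1 to 6 and days 0 (Sunday) to 6 (Saturday). Because Python's `date.weekday()` counts Monday as 0, the sketch shifts it by one.

```python
import re
from datetime import date
from typing import Optional

# Channel codes as listed on the DATA MODIFICATION slide
CHANNEL_CODES = {
    "business": 1, "lifestyle": 2, "entertainment": 3,
    "social media": 4, "technology": 5, "world": 6,
}

# Assumption: the publication date appears in the URL path as /YYYY/MM/DD/
_DATE_IN_URL = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_from_url(url: str) -> Optional[date]:
    """Return the date embedded in the URL, or None if no date segment is found."""
    match = _DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def day_of_week_code(d: date) -> int:
    """Map a date to 0 = Sunday ... 6 = Saturday (weekday() is Monday = 0)."""
    return (d.weekday() + 1) % 7


def channel_code(name: str) -> int:
    """Look up the numeric code for a scraped channel name (case-insensitive)."""
    return CHANNEL_CODES[name.strip().lower()]
```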
  6. PROCESS
     • Created training experiments for the regression, classification and clustering models in Azure ML.
     • Created predictive experiments from the trained models.
     • Deployed the models as web services and generated a REST API.
     • Designed the UI with Java Spring MVC, HTML, Bootstrap and Ajax, including validation of user input.
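The UI called the deployed models through the generated REST API. The sketch below builds such a call in Python, assuming the request-response layout used by the sample code that Azure ML Studio (classic) generated for its web services: a JSON body with `Inputs` → `input1` → `ColumnNames`/`Values`, an empty `GlobalParameters`, and the API key sent as a bearer token. The endpoint URL, API key, input name `input1` and the column names in the test are placeholders, not the project's values; check them against the service's own API help page.

```python
import json
import urllib.request
from typing import List, Sequence

# Placeholders: substitute the values shown on the deployed service's API page
ENDPOINT_URL = ("https://example.invalid/workspaces/WORKSPACE_ID/services/SERVICE_ID"
                "/execute?api-version=2.0&details=true")
API_KEY = "YOUR_API_KEY"


def build_scoring_request(column_names: Sequence[str],
                          rows: List[Sequence[object]],
                          url: str = ENDPOINT_URL,
                          api_key: str = API_KEY) -> urllib.request.Request:
    """Build (but do not send) a POST request in the assumed request-response layout."""
    payload = {
        "Inputs": {
            "input1": {  # name of the web service input port (assumed)
                "ColumnNames": list(column_names),
                # Values are serialized as strings, as in the generated sample code
                "Values": [[str(value) for value in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending it with `urllib.request.urlopen(request)` would return the scored results as JSON; a Java Spring MVC controller would issue the same POST with its own HTTP client.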
  7. MACHINE LEARNING ALGORITHMS
  8. REGRESSION MODELS
     • Used the Azure ML regression modules Decision Forest, Neural Network, Poisson Regression and Boosted Decision Tree.
     • Best model: Random Forest (the Decision Forest module), selected for its lowest RMSE.
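Model selection here rests on the root mean squared error, RMSE = sqrt((1/n) Σ (y_i - ŷ_i)²), where y_i is the actual share count and ŷ_i the prediction. A minimal sketch of computing it for several models and keeping the lowest; the model names and numbers in the test are toy values, not the project's results.

```python
import math
from typing import Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = ((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sum(squared_errors) / len(actual))


def best_by_rmse(actual: Sequence[float],
                 predictions: Dict[str, Sequence[float]]) -> str:
    """Return the name of the model whose predictions have the lowest RMSE."""
    return min(predictions, key=lambda name: rmse(actual, predictions[name]))
```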
  9. RANDOM FOREST
  10. CLASSIFICATION MODELS
     • Used the Azure ML classification components Two Class Decision Forest, Two Class Neural Network and Two Class Boosted Decision Tree.
     • Added the attribute isPopular:
        • Shares > 1400: popular
        • Shares <= 1400: not popular
     • Best model: Two Class Boosted Decision Tree, based on the highest accuracy and AUC.
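A minimal sketch of the labeling rule and the two selection metrics. AUC is computed through its rank interpretation: the probability that a randomly chosen popular article receives a higher score than a randomly chosen non-popular one, with ties counted as one half. The quadratic pairwise loop is only meant for small illustrations; in the project, Azure ML's Evaluate Model module reports these metrics. Function names and test values are hypothetical.

```python
from typing import Sequence

THRESHOLD = 1400  # share count that splits popular from not popular


def is_popular(shares: int, threshold: int = THRESHOLD) -> int:
    """Return 1 if the article has more shares than the threshold, else 0."""
    return 1 if shares > threshold else 0


def accuracy(labels: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(1 for y, p in zip(labels, predicted) if y == p) / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```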
  11. TWO CLASS BOOSTED DECISION TREE CLASSIFICATION
  12. CLUSTERING MODELS
     • Used K-means clustering with 3 clusters (k = 3).
     • Each article is assigned to the cluster whose centroid is nearest, with distance measured over a few selected features.
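A minimal sketch of K-means (Lloyd's algorithm) with k = 3: alternately assign each article to its nearest centroid by Euclidean distance, then move each centroid to the mean of its members, until assignments stop changing. For determinism it seeds the centroids with a simple farthest-point rule, which is an assumption of this sketch; Azure ML's K-Means Clustering module offers its own initialization options. Features should be scaled to comparable ranges before clustering, otherwise large-valued columns dominate the distance.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _farthest_point_init(points: Sequence[Sequence[float]], k: int) -> List[Point]:
    """Start from the first point, then repeatedly add the point farthest from all centroids."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    return centroids


def kmeans(points: Sequence[Sequence[float]], k: int = 3,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Cluster points into k groups; return (centroids, cluster index per point)."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = _farthest_point_init(points, k)
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        new_assignment = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, assignment
```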
  13. DEMO
     • Web user interface
  14. ANALYSIS
  15. TABLEAU ANALYSIS
  16. TABLEAU ANALYSIS
  17. TABLEAU ANALYSIS
  18. CHALLENGES
     • Formatting the data after web scraping.
     • Understanding variables such as keywords and subjectivity.
     • Finding relationships between variables and selecting features for modelling.
  19. LINKS
     • URL: http://sample-env-1.xhmp4ynr7g.us-east-1.elasticbeanstalk.com/
     • GitHub: https://github.com/voraankur/ADS/tree/master/Final%20Project
  20. REFERENCES
     • https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity
     • https://repositorium.sdum.uminho.pt/bitstream/1822/39169/1/main.pdf
  21. CONTRIBUTION
     • Ankur: regression models, web interface
     • Kinjal: data cleaning, web scraping, clustering, report
     • Krutika: classification models, presentation, Tableau analysis
  22. THANK YOU
