Project IDI PPT

Project IDI
David I Widjaja
Steps
 Data Extraction
 Tagging
 Correlation
 Web Scraping
 Comparison
 Documentation
Data Extraction
 How to get the data?
 Input from database
 Input manually
 Data type:
 Topics made up of strings
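
A minimal sketch of the two input paths, assuming the topics sit in a SQLite database. The table name `topics`, the column `subject`, and the sample rows are hypothetical and not taken from the project.

```python
import sqlite3

def topics_from_database(conn):
    """Read topic strings from a hypothetical `topics(subject)` table."""
    rows = conn.execute("SELECT subject FROM topics ORDER BY rowid").fetchall()
    return [row[0] for row in rows if row[0]]

def topics_from_manual_input(text):
    """Treat each non-empty line of typed text as one topic string."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Demo with an in-memory database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (subject TEXT)")
conn.executemany("INSERT INTO topics VALUES (?)",
                 [("Reset my password",), ("Payment failed twice",)])
db_topics = topics_from_database(conn)
manual_topics = topics_from_manual_input("Login error\n\nSlow page load\n")
```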
Tagging
 Prerequisite:
 Topic Sentences (Subject)
 Dictionary (Tags)
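
As a rough illustration, tagging can be treated as looking up each word of a topic sentence in the dictionary. The `dictionary` mapping and the sample topic below are invented for the example.

```python
def tag_topic(topic, dictionary):
    """Return the sorted set of tags whose words occur in the topic sentence."""
    words = topic.lower().split()
    return sorted({dictionary[w] for w in words if w in dictionary})

# Hypothetical dictionary: word -> tag (synonyms share one tag).
dictionary = {"password": "account", "login": "account", "payment": "billing"}
tags = tag_topic("Login fails after password reset", dictionary)
```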
Dictionary
 How to create tags:
1. Get all topic sentences and split them on whitespace
2. Convert all words to lowercase
3. Delete all numeric and duplicate values
4. Sort the words alphabetically
5. Delete unnecessary words (e.g. is, the, and)
6. Find synonyms and cluster them into a single tag
7. Translate words if necessary
8. Insert the tags into the main spreadsheet
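
Steps 1 to 6 could be sketched in Python roughly as follows. The stop-word set and synonym map are placeholder assumptions that a real project would curate by hand. Translation (step 7) and the spreadsheet insert (step 8) are left out.

```python
# Assumed stop words and synonym clusters; placeholders only.
STOP_WORDS = {"is", "the", "and", "a", "of", "to"}
SYNONYMS = {"signin": "login", "logon": "login"}

def build_tags(topics, stop_words=STOP_WORDS, synonyms=SYNONYMS):
    """Build a sorted tag list from topic sentences (steps 1 to 6)."""
    words = [w for t in topics for w in t.split()]       # 1. split on whitespace
    words = [w.lower() for w in words]                   # 2. lowercase
    unique = {w for w in words if not w.isdigit()}       # 3. drop numbers, duplicates
    ordered = sorted(unique)                             # 4. sort alphabetically
    kept = [w for w in ordered if w not in stop_words]   # 5. drop unnecessary words
    return sorted({synonyms.get(w, w) for w in kept})    # 6. merge synonyms into one tag

tags = build_tags(["Login is broken", "Signin error 404", "The login page"])
```

Here "signin" collapses into the "login" tag and "404" is discarded as numeric.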
Correlation
 A weighted graph map is used:
 The more words associated with a tag, the bigger its bubble.
 The more relationships between two topics, the thicker the line connecting them.
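
One possible reading of these weights, as a sketch: bubble size is the number of word matches per tag, and line thickness is the number of topic sentences in which two tags appear together. This co-occurrence interpretation is an assumption. The slides do not specify how relationships are counted.

```python
from collections import Counter
from itertools import combinations

def build_graph(tagged_topics):
    """tagged_topics: one list of tags per topic sentence."""
    node_weight = Counter()   # bubble size per tag
    edge_weight = Counter()   # line thickness per pair of tags
    for tags in tagged_topics:
        node_weight.update(tags)
        # Count each pair of distinct tags once per topic sentence.
        for pair in combinations(sorted(set(tags)), 2):
            edge_weight[pair] += 1
    return node_weight, edge_weight

nodes, edges = build_graph([
    ["account", "billing"],
    ["account", "account", "error"],
    ["billing", "account"],
])
```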
Web Scraping
 Scrape other, similar websites
 Use the scraped text as the topic sentences in the subject column. Examples:
 Article titles
 Comments
 Etc.
 Copy them to the previous spreadsheet (the one with the previous tags).
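
A small sketch of pulling article titles out of downloaded HTML with Python's standard `html.parser`. The `<h2 class="title">` pattern is a made-up example. The real pattern would be found per site, for instance with Inspect Element.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text of every <h2 class="title"> element (assumed pattern)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "title") in attrs:
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data

def extract_subjects(html):
    """Return the cleaned title strings found in one HTML document."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles if t.strip()]

sample = ('<h2 class="title">Login page is down</h2><p>body</p>'
          '<h2 class="title"> Refund delayed </h2>')
subjects = extract_subjects(sample)
```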
Correlation
 Repeat the same process as before to build a weighted graph map from the scraped data
Comparison
 Compare the two weighted graph maps
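
The slides do not say how the two maps are compared. One simple option, shown here purely as an assumption, is to compare each tag's share of the total weight in the two maps.

```python
def compare_maps(own, scraped):
    """Return, per tag, own share minus scraped share of total weight."""
    own_total = sum(own.values()) or 1
    scraped_total = sum(scraped.values()) or 1
    report = {}
    for tag in sorted(set(own) | set(scraped)):
        own_share = own.get(tag, 0) / own_total
        scraped_share = scraped.get(tag, 0) / scraped_total
        report[tag] = round(own_share - scraped_share, 3)
    return report

# Illustrative weights only.
diff = compare_maps({"account": 4, "billing": 2, "error": 2},
                    {"account": 1, "shipping": 3})
```

A positive value means the tag is more prominent in the original data, a negative value that it is more prominent on the scraped sites.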
Word Cloud
 Generate a word cloud using Python or online tools.
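
A word cloud is essentially a frequency table rendered with scaled font sizes. The sketch below computes only those sizes using the standard library. The size range is an arbitrary assumption, and the actual drawing would be left to a rendering tool.

```python
from collections import Counter

def word_sizes(topics, min_size=10, max_size=48):
    """Map each word to a font size that grows with its frequency."""
    counts = Counter(w.lower() for t in topics for w in t.split())
    if not counts:
        return {}
    top = max(counts.values())
    return {w: min_size + (max_size - min_size) * c // top
            for w, c in counts.items()}

sizes = word_sizes(["login error", "login page", "Login timeout"])
```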
Tools
 Microsoft Excel 2013 (Spreadsheet)
 Mozilla Firefox (Browser)
 Inspect Element (Search Patterns)
 DownThemAll (Download HTMLs)
 Total Commander (Merge HTMLs)
 Notepad++ (Cleanse Data)