Data Design
                                                           2114.409: Creative Research Practice




HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
Reflection
Status Check



Concerns

 Programming

 What can we build




                     HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
Course Outline
1. Foundations
   Introduction
   Survey Methods / Data Mining
   Visualization and Analysis
   Social Mechanics

2. Methods
   Creativity and Brainstorming
   Prototyping
   Project Management

3. Prototyping
   Crawling
   Text Mining
   To be determined (TBD)
   Project Update

4. Refinement
   TBD x3
   Project Presentations
   Reflection
Last Week: Building Blocks
‣ Clustering
‣ Classification & Regression
‣ Association Rules
‣ Outlier Detection
                   HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
This Week: Systems




HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
Data Mining Overview
How do I see and communicate answers?       Visualization, Storytelling
What questions should I ask of the data?    Design, Data Exploration
How do I clean and process the data?        Analysis Techniques
How do I gather meaningful data?            Crawling, Surveys, UX Design
Why might we prefer analysis?

LABOR
‣ Too many pictures to look at.
‣ Don’t know which are interesting.

ACCURACY
‣ Can test for statistical significance, etc.
‣ Some patterns don’t visualize easily.




                                         HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
Clustering
Find natural groupings in the data



Organize data into classes:

‣ high intra-class similarity
‣ low inter-class similarity
Clustering
Input Data: Points OR Similarities, plus [ # of clusters ]
Output Clusters: Hard OR Soft OR Hierarchical
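A minimal sketch of the "points plus number of clusters in, hard clusters out" case, using plain k-means. The algorithm choice, the `kmeans` function, the first-k-points seeding, and the sample points are all assumptions made for illustration; the slide does not name a specific method.

```python
import math

def kmeans(points, k, iterations=20):
    """Hard-cluster 2-D points into k groups with plain k-means."""
    # Assumption: seed the centroids with the first k points (deterministic, not robust).
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids

# Hypothetical input: two visibly separate groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
cluster_labels, centers = kmeans(points, k=2)
print(cluster_labels)  # one cluster index per point
```

Points inside each returned cluster sit close together (high intra-class similarity), while the two centroids end up far apart (low inter-class similarity).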
Classification: Learn to map objects to categories
Regression: Learn to map objects to continuous variables
Classification
Observations X, Labels Y: Learn f(x) = y
Example: X = height, Y = gender (Male / Female)
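One way to picture "learn f(x) = y" for the height example is a single learned cut-off (a decision stump). This is only a sketch: the function names and the training heights are hypothetical, and real height data for the two groups overlaps, so no single cut-off would separate them perfectly.

```python
def train_threshold(heights, labels):
    """Learn f(x) = y as a single cut-off on height (a decision stump)."""
    best_cut, best_correct = None, -1
    xs = sorted(set(heights))
    # Candidate cut-offs: midpoints between neighbouring observed heights.
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        correct = sum(
            (label == "male") == (h > cut) for h, label in zip(heights, labels)
        )
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def predict(cut, height):
    """Apply the learned mapping f(x) to a new observation."""
    return "male" if height > cut else "female"

# Hypothetical observations X (height in cm) and labels Y; not from the lecture.
heights = [152, 158, 160, 165, 172, 175, 180, 185]
labels = ["female", "female", "female", "female", "male", "male", "male", "male"]

cut = train_threshold(heights, labels)
print(cut, predict(cut, 178), predict(cut, 155))
```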
The Whole Process
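1. Data Set: Featurization produces the Featurized data
2. Random Split (e.g. 90/10) into Training Data and Test Data
3. Training on the Training Data produces the Model
4. Evaluation of the Model on the Test Data produces the Results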
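The whole process can be sketched end to end in a few lines. Everything specific here is an assumption for illustration: the record layout, the nearest-mean "model", accuracy as the evaluation measure, and the deliberately easy data set.

```python
import random

def featurize(record):
    """Featurization: turn a raw record into a (feature, label) pair."""
    return float(record["value"]), record["label"]

def train(rows):
    """Training: the 'model' is the mean feature value of each label."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Predict the label whose mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

def evaluate(model, rows):
    """Evaluation: accuracy on rows the model never saw during training."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

# Hypothetical, well-separated data set of 100 raw records.
data_set = [{"value": str(v), "label": "low"} for v in range(50)]
data_set += [{"value": str(v), "label": "high"} for v in range(100, 150)]

featurized = [featurize(r) for r in data_set]
random.Random(0).shuffle(featurized)   # random split, fixed seed for repeatability
cut = len(featurized) * 9 // 10        # 90/10
training_data, test_data = featurized[:cut], featurized[cut:]

model = train(training_data)
accuracy = evaluate(model, test_data)
print(len(training_data), len(test_data), accuracy)
```

Keeping the test data out of training is the point of the split: the reported accuracy then estimates how the model behaves on data it has not seen.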
Association Rules
Learn interesting relations in the data

Support of X = proportion of events in which X occurs
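The proportion above is straightforward to compute. The sketch below also adds rule confidence, a measure commonly paired with it in association rule mining but not taken from the slide; the hashtag events are hypothetical.

```python
def support(events, items):
    """Proportion of events that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(event) for event in events) / len(events)

def confidence(events, x, y):
    """Of the events containing x, the proportion that also contain y.

    Assumption: not on the slide; included as the usual companion measure.
    """
    return support(events, set(x) | set(y)) / support(events, x)

# Hypothetical events: the set of hashtags used in each tweet.
events = [
    {"election", "economy"},
    {"election", "jobs"},
    {"election", "economy", "jobs"},
    {"weather"},
]

election_support = support(events, {"election"})
economy_implies_election = confidence(events, {"economy"}, {"election"})
print(election_support)           # 0.75
print(economy_implies_election)   # 1.0
```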
Anomaly Detection

Detect strange events in the data


            Simplest measure:
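The measure shown on the slide is not preserved in this text. One common simple choice, assumed here rather than taken from the lecture, is the z-score: how many standard deviations a value lies from the mean. The threshold of 2.0 and the hourly counts are hypothetical.

```python
import statistics

def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [0.0 for _ in values]  # all values identical: nothing stands out
    return [(v - mean) / stdev for v in values]

def outliers(values, threshold=2.0):
    """Values whose absolute z-score exceeds the (assumed) threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical data: tweets per hour mentioning a candidate.
counts = [12, 15, 11, 14, 13, 12, 95, 13]
strange = outliers(counts)
print(strange)  # [95]
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a z-score threshold can hide outliers; median-based measures are a frequent alternative.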
What Can We Build?




HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
Collective Intelligence
Clicks, Scrolls, Time
Likes, Links, Checkins
Updates, Reviews, Comments
Articles, Images, Video

Collective Intelligence
Community

How can we harness the activities of the world’s digital citizens to build new and useful consumer services?
Politics




The Korean elections are coming. How
does the Internet tell us more than
traditional polling ever could?
Politics




What issues are important?
Who are the influencers?
How can we segment/characterize support groups?
How do we spread our opinions more widely?
Who will win the election?
How can we build this?

“Can social media predict election outcomes?”
HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
Inputs
‣ Tweet: Author, Date, Body, Retweets, Hashtags
‣ Author: Profile, Tweets, Favorites, Following, Followers, Location

Insert Magic Here?
‣ Clustering
‣ Classification & Regression
‣ Association Rules
‣ Outlier Detection

Prediction
‣ Candidate, Location, Score, Confidence
Workshop
System Overview

‣ Tweet Inputs
‣ Author Inputs
‣ Sentiment + Candidate
‣ Refinements
‣ Scoring
‣ Correction based on past elections
‣ RMSE Evaluation
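RMSE (root mean squared error) summarizes how far a set of predictions lies from the actual outcomes. A minimal sketch, assuming the system's scores are compared with real vote shares; the numbers below are hypothetical percentages, not election data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical vote shares (percent) for three candidates.
predicted_share = [52, 45, 3]
actual_share = [50, 48, 2]

error = rmse(predicted_share, actual_share)
print(round(error, 2))  # 2.16
```

Because the errors are squared before averaging, one badly wrong prediction raises the RMSE more than several slightly wrong ones.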
Sentiment Detail
‣ Input Observation (Tweet + Label)
‣ Feature Extractor (N-Gram Features)
‣ Classifier (Training Process)
‣ Output Label
‣ Confusion Matrix Evaluation
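The stages named above can be sketched in miniature. The classifier here is a simple n-gram count scorer chosen for brevity, not necessarily the one used in the course, and the function names and labeled tweets are hypothetical.

```python
from collections import Counter, defaultdict

def ngrams(text, n_max=2):
    """Feature extractor: all word n-grams of length 1 to n_max in a tweet."""
    words = text.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(words) - n + 1)
    ]

def train(labeled_tweets):
    """Training process: count how often each n-gram appears under each label."""
    counts = defaultdict(Counter)
    for text, label in labeled_tweets:
        for gram in ngrams(text):
            counts[label][gram] += 1
    return counts

def classify(counts, text):
    """Classifier: pick the label whose training n-grams overlap the tweet most."""
    grams = ngrams(text)
    scores = {label: sum(c[g] for g in grams) for label, c in counts.items()}
    return max(scores, key=scores.get)

def confusion_matrix(counts, labeled_tweets):
    """Evaluation: tally (true label, predicted label) pairs."""
    matrix = Counter()
    for text, label in labeled_tweets:
        matrix[(label, classify(counts, text))] += 1
    return matrix

# Hypothetical labeled tweets for training and for held-out evaluation.
training = [
    ("great speech tonight", "positive"),
    ("love this candidate", "positive"),
    ("terrible debate performance", "negative"),
    ("hate this policy", "negative"),
]
held_out = [
    ("great candidate", "positive"),
    ("terrible policy", "negative"),
    ("great debate performance", "positive"),
]

sentiment_model = train(training)
matrix = confusion_matrix(sentiment_model, held_out)
print(dict(matrix))
```

The third held-out tweet is labeled positive but predicted negative, because "debate performance" only appeared in a negative training tweet. The confusion matrix exposes that kind of error, which a single accuracy figure would hide.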
Entertainment    HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/
Food             HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/
Movements        HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/
Collaboration    HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/
Shopping         HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/
Travel           HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/
Investing        HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/
Medicine         HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/
Trust            HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
Homework: Data Mining
1. Form groups!

2. Choose a Collective Intelligence topic from
   Lecture 1, or propose a similar one.

3. Make a list of data sources that might
   provide insight into that topic.

4. Propose a set of meaningful questions about
   the data based on your intuition.

5. How would you have to clean/process your
   data to start answering those questions?

6. Consider clustering, association rules,
   anomaly detection, classification. For each
   technique, how might you apply it to the
   data and what would it show?

7. Document your work and be prepared to
   present.
                                                 HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/