Decoder Ring
           http://decoder-ring.net




Jeff Beeman jeff.beeman@asu.edu @doogiemac
             GLS Conference 2010
Background
• Fall 2009 semester
 • Seminars w/ Jim & Betty
 • Wanted to do some sort of emulation of
    work I had been reading (Gee, Hayes,
    Steinkuehler, Duncan, etc.)
 • Seemed to me the process for doing it
    was painful
Traditional process
1. Find content
2. Copy into Word docs
3. Take notes / highlight phrases
4. Manually transfer data to Excel
5. Come up w/ equations & charts

(At least how I see it)
        Wasting time... and it’s BORING
I’m lazy
• I want to
 • use technology to solve repetitive, boring
    problems for me
  • write something once, use it many times
  • take advantage of work others have
    already done
  • work with a lot of data
Better process
1. Find content
2. Create importer
3. Import content
4. Analyze content

Get someone else to do this
Initial requirements
• Abstracted, flexible, powerful data model
• Sustainable, low-cost framework
• Web based to facilitate collaboration
• Facilitate importing and browsing large data
  sets
• Automated reporting
Overview
Data model
• Collection: Name, Description
• Post: Title, Body, Author, Post date, Parent post (optional), External identifier
• User: Username, Avatar, Creation date, Attributes (rank, sex, etc.)
• Taxonomy: Name
• Term: Name, Description

All data normalized into Collections, Posts, Users, Taxonomies
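
As a rough illustration, the entities on this slide could be written down as plain Python dataclasses. This is only a sketch: the field names follow the slide, but the relationships (a Taxonomy holding Terms, a Collection holding Posts) are assumptions read off the diagram layout, and none of this is Decoder Ring's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Term:
    name: str
    description: str = ""


@dataclass
class Taxonomy:
    name: str
    terms: List[Term] = field(default_factory=list)  # assumed relationship


@dataclass
class User:
    username: str
    avatar: Optional[str] = None
    creation_date: Optional[datetime] = None
    attributes: Dict[str, str] = field(default_factory=dict)  # rank, sex, etc.


@dataclass
class Post:
    title: str
    body: str
    author: User
    post_date: datetime
    external_id: str  # identifier of the post on the source site
    parent: Optional["Post"] = None  # set only for replies


@dataclass
class Collection:
    name: str
    description: str = ""
    posts: List[Post] = field(default_factory=list)  # assumed relationship
```

The optional parent post is what would let threaded discussions from different sites fit the same shape.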
Database-backed




• Reports can be generated on the fly
• Data can be queried and searched
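
To make the "database-backed" point concrete: once posts and term tags sit in tables, a report or a search is a single query. The sketch below uses Python's built-in sqlite3 with an in-memory database. The deck does not say which database Decoder Ring uses, and the table names, column names, and sample rows here are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, author TEXT, body TEXT);
CREATE TABLE post_term (post_id INTEGER, term TEXT);
""")
# Hypothetical sample data, for illustration only.
conn.executemany("INSERT INTO post VALUES (?, ?, ?)", [
    (1, "alice", "A tutorial on build orders"),
    (2, "bob", "Patch notes discussion"),
    (3, "alice", "Another tutorial thread"),
])
conn.executemany("INSERT INTO post_term VALUES (?, ?)", [
    (1, "scientific learning"),
    (3, "scientific learning"),
    (2, "social banter"),
])


def term_report(conn):
    """Count tagged posts per taxonomy term, most used first."""
    return conn.execute(
        "SELECT term, COUNT(*) FROM post_term "
        "GROUP BY term ORDER BY COUNT(*) DESC, term"
    ).fetchall()


def search_posts(conn, word):
    """Return (id, author) for posts whose body contains the word."""
    return conn.execute(
        "SELECT id, author FROM post WHERE body LIKE ? ORDER BY id",
        (f"%{word}%",),
    ).fetchall()
```

Calling `term_report(conn)` regenerates the counts from whatever is in the tables at that moment, which is the sense in which a report is "on the fly".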
Collaborative




• Multiple projects, multiple contributors
Open source
Getting the content
• Collections
• Posts
• Users

Getting the content seems to be by far the most difficult part of doing this work.
Again, I’m lazy

• I have a tool that has a normalized,
  predictable data model.
• I can “scrape” websites or other data sets
  and put them into the data model.
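
A minimal sketch of what "scraping into the data model" might look like, using only Python's standard-library HTML parser. The markup in `SAMPLE_HTML`, the class names, and the `data-id` attribute are invented for illustration; a real scraper would be written against one specific site's markup.

```python
from html.parser import HTMLParser

# Hypothetical forum markup, for illustration only.
SAMPLE_HTML = """
<div class="post" data-id="101"><span class="author">alice</span><p class="body">First post</p></div>
<div class="post" data-id="102"><span class="author">bob</span><p class="body">A reply</p></div>
"""


class ForumScraper(HTMLParser):
    """Collect posts as dicts with external_id, author, and body."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # field that the current text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css = attrs.get("class")
        if tag == "div" and css == "post":
            self.posts.append(
                {"external_id": attrs.get("data-id"), "author": "", "body": ""}
            )
        elif css in ("author", "body") and self.posts:
            self._field = css

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.posts[-1][self._field] += data.strip()


def scrape(html):
    parser = ForumScraper()
    parser.feed(html)
    parser.close()
    return parser.posts
```

Everything site-specific lives in the parser class; the output is already in the normalized shape, so nothing downstream needs to know which site it came from.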
Write once...




 Scrapers / importers
Reduced to as little work as possible
• Given a common file format, data is quick
  and easy to import into Decoder Ring
• Bad news: Scrapers need to be written for
  every site
• Good news: They’re very quick to write
  (average 4 - 8 hours each)
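
The slide does not say what the common file format is. Assuming, purely for illustration, that each scraper writes a JSON list of post records, the import step could be sketched as below. The key names (`external_id`, `parent_external_id`, and so on) are hypothetical.

```python
import json

# Hypothetical scraper output in an assumed JSON format.
SAMPLE_JSON = """[
  {"external_id": "101", "author": "alice", "title": "Build orders", "body": "First post"},
  {"external_id": "102", "author": "bob", "body": "A reply", "parent_external_id": "101"}
]"""


def import_posts(json_text):
    """Load scraper output into posts keyed by external identifier."""
    records = json.loads(json_text)
    posts = {}
    # First pass: create every post.
    for rec in records:
        posts[rec["external_id"]] = {
            "title": rec.get("title", ""),
            "body": rec["body"],
            "author": rec["author"],
            "parent": None,
        }
    # Second pass: link replies to their parents, now that all posts exist.
    for rec in records:
        parent_id = rec.get("parent_external_id")
        if parent_id is not None:
            posts[rec["external_id"]]["parent"] = posts.get(parent_id)
    return posts
```

With a fixed intermediate format like this, only the scraper has to change from site to site; the importer is written once.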
Analysis & Reporting
• Content navigation
• Content editing
This is great, but...
•   It’s making things faster, but what does it do
    that’s new?
    •   Collaboration, networking of researchers
    •   Immediate reporting provides insight where
        it may not otherwise be seen
•   Still some difficulties:
    •   How do you effectively communicate how to
        use / apply a taxonomy?
Demo
Todo
•   Per-collection taxonomy visibility
•   Per-collection access control
•   Cross-collection reports
•   Search-based reports (e.g. taxonomy term activity for all
    posts with the word "tutorial")
•   More accurate and faster search (Solr), e.g. all posts with
    "violence" near the words "games OR video games OR
    entertainment"
•   More robust hosting infrastructure (more users,
    collections)
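
To illustrate what a "near" search means, here is a naive proximity check in plain Python. It is not how Solr implements proximity queries, and the function names and the five-token window are arbitrary choices for this sketch.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_positions(tokens, phrase):
    """Return the start index of every occurrence of phrase in tokens."""
    words = tokenize(phrase)
    if not words:
        return []
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near(text, word, others, window=5):
    """True if word occurs within window tokens of any phrase in others."""
    tokens = tokenize(text)
    anchors = phrase_positions(tokens, word)
    for phrase in others:
        for pos in phrase_positions(tokens, phrase):
            if any(abs(pos - a) <= window for a in anchors):
                return True
    return False
```

For example, `near(post_body, "violence", ["games", "video games", "entertainment"])` would select the posts described in the search item above. A search engine does the same kind of check against a positional index instead of rescanning every post.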
Long-term todo
•   DR could "learn" over time about taxonomies
    and language, e.g. what words commonly
    appear in phrases tagged "scientific learning"?
•   Comparisons with external data, e.g. thread
    activity corresponding to product release
    announcements (Starcraft II thread)
•   Web-based content import: once a parser is
    written, the ability to queue up imports via the
    DR website
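
The simplest version of the "what words commonly appear" idea is a word count over phrases that share a tag. The sketch below is a starting point under that assumption, not a description of any planned Decoder Ring feature; the stopword list and the (phrase, tag) input shape are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = frozenset({"the", "a", "of", "to", "and", "is", "in"})


def common_words(tagged_phrases, term, top=3):
    """Return the most frequent non-stopwords in phrases tagged with term.

    tagged_phrases is an iterable of (phrase, tag) pairs.
    """
    counts = Counter()
    for phrase, tag in tagged_phrases:
        if tag == term:
            words = re.findall(r"[a-z']+", phrase.lower())
            counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top)
```

Words that rank high for one term but low for the others would be candidates for suggesting that term on untagged phrases.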


Editor's notes

  1. Why scraping data is difficult but possible
     - Many sites use different terminology and structure for what are essentially similar data types (post vs. discussion vs. thread; user vs. account)
     - Unpredictable markup on websites, often BAD markup
     - Picture of malformed HTML
     - Creating a generic scraper tool would be sloppy, inaccurate, and error-prone
     - Fortunately, writing site-specific scrapers is a pretty straightforward process
     - Roughly 4 hours per scraper, getting to be less as I gain more experience