Department of Commerce App
        Challenge: Big Data Dashboards
          International Open Government Data Conference: Virtual Conference
          Best Practices From Around the World in Putting Data to Work
                                        Dr. Brand Niemann
                    Director and Senior Enterprise Architect – Data Scientist
                                       Semantic Community
                                  http://semanticommunity.info/
                                     AOL Government Blogger
                          http://gov.aol.com/bloggers/brand-niemann/
                  April 27, 2012. Updated April 30, 2012. Updated July 7, 2012.

http://semanticommunity.info/AOL_Government/2012_International_Open_Government_Data_Conference
http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge

                                                                                             1
International Open Government Data
       Conference: Virtual Conference
•   Questions to ask each presenter to supply afterwards for a directory - are you
    doing these things?
     – The way to document the public benefits with Open Data is to be able to answer the points
       below:
•   OPEN DATA
     – O: Not previously Open to the public (lots of the "Open data" has already been available and is
       just being re-advertised)
     – P: Serves a Purpose (there is a reason the data was collected that clearly serves a real purpose
       - e.g. Congressional redistricting)
     – E: Educates citizens and politicians to take action (results that provide a valid basis for action)
     – N: Made Newsworthy by journalists (results are communicated objectively and effectively)
     – D: The plural of Datum - something given or admitted especially as a basis for reasoning or
       inference
     – A: Actual numbers that a citizen, scientist, statistician, etc. can understand and work with
     – T: Transparent (see where the data came from, how it was analyzed, where the results came
       from, etc.)
     – A: Answers questions posed by the above




                                                                                                         2
Open Data Example
•   O: Not previously Open to the public (lots of the "Open data" has already been available and is just
    being re-advertised)
     –   EPA Envirofacts Warehouse APIs (slow large queries and bulk downloads before)
•   P: Serves a Purpose (there is a reason the data was collected that clearly serves a real purpose - e.g.
    Congressional redistricting)
     –   EPA Envirofacts data is Congressionally mandated for protection of human health and welfare
•   E: Educates citizens and politicians to take action (results that provide a valid basis for action)
     –   EPA Envirofacts Web Site (over 2500 Web pages)
•   N: Made Newsworthy by journalists (results are communicated objectively and effectively)
     –   My AOL Government Story is one of many such efforts
•   D: The plural of Datum - something given or admitted especially as a basis for reasoning or
    inference
     –   EPA has data standards and quality assurance methods for these data
•   A: Actual numbers that a citizen, scientist, statistician, etc. can understand and work with
     –   Yes
•   T: Transparent (see where the data came from, how it was analyzed, where the results came from,
    etc.)
     –   Yes, metadata is provided and combined with the new data APIs
•   A: Answers questions posed by the above
     –   See my AOL Government Story with summary results as one of many such efforts




                                                                                                           3
Beautiful Spreadsheet Data for EPA Envirofacts
   Warehouse Metadata and API Dashboard
• Built for my former EPA CIO, Malcolm Jackson (a mobile app - iPad)
• Something I have wanted to do since my early days in the EPA Data Standards
  Branch (2000-2002)
• Built a beautiful spreadsheet for public use and a Spotfire application
• The format is both linked metadata and linked data
• Search all the metadata and get API data (but for only 9 of 13
  systems and for only 5000 rows at a time)
• Find key fields for data integration and build many apps
• Metadata results:
    –   Models: 15
    –   Tables: 227
    –   Rows: 2518
    –   Types: 40
    –   Columns (Data Elements): 1662
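
The 5000-rows-at-a-time limit mentioned above means that any larger table has to be fetched in consecutive windows. A minimal sketch of that paging arithmetic follows. The function names, the base URL, and the URL layout are hypothetical placeholders for illustration, not the documented Envirofacts interface; only the 5000-row cap comes from the slide.

```python
from typing import Iterator, List, Tuple

# Per-request cap stated on the slide.
MAX_ROWS_PER_REQUEST = 5000


def row_windows(total_rows: int,
                page_size: int = MAX_ROWS_PER_REQUEST) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, zero-based (first, last) row ranges covering total_rows."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for first in range(0, total_rows, page_size):
        last = min(first + page_size, total_rows) - 1
        yield first, last


def build_page_urls(base_url: str, table: str, total_rows: int) -> List[str]:
    """Build one URL per window.

    The '/rows/first:last' layout is an assumption made for this sketch.
    """
    return [f"{base_url}/{table}/rows/{first}:{last}"
            for first, last in row_windows(total_rows)]


if __name__ == "__main__":
    # 'https://example.invalid/service' and 'SOME_TABLE' are placeholders.
    for url in build_page_urls("https://example.invalid/service", "SOME_TABLE", 12000):
        print(url)
```

Keeping the window arithmetic separate from the URL building makes it easy to swap in whatever request format the real service expects.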

                                                                          4
Beautiful Spreadsheet Data for EPA Envirofacts
  Warehouse Metadata and API Dashboard




                  Web Player                     5
Data Science Analytics for 2012 IOGDC
“More data beats clever algorithms but better data beats more data.” Monica Rogati @ Strata 2012
• IOGDC Conference Knowledge Bases
• IOGDS Catalog Data Sets
• IOGDS Data Analytics with BI Tools
   – Exploiting Linked Data with Business Intelligence Tools
      • Acknowledgement: Kingsley Idehen, CEO, OpenLink Software

                                                                        6
Data Science Analytics for 2012 IOGDC
           2012 IOGDC Knowledge Bases




                   Web Player
                                        7
Data Science Analytics for 2012 IOGDC
            IOGDS Catalog Data Sets




                  Web Player          8
Data Science Analytics for 2012 IOGDC
          IOGDS Data Analytics with BI Tools




                   Web Player
                                               9
An Information Platform
• An Information Platform is the critical
  infrastructure component for building a Learning
  Organization. The most critical human
  component for accelerating the learning process
  and making use of the Information Platform is
  taking the shape of a new role: the Data Scientist.
   – Jeff Hammerbacher, in Chapter 5: Information
     Platforms and the Rise of the Data Scientist, in his
     book “Beautiful Data” (July 2009) (see Linked Data
     reference below)
  http://semanticommunity.info/AOL_Government/Beautiful_Data#Information_Platforms_As_Dataspaces
                                                                                               10
Jeff Hammerbacher
• The number two data scientist in the world, according to Tim
  O’Reilly, is Jeff Hammerbacher, who built the data science team at
  Facebook and is now at Cloudera, driving the success of Hadoop as
  a standard tool for processing large, unstructured data sets with a
  network of commodity computers. Jeff also teaches “Introduction
  to Data Science” at UC Berkeley, and in his opening lecture
  organizes his reasons for doing so into three parts, as follows:
   – 1. Personal - Jeff's training and job experiences
   – 2. Putting Data to Work - Theme of the 2012 International Open
     Government Data Conference
   – 3. The Emergence of Data Science - Dominant theme of future
     conferences according to Robert Ames, Senior VP for Technology at
     In-Q-Tel, at the FCW Executive Briefing on Big Data and the Government
     Enterprise, June 21, 2012

   http://www.forbes.com/pictures/lmm45emkh/tim-oreilly-is-the-founder-of-oreily-media/#gallerycontent

                                                                                                    11
My Mission Statement
• 1. Personal:
    – Senior Data Scientist at the US EPA:
         • Completed Data Science Academic Training and Many EPA Data Products
    – Detail to Data.gov:
         • Built Data.gov in An Information Platform
• 2. Putting Data To Work:
    – Data Journalist for Federal Computer Week and AOL Government:
         • Published Many Data Science Products and Built Own Data Journalism Handbook
    – Data as a First Class Citizen: Data Science and Journalism for Analytic
      Standards and Audit of Open Data Sites:
         • Working with CKAN, DoD, IC, NCOIC, NIST, OASIS, OMG, OSTP, W3C, etc.
• 3. The Emergence of Data Science:
    – Built a Data Science Team for the Government Community:
         • “Killer Semantic Web Application” (Semantic MedLine on the new Cray Graph Computer)
           for the Federal Big Data Senior Steering Group
    – Challenges and Contests Using the Best High Quality Data Sets:
         • Heritage Provider Network Health Prize, Health Data Initiative Forums, TedMed,
           Department of Commerce App Challenge, etc.

                                                                                            12
Data Scientist
• A data scientist is a job title for an employee or business intelligence (BI)
  consultant who excels at analyzing data, particularly large amounts of
  data, to help a business gain a competitive edge.

• The title data scientist is sometimes disparaged because it lacks specificity
  and can be perceived as an aggrandized synonym for data analyst.
  Regardless, the position is gaining acceptance with large enterprises who
  are interested in deriving meaning from big data, the voluminous amount
  of structured, unstructured and semi-structured data that a large
  enterprise produces.

• A data scientist possesses a combination of analytic, machine learning,
  data mining and statistical skills as well as experience with algorithms and
  coding. Perhaps the most important skill a data scientist possesses,
  however, is the ability to explain the significance of data in a way that can
  be easily understood by others.

   Source: http://searchbusinessanalytics.techtarget.com/definition/Data-scientist
                                                                                     13
Dr. Brand Niemann
• Former Senior
  Enterprise Architect and
  Data Scientist, US
  Environmental
  Protection Agency
  (1980-2010).
• Current
  Husband, Father, and
  Grandfather Enjoying
  the Golden Years!
                                14
Semantic Community
• Our Mantra is: Data Science Precedes the Use of SOA,
  Cloud, and Semantic Technologies! We use data science to
  help marketing and business development efforts.
• Our Mission is like Google’s: Organize the world’s
  information and make it universally accessible and useful.
• Our Method is like Be Informed 4: Architectural Diagrams
  and Questions and Answers are not enough, you need
  Dynamic Case Management!
• Our Sound Bite: It is not just where you put your data
  (cloud), but how you put it there!
• Our Work: Semantically enhancing your data and writing
  data science stories about it.


                                                               15
Introduction
• I heard about this several months ago, but put it off until
  yesterday. I finished it today because I am a very good Data
  Scientist!
• Well, I almost finished it. I need the Patent data in a format
  that I can more readily work with and I am in
  communication with the USPTO about that.
• I create Knowledge Bases about my Data Science work so
  others can follow what I do and even reproduce it
  themselves. My apps also work on mobile devices like
  iPads.
• My goal was, and still is, to create a set of multiple
  interactive dashboards of DoC data like they have
  for Foreign Trade.

                                                              16
Data Science Knowledge Base




http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge
                                                                                   17
Data Science Spreadsheet




http://semanticommunity.info/@api/deki/files/17946/=DoCApp.xlsx
                                                                  18
Spotfire Dashboards
• U.S. Census Bureau Geographic Names
  Information System
• U.S. International Trade in Goods and Services
• Data.Gov Data Catalog for US Department of
  Commerce
• U.S. Bureau of Economic Analysis
• U.S. Patent & Trademark Office


                                               19
U.S. Census Bureau Geographic Names
         Information System




               Web Player
                                  20
U.S. International Trade in
   Goods and Services




          Web Player
                              21
Data.Gov Data Catalog for US
 Department of Commerce




           Web Player
                               22
U.S. Bureau of Economic Analysis




            Web Player
                                   23
U.S. Patent & Trademark Office
• Methodology:
   – Overview: Apply Gall's Law and start with the end in mind (Mashups
     and Decision Support) and work out the details in a simple and small
     content example for my next AOL Government Story! Give everything
     a well-defined URL for a semantically enhanced index in a Dashboard
     (see next slide).
       • 1. Follow Gall's Law which says: "A complex system that works is invariably
         found to have evolved from a simple system that worked. The inverse
         proposition also appears to be true: a complex system designed from scratch
         never works and cannot be made to work. You have to start over, beginning
         with a simple system." - John Gall, systems theorist
       • 2. Copy to MindTouch and add structure to the Web Pages
            – See
              http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation
       • 3. Look at one ZIP file under each section and subsection to see what it
         contains and how to use it in MindTouch (in process)
            – See
              http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation/Electronic_Data_Products

                                                                                        24
U.S. Patent & Trademark Office




             Web Player
                                 25
MindTouch
          DoC USPTO Apps for Innovation




http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation
                                                                                                            26
MindTouch
                         Electronic Data Products




http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation/Electronic_Data_Products

                                                                                                                                   27
Work Plan in Process
 •     Mash-Ups:
        – Combine USPTO applicant/inventor information with other USPTO datasets (e.g., with USPTO
          assignments (ownership) data):
                •   Google or USPTO Daily and USPTO Retro
        – Combine USPTO patent grants and patent application publications with other DOC data (e.g.,
          Census or Economic data)
 •     Innovative Ideas:
        – Homogenize the patent grant bibliographic text data (i.e., make it all the same format).
        – Same for the patent application publication bibliographic data.
        – Capture patent grant bibliographic text data from 1790 to 1975 using the image data.
        – Build a text searchable database (updated weekly) that includes both of the datasets
          discussed in the Webinar. Search queries can be saved. Result sets can be
          saved/extracted/tailored.
        – Build a text searchable database (updated weekly) that includes subsets of both of the
          datasets discussed in the Webinar. (e.g., Green Technology related).
        – Same ideas as above, but use full-text (75 MB/104 MB per week) or full-text with embedded
          images (1.4 GB/1.5GB per week): http://www.google.com/googlebooks/uspto-patents.html
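
Two of the ideas above, homogenizing bibliographic records and mashing them up with assignment (ownership) data, can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the alias table, and the sample values are all hypothetical, and the real USPTO file layouts may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical spellings of the same field across different file vintages.
FIELD_ALIASES = {
    "patent_number": ("patent_number", "PatentNo", "doc-number"),
    "title": ("title", "Title", "invention-title"),
}


def homogenize(record: Dict[str, str]) -> Dict[str, str]:
    """Map a record with any known field spelling onto one common format."""
    clean = {}
    for target, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in record:
                clean[target] = str(record[alias]).strip()
                break
    return clean


def join_assignments(grants: Iterable[Dict[str, str]],
                     assignments: Iterable[Dict[str, str]]) -> List[dict]:
    """Attach the list of assignees to each grant, keyed on patent_number."""
    owners = defaultdict(list)
    for row in assignments:
        owners[row["patent_number"]].append(row["assignee"])
    return [dict(grant, assignees=owners.get(grant["patent_number"], []))
            for grant in grants]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    raw = [{"PatentNo": " 1234567 ", "Title": "Widget"},
           {"doc-number": "7654321", "invention-title": "Gadget"}]
    ownership = [{"patent_number": "1234567", "assignee": "Example Corp"}]
    for row in join_assignments([homogenize(r) for r in raw], ownership):
        print(row)
```

Once every record shares one key field, the same join pattern would apply to other DOC data, such as Census or economic tables keyed on a common identifier.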



Source: http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation#Innovative_Ideas
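
The text-searchable database idea in the work plan could start from something as small as an in-memory inverted index. The sketch below is a toy under stated assumptions: the class name, the tokenizer, and the sample titles are hypothetical, and a production system would more likely use a dedicated search engine or database.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


class TextIndex:
    """Toy inverted index: token -> set of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for token in tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> List[str]:
        """Return sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


if __name__ == "__main__":
    # Made-up titles, for illustration only.
    index = TextIndex()
    index.add("A", "Solar panel mounting bracket")
    index.add("B", "Wind turbine blade")
    index.add("C", "Solar powered wind sensor")
    print(index.search("solar wind"))
```

A weekly update would amount to calling `add` for each newly published record; saved queries could simply be stored strings that are re-run against the refreshed index.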


                                                                                                                                     28
More Questions For Todd Park
            About Big Data




http://gov.aol.com/2012/04/25/more-questions-for-todd-park-about-big-data/
                                                                             29
Conclusions and Recommendations
    • A Data Science approach to the App Challenge
      provided examples for improvements in data
      dissemination and visualization.
    • Most of the data sets are “big data” when it
      comes to the app developer community working
      on simple mobile apps using smaller data sets.
    • The Patent data dissemination offers the most
      challenge for improvement and opportunity for
      creative piloting using a Data Science approach.
For details see: http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge#Submission
                                                                                                     30
Postscript
• Presentation to the Federal Big Data Senior Steering Group,
  September 27, 2012:
   – A Data Science team composed of NLM (Tom
     Rindflesch), Noblis (Victor Pollara), Cray (Steve
     Reinhardt), and Semantic Community (Brand Niemann) is
     working to make what Dr. George Strawn refers to as “the
     killer semantic web application for government”, Semantic
     Medline, better known and more functional for medical
     research by putting the Semantic Medline RDF database
     into the new Cray Graph Computer and demonstrating its
     usefulness.
   – The background for this project is at:
      • http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline

                                                                31
BusinessUSA.gov Their APIs Can be
         Data Interfaces




http://gov.aol.com/2012/07/02/why-apis-arent-enough-to-make-businessusa-gov-useful/
http://semanticommunity.info/AOL_Government/BusinessUSA.gov_Their_APIs_Can_be_Data_Interfaces
                                                                                            32
Imagination at Work! Unleash Your
  Creativity with Our Census API




http://semanticommunity.info/AOL_Government/Data_Services_for_Developers
                                                                           33
Digital Agenda For Europe:
    Data As First-Class Citizen




http://gov.aol.com/2012/06/29/digital-agenda-for-europe-data-as-first-class-citizen/
http://semanticommunity.info/AOL_Government/Digital_Agenda_for_Europe                  34
Data Science Spring 2012 Exercise 1:
             2012 Presidential Campaign Finance Data




http://semanticommunity.info/AOL_Government/Beautiful_Data#Spotfire_Dashboard
                                                                           35
Data Science Spring 2012 Exercise 3:
          Evaluate Models of R Package Recommendations




http://semanticommunity.info/AOL_Government/Beautiful_Data#Spotfire_Dashboard_2
                                                                             36
Big Data and The Government Enterprise
• “More data beats clever
  algorithms but better data
  beats more data.” Monica
  Rogati @ Strata 2012
• “Big Data in memory is
  necessary to avoid loss of
  information from filtering
  and aggregation and a data
  scientist knows the data
  science and the technology
  to do that.” Brand Niemann
  @ Big Data and the
  Government Enterprise
     http://semanticommunity.info/AOL_Government/Big_Data_and_the_Government_Enterprise
                                                                                          37
Big Data and The Government Enterprise




 http://semanticommunity.info/AOL_Government/Big_Data_and_the_Government_Enterprise
                                                                                      38

Visualizing Healthcare Data with Tableau (Toronto Central LHIN Presentation)
 
Data Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdfData Science Unit1 AMET.pdf
Data Science Unit1 AMET.pdf
 
Agile data science
Agile data scienceAgile data science
Agile data science
 
Data science unit1
Data science unit1Data science unit1
Data science unit1
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 
(R17A0528) BIG DATA ANALYTICS.pdf
(R17A0528) BIG DATA ANALYTICS.pdf(R17A0528) BIG DATA ANALYTICS.pdf
(R17A0528) BIG DATA ANALYTICS.pdf
 

Dernier

Chapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditChapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditNhtLNguyn9
 
Cyber Security Training in Office Environment
Cyber Security Training in Office EnvironmentCyber Security Training in Office Environment
Cyber Security Training in Office Environmentelijahj01012
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCRashishs7044
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCRashishs7044
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy Verified Accounts
 
Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Riya Pathan
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckHajeJanKamps
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfRbc Rbcua
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Financial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptxFinancial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptxsaniyaimamuddin
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menzaictsugar
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCRashishs7044
 
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...ictsugar
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Americas Got Grants
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Servicecallgirls2057
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...ssuserf63bd7
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Doge Mining Website
 

Dernier (20)

Chapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal auditChapter 9 PPT 4th edition.pdf internal audit
Chapter 9 PPT 4th edition.pdf internal audit
 
Cyber Security Training in Office Environment
Cyber Security Training in Office EnvironmentCyber Security Training in Office Environment
Cyber Security Training in Office Environment
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail Accounts
 
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
 
Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdf
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Financial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptxFinancial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptx
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
Unlocking the Future: Explore Web 3.0 Workshop to Start Earning Today!
 

International Open Government Data Conference Virtual Conference

  • 1. Department of Commerce App Challenge: Big Data Dashboards International Open Government Data Conference: Virtual Conference Best Practices From Around the World in Putting Data to Work Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ April 27, 2012. Updated April 30, 2012. Updated July 7, 2012. http://semanticommunity.info/AOL_Government/2012_International_Open_Government_Data_Conference http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge 1
  • 2. International Open Government Data Conference: Virtual Conference • Questions to ask each presenter to supply afterwards for a directory - are you doing these things? – The way to document the public benefits with Open Data is to be able to answer the points below: • OPEN DATA – O: Not previously Open to the public (lots of the "Open data" has already been available and is just being re-advertised) – P: Serves a Purpose (there is a reason the data was collected that clearly serves a real purpose - e.g. Congressional redistricting) – E: Educates citizens and politicians to take action (results that provide a valid basis for action) – N: Made Newsworthy by journalists (results are communicated objectively and effectively) – D: The plural of Datum - something given or admitted especially as a basis for reasoning or inference – A: Actual numbers that a citizen, scientist, statistician, etc. can understand and work with – T: Transparent (see where the data came from, how it was analyzed, where the results came from, etc.) – A: Answers questions posed by the above 2
  • 3. Open Data Example • O: Not previously Open to the public (lots of the "Open data" has already been available and is just being re-advertised) – EPA Envirofacts Warehouse APIs (slow large queries and bulk downloads before) • P: Serves a Purpose (there is a reason the data was collected that clearly serves a real purpose - e.g. Congressional redistricting) – EPA Envirofacts data is Congressionally mandated for protection of human health and welfare • E: Educates citizens and politicians to take action (results that provide a valid basis for action) – EPA Envirofacts Web Site (over 2500 Web pages) • N: Made Newsworthy by journalists (results are communicated objectively and effectively) – My AOL Government Story is one of many such efforts • D: The plural of Datum - something given or admitted especially as a basis for reasoning or inference – EPA has data standards and quality assurance methods for these data • A: Actual numbers that a citizen, scientist, statistician, etc. can understand and work with – Yes • T: Transparent (see where the data came from, how it was analyzed, where the results came from, etc.) – Yes, metadata is provided and combined with the new data APIs • A: Answers questions posed by the above – See my AOL Government Story with summary results as one of many such efforts 3
  • 4. Beautiful Spreadsheet Data for EPA Envirofacts Warehouse Metadata and API Dashboard • Built for my former EPA CIO, Malcolm Jackson (a mobile app - iPad) • Always wanted to do since my early days in the EPA Data Standards Branch (2000-2002) • Built a beautiful spreadsheet for public use and Spotfire application • The format is both linked metadata and linked data • Search all the metadata and get API data (but for only 9 of 13 systems and for only 5000 rows at a time) • Find key fields for data integration and build many apps • Metadata results: – Models: 15 – Tables: 227 – Rows: 2518 – Types: 40 – Columns (Data Elements): 1662 4
  • 5. Beautiful Spreadsheet Data for EPA Envirofacts Warehouse Metadata and API Dashboard Web Player 5
  • 6. Data Science Analytics for 2012 IOGDC “More data beats clever algorithms but better data beats more data.” Monica Rogati @ Strata 2012 • IOGDC Conference Knowledge Bases • IOGDS Catalog Data Sets • IOGDS Data Analytics with BI Tools – Exploiting Linked Data with Business Intelligence Tools • Acknowledgement: Kingsley Idehen, CEO, OpenLink Software 6
  • 7. Data Science Analytics for 2012 IOGDC 2012 IOGDC Knowledge Bases Web Player 7
  • 8. Data Science Analytics for 2012 IOGDC IOGDS Catalog Data Sets Web Player 8
  • 9. Data Science Analytics for 2012 IOGDC IOGDS Data Analytics with BI Tools Web Player 9
  • 10. An Information Platform • An Information Platform is the critical infrastructure component for building a Learning Organization. The most critical human component for accelerating the learning process and making use of the Information Platform is taking the shape of a new role: the Data Scientist. – Jeff Hammerbacher, in Chapter 5: Information Platforms and the Rise of the Data Scientist, in the book “Beautiful Data” (July 2009) (see Linked Data reference below) http://semanticommunity.info/AOL_Government/Beautiful_Data#Information_Platforms_As_Dataspaces 10
  • 11. Jeff Hammerbacher • The number two data scientist in the world, according to Tim O’Reilly, is Jeff Hammerbacher, who built the data science team at Facebook and is now at Cloudera, driving the success of Hadoop as a standard tool for processing large, unstructured data sets with a network of commodity computers. Jeff also teaches “Introduction to Data Science” at UC Berkeley, and in his opening lecture organizes his reasons for doing so into three parts as follows: – 1. Personal - Jeff's training and job experiences – 2. Putting Data to Work - Theme of the 2012 International Open Government Data Conference – 3. The Emergence of Data Science - Dominant theme of future conferences according to Robert Ames, Senior VP for Technology at In-Q-Tel, at the FCW Executive Briefing on Big Data and the Government Enterprise, June 21, 2012 http://www.forbes.com/pictures/lmm45emkh/tim-oreilly-is-the-founder-of-oreily-media/#gallerycontent 11
  • 12. My Mission Statement • 1. Personal: – Senior Data Scientist at the US EPA: • Completed Data Science Academic Training and Many EPA Data Products – Detail to Data.gov: • Built Data.gov in An Information Platform • 2. Putting Data To Work: – Data Journalist for Federal Computer Week and AOL Government: • Published Many Data Science Products and Built Own Data Journalism Handbook – Data as a First Class Citizen: Data Science and Journalism for Analytic Standards and Audit of Open Data Sites: • Working with CKAN, DoD, IC, NCOIC, NIST, OASIS, OMG, OSTP, W3C, etc. • 3. The Emergence of Data Science: – Built a Data Science Team for the Government Community: • “Killer Semantic Web Application” (Semantic MedLine on the new Cray Graph Computer) for the Federal Big Data Senior Steering Group – Challenges and Contests Using the Best High Quality Data Sets: • Heritage Provider Network Health Prize, Health Data Initiative Forums, TedMed, Department of Commerce App Challenge, etc. 12
  • 13. Data Scientist • A data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge. • The title data scientist is sometimes disparaged because it lacks specificity and can be perceived as an aggrandized synonym for data analyst. Regardless, the position is gaining acceptance with large enterprises who are interested in deriving meaning from big data, the voluminous amount of structured, unstructured and semi-structured data that a large enterprise produces. • A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and coding. Perhaps the most important skill a data scientist possesses, however, is the ability to explain the significance of data in a way that can be easily understood by others. Source: http://searchbusinessanalytics.techtarget.com/definition/Data-scientist 13
  • 14. Dr. Brand Niemann • Former Senior Enterprise Architect and Data Scientist, US Environmental Protection Agency (1980-2010). • Current Husband, Father, and Grandfather Enjoying the Golden Years! 14
  • 15. Semantic Community • Our Mantra is: Data Science Precedes the Use of SOA, Cloud, and Semantic Technologies! We use data science to help marketing and business development efforts. • Our Mission is like Google's: Organize the world’s information and make it universally accessible and useful. • Our Method is like Be Informed 4: Architectural Diagrams and Questions and Answers are not enough; you need Dynamic Case Management! • Our Sound Bite: It is not just where you put your data (cloud), but how you put it there! • Our Work: Semantically enhancing your data and writing data science stories about it. 15
  • 16. Introduction • I heard about this several months ago, but put it off until yesterday. I finished it today because I am a very good Data Scientist! • Well I almost finished it. I need the Patent data in a format that I can more readily work with and I am in communication with the USPTO about that. • I create Knowledge Bases about my Data Science work so others can follow what I do and even reproduce it themselves. My apps also work on mobile devices like iPads. • My goal was, and still is, to create a set of multiple interactive dashboards of DoC data like they have for Foreign Trade. 16
  • 17. Data Science Knowledge Base http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge 17
  • 19. Spotfire Dashboards • U.S. Census Bureau Geographic Names Information System • U.S. International Trade in Goods and Services • Data.Gov Data Catalog for US Department of Commerce • U.S. Bureau of Economic Analysis • U.S. Patent & Trademark Office 19
  • 20. U.S. Census Bureau Geographic Names Information System Web Player 20
  • 21. U.S. International Trade in Goods and Services Web Player 21
  • 22. Data.Gov Data Catalog for US Department of Commerce Web Player 22
  • 23. U.S. Bureau of Economic Analysis Web Player 23
  • 24. U.S. Patent & Trademark Office • Methodology: – Overview: Apply Gall's Law and start with the end in mind (Mashups and Decision Support) and work out the details in a simple and small content example for my next AOL Government Story! Give everything a well-defined URL for a semantically enhanced index in a Dashboard (see next slide). • 1. Follow Gall's Law which says: "A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: a complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a simple system." - John Gall, systems theorist • 2. Copy to MindTouch and add structure to the Web Pages – See http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation • 3. Look at one ZIP file under each section and subsection to see what it contains and how to use it in MindTouch (in process) – See http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation/Electronic_Data_Products 24
  • 25. U.S. Patent & Trademark Office Web Player 25
  • 26. MindTouch DoC USPTO Apps for Innovation http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation 26
  • 27. MindTouch Electronic Data Products http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation/Electronic_Data_Products 27
  • 28. Work Plan in Process • Mash-Ups: – Combine USPTO applicant/inventor information with other USPTO datasets (e.g., with USPTO assignments (ownership) data): • Google or USPTO Daily and USPTO Retro – Combine USPTO patent grants and patent application publications with other DOC data (e.g., Census or Economic data) • Innovative Ideas: – Homogenize the patent grant bibliographic text data (i.e., make it all the same format). – Same for the patent application publication bibliographic data. – Capture patent grant bibliographic text data from 1790 to 1975 using the image data. – Build a text searchable database (updated weekly) that includes both of the datasets discussed in the Webinar. Search queries can be saved. Result sets can be saved/extracted/tailored. – Build a text searchable database (updated weekly) that includes subsets of both of the datasets discussed in the Webinar. (e.g., Green Technology related). – Same ideas as above, but use full-text (75 MB/104 MB per week) or full-text with embedded images (1.4 GB/1.5GB per week): http://www.google.com/googlebooks/uspto-patents.html Source: http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge/DOC_USPTO_Apps_for_Innovation#Innovative_Ideas 28
  • 29. More Questions For Todd Park About Big Data http://gov.aol.com/2012/04/25/more-questions-for-todd-park-about-big-data/ 29
  • 30. Conclusions and Recommendations • A Data Science approach to the App Challenge provided examples for improvements in data dissemination and visualization. • Most of the data sets are “big data” when it comes to the app developer community working on simple mobile apps using smaller data sets. • The Patent data dissemination offers the most challenge for improvement and opportunity for creative piloting using a Data Science approach. For details see: http://semanticommunity.info/AOL_Government/Department_of_Commerce_App_Challenge#Submission 30
  • 31. Postscript • Presentation to the Federal Big Data Senior Steering Group, September 27, 2012: – A Data Science team composed of NLM (Tom Rindflesch), Noblis (Victor Pollara), Cray (Steve Reinhardt), and Semantic Community (Brand Niemann) is working to make what Dr. George Strawn refers to as “the killer semantic web application for government”, Semantic Medline, better known and more functional for medical research by putting the Semantic Medline RDF database into the new Cray Graph Computer and demonstrating its usefulness. – The background for this project is at: • http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline 31
  • 32. BusinessUSA.gov Their APIs Can be Data Interfaces http://gov.aol.com/2012/07/02/why-apis-arent-enough-to-make-businessusa-gov-useful/ http://semanticommunity.info/AOL_Government/BusinessUSA.gov_Their_APIs_Can_be_Data_Interfaces 32
  • 33. Imagination at Work! Unleash Your Creativity with Our Census API http://semanticommunity.info/AOL_Government/Data_Services_for_Developers 33
  • 34. Digital Agenda For Europe: Data As First-Class Citizen http://gov.aol.com/2012/06/29/digital-agenda-for-europe-data-as-first-class-citizen/ http://semanticommunity.info/AOL_Government/Digital_Agenda_for_Europe 34
  • 35. Data Science Spring 2012 Exercise 1: 2012 Presidential Campaign Finance Data http://semanticommunity.info/AOL_Government/Beautiful_Data#Spotfire_Dashboard 35
  • 36. Data Science Spring 2012 Exercise 3: Evaluate Models of R Package Recommendations http://semanticommunity.info/AOL_Government/Beautiful_Data#Spotfire_Dashboard_2 36
  • 37. Big Data and The Government Enterprise • “More data beats clever algorithms but better data beats more data.” Monica Rogati @ Strata 2012 • “Big Data in memory is necessary to avoid loss of information from filtering and aggregation and a data scientist knows the data science and the technology to do that.” Brand Niemann @ Big Data and the Government Enterprise http://semanticommunity.info/AOL_Government/Big_Data_and_the_Government_Enterprise 37
  • 38. Big Data and The Government Enterprise http://semanticommunity.info/AOL_Government/Big_Data_and_the_Government_Enterprise 38