Digital Enterprise Research Institute                                                       www.deri.ie




                             Google Public Data Explorer

                                                                              Aftab Iqbal



 Stefan.Decker@deri.org
 http://www.StefanDecker.org/

 Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Introduction




            DSPL (Dataset Publishing Language) consists of:
                   An XML file (metadata)
                   CSV files (data)
DSPL Dataset

            General information
                   About the dataset
            Concepts
                   Definitions of "things" that appear in the dataset (e.g., counties,
                    unemployment rate, gender, etc.)
            Slices
                   Combinations of concepts for which there are data
            Tables
                   Data for concepts and slices. Concept tables hold enumerations
                    and slice tables hold statistical data
            Topics
                   Organize the concepts of the dataset in a meaningful hierarchy
                    through labeling
School Enrollment 2009_2010 *




        School_Roll_No                  Short_Name    Level           Male               Female


              00697S               ST BRIDGIDS NS    Primary           377                   447
              01170G                     NAUL NS     Primary           40                    61
             09492W               BALSCADDEN NS      Primary           98                    133
                  …                         …          …                …                    …




* Snapshot taken from http://data.fingal.ie/ViewDataSets/Details/default.aspx?datasetID=385
DSPL – Contd.




            General Information
                   General information about the dataset and about its provider

           <info>
             <name>
               <value>School</value>
             </name>
             <description>
               <value>Statistics about Fingal County Schools</value>
             </description>
             <url>
               <value></value>
             </url>
           </info>

           <provider>
             <name>
               <value>County Fingal School Enrollment Statistics</value>
             </name>
             <url>
               <value>http://data.fingal.ie/ViewDataSets/Details/default.aspx?datasetID=385</value>
             </url>
           </provider>
DSPL – Contd.




            Concepts
                   A concept defines a type of data that appears in a dataset
           <concept id="Schools" extends="geo:location">
             <info>
               <name>
                 <value>Schools</value>
               </name>
               <description>
                 <value>List of schools for Co. Fingal</value>
               </description>
             </info>
             <type ref="string"/>
             <table ref="schools_table"/>
           </concept>

           <table id="schools_table">
             <column id="School" type="string"/>
             <column id="School_Roll_No" type="string"/>
             <column id="latitude" type="float"/>
             <column id="longitude" type="float"/>
             <data>
               <file format="csv" encoding="utf-8">schools.csv</file>
             </data>
           </table>


                     school                             name                          latitude                   longitude
                     00697S              Saint Bridgids National School                      53.37514                  -6.36221
                     01170G              S N Na H Aille Naul National School                 53.57887                  -6.28564
                     09492W              Balscadden National School                          53.61528                  -6.23218
                     09642P              Burrow National School                              53.39129                  -6.10028
                       …                                  …                              …                          …
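
A concept table's columns are matched to the CSV data by id, so a quick check that every declared column appears in the CSV header can catch mismatches early. The sketch below is only an illustration: the column ids (`school`, `name`, `latitude`, `longitude`) are assumed here so that they match the CSV header shown on the slide, the helper names are hypothetical, and the fragment omits the XML namespace that a complete DSPL file would declare.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical table definition; ids are assumed to match the CSV header below.
TABLE_XML = """
<table id="schools_table">
  <column id="school" type="string"/>
  <column id="name" type="string"/>
  <column id="latitude" type="float"/>
  <column id="longitude" type="float"/>
</table>
"""

# Sample rows copied from the slide.
SCHOOLS_CSV = """school,name,latitude,longitude
00697S,Saint Bridgids National School,53.37514,-6.36221
01170G,S N Na H Aille Naul National School,53.57887,-6.28564
"""


def column_ids(table_xml):
    """Return the column ids declared in a <table> element, in order."""
    table = ET.fromstring(table_xml.strip())
    return [col.get("id") for col in table.findall("column")]


def csv_header(csv_text):
    """Return the first row (the header) of the CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def missing_columns(table_xml, csv_text):
    """List declared column ids that do not appear in the CSV header."""
    header = set(csv_header(csv_text))
    return [cid for cid in column_ids(table_xml) if cid not in header]


print(missing_columns(TABLE_XML, SCHOOLS_CSV))
```

An empty list means every declared column has a matching CSV header entry; the comparison is case-sensitive, so `School` and `school` would count as different ids.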
DSPL – Contd.




            Slices
                   A slice is a combination of concepts for which data exists
                   It contains two kinds of concept references: dimensions and
                    metrics

          <slice id="enrolment_slice">
            <dimension concept="school"/>
            <dimension concept="time:year"/>
            <metric concept="M"/>
            <metric concept="F"/>
            <table ref="enrolment_slice_table"/>
          </slice>

          <table id="enrolment_slice_table">
            <column id="school" type="string"/>
            <column id="M" type="integer"/>
            <column id="F" type="integer"/>
            <column id="year" type="date" format="yyyy"/>
            <data>
              <file format="csv" encoding="utf-8">school_enrolment_slice.csv</file>
            </data>
          </table>
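
The table referenced by a slice needs one column for each dimension and metric. A minimal sketch of that check follows, under two assumptions that are not stated on the slide: the fragments are wrapped in a hypothetical `<root>` element so they parse as one document, and an imported concept such as `time:year` is taken to map to the column named by its local part (`year`). The function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Slice and table from the slide, wrapped in a hypothetical <root> element.
FRAGMENT = """
<root>
  <slice id="enrolment_slice">
    <dimension concept="school"/>
    <dimension concept="time:year"/>
    <metric concept="M"/>
    <metric concept="F"/>
    <table ref="enrolment_slice_table"/>
  </slice>
  <table id="enrolment_slice_table">
    <column id="school" type="string"/>
    <column id="M" type="integer"/>
    <column id="F" type="integer"/>
    <column id="year" type="date" format="yyyy"/>
  </table>
</root>
"""


def local_id(concept_ref):
    """Assumption: 'time:year' maps to the column id 'year'."""
    return concept_ref.split(":")[-1]


def unmapped_concepts(xml_text, slice_id):
    """Return slice dimensions/metrics that have no column in the slice table."""
    root = ET.fromstring(xml_text.strip())
    slice_el = root.find(f"slice[@id='{slice_id}']")
    table_ref = slice_el.find("table").get("ref")
    table_el = root.find(f"table[@id='{table_ref}']")
    columns = {col.get("id") for col in table_el.findall("column")}
    refs = [
        el.get("concept")
        for el in slice_el
        if el.tag in ("dimension", "metric")
    ]
    return [ref for ref in refs if local_id(ref) not in columns]


print(unmapped_concepts(FRAGMENT, "enrolment_slice"))
```

An empty result means each dimension and metric of the slice has a same-named column in the referenced table.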
School Enrollment Slice




                        Dimensions: School, Year          Metrics: Male, Female




                        School                     Male             Female   Year


          Saint Bridgids National School            377              447     2009

          Saint Bridgids National School            475              392     2010

            Balscadden National School              98               133     2009

            Balscadden National School              126              102     2010
                           …                        …                 …       …
DSPL – Contd.




            Topics
                   Topics classify concepts hierarchically and are used by
                    applications to help users navigate to your data.



                                <topic id="Male_indicators">
                                  <info>
                                    <name><value>Male Students Enrollment</value></name>
                                  </info>
                                </topic>
                                <topic id="Female_indicators">
                                  <info>
                                    <name><value>Female Students Enrollment</value></name>
                                  </info>
                                </topic>
Data Cleansing


                School Enrollment 2009                                                                School Enrollment 2010
 School_Roll_No Short_Name               Level     Male        Female         School_Roll_No Short_Name          Level       Male     Female
     00697S    ST BRIDGIDS NS           Primary    377           447              00697S    ST BRIDGIDS NS      Primary      475        392
    01170G        NAUL NS               Primary     40           61              01170G        NAUL NS          Primary       58        40
       …              …                    …        …             …                 …              …               …          …          …




                                                  School              Male               Female          Year
                                                  00697S              377                  447           2009
                                                  00697S              475                  392           2010
                                                  01170G               40                   61           2009
                                                  01170G               58                   40           2010
                                                    …                  …                    …             …
                                                              School_Enrollment_Slice.csv


                            School                                Name                            Latitude                Longitude
                            00697S                    Saint Bridgids National School              53.37514                 -6.36221
                            01170G                 S N Na H Aille Naul National School            53.57887                 -6.28564
                              …                                      …                                …                        …
                                                                         Schools.csv
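
The cleansing step shown above, where the two yearly enrollment tables are combined into one slice table keyed by school and year, can be sketched as follows. This is an illustrative sketch and not the method used to prepare the slides: the function names are hypothetical, the sample rows are the ones visible on the slide, and the output header follows the slide's table (in a real bundle the header would need to match the column ids declared in the XML).

```python
import csv
import io

# Sample rows copied from the two yearly tables on the slide.
ENROLMENT_2009 = """School_Roll_No,Short_Name,Level,Male,Female
00697S,ST BRIDGIDS NS,Primary,377,447
01170G,NAUL NS,Primary,40,61
"""
ENROLMENT_2010 = """School_Roll_No,Short_Name,Level,Male,Female
00697S,ST BRIDGIDS NS,Primary,475,392
01170G,NAUL NS,Primary,58,40
"""

SLICE_FIELDS = ["School", "Male", "Female", "Year"]


def build_slice_rows(tables_by_year):
    """Keep only roll number and counts, and tag each row with its year."""
    rows = []
    for year, text in tables_by_year.items():
        for record in csv.DictReader(io.StringIO(text)):
            rows.append({
                "School": record["School_Roll_No"],
                "Male": int(record["Male"]),
                "Female": int(record["Female"]),
                "Year": year,
            })
    # Order by school, then year, as in the slice table on the slide.
    rows.sort(key=lambda row: (row["School"], row["Year"]))
    return rows


def to_csv(rows):
    """Serialize the slice rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=SLICE_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


slice_rows = build_slice_rows({2009: ENROLMENT_2009, 2010: ENROLMENT_2010})
print(to_csv(slice_rows))
```

The descriptive columns (Short_Name, Level) are dropped from the slice because school names live in the separate Schools.csv concept table.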




   <slice id="enrolment_slice">
     <dimension concept="school"/>
     <dimension concept="time:year"/>
     <metric concept="Male"/>
     <metric concept="Female"/>
     <table ref="enrolment_slice_table"/>
   </slice>

   <table id="enrolment_slice_table">
     <column id="school" type="string"/>
     <column id="Male" type="integer"/>
     <column id="Female" type="integer"/>
     <column id="year" type="date" format="yyyy"/>
     <data>
       <file format="csv" encoding="utf-8">School_Enrollment_Slice.csv</file>
     </data>
   </table>




   Deployment

                 CSV files + XML metadata  ->  compressed bundle (.zip)
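
For deployment, the CSV files and the XML metadata are packed into a single compressed archive. A minimal sketch using Python's standard `zipfile` module is given here; the metadata file name `dataset.xml` and its placeholder content are assumptions for illustration, and `build_bundle` is a hypothetical helper.

```python
import io
import zipfile


def build_bundle(files):
    """Zip a mapping of file name -> bytes into an in-memory archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


bundle = build_bundle({
    # Placeholder metadata; a real bundle would hold the full DSPL XML file.
    "dataset.xml": b"<dspl/>",
    "Schools.csv": b"School,Name,Latitude,Longitude\n",
    "School_Enrollment_Slice.csv": b"School,Male,Female,Year\n",
})

# List the archive contents to confirm all three files are present.
names = zipfile.ZipFile(io.BytesIO(bundle)).namelist()
print(names)
```

Building the archive in memory keeps the sketch free of file-system side effects; writing `bundle` to a `.zip` file on disk would produce the file to upload.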

Editor's notes

  1. A DSPL dataset is a bundle that contains an XML file and a set of CSV files. The CSV files are simple tables containing the data of the dataset. The XML file describes the metadata of the dataset, including informational metadata like descriptions of measures, as well as structural metadata like references between tables. The metadata lets non-expert users explore and visualize your data. The only prerequisite for understanding this tutorial is a good level of understanding of XML. Some understanding of simple database concepts (e.g., tables, primary keys) may help, but it's not required. For reference, the completed XML file and complete dataset bundle associated with this tutorial are also available for review.
  2. The <info> element contains general information about the dataset: name, description, and a URL where more information can be found. The <provider> element contains information about the provider of the dataset: its name and a URL where more information can be found (generally the data provider's home page).
  3. Now that we have provided some general information about the dataset, we're ready to start defining its contents. Concepts that are categorical, such as state, are associated with concept tables, which enumerate all their possible values (California, Arizona, etc.). Concepts may have additional columns for properties such as the name or the country of a state. A concept is a definition of a type of data that appears in a dataset. The data values that correspond to a given concept are called instances of that concept. Every concept must provide an id that uniquely identifies the concept within the dataset. Just like for the dataset and its provider, the <info> elements provide textual information about the concept, such as its name and description. The <type> element specifies the data type for the instances of the concept (in other words, its "values"). Finally, the school concept has a <table> element. This element references a table that enumerates the list of all schools. The schools table specifies the columns of the table and their types, and references a CSV file that contains the data.
  4. The values of metrics vary with the values of dimensions. Just like concepts, slices include a reference to a table that contains the data of the slice. The referenced table must have one column for each dimension and metric of the slice. Just as for concepts, the slice's dimensions and metrics are mapped to the table columns with the same ids. Slices define each combination of concepts for which there is statistical data in the dataset. A slice contains dimensions and metrics. In the above picture, the dimensions are blue and the metrics are orange. In this example, the slice gender_country_slice has data for the metric population and the dimensions country, year and gender. Another slice, called country_slice, gives total yearly population numbers (metric) for countries.