SlideShare une entreprise Scribd logo
1  sur  7
Télécharger pour lire hors ligne
1




                 Open Governmental Data on the Web: A
                         Semantic Approach
                                                            Julia Hoxha
                             Institute of Applied Informatics and Formal Description Methods (AIFB)
                                            Karlsruhe Institute of Technology, Germany
                                                       Julia.Hoxha@kit.edu

                                                          Armand Brahaj
                             FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
                                                Armand.Brahaj@fiz-karlsruhe.de



                                                                     to knowledge representation.
   Abstract— Initiatives of making governmental data open are           From a technical perspective, we refer as Open Data to any
continuously gaining interest recently. While this presents          sets of data which can be reused with no restrictions by any
immense benefits for increasing transparency, the problem is that
                                                                     form of licensing or patents, data that are well structured and
the data are frequently offered in heterogeneous formats, missing
clear semantics that clarify what the data describes. The data are   can be easily accessed and reused by institutions, scientist or
displayed in ways that are not always clearly understandable to a    the web community.
broad range of user communities that need to make informed              Being driven by benefits of the aforementioned paradigm,
decisions.                                                           we aim to increase openness of data offered by different
   We address these problems and propose an overall approach,        governmental institutions, and at the same time, provide means
in which raw statistical data independently gathered from the        of representing these data in more understandable ways. There
different government institutions are formally and semantically      are several challenges underlying our work, starting from the
represented, based on an ontology that we present in this paper.
                                                                     way that data is originally provided by the sources. Data are
We further introduce the approach we deploy in publishing these
data in alignment with Linked Data principles, and present the       mostly offered in heterogeneous formats, missing clear
methods implemented to query single or combined dataset and          semantics that clarify what the data describes, and displayed in
visualize the results in understandable ways. The introduced         ways that is not clearly understandable to a broad range user
approach enables data integration, leading to vast opportunities     communities that need to make informed decisions. These
for information exchange, analysis on combined datasets,             kinds of representation impede the integration of data coming
simplicity to create mashups, and exploration of innovative ways     from different sources, consequently leading to limited
to use these data creatively.                                        opportunities for information exchange, analysis on combined
                                                                     datasets, simplicity to create mashups, and exploration of
                                                                     innovative ways to use these data creatively.
  Keywords— linked open data, open governmental data,
                                                                       From a technical perspective, our work consists in applying
ontology, semantic web, statistical metadata
                                                                     Semantic Web technologies to enable data integration among
                                                                     the different organizations and establish links to interconnect
                                                                     data together on the Web. Since a crucial issue when
I. INTRODUCTION
                                                                     semantically modeling data is the preservation of their

I   NITIATIVESof making governmental data open are gaining
   great interest recently, with more countries constantly
embracing the open data paradigm. Open Data is a new form
                                                                     meaning, we present an approach, which deploys Linked Open
                                                                     Data [linkedata.org] principles and introduces an ontology as
                                                                     foundation for the semantic representation of government
of representation of the traditional public information, which       statistical data. On the one hand, the ontology offers a formal,
has been the basis of decision making in political, economical       explicit descriptions of knowledge enabling in this way a
or sociological activities for centuries.                            platform for integration, where data can be still interpreted
   The term open has been introduced in the late 2000th to           correctly for valid research outcomes. On the other hand,
generalize the accessibility of the information that can be          Linked Data provides a technical basis for publishing, sharing
public and easily accessible by anyone. The concept of               and linking data on the web, based on standardized formats
information is also generalized to the concept of data, by           and interfaces.
referring to raw sets of valuable information which can be             In this paper, we present the ontology that enables formal,
structured, analyzed and presented in different forms leading        semantic representation of government statistical data. We
                                                                     further introduce the approach we deploy in publishing them
2

as linked data, and the methods implemented to query single or              alignment with Linked Data principles, which we present in
combined dataset and visualize the results. The semantic                    Section 3.
approach and the ontology are described in details in Section                  Another important component in our approach is the Query
2, whereas Section 3 gives an overview of Linked Open Data                  Interface, which enables the user community as well as the
and our work in alignment to those principles. Section 4                    source institutions that offer these statistical data to access the
presents the technical implementation of our approach                       knowledge base and pose queries upon it. This component
focusing on the issues of integration and processing of data                consists of an online graphical interface as well as a SPARQL2
from different sources and querying and visualization                       Endpoint, which we both present in Section 4. The results of a
techniques for representing multiple datasets in a graph. We                query may be displayed as structured Excel and RDF files to
give an overview of related work in the field of open data and              the users, but they are additionally visualized in ways that
statistical vocabularies in Section 5, concluding in Section 6              make these data much more tangible and understandable. This
with a summary of our work.                                                 approach provides open, structured and linked data to the
                                                                            different institutions, which are then able to use each-others
                                                                            datasets very easily.
II. SEMANTIC APPROACH
   In our work, we are dealing with a large set of sources from                A. ODA Ontology
which statistical data is gathered. These sources are mainly                The need to provide a unified representation for the different,
websites of different government institutions, which offer data             heterogeneous data gathered from the various sources, as well
online in unstructured or semi-structured formats such as text              as the necessity to provide formal semantics to these data, have
documents, excel files or XML files. There are very few                     greatly motivated us to choose a semantic approach using an
sources that can provide data structured in Entity-Relationship             ontology, referred as ODA3 ontology, which we make
model.                                                                      available online4. While modeling the ontology, our goal was
   Our goal, after gathering the raw data independently from                to follow linked data principles and reuse as much as possible
the different sources, is to convert them to a representation               existing well-known ontologies, but at the same time extending
based on a unified structure and provide semantics for each of              it with other important entities.
the data entries. For this purpose, we have chosen to                           An ontology is defined as a “a formal, explicit specification
semantically represent the data based on an ontology that we                of a shared conceptualization” [1]. It provides a common
have built, and then further serialize data as RDF triples                  understanding of the domain knowledge, increasing
following Linked Data principles. In this section, we present               communication capabilities between people and heterogeneous
the modeled ontology as well as introduce our approach in                   applications. Using an ontology, we are able to formally
publishing Linked Open Data.                                                describe the semantics of terms and items of statistical datasets
                                                                            and give explicit meaning to the provided information. Besides
                                                                            sharing a common understanding, another direct benefit of a
                                                                            formal semantics is the support for machine-readability, which
                                                                            then enables automated reasoning, information integration and
                                                                            application of intelligent approaches such as semantic search.
                                                                                The objective of the modeled ontology is to provide a
                                                                            schema (also referred to as vocabulary) for the semantic
                                                                            representation of statistical data, which, on one side, is
                                                                            understandable and simple to be used from different
                                                                            institutions or organizations that offer statistical datasets, and
                                                                            on the other side, is consistent and complete enough to model
                                                                            all the aspects and entries in these datasets.
                                                                                 A dataset is created as a container for the data entries found
                                                                            in a particular statistical file. Each dataset contains information
                                                                            about the date it was created, its creator and publisher, and a
                                                                            title. The dataset contains data entries, which represent the
                                                                            central component modeled in our ontology. A data entry has a
                                                                            fixed value and, among other possible dimensions, is measured
                                                                            for a particular year and a specific country.
                                                                                In a dataset, the data entries serve to measure particular
  Fig. 1. Overall semantic approach for publishing open governmental data   indicators that monitor for example the socio-economic
                                                                            development of a country, demographic changes, public
   In our approach, the knowledge base consists of the                      health, developments in education, and many other aspects.
Ontology Model and a storage for RDF files. The raw
statistical data are converted to RDF1 using a converter in                   2
                                                                                http://www.w3.org/TR/rdf-sparql-query/
                                                                              3
                                                                                ODA stands for Open Data Albania
  1                                                                           4
      Resource Description Framework, http://www.w3.org/RDF/                    http://open.data.al/oda.owl
3

These numerous indicators, also organized among each-other            URI: http://open.data.al/oda#Dataset
in groups, are additionally assigned particular topics. Each          Superclass: skos:Concept
indicator is used to estimate and monitor the progress done in
a particular aspect. Besides, a more complete view is provided
when looking at the progress of all indicators within a topic.        Class: oda:DataEntry
                                                                      DataEntry- a data entry dataset, represents a single piece of
   All these concepts are modeled in the ontology, which is
                                                                      data
illustrated in Figure 2. It is encoded using the sublanguage
                                                                      URI: http://open.data.al/oda#DataEntry
OWL DL of the Web Ontology Language (OWL)5, which goes
                                                                      Superclass: event:Event
beyond other standards in its power to represent machine
interpretable content on the Web. Moreover, OWL DL itself
offers high expressiveness.                                           Class: oda:Indicator
                                                                      Indicator - a statistical indicator that is being measured by
                                                                      the entries in the dataset
                                                                      URI: http://open.data.al/oda#Indicator

                                                                      Class: oda:Topic
                                                                      Topic - a theme to which the indicator and dataset belong
                                                                      URI: http://open.data.al/oda#Topic

                                                                      Class: oda:Dimension
                                                                      Dimension - a dimension of a statistical data entry
                                                                      URI: http://open.data.al/oda#Dimension
                                                                      Class: oda:Year
                                                                      Year- time dimension, represents the year to which a
                                                                      statistical data entry belongs
                                                                      URI: http://open.data.al/oda#Year
                                                                      Superclass: oda:Dimension

                                                                      Class: oda:Country
                                                                      Country- spatial dimension, represents the country to which
                                                                      a statistical data entry belongs
                                                                      URI: http://open.data.al/oda#Country
                                                                      Superclass: oda:Dimension

                                                                       Properties. The relations among the different individuals of
                                                                    the ontology classes are defined by using the properties
                                                                    construct, distinguishing between two main types: Object
                                                                    properties and Datatype properties. The Object properties link
                                                                    individuals to individuals, whereas the Datatype properties
                                                                    link individuals to data values. These are a few of the main
                                                                    Object properties modeled in our ontology:
                                                                      Property: oda:dataset
                           Fig. 2. ODA Ontology
                                                                      dataset- relates a statistical data entry to a dataset
                                                                      Domain: oda:DataEntry
                                                                      Range: oda:Dataset
   We have modeled the ontology using Protégé6, a platform
that offers support for ontology creation, visualization and          Property: oda:indicator
manipulation. We explain below the main entities that                 indicator- relates a statistical data entry to an indicator
compose the modeled ODA ontology.                                     Domain: oda:DataEntry
   Classes. The concepts and terms of the domain of                   Range: oda:Indicator
knowledge are represented in an ontology as classes. Our
ontology consists of the following main classes:                      Property: oda:subindicator
   Class: oda:Dataset                                                 subindicator- relates a parent indicator to other indicators
   Dataset - a statistical dataset, represents the container of a     classified as its subcomponents (serves for grouping
   set of data                                                        indicators)
                                                                      Domain: oda:Indicator
  5
      http://www.w3.org/TR/owl-guide/                                 Range: oda:Indicator
  6
      http://protege.stanford.edu/
4

  Property: oda:dimension                                              period
  indicator- relates a statistical data entry to an indicator          Ontology: Statistical Data and Metadata Exchange (SDMX)
  Domain: oda:DataEntry                                                vocabulary10
  Range: oda:Dimension
                                                                       Property: dc:title, dc:publisher, dc:creator, dc:date
   We have also defined a set of Datatype properties e.g.              Ontology: Dublin Core Metadata Element Set11
oda:indicatorname or oda:topicname properties, which
assign a literal value (in this case string) to the individuals of   III. LINKING OPEN GOVERNMENTAL DATA
the class Indicator and Topic, respectively.                           Making data freely available for everyone on the Web is
                                                                     further empowered when there are well-formed links
   Individuals. The classes of an ontology are instantiated,         established among the items of the different data sources.
specifying the concrete objects or individuals that belong to a      Making data open and linked is the goal of Linked Open Data
particular class. We have created a large number of topic            paradigm [2], which promotes the ways of providing data that
instances such as Social Development, Economic                       is freely accessible, verifiably, and connected to other data
Development, Social Inclusion, Public Health, Education, etc.        sources.
We have related these topics to indicators, such that a topic           There are four basic principles expected to be fulfilled when
may have multiple indicators assigned to it. In order to             aiming to publish linked data, in order to make these data
formalize the topics consistently, we have also consulted the        interconnected [3]:
way statistical data is being organized into particular themes             Use Uniform Resource Identifiers (URIs) as names to
from Eurostat7, the statistical office of the European Union.                 identify things
   The indicators are modeled after a thorough analysis of the             Use HTTP URIs so that people can refer to and look
statistical data offered by the different governmental                        up (“dereference”) those names
institution. They are organized in groups using the                        When someone looks up a URI, provide useful
oda:subindicator relation, forming this way a hierarchy of
                                                                              information using standards such as RDF, SPARQL
parent-child indicators.                                                   Include links to other URIs, so that more things and
   We have currently created in our ontology instances of more                related information can be discovered on the Web
than 100 indicators. Just to give an insight on what do these
indicators consist, we mention below a few of them belonging            From a technical perspective [5], the objective is to use
to the topic of Economic Development:                                common standards and techniques to extend the Web by
     GDP Indicators: GDP nominal per capita, GDP nominal            publishing data as RDF (graph data format for representing
      growth rate, GDP per capita in PPS (for EU states)             information on the Web), creating well-formatted RDF links
     Budget Indicators: Expenditures (Total Expenditures,           between the data items, and performing search on the data via
      Capital      Expenditures,       Current      Expenditures,    standardized languages such as SPARQL query language for
      Domestic/Foreign Financing Expenditures), Revenue,             RDF.
      Deficit (Deficit Growth Rate, Deficit Financing), etc.            Recognizing the many benefits of Linked Open Data, we
     Inflation Indicators: monthly Inflation, yearly Inflation      have also followed an approach that is in alignment with these
      rate.                                                          principles, illustrating it with the following example in Figure
                                                                     3.    The figure shows an excerpt of the RDF/XML
   Aiming to rigorously follow the Linked Data principles,           representation of the information contained in a dataset on the
explained in more details in Section 3, one of our goals while       yearly inflation rates of different countries in Europe.
modeling      the    ontology      was     to   reuse     existing
vocabularies/ontologies. The following entities are used from           <rdf:Description
the ontologies that we have imported:                                   rdf:about="http://open.data.al/yearlyinflationrate#dataset">
   Class: event:Event                                                      <rdf:type rdf:resource="http://open.data.al/oda#Dataset"/>
   Event – An arbitrary classification of a space/time region,             <dc:title>Inflation Rate per year and country</dc:title>
                                                                           <dc:publisher>International Monetary Fund (IMF)</dc:publisher>
   by a cognitive agent. An event may have actively
                                                                           <dc:creator>Open Data Albania (ODA)</dc:creator>
   participating agents, passive factors, products, and a                  <dc:date>2011-03-22T10:56:40+0000</dc:date>
   location in space/time.                                              </rdf:Description>
   Ontology: The Event Ontology8
                                                                        <rdf:Description rdf:nodeID="A0">
                                                                         <oda:dataset
  Class: skos:Concept
                                                                           rdf:resource="/yearlyinflationrate#dataset"/>
  Ontology: Simple Knowledge Organization System                         <oda:indicator
  (SKOS)9                                                                  rdf:resource="http://open.data.al/oda#YearlyInflationRate"/>
                                                                         <oda:country rdf:resource="http://dbpedia.org/resource/Greece"/>
  Property: sdmx-measure:obsValue                                        <oda:year rdf:resource="http://dbpedia.org/resource/2002"/>
  obsValue- The value of a particular variable at a particular           <sdmx-measure:obsValue
                                                                           rdf:datatype="http://www.w3.org/2001/XMLSchema#double">
  7
    http://epp.eurostat.ec.europa.eu/
  8                                                                    10
    http://motools.sourceforge.net/event/event.html                         http://www.sdmx.org/
  9                                                                    11
    http://www.w3.org/2004/02/skos/                                         http://dublincore.org/documents/dces/
5

         3.498 </sdmx-measure:obsValue>                               of records and examples of visual representation of datasets
       <rdf:type rdf:resource="http://open.data.al/oda#DataEntry"/>   generated by visitors or editors of the website.
      </rdf:Description>
                                                                        A visualization service could be delivered through the site
                                                                      and could include analytics, graphics, charting, and other ways
            Fig. 3. Excerpt of a Dataset in RDF/XML Representation    of using the data. The enhanced visualizations is built on top
                                                                      of published APIs in collaboration with third party open
   Based on the aforementioned principles, well-formed URI            source applications (Fig. 4).
references have been used in our approach to identify
resources such as dataset, indicator, country, year, etc. We
have invested on a technical infrastructure to make the URIs
dereferenceable, making it able to retrieve further an
RDF/XML description and various classifications of the
identified resource when accessing its URI. RDF links to
resources from other data sources are established, enabling
visitors and user agents to navigate the Web of data. Examples
include direct links to DBpedia12 for resources such as country
and year, or references via the rdfs:seeAlso property for the
Indicators, in the case when the link exists, e.g. the indicator
“GDP Nominal per Capita” would have a link to the
respective resource13 in DBpedia.
   Reusability of existing vocabularies is also a requirement
that tends to facilitate data processing from different client
applications. Accordingly, terms from well-known ontologies                         Fig. 4. Data layers and tools implemented
have been reused, but we have certainly complemented these
vocabularies with additional concepts for representing the               Our work consists of two stages. The first stage is the
statistical data. Particularly important to mention in our            aggregation of the data and the processing of linked data
approach are the concept and taxonomy of indicators as well           structures. The second stage consists on providing a set of
as topics, which are additionally modeled in the ontology and         tools that can make use of these data. As an example of the
defined in our own namespace. Common constructs for                   technology that can be used, a set of visualization
providing mappings to other ontologies are rdfs:subClassOf            representation are generated. The visual representations are
or rdfs:subPropertyOf. As already shown in the presented              published together with a comment providing some additional
ontology above, we have modeled a set of terms as subclasses          information on the dataset retrieved. The operations from the
of concepts (same as for properties) belonging to other               aggregation of data up to the publishing of examples goes
vocabularies.                                                         through the following main steps.
   The linked data on the web is primarily intended to be
processed by machines, but there is an increasing interest from
the human visitors to access and navigate through the data.             A. Aggregation of the information
Therefore, we have aimed towards an approach that is                     Data is aggregated from periodical governmental agency
convenient for both parts. On the one hand, human readability         reports. The format of these reports is most of the time
of the data is facilitated via provision of textual information       provided in Pdf files and sometimes in spreadsheet format. All
(possibly in different languages) in forms of comments and            the information is stored in an offline repository for original
labels, using respectively the properties rdfs:comment and            archiving purposes.
rdfs:label.
   On the other hand, our objective is to ease machine
                                                                        B. Data Processing
interpretability, stating important information explicitly, such
as specification of domain, range and cardinality in our                 The information gathered in the aggregation step is semi-
ontology. We provide information in such a way that avoids            automatically processed to a well-organized Linked Data
inconsistencies in data representation, but at the same time          structure. In order to respond to all the ontologies on the
allows flexibility of the schema to be reused by other parties        system, a set of spreadsheet templates is generated. The
for data modeling.                                                    information is imported into the spreadsheets and then
                                                                      automatically transferred into RDF structure through a tool
IV. TECHNICAL IMPLEMENTATION                                          referred to as ODA Import to RDF.
                                                                         The RDF datasets are stored in a CKAN repository, which
   We aim to provide access to all the citizens, technically          is made public and can be accessed via the CKAN web
inclined or not, to discover structured data, and reuse them. To      interface and CKAN API.
serve up these data sets, the ODA website accesses a catalog             Beside the interface of CKAN, the datasets are also stored
                                                                      in the ODA repository, which is directly connected to an API.
12
     http://dbpedia.org/                                              The Open Data API is a RESTful, service-oriented platform
13
     http://dbpedia.org/property/gdpNominalPerCapita
6

that allows developers to easily access datasets and create                data.addColumn('number','Year');
                                                                           data.addColumn('number','Value');
independent services through these calls. REST uses the HTTP               data.addRows([
protocol and, as such, requests use the common URL format.                   ["Albania",2010,25],
                                                                             ["Albania",2011,25],
   The API provides simple methods that developers can use to                ["Croatia",2010,37],
tap into the functionality and rich datasets, and gather                     ["Croatia",2011,37],
information, in JSON or XML format, related to different                     ["Greece",2010,39],
                                                                             ["Greece",2011,42]
indicators and topics.                                                       ]);
   All requests on the ODA API should consist of a host name
("http://api.open.data.al "), a path that contains an object and             The result provided by the SPARQL Endpoint is modified
an action (e.g. "/indicator/info" or "/indicator/list"), and a             by a php script to generate the Data Table required by the
query string that contains the method parameters (e.g.                     Visualisation API (See Fig 5).
"apiKey", "term", "indicatorId", etc.).
   The URL template of a request looks like:
  http://api.open.data.al/object/action?apiKey
=myKey&param1=value1&param2=value2

    Another way of accessing the information from the RDF
datasets is through a SPARQL endpoint. The SPARQL
endpoint is protocol service as defined in the SPROT
14
   specification. The SPARQL endpoint enables users to query
a knowledge base via the SPARQL language. The common
result from a SPARQL endpoint is an XML-encoded SPARQL
results. In order to access third party Visualization tools, we
generate the result sets by default in JSON format.

  C. Graphical Visualization
  One of the values of making data available is that it enables
anyone to construct nice visualizations over the data. This is
particularly useful for public sector information since it can
                                                                                    Fig. 5 Motion Chart generated through SPARQL query
provide feedback on how effective particular policies have
been.
  It is important that the generation of visual graphs should be             D. Community contributions
done on the fly as soon as result-sets are retrieved by the                  In order to provide a clear example of how the system will
SPARQL Endpoint. For this reason, a set of open source visual              be used, a set of starting articles that based on datasets and
application interfaces were evaluated and we chose to work                 generated graphs is created. An article consists of the
with Google Visualization API15.                                           representation of one or more graphs with a respective
  In order to generate graphs on the fly the following elements            description.
were needed:                                                                 The graphic representation of the data assists the audience to
    A RDF document: in order to present your data through                 draw concrete and logical conclusions on the progress or
      Google Visualization, a source of RDF document is                    regress of certain indicators that influence socio-economic
      needed.                                                              activity in a domain that the dataset covers. These articles are
      A SPARQL query endpoint: a Sparql query endpoint is                  examples that the community may follow to contribute with
      needed to process your queries and get the results                   more content, which can be generated by any visitor based on
      http://api.talis.com/stores/rdfquery-dev1/services/sparql            various self-generated graph representations.
    Google Visualization:
      http://code.google.com/apis/visualization/                           V. RELATED WORK

   Google Visualization API relies heavily on Data Table, a                   The initiatives of making government data open are gaining
table arranged in typed columns. An instance of this class is              a lot of interest recently. While more countries are embracing
the result of a request to a data source. A Data Table can be              the Open Government paradigm, among the researchers
rendered in many ways: JSON, HTML, CSV.                                    working with those data there is an increasing awareness in
In order to feed the Visualization API a similar code should be            using semantic techniques to represent them. Two of the key
generated by the application.                                              leading players that promote the openness of government data
                                                                           combined with the deployment of Semantic Web technologies
var data = new google.visualization.DataTable();                           are the US government[7] and the UK government[6].
data.addColumn('string','Country');

  14
       SPARQL Protocol for RDF http://www.w3.org/TR/rdf-sparql-protocol/
  15
       http://code.google.com/apis/charttools/index.html
7

                                1617
   In the respective portals        manifesting the outcomes of      VI. CONCLUSION
their work, there are numerous data catalogs and respective             Nowadays, there is a lot of attention to Open Data projects
metadata published, many of the datasets offered as RDF              in many countries and cities all over the world. It is a common
triples. Data.gov has made a number of its datasets available as     understanding today that Open Data is simply 'data on the
Linked Data, altogether summing up to 6.5 billion triples. The       web,' whereas Linked Data is a 'web of data'.
goal of these projects is to improve access to the data                 In this paper we address the problem of data provided in
generated by the government, increase the ability of the people
                                                                     heterogeneous formats, omitting semantics that clarify what
to find desired data, so that they can be further used in various
                                                                     the data describes. Our approach is based on a semantic
creative ways. In order to access the data, they offer tools to
                                                                     ontology providing semantic description of all the data, which
search by keywords, category and agency.
   Detailed statistics on the EU and candidate countries are         are further published in alignment with Linked Data principles.
provided by Eurostat[9], which is continuously publishing               The paper offers also an approach for the provision of
statistical data in tabular formats, but not in RDF. Unlike other    visualization services deriving from RDF result sets. This is
portals, its website provides a clear structure for organizing the   achieved by querying the RDFs with SPARQL syntax and
datasets by themes and by their respective indicators, but this      passing the result to an open-source visualization API.
structure is HTML-based and not semantically described.                 The ontology created for the ODA and the supporting tools
   Similar to these projects, our approach provides ways to          built on top of it describe a method of publishing structured
easily access government data. Unlike them, it concentrates on       data, so that it can be interlinked and become more useful to
promoting a well-defined, unified schema for the semantic            the community.
representation of data, in order to increase interoperability and
integration among the different providers. A novel aspect of
our work consists in providing a semantic representation of the
different indicators that can be measured using the datasets, as     REFERENCES
well as the topics to which they belong. This offers for any         [1]   Studer, R., Benjamins, R.V., Fensel, D.: Knowledge engineering:
specific area a complete picture, making it easier to monitor              Principles and methods. Data and Knowledge Engineering, 25, 161–197
the progress, make estimations or take a comparative                       (1998).
approach.                                                            [2]   Bizer, C., Heath, T., and Berners-Lee, T. (2009). Linked Data - The
                                                                           Story So Far. International Journal on Semantic Web and Information
From the semantic web perspective, there exist a limited                   Systems (IJSWIS), Vol. 5(3), Pages 1-22. DOI:
number of vocabularies for representing statistical data such as           10.4018/jswis.2009081901.
SCOVO [4] and the RDF Data Cube Vocabulary18. While                  [3]   Berners-Lee, T. (2006). Linked Data - Design Issues. Retrieved 2010-
SCOVO does not provide semantics about the items that the                  10-25 from http://www.w3.org/DesignIssues/LinkedData.html
                                                                     [4]   Hausenblas, M., Halb, W., Raimond, Y., Feigenbaum, L., and Ayers, D.
statistical data describe, the Data Cube vocabulary is more                (2009). SCOVO: Using Statistics on the Web of Data. Proceedings of
extended and allows modeling of complex data structures and                the 6th European Semantic Web Conference on the Semantic Web:
sets, as well as ways to link data with other sources. An                  Research and Applications (Heraklion, Crete, Greece, 2009). Pages 708-
approach that uses the Data Cube vocabulary to model                       722.
                                                                     [5]   Bizer, C., Cyganiak, R., and Heath, T. (2007). How to publish Linked
statistical data and publish them as Linked Data is the Eurostat
                                                                           Data on the Web. Retrieved 2010-10-25 from http://www4.wiwiss.fu-
wrapper[8]. Another approach uses this vocabulary to                       berlin.de/bizer/pub/LinkedDataTutorial/
semantically represent statistical data from social sciences,        [6]   J. Sheridan and J. Tennison. Linking UK government data. Linked Data
focusing on the capabilities of performing statistical                     on the Web (LDOW2010) Workshop at WWW2010, April 2010
calculations and analysis on linked data from heterogeneous          [7]   L. Ding, D. DiFranzo, A. Graves, J. R. Michaelis, X. Li, D. L.
                                                                           McGuinness, and J. Hendler. Data-gov wiki: Towards linking
and distributed datasets.                                                  government data. In AAAI Spring Symposium on Linked Data Meets
While our work encompasses more aspects such as mining of                  AI, 2010.
the statistical data, converting and publishing as linked data,      [8]   Zapilko - Enriching and Analysing Statistics with Linked Open Data
data aggregation and integration, as well as search and              [9]   Eurostat Wrapper - the experimental Eurostat wrapper to RDF
                                                                           http://estatwrap.ontologycentral.com/
visualization modules, the underlying semantic approach aims
at providing semantics of these data based on an ontology,
similar to the Data Cube vocabulary. While this vocabulary
presents a core foundation for describing data, our ontology is
designed to accommodate extended semantics of government-
oriented statistical data. Furthermore, we tend to reuse
components of various vocabularies, adhering in this way to
linked data principles.



  16
      Data.gov
  17
      Data.gov.uk
  18
     http://publishing-statistical-
     data.googlecode.com/svn/trunk/specs/src/main/html/cube.html

Contenu connexe

Tendances

Overview of Open Data, Linked Data and Web Science
Overview of Open Data, Linked Data and Web ScienceOverview of Open Data, Linked Data and Web Science
Overview of Open Data, Linked Data and Web ScienceHaklae Kim
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesOpen Data Support
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityunivTope Omitola
 
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedJoel Azzopardi
 
Internet Archives and Social Science Research - Yeungnam University
Internet Archives and Social Science Research - Yeungnam UniversityInternet Archives and Social Science Research - Yeungnam University
Internet Archives and Social Science Research - Yeungnam Universitymwe400
 
Open Linked Data as Part of a Government Enterprise Architecture
Open Linked Data as Part of a Government Enterprise ArchitectureOpen Linked Data as Part of a Government Enterprise Architecture
Open Linked Data as Part of a Government Enterprise ArchitectureJohann Höchtl
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVEUDAT
 
IN2N: Cross-institutional Authority Collaboration
IN2N: Cross-institutional Authority CollaborationIN2N: Cross-institutional Authority Collaboration
IN2N: Cross-institutional Authority CollaborationAlexander Haffner
 
WAPWG 16 Jan Thomson holdslide
WAPWG 16 Jan Thomson holdslideWAPWG 16 Jan Thomson holdslide
WAPWG 16 Jan Thomson holdslideSara Day Thomson
 
An introduction to Linked (Open) Data
An introduction to Linked (Open) DataAn introduction to Linked (Open) Data
An introduction to Linked (Open) DataAli Khalili
 
How Linked Data is transforming eGovernment
How Linked Data is transforming eGovernmentHow Linked Data is transforming eGovernment
How Linked Data is transforming eGovernmentNikos Loutas
 
Linked data introduction w exempel
Linked data introduction w exempelLinked data introduction w exempel
Linked data introduction w exempelKerstin Forsberg
 
OpenCalais At The San Diego Software Industry Council
OpenCalais At The San Diego Software Industry CouncilOpenCalais At The San Diego Software Industry Council
OpenCalais At The San Diego Software Industry CouncilKrista Thomas
 
A distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamA distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamEnno Meijers
 
INSPIRE - ensuring access or continuity of access?
INSPIRE - ensuring access or continuity of access?INSPIRE - ensuring access or continuity of access?
INSPIRE - ensuring access or continuity of access?Martin Donnelly
 
Recommendation to the EU Hearing on Access to and Preservation of Scientific ...
Recommendation to the EU Hearing on Access to and Preservation of Scientific ...Recommendation to the EU Hearing on Access to and Preservation of Scientific ...
Recommendation to the EU Hearing on Access to and Preservation of Scientific ...EDINA, University of Edinburgh
 

Tendances (20)

Overview of Open Data, Linked Data and Web Science
Overview of Open Data, Linked Data and Web ScienceOverview of Open Data, Linked Data and Web Science
Overview of Open Data, Linked Data and Web Science
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and Examples
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Open Data: a brief introduction
Open Data:  a brief introductionOpen Data:  a brief introduction
Open Data: a brief introduction
 
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
 
Internet Archives and Social Science Research - Yeungnam University
Internet Archives and Social Science Research - Yeungnam UniversityInternet Archives and Social Science Research - Yeungnam University
Internet Archives and Social Science Research - Yeungnam University
 
Linking Open Data
Linking Open DataLinking Open Data
Linking Open Data
 
Open Linked Data as Part of a Government Enterprise Architecture
Open Linked Data as Part of a Government Enterprise ArchitectureOpen Linked Data as Part of a Government Enterprise Architecture
Open Linked Data as Part of a Government Enterprise Architecture
 
Modeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROVModeling Data Life Cycles with PROV
Modeling Data Life Cycles with PROV
 
IN2N: Cross-institutional Authority Collaboration
IN2N: Cross-institutional Authority CollaborationIN2N: Cross-institutional Authority Collaboration
IN2N: Cross-institutional Authority Collaboration
 
WAPWG 16 Jan Thomson holdslide
WAPWG 16 Jan Thomson holdslideWAPWG 16 Jan Thomson holdslide
WAPWG 16 Jan Thomson holdslide
 
An introduction to Linked (Open) Data
An introduction to Linked (Open) DataAn introduction to Linked (Open) Data
An introduction to Linked (Open) Data
 
How Linked Data is transforming eGovernment
How Linked Data is transforming eGovernmentHow Linked Data is transforming eGovernment
How Linked Data is transforming eGovernment
 
Linked data introduction w exempel
Linked data introduction w exempelLinked data introduction w exempel
Linked data introduction w exempel
 
Badgerlink presentation
Badgerlink presentationBadgerlink presentation
Badgerlink presentation
 
OpenCalais At The San Diego Software Industry Council
OpenCalais At The San Diego Software Industry CouncilOpenCalais At The San Diego Software Industry Council
OpenCalais At The San Diego Software Industry Council
 
A distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamA distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics Amsterdam
 
INSPIRE - ensuring access or continuity of access?
INSPIRE - ensuring access or continuity of access?INSPIRE - ensuring access or continuity of access?
INSPIRE - ensuring access or continuity of access?
 
Recommendation to the EU Hearing on Access to and Preservation of Scientific ...
Recommendation to the EU Hearing on Access to and Preservation of Scientific ...Recommendation to the EU Hearing on Access to and Preservation of Scientific ...
Recommendation to the EU Hearing on Access to and Preservation of Scientific ...
 

En vedette

3rd weekly news
3rd weekly news3rd weekly news
3rd weekly newstisha
 
Com com(2011)0877 sv
Com com(2011)0877 svCom com(2011)0877 sv
Com com(2011)0877 svPeter Krantz
 
Öppna data - varför Chydenius vänder sig i sin grav
Öppna data - varför Chydenius vänder sig i sin gravÖppna data - varför Chydenius vänder sig i sin grav
Öppna data - varför Chydenius vänder sig i sin gravPeter Krantz
 
Poverenica Savet Za R Narodna Skupstina
Poverenica  Savet Za  R Narodna SkupstinaPoverenica  Savet Za  R Narodna Skupstina
Poverenica Savet Za R Narodna SkupstinaNebojsa Matic
 

En vedette (6)

3rd weekly news
3rd weekly news3rd weekly news
3rd weekly news
 
Towers of today
Towers of today Towers of today
Towers of today
 
Com com(2011)0877 sv
Com com(2011)0877 svCom com(2011)0877 sv
Com com(2011)0877 sv
 
Öppna data - varför Chydenius vänder sig i sin grav
Öppna data - varför Chydenius vänder sig i sin gravÖppna data - varför Chydenius vänder sig i sin grav
Öppna data - varför Chydenius vänder sig i sin grav
 
Poverenica Savet Za R Narodna Skupstina
Poverenica  Savet Za  R Narodna SkupstinaPoverenica  Savet Za  R Narodna Skupstina
Poverenica Savet Za R Narodna Skupstina
 
Fly Machine
Fly MachineFly Machine
Fly Machine
 

Similaire à Open Government Data on the Web - A Semantic Approach

Linked Open Data as Element of Public Administration Information Management
Linked Open Data as Element of Public Administration Information ManagementLinked Open Data as Element of Public Administration Information Management
Linked Open Data as Element of Public Administration Information ManagementJohann Höchtl
 
US EPA OSWER Linked Data Workshop 1-Feb-2013
US EPA OSWER Linked Data Workshop 1-Feb-2013US EPA OSWER Linked Data Workshop 1-Feb-2013
US EPA OSWER Linked Data Workshop 1-Feb-20133 Round Stones
 
Linked Open Government Data: What’s Next?
Linked Open Government Data:  What’s Next?Linked Open Government Data:  What’s Next?
Linked Open Government Data: What’s Next?Li Ding
 
Data Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information SystemsData Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information Systemsijceronline
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseRinke Hoekstra
 
Von Open Data zu Linked Open Data, M. Kaltenböck, SWC
Von Open Data zu Linked Open Data, M. Kaltenböck, SWCVon Open Data zu Linked Open Data, M. Kaltenböck, SWC
Von Open Data zu Linked Open Data, M. Kaltenböck, SWCMartin Kaltenböck
 
Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...
Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...
Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...Aravind Sesagiri Raamkumar
 
Synthesys Technical Overview
Synthesys Technical OverviewSynthesys Technical Overview
Synthesys Technical OverviewDigital Reasoning
 
Neuroinformatics_Databses_Ontologies_Federated Database.pptx
Neuroinformatics_Databses_Ontologies_Federated Database.pptxNeuroinformatics_Databses_Ontologies_Federated Database.pptx
Neuroinformatics_Databses_Ontologies_Federated Database.pptxJagannath University
 
Neuroinformatics Databases Ontologies Federated Database.pptx
Neuroinformatics Databases Ontologies Federated Database.pptxNeuroinformatics Databases Ontologies Federated Database.pptx
Neuroinformatics Databases Ontologies Federated Database.pptxJagannath University
 
Discovering Resume Information using linked data  
Discovering Resume Information using linked data  Discovering Resume Information using linked data  
Discovering Resume Information using linked data  dannyijwest
 
Dealing with Open Data in Istat
Dealing with Open Data in IstatDealing with Open Data in Istat
Dealing with Open Data in IstatGiovanni Barbieri
 
Ross Wilkinson - Data Publication: Australian and Global Policy Developments
Ross Wilkinson - Data Publication: Australian and Global Policy DevelopmentsRoss Wilkinson - Data Publication: Australian and Global Policy Developments
Ross Wilkinson - Data Publication: Australian and Global Policy DevelopmentsWiley
 
Delivering on Standards for Publishing Government Linked Data
Delivering on Standards for Publishing Government Linked DataDelivering on Standards for Publishing Government Linked Data
Delivering on Standards for Publishing Government Linked Data3 Round Stones
 
Sentimental classification analysis of polarity multi-view textual data using...
Sentimental classification analysis of polarity multi-view textual data using...Sentimental classification analysis of polarity multi-view textual data using...
Sentimental classification analysis of polarity multi-view textual data using...IJECEIAES
 
The web of data: how are we doing so far?
The web of data: how are we doing so far?The web of data: how are we doing so far?
The web of data: how are we doing so far?Elena Simperl
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstractsbutest
 

Similaire à Open Government Data on the Web - A Semantic Approach (20)

Linked Open Data as Element of Public Administration Information Management
Linked Open Data as Element of Public Administration Information ManagementLinked Open Data as Element of Public Administration Information Management
Linked Open Data as Element of Public Administration Information Management
 
US EPA OSWER Linked Data Workshop 1-Feb-2013
US EPA OSWER Linked Data Workshop 1-Feb-2013US EPA OSWER Linked Data Workshop 1-Feb-2013
US EPA OSWER Linked Data Workshop 1-Feb-2013
 
Linked Open Government Data: What’s Next?
Linked Open Government Data:  What’s Next?Linked Open Government Data:  What’s Next?
Linked Open Government Data: What’s Next?
 
Data Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information SystemsData Integration in Multi-sources Information Systems
Data Integration in Multi-sources Information Systems
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
Von Open Data zu Linked Open Data, M. Kaltenböck, SWC
Von Open Data zu Linked Open Data, M. Kaltenböck, SWCVon Open Data zu Linked Open Data, M. Kaltenböck, SWC
Von Open Data zu Linked Open Data, M. Kaltenböck, SWC
 
Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...
Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...
Proposal for Designing a Linked Data Migrational Framework for Singapore Gove...
 
Synthesys Technical Overview
Synthesys Technical OverviewSynthesys Technical Overview
Synthesys Technical Overview
 
Neuroinformatics_Databses_Ontologies_Federated Database.pptx
Neuroinformatics_Databses_Ontologies_Federated Database.pptxNeuroinformatics_Databses_Ontologies_Federated Database.pptx
Neuroinformatics_Databses_Ontologies_Federated Database.pptx
 
Neuroinformatics Databases Ontologies Federated Database.pptx
Neuroinformatics Databases Ontologies Federated Database.pptxNeuroinformatics Databases Ontologies Federated Database.pptx
Neuroinformatics Databases Ontologies Federated Database.pptx
 
Linked data migrational framework
Linked data migrational frameworkLinked data migrational framework
Linked data migrational framework
 
Discovering Resume Information using linked data  
Discovering Resume Information using linked data  Discovering Resume Information using linked data  
Discovering Resume Information using linked data  
 
Dealing with Open Data in Istat
Dealing with Open Data in IstatDealing with Open Data in Istat
Dealing with Open Data in Istat
 
Ross Wilkinson - Data Publication: Australian and Global Policy Developments
Ross Wilkinson - Data Publication: Australian and Global Policy DevelopmentsRoss Wilkinson - Data Publication: Australian and Global Policy Developments
Ross Wilkinson - Data Publication: Australian and Global Policy Developments
 
Delivering on Standards for Publishing Government Linked Data
Delivering on Standards for Publishing Government Linked DataDelivering on Standards for Publishing Government Linked Data
Delivering on Standards for Publishing Government Linked Data
 
Introduction abstract
Introduction abstractIntroduction abstract
Introduction abstract
 
Sentimental classification analysis of polarity multi-view textual data using...
Sentimental classification analysis of polarity multi-view textual data using...Sentimental classification analysis of polarity multi-view textual data using...
Sentimental classification analysis of polarity multi-view textual data using...
 
The web of data: how are we doing so far?
The web of data: how are we doing so far?The web of data: how are we doing so far?
The web of data: how are we doing so far?
 
Paper 28
Paper 28Paper 28
Paper 28
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
 

Plus de Peter Krantz

Öppna data – till vad och för vem
Öppna data – till vad och för vem Öppna data – till vad och för vem
Öppna data – till vad och för vem Peter Krantz
 
Great ux is no longer optional
Great ux is no longer optionalGreat ux is no longer optional
Great ux is no longer optionalPeter Krantz
 
Intern administration och dålig användbarhet - offentliga rummet
Intern administration och dålig användbarhet - offentliga rummetIntern administration och dålig användbarhet - offentliga rummet
Intern administration och dålig användbarhet - offentliga rummetPeter Krantz
 
E-legitimation tolv möjligheter att förbattra användbarheten
E-legitimation tolv möjligheter att förbattra användbarhetenE-legitimation tolv möjligheter att förbattra användbarheten
E-legitimation tolv möjligheter att förbattra användbarhetenPeter Krantz
 
Dom angående utlämnande av handlingar
Dom angående utlämnande av handlingarDom angående utlämnande av handlingar
Dom angående utlämnande av handlingarPeter Krantz
 
Lean och systemsyn i stat och kommun
Lean och systemsyn i stat och kommunLean och systemsyn i stat och kommun
Lean och systemsyn i stat och kommunPeter Krantz
 
Öppna data - öppen förvaltning intro
Öppna data - öppen förvaltning introÖppna data - öppen förvaltning intro
Öppna data - öppen förvaltning introPeter Krantz
 
REVIEW OF RECENT STUDIES ON PSI RE-USE AND RELATED MARKET DEVELOPMENTS
REVIEW OF RECENT STUDIES ON PSI RE-USE AND RELATED MARKET DEVELOPMENTSREVIEW OF RECENT STUDIES ON PSI RE-USE AND RELATED MARKET DEVELOPMENTS
REVIEW OF RECENT STUDIES ON PSI RE-USE AND RELATED MARKET DEVELOPMENTSPeter Krantz
 
Ändring av direktiv 2003/98/EG om vidareutnyttjande av information från den ...
Ändring av direktiv 2003/98/EG om vidareutnyttjande av information från den ...Ändring av direktiv 2003/98/EG om vidareutnyttjande av information från den ...
Ändring av direktiv 2003/98/EG om vidareutnyttjande av information från den ...Peter Krantz
 
ITRE - DRAFT AGENDA
ITRE - DRAFT AGENDAITRE - DRAFT AGENDA
ITRE - DRAFT AGENDAPeter Krantz
 
Öppen data status (Codemocracy)
Öppen data status (Codemocracy)Öppen data status (Codemocracy)
Öppen data status (Codemocracy)Peter Krantz
 

Plus de Peter Krantz (12)

Öppna data – till vad och för vem
Öppna data – till vad och för vem Öppna data – till vad och för vem
Öppna data – till vad och för vem
 
Great ux is no longer optional
Great ux is no longer optionalGreat ux is no longer optional
Great ux is no longer optional
 
Intern administration och dålig användbarhet - offentliga rummet
Intern administration och dålig användbarhet - offentliga rummetIntern administration och dålig användbarhet - offentliga rummet
Intern administration och dålig användbarhet - offentliga rummet
 
NORDLOD
NORDLODNORDLOD
NORDLOD
 
E-legitimation tolv möjligheter att förbattra användbarheten
E-legitimation tolv möjligheter att förbattra användbarhetenE-legitimation tolv möjligheter att förbattra användbarheten
E-legitimation tolv möjligheter att förbattra användbarheten
 
Dom angående utlämnande av handlingar
Dom angående utlämnande av handlingarDom angående utlämnande av handlingar
Dom angående utlämnande av handlingar
 
Lean och systemsyn i stat och kommun
Lean och systemsyn i stat och kommunLean och systemsyn i stat och kommun
Lean och systemsyn i stat och kommun
 
Öppna data - öppen förvaltning intro
Öppna data - öppen förvaltning introÖppna data - öppen förvaltning intro
Öppna data - öppen förvaltning intro
 
REVIEW OF RECENT STUDIES ON PSI RE-USE AND RELATED MARKET DEVELOPMENTS
REVIEW OF RECENT STUDIES ON PSI RE-USE AND RELATED MARKET DEVELOPMENTSREVIEW OF RECENT STUDIES ON PSI RE-USE AND RELATED MARKET DEVELOPMENTS
REVIEW OF RECENT STUDIES ON PSI RE-USE AND RELATED MARKET DEVELOPMENTS
 
Ändring av direktiv 2003/98/EG om vidareutnyttjande av information från den ...
Ändring av direktiv 2003/98/EG om vidareutnyttjande av information från den ...Ändring av direktiv 2003/98/EG om vidareutnyttjande av information från den ...
Ändring av direktiv 2003/98/EG om vidareutnyttjande av information från den ...
 
ITRE - DRAFT AGENDA
ITRE - DRAFT AGENDAITRE - DRAFT AGENDA
ITRE - DRAFT AGENDA
 
Öppen data status (Codemocracy)
Öppen data status (Codemocracy)Öppen data status (Codemocracy)
Öppen data status (Codemocracy)
 

Dernier

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Open Government Data on the Web - A Semantic Approach

  • 1. 1 Open Governmental Data on the Web: A Semantic Approach Julia Hoxha Institute of Applied Informatics and Formal Description Methods (AIFB) Karlsruhe Institute of Technology, Germany Julia.Hoxha@kit.edu Armand Brahaj FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany Armand.Brahaj@fiz-karlsruhe.de to knowledge representation. Abstract— Initiatives of making governmental data open are From a technical perspective, we refer as Open Data to any continuously gaining interest recently. While this presents sets of data which can be reused with no restrictions by any immense benefits for increasing transparency, the problem is that form of licensing or patents, data that are well structured and the data are frequently offered in heterogeneous formats, missing clear semantics that clarify what the data describes. The data are can be easily accessed and reused by institutions, scientist or displayed in ways that are not always clearly understandable to a the web community. broad range of user communities that need to make informed Being driven by benefits of the aforementioned paradigm, decisions. we aim to increase openness of data offered by different We address these problems and propose an overall approach, governmental institutions, and at the same time, provide means in which raw statistical data independently gathered from the of representing these data in more understandable ways. There different government institutions are formally and semantically are several challenges underlying our work, starting from the represented, based on an ontology that we present in this paper. way that data is originally provided by the sources. Data are We further introduce the approach we deploy in publishing these data in alignment with Linked Data principles, and present the mostly offered in heterogeneous formats, missing clear methods implemented to query single or combined dataset and semantics that clarify what the data describes, and displayed in visualize the results in understandable ways. The introduced ways that is not clearly understandable to a broad range user approach enables data integration, leading to vast opportunities communities that need to make informed decisions. These for information exchange, analysis on combined datasets, kinds of representation impede the integration of data coming simplicity to create mashups, and exploration of innovative ways from different sources, consequently leading to limited to use these data creatively. opportunities for information exchange, analysis on combined datasets, simplicity to create mashups, and exploration of innovative ways to use these data creatively. Keywords— linked open data, open governmental data, From a technical perspective, our work consists in applying ontology, semantic web, statistical metadata Semantic Web technologies to enable data integration among the different organizations and establish links to interconnect data together on the Web. Since a crucial issue when I. INTRODUCTION semantically modeling data is the preservation of their I NITIATIVESof making governmental data open are gaining great interest recently, with more countries constantly embracing the open data paradigm. Open Data is a new form meaning, we present an approach, which deploys Linked Open Data [linkedata.org] principles and introduces an ontology as foundation for the semantic representation of government of representation of the traditional public information, which statistical data. On the one hand, the ontology offers a formal, has been the basis of decision making in political, economical explicit descriptions of knowledge enabling in this way a or sociological activities for centuries. platform for integration, where data can be still interpreted The term open has been introduced in the late 2000th to correctly for valid research outcomes. On the other hand, generalize the accessibility of the information that can be Linked Data provides a technical basis for publishing, sharing public and easily accessible by anyone. The concept of and linking data on the web, based on standardized formats information is also generalized to the concept of data, by and interfaces. referring to raw sets of valuable information which can be In this paper, we present the ontology that enables formal, structured, analyzed and presented in different forms leading semantic representation of government statistical data. We further introduce the approach we deploy in publishing them
  • 2. 2 as linked data, and the methods implemented to query single or alignment with Linked Data principles, which we present in combined dataset and visualize the results. The semantic Section 3. approach and the ontology are described in details in Section Another important component in our approach is the Query 2, whereas Section 3 gives an overview of Linked Open Data Interface, which enables the user community as well as the and our work in alignment to those principles. Section 4 source institutions that offer these statistical data to access the presents the technical implementation of our approach knowledge base and pose queries upon it. This component focusing on the issues of integration and processing of data consists of an online graphical interface as well as a SPARQL2 from different sources and querying and visualization Endpoint, which we both present in Section 4. The results of a techniques for representing multiple datasets in a graph. We query may be displayed as structured Excel and RDF files to give an overview of related work in the field of open data and the users, but they are additionally visualized in ways that statistical vocabularies in Section 5, concluding in Section 6 make these data much more tangible and understandable. This with a summary of our work. approach provides open, structured and linked data to the different institutions, which are then able to use each-others datasets very easily. II. SEMANTIC APPROACH In our work, we are dealing with a large set of sources from A. ODA Ontology which statistical data is gathered. These sources are mainly The need to provide a unified representation for the different, websites of different government institutions, which offer data heterogeneous data gathered from the various sources, as well online in unstructured or semi-structured formats such as text as the necessity to provide formal semantics to these data, have documents, excel files or XML files. There are very few greatly motivated us to choose a semantic approach using an sources that can provide data structured in Entity-Relationship ontology, referred as ODA3 ontology, which we make model. available online4. While modeling the ontology, our goal was Our goal, after gathering the raw data independently from to follow linked data principles and reuse as much as possible the different sources, is to convert them to a representation existing well-known ontologies, but at the same time extending based on a unified structure and provide semantics for each of it with other important entities. the data entries. For this purpose, we have chosen to An ontology is defined as a “a formal, explicit specification semantically represent the data based on an ontology that we of a shared conceptualization” [1]. It provides a common have built, and then further serialize data as RDF triples understanding of the domain knowledge, increasing following Linked Data principles. In this section, we present communication capabilities between people and heterogeneous the modeled ontology as well as introduce our approach in applications. Using an ontology, we are able to formally publishing Linked Open Data. describe the semantics of terms and items of statistical datasets and give explicit meaning to the provided information. Besides sharing a common understanding, another direct benefit of a formal semantics is the support for machine-readability, which then enables automated reasoning, information integration and application of intelligent approaches such as semantic search. The objective of the modeled ontology is to provide a schema (also referred to as vocabulary) for the semantic representation of statistical data, which, on one side, is understandable and simple to be used from different institutions or organizations that offer statistical datasets, and on the other side, is consistent and complete enough to model all the aspects and entries in these datasets. A dataset is created as a container for the data entries found in a particular statistical file. Each dataset contains information about the date it was created, its creator and publisher, and a title. The dataset contains data entries, which represent the central component modeled in our ontology. A data entry has a fixed value and, among other possible dimensions, is measured for a particular year and a specific country. In a dataset, the data entries serve to measure particular Fig. 1. Overall semantic approach for publishing open governmental data indicators that monitor for example the socio-economic development of a country, demographic changes, public In our approach, the knowledge base consists of the health, developments in education, and many other aspects. Ontology Model and a storage for RDF files. The raw statistical data are converted to RDF1 using a converter in 2 http://www.w3.org/TR/rdf-sparql-query/ 3 ODA stands for Open Data Albania 1 4 Resource Description Framework, http://www.w3.org/RDF/ http://open.data.al/oda.owl
  • 3. 3 These numerous indicators, also organized among each-other URI: http://open.data.al/oda#Dataset in groups, are additionally assigned particular topics. Each Superclass: skos:Concept indicator is used to estimate and monitor the progress done in a particular aspect. Besides, a more complete view is provided when looking at the progress of all indicators within a topic. Class: oda:DataEntry DataEntry- a data entry dataset, represents a single piece of All these concepts are modeled in the ontology, which is data illustrated in Figure 2. It is encoded using the sublanguage URI: http://open.data.al/oda#DataEntry OWL DL of the Web Ontology Language (OWL)5, which goes Superclass: event:Event beyond other standards in its power to represent machine interpretable content on the Web. Moreover, OWL DL itself offers high expressiveness. Class: oda:Indicator Indicator - a statistical indicator that is being measured by the entries in the dataset URI: http://open.data.al/oda#Indicator Class: oda:Topic Topic - a theme to which the indicator and dataset belong URI: http://open.data.al/oda#Topic Class: oda:Dimension Dimension - a dimension of a statistical data entry URI: http://open.data.al/oda#Dimension Class: oda:Year Year- time dimension, represents the year to which a statistical data entry belongs URI: http://open.data.al/oda#Year Superclass: oda:Dimension Class: oda:Country Country- spatial dimension, represents the country to which a statistical data entry belongs URI: http://open.data.al/oda#Country Superclass: oda:Dimension Properties. The relations among the different individuals of the ontology classes are defined by using the properties construct, distinguishing between two main types: Object properties and Datatype properties. The Object properties link individuals to individuals, whereas the Datatype properties link individuals to data values. These are a few of the main Object properties modeled in our ontology: Property: oda:dataset Fig. 2. ODA Ontology dataset- relates a statistical data entry to a dataset Domain: oda:DataEntry Range: oda:Dataset We have modeled the ontology using Protégé6, a platform that offers support for ontology creation, visualization and Property: oda:indicator manipulation. We explain below the main entities that indicator- relates a statistical data entry to an indicator compose the modeled ODA ontology. Domain: oda:DataEntry Classes. The concepts and terms of the domain of Range: oda:Indicator knowledge are represented in an ontology as classes. Our ontology consists of the following main classes: Property: oda:subindicator Class: oda:Dataset subindicator- relates a parent indicator to other indicators Dataset - a statistical dataset, represents the container of a classified as its subcomponents (serves for grouping set of data indicators) Domain: oda:Indicator 5 http://www.w3.org/TR/owl-guide/ Range: oda:Indicator 6 http://protege.stanford.edu/
  • 4. 4 Property: oda:dimension period indicator- relates a statistical data entry to an indicator Ontology: Statistical Data and Metadata Exchange (SDMX) Domain: oda:DataEntry vocabulary10 Range: oda:Dimension Property: dc:title, dc:publisher, dc:creator, dc:date We have also defined a set of Datatype properties e.g. Ontology: Dublin Core Metadata Element Set11 oda:indicatorname or oda:topicname properties, which assign a literal value (in this case string) to the individuals of III. LINKING OPEN GOVERNMENTAL DATA the class Indicator and Topic, respectively. Making data freely available for everyone on the Web is further empowered when there are well-formed links Individuals. The classes of an ontology are instantiated, established among the items of the different data sources. specifying the concrete objects or individuals that belong to a Making data open and linked is the goal of Linked Open Data particular class. We have created a large number of topic paradigm [2], which promotes the ways of providing data that instances such as Social Development, Economic is freely accessible, verifiably, and connected to other data Development, Social Inclusion, Public Health, Education, etc. sources. We have related these topics to indicators, such that a topic There are four basic principles expected to be fulfilled when may have multiple indicators assigned to it. In order to aiming to publish linked data, in order to make these data formalize the topics consistently, we have also consulted the interconnected [3]: way statistical data is being organized into particular themes  Use Uniform Resource Identifiers (URIs) as names to from Eurostat7, the statistical office of the European Union. identify things The indicators are modeled after a thorough analysis of the  Use HTTP URIs so that people can refer to and look statistical data offered by the different governmental up (“dereference”) those names institution. They are organized in groups using the  When someone looks up a URI, provide useful oda:subindicator relation, forming this way a hierarchy of information using standards such as RDF, SPARQL parent-child indicators.  Include links to other URIs, so that more things and We have currently created in our ontology instances of more related information can be discovered on the Web than 100 indicators. Just to give an insight on what do these indicators consist, we mention below a few of them belonging From a technical perspective [5], the objective is to use to the topic of Economic Development: common standards and techniques to extend the Web by  GDP Indicators: GDP nominal per capita, GDP nominal publishing data as RDF (graph data format for representing growth rate, GDP per capita in PPS (for EU states) information on the Web), creating well-formatted RDF links  Budget Indicators: Expenditures (Total Expenditures, between the data items, and performing search on the data via Capital Expenditures, Current Expenditures, standardized languages such as SPARQL query language for Domestic/Foreign Financing Expenditures), Revenue, RDF. Deficit (Deficit Growth Rate, Deficit Financing), etc. Recognizing the many benefits of Linked Open Data, we  Inflation Indicators: monthly Inflation, yearly Inflation have also followed an approach that is in alignment with these rate. principles, illustrating it with the following example in Figure 3. The figure shows an excerpt of the RDF/XML Aiming to rigorously follow the Linked Data principles, representation of the information contained in a dataset on the explained in more details in Section 3, one of our goals while yearly inflation rates of different countries in Europe. modeling the ontology was to reuse existing vocabularies/ontologies. The following entities are used from <rdf:Description the ontologies that we have imported: rdf:about="http://open.data.al/yearlyinflationrate#dataset"> Class: event:Event <rdf:type rdf:resource="http://open.data.al/oda#Dataset"/> Event – An arbitrary classification of a space/time region, <dc:title>Inflation Rate per year and country</dc:title> <dc:publisher>International Monetary Fund (IMF)</dc:publisher> by a cognitive agent. An event may have actively <dc:creator>Open Data Albania (ODA)</dc:creator> participating agents, passive factors, products, and a <dc:date>2011-03-22T10:56:40+0000</dc:date> location in space/time. </rdf:Description> Ontology: The Event Ontology8 <rdf:Description rdf:nodeID="A0"> <oda:dataset Class: skos:Concept rdf:resource="/yearlyinflationrate#dataset"/> Ontology: Simple Knowledge Organization System <oda:indicator (SKOS)9 rdf:resource="http://open.data.al/oda#YearlyInflationRate"/> <oda:country rdf:resource="http://dbpedia.org/resource/Greece"/> Property: sdmx-measure:obsValue <oda:year rdf:resource="http://dbpedia.org/resource/2002"/> obsValue- The value of a particular variable at a particular <sdmx-measure:obsValue rdf:datatype="http://www.w3.org/2001/XMLSchema#double"> 7 http://epp.eurostat.ec.europa.eu/ 8 10 http://motools.sourceforge.net/event/event.html http://www.sdmx.org/ 9 11 http://www.w3.org/2004/02/skos/ http://dublincore.org/documents/dces/
  • 5. 5 3.498 </sdmx-measure:obsValue> of records and examples of visual representation of datasets <rdf:type rdf:resource="http://open.data.al/oda#DataEntry"/> generated by visitors or editors of the website. </rdf:Description> A visualization service could be delivered through the site and could include analytics, graphics, charting, and other ways Fig. 3. Excerpt of a Dataset in RDF/XML Representation of using the data. The enhanced visualizations is built on top of published APIs in collaboration with third party open Based on the aforementioned principles, well-formed URI source applications (Fig. 4). references have been used in our approach to identify resources such as dataset, indicator, country, year, etc. We have invested on a technical infrastructure to make the URIs dereferenceable, making it able to retrieve further an RDF/XML description and various classifications of the identified resource when accessing its URI. RDF links to resources from other data sources are established, enabling visitors and user agents to navigate the Web of data. Examples include direct links to DBpedia12 for resources such as country and year, or references via the rdfs:seeAlso property for the Indicators, in the case when the link exists, e.g. the indicator “GDP Nominal per Capita” would have a link to the respective resource13 in DBpedia. Reusability of existing vocabularies is also a requirement that tends to facilitate data processing from different client applications. Accordingly, terms from well-known ontologies Fig. 4. Data layers and tools implemented have been reused, but we have certainly complemented these vocabularies with additional concepts for representing the Our work consists of two stages. The first stage is the statistical data. Particularly important to mention in our aggregation of the data and the processing of linked data approach are the concept and taxonomy of indicators as well structures. The second stage consists on providing a set of as topics, which are additionally modeled in the ontology and tools that can make use of these data. As an example of the defined in our own namespace. Common constructs for technology that can be used, a set of visualization providing mappings to other ontologies are rdfs:subClassOf representation are generated. The visual representations are or rdfs:subPropertyOf. As already shown in the presented published together with a comment providing some additional ontology above, we have modeled a set of terms as subclasses information on the dataset retrieved. The operations from the of concepts (same as for properties) belonging to other aggregation of data up to the publishing of examples goes vocabularies. through the following main steps. The linked data on the web is primarily intended to be processed by machines, but there is an increasing interest from the human visitors to access and navigate through the data. A. Aggregation of the information Therefore, we have aimed towards an approach that is Data is aggregated from periodical governmental agency convenient for both parts. On the one hand, human readability reports. The format of these reports is most of the time of the data is facilitated via provision of textual information provided in Pdf files and sometimes in spreadsheet format. All (possibly in different languages) in forms of comments and the information is stored in an offline repository for original labels, using respectively the properties rdfs:comment and archiving purposes. rdfs:label. On the other hand, our objective is to ease machine B. Data Processing interpretability, stating important information explicitly, such as specification of domain, range and cardinality in our The information gathered in the aggregation step is semi- ontology. We provide information in such a way that avoids automatically processed to a well-organized Linked Data inconsistencies in data representation, but at the same time structure. In order to respond to all the ontologies on the allows flexibility of the schema to be reused by other parties system, a set of spreadsheet templates is generated. The for data modeling. information is imported into the spreadsheets and then automatically transferred into RDF structure through a tool IV. TECHNICAL IMPLEMENTATION referred to as ODA Import to RDF. The RDF datasets are stored in a CKAN repository, which We aim to provide access to all the citizens, technically is made public and can be accessed via the CKAN web inclined or not, to discover structured data, and reuse them. To interface and CKAN API. serve up these data sets, the ODA website accesses a catalog Beside the interface of CKAN, the datasets are also stored in the ODA repository, which is directly connected to an API. 12 http://dbpedia.org/ The Open Data API is a RESTful, service-oriented platform 13 http://dbpedia.org/property/gdpNominalPerCapita
  • 6. 6 that allows developers to easily access datasets and create data.addColumn('number','Year'); data.addColumn('number','Value'); independent services through these calls. REST uses the HTTP data.addRows([ protocol and, as such, requests use the common URL format. ["Albania",2010,25], ["Albania",2011,25], The API provides simple methods that developers can use to ["Croatia",2010,37], tap into the functionality and rich datasets, and gather ["Croatia",2011,37], information, in JSON or XML format, related to different ["Greece",2010,39], ["Greece",2011,42] indicators and topics. ]); All requests on the ODA API should consist of a host name ("http://api.open.data.al "), a path that contains an object and The result provided by the SPARQL Endpoint is modified an action (e.g. "/indicator/info" or "/indicator/list"), and a by a php script to generate the Data Table required by the query string that contains the method parameters (e.g. Visualisation API (See Fig 5). "apiKey", "term", "indicatorId", etc.). The URL template of a request looks like: http://api.open.data.al/object/action?apiKey =myKey&param1=value1&param2=value2 Another way of accessing the information from the RDF datasets is through a SPARQL endpoint. The SPARQL endpoint is protocol service as defined in the SPROT 14 specification. The SPARQL endpoint enables users to query a knowledge base via the SPARQL language. The common result from a SPARQL endpoint is an XML-encoded SPARQL results. In order to access third party Visualization tools, we generate the result sets by default in JSON format. C. Graphical Visualization One of the values of making data available is that it enables anyone to construct nice visualizations over the data. This is particularly useful for public sector information since it can Fig. 5 Motion Chart generated through SPARQL query provide feedback on how effective particular policies have been. It is important that the generation of visual graphs should be D. Community contributions done on the fly as soon as result-sets are retrieved by the In order to provide a clear example of how the system will SPARQL Endpoint. For this reason, a set of open source visual be used, a set of starting articles that based on datasets and application interfaces were evaluated and we chose to work generated graphs is created. An article consists of the with Google Visualization API15. representation of one or more graphs with a respective In order to generate graphs on the fly the following elements description. were needed: The graphic representation of the data assists the audience to  A RDF document: in order to present your data through draw concrete and logical conclusions on the progress or Google Visualization, a source of RDF document is regress of certain indicators that influence socio-economic needed. activity in a domain that the dataset covers. These articles are A SPARQL query endpoint: a Sparql query endpoint is examples that the community may follow to contribute with needed to process your queries and get the results more content, which can be generated by any visitor based on http://api.talis.com/stores/rdfquery-dev1/services/sparql various self-generated graph representations.  Google Visualization: http://code.google.com/apis/visualization/ V. RELATED WORK Google Visualization API relies heavily on Data Table, a The initiatives of making government data open are gaining table arranged in typed columns. An instance of this class is a lot of interest recently. While more countries are embracing the result of a request to a data source. A Data Table can be the Open Government paradigm, among the researchers rendered in many ways: JSON, HTML, CSV. working with those data there is an increasing awareness in In order to feed the Visualization API a similar code should be using semantic techniques to represent them. Two of the key generated by the application. leading players that promote the openness of government data combined with the deployment of Semantic Web technologies var data = new google.visualization.DataTable(); are the US government[7] and the UK government[6]. data.addColumn('string','Country'); 14 SPARQL Protocol for RDF http://www.w3.org/TR/rdf-sparql-protocol/ 15 http://code.google.com/apis/charttools/index.html
  • 7. 7 1617 In the respective portals manifesting the outcomes of VI. CONCLUSION their work, there are numerous data catalogs and respective Nowadays, there is a lot of attention to Open Data projects metadata published, many of the datasets offered as RDF in many countries and cities all over the world. It is a common triples. Data.gov has made a number of its datasets available as understanding today that Open Data is simply 'data on the Linked Data, altogether summing up to 6.5 billion triples. The web,' whereas Linked Data is a 'web of data'. goal of these projects is to improve access to the data In this paper we address the problem of data provided in generated by the government, increase the ability of the people heterogeneous formats, omitting semantics that clarify what to find desired data, so that they can be further used in various the data describes. Our approach is based on a semantic creative ways. In order to access the data, they offer tools to ontology providing semantic description of all the data, which search by keywords, category and agency. Detailed statistics on the EU and candidate countries are are further published in alignment with Linked Data principles. provided by Eurostat[9], which is continuously publishing The paper offers also an approach for the provision of statistical data in tabular formats, but not in RDF. Unlike other visualization services deriving from RDF result sets. This is portals, its website provides a clear structure for organizing the achieved by querying the RDFs with SPARQL syntax and datasets by themes and by their respective indicators, but this passing the result to an open-source visualization API. structure is HTML-based and not semantically described. The ontology created for the ODA and the supporting tools Similar to these projects, our approach provides ways to built on top of it describe a method of publishing structured easily access government data. Unlike them, it concentrates on data, so that it can be interlinked and become more useful to promoting a well-defined, unified schema for the semantic the community. representation of data, in order to increase interoperability and integration among the different providers. A novel aspect of our work consists in providing a semantic representation of the different indicators that can be measured using the datasets, as REFERENCES well as the topics to which they belong. This offers for any [1] Studer, R., Benjamins, R.V., Fensel, D.: Knowledge engineering: specific area a complete picture, making it easier to monitor Principles and methods. Data and Knowledge Engineering, 25, 161–197 the progress, make estimations or take a comparative (1998). approach. [2] Bizer, C., Heath, T., and Berners-Lee, T. (2009). Linked Data - The Story So Far. International Journal on Semantic Web and Information From the semantic web perspective, there exist a limited Systems (IJSWIS), Vol. 5(3), Pages 1-22. DOI: number of vocabularies for representing statistical data such as 10.4018/jswis.2009081901. SCOVO [4] and the RDF Data Cube Vocabulary18. While [3] Berners-Lee, T. (2006). Linked Data - Design Issues. Retrieved 2010- SCOVO does not provide semantics about the items that the 10-25 from http://www.w3.org/DesignIssues/LinkedData.html [4] Hausenblas, M., Halb, W., Raimond, Y., Feigenbaum, L., and Ayers, D. statistical data describe, the Data Cube vocabulary is more (2009). SCOVO: Using Statistics on the Web of Data. Proceedings of extended and allows modeling of complex data structures and the 6th European Semantic Web Conference on the Semantic Web: sets, as well as ways to link data with other sources. An Research and Applications (Heraklion, Crete, Greece, 2009). Pages 708- approach that uses the Data Cube vocabulary to model 722. [5] Bizer, C., Cyganiak, R., and Heath, T. (2007). How to publish Linked statistical data and publish them as Linked Data is the Eurostat Data on the Web. Retrieved 2010-10-25 from http://www4.wiwiss.fu- wrapper[8]. Another approach uses this vocabulary to berlin.de/bizer/pub/LinkedDataTutorial/ semantically represent statistical data from social sciences, [6] J. Sheridan and J. Tennison. Linking UK government data. Linked Data focusing on the capabilities of performing statistical on the Web (LDOW2010) Workshop at WWW2010, April 2010 calculations and analysis on linked data from heterogeneous [7] L. Ding, D. DiFranzo, A. Graves, J. R. Michaelis, X. Li, D. L. McGuinness, and J. Hendler. Data-gov wiki: Towards linking and distributed datasets. government data. In AAAI Spring Symposium on Linked Data Meets While our work encompasses more aspects such as mining of AI, 2010. the statistical data, converting and publishing as linked data, [8] Zapilko - Enriching and Analysing Statistics with Linked Open Data data aggregation and integration, as well as search and [9] Eurostat Wrapper - the experimental Eurostat wrapper to RDF http://estatwrap.ontologycentral.com/ visualization modules, the underlying semantic approach aims at providing semantics of these data based on an ontology, similar to the Data Cube vocabulary. While this vocabulary presents a core foundation for describing data, our ontology is designed to accommodate extended semantics of government- oriented statistical data. Furthermore, we tend to reuse components of various vocabularies, adhering in this way to linked data principles. 16 Data.gov 17 Data.gov.uk 18 http://publishing-statistical- data.googlecode.com/svn/trunk/specs/src/main/html/cube.html