Open Source Tools for Creating Mashups with
           Government Datasets

        Mohammed Firdaus, Muhd Sharuzzamal Bakri


                                June 29, 2010




Mohammed Firdaus, Muhd Sharuzzamal Bakri   Open Source Tools for Creating Mashups with Government Datas

About the Speakers




     Mohammed Firdaus bin Mohammed Ab Halim
     (@firdaus_halim) and Muhd Sharuzzamal Bakri (@amai)
     Founders of Persada Terbilang Sdn Bhd - We have no
     relationship whatsoever to any fertilizer supplier





Mashups



     A mashup is a web page or application that uses and
     combines data, presentation or functionality from two or
     more sources to create new services.
     (Source: Wikipedia)

     Data mashups combine similar types of media and
     information from multiple sources into a single
     representation.
     (Source: Wikipedia)





Data Sets are Not Available in Machine Readable Form




     Searches such as these turn up nothing useful:
            filetype:csv site:.gov.my
            filetype:xml site:.gov.my
            filetype:rdf site:.gov.my
     We have to resort to web scraping.
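Web scraping here mostly means pulling rows out of HTML tables. A minimal sketch using only Python's standard-library `html.parser`, assuming the data sits in ordinary `<tr>`/`<td>` cells; the sample markup is made up and is not taken from any government site. Real pages will usually need site-specific handling (pagination, nested tables, character encodings).

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so multi-line cells become one string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)


def scrape_rows(html):
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup, for illustration only.
SAMPLE = """
<table>
  <tr><th>Nombor Tender</th><th>Kementerian</th></tr>
  <tr><td>T-001/2009</td><td>MINISTRY
      A</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_rows(SAMPLE):
        print(row)
```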





No Data Dictionaries



      Since the data sets that are available were meant for humans
      to consume rather than machines, they are usually published
      without any kind of data dictionary.
      This means that an application developer has to make
      assumptions about the structure of each field, e.g. whether it's
      unique, whether it's a multi-value field, and which fields are
      mandatory or optional.
      These assumptions may or may not turn out to be correct as you
      see more and more data in the data set.
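Rather than fixing these assumptions once, it can help to re-check them every time the data is loaded. A minimal sketch that reports, per field, how many values are empty and whether the non-empty values have all been distinct so far; the field names and values in the usage example are hypothetical, and detecting multi-value fields is not attempted here.

```python
def profile_fields(rows):
    """Summarise each field across `rows` (an iterable of dicts of strings, e.g. from csv.DictReader)."""
    missing = {}
    values = {}
    for row in rows:
        for field, value in row.items():
            value = (value or "").strip()
            missing.setdefault(field, 0)
            values.setdefault(field, [])
            if value:
                values[field].append(value)
            else:
                missing[field] += 1

    report = {}
    for field in missing:
        seen = values[field]
        report[field] = {
            "missing": missing[field],
            # "so far": both answers can change as more data arrives.
            "mandatory_so_far": missing[field] == 0,
            "unique_so_far": len(seen) == len(set(seen)),
        }
    return report


if __name__ == "__main__":
    sample = [
        {"tender_no": "T-1", "category": "Bekalan"},
        {"tender_no": "T-1", "category": ""},
        {"tender_no": "T-2", "category": "Perkhidmatan"},
    ]
    for field, stats in profile_fields(sample).items():
        print(field, stats)
```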





New Data Sets Constantly Become Available




     This is not a bad thing.
     However, our code, database and schema must be flexible
     enough to deal with future data sets that we might want to
     use in our applications.





Lack of Standards Across Agencies




     Different agencies use different identifiers to refer to the same entity.
     The lack of common identifiers makes it tedious to combine
     data sets that may be describing the same entity.
     MyCoID and MyID are steps in the right direction.
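Until common identifiers are used everywhere, records from different agencies have to be matched on something weaker, such as a normalised company name. A minimal sketch; the normalisation rules are our own assumptions, not an official standard, and name matching alone will produce some false matches, so a shared key is best treated as a candidate link to review rather than a proven identity.

```python
import re


def name_key(name):
    """Build a crude matching key from a company name (assumed rules, for illustration)."""
    key = name.upper()
    key = re.sub(r"\(.*?\)", " ", key)    # drop bracketed codes such as "(A002)"
    key = re.sub(r"[^A-Z0-9 ]", " ", key)  # drop punctuation, e.g. "SDN. BHD." -> "SDN  BHD "
    key = " ".join(key.split())            # collapse runs of whitespace
    key = re.sub(r"\bSENDIRIAN BERHAD\b", "SDN BHD", key)
    key = re.sub(r"\bBERHAD\b", "BHD", key)
    return key


if __name__ == "__main__":
    # Hypothetical company names written two different ways.
    print(name_key("Example Technology Sdn. Bhd. (A001)"))
    print(name_key("EXAMPLE TECHNOLOGY SENDIRIAN BERHAD"))
```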





In Summary




     Because of these challenges, we need an agile method for
     modeling, storing and processing these government datasets in
     our application.
     The purpose of this presentation is to show how representing
     your data as a graph both helps you deal with these challenges
     and helps you build compelling data mashups.





What is a Graph?




     A data structure that consists of a collection of vertices and
     the connections between those vertices, called edges.
     Vertices are sometimes called nodes or dots.
     Edges are sometimes called relationships, links or arcs.
     The terminology differs between software packages.
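In code, the simplest representation is an adjacency list: a map from each vertex to the set of vertices it is connected to. A minimal, library-free sketch of an undirected graph; the class name and vertex names are our own, for illustration only. Using sets means adding the same edge twice has no effect, which is what makes this a simple graph.

```python
from collections import defaultdict


class UndirectedGraph:
    """Vertices plus undirected edges, stored as an adjacency list."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        # An undirected edge is recorded from both ends.
        self.adj[u].add(v)
        self.adj[v].add(u)

    def neighbours(self, v):
        return self.adj.get(v, set())

    def vertices(self):
        return set(self.adj)


if __name__ == "__main__":
    g = UndirectedGraph()
    g.add_edge("A", "B")
    g.add_edge("B", "C")
    print(sorted(g.neighbours("B")))
```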





Types of Graphs


     A directed graph (or digraph) is one where the edges have a
     direction (i.e. each edge goes from a source vertex to a target vertex).
     A multigraph is one where multiple edges can exist between
     two vertices.
     An edge-labeled graph is a graph where edges have labels.
     Similarly, a vertex-labeled graph is one in which the vertices
     have labels.
     An attributed graph is one in which the vertices and edges can
     have attributes (key-value pairs).
     A graph can have more than one of these properties, e.g. a
     multi-digraph is one in which multiple directed edges can exist
     between two vertices.
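A multi-digraph can be kept as a plain list of (source, target, label) edges: direction comes from the order of each pair, and nothing stops the same pair from appearing more than once. A minimal sketch with made-up vertex names; the linear scans keep it short but would be slow on large graphs.

```python
class MultiDiGraph:
    """Directed edges kept in a list, so parallel edges between the same pair survive."""

    def __init__(self):
        self.edges = []  # (source, target, label) tuples

    def add_edge(self, source, target, label=None):
        self.edges.append((source, target, label))

    def out_edges(self, v):
        return [e for e in self.edges if e[0] == v]

    def in_edges(self, v):
        return [e for e in self.edges if e[1] == v]


if __name__ == "__main__":
    web = MultiDiGraph()
    web.add_edge("page-a", "page-b", "link")
    web.add_edge("page-a", "page-b", "link")  # a second link between the same pages
    web.add_edge("page-b", "page-a", "link")
    print(len(web.out_edges("page-a")), len(web.in_edges("page-a")))
```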



Types of Graphs - Simple/Undirected Graphs





Types of Graphs - Directed Graph





Types of Graphs - Edge and Node Labeled Graph





Types of Graphs - Multigraph





Types of Graphs - Attributed Multigraph





Examples - Social Graphs




                     Source: http://www.flickr.com/photos/greenem/11696663/


     Undirected Graph - Vertices represent people and edges
     represent friendship.

Examples - Web Graph




                 http://en.wikipedia.org/wiki/File:WorldWideWebAroundWikipedia.png


     Multi-digraph - Vertices represent web pages and directed
     edges represent links between pages.

Property Graphs



     ’Property graph’ is another term for attributed labeled
     multi-digraph.
     Property graphs are flexible enough to support most types of
     graph data. Other types of graphs (with the exception of
     hypergraphs) can be built on top of property graphs by
     removing features or using features of the property graph in
     certain ways.
     The tools that we are covering in this presentation deal
     primarily with property graphs.
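A property graph can be sketched in a few lines: every vertex and edge gets an id and a dict of attributes, and every edge also has a label and a direction. A minimal in-memory sketch; the class and method names are our own and are not the API of any of the tools covered in this talk.

```python
import itertools


class PropertyGraph:
    """Attributed, edge-labelled multi-digraph kept in memory (illustrative only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.vertices = {}  # vertex id -> attribute dict
        self.edges = {}     # edge id -> (out vertex id, label, in vertex id, attribute dict)

    def add_vertex(self, **attrs):
        vid = next(self._ids)
        self.vertices[vid] = dict(attrs)
        return vid

    def add_edge(self, out_v, label, in_v, **attrs):
        eid = next(self._ids)
        self.edges[eid] = (out_v, label, in_v, dict(attrs))
        return eid

    def out_edges(self, v, label=None):
        return [e for e in self.edges.values()
                if e[0] == v and (label is None or e[1] == label)]

    def in_edges(self, v, label=None):
        return [e for e in self.edges.values()
                if e[2] == v and (label is None or e[1] == label)]
```

Usage with hypothetical values: `t = g.add_vertex(type="tender", no="T-1")`, `m = g.add_vertex(type="ministry", name="MINISTRY A")`, then `g.add_edge(t, "awarded_by", m)`.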





Property Graphs




         Source: http://wiki.github.com/tinkerpop/gremlin/defining-a-property-graph




Treasury - Tenders Awarded




       Source: http://myprocurement.treasury.gov.my/index.php/en/list-keputusan-tender




Fields


         Tajuk Tender (Title of Tender)
         Nombor Tender (Tender Number)
         Kategori Perolehan (Procurement Category)
         Kementerian (Ministry)
         Petender Berjaya (Winner of Tender)
         No Pendaftaran Dengan ROB/ROS/ROC (Registration
         Number with ROB/ROS/ROC)
         No Pendaftaran Dengan MOF/PKK (Registration Number
         with MOF/PKK)
         Harga Setuju Terima (Agreed Upon Value)
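When loading the dump it helps to map these Malay headers to short, code-friendly keys once, at the edge of the program. A sketch using the standard `csv` module; the exact header spellings in the CSV are an assumption, and the English key names are our own choice (they mirror the variable names used in the Neo4j code later in the talk).

```python
import csv
import io

# Assumed CSV header -> key mapping; adjust to the headers actually present in the dump.
FIELD_KEYS = {
    "Tajuk Tender": "tender_title",
    "Nombor Tender": "tender_no",
    "Kategori Perolehan": "tender_category",
    "Kementerian": "ministry",
    "Petender Berjaya": "entity_name",
    "No Pendaftaran Dengan ROB/ROS/ROC": "entity_no",
    "No Pendaftaran Dengan MOF/PKK": "entity_mof_no",
    "Harga Setuju Terima": "tender_value",
}


def read_tenders(f):
    """Yield one dict per CSV row, with keys renamed via FIELD_KEYS and values stripped."""
    for row in csv.DictReader(f):
        yield {FIELD_KEYS.get(k, k): (v or "").strip()
               for k, v in row.items() if k is not None}  # skip overflow columns


if __name__ == "__main__":
    sample = io.StringIO("Nombor Tender,Kementerian\nT-1,MINISTRY A\n")
    for row in read_tenders(sample):
        print(row)
```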




Code and Data in Machine Readable Form




     For this presentation we are using data that we scraped from
     this site on 2010-04-26.
     The source code for our scraper and the CSV dump from
     2010-04-26 are available at
     http://mfirdaus.com/mosc-paper/
     The dump contains 2615 records.





The Dump





Missing Fields




  Out of the 2615 records in the dump
      510 records were missing a tender number
      472 records were missing a category
      1836 records were missing a ROB/ROS/ROC number
      510 records were missing a MOF number





Tender Numbers are Not Unique


     32 records have the same tender number and title as another
     record.
     23 records have the same tender number as another record.
     In some cases these appear to be duplicate records, since all
     the fields match.
     In other cases, one or two fields are slightly different,
     indicating that there was probably a typo (and the erroneous
     record was not deleted).
     In some cases, the other fields are completely different, which
     leads us to think that a tender can have multiple winners
     (this still needs to be verified by government officials).
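These cases can be separated mechanically before anyone checks them by hand: group records by tender number, then count how many other fields differ. A minimal sketch over dict rows; the three buckets mirror the description above, but the "at most two differing fields" threshold and the field names are our own assumptions.

```python
from collections import defaultdict


def classify_shared_tender_numbers(rows, key="tender_no"):
    """Sort tender numbers shared by several rows into exact, near and different buckets."""
    groups = defaultdict(list)
    for row in rows:
        if row.get(key):  # rows without a tender number cannot be grouped
            groups[row[key]].append(row)

    result = {"exact": [], "near": [], "different": []}
    for no, group in groups.items():
        first = group[0]
        for other in group[1:]:  # compare every later row with the first one
            fields = set(first) | set(other)
            diffs = [f for f in fields if first.get(f) != other.get(f)]
            if not diffs:
                result["exact"].append(no)      # likely a duplicate record
            elif len(diffs) <= 2:
                result["near"].append(no)       # likely a typo that was re-entered
            else:
                result["different"].append(no) # possibly several winners
    return result
```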



Format of Tender Numbers



  Examples of tender numbers:
      8/2009
      PL.(T).08.2009(JKP)
      X0141110101090021
      128/2009
      KBS.S.4-14/69 (T.26/2009)
  Probably not a good idea to write code that attempts to parse the
  tender number.





Format of the "Petender Berjaya" Field


      SYARIKAT PROSPECTRUM SDN BHD
      TELEKOM SMART SCHOOL SDN BHD NO.45-8, LEVEL 3,
      BLOCK C, PLAZA DAMANSARA, JALAN MEDAN SETIA
      1, BUKIT DAMANSARA 50490 KUALA LUMPUR
      1. GLOBAL AEROSPACE SDN BHD (A002) 2. SYSTEM
      ALLIANCE TECHNOLOGY SDN. BHD.(A003) 3. KARISMA
      WIRA SDN. BHD. (A004) 4. KESUMA TECHNOLOGY
      SDN. BHD (A005)
      A QUALITY REPUTATION SDN BHD B PRIMABUMI SDN
      BHD
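Of these formats, only the numbered list is easy to split reliably. A minimal sketch that splits on "1.", "2.", ... markers and otherwise returns the field unchanged; the address form and the lettered ("A ... B ...") form are deliberately not handled, since a naive lettered split would also cut names that happen to contain a standalone "A" or "B". The function name is hypothetical.

```python
import re

# A list marker: start of string or whitespace, then digits, a dot and whitespace.
NUMBERED_ITEM = re.compile(r"(?:^|\s)\d+\.\s+")


def split_winners(field):
    """Split a numbered "1. X 2. Y" winner field; return other formats as a single item."""
    field = field.strip()
    if not re.match(r"1\.\s", field):
        return [field]
    parts = NUMBERED_ITEM.split(field)
    return [p.strip() for p in parts if p.strip()]
```

With the address example, the whole string (company name plus address) comes back as one item, so the address still has to be separated some other way.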





Modeling this Data Set as a Property Graph

  One way to model this data as a graph:
      Vertices represent tenders, ministries and
      companies/businesses.
      An "awarded_by" labeled edge associates a tender with a
      ministry.
      An "awarded_to" labeled edge associates a tender with the
      winner of the tender (the company/business).
      Attributes on tender vertices hold the tender title, number,
      value and category.
      Attributes on company/business vertices hold the
      company/business name, ROB/ROC/ROS registration
      number and MOF registration number.
      An attribute on ministry vertices holds the name of the ministry.
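The same model in a library-free form: a minimal sketch that turns one cleaned row into vertices and edges, reusing a ministry or company vertex when its name has been seen before. The dict-based storage, the function name and the row keys are hypothetical. Companies are keyed on the exact name string here; in practice a normalised key may work better, and the first row seen for a company decides its registration numbers.

```python
def add_tender_row(graph, row):
    """Add one tender row to `graph`, a dict with 'vertices' (id -> attrs) and 'edges' (list)."""
    def get_or_create(vid, **attrs):
        # Ministries and companies are keyed by name so repeated rows share one vertex.
        if vid not in graph["vertices"]:
            graph["vertices"][vid] = dict(attrs)
        return vid

    ministry = get_or_create(("ministry", row["ministry"]),
                             type="ministry", name=row["ministry"])
    entity = get_or_create(("business_entity", row["entity_name"]),
                           type="business_entity", name=row["entity_name"],
                           no=row["entity_no"], mof_no=row["entity_mof_no"])

    # Tender numbers are not unique, so every row gets its own tender vertex.
    # The vertex count only grows, so these ids never repeat.
    tender = ("tender", len(graph["vertices"]))
    graph["vertices"][tender] = {"type": "tender", "no": row["tender_no"],
                                 "title": row["tender_title"],
                                 "category": row["tender_category"],
                                 "value": row["tender_value"]}
    graph["edges"].append((tender, "awarded_by", ministry))
    graph["edges"].append((tender, "awarded_to", entity))
    return tender
```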


Example





Neo4j


        Neo4j is a graph database: it persists data in graph form.
        Property graph data model, with the exception of vertex labels.
        In Neo4j terms, vertices are nodes, edges are relationships and
        attributes are properties.
        Property values can be a String or any Java primitive (arrays
        of these types are supported as well).
        Licensed under the AGPLv3, which basically means that you
        don't need a commercial license if your application is released
        under a compatible free software license.
        For other uses, you need a commercial license from Neo Technology.




Neo4j



        Written in Java.
        Bindings available for Python, Ruby, Clojure, Erlang, Groovy,
        Scala and PHP.
        We will be using the Python bindings in this talk.
        An embedded database, meaning that it runs in the same
        process space as the application.
        There’s a standalone REST server for those who prefer it.





Initializing the Database




  import neo4j

  db = neo4j.GraphDatabase("db")





Creating the Nodes




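  # ministry, entity_name, entity_no, entity_mof_no, tender_no,
  # tender_title, tender_category and tender_value hold the current row's fields
  ministry_node = db.node(name=ministry, type="ministry")

  entity_node = db.node(name=entity_name, no=entity_no,
         mof_no=entity_mof_no, type="business_entity")

  tender_node = db.node(no=tender_no, title=tender_title,
        category=tender_category, value=tender_value,
        type="tender")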





Creating the Relationships




  tender_node.awarded_by(ministry_node)
  tender_node.awarded_to(entity_node)





Indexing Nodes


  ministries = db.index("ministries", create=True)
  business_entities = db.index("business_entities",
        create=True)
  tenders_by_no = db.index("tenders_by_no", create=True)
  tenders_by_title = db.index("tenders_by_title", create=True)


  # Ministries and business entities are looked up by name later on
  ministries[ministry] = ministry_node
  business_entities[entity_name] = entity_node
  tenders_by_no[tender_no] = tender_node
  tenders_by_title[tender_title] = tender_node
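Since tender numbers turned out not to be unique, an index keyed on them should arguably return every matching node rather than just one. A minimal plain-Python sketch of such a multi-valued index; `MultiIndex` is a hypothetical helper, not part of the Neo4j bindings, and this sketch makes no claim about how the Neo4j index handles a key that is assigned twice.

```python
from collections import defaultdict


class MultiIndex:
    """Key -> list of items; assigning to an existing key appends instead of overwriting."""

    def __init__(self):
        self._items = defaultdict(list)

    def __setitem__(self, key, item):
        self._items[key].append(item)

    def __getitem__(self, key):
        # Return a copy so callers cannot modify the index by accident.
        return list(self._items.get(key, []))
```

Usage: after `idx["T-1"] = node_a` and `idx["T-1"] = node_b`, `idx["T-1"]` returns both nodes, so duplicate or multi-winner tenders stay visible.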





The Result





Traversing the Graph




       Traversing is the process of walking the graph: starting from a
       node and following relationships from node to node.
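Stripped of any database, a traversal is a graph search plus two rules: which edges may be crossed, and which visited vertices to report. A minimal depth-first sketch over an in-memory edge list; the `follow` and `returnable` callbacks loosely play the role of the relationship types and `isReturnable` in the Neo4j traversal defined later, but this is our own illustration, not Neo4j's algorithm.

```python
def traverse(edges, start, follow, returnable):
    """Depth-first walk from `start`, visiting each vertex at most once.

    edges: list of (out_vertex, label, in_vertex) tuples.
    follow(edge, direction): True if the walk may cross `edge`; direction is
        "out" when leaving through the edge's tail, "in" when arriving at its head.
    returnable(vertex): True if the vertex should be reported.
    """
    seen = {start}
    stack = [start]
    found = []
    while stack:
        v = stack.pop()
        if v != start and returnable(v):
            found.append(v)
        for edge in edges:
            out_v, _, in_v = edge
            if out_v == v and follow(edge, "out"):
                nxt = in_v
            elif in_v == v and follow(edge, "in"):
                nxt = out_v
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found
```

Because each vertex is visited at most once, a company that won several tenders from the same ministry is reported only once.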





Graph Traversal Options




     Graph Traversal Framework
     Gremlin
     SPARQL
     Manual traversal





Problem




  Let's use graph traversal to find all the companies that have been
  awarded contracts by Kementerian Kesihatan (the Ministry of Health).





Graph Around Kementerian Kesihatan





Defining the Traversal

  # Companies who have gotten contracts from a particular ministry
  # The start node is a ministry
  class Contractors(neo4j.Traversal):
      types = [neo4j.Incoming.awarded_by,
               neo4j.Outgoing.awarded_to]

      order = neo4j.DEPTH_FIRST
      stop = neo4j.STOP_AT_END_OF_GRAPH

      def isReturnable(self, position):
          return position["type"] == "business_entity"





Using the Traversal




  with db.transaction:
     moh = ministries["KEMENTERIAN KESIHATAN"]
     contractors = Contractors(moh)
     for c in contractors:
        print c["name"]





Output



  RAF SYNERGY SDN BHD
  PRIMABUMI SDN BHD
  AVERROES PHARMACEUTICALS SDN BHD
  QUALITY REPUTATION SDN BHD
  UNISENDO SDN BHD
  PRESTIGE PHARMA SDN BHD
  PHARMANIAGA LOGISTICS SDN BHD
  IDAMAN PHARMA SDN BHD
  PHARMASERV ALLIANCES SDN BHD





Gremlin




     Gremlin is a graph-based programming language.
     Can express complex graph traversals concisely.
     Available at
     http://wiki.github.com/tinkerpop/gremlin/





Traversing the Graph with Gremlin
  $ ./gremlin.sh
           \,,,/
           (o o)
  -----oOOo-(_)-oOOo-----
  gremlin> $_ := g:key("ministries", "KEMENTERIAN KESIHATAN")
  ==>v[66]
  gremlin> ./inE[@label="awarded_by"]/outV/
                   outE[@label="awarded_to"]/inV/@name
  ==>PHARMASERV ALLIANCES SDN BHD
  ==>IDAMAN PHARMA SDN BHD
  ==>PHARMANIAGA LOGISTICS SDN BHD
  ==>PRIMABUMI SDN BHD
  ==>PRESTIGE PHARMA SDN BHD
  ==>UNISENDO SDN BHD
  ==>PRIMABUMI SDN BHD
  ==>QUALITY REPUTATION SDN BHD
  ==>AVERROES PHARMACEUTICALS SDN BHD
  ==>PRIMABUMI SDN BHD
  .....

Explanation




  ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name


      inE - incoming edges
      outV - the vertex an edge goes out of (its tail)
      outE - outgoing edges
      inV - the vertex an edge goes into (its head)





Explanation

  ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name


      Get the current object (.), i.e. the 'KEMENTERIAN KESIHATAN'
      node.
      Get its incoming edges labeled "awarded_by"
      (inE[@label="awarded_by"]).
      Get the vertices those edges go out of (outV), i.e. the tender
      nodes.
      Get the outgoing "awarded_to" edges of the tender nodes
      (outE[@label="awarded_to"]).
      Get the vertices those edges go into (inV), i.e. the business
      entity vertices.
      Get the name attribute of each of those vertices (@name).
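The path expression can be mimicked step by step over a plain edge list, which makes the meaning of each step concrete. A minimal sketch (our own illustration, not how Gremlin is implemented): each step maps the current list of objects to a new list, and nothing is de-duplicated along the way. That per-path behaviour is most likely why PRIMABUMI SDN BHD is printed several times in the Gremlin output, while the traversal-framework output lists each company once.

```python
def in_e(vertices, edges, label):
    """inE[@label=...]: edges with `label` that point into any of `vertices`."""
    return [e for v in vertices for e in edges if e["in"] == v and e["label"] == label]


def out_e(vertices, edges, label):
    """outE[@label=...]: edges with `label` that leave any of `vertices`."""
    return [e for v in vertices for e in edges if e["out"] == v and e["label"] == label]


def out_v(es):
    """outV: the vertex each edge leaves from (its tail)."""
    return [e["out"] for e in es]


def in_v(es):
    """inV: the vertex each edge points to (its head)."""
    return [e["in"] for e in es]


def path_names(ministry, edges, names):
    # ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name
    tenders = out_v(in_e([ministry], edges, "awarded_by"))
    companies = in_v(out_e(tenders, edges, "awarded_to"))
    return [names[c] for c in companies]
```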


Gephi




        Photoshop for graphs.
        Supports various graph layout algorithms.
        Graph metrics supported: clustering coefficient, PageRank,
        diameter, betweenness centrality, closeness centrality.
        File formats supported: CSV, GraphML, GEXF, etc.
        http://www.gephi.org
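One way to get a graph into Gephi is to write it out as GraphML. A minimal sketch using only `xml.etree.ElementTree`; the input shapes (dicts of vertex attributes, a list of edge tuples) are our own assumption, and every attribute is declared as a string-typed GraphML key, which keeps the code short but means numbers such as tender values arrive in Gephi as text.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def _q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return "{%s}%s" % (GRAPHML_NS, tag)


def to_graphml(vertices, edges):
    """Serialise a directed graph to a GraphML string.

    vertices: {vertex id: {attribute name: value}}
    edges: [(source id, target id, {attribute name: value})]
    """
    ET.register_namespace("", GRAPHML_NS)  # write GraphML as the default namespace
    root = ET.Element(_q("graphml"))

    # One string-typed <key> per attribute name; "v_"/"e_" prefixes keep node and edge keys apart.
    node_attrs = sorted({name for attrs in vertices.values() for name in attrs})
    edge_attrs = sorted({name for _, _, attrs in edges for name in attrs})
    for prefix, kind, names in (("v_", "node", node_attrs), ("e_", "edge", edge_attrs)):
        for name in names:
            ET.SubElement(root, _q("key"), {"id": prefix + name, "for": kind,
                                           "attr.name": name, "attr.type": "string"})

    graph = ET.SubElement(root, _q("graph"), {"id": "G", "edgedefault": "directed"})
    for vid, attrs in vertices.items():
        node = ET.SubElement(graph, _q("node"), {"id": str(vid)})
        for name, value in attrs.items():
            ET.SubElement(node, _q("data"), {"key": "v_" + name}).text = str(value)
    for i, (source, target, attrs) in enumerate(edges):
        edge = ET.SubElement(graph, _q("edge"),
                             {"id": "e%d" % i, "source": str(source), "target": str(target)})
        for name, value in attrs.items():
            ET.SubElement(edge, _q("data"), {"key": "e_" + name}).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

The returned string can be saved with a `.graphml` extension and opened in Gephi.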





Mashing Up




  Let's add shareholding data from Suruhanjaya Syarikat Malaysia
  (SSM, the Companies Commission of Malaysia) to the graph so that
  we can show the tenders that have been awarded to Telekom Malaysia
  Berhad and any of its subsidiaries/associate companies.





Connecting Telekom Malaysia Berhad and Telekom Smart School Sdn Bhd

  telekom = business_entities["TELEKOM MALAYSIA BERHAD"]
  telekom_smart_school = business_entities["TELEKOM SMART SCHOOL SDN BHD"]

  telekom_multi_media = db.node(
        name="TELEKOM MULTI-MEDIA SDN BHD",
        no="345420-H", text="TELEKOM MULTI-MEDIA SDN BHD",
        type="business_entity")

  telekom.shareholder_in(telekom_multi_media, units=1650000)
  telekom_multi_media.shareholder_in(telekom_smart_school,
        units=7650000)





Graph Centered at Telekom Malaysia Berhad





Graph Centered at Telekom Smart School Sdn Bhd





The Traverser


  class AllTendersDirectIndirect(neo4j.Traversal):
      types = [neo4j.Incoming.awarded_to,
               neo4j.Outgoing.shareholder_in]

      order = neo4j.DEPTH_FIRST
      stop = neo4j.STOP_AT_END_OF_GRAPH

      def isReturnable(self, position):
          return position["type"] == "tender"





Executing the Traverser and the Output

  Executing the Traversal Definition
  with db.transaction:
      telekom = business_entities["TELEKOM MALAYSIA BERHAD"]
      tenders = AllTendersDirectIndirect(telekom)
      for tender in tenders:
          print tender["no"]


  Output
  30/2009
  35/2009
  8/2009
  162/2009
  JASA/OP/1/2009
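Outside Neo4j, the same question is a reachability problem: follow `shareholder_in` edges downward from the starting company, then collect every tender awarded to any company reached. A minimal sketch over an in-memory edge list with hypothetical company and tender ids; like the traverser above, it ignores the number of units held, so any shareholding, however small, counts as a link.

```python
from collections import deque


def tenders_direct_and_indirect(edges, company):
    """Tenders awarded to `company` or to any company it holds shares in, directly or indirectly.

    edges: list of (out_vertex, label, in_vertex) tuples, e.g.
      (shareholder, "shareholder_in", company) and (tender, "awarded_to", company).
    """
    owned = {company}
    queue = deque([company])
    while queue:  # breadth-first walk down the shareholding chain
        current = queue.popleft()
        for out_v, label, in_v in edges:
            if label == "shareholder_in" and out_v == current and in_v not in owned:
                owned.add(in_v)
                queue.append(in_v)
    return sorted({out_v for out_v, label, in_v in edges
                   if label == "awarded_to" and in_v in owned})
```

The `owned` set doubles as a visited set, so circular cross-holdings cannot make the walk loop forever.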


Making this Easier

Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...Linuxmalaysia Malaysia
 
Sponsorship Prospectus Malaysia Open Source Conference 2012 (MOSC2012)
Sponsorship Prospectus Malaysia Open Source Conference 2012  (MOSC2012)Sponsorship Prospectus Malaysia Open Source Conference 2012  (MOSC2012)
Sponsorship Prospectus Malaysia Open Source Conference 2012 (MOSC2012)Linuxmalaysia Malaysia
 
OSS Community Forum Regarding Proposed BCPM2011 SWOT Slide
OSS Community Forum Regarding Proposed BCPM2011 SWOT SlideOSS Community Forum Regarding Proposed BCPM2011 SWOT Slide
OSS Community Forum Regarding Proposed BCPM2011 SWOT SlideLinuxmalaysia Malaysia
 
Introduction To ICT Security Audit OWASP Day Malaysia 2011
Introduction To ICT Security Audit OWASP Day Malaysia 2011Introduction To ICT Security Audit OWASP Day Malaysia 2011
Introduction To ICT Security Audit OWASP Day Malaysia 2011Linuxmalaysia Malaysia
 
Building Smart Phone Web Apps MOSC2010 Bikesh iTrain
Building Smart Phone Web Apps MOSC2010 Bikesh iTrainBuilding Smart Phone Web Apps MOSC2010 Bikesh iTrain
Building Smart Phone Web Apps MOSC2010 Bikesh iTrainLinuxmalaysia Malaysia
 
OSDC.my Master Plan For Malaysia Open Source Community
OSDC.my Master Plan For Malaysia Open Source CommunityOSDC.my Master Plan For Malaysia Open Source Community
OSDC.my Master Plan For Malaysia Open Source CommunityLinuxmalaysia Malaysia
 
33853955 bikesh-beginning-smart-phone-web-development
33853955 bikesh-beginning-smart-phone-web-development33853955 bikesh-beginning-smart-phone-web-development
33853955 bikesh-beginning-smart-phone-web-developmentLinuxmalaysia Malaysia
 
DNS solution trumps cloud computing competition
DNS solution trumps cloud computing competitionDNS solution trumps cloud computing competition
DNS solution trumps cloud computing competitionLinuxmalaysia Malaysia
 
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)Linuxmalaysia Malaysia
 
Benchmarking On Web Server For Budget 2008 Day
Benchmarking On  Web  Server For  Budget 2008  DayBenchmarking On  Web  Server For  Budget 2008  Day
Benchmarking On Web Server For Budget 2008 DayLinuxmalaysia Malaysia
 

Plus de Linuxmalaysia Malaysia (20)

Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
 
Call For Speakers Malaysia Open Source Conference 2014 (MOSCMY 2014 - MOSCMY2...
Call For Speakers Malaysia Open Source Conference 2014 (MOSCMY 2014 - MOSCMY2...Call For Speakers Malaysia Open Source Conference 2014 (MOSCMY 2014 - MOSCMY2...
Call For Speakers Malaysia Open Source Conference 2014 (MOSCMY 2014 - MOSCMY2...
 
Malaysia Open Source Conference MOSCMY 2013 Itinerary And Streams MOSC2013 a...
Malaysia Open Source Conference MOSCMY 2013  Itinerary And Streams MOSC2013 a...Malaysia Open Source Conference MOSCMY 2013  Itinerary And Streams MOSC2013 a...
Malaysia Open Source Conference MOSCMY 2013 Itinerary And Streams MOSC2013 a...
 
MOSC2013 MOSCMY Brochure Malaysia Open Source Conference 2013
MOSC2013 MOSCMY Brochure Malaysia Open Source Conference 2013MOSC2013 MOSCMY Brochure Malaysia Open Source Conference 2013
MOSC2013 MOSCMY Brochure Malaysia Open Source Conference 2013
 
Brochure Malaysia Open Source Conference 2013 MOSCMY 2013 (MOSC2013) brochure
Brochure Malaysia Open Source Conference 2013 MOSCMY 2013 (MOSC2013) brochureBrochure Malaysia Open Source Conference 2013 MOSCMY 2013 (MOSC2013) brochure
Brochure Malaysia Open Source Conference 2013 MOSCMY 2013 (MOSC2013) brochure
 
Hala Tuju Kemahiran Keselamatan Komputer Dan Internet (ICT)
Hala Tuju Kemahiran Keselamatan Komputer Dan Internet (ICT)Hala Tuju Kemahiran Keselamatan Komputer Dan Internet (ICT)
Hala Tuju Kemahiran Keselamatan Komputer Dan Internet (ICT)
 
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysiaFOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
 
Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...
Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...
Questionnaire For Establishment Of Board of Computing Professionals Malaysia ...
 
Sponsorship Prospectus Malaysia Open Source Conference 2012 (MOSC2012)
Sponsorship Prospectus Malaysia Open Source Conference 2012  (MOSC2012)Sponsorship Prospectus Malaysia Open Source Conference 2012  (MOSC2012)
Sponsorship Prospectus Malaysia Open Source Conference 2012 (MOSC2012)
 
OSS Community Forum Regarding Proposed BCPM2011 SWOT Slide
OSS Community Forum Regarding Proposed BCPM2011 SWOT SlideOSS Community Forum Regarding Proposed BCPM2011 SWOT Slide
OSS Community Forum Regarding Proposed BCPM2011 SWOT Slide
 
Introduction To ICT Security Audit OWASP Day Malaysia 2011
Introduction To ICT Security Audit OWASP Day Malaysia 2011Introduction To ICT Security Audit OWASP Day Malaysia 2011
Introduction To ICT Security Audit OWASP Day Malaysia 2011
 
Building Smart Phone Web Apps MOSC2010 Bikesh iTrain
Building Smart Phone Web Apps MOSC2010 Bikesh iTrainBuilding Smart Phone Web Apps MOSC2010 Bikesh iTrain
Building Smart Phone Web Apps MOSC2010 Bikesh iTrain
 
OSDC.my Master Plan For Malaysia Open Source Community
OSDC.my Master Plan For Malaysia Open Source CommunityOSDC.my Master Plan For Malaysia Open Source Community
OSDC.my Master Plan For Malaysia Open Source Community
 
33853955 bikesh-beginning-smart-phone-web-development
33853955 bikesh-beginning-smart-phone-web-development33853955 bikesh-beginning-smart-phone-web-development
33853955 bikesh-beginning-smart-phone-web-development
 
DNS solution trumps cloud computing competition
DNS solution trumps cloud computing competitionDNS solution trumps cloud computing competition
DNS solution trumps cloud computing competition
 
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)
Brochure MSC Malaysia Open Source Conference 2010 (MSC MOSC2010)
 
Benchmarking On Web Server For Budget 2008 Day
Benchmarking On  Web  Server For  Budget 2008  DayBenchmarking On  Web  Server For  Budget 2008  Day
Benchmarking On Web Server For Budget 2008 Day
 
Sesuaikan Masa Sempena 2010
Sesuaikan Masa Sempena 2010Sesuaikan Masa Sempena 2010
Sesuaikan Masa Sempena 2010
 
OSS Community In Malaysia 2009 List
OSS Community In Malaysia 2009 ListOSS Community In Malaysia 2009 List
OSS Community In Malaysia 2009 List
 
List Of OSS Communities Malaysia 2009
List Of OSS Communities Malaysia 2009List Of OSS Communities Malaysia 2009
List Of OSS Communities Malaysia 2009
 

Open Source Tools for Creating Mashups with Government Datasets MOSC2010

  • 1. Open Source Tools for Creating Mashups with Government Datasets Mohammed Firdaus, Muhd Sharuzzamal Bakri June 29, 2010 Mohammed Firdaus, Muhd Sharuzzamal Bakri Open Source Tools for Creating Mashups with Government Datas
  • 2. About the Speakers: Mohammed Firdaus bin Mohammed Ab Halim (@firdaus_halim) and Muhd Sharuzzamal Bakri (@amai), founders of Persada Terbilang Sdn Bhd. We have no relationship whatsoever to any fertilizer supplier.
  • 3. Mashups: A mashup is a web page or application that uses and combines data, presentation or functionality from two or more sources to create new services (Source: Wikipedia). Data mashups combine similar types of media and information from multiple sources into a single representation (Source: Wikipedia).
  • 4. Data Sets are Not Available in Machine Readable Form: Nothing useful turns up for these searches: filetype:csv site:.gov.my, filetype:xml site:.gov.my, filetype:rdf site:.gov.my. We have to resort to web scraping.
  • 5. No Data Dictionaries: Since the data sets that are available were meant for humans to consume rather than machines, they are usually published without any kind of data dictionary. This means that an application developer has to make assumptions about the structure of each field, e.g. whether it's unique, whether it's a multi-value field, and which fields are mandatory or optional. These assumptions may or may not turn out to be correct as you see more and more data in the data set.
  • 6. New Data Sets Constantly Become Available: This is not a bad thing. However, our code, database and schema must be flexible enough to deal with future data sets that we might want to use in our applications.
  • 7. Lack of Standards Across Agencies: Different agencies use different identifiers to refer to the same entity. The lack of common identifiers makes it tedious to combine data sets that may be describing the same entity. MyCoID and MyID are steps in the right direction.
  • 8. In Summary: Because of these challenges, we need an agile method for modeling, storing and processing these government data sets in our application. The purpose of this presentation is to show how representing your data as a graph helps you deal with these challenges and, at the same time, helps you make compelling data mashups.
  • 9. What is a Graph? A graph is a data structure that consists of a collection of vertices and the connections between those vertices, called edges. Vertices are sometimes called nodes or dots. Edges are sometimes called relationships or arcs. The terminology differs between software packages.
  • 10. Types of Graphs: A directed graph (or digraph) is one where the edges have a direction (i.e. each edge goes out of one vertex and into another). A multigraph is one where multiple edges can exist between two vertices. An edge-labeled graph is one where edges have labels; similarly, a vertex-labeled graph is one in which the vertices have labels. An attributed graph is one in which the vertices and edges can have attributes (key-value pairs). A graph can have more than one of these properties, e.g. a multi-digraph is one in which multiple directed edges can exist between two vertices.
  • 11. Types of Graphs: Simple/Undirected Graphs
  • 12. Types of Graphs: Directed Graph
  • 13. Types of Graphs: Edge and Node Labeled Graph
  • 14. Types of Graphs: Multigraph
  • 15. Types of Graphs: Attributed Multigraph
  • 16. Examples: Social Graphs (Source: http://www.flickr.com/photos/greenem/11696663/). Undirected graph: vertices represent people and edges represent friendship.
  • 17. Examples: Web Graph (Source: http://en.wikipedia.org/wiki/File:WorldWideWebAroundWikipedia.png). Multi-digraph: vertices represent web pages and directed edges represent links between pages.
  • 18. Property Graphs: 'Property graph' is another term for an attributed, labeled multi-digraph. Property graphs are flexible enough to support most types of graph data. Other types of graphs (with the exception of hypergraphs) can be built on top of property graphs by removing features or by using features of the property graph in certain ways. The tools that we are covering in this presentation deal primarily with property graphs.
  • 19. Property Graphs (Source: http://wiki.github.com/tinkerpop/gremlin/defining-a-property-graph)
  • 20. Treasury Procurement Data: Tenders Awarded (Source: http://myprocurement.treasury.gov.my/index.php/en/list-keputusan-tender)
  • 21. Treasury Procurement Data: Fields
      - Tajuk Tender (Title of Tender)
      - Nombor Tender (Tender Number)
      - Kategori Perolehan (Procurement Category)
      - Kementerian (Ministry)
      - Petender Berjaya (Winner of Tender)
      - No Pendaftaran Dengan ROB/ROS/ROC (Registration Number with ROB/ROS/ROC)
      - No Pendaftaran Dengan MOF/PKK (Registration Number with MOF/PKK)
      - Harga Setuju Terima (Agreed Upon Value)
  • 22. Code and Data in Machine Readable Form: For this presentation we are using data that we scraped from this site on 2010-04-26. The source code for our scraper and the CSV dump from 2010-04-26 are available at http://mfirdaus.com/mosc-paper/. The dump contains 2615 records.
  • 23. Treasury Procurement Data: The Dump
  • 24. Issues with this Data Set: Missing Fields. Out of the 2615 records in the dump:
      - 510 records were missing a tender number
      - 472 records were missing a category
      - 1836 records were missing a ROB/ROS/ROC number
      - 510 records were missing a MOF number
  • 25. Issues with this Data Set: Tender Numbers are Not Unique
      - 32 records have the same tender number and title as another record.
      - 23 records have the same tender number as another record.
    In some cases these appear to be duplicate records, since all the fields match up. In other cases, one or two fields are slightly different, indicating that there was probably a typo (and the erroneous record was not deleted). In some cases, the other fields are completely different, which leads us to think that a tender can have multiple winners (we need some government officials to verify this for us).
  • 26. Issues with this Data Set: Format of Tender Numbers. Examples of tender numbers:
      - 8/2009
      - PL.(T).08.2009(JKP)
      - X0141110101090021
      - 128/2009
      - KBS.S.4-14/69 (T.26/2009)
    It is probably not a good idea to write code that attempts to parse the tender number.
  • 27. Issues with this Data Set: Format of the "Petender Berjaya" (Winner of Tender) Field. Examples:
      - SYARIKAT PROSPECTRUM SDN BHD
      - TELEKOM SMART SCHOOL SDN BHD NO.45-8, LEVEL 3, BLOCK C, PLAZA DAMANSARA, JALAN MEDAN SETIA 1, BUKIT DAMANSARA 50490 KUALA LUMPUR
      - 1. GLOBAL AEROSPACE SDN BHD (A002) 2. SYSTEM ALLIANCE TECHNOLOGY SDN. BHD.(A003) 3. KARISMA WIRA SDN. BHD. (A004) 4. KESUMA TECHNOLOGY SDN. BHD (A005)
      - A QUALITY REPUTATION SDN BHD B PRIMABUMI SDN BHD
  • 28. Modeling this Data Set as a Property Graph: One way to model this data as a graph is to use:
      - Vertices to represent tenders, ministries and companies/businesses.
      - An "awarded by" labeled edge to associate a tender with a ministry.
      - An "awarded to" labeled edge to associate a tender with the winner of the tender (the company/business).
      - Attributes on tender vertices for the tender title, number, value and category.
      - Attributes on company/business vertices for the company/business name, ROB/ROC/ROS registration number and MOF registration number.
      - Attributes on ministry vertices for the name of the ministry.
  • 29. Modeling: Example
  • 30. Neo4j: Introduction. Neo4j is a graph database. It persists data in graph form and uses the property graph data model, with the exception of vertex labels. In Neo4j terms, vertices are nodes, edges are relationships and attributes are properties. Property values can be a String or any Java primitive (arrays of these types are supported as well). Neo4j is licensed under the AGPLv3, which basically means that you don't need a commercial license if your application is released under a compatible free software license. For other uses, you need a commercial license from Neo Technology.
  • 31. Neo4j: Introduction (continued). Neo4j is written in Java. Bindings are available for Python, Ruby, Clojure, Erlang, Groovy, Scala and PHP; we will be using the Python bindings in this talk. It is an embedded database, meaning that it runs in the same process as the application. There's also a standalone REST server for those who prefer it.
  • 32. Inserting into Neo4j: Initializing the Database
        import neo4j
        # Open (or create) the embedded database stored in the "db" directory
        db = neo4j.GraphDatabase("db")
  • 33. Inserting into Neo4j: Creating the Nodes
        ministry_node = db.node(name=ministry, type="ministry")
        entity_node = db.node(name=entity_name, no=entity_no,
                              mof_no=entity_mof_no, type="business entity")
        tender_node = db.node(no=tender_no, title=tender_title,
                              category=tender_category, value=tender_value,
                              type="tender")
  • 34. Inserting into Neo4j: Creating the Relationships
        tender_node.awarded_by(ministry_node)
        tender_node.awarded_to(entity_node)
  • 35. Inserting into Neo4j: Indexing Nodes
        ministries = db.index("ministries", create=True)
        business_entities = db.index("business entities", create=True)
        tenders_by_no = db.index("tenders by no", create=True)
        tenders_by_title = db.index("tenders by title", create=True)
        tenders_by_no[tender_no] = tender_node
        tenders_by_title[tender_title] = tender_node
  • 36. Inserting into Neo4j: The Result
  • 37. Traversing the Graph: Traversing is the process of walking around the graph.
  • 38. Graph Traversal Options: the Graph Traversal Framework, Gremlin, SPARQL, and manual traversal.
  • 39. Problem: Let's use graph traversal to find all the companies that have been awarded contracts by Kementerian Kesihatan (Ministry of Health).
  • 40. Graph Around Kementerian Kesihatan
  • 41. Traversal Framework: Defining the Traversal
        # Companies who have gotten contracts from a particular ministry.
        # The start node is a ministry.
        class Contractors(neo4j.Traversal):
            # From the ministry, follow incoming awarded_by relationships to its
            # tenders, then outgoing awarded_to relationships to the winners.
            types = [neo4j.Incoming.awarded_by, neo4j.Outgoing.awarded_to]
            order = neo4j.DEPTH_FIRST
            stop = neo4j.STOP_AT_END_OF_GRAPH
            def isReturnable(self, position):
                # Return only the companies, not the tenders visited on the way.
                return position["type"] == "business entity"
  • 42. Traversal Framework: Using the Traversal
        with db.transaction:
            moh = ministries["KEMENTERIAN KESIHATAN"]
            contractors = Contractors(moh)
            for c in contractors:
                print(c["name"])
  • 43. Traversal Framework: Output
        RAF SYNERGY SDN BHD
        PRIMABUMI SDN BHD
        AVERROES PHARMACEUTICALS SDN BHD
        QUALITY REPUTATION SDN BHD
        UNISENDO SDN BHD
        PRESTIGE PHARMA SDN BHD
        PHARMANIAGA LOGISTICS SDN BHD
        IDAMAN PHARMA SDN BHD
        PHARMASERV ALLIANCES SDN BHD
  • 44. Traversing Graphs with Gremlin: Gremlin is a graph-based programming language that can express complex graph traversals concisely. It is available at http://wiki.github.com/tinkerpop/gremlin/
  • 45. Traversing the Graph with Gremlin
        $ ./gremlin.sh
                 \,,,/
                 (o o)
        -----oOOo-(_)-oOOo-----
        gremlin> $_ := g:key("ministries", "KEMENTERIAN KESIHATAN")
        ==>v[66]
        gremlin> ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name
        ==>PHARMASERV ALLIANCES SDN BHD
        ==>IDAMAN PHARMA SDN BHD
        ==>PHARMANIAGA LOGISTICS SDN BHD
        ==>PRIMABUMI SDN BHD
        ==>PRESTIGE PHARMA SDN BHD
        ==>UNISENDO SDN BHD
        ==>PRIMABUMI SDN BHD
        ==>QUALITY REPUTATION SDN BHD
        ==>AVERROES PHARMACEUTICALS SDN BHD
        ==>PRIMABUMI SDN BHD
        .....
  • 46. Traversing Graphs with Gremlin: Explanation
        ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name
      - inE: incoming edges of the current vertices
      - outV: the vertex each edge comes out of (its tail)
      - outE: outgoing edges of the current vertices
      - inV: the vertex each edge goes into (its head)
  • 47. Traversing Graphs with Gremlin: Explanation
        ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name
  • 48. Traversing Graphs with Gremlin: Explanation
        ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name
      - Get the current object (.), i.e. the 'KEMENTERIAN KESIHATAN' node.
      - Get its incoming edges labeled "awarded_by" (inE[@label="awarded_by"]).
      - Get the outgoing vertices of those edges (outV), i.e. the contract (tender) nodes.
      - Get the outgoing "awarded_to" edges of the contract nodes (outE[@label="awarded_to"]).
      - Get the incoming vertices of those edges (inV), i.e. the business entity vertices.
      - Get the name attributes of those vertices (@name).
  • 49. Gephi: "Photoshop for graphs." Supports various graph layout algorithms. Graph metrics supported: clustering coefficient, PageRank, diameter, betweenness centrality, closeness centrality. File formats supported: CSV, GraphML, GEXF, etc. http://www.gephi.org
  • 50. Gephi
  • 51. Gephi
  • 52. Mashing Up: Let's add shareholding data from Suruhanjaya Syarikat Malaysia (SSM, the Companies Commission of Malaysia) to the graph so that we can show the tenders that have been awarded to Telekom Malaysia Berhad and any of its subsidiaries or associate companies.
  • 53. Adding External Data Sources: Connecting Telekom Malaysia Berhad and Telekom Smart School Sdn Bhd
        telekom = business_entities["TELEKOM MALAYSIA BERHAD"]
        telekom_smart_school = business_entities["TELEKOM SMART SCHOOL SDN BHD"]
        # Intermediate company taken from the SSM shareholding data
        telekom_multi_media = db.node(
            name="TELEKOM MULTI-MEDIA SDN BHD", no="345420-H",
            text="TELEKOM MULTI-MEDIA SDN BHD", type="business entity")
        # Each shareholder_in relationship records the number of units held
        telekom.shareholder_in(telekom_multi_media, units=1650000)
        telekom_multi_media.shareholder_in(telekom_smart_school, units=7650000)
  • 54. Adding External Data Sources: Graph Centered at Telekom Malaysia Berhad
  • 55. Adding External Data Sources: Graph Centered at Telekom Smart School Sdn Bhd
  • 56. Traversing to Find Direct/Indirect Awards: The Traverser
        class AllTendersDirectIndirect(neo4j.Traversal):
            # Walk down the shareholding chain (outgoing shareholder_in) and pick
            # up the tenders awarded to each company reached (incoming awarded_to).
            types = [neo4j.Incoming.awarded_to, neo4j.Outgoing.shareholder_in]
            order = neo4j.DEPTH_FIRST
            stop = neo4j.STOP_AT_END_OF_GRAPH
            def isReturnable(self, position):
                return position["type"] == "tender"
  • 57. Traversing to Find Direct/Indirect Awards: Executing the Traverser and the Output
        telekom = business_entities["TELEKOM MALAYSIA BERHAD"]
        tenders = AllTendersDirectIndirect(telekom)
        for tender in tenders:
            print(tender["no"])
    Output:
        30/2009
        35/2009
        8/2009
        162/2009
        JASA/OP/1/2009
  • 58. Wrapup: Making this Easier
  • 59. Wrapup: Making this Easier
  • 60. Wrapup: Making this Easier
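Slide 4 says the data has to be scraped. The presenters' actual scraper is published at http://mfirdaus.com/mosc-paper/; the sketch below is not that code. It is a minimal, standard-library-only illustration of the idea, assuming the data sits in a plain HTML <table>. The sample markup and the names TableScraper and html_table_to_csv are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace, roughly as a browser would render them
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html):
    """Return the rows of every table in `html` as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out).writerows(scraper.rows)
    return out.getvalue()


# Hypothetical page fragment; the real Treasury page layout may differ.
SAMPLE = """
<table>
  <tr><th>Tajuk Tender</th><th>Nombor Tender</th></tr>
  <tr><td>Example  tender</td><td>8/2009</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE))
```

A real page would also need fetching, pagination and character-encoding handling, none of which is shown here.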
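The missing-field counts on slide 24 are worth recomputing every time the data is re-scraped. A minimal sketch, assuming the CSV dump has a single header row; the column names (tender_no, category, rob_no, mof_no) and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; adjust them to the real header row of the dump.
FIELDS = ["tender_no", "category", "rob_no", "mof_no"]


def count_missing(csv_text, fields=FIELDS):
    """Count, per field, the rows where that field is empty or absent."""
    missing = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in fields:
            # DictReader fills absent trailing columns with None
            if not (row.get(field) or "").strip():
                missing[field] += 1
    return missing


# Three made-up rows, not taken from the real dump.
SAMPLE = (
    "tender_no,category,rob_no,mof_no\n"
    "T-1,Supplies,,MOF-1\n"
    ",,ROB-2,\n"
    "T-3,Services,,\n"
)

print(dict(count_missing(SAMPLE)))
```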
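Slide 25 describes three situations for records that share a tender number: exact duplicates, records differing in one or two fields (probable typos), and records that are completely different (possibly multiple winners). The sketch below flags those cases automatically. The threshold of two differing fields, the function names and the sample records are all assumptions; the result is only a hint for a human (or a government official) to confirm.

```python
from collections import defaultdict


def shared_tender_numbers(records):
    """Group records by tender number, keeping only numbers used more than once."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get("tender_no"):          # skip records with no tender number
            groups[rec["tender_no"]].append(rec)
    return {no: recs for no, recs in groups.items() if len(recs) > 1}


def classify(group, typo_threshold=2):
    """Guess why several records share a tender number (heuristic only)."""
    first = group[0]
    keys = set().union(*group)            # every field name seen in the group
    # Largest number of fields in which any record differs from the first one
    diffs = max(
        sum(1 for k in keys if rec.get(k) != first.get(k)) for rec in group[1:]
    )
    if diffs == 0:
        return "duplicate"
    if diffs <= typo_threshold:
        return "probable typo"
    return "possibly multiple winners"


# Made-up records illustrating the three cases.
RECORDS = [
    {"tender_no": "T-1", "title": "Tender 1", "winner": "A SDN BHD", "value": "100"},
    {"tender_no": "T-1", "title": "Tender 1", "winner": "A SDN BHD", "value": "100"},
    {"tender_no": "T-2", "title": "Tender 2", "winner": "B SDN BHD", "value": "200"},
    {"tender_no": "T-2", "title": "Tender 2", "winner": "B SDN BHD", "value": "2000"},
    {"tender_no": "T-3", "title": "Tender 3", "winner": "C SDN BHD", "value": "300"},
    {"tender_no": "T-3", "title": "Tender 3b", "winner": "D SDN BHD", "value": "400"},
    {"tender_no": "T-4", "title": "Tender 4", "winner": "E SDN BHD", "value": "500"},
]

for no, group in shared_tender_numbers(RECORDS).items():
    print(no, classify(group))
```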
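Slides 28 and 33 to 35 turn each tender record into ministry, company and tender nodes joined by "awarded by" and "awarded to" edges. The slides do not show how an existing ministry or company node is found again when it appears in a later record. The sketch below assumes a find-or-create lookup, which is the role the Neo4j indexes on slide 35 can play. It uses plain Python dicts instead of Neo4j, and the field names (ministry, winner, tender_no, and so on) and sample rows are hypothetical.

```python
def load_tenders(records):
    """Build vertices and labeled edges from flat tender records.

    Returns (vertices, edges, ministries, entities): vertices maps an id to an
    attribute dict, edges is a list of (out_id, label, in_id) tuples, and the
    last two dicts map names to vertex ids (standing in for an index).
    """
    vertices, edges = {}, []
    ministries, entities = {}, {}

    def new_vertex(**attrs):
        vid = len(vertices)
        vertices[vid] = attrs
        return vid

    for rec in records:
        # Find-or-create, so each ministry and company gets exactly one vertex
        if rec["ministry"] not in ministries:
            ministries[rec["ministry"]] = new_vertex(type="ministry", name=rec["ministry"])
        if rec["winner"] not in entities:
            entities[rec["winner"]] = new_vertex(
                type="business_entity", name=rec["winner"],
                no=rec.get("rob_no"), mof_no=rec.get("mof_no"))
        # Design choice: one tender vertex per record, since tender numbers are not unique
        tender = new_vertex(
            type="tender", no=rec.get("tender_no"), title=rec.get("title"),
            category=rec.get("category"), value=rec.get("value"))
        edges.append((tender, "awarded_by", ministries[rec["ministry"]]))
        edges.append((tender, "awarded_to", entities[rec["winner"]]))
    return vertices, edges, ministries, entities


# Made-up input rows.
RECORDS = [
    {"ministry": "MINISTRY X", "winner": "COMPANY A SDN BHD", "tender_no": "T-1", "title": "Tender 1"},
    {"ministry": "MINISTRY X", "winner": "COMPANY A SDN BHD", "tender_no": "T-2", "title": "Tender 2"},
    {"ministry": "MINISTRY Y", "winner": "COMPANY B SDN BHD", "tender_no": "T-3", "title": "Tender 3"},
]

vertices, edges, ministries, entities = load_tenders(RECORDS)
print(len(vertices), "vertices,", len(edges), "edges")
```

Because company names are used as lookup keys, spelling variants of the same company (see slide 27) would still produce separate vertices.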
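Slides 46 and 48 read the Gremlin path step by step. To make the direction of inE, outV, outE and inV concrete, here is a toy Python re-implementation of just those four steps over a list of edge tuples. It is neither Gremlin nor the Neo4j API, and the graph, vertex ids and company names are made up.

```python
# Edges are (out_vertex, label, in_vertex): each edge leaves out_vertex and
# enters in_vertex, so a tender points at its ministry and at its winner.
NAMES = {
    "moh": "KEMENTERIAN KESIHATAN",
    "a": "COMPANY A SDN BHD",
    "b": "COMPANY B SDN BHD",
    "c": "COMPANY C SDN BHD",
}
EDGES = [
    ("t1", "awarded_by", "moh"), ("t1", "awarded_to", "a"),
    ("t2", "awarded_by", "moh"), ("t2", "awarded_to", "a"),
    ("t3", "awarded_by", "moh"), ("t3", "awarded_to", "b"),
    ("t4", "awarded_by", "other"), ("t4", "awarded_to", "c"),
]


def in_e(vertices, label):
    """Like inE[@label=...]: edges with this label entering any of the vertices."""
    return [e for e in EDGES if e[2] in vertices and e[1] == label]


def out_e(vertices, label):
    """Like outE[@label=...]: edges with this label leaving any of the vertices."""
    return [e for e in EDGES if e[0] in vertices and e[1] == label]


def out_v(edges):
    """Like outV: the vertex each edge comes out of."""
    return [e[0] for e in edges]


def in_v(edges):
    """Like inV: the vertex each edge goes into."""
    return [e[2] for e in edges]


# ./inE[@label="awarded_by"]/outV/outE[@label="awarded_to"]/inV/@name
tenders = out_v(in_e(["moh"], "awarded_by"))
names = [NAMES[v] for v in in_v(out_e(tenders, "awarded_to"))]
print(names)
```

As in the Gremlin session on slide 45, a company shows up once per tender it won; wrapping the result in set() removes the repeats.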
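The traverser on slide 56 follows outgoing shareholder_in relationships down an ownership chain and incoming awarded_to relationships to tenders, returning only tender nodes. The sketch below shows what such a depth-first traversal does, without Neo4j. The toy graph, the traverse function and the ("incoming"/"outgoing", label) pairs are hypothetical stand-ins for neo4j.Incoming/neo4j.Outgoing, not the bindings' API.

```python
# Hypothetical graph: vertex id -> attributes, plus (out, label, in) edges.
VERTICES = {
    "parent": {"type": "business_entity", "name": "PARENT BERHAD"},
    "sub": {"type": "business_entity", "name": "SUBSIDIARY SDN BHD"},
    "subsub": {"type": "business_entity", "name": "SUB-SUBSIDIARY SDN BHD"},
    "other": {"type": "business_entity", "name": "UNRELATED SDN BHD"},
    "m": {"type": "ministry", "name": "MINISTRY X"},
    "t1": {"type": "tender", "no": "T-1"},
    "t2": {"type": "tender", "no": "T-2"},
    "t3": {"type": "tender", "no": "T-3"},
}
EDGES = [
    ("parent", "shareholder_in", "sub"),
    ("sub", "shareholder_in", "subsub"),
    ("t1", "awarded_to", "parent"),
    ("t2", "awarded_to", "subsub"),
    ("t3", "awarded_to", "other"),
    ("t1", "awarded_by", "m"),
    ("t3", "awarded_by", "m"),
]


def traverse(start, types, is_returnable):
    """Depth-first walk from `start`, following only the (direction, label) pairs in `types`."""
    seen = {start}
    stack = [start]
    found = []
    while stack:
        v = stack.pop()
        if is_returnable(VERTICES[v]):
            found.append(v)
        for out_v, label, in_v in EDGES:
            if ("outgoing", label) in types and out_v == v:
                nxt = in_v
            elif ("incoming", label) in types and in_v == v:
                nxt = out_v
            else:
                continue
            if nxt not in seen:   # visit each vertex at most once
                seen.add(nxt)
                stack.append(nxt)
    return found


TYPES = {("incoming", "awarded_to"), ("outgoing", "shareholder_in")}
tenders = traverse("parent", TYPES, lambda attrs: attrs["type"] == "tender")
print(sorted(VERTICES[t]["no"] for t in tenders))
```

Because awarded_by is not in TYPES, the walk never reaches the ministry and so never leaks into the unrelated company's tender T-3. The seen set also keeps the walk finite if shareholdings are circular.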
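Slide 49 lists GraphML among the formats Gephi can open. One way to move a graph from Python into Gephi is to write GraphML with the standard library. This is a minimal sketch, assuming vertices are held as a dict of attribute dicts and edges as (out, label, in) tuples; to_graphml, the exported keys and the sample graph are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
NODE_KEYS = ("name", "type", "no")   # vertex attributes exported as GraphML <data>


def to_graphml(vertices, edges):
    """Serialize {id: attrs} vertices and (out, label, in) edges as GraphML text."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare each exported attribute once, as a string-typed GraphML key
    for key in NODE_KEYS:
        ET.SubElement(root, "key", {"id": key, "for": "node",
                                    "attr.name": key, "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "label", "for": "edge",
                                "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for vid, attrs in vertices.items():
        node = ET.SubElement(graph, "node", id=str(vid))
        for key in NODE_KEYS:
            if attrs.get(key) is not None:
                ET.SubElement(node, "data", key=key).text = str(attrs[key])
    for out_v, label, in_v in edges:
        edge = ET.SubElement(graph, "edge", source=str(out_v), target=str(in_v))
        ET.SubElement(edge, "data", key="label").text = label
    return ET.tostring(root, encoding="unicode")


# Made-up graph in the same shape as the tender model.
VERTICES = {
    0: {"type": "ministry", "name": "MINISTRY X"},
    1: {"type": "business_entity", "name": "COMPANY A SDN BHD"},
    2: {"type": "tender", "no": "T-1"},
}
EDGES = [(2, "awarded_by", 0), (2, "awarded_to", 1)]

xml_text = to_graphml(VERTICES, EDGES)
print(xml_text)
```

Saving xml_text to a file with a .graphml extension should let Gephi open it, with the exported attributes available for layout, filtering and colouring.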