SlideShare une entreprise Scribd logo
1  sur  71
Télécharger pour lire hors ligne
Storing and Manipulating Graphs
            in HBase


            Dan Lynn
          dan@fullcontact.com
              @danklynn
Keeps Contact Information Current and Complete


  Based in Denver, Colorado




                              CTO & Co-Founder
Turn Partial Contacts
 Into Full Contacts
Refresher: Graph Theory
Refresher: Graph Theory
Refresher: Graph Theory




     rt ex
Ve
Refresher: Graph Theory




                          Edg
                                e
Social Networks
Tweets

@danklynn

              retweeted


                                   “#HBase rocks”
 follows


                          author



            @xorlev
Web Links


http://fullcontact.com/blog/



                               <a href=”...”>TechStars</a>




                               http://techstars.com/
Why should you care?

Vertex Influence
- PageRank

- Social Influence

- Network bottlenecks

Identifying Communities
Storage Options
neo4j
neo4j




Very expressive querying
       (e.g. Gremlin)
neo4j




Transactional
neo4j




Data must fit on a
 single machine

       :-(
FlockDB
FlockDB




Scales horizontally
FlockDB




Very fast
FlockDB




No multi-hop query support

           :-(
RDBMS
(e.g. MySQL, Postgres, et al.)
RDBMS




Transactional
RDBMS




Huge amounts of JOINing

          :-(
HBase




Massively scalable
HBase




Data model well-suited
HBase




Multi-hop querying?
Modeling
Techniques
Adjacency Matrix


1
             3




    2
Adjacency Matrix

    1   2    3

1   0   1    1

2   1   0    1

3   1   1    0
Adjacency Matrix




Can use vectorized libraries
Adjacency Matrix




Requires   O(n2)   memory
                   n = number of vertices
Adjacency Matrix




Hard(er) to distribute
Adjacency List


1
                3




      2
Adjacency List




1           2,3

2           1,3

3           1,2
Adjacency List Design in HBase

e:dan@fullcontact.com
                                p:+13039316251




                   t:danklynn
Adjacency List Design in HBase
      row key               “edges” column family

e:dan@fullcontact.com   p:+13039316251= ...

                        t:danklynn= ...


p:+13039316251          e:dan@fullcontact.com= ...

                        t:danklynn= ...


t:danklynn              e:dan@fullcontact.com= ...

                        p:+13039316251= ...
Adjacency List Design in HBase
      row key               “edges” column family

e:dan@fullcontact.com   p:+13039316251= ...

                        t:danklynn= ...
                                                      at to
                                                W e?h
p:+13039316251          e:dan@fullcontact.com= ...
                                                   st or
                        t:danklynn= ...


t:danklynn              e:dan@fullcontact.com= ...

                        p:+13039316251= ...
Custom Writables
package org.apache.hadoop.io;

public interface Writable   {

    void write(java.io.DataOutput dataOutput);

    void readFields(java.io.DataInput dataInput);
}
                                                    java
Custom Writables
class EdgeValueWritable implements Writable {

    EdgeValue edgeValue

    void write(DataOutput dataOutput) {
        dataOutput.writeDouble edgeValue.weight
    }

    void readFields(DataInput dataInput) {
        Double weight = dataInput.readDouble()
        edgeValue = new EdgeValue(weight)
    }

    // ...
}
                                                  groovy
Don’t get fancy with byte[]
class EdgeValueWritable implements Writable {
   EdgeValue edgeValue

    byte[] toBytes() {
        // use strings if you can help it
    }

    static EdgeValueWritable fromBytes(byte[] bytes) {
        // use strings if you can help it
    }
}
                                                     groovy
Querying by vertex
def get = new Get(vertexKeyBytes)
get.addFamily(edgesFamilyBytes)

Result result = table.get(get);
result.noVersionMap.each {family, data ->

    // construct edge objects as needed
    // data is a Map<byte[],byte[]>
}
Adding edges to a vertex
def put = new Put(vertexKeyBytes)

put.add(
    edgesFamilyBytes,
    destinationVertexBytes,
    edgeValue.toBytes() // your own implementation here
)

// if writing directly
table.put(put)


// if using TableReducer
context.write(NullWritable.get(), put)
Distributed Traversal / Indexing

e:dan@fullcontact.com
                         p:+13039316251




                          t:danklynn
Distributed Traversal / Indexing

e:dan@fullcontact.com
                         p:+13039316251




                          t:danklynn
Distributed Traversal / Indexing

e:dan@fullcontact.com
                                         p:+13039316251


                    Pi v
                           ot v
                                  e rt
                                         ex

                                         t:danklynn
Distributed Traversal / Indexing

 e:dan@fullcontact.com
                          p:+13039316251




Ma pReduce ove r
out bou nd edges
                           t:danklynn
Distributed Traversal / Indexing

  e:dan@fullcontact.com
                           p:+13039316251




Em it vertexes an d edge
dat a gro upe d by
the piv ot               t:danklynn
Distributed Traversal / Indexing

   Re duc e key                p:+13039316251




“Ou t” vertex
                e:dan@fullcontact.com



                                        t:danklynn
“In” vertex
Distributed Traversal / Indexing


e:dan@fullcontact.com       t:danklynn




Re duc er em its higher-order edge
Distributed Traversal / Indexing




Ite rat ion 0
Distributed Traversal / Indexing




Ite rat ion 1
Distributed Traversal / Indexing




Ite rat ion 2
Distributed Traversal / Indexing




                               Reuse edges created
                               during previ ous
                               iterat ions




Ite rat ion 2
Distributed Traversal / Indexing




Ite rat ion 3
Distributed Traversal / Indexing




                               Reuse edges created
                               during previ ous
                               iterat ions




Ite rat ion 3
Distributed Traversal / Indexing


   hop s req uires on ly

                   ite rat ion s
Tips / Gotchas
Do implement your own comparator
public static class Comparator
               extends WritableComparator {


    public int compare(
        byte[] b1, int s1, int l1,
        byte[] b2, int s2, int l2) {

        // .....
    }
}
                                              java
Do implement your own comparator


static {
    WritableComparator.define(VertexKeyWritable,
         new VertexKeyWritable.Comparator())
}



                                                   java
MultiScanTableInputFormat

MultiScanTableInputFormat.setTable(conf,
   "graph");

MultiScanTableInputFormat.addScan(conf,
   new Scan());

job.setInputFormatClass(
   MultiScanTableInputFormat.class);


                                          java
TableMapReduceUtil



TableMapReduceUtil.initTableReducerJob(
    "graph", MyReducer.class, job);

                                      java
Elastic
MapReduce
Elastic MapReduce

HFi les
Elastic MapReduce

HFi les
     Copy to S3



  Seq uen ceFiles
Elastic MapReduce

HFi les
     Copy to S3
                     Elastic MapReduce



  Seq uen ceFiles Seq uen ceFiles
Elastic MapReduce

HFi les
     Copy to S3
                     Elastic MapReduce



  Seq uen ceFiles Seq uen ceFiles
Elastic MapReduce

HFi les
     Copy to S3
                                Elastic MapReduce



  Seq uen ceFiles Seq uen ceFiles
          HFileOutputFormat.configureIncrementalLoad(job, outputTable)



  HFi les
Elastic MapReduce

HFi les
     Copy to S3
                                Elastic MapReduce



  Seq uen ceFiles Seq uen ceFiles
          HFileOutputFormat.configureIncrementalLoad(job, outputTable)



  HFi les                                          HBase
                   $ hadoop jar hbase-VERSION.jar completebulkload
Additional Resources
Google Pregel: BSP-based graph processing system

Apache Giraph: Implementation of Pregel for Hadoop

MultiScanTableInputFormat example

Apache Mahout - Distributed machine learning on Hadoop
Thanks!
dan@fullcontact.com

Contenu connexe

Tendances

Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataJetlore
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesRussell Spitzer
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBBradley Holt
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGMatthew McCullough
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in SearchAmund Tveit
 
Big Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and ClojureBig Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and ClojureDr. Christian Betz
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...StampedeCon
 
How LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentHow LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentSasha Ovsankin
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseNag Arvind Gudiseva
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...CloudxLab
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Gera Shegalov
 
Cassandra data structures and algorithms
Cassandra data structures and algorithmsCassandra data structures and algorithms
Cassandra data structures and algorithmsDuyhai Doan
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...CloudxLab
 
OSCON 2011 CouchApps
OSCON 2011 CouchAppsOSCON 2011 CouchApps
OSCON 2011 CouchAppsBradley Holt
 

Tendances (20)

Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector Dataframes
 
Indexed Hive
Indexed HiveIndexed Hive
Indexed Hive
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDB
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUG
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
Big Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and ClojureBig Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and Clojure
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
 
Apache pig
Apache pigApache pig
Apache pig
 
How LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentHow LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product Development
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBase
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013
 
Cassandra data structures and algorithms
Cassandra data structures and algorithmsCassandra data structures and algorithms
Cassandra data structures and algorithms
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
 
OSCON 2011 CouchApps
OSCON 2011 CouchAppsOSCON 2011 CouchApps
OSCON 2011 CouchApps
 
Spark and shark
Spark and sharkSpark and shark
Spark and shark
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 

En vedette

HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon
 
HBaseCon 2012 | Storing and Manipulating Graphs in HBase
HBaseCon 2012 | Storing and Manipulating Graphs in HBaseHBaseCon 2012 | Storing and Manipulating Graphs in HBase
HBaseCon 2012 | Storing and Manipulating Graphs in HBaseCloudera, Inc.
 
HBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgentHBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgentHBaseCon
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitterctrezzo
 
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...DataWorks Summit
 
안드로이드에서 플러리를 쉽게 사용하기
안드로이드에서 플러리를 쉽게 사용하기안드로이드에서 플러리를 쉽게 사용하기
안드로이드에서 플러리를 쉽게 사용하기Booseol Shin
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - HortonworksAvery Ching
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)Kevin Weil
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandrajohnrjenson
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patternshadooparchbook
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBaseHBaseCon
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...Cloudera, Inc.
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon
 
Real-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudHBaseCon
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...HBaseCon
 

En vedette (20)

HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
 
HBaseCon 2012 | Storing and Manipulating Graphs in HBase
HBaseCon 2012 | Storing and Manipulating Graphs in HBaseHBaseCon 2012 | Storing and Manipulating Graphs in HBase
HBaseCon 2012 | Storing and Manipulating Graphs in HBase
 
HBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgentHBaseCon 2015: HBase @ CyberAgent
HBaseCon 2015: HBase @ CyberAgent
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
 
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
 
안드로이드에서 플러리를 쉽게 사용하기
안드로이드에서 플러리를 쉽게 사용하기안드로이드에서 플러리를 쉽게 사용하기
안드로이드에서 플러리를 쉽게 사용하기
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)
 
Neo, Titan & Cassandra
Neo, Titan & CassandraNeo, Titan & Cassandra
Neo, Titan & Cassandra
 
Redis vs Aerospike
Redis vs AerospikeRedis vs Aerospike
Redis vs Aerospike
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Streaming architecture patterns
Streaming architecture patternsStreaming architecture patterns
Streaming architecture patterns
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
 
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
 
Real-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the Cloud
 
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
 

Similaire à Storing and manipulating graphs in HBase

Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph ProcessingVasia Kalavri
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Dan Lynn
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXrhatr
 
Performing Data Science with HBase
Performing Data Science with HBasePerforming Data Science with HBase
Performing Data Science with HBaseWibiData
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...HostedbyConfluent
 
Tech Days Paris Intoduction F# and Collective Intelligence
Tech Days Paris Intoduction F# and Collective IntelligenceTech Days Paris Intoduction F# and Collective Intelligence
Tech Days Paris Intoduction F# and Collective IntelligenceRobert Pickering
 
Kyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfFlavio W. Brasil
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormDataStax
 
Generics Past, Present and Future (Latest)
Generics Past, Present and Future (Latest)Generics Past, Present and Future (Latest)
Generics Past, Present and Future (Latest)RichardWarburton
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability MahoutJake Mannix
 
Generics Past, Present and Future
Generics Past, Present and FutureGenerics Past, Present and Future
Generics Past, Present and FutureRichardWarburton
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Guy Lebanon
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout source{d}
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeArangoDB Database
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Big Data Spain
 
From Python to Scala
From Python to ScalaFrom Python to Scala
From Python to ScalaFFunction inc
 
cppt-170218053903 (1).pptx
cppt-170218053903 (1).pptxcppt-170218053903 (1).pptx
cppt-170218053903 (1).pptxWatchDog13
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELJoel Falcou
 

Similaire à Storing and manipulating graphs in HBase (20)

Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
 
Performing Data Science with HBase
Performing Data Science with HBasePerforming Data Science with HBase
Performing Data Science with HBase
 
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
Building Kafka Connectors with Kotlin: A Step-by-Step Guide to Creation and D...
 
Tech Days Paris Intoduction F# and Collective Intelligence
Tech Days Paris Intoduction F# and Collective IntelligenceTech Days Paris Intoduction F# and Collective Intelligence
Tech Days Paris Intoduction F# and Collective Intelligence
 
Kyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdf
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
 
Generics Past, Present and Future (Latest)
Generics Past, Present and Future (Latest)Generics Past, Present and Future (Latest)
Generics Past, Present and Future (Latest)
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability Mahout
 
Generics Past, Present and Future
Generics Past, Present and FutureGenerics Past, Present and Future
Generics Past, Present and Future
 
Data Analysis with R (combined slides)
Data Analysis with R (combined slides)Data Analysis with R (combined slides)
Data Analysis with R (combined slides)
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
From Python to Scala
From Python to ScalaFrom Python to Scala
From Python to Scala
 
cppt-170218053903 (1).pptx
cppt-170218053903 (1).pptxcppt-170218053903 (1).pptx
cppt-170218053903 (1).pptx
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 

Plus de Dan Lynn

The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsDan Lynn
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dan Lynn
 
Hands on with Apache Spark
Hands on with Apache SparkHands on with Apache Spark
Hands on with Apache SparkDan Lynn
 
AgilData - How I Learned to Stop Worrying and Evolve with On-Demand Schemas
AgilData - How I Learned to Stop Worrying and Evolve  with On-Demand SchemasAgilData - How I Learned to Stop Worrying and Evolve  with On-Demand Schemas
AgilData - How I Learned to Stop Worrying and Evolve with On-Demand SchemasDan Lynn
 
Data Streaming Technology Overview
Data Streaming Technology OverviewData Streaming Technology Overview
Data Streaming Technology OverviewDan Lynn
 
Data decay and the illusion of the present
Data decay and the illusion of the presentData decay and the illusion of the present
Data decay and the illusion of the presentDan Lynn
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012Dan Lynn
 
When it rains: Prepare for scale with Amazon EC2
When it rains: Prepare for scale with Amazon EC2When it rains: Prepare for scale with Amazon EC2
When it rains: Prepare for scale with Amazon EC2Dan Lynn
 

Plus de Dan Lynn (8)

The Holy Grail of Data Analytics
The Holy Grail of Data AnalyticsThe Holy Grail of Data Analytics
The Holy Grail of Data Analytics
 
Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
 
Hands on with Apache Spark
Hands on with Apache SparkHands on with Apache Spark
Hands on with Apache Spark
 
AgilData - How I Learned to Stop Worrying and Evolve with On-Demand Schemas
AgilData - How I Learned to Stop Worrying and Evolve  with On-Demand SchemasAgilData - How I Learned to Stop Worrying and Evolve  with On-Demand Schemas
AgilData - How I Learned to Stop Worrying and Evolve with On-Demand Schemas
 
Data Streaming Technology Overview
Data Streaming Technology OverviewData Streaming Technology Overview
Data Streaming Technology Overview
 
Data decay and the illusion of the present
Data decay and the illusion of the presentData decay and the illusion of the present
Data decay and the illusion of the present
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 
When it rains: Prepare for scale with Amazon EC2
When it rains: Prepare for scale with Amazon EC2When it rains: Prepare for scale with Amazon EC2
When it rains: Prepare for scale with Amazon EC2
 

Dernier

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Storing and manipulating graphs in HBase