Lily
Smart data at scale



    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
big data,
big problems

MOORE vs data

» IDC says Digital Universe will be 35 Zettabytes by 2020
» 20% = enterprise data (structured, curated, $$$)
» Facebook, Yahoo!, Google, Rapleaf, Amazon show us how the
  remaining 80% can be monetized
  » some of them even rent out their data platform
    » ... at the cost of infrastructure lock-in

[chart labels: data, moore]

1 Zettabyte = 1,000,000,000,000,000,000,000 bytes, or 1 billion terabytes

MOORE vs data

» coping with volume + need for timeliness = parallel processing
» data becomes business-critical = resilience through distributed
  architectures
» Hadoop, MapReduce, HBase: the future data platform

[chart labels: data, moore]

the CHALLENGES


» process ALL data
» process data in REAL-TIME
» derive INSIGHTS
» provide INSTANT FEEDBACK




current thinking

[diagram: data STORE, ETL, data warehouse, analytics]

batched, off-line, overnight

1. store and manage all YOUR data

[diagram: DATA]

2. store user behaviour, nearby

[diagram: DATA; USER Behavior]

3. analyze usage patterns

[diagram: DATA; USER Behavior; data processing]

4. add domain knowledge

[diagram: DATA; USER Behavior; data processing; domain knowledge (patterns, rules, keywords, lists, ...)]

5. process, in real-time

[diagram: DATA; USER Behavior; data processing (recommendations, semantic augmentation, Analytics); domain knowledge (patterns, rules, keywords, lists, ...)]

6. augment data

[diagram: same elements as step 5]

data insights

[diagram: SMARTER DATA (relations); data processing (recommendations, semantic augmentation, Analytics); domain knowledge (patterns, rules, keywords, lists, ...)]

SMART DATA, at SCALE
... and in real time

stories

HYPER-PERSONAL recommendations

[diagram: NEWS; TOGETHERNESS, interestingness; organisations, names, locations, brands]

news aggregator
scale

up-selling, CROSS-SELLING

[diagram: product CATALOG; recommendedness, relatedness; product families, related activities, social graph]

e-retail
real-time

competitive innovation

[diagram: patents; (dis)SIMILARITY; companies, people, materials, processes]

IP research
insights

outerthought

“ The world is moving from content as a cost
to data as an opportunity. We provide the tools
and the platform to let organisations maximally
benefit from the data they grow and collect. ”




Lily (now)

» Large-scale content storage, indexing and search
» Current pilots
  » e-retail, mobile media, ISP, e-gov, IP research
» up to now: 3 man-years investment (since Sept/2009)

Lily 1.0 (CR)

[diagram: data STORE + data warehouse + analytics; brace labelled "real time", "Lily 2.0"]

lily USPs

» Integrated approach, one-stop-data-shop
  » No more flat file processing (Hadoop) ➙ interactive database (HBase)
» Real-time (vs. overnight)
  » instant feedback loops
  » designed for on-line, interactive use
» Available in-house, SaaS possible
» Data Insights = data + customer retention

➡ all data
➡ real-time
➡ easy

roadmap

» now: Lily 0.3
» April 2011: Lily 1.0
» Q3 2011
  » real-time statistics + analytics
» Q2 2012: Lily 2.0
  » real-time data processing engine
  » Data Insights

» Along the road: Lily SaaS edition

open source




» www.lilyproject.org
» docs.outerthought.org/lily-docs-current/




Lily Core Concepts
» storage
 » HBase
 » repository model
 » versioning, varianting, mixins
» indexing
 » mapping
» search
 » SOLR


falling in love with HBase: phase 1

» automatic scaling to large data sets
» fault-tolerance
» flexible datamodel with sparse data
» commodity hardware
» efficient random access
» community-based open source
» Java if possible


falling in love with HBase: phase 2



» need for consistency
» atomic single-row updates
» M/R for index regeneration




falling in love with HBase: phase 3


 HBase
» datamodel with column families and cell versioning
» ordered tables with range scans
» HDFS for blob storage
» Apache
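
The "ordered tables with range scans" point can be illustrated with a minimal sketch. This is not HBase client code: it is a plain-Python model of a table kept sorted by row key, and the class name `SortedTable` and the row keys are hypothetical. The scan is start-inclusive and stop-exclusive, as HBase scans are.

```python
import bisect


class SortedTable:
    """Toy model of a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> row data

    def put(self, key, row):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan(self, start, stop):
        """Yield (key, row) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = SortedTable()
for key in ["book:0003", "author:0001", "book:0001", "book:0002"]:
    table.put(key, {"id": key})

# ";" is the character right after ":" in ASCII, so this range covers
# every key that starts with "book:".
book_keys = [key for key, _ in table.scan("book:", "book;")]
```

Because the keys are ordered, a prefix query becomes a contiguous slice rather than a full-table filter, which is why row-key design matters so much for this kind of store.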



Lily Repository Model




Lily Datatypes




Mixins




Sample Lily Schema (excerpt)
                                                                 

{
  namespaces: {
    /* Declaration of namespace prefixes. */
    "org.lilyproject.bookssample": "b",
    "org.lilyproject.vtag": "vtag"
  },
  fieldTypes: [
    {
      name: "b$title",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$pages",
      valueType: { primitive: "INTEGER" },
      scope: "versioned"
    },
    {
      name: "b$language",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$authors",
      valueType: { primitive: "LINK", multiValue: true },
      scope: "versioned"
    },
    {
      name: "b$name",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "b$bio",
      valueType: { primitive: "STRING" },
      scope: "versioned"
    },
    {
      name: "vtag$last",
      valueType: { primitive: "LONG" },
      scope: "non_versioned"
    }
  ],
  recordTypes: [
    {
      name: "b$Book",
      fields: [
        { name: "b$title", mandatory: true },
        { name: "b$pages", mandatory: false },
        { name: "b$language", mandatory: false },
        { name: "b$authors", mandatory: false },
        { name: "vtag$last", mandatory: false }
      ]
    },

                                                                 ...
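
To make the role of the `mandatory` flag in the record type concrete, here is a minimal sketch of how a record could be checked against the `b$Book` field list from the excerpt. It does not use the Lily API; the helper `missing_mandatory_fields` and the sample record values are hypothetical and exist only for illustration.

```python
# Field list of the b$Book record type, copied from the schema excerpt.
BOOK_RECORD_TYPE = {
    "name": "b$Book",
    "fields": [
        {"name": "b$title", "mandatory": True},
        {"name": "b$pages", "mandatory": False},
        {"name": "b$language", "mandatory": False},
        {"name": "b$authors", "mandatory": False},
        {"name": "vtag$last", "mandatory": False},
    ],
}


def missing_mandatory_fields(record_type, record):
    """Return the names of mandatory fields absent from `record` (hypothetical helper)."""
    return [
        field["name"]
        for field in record_type["fields"]
        if field["mandatory"] and field["name"] not in record
    ]


# Made-up sample records.
complete = {"b$title": "Sample Book", "b$pages": 100}
incomplete = {"b$pages": 100}

problems_complete = missing_mandatory_fields(BOOK_RECORD_TYPE, complete)
problems_incomplete = missing_mandatory_fields(BOOK_RECORD_TYPE, incomplete)
```

Only `b$title` is mandatory in the excerpt, so a record without a title is the only one flagged here.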


Lily Versioning




Flexible content model
» generic enough to accommodate many popular content
 schemas
 » HTML5, CMIS, RDF, NewsML, Dublin Core, ...
 » academically verified
 » not limited to ‘content applications’ only
» developer convenience
 » higher level constructs
 » schema reuse
 » versioning, linking, ...

Lily Architecture
(deployment)




Lily Architecture
                    (components)




HBase RowLog Library


» need for sync/async operations
 » updating of secondary indexes (i.e. tables)
 » feeding of Indexer (= bridge to SOLR index maintenance)
» not: transactions
» need for distribution and durability




HBase RowLog Library
» WAL
 » guaranteed execution of synchronous actions
 » call doesn’t return before secondary action finishes
 » e.g. update secondary index tables
 » if all goes well, size = #concurrent ops
 » useful outside of Lily context as well!
» Queue
 » triggering of async actions
 » e.g. (re)index (updated) record with SOLR back-end
 » size depends on speed of back-end process
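
The split between the WAL and the queue can be sketched in a few lines. This is a toy in-memory model, not the Lily RowLog API: the class `RowLogSketch`, its methods, and the sample data are all hypothetical, and a real implementation would persist both structures (the slides mention distribution and durability as requirements).

```python
from collections import deque


class RowLogSketch:
    """Toy in-memory model of the WAL + queue split; not the Lily RowLog API."""

    def __init__(self, sync_action, async_action):
        self.sync_action = sync_action    # e.g. update a secondary index table
        self.async_action = async_action  # e.g. (re)index the record in SOLR
        self.wal = {}                     # pending synchronous work, by sequence number
        self.queue = deque()              # pending asynchronous work
        self._seq = 0

    def put(self, row_key, data):
        """Log the change, run the synchronous action, then enqueue the async one."""
        self._seq += 1
        seq = self._seq
        self.wal[seq] = (row_key, data)     # record intent before acting
        self.sync_action(row_key, data)     # put() does not return before this finishes
        del self.wal[seq]                   # finished, so the WAL entry can go
        self.queue.append((row_key, data))  # picked up later by a consumer
        return seq

    def drain(self):
        """Process queued messages; a real consumer would run in the background."""
        processed = 0
        while self.queue:
            row_key, data = self.queue.popleft()
            self.async_action(row_key, data)
            processed += 1
        return processed


secondary_index = {}
solr_docs = []


def update_secondary_index(row_key, data):
    secondary_index[data["title"]] = row_key


def index_in_solr(row_key, data):
    solr_docs.append({"id": row_key, **data})


log = RowLogSketch(update_secondary_index, index_in_solr)
log.put("book-1", {"title": "example title"})
```

If the synchronous action raises, its entry stays in `wal`, which is the hook a recovery process would use to retry. The WAL holds only in-flight operations, while the queue grows or shrinks with the speed of the back-end consumer, matching the two "size" bullets above.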




The Lily Indexer

» denormalization
» indexing of multiple versions of a record
» incremental index updating
» batch index building
» blob content extraction
» sharding towards multiple SOLR instances

Indexing configuration (SOLR)
<schema name="example" version="1.2">

<types>
  [snipped: see SOLR example schema]
</types>

 <fields>
   <!-- Fields which are required by Lily -->
   <field name="@@key" type="string" indexed="true" stored="true" required="true"/>
   <field name="@@id" type="string" indexed="true" stored="true" required="true"/>
   <field name="@@vtag" type="string" indexed="true" stored="true" required="true"/>
   <field name="@@versionless" type="string" indexed="true" stored="true" required="false"/>

  <!-- Your own fields -->
  <field name="title" type="text" indexed="true" stored="true" required="false"/>
  <field name="authors" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
</fields>

<uniqueKey>@@key</uniqueKey>

<defaultSearchField>title</defaultSearchField>

<solrQueryParser defaultOperator="OR"/>

</schema>




Indexer configuration (Lily)
<?xml version="1.0"?>
<indexer xmlns:b="org.lilyproject.bookssample">
  <cases>
    <case recordType="b:Book" variant="*" vtags="last" indexVersionless="true"/>
  </cases>

  <indexFields>
    <indexField name="title">
      <value>
        <field name="b:title"/>
      </value>
    </indexField>

    <indexField name="authors">
      <value>
        <deref>
          <follow field="b:authors"/>
          <field name="b:name"/>
        </deref>
      </value>
    </indexField>
  </indexFields>

</indexer>
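
The `<deref>` element in the `authors` index field is the denormalization step: it follows the `b:authors` links and takes `b:name` from each linked record. A minimal sketch of that idea, assuming records held in a plain dictionary; the functions `deref` and `build_index_document`, the record ids, and the sample values are hypothetical and do not reflect the indexer's actual implementation.

```python
def deref(records, record_id, follow_field, target_field):
    """Follow the links in `follow_field` and collect `target_field` from each linked record."""
    record = records[record_id]
    values = []
    for linked_id in record.get(follow_field, []):
        linked = records.get(linked_id)
        if linked is not None and target_field in linked:
            values.append(linked[target_field])
    return values


def build_index_document(records, record_id):
    """Mirror the two indexField entries: title is copied, authors is dereferenced."""
    record = records[record_id]
    return {
        "title": record.get("b:title"),
        "authors": deref(records, record_id, "b:authors", "b:name"),
    }


# Made-up sample records, keyed by a hypothetical record id.
records = {
    "author-1": {"b:name": "Author One"},
    "author-2": {"b:name": "Author Two"},
    "book-1": {"b:title": "Sample Book", "b:authors": ["author-1", "author-2"]},
}

doc = build_index_document(records, "book-1")
```

One consequence of copying author names into the book's index document is that a change to an author record makes the book's index entry stale, which is why denormalization and incremental index updating appear together in the indexer's feature list.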




(opt.) Sharding configuration
{
  shardingKey: {
    value: {
      source: "variantProperty",
      property: "language"
    },
    type: "string"
  },

  mapping: {
    type: "list",
    entries: [
      { shard: "shard1", values: ["en", "it"] },
      { shard: "shard2", values: ["nl", "de", "es"] }
    ]
  }
}
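
Read as data, the configuration above maps a record's `language` variant property to a shard name. The following sketch shows one way such a lookup could work; the function `select_shard` is hypothetical, and raising an error for an unmapped value is an assumption, since the slide does not say how Lily handles that case.

```python
# The sharding configuration from the slide, expressed as a Python dict.
SHARDING_CONFIG = {
    "shardingKey": {
        "value": {"source": "variantProperty", "property": "language"},
        "type": "string",
    },
    "mapping": {
        "type": "list",
        "entries": [
            {"shard": "shard1", "values": ["en", "it"]},
            {"shard": "shard2", "values": ["nl", "de", "es"]},
        ],
    },
}


def select_shard(config, variant_properties):
    """Return the shard whose value list contains the record's sharding key."""
    prop = config["shardingKey"]["value"]["property"]
    key = variant_properties.get(prop)
    for entry in config["mapping"]["entries"]:
        if key in entry["values"]:
            return entry["shard"]
    # Assumption: an unmapped key is treated as an error in this sketch.
    raise LookupError(f"no shard configured for {prop}={key!r}")


shard_for_dutch = select_shard(SHARDING_CONFIG, {"language": "nl"})
shard_for_english = select_shard(SHARDING_CONFIG, {"language": "en"})
```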




Lily API


» Java (using Avro)
  » http://docs.outerthought.org/lily-docs-current/g3/g1/390-lily.html

» REST (HTTP + JSON)
  » http://docs.outerthought.org/lily-docs-current/g3/g2/427-lily.html

» All docs
  » http://docs.outerthought.org/lily-docs-current/ext/toc/




Demo
» http://outerthought.blip.tv/file/4245615/




Lily and HBase

» adds high-level content model
 » data types
 » versioning
 » blob storage on HDFS
» focus on sparse (efficient) storage
» RowLog for synchronous cross-table updates and async
 message queues

Lily and SOLR

» provides flexible mapping between HBase content
  model and SOLR index fields
» interactive and batch (M/R) index maintenance
» sharding
» use(s) SOLR as-is: loose, flexible, extensible coupling
» search access via SOLR (HTTP) API
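
Since search goes through the standard SOLR HTTP API, a query is just a URL against SOLR's `/select` handler. A minimal sketch that builds such a URL for the `title` field defined in the SOLR schema shown earlier; the base URL is a hypothetical local address, and the function name `build_select_url` is made up for this example.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Build a SOLR /select URL; `base_url` is a hypothetical local SOLR address."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983/solr", 'title:"smart data"')
```

The resulting URL can be fetched with any HTTP client; with a sharded index, the request would go to whichever SOLR instance or instances hold the relevant shards.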



Lily and CDH

» we intend to rely on CDH-‘blessed’ versions of HBase/HDFS/ZK
 » 700 patches and testing
» next: adopting similar distribution lay-out
» since we contribute patches to ASF HBase trunk, we would
  expect CDH to track closely (until HBase 1.0)
» some Lily users could be interested in ‘CDH-level’ services


goodbye


» It’s open source !
» Content Repository: available now
  (Lily model + HBase + SOLR + RowLog)
» Lily 1.0 soon, will mainly focus on differentiating open
  source and enterprise edition
» “HBase is totally awesome, mate.”



Thank you !
                               for your attention
                               for your questions

                               » stevenn@outerthought.org

                               »           @stevenn

Scala e xchange 2013 haoyi li on metascala a tiny diy jvmScala e xchange 2013 haoyi li on metascala a tiny diy jvm
Scala e xchange 2013 haoyi li on metascala a tiny diy jvm
 
Oscar reiken jr on our success at manheim
Oscar reiken jr on our success at manheimOscar reiken jr on our success at manheim
Oscar reiken jr on our success at manheim
 
Progressive f# tutorials nyc dmitry mozorov & jack pappas on code quotations ...
Progressive f# tutorials nyc dmitry mozorov & jack pappas on code quotations ...Progressive f# tutorials nyc dmitry mozorov & jack pappas on code quotations ...
Progressive f# tutorials nyc dmitry mozorov & jack pappas on code quotations ...
 
Cukeup nyc ian dees on elixir, erlang, and cucumberl
Cukeup nyc ian dees on elixir, erlang, and cucumberlCukeup nyc ian dees on elixir, erlang, and cucumberl
Cukeup nyc ian dees on elixir, erlang, and cucumberl
 
Cukeup nyc peter bell on getting started with cucumber.js
Cukeup nyc peter bell on getting started with cucumber.jsCukeup nyc peter bell on getting started with cucumber.js
Cukeup nyc peter bell on getting started with cucumber.js
 
Agile testing & bdd e xchange nyc 2013 jeffrey davidson & lav pathak & sam ho...
Agile testing & bdd e xchange nyc 2013 jeffrey davidson & lav pathak & sam ho...Agile testing & bdd e xchange nyc 2013 jeffrey davidson & lav pathak & sam ho...
Agile testing & bdd e xchange nyc 2013 jeffrey davidson & lav pathak & sam ho...
 
Progressive f# tutorials nyc rachel reese & phil trelford on try f# from zero...
Progressive f# tutorials nyc rachel reese & phil trelford on try f# from zero...Progressive f# tutorials nyc rachel reese & phil trelford on try f# from zero...
Progressive f# tutorials nyc rachel reese & phil trelford on try f# from zero...
 
Progressive f# tutorials nyc don syme on keynote f# in the open source world
Progressive f# tutorials nyc don syme on keynote f# in the open source worldProgressive f# tutorials nyc don syme on keynote f# in the open source world
Progressive f# tutorials nyc don syme on keynote f# in the open source world
 
Agile testing & bdd e xchange nyc 2013 gojko adzic on bond villain guide to s...
Agile testing & bdd e xchange nyc 2013 gojko adzic on bond villain guide to s...Agile testing & bdd e xchange nyc 2013 gojko adzic on bond villain guide to s...
Agile testing & bdd e xchange nyc 2013 gojko adzic on bond villain guide to s...
 
Dmitry mozorov on code quotations code as-data for f#
Dmitry mozorov on code quotations code as-data for f#Dmitry mozorov on code quotations code as-data for f#
Dmitry mozorov on code quotations code as-data for f#
 
A poet's guide_to_acceptance_testing
A poet's guide_to_acceptance_testingA poet's guide_to_acceptance_testing
A poet's guide_to_acceptance_testing
 
Russ miles-cloudfoundry-deep-dive
Russ miles-cloudfoundry-deep-diveRuss miles-cloudfoundry-deep-dive
Russ miles-cloudfoundry-deep-dive
 
Serendipity-neo4j
Serendipity-neo4jSerendipity-neo4j
Serendipity-neo4j
 
Simon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSimon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelism
 
Plug 20110217
Plug   20110217Plug   20110217
Plug 20110217
 
Lug presentation
Lug presentationLug presentation
Lug presentation
 
I went to_a_communications_workshop_and_they_t
I went to_a_communications_workshop_and_they_tI went to_a_communications_workshop_and_they_t
I went to_a_communications_workshop_and_they_t
 
Plug saiku
Plug   saikuPlug   saiku
Plug saiku
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Huguk lily

  • 1. Lily Smart data at scale IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 2. big data, big problems
  • 3. MOORE vs data » IDC says Digital Universe will be 35 Zettabytes by 2020 » 20% = enterprise data (structured, curated, $$$) » Facebook, Yahoo!, Google, Rapleaf, Amazon show us how the remaining 80% can be monetized » some of them even rent out their data platform » ... at the cost of infrastructure lock-in » 1 Zettabyte = 1,000,000,000,000,000,000,000 bytes, or 1 billion terabytes
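
The unit arithmetic on slide 3 can be checked with a few lines of Python. This is only a sanity check of the figures quoted on the slide, assuming decimal (SI) units; the variable names are illustrative.

```python
# Decimal (SI) units, as used on the slide.
TERABYTE = 10**12
ZETTABYTE = 10**21

terabytes_per_zettabyte = ZETTABYTE // TERABYTE

# IDC projection quoted on the slide: 35 ZB by 2020, split 20% / 80%.
digital_universe_zb = 35
enterprise_zb = digital_universe_zb * 20 // 100    # structured, curated data
remaining_zb = digital_universe_zb - enterprise_zb  # everything else

print(f"1 ZB = {terabytes_per_zettabyte:,} TB")
print(f"enterprise: {enterprise_zb} ZB, remaining: {remaining_zb} ZB")
```

Under these assumptions the 20% enterprise share corresponds to 7 ZB and the remaining 80% to 28 ZB.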
  • 4. MOORE vs data » coping with volume + need for timeliness = parallel processing » data becomes business-critical = resilience through distributed architectures » Hadoop, MapReduce, HBase: the future data platform
  • 5. the CHALLENGES » process ALL data » process data in REAL-TIME » derive INSIGHTS » provide INSTANT FEEDBACK
  • 6. current thinking » data STORE » ETL » data warehouse » analytics » batched, off-line, overnight
  • 7. 1. store and manage all YOUR data DATA
  • 8. 2. store user behaviour, nearby DATA USER Behavior
  • 9. 3. analyze usage patterns DATA data processing USER Behavior
  • 10. 4. add domain knowledge DATA data processing USER Behavior domain knowledge patterns rules keywords lists ...
  • 11. 5. process, in real-time DATA data processing recommendations semantic augmentation Analytics USER Behavior domain knowledge patterns rules keywords lists ...
  • 12. 6. augment data DATA data processing recommendations semantic augmentation Analytics USER Behavior domain knowledge patterns rules keywords lists ...
  • 13. data insights SMARTER DATA data processing relations recommendations semantic augmentation Analytics domain knowledge patterns rules keywords lists ...
  • 14. data insights SMARTER DATA data processing relations recommendations semantic augmentation Analytics domain knowledge patterns rules keywords lists ... » SMART DATA, at SCALE ... and in real time
  • 15. stories
  • 16. HYPER-PERSONAL recommendations NEWS TOGETHERNESS interestingness organisations names locations brands news aggregator scale
  • 17. up-selling CROSS-SELLING product CATALOG recommendedness relatedness product families related activities social graph e-retail real-time
  • 18. competitive innovation patents (dis)SIMILARITY companies people materials processes IP research insights
  • 19. outerthought “The world is moving from content as a cost to data as an opportunity. We provide the tools and the platform to let organisations maximally benefit from the data they grow and collect.”
  • 20. Lily (now) » Large-scale content storage, indexing and search » Current pilots e-retail mobile media isp e-gov ip research » up to now: 3 man-years investment (since Sept/2009)
  • 21. Lily 1.0 (CR): data STORE » Lily 2.0: data STORE + data warehouse + analytics, in real time
  • 22. lily USPs » Integrated approach, one-stop-data-shop » No more flat file processing (Hadoop) ➙ interactive database (HBase) ➡ all data » Real-time (vs. overnight) » instant feedback loops ➡ real-time » designed for on-line, interactive use ➡ easy » Available in-house, SaaS possible » Data Insights = data + customer retention
  • 23. roadmap » now: Lily 0.3 » Along the road: » April 2011 : Lily 1.0 Lily SaaS edition » Q3 2011 » real-time statistics + analytics » Q2 2012 : Lily 2.0 » real-time data processing engine » Data Insights
  • 24. open source » www.lilyproject.org » docs.outerthought.org/lily-docs-current/
  • 25. Lily Core Concepts » storage » HBase » repository model » versioning, varianting, mixins » indexing » mapping » search » SOLR
  • 26. falling in love with HBase : phase 1 » automatic scaling to large data sets » fault-tolerance » flexible datamodel with sparse data » commodity hardware » efficient random access » community-based open source » Java if possible
  • 27. falling in love with HBase : phase 2 » need for consistency » atomic single-row updates » M/R for index regeneration
  • 28. falling in love with HBase : phase 3 HBase » datamodel with column families and cell versioning » ordered tables with range scans » HDFS for blob storage » Apache
  • 29. Lily Repository Model
  • 30. Lily Datatypes
  • 31. Mixins
  • 32. Sample Lily Schema (excerpt)

    {
      namespaces: {
        /* Declaration of namespace prefixes. */
        "org.lilyproject.bookssample": "b",
        "org.lilyproject.vtag": "vtag"
      },
      fieldTypes: [
        {
          name: "b$title",
          valueType: { primitive: "STRING" },
          scope: "versioned"
        },
        {
          name: "b$pages",
          valueType: { primitive: "INTEGER" },
          scope: "versioned"
        },
        {
          name: "b$language",
          valueType: { primitive: "STRING" },
          scope: "versioned"
        },
        {
          name: "b$authors",
          valueType: { primitive: "LINK", multiValue: true },
          scope: "versioned"
        },
        {
          name: "b$name",
          valueType: { primitive: "STRING" },
          scope: "versioned"
        },
        {
          name: "b$bio",
          valueType: { primitive: "STRING" },
          scope: "versioned"
        },
        {
          name: "vtag$last",
          valueType: { primitive: "LONG" },
          scope: "non_versioned"
        }
      ],
      recordTypes: [
        {
          name: "b$Book",
          fields: [
            {name: "b$title", mandatory: true },
            {name: "b$pages", mandatory: false },
            {name: "b$language", mandatory: false },
            {name: "b$authors", mandatory: false },
            {name: "vtag$last", mandatory: false }
          ]
        },
        ...
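
To illustrate what the `mandatory` flags in the `b$Book` record type imply, here is a minimal Python sketch that checks a record against the field list from the excerpt. It is not Lily's API: the `check_record` helper, the dictionary layout, and the choice to flag undeclared fields are assumptions made for illustration, and the sample values are made up.

```python
# Hypothetical, simplified copy of the b$Book record type from the excerpt.
BOOK_RECORD_TYPE = {
    "name": "b$Book",
    "fields": [
        {"name": "b$title", "mandatory": True},
        {"name": "b$pages", "mandatory": False},
        {"name": "b$language", "mandatory": False},
        {"name": "b$authors", "mandatory": False},
        {"name": "vtag$last", "mandatory": False},
    ],
}


def check_record(record_type, record):
    """Return a list of problems; an empty list means the record conforms."""
    declared = {field["name"] for field in record_type["fields"]}
    problems = []
    # Every mandatory field of the record type must be present.
    for field in record_type["fields"]:
        if field["mandatory"] and field["name"] not in record:
            problems.append(f"missing mandatory field {field['name']}")
    # Assumption of this sketch: fields not declared in the type are reported.
    for name in record:
        if name not in declared:
            problems.append(f"undeclared field {name}")
    return problems


ok = check_record(BOOK_RECORD_TYPE, {"b$title": "Example Book", "b$pages": 100})
bad = check_record(BOOK_RECORD_TYPE, {"b$pages": 100, "b$isbn": "n/a"})
print(ok)
print(bad)
```

Only `b$title` is mandatory in the excerpt, so a record with just a title already conforms in this sketch.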
  • 33. Lily Versioning
  • 34. Flexible content model » generic enough to accommodate many popular content schemas » HTML5, CMIS, RDF, NewsML, Dublin Core, ... » academically verified » not limited to ‘content applications’ only » developer convenience » higher level constructs » schema reuse » versioning, linking, ...
  • 35. Lily Architecture (deployment)
  • 36. Lily Architecture (components)
  • 37. HBase RowLog Library » need for sync/async operations » updating of secondary indexes (i.e. tables) » feeding of Indexer (= bridge to SOLR index maintenance) » not: transactions » need for distribution and durability
  • 38. HBase RowLog Library » WAL » guaranteed execution of synchronous actions » call doesn’t return before secondary action finishes » e.g. update secondary index tables » if all goes well, size = #concurrent ops » Queue » triggering of async actions » e.g. (re)index (updated) record with SOLR back-end » size depends on speed of back-end process » useful outside of Lily context as well!
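
The WAL/queue split on slide 38 can be sketched as a toy in-memory model. This is not the RowLog library's API: the class `RowLogSketch`, its methods, and the lambda subscribers are hypothetical, and a real implementation would persist both structures durably (in HBase) and drain the queue in the background.

```python
from collections import deque


class RowLogSketch:
    """Toy in-memory model of the WAL + queue split (not Lily's RowLog API)."""

    def __init__(self):
        self.wal = []            # messages whose synchronous work is pending
        self.queue = deque()     # messages waiting for asynchronous work
        self.sync_actions = []   # e.g. update secondary index tables
        self.async_actions = []  # e.g. (re)index the record in SOLR

    def put(self, row_key, payload):
        message = (row_key, payload)
        # WAL: record the intent first, then run every synchronous action
        # before returning to the caller.
        self.wal.append(message)
        for action in self.sync_actions:
            action(message)
        # Only reached if all actions succeeded; on failure the message
        # stays in the WAL so it could be retried later.
        self.wal.remove(message)
        # Queue: hand the message over for later, asynchronous processing.
        self.queue.append(message)

    def drain(self):
        """Process queued messages (a real system does this in the background)."""
        while self.queue:
            message = self.queue.popleft()
            for action in self.async_actions:
                action(message)


secondary_index = {}
solr_docs = []

log = RowLogSketch()
log.sync_actions.append(lambda m: secondary_index.update({m[1]["title"]: m[0]}))
log.async_actions.append(lambda m: solr_docs.append({"@@id": m[0], **m[1]}))

log.put("book-1", {"title": "Example"})
sync_done_before_return = secondary_index == {"Example": "book-1"}
queued_before_drain = len(log.queue)
log.drain()
print(sync_done_before_return, queued_before_drain, solr_docs)
```

In this model the WAL only holds in-flight operations, which matches the slide's remark that its size equals the number of concurrent operations when all goes well, while the queue grows or shrinks with the speed of the back-end consumer.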
  • 39. The Lily Indexer » sharding towards multiple SOLR instances » indexing of multiple versions of a record » incremental index updating » blob content extraction » denormalization » batch index building
  • 40. Indexing configuration (SOLR)

    <schema name="example" version="1.2">
      <types>
        [snipped: see SOLR example schema]
      </types>
      <fields>
        <!-- Fields which are required by Lily -->
        <field name="@@key" type="string" indexed="true" stored="true" required="true"/>
        <field name="@@id" type="string" indexed="true" stored="true" required="true"/>
        <field name="@@vtag" type="string" indexed="true" stored="true" required="true"/>
        <field name="@@versionless" type="string" indexed="true" stored="true" required="false"/>
        <!-- Your own fields -->
        <field name="title" type="text" indexed="true" stored="true" required="false"/>
        <field name="authors" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
      </fields>
      <uniqueKey>@@key</uniqueKey>
      <defaultSearchField>title</defaultSearchField>
      <solrQueryParser defaultOperator="OR"/>
    </schema>

  • 41. Indexer configuration (Lily)

    <?xml version="1.0"?>
    <indexer xmlns:b="org.lilyproject.bookssample">
      <cases>
        <case recordType="b:Book" variant="*" vtags="last" indexVersionless="true"/>
      </cases>
      <indexFields>
        <indexField name="title">
          <value>
            <field name="b:title"/>
          </value>
        </indexField>
        <indexField name="authors">
          <value>
            <deref>
              <follow field="b:authors"/>
              <field name="b:name"/>
            </deref>
          </value>
        </indexField>
      </indexFields>
    </indexer>
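
The `<deref>` element in the indexer configuration follows the `b:authors` links and indexes the `b:name` of each linked record, which is the denormalization mentioned on slide 39. A minimal Python sketch of that mapping, assuming an in-memory dictionary of records (the record ids, sample values, and the `build_index_document` helper are hypothetical, not part of Lily):

```python
# Hypothetical in-memory records keyed by record id; field names follow the
# books sample (b:title, b:authors, b:name).
records = {
    "author-1": {"b:name": "Author One"},
    "author-2": {"b:name": "Author Two"},
    "book-1": {"b:title": "Example Book", "b:authors": ["author-1", "author-2"]},
}


def build_index_document(record_id, records):
    """Build a flat, SOLR-style document for one Book record."""
    record = records[record_id]
    doc = {"@@id": record_id}
    # <indexField name="title"> with a plain <field name="b:title"/>
    if "b:title" in record:
        doc["title"] = record["b:title"]
    # <indexField name="authors"> with <deref>: follow each b:authors link
    # and copy b:name from the linked record (denormalization).
    names = [
        records[author_id]["b:name"]
        for author_id in record.get("b:authors", [])
        if author_id in records and "b:name" in records[author_id]
    ]
    if names:
        doc["authors"] = names
    return doc


doc = build_index_document("book-1", records)
print(doc)
```

Because author names are copied into the book's index document, a change to an author record means the books linking to it need reindexing; that is the kind of work the asynchronous queue is described as feeding to the Indexer.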
  • 42. (opt.) Sharding configuration

    {
      shardingKey: {
        value: {
          source: "variantProperty",
          property: "language"
        },
        type: "string"
      },
      mapping: {
        type: "list",
        entries: [
          { shard: "shard1", values: ["en", "it"] },
          { shard: "shard2", values: ["nl", "de", "es"] }
        ]
      }
    }
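
Read literally, the sharding configuration maps the `language` variant property of a record onto a named shard through a list lookup. The sketch below shows that lookup in Python; the `select_shard` function and the behaviour for unmatched values (returning `None`) are assumptions for illustration, not Lily's documented behaviour.

```python
# The configuration from the slide, transcribed as a Python dictionary.
SHARDING_CONFIG = {
    "shardingKey": {
        "value": {"source": "variantProperty", "property": "language"},
        "type": "string",
    },
    "mapping": {
        "type": "list",
        "entries": [
            {"shard": "shard1", "values": ["en", "it"]},
            {"shard": "shard2", "values": ["nl", "de", "es"]},
        ],
    },
}


def select_shard(config, variant_properties):
    """Return the shard name for a record, or None if no entry matches."""
    key_spec = config["shardingKey"]["value"]
    if key_spec["source"] != "variantProperty":
        raise ValueError("this sketch only handles variantProperty keys")
    key = variant_properties.get(key_spec["property"])
    # "list" mapping: the first entry whose values contain the key wins.
    for entry in config["mapping"]["entries"]:
        if key in entry["values"]:
            return entry["shard"]
    return None


print(select_shard(SHARDING_CONFIG, {"language": "nl"}))
print(select_shard(SHARDING_CONFIG, {"language": "en"}))
```

With this mapping, English and Italian variants land on `shard1`, and Dutch, German and Spanish variants on `shard2`.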
  • 43. Lily API » Java (using Avro) » http://docs.outerthought.org/lily-docs-current/g3/g1/390-lily.html » REST (HTTP + JSON) » http://docs.outerthought.org/lily-docs-current/g3/g2/427-lily.html » All docs » http://docs.outerthought.org/lily-docs-current/ext/toc/
  • 44. Demo » http://outerthought.blip.tv/file/4245615/
  • 45. Lily and HBase » adds high-level content model » data types » versioning » blob storage on HDFS » focus on sparse (efficient) storage » RowLog for synchronous cross-table updates and async message queues
  • 46. Lily and SOLR » provides flexible mapping between HBase content model and SOLR index fields » interactive and batch (M/R) index maintenance » sharding » use(s) SOLR as-is: loose, flexible, extensible coupling » search access via SOLR (HTTP) API
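
Since search goes through SOLR's own HTTP API, a client only needs to build a standard SOLR select request against the index fields defined on slide 40. A minimal sketch in Python, assuming a SOLR instance at a hypothetical `localhost:8983/solr` address; the `build_search_url` helper is illustrative and the request is only constructed, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed host, port and path; adjust to the actual SOLR deployment.
SOLR_BASE = "http://localhost:8983/solr"


def build_search_url(title_terms, rows=10):
    """Build a SOLR select URL that searches the `title` index field."""
    params = {
        "q": f"title:{title_terms}",  # `title` is defined in the SOLR schema
        "rows": rows,                 # maximum number of hits to return
        "wt": "json",                 # ask SOLR for a JSON response
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"


url = build_search_url("hbase")
query = parse_qs(urlsplit(url).query)
print(url)
```

Each hit would carry the Lily-required fields from the schema (`@@key`, `@@id`, `@@vtag`), which a client could use to fetch the full record from the repository.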
  • 47. Lily and CDH » we intend to rely on CDH-‘blessed’ versions of HBase/HDFS/ZK » 700 patches and testing » next: adopting similar distribution layout » since we contribute patches to ASF HBase trunk, we would expect CDH to track closely (until HBase 1.0) » some Lily users could be interested in ‘CDH-level’ services
  • 48. goodbye » It’s open source! » Content Repository: available now (Lily model + HBase + SOLR + RowLog) » Lily 1.0 soon, will mainly focus on differentiating open source and enterprise edition » “HBase is wa de max maat.” (Flemish, roughly: “HBase is awesome, mate.”)
  • 49. Thank you! » for your attention » for your questions » stevenn@outerthought.org » @stevenn