SlideShare une entreprise Scribd logo
1  sur  43
Télécharger pour lire hors ligne
Welcome to the age of data!
                                 BIGDATA.BE

     IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
who am i

» Steven Noels
» Founder & VP Product
 » Makers of Lily: Interactive Big Data
   platform
» Open Source / Apache Software
  Foundation
» co-founder bigdata.be


         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   2
Houston,
 we have
a problem.
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
We’re
drowning.

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Drowning
in a
Sea
of
Data.
 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Mountains of
      Metadata.
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
The firehose
  of UGC.
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Still, we
can’t make
much sense
      of it.

   IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
... and we
throw a lot of
       it away.
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
We regard
DATA as cost.

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
But data is an
opportunity.

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
Think about it.

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
advertisements
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   13
recommendations
 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   14
fraud detection
 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   15
eyeballs
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   16
churn
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   17
The future is
for
datanerds.
  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
This is what Big
Data is about:
new insights,
new business.
  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
3 issues for
BIG DATA
  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
volume
                              need:
                              more
                             capacity
                                                            data




                                                                        moore
                                                                                     1
                                                                             time




         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   21
solution:
distributed
 systems                                                                      1
  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   22
1
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   23
distributed
systems are
                                                                              1
hard.
  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
2
database

    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   25
2
database                     data warehouse

    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   25
2
database                     data warehouse                          analytics

    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org    25
data shuffling, data duplication
                                                                                   2
  database                     data warehouse                          analytics

      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org    25
“Top-performing
         organizations are twice
         as likely to apply
         analytics to activities.”
                                                                            3
         (MIT Sloan Management
         Review, Winter 2011)



IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   26
enter
 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   27
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   28
HBase




IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   28
what is hadoop ?


           1 server                                                RAM
                                                                   CPU
                                                                  Disk


     IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   29
RAM                                                                                 HBASE

CPU                                                             MAP/REDUCE

DISK                                                                               HDFS


                      many servers
       IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org            30
map/reduce




    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   31
map/reduce


» Batch-oriented
» Data locality (code is shipped around)
» Heavy parallellization
» Process management
» Append-only files




        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   32
Hadoop ecosystem
» Hadoop Common                                             » Hive: A data warehouse infrastructure

» Subprojects                                                   that provides data summarization and
                                                                ad hoc querying.
  » Flume/SQOOP: Data collection systems
                                                            » MapReduce: A software framework for
    for large distributed systems.
                                                                distributed processing of large data
  » HBase: A scalable, distributed database                     sets on compute clusters.
    that supports structured data storage
                                                            » Pig: A high-level data-flow language
    for large/wide tables.
                                                                and execution framework for parallel
  » HDFS: A distributed file system that                         computation.
    provides high throughput access to
                                                            » ZooKeeper: A high-performance
    application data.
                                                                coordination service for distributed
                                                                applications.
                                                            » Mahout: machine learning libraries


                IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org              33
High-level data model / easy API                           indexes



    UI Framework                                  SDK
            (HUE)                               (HUE SDK)

                                                                         Search
                                                                                               Dev2Dev
  Workflow               Scheduling                    Metadata                               tutoring,
    (OOZIE)                  (oozie)                    (HIVE)                               integrated
                                                                                             deployment
                                                                                                 and
                         Languages /                                                         enterprise
    Data                  Compilers                       Fast       usage metrics,            support
Integration                (PIG, HIVE)                 Read/Write     analytics &
  (FLUME,                                                Access        recommen-
  SQOOP)                                                 (HBASE)        dations
                           (PIG, HIVE)



                                       Coordination
                                         (ZOOKEEPER)


                                                                            CDH
                 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                34
real-time big data architecture

                                              1. compensate for high latency of updates to serving layer
   speed layer                                2. fast, incremental algorithms
                                              3. batch layer eventually overrides speed layer
                        storm


                                              1. random access to batch views
   serving layer                              2. updated by batch layer



                                              1. store master dataset (append-only)
    batch layer                               2. compute arbitrary views




       IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                           35
Hadoop, interactive.

Analytics                                 Interactics                                  (RDBMS)
 batch                                         interactive
 static files                                   data management

 1018                                          1015                                        109-12



               IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org            36
news & media


            smart                              data management
            insights                                   indexing
                                                         search
commerce




                                                                              finance
           interactive
           audience                                        profile
           metrics                                     harvesting


                                   telecom




                             My baby: Lily.

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org            37
The start of Lily.


IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   38
Thank you !
                               for your attention
                               for your questions

                               » steven.noels@outerthought.com

                               »           @stevenn

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Contenu connexe

Similaire à Welcome to the Age of Data

N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseNGDATA
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionNGDATA
 
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...Sirris
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily PartnershipsNGDATA
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)NGDATA
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNGDATA
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaNGDATA
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog libraryNGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyNGDATA
 
From Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataFrom Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataNGDATA
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UKNGDATA
 
The world is the computer and the programmer is you
The world is the computer and the programmer is youThe world is the computer and the programmer is you
The world is the computer and the programmer is youDavide Carboni
 
Afterwork big data et data viz - du lac à votre écran
Afterwork big data et data viz - du lac à votre écranAfterwork big data et data viz - du lac à votre écran
Afterwork big data et data viz - du lac à votre écranJoseph Glorieux
 
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...SERENEWorkshop
 
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...Henry Muccini
 
Possibilities of generative models
Possibilities of generative modelsPossibilities of generative models
Possibilities of generative modelsAlison B. Lowndes
 
big data et data viz - du lac à votre écran - afterwork
big data et data viz - du lac à votre écran - afterwork big data et data viz - du lac à votre écran - afterwork
big data et data viz - du lac à votre écran - afterwork OCTO Technology Suisse
 

Similaire à Welcome to the Age of Data (20)

N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the rise
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC edition
 
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily Partnerships
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG Luxembourg
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in Java
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog library
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
 
From Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataFrom Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart Data
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UK
 
Huguk lily
Huguk lilyHuguk lily
Huguk lily
 
The world is the computer and the programmer is you
The world is the computer and the programmer is youThe world is the computer and the programmer is you
The world is the computer and the programmer is you
 
Cloud applications
Cloud applicationsCloud applications
Cloud applications
 
Afterwork big data et data viz - du lac à votre écran
Afterwork big data et data viz - du lac à votre écranAfterwork big data et data viz - du lac à votre écran
Afterwork big data et data viz - du lac à votre écran
 
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
 
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
SERENE 2014 School: Measurement-Driven Resilience Design of Cloud-Based Cyber...
 
Possibilities of generative models
Possibilities of generative modelsPossibilities of generative models
Possibilities of generative models
 
big data et data viz - du lac à votre écran - afterwork
big data et data viz - du lac à votre écran - afterwork big data et data viz - du lac à votre écran - afterwork
big data et data viz - du lac à votre écran - afterwork
 
VoltDB on SolftLayer Cloud
VoltDB on SolftLayer CloudVoltDB on SolftLayer Cloud
VoltDB on SolftLayer Cloud
 

Plus de NGDATA

NGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA
 
20110514 appsforghent
20110514 appsforghent20110514 appsforghent
20110514 appsforghentNGDATA
 
Big Data
Big DataBig Data
Big DataNGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyNGDATA
 
NoSQL BOF at Devoxx
NoSQL BOF at DevoxxNoSQL BOF at Devoxx
NoSQL BOF at DevoxxNGDATA
 
NoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at DevoxxNoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at DevoxxNGDATA
 

Plus de NGDATA (6)

NGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA Corporate Presentation
NGDATA Corporate Presentation
 
20110514 appsforghent
20110514 appsforghent20110514 appsforghent
20110514 appsforghent
 
Big Data
Big DataBig Data
Big Data
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
 
NoSQL BOF at Devoxx
NoSQL BOF at DevoxxNoSQL BOF at Devoxx
NoSQL BOF at Devoxx
 
NoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at DevoxxNoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at Devoxx
 

Dernier

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Dernier (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Welcome to the Age of Data

  • 1. Welcome to the age of data! BIGDATA.BE IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 2. who am i » Steven Noels » Founder & VP Product » Makers of Lily: Interactive Big Data platform » Open Source / Apache Software Foundation » co-founder bigdata.be IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 2
  • 3. Houston, we have a problem. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 4. We’re drowning. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 5. Drowning in a Sea of Data. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 6. Mountains of Metadata. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 7. The firehose of UGC. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 8. Still, we can’t make much sense of it. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 9. ... and we throw a lot of it away. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 10. We regard DATA as cost. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 11. But data is an opportunity. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 12. Think about it. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 13. advertisements IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 13
  • 14. recommendations IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 14
  • 15. fraud detection IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 15
  • 16. eyeballs IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 16
  • 17. churn IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17
  • 18. The future is for datanerds. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 19. This is what Big Data is about: new insights, new business. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 20. 3 issues for BIG DATA IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 21. volume need: more capacity data moore 1 time IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 21
  • 22. solution: distributed systems 1 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 22
  • 23. 1 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 23
  • 24. distributed systems are 1 hard. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 25. 2 database IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 25
  • 26. 2 database data warehouse IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 25
  • 27. 2 database data warehouse analytics IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 25
  • 28. data shuffling, data duplication 2 database data warehouse analytics IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 25
  • 29. “Top-performing organizations are twice as likely to apply analytics to activities.” 3 (MIT Sloan Management Review, Winter 2011) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 26
  • 30. enter IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 27
  • 31. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 28
  • 32. HBase IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 28
  • 33. what is hadoop ? 1 server RAM CPU Disk IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 29
  • 34. RAM HBASE CPU MAP/REDUCE DISK HDFS many servers IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 30
  • 35. map/reduce IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 31
  • 36. map/reduce » Batch-oriented » Data locality (code is shipped around) » Heavy parallellization » Process management » Append-only files IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 32
  • 37. Hadoop ecosystem » Hadoop Common » Hive: A data warehouse infrastructure » Subprojects that provides data summarization and ad hoc querying. » Flume/SQOOP: Data collection systems » MapReduce: A software framework for for large distributed systems. distributed processing of large data » HBase: A scalable, distributed database sets on compute clusters. that supports structured data storage » Pig: A high-level data-flow language for large/wide tables. and execution framework for parallel » HDFS: A distributed file system that computation. provides high throughput access to » ZooKeeper: A high-performance application data. coordination service for distributed applications. » Mahout: machine learning libraries IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 33
  • 38. High-level data model / easy API indexes UI Framework SDK (HUE) (HUE SDK) Search Dev2Dev Workflow Scheduling Metadata tutoring, (OOZIE) (oozie) (HIVE) integrated deployment and Languages / enterprise Data Compilers Fast usage metrics, support Integration (PIG, HIVE) Read/Write analytics & (FLUME, Access recommen- SQOOP) (HBASE) dations (PIG, HIVE) Coordination (ZOOKEEPER) CDH IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 34
  • 39. real-time big data architecture 1. compensate for high latency of updates to serving layer speed layer 2. fast, incremental algorithms 3. batch layer eventually overrides speed layer storm 1. random access to batch views serving layer 2. updated by batch layer 1. store master dataset (append-only) batch layer 2. compute arbitrary views IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 35
  • 40. Hadoop, interactive. Analytics Interactics (RDBMS) batch interactive static files data management 1018 1015 109-12 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 36
  • 41. news & media smart data management insights indexing search commerce finance interactive audience profile metrics harvesting telecom My baby: Lily. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 37
  • 42. The start of Lily. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 38
  • 43. Thank you ! for your attention for your questions » steven.noels@outerthought.com » @stevenn IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org