Small, Medium & Big Data
Pierre De Wilde
23 November 2012
ULB - MASTIC
http://mastic.ulb.ac.be
Sir Tim Berners-Lee




             http://www.w3.org/People/Berners-Lee/
Semantic Web Trends




        http://www.google.com/trends/explore#q=semantic%20web
Linked Data Trends




   http://www.google.com/trends/explore#q=semantic%20web%2C%20linked%20data
Linked Data Cloud




 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Semantic Web


               Semantic
                 URI, RDF(S), OWL, SPARQL



               Web
                 Scale?
Web Scale


            Millions of servers
            Billions of users
            Billions of objects


            => it's really Big
Big Data Trends




    http://www.google.com/trends/explore#q=semantic%20web%2C%20big%20data
Big Data 3 V's




    It's not only about a big volume of data...
V for ...




            Source: Anonymous
V for ...
            Volume
              Scale
              Sources

            Variety
              Relational
              NoSQL

            Velocity
              Operational
              Analytical
How Big is our Data?


        M     mega            million             10^6
        G     giga            billion             10^9
        T     tera            trillion            10^12
        P     peta            quadrillion         10^15
        E     exa             quintillion         10^18
        Z     zetta           sextillion          10^21
        Y     yotta           septillion          10^24



            See the film Powers of Ten (1977) on YouTube
Big Data Sources


       Millions of servers (logs)

       Billions of users (social networks)

       Billions of devices (smartphones)

       + Time/Space = Big Data
Big Data Examples


            Facebook collects 500 TB per day (1)

            Google processes 24 PB per day (2)

            We create 2.5 EB per day (3)




    (1) http://gigaom.com/data/facebook-is-collecting-your-data-500-terabytes-a-day/
    (2) http://en.wikipedia.org/wiki/Petabyte (2009)
    (3) http://www-01.ibm.com/software/data/bigdata/
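
To put the three figures on a common scale, here is a small worked calculation. It assumes decimal (SI) prefixes, as in the prefix table earlier in the deck; the helper name `to_bytes` is illustrative only.

```python
# Decimal (SI) prefixes, expressed as exponents of ten.
PREFIX_EXPONENT = {"M": 6, "G": 9, "T": 12, "P": 15, "E": 18, "Z": 21, "Y": 24}


def to_bytes(value, prefix):
    """Convert a quantity such as (500, "T") into a number of bytes."""
    return value * 10 ** PREFIX_EXPONENT[prefix]


facebook = to_bytes(500, "T")  # 500 TB per day
google = to_bytes(24, "P")     # 24 PB per day
world = to_bytes(2.5, "E")     # 2.5 EB per day

# Ratios between the daily volumes quoted on the slide.
print("Google / Facebook:", google / facebook)
print("World / Google:", round(world / google, 1))
```

Under these assumptions, the Google figure is 48 times the Facebook figure, and the worldwide figure is roughly 104 times the Google figure.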
How Small is our Wisdom?

                           Wisdom




                        Knowledge



                      Information


                   Big Data

            Where is the wisdom we have lost in knowledge?
          Where is the knowledge we have lost in information?

                                        T. S. Eliot, The Rock
V for ...
            Volume
              Scale
              Sources

            Variety
              Relational
              NoSQL

            Velocity
              Operational
              Analytical
Scalability


        Scaling up and Scaling out

        Partitioning and Sharding
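
A minimal sketch of hash-based sharding, one common way to partition data when scaling out. The shard count, the key names and the `shard_for` helper are assumptions made for illustration; real systems often use range partitioning or consistent hashing instead.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key, so the same key always lands on the same shard."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Four in-memory dictionaries stand in for four commodity servers.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

for user in ["alice", "bob", "carol", "dave"]:
    shards[shard_for(user, NUM_SHARDS)][user] = {"name": user}

# A read goes straight to the shard that owns the key.
owner = shards[shard_for("carol", NUM_SHARDS)]
print(owner["carol"])
```

A stable hash (here MD5) is used rather than Python's built-in `hash`, whose value for strings changes between interpreter runs.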
Relational Databases
RDBMS


        Row Store

        B-tree indexing

        SQL as query language
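
The three points above can be tried out with SQLite, which ships with Python and stores both tables and indexes as B-trees. This is only a sketch; the table, column and index names are made up for the example.

```python
import sqlite3

# An in-memory relational database: rows stored in a table, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# A secondary index on the name column (a B-tree in SQLite).
conn.execute("CREATE INDEX idx_users_name ON users(name)")

conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# The lookup by name can use the index instead of scanning every row.
rows = conn.execute(
    "SELECT id, name FROM users WHERE name = ?", ("bob",)
).fetchall()
print(rows)
```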
RDBMS issues


      Scale up (big servers)

      Schemaful (structured)

      Index-intensive (join)
NoSQL


        Scale out (commodity servers)

        Schemaless (semi-structured)

        Index-free adjacency (graph)
NoSQL databases




              Credit: Neo Technology
Key-Value Stores


       (Key:string) => Value

       fast read, low write latency

       used for sessions, carts




        Dynamo: Amazon’s Highly Available Key-value Store (2007)
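
A toy illustration of the `(Key:string) => Value` model, using a shopping cart as the value. It is a single-process sketch with hypothetical names; it does not attempt the replication or partitioning described in the Dynamo paper.

```python
class KeyValueStore:
    """A minimal key-value store: opaque values looked up by string key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# The store does not interpret the value; here it is a cart for one session.
store.put("cart:session-42", ["book", "pen"])
print(store.get("cart:session-42"))
print(store.get("cart:unknown", default=[]))
```

The only access path is the key, which is what keeps reads and writes simple and fast; there is no query on the content of the value.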
Bigtable Clones


        Google's Distributed Storage System

        (row:string, col:string, ts:int64) => string

        used by Google & most companies




       Bigtable: A Distributed Storage System for Structured Data (2006)
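
The `(row, col, ts) => string` signature can be read as a sparse, versioned map. The sketch below models only that data model in memory; the class name, row key and column names are hypothetical, and nothing here reflects how Bigtable actually stores or distributes data.

```python
class SparseTable:
    """Maps (row, column, timestamp) to a string value, keeping every version."""

    def __init__(self):
        self._cells = {}  # (row, col) -> {timestamp: value}

    def put(self, row, col, ts, value):
        self._cells.setdefault((row, col), {})[ts] = value

    def get(self, row, col, ts=None):
        """Return the newest value at or before ts (the newest overall if ts is None)."""
        versions = self._cells.get((row, col), {})
        candidates = [t for t in versions if ts is None or t <= ts]
        if not candidates:
            return None
        return versions[max(candidates)]


table = SparseTable()
table.put("com.example.www", "contents:", 1, "<html>v1</html>")
table.put("com.example.www", "contents:", 5, "<html>v2</html>")

print(table.get("com.example.www", "contents:"))        # newest version
print(table.get("com.example.www", "contents:", ts=3))  # version as of ts=3
```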
Document Databases


       document-oriented (content query)

       semi-structured data (JSON)

       used for web apps
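
A small sketch of what "content query" means for semi-structured JSON documents: documents in the same collection need not share fields, and queries match on the fields a document happens to have. The sample documents and the `find` helper are invented for illustration and do not mirror any particular database's API.

```python
import json

# Two documents in one collection, with different sets of fields (schemaless).
raw_documents = [
    '{"title": "Small Data", "tags": ["intro"], "year": 2012}',
    '{"title": "Big Data", "author": "anonymous", "year": 2012}',
]
collection = [json.loads(text) for text in raw_documents]


def find(documents, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(find(collection, year=2012))
print(find(collection, author="anonymous"))
```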
Graph Databases


       property graph

       index-free adjacency

       used for recommendations, social networks
Graph




        G = (V, E)
Property Graph




     A property graph is a directed, labeled, attributed graph
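
One way to picture this definition in code: edges have a direction (tail to head), a label, and key-value properties, and vertices carry properties too. In the sketch below each vertex also holds direct references to its own edges, which is the idea behind index-free adjacency. All class and property names are assumptions made for the example.

```python
class Vertex:
    def __init__(self, vertex_id, **properties):
        self.id = vertex_id
        self.properties = properties  # attributed: key-value pairs on the vertex
        self.out_edges = []           # direct references, no global index lookup
        self.in_edges = []


class Edge:
    def __init__(self, label, tail, head, **properties):
        self.label = label            # labeled
        self.tail = tail              # directed: the edge leaves this vertex...
        self.head = head              # ...and points to this one
        self.properties = properties  # attributed: key-value pairs on the edge
        tail.out_edges.append(self)
        head.in_edges.append(self)


alice = Vertex(1, name="alice")
bob = Vertex(2, name="bob")
knows = Edge("knows", alice, bob, since=2012)

# Following an edge is a pointer hop from the vertex itself.
friends = [e.head.properties["name"] for e in alice.out_edges if e.label == "knows"]
print(friends)
```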
Graph Traversal


                              Gremlin is jumping

                              - from vertex to vertex
                              - from vertex to edge
                              - from edge to vertex




            https://github.com/tinkerpop/gremlin/wiki
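
The three kinds of jump can be sketched in plain Python over a toy edge list. The function names echo Gremlin step names (`outE`, `inV`, `out`, `in`), but this is not the Gremlin API, the graph is invented, and the edge scan used here is for brevity only.

```python
# Edges as (tail, label, head) tuples; a toy graph, not real data.
EDGES = [
    ("a", "influenced", "b"),
    ("a", "influenced", "c"),
    ("b", "influenced", "c"),
]


def out_e(vertex, label):
    """Vertex to edge: edges leaving the vertex."""
    return [e for e in EDGES if e[0] == vertex and e[1] == label]


def in_e(vertex, label):
    """Vertex to edge: edges arriving at the vertex."""
    return [e for e in EDGES if e[2] == vertex and e[1] == label]


def in_v(edge):
    """Edge to vertex: the vertex the edge points to."""
    return edge[2]


def out_v(edge):
    """Edge to vertex: the vertex the edge leaves."""
    return edge[0]


def out(vertex, label):
    """Vertex to vertex: follow outgoing edges to their heads."""
    return [in_v(e) for e in out_e(vertex, label)]


def in_(vertex, label):
    """Vertex to vertex: follow incoming edges back to their tails."""
    return [out_v(e) for e in in_e(vertex, label)]


print(out("a", "influenced"))
print(in_("c", "influenced"))
# Two hops: who else was influenced by those who influenced "c"?
print([w for v in in_("c", "influenced") for w in out(v, "influenced")])
```

A vertex-to-vertex jump is simply a vertex-to-edge jump followed by an edge-to-vertex jump, which is why the last two functions are built from the first four.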
DBpedia Traversal


gremlin> g = new SparqlRepositorySailGraph("http://dbpedia.org/sparql")

gremlin> r = g.v('http://dbpedia.org/resource/Tim_Berners-Lee')

gremlin> r.out('http://www.w3.org/2000/01/rdf-schema#comment').has('lang','fr').value
==>Sir Timothy John Berners-Lee est un citoyen britannique surtout connu comme le principal inventeur
du World Wide Web. En juillet 2004, il est anobli par la reine Elizabeth II pour ce travail et son nom
officiel devient Sir Timothy John Berners-Lee. Depuis 1994, il préside le World Wide Web Consortium
(W3C), organisme qu'il a fondé.

gremlin> r.in('http://dbpedia.org/ontology/influenced')
==>v[http://dbpedia.org/resource/Paul_Otlet]

gremlin> r.in('http://dbpedia.org/ontology/influenced').out('http://dbpedia.org/ontology/influenced')
==>v[http://dbpedia.org/resource/Douglas_Engelbart]
==>v[http://dbpedia.org/resource/Ted_Nelson]
==>v[http://dbpedia.org/resource/Vannevar_Bush]
==>v[http://dbpedia.org/resource/Tim_Berners-Lee]
...
Triple/RDF Stores


        Subject-Predicate-Object

        SPARQL as query language

        AllegroGraph, OpenLink Virtuoso, ...
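
A minimal in-memory sketch of the subject-predicate-object model, with a pattern matcher that treats `None` as a wildcard, loosely in the spirit of a SPARQL triple pattern. The triples are abbreviated from the DBpedia traversal output earlier in the deck (full URIs shortened to local names); the `match` helper is hypothetical and is not how AllegroGraph or Virtuoso expose queries.

```python
# Each fact is one (subject, predicate, object) triple.
TRIPLES = {
    ("Paul_Otlet", "influenced", "Tim_Berners-Lee"),
    ("Paul_Otlet", "influenced", "Ted_Nelson"),
    ("Paul_Otlet", "influenced", "Vannevar_Bush"),
}


def match(s=None, p=None, o=None):
    """Return the triples that agree with every non-None position of the pattern."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who influenced Tim Berners-Lee?  Pattern: (?, influenced, Tim_Berners-Lee)
print(match(p="influenced", o="Tim_Berners-Lee"))

# Everything stated about Paul Otlet as a subject.
print(match(s="Paul_Otlet"))
```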
V for ...
            Volume
              Scale
              Sources

            Variety
              Relational
              NoSQL

            Velocity
              Operational
              Analytical
Big Data Processing



        Batch Processing
          MapReduce


        Interactive Analysis
          BigQuery
MapReduce




      MapReduce: Simplified Data Processing on Large Clusters (2004)
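
The programming model can be sketched with the classic word-count example, run here in a single process. The function names and sample documents are assumptions for illustration; a real MapReduce system runs the map and reduce phases in parallel across a cluster and handles the shuffle, scheduling and failures itself.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "big graph"]))
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many machines without the user code changing.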
Apache Hadoop




        Distributed Data + MapReduce




                http://hadoop.apache.org/
Latest Trends




   http://www.google.com/trends/explore#q=hadoop%2C%20mongodb%2C%20neo4j
NoSQL issues


       No Distributed Transactions

       No SQL as query language
NewSQL




    NoSQL + Distributed Transactions + SQL




         Spanner: Google's Globally-Distributed Database (2012)
Thank you




Credit: Most images created by Flickr Creative Commons artists or Wikimedia Commons artists
