The road lies plain before me;--'tis a theme
Single and of determined bounds; …
   - Wordsworth, The Prelude

EC4000 – PhD Guest Seminar, Naval Postgraduate School
Krishna Sankar, http://doubleclix.wordpress.com
April 27, 2012
o  Agenda
   o  To cover the broad picture
   o  Touch upon instances of the technologies employed
o  Of the Big Data domain …

What is Big Data?
Big Data to smart data
Big Data Pipeline

Topics: Analytics/Modeling (R), Analytic Algorithms, Cloud Architectures, Visualization, Processing - Hadoop, Storage - NOSQL
Thanks to …
The giants whose shoulders I am standing on

Special Thanks to:
   Peter Ateshian, NPS
   Prof Murali Tummala, NPS
   Shirley Bailes, O’Reilly
   Ed Dumbill, O’Reilly
   Jeff Barr, AWS
   Jenny Kohr Chynoweth, AWS
Porcelain vs. Plumbing

• The balance is always interesting …
• This talk has both
• Would be happy to dive deep into plumbing topics like Hadoop, R, MongoDB, Cassandra et al …
EBC322

①  Volume
   o  Scale
②  Velocity
   o  Data change rate vs. decision window
③  Variety
   o  Different sources & formats
   o  Structured vs. Unstructured
④  Variability
   o  Breadth of interpretation &
   o  Depth of analytics
⑤  Contextual
   o  Dynamic variability
   o  Recommendation
⑥  Connectedness

http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/
http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf
http://www.quora.com/Business-Intelligence/What-is-the-future-of-business-intelligence
•  “… they didn’t need a genius, … but build the world’s most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency” – Final Jeopardy by Stephen Baker
   •  15 TB memory, across 90 IBM Power 750 servers, in 10 racks
   •  1 TB of dataset
   •  200 Million pages processed by Hadoop
   •  This is a good example of Connected data
          –  Contextual w/ variability
          –  Breadth of interpretation
          –  Analytics depth

http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/
http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/
  
Ref: http://www.ciol.com/News/News/News-Reports/Vinod-Khosla%E2%80%99s-cool-dozen-tech-innovations/156307/0/
http://yourstory.in/2011/11/vinod-khoslas-keynote-at-nasscom-product-conclave-reject-punditry-believe-in-an-idea-take-risk-and-succeed/
Ref: http://goo.gl/Mm83k

Diagram (staircase, bottom to top): Volume, Velocity, Variability, Variety, Connectedness, Context, Model, Infer-ability
Tools named along the steps: Logs, Scribe, Flume, Storm, Hadoop, HDFS, SQL, NOSQL, XML, files, BI Tools, Pig, Hive, .NET Dryad, various other tools, hand coded programs, R, Mahout, internal dashboards, Tableau

Decomplexify!   Contextualize!   Network!   Reason!   Infer!
Twitter
  §  200 million tweets/day
  §  Peak 10,000/second
  §  How would you handle the fire hose for social network analytics?

AWS – 900 Billion objects!

Zynga
  §  “Analytics company, not a gaming company!”
  §  Harvests data: 15 TB/day
  §  Test new features
  §  Target advertising
  §  230 million players/month

Storage
  §  4 U box = 40 TB, 1 PB = 25 boxes!

http://goo.gl/dcBsQ
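The storage and fire-hose figures above can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes decimal units (1 PB = 1000 TB), which is what the "25 boxes" figure implies.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units, consistent with "1 PB = 25 boxes"

def boxes_needed(capacity_tb: float, tb_per_box: float = 40) -> int:
    """Number of 40 TB boxes needed to hold capacity_tb of raw storage."""
    return math.ceil(capacity_tb / tb_per_box)

def average_rate_per_second(events_per_day: float) -> float:
    """Average event rate implied by a daily total."""
    return events_per_day / (24 * 60 * 60)

boxes_per_pb = boxes_needed(1 * TB_PER_PB)         # 1000 / 40 = 25 boxes
avg_tweets = average_rate_per_second(200_000_000)  # roughly 2,315 tweets/second
peak_to_average = 10_000 / avg_tweets              # peak is roughly 4.3x the average

print(boxes_per_pb, round(avg_tweets), round(peak_to_average, 1))
```

The gap between the average and the peak rate is the point for sizing: a collector provisioned for the daily average would fall behind during bursts.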
  
•  6 Billion Messages per day
•  2 PB (w/compression) online
•  6 PB w/ replication
•  250 TB/Month growth
•  HBase Infrastructure
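A rough reading of these numbers, as a sketch only. It assumes decimal units and assumes that the 250 TB/month growth applies to the 2 PB online (pre-replication) figure; the slide does not say either way, and the helper names are hypothetical.

```python
# Figures from the slide; how they relate to each other is an assumption.
ONLINE_PB = 2.0               # compressed data online
REPLICATED_PB = 6.0           # footprint with replication
GROWTH_TB_PER_MONTH = 250.0
TB_PER_PB = 1000.0            # assumption: decimal units

def replication_factor(replicated_pb: float, online_pb: float) -> float:
    """Copies kept of each byte, implied by the two footprints."""
    return replicated_pb / online_pb

def months_to_add(extra_pb: float, growth_tb_per_month: float) -> float:
    """Months needed to grow by extra_pb at a constant monthly rate."""
    return extra_pb * TB_PER_PB / growth_tb_per_month

factor = replication_factor(REPLICATED_PB, ONLINE_PB)              # 3.0
months_to_double = months_to_add(ONLINE_PB, GROWTH_TB_PER_MONTH)   # 8.0

print(factor, months_to_double)
```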
  
eBay Extreme Analytics Architecture

   50 TB/Day
   240 nodes, 84 PB
   Teradata Installation
   Path Analysis
   A/B Testing
   Very systematic
   Diagram speaks volumes!

Ref: http://www.hpts.ws/sessions/2011HPTS-TomFastner.pdf
Collect:  Splunk, Scribe, Flume, Storm
Store:  NOSQL (Cassandra, MongoDB, HBase, Neo4j)
Transform & Analyze:  Hadoop, Pig/Hive
Model & Reason:  Mahout, R, BI Tools
Predict, Recommend & Visualize:  D3.js, Tableau, R, Dashboard

When I think of my own native land,
In a moment I seem to be there;
But, alas! recollection at hand
Soon hurries me back to despair.
   - Cowper, The Solitude Of Alexander Selkirk
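The pipeline stages on the preceding slide (Collect, Store, Transform & Analyze, Model & Reason, Predict/Recommend/Visualize) can be pictured as a chain of functions, each feeding the next. The toy sketch below uses in-memory Python stand-ins, not the tools on the slide; every function name and the sample data are hypothetical.

```python
from collections import Counter

def collect(raw_lines):
    """Collect: parse raw 'user,action' log lines into events."""
    return [line.strip().split(",") for line in raw_lines if line.strip()]

def store(events):
    """Store: keep each user's actions under a key (stand-in for a NOSQL store)."""
    table = {}
    for user, action in events:
        table.setdefault(user, []).append(action)
    return table

def transform_and_analyze(table):
    """Transform & Analyze: count actions per user."""
    return {user: Counter(actions) for user, actions in table.items()}

def model_and_reason(counts):
    """Model & Reason: a trivial 'model', each user's most frequent action."""
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}

def predict_and_visualize(model):
    """Predict, Recommend & Visualize: render the model as report lines."""
    return [f"{user}: likely next action = {action}"
            for user, action in sorted(model.items())]

def run_pipeline(raw_lines):
    """Run the stages in order; each stage consumes the previous output."""
    data = raw_lines
    for stage in (collect, store, transform_and_analyze,
                  model_and_reason, predict_and_visualize):
        data = stage(data)
    return data

sample = ["ann,view", "bob,buy", "ann,view", "ann,buy"]
report = run_pipeline(sample)
print(report)
```

Keeping each stage a plain function with one input and one output is what lets a real deployment swap the stand-in for a dedicated tool without touching its neighbours.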
NOSQL

   Key Value
      In-memory:   Memcached
      Disk Based:  Redis, Tokyo Cabinet, Dynamo, Voldemort
   Column:    SimpleDB, Google BigTable, HBase, Cassandra, HyperTable, Azure TS
   Document:  CouchDB, MongoDB, Lotus Domino, Riak
   Graph:     Neo4j, FlockDB, InfiniteGraph
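To make the four families concrete, here is one hypothetical "user 42 follows user 7" record written in each shape with plain Python structures. It illustrates the data models only, not any product's API; the keys, field names and helper functions are invented for the example.

```python
import json

# Key-value: the store sees an opaque value; the application parses it.
key_value = {"user:42": json.dumps({"name": "Ann", "follows": [7]})}

# Column: row key -> column family -> column name -> value.
column = {"user:42": {"profile": {"name": "Ann"},
                      "follows": {"7": "2012-04-27"}}}

# Document: nested fields that the store itself can index and query.
document = {"_id": 42, "name": "Ann", "follows": [{"id": 7}]}

# Graph: nodes plus typed edges; relationships are first-class.
graph = {"nodes": {42: {"name": "Ann"}, 7: {"name": "Bob"}},
         "edges": [(42, "FOLLOWS", 7)]}

def follows_key_value(store, user_id):
    """Fetch by key, then decode the value in application code."""
    return json.loads(store[f"user:{user_id}"])["follows"]

def follows_column(store, user_id):
    """Read the column names of the 'follows' family for one row."""
    return sorted(int(c) for c in store[f"user:{user_id}"]["follows"])

def follows_document(doc):
    """Walk the nested list inside a single document."""
    return [f["id"] for f in doc["follows"]]

def follows_graph(g, user_id):
    """Traverse outgoing FOLLOWS edges from one node."""
    return [dst for src, label, dst in g["edges"]
            if src == user_id and label == "FOLLOWS"]

print(follows_key_value(key_value, 42), follows_column(column, 42),
      follows_document(document), follows_graph(graph, 42))
```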
  
MapReduce

•  Data parallelism
•  Large Installations (many ~5000 node clusters!)
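The data-parallel idea behind MapReduce can be shown on one machine with the classic word count. This is a minimal single-process sketch of the map, shuffle and reduce phases, not Hadoop code; on a cluster the map and reduce calls would run on different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word; needs no shared state."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values by key so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key."""
    return key, sum(values)

def word_count(documents):
    """Run map over every document, shuffle, then reduce each key."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["big data", "smart data", "Big Data pipeline"])
print(counts)
```

Because each map call looks at one document and each reduce call at one key, both phases can be spread across as many nodes as there are documents or keys.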
  
Software As A Service

Platform As A Service

Infrastructure As A Service
Amazon – Canonical Cloud

       •  S3 – Blob storage
       •  Dynamo DB – NOSQL
       •  EMR – Elastic Map Reduce
       •  EC2 – Compute
       •  1% of Internet traffic

“Scalability is about building wider roads, not about building faster cars” – Steve Swartz

http://blog.deepfield.net/2012/04/18/how-big-is-amazons-cloud/
http://www.slideshare.net/AmazonWebServices/keynote-your-future-with-cloud-computing-dr-werner-vogels-aws-summit-2012-nyc
EC2

http://openclipart.org/detail/152311/internet-cloud-by-b.gaultier, http://openclipart.org/detail/17847
•  Social Network Analysis
   •  Sentiment Analysis
   •  Brand Strength
   •  Citation/co-citation ≅ Followed by/Also Follows
   •  Metrics
         –  Network diameter,
         –  Weak-ties,
         –  Erdős-Rényi model &
         –  Kronecker Graphs
   Tweets, Followers, Follow/Unfollow

http://www.oscon.com/oscon2012/public/schedule/detail/23130
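Two of the metrics listed above fit in a few lines: an Erdős-Rényi G(n, p) random graph, where each possible edge appears independently with probability p, and network diameter, the longest shortest path, found by breadth-first search. This is a small pure-Python sketch for undirected graphs; a follower graph is directed and far larger, so treat it as an illustration of the definitions only.

```python
import random
from collections import deque
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """G(n, p): include each of the n*(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    graph = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def eccentricity(graph, source):
    """Longest BFS distance from source, or None if some node is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < len(graph):
        return None
    return max(dist.values())

def diameter(graph):
    """Largest eccentricity over all nodes; None for empty or disconnected graphs."""
    ecc = [eccentricity(graph, v) for v in graph]
    if not ecc or any(e is None for e in ecc):
        return None
    return max(ecc)

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(diameter(path))                    # a 4-node path has diameter 3
print(diameter(erdos_renyi(5, 1.0)))     # p = 1 gives a complete graph, diameter 1
```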
  
Was it a vision, or a waking dream?
Fled is that music:—do I wake or sleep?
                  - Keats, Ode to a Nightingale

Contenu connexe

Tendances

[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹台灣資料科學年會
 
Convolutional Neural Networks and Natural Language Processing
Convolutional Neural Networks and Natural Language ProcessingConvolutional Neural Networks and Natural Language Processing
Convolutional Neural Networks and Natural Language ProcessingThomas Delteil
 
The Future of Search and AI
The Future of Search and AIThe Future of Search and AI
The Future of Search and AITrey Grainger
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用台灣資料科學年會
 
[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法台灣資料科學年會
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Trey Grainger
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphTrey Grainger
 
The Next Generation of AI-powered Search
The Next Generation of AI-powered SearchThe Next Generation of AI-powered Search
The Next Generation of AI-powered SearchTrey Grainger
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search SystemTrey Grainger
 
Natural Language Search with Knowledge Graphs (Activate 2019)
Natural Language Search with Knowledge Graphs (Activate 2019)Natural Language Search with Knowledge Graphs (Activate 2019)
Natural Language Search with Knowledge Graphs (Activate 2019)Trey Grainger
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchTrey Grainger
 
Balancing the Dimensions of User Intent
Balancing the Dimensions of User IntentBalancing the Dimensions of User Intent
Balancing the Dimensions of User IntentTrey Grainger
 
Reflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationReflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationTrey Grainger
 
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Trey Grainger
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI DutchJos van Dongen
 
Open hpi semweb-06-part2
Open hpi semweb-06-part2Open hpi semweb-06-part2
Open hpi semweb-06-part2Nadine Ludwig
 
Searching for Meaning
Searching for MeaningSearching for Meaning
Searching for MeaningTrey Grainger
 

Tendances (20)

[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
[系列活動] 無所不在的自然語言處理—基礎概念、技術與工具介紹
 
Convolutional Neural Networks and Natural Language Processing
Convolutional Neural Networks and Natural Language ProcessingConvolutional Neural Networks and Natural Language Processing
Convolutional Neural Networks and Natural Language Processing
 
The Future of Search and AI
The Future of Search and AIThe Future of Search and AI
The Future of Search and AI
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
 
[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法
 
Big data
Big dataBig data
Big data
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge Graph
 
Our World is Socio-technical
Our World is Socio-technicalOur World is Socio-technical
Our World is Socio-technical
 
The Next Generation of AI-powered Search
The Next Generation of AI-powered SearchThe Next Generation of AI-powered Search
The Next Generation of AI-powered Search
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search System
 
Natural Language Search with Knowledge Graphs (Activate 2019)
Natural Language Search with Knowledge Graphs (Activate 2019)Natural Language Search with Knowledge Graphs (Activate 2019)
Natural Language Search with Knowledge Graphs (Activate 2019)
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered Search
 
Balancing the Dimensions of User Intent
Balancing the Dimensions of User IntentBalancing the Dimensions of User Intent
Balancing the Dimensions of User Intent
 
Reflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationReflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital Transformation
 
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI Dutch
 
Open hpi semweb-06-part2
Open hpi semweb-06-part2Open hpi semweb-06-part2
Open hpi semweb-06-part2
 
Searching for Meaning
Searching for MeaningSearching for Meaning
Searching for Meaning
 

En vedette

Netric for Publishers
Netric for PublishersNetric for Publishers
Netric for PublishersNetric
 
Resume A. Rinaldi - ENG
Resume A. Rinaldi - ENGResume A. Rinaldi - ENG
Resume A. Rinaldi - ENGArturo Rinaldi
 
Apartamento T1 Pipa Natal -T1 Apartment Pipa Natal
Apartamento T1 Pipa Natal -T1 Apartment Pipa NatalApartamento T1 Pipa Natal -T1 Apartment Pipa Natal
Apartamento T1 Pipa Natal -T1 Apartment Pipa Natalcarlosaugusto74
 
Thesis A. Rinaldi (PDF Slides)
Thesis A. Rinaldi (PDF Slides)Thesis A. Rinaldi (PDF Slides)
Thesis A. Rinaldi (PDF Slides)Arturo Rinaldi
 
Vasco da gama grace and sam2
Vasco da gama grace and sam2Vasco da gama grace and sam2
Vasco da gama grace and sam2guest893afef
 
ABC Breakfast Club m DanZafe: Effektiv lageroprydning
ABC Breakfast Club m DanZafe: Effektiv lageroprydningABC Breakfast Club m DanZafe: Effektiv lageroprydning
ABC Breakfast Club m DanZafe: Effektiv lageroprydningABC Softwork
 
Mobility as the New Innovation Driver in the Enterprises
Mobility as the New Innovation Driver in the EnterprisesMobility as the New Innovation Driver in the Enterprises
Mobility as the New Innovation Driver in the EnterprisesWinWire Technologies Inc
 
An independent view on the evolution of the Internet
An independent view on the evolution of the InternetAn independent view on the evolution of the Internet
An independent view on the evolution of the InternetOlivier Martin
 
Proposed New US fashion design law 2010
Proposed New US fashion design law 2010Proposed New US fashion design law 2010
Proposed New US fashion design law 2010Darrell Mottley
 
Feast of saint martin celebrated in santarcangelo by cassoli alberto 3 d
Feast of saint martin  celebrated in santarcangelo by cassoli alberto 3 dFeast of saint martin  celebrated in santarcangelo by cassoli alberto 3 d
Feast of saint martin celebrated in santarcangelo by cassoli alberto 3 dbarbelkarlsruhe
 
2015 AHP International Conference session - Operations Opportunities
2015 AHP International Conference session - Operations Opportunities 2015 AHP International Conference session - Operations Opportunities
2015 AHP International Conference session - Operations Opportunities Dan Lantz
 

En vedette (18)

Netric for Publishers
Netric for PublishersNetric for Publishers
Netric for Publishers
 
Resume A. Rinaldi - ENG
Resume A. Rinaldi - ENGResume A. Rinaldi - ENG
Resume A. Rinaldi - ENG
 
BEST gr-bertool
BEST gr-bertoolBEST gr-bertool
BEST gr-bertool
 
Apartamento T1 Pipa Natal -T1 Apartment Pipa Natal
Apartamento T1 Pipa Natal -T1 Apartment Pipa NatalApartamento T1 Pipa Natal -T1 Apartment Pipa Natal
Apartamento T1 Pipa Natal -T1 Apartment Pipa Natal
 
Vietnam Powerpoint
Vietnam PowerpointVietnam Powerpoint
Vietnam Powerpoint
 
Fluency
FluencyFluency
Fluency
 
Thesis A. Rinaldi (PDF Slides)
Thesis A. Rinaldi (PDF Slides)Thesis A. Rinaldi (PDF Slides)
Thesis A. Rinaldi (PDF Slides)
 
Vasco da gama grace and sam2
Vasco da gama grace and sam2Vasco da gama grace and sam2
Vasco da gama grace and sam2
 
ABC Breakfast Club m DanZafe: Effektiv lageroprydning
ABC Breakfast Club m DanZafe: Effektiv lageroprydningABC Breakfast Club m DanZafe: Effektiv lageroprydning
ABC Breakfast Club m DanZafe: Effektiv lageroprydning
 
Mobility as the New Innovation Driver in the Enterprises
Mobility as the New Innovation Driver in the EnterprisesMobility as the New Innovation Driver in the Enterprises
Mobility as the New Innovation Driver in the Enterprises
 
APEC TEL41 990510
APEC TEL41  990510APEC TEL41  990510
APEC TEL41 990510
 
An independent view on the evolution of the Internet
An independent view on the evolution of the InternetAn independent view on the evolution of the Internet
An independent view on the evolution of the Internet
 
Proposed New US fashion design law 2010
Proposed New US fashion design law 2010Proposed New US fashion design law 2010
Proposed New US fashion design law 2010
 
Global
GlobalGlobal
Global
 
Duduk
DudukDuduk
Duduk
 
Feast of saint martin celebrated in santarcangelo by cassoli alberto 3 d
Feast of saint martin  celebrated in santarcangelo by cassoli alberto 3 dFeast of saint martin  celebrated in santarcangelo by cassoli alberto 3 d
Feast of saint martin celebrated in santarcangelo by cassoli alberto 3 d
 
Brochure invest eng
Brochure invest engBrochure invest eng
Brochure invest eng
 
2015 AHP International Conference session - Operations Opportunities
2015 AHP International Conference session - Operations Opportunities 2015 AHP International Conference session - Operations Opportunities
2015 AHP International Conference session - Operations Opportunities
 

Similaire à Big Data Engineering - Top 10 Pragmatics

Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesKrishna Sankar
 
Bcn On Rails May2010 On Graph Databases
Bcn On Rails May2010 On Graph DatabasesBcn On Rails May2010 On Graph Databases
Bcn On Rails May2010 On Graph DatabasesPere Urbón-Bayes
 
Some news about the SW
Some news about the SWSome news about the SW
Some news about the SWIvan Herman
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
The causes and consequences of too many bits
The causes and consequences of too many bitsThe causes and consequences of too many bits
The causes and consequences of too many bitsDipesh Lall
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupalemmanuel_jamin
 
HKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
HKG15-The Machine: A new kind of computer- Keynote by Dejan MilojicicHKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
HKG15-The Machine: A new kind of computer- Keynote by Dejan MilojicicLinaro
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
 
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...eswcsummerschool
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) SkillsOscar Corcho
 
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology ConferenceA Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology ConferenceBasis Technology
 
May 2012 HUG: The Changing Big Data Landscape
May 2012 HUG: The Changing Big Data LandscapeMay 2012 HUG: The Changing Big Data Landscape
May 2012 HUG: The Changing Big Data LandscapeYahoo Developer Network
 

Similaire à Big Data Engineering - Top 10 Pragmatics (20)

The Art of Big Data
The Art of Big DataThe Art of Big Data
The Art of Big Data
 
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
 
Bcn On Rails May2010 On Graph Databases
Bcn On Rails May2010 On Graph DatabasesBcn On Rails May2010 On Graph Databases
Bcn On Rails May2010 On Graph Databases
 
Some news about the SW
Some news about the SWSome news about the SW
Some news about the SW
 
Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
The causes and consequences of too many bits
The causes and consequences of too many bitsThe causes and consequences of too many bits
The causes and consequences of too many bits
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 
When?
When?When?
When?
 
Parallel io
Parallel ioParallel io
Parallel io
 
Spark
SparkSpark
Spark
 
HKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
HKG15-The Machine: A new kind of computer- Keynote by Dejan MilojicicHKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
HKG15-The Machine: A new kind of computer- Keynote by Dejan Milojicic
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
Big Data Analytics V2
Big Data Analytics V2Big Data Analytics V2
Big Data Analytics V2
 
Hak intis2013
Hak intis2013Hak intis2013
Hak intis2013
 
STI Summit 2011 - Digital Worlds
STI Summit 2011 - Digital WorldsSTI Summit 2011 - Digital Worlds
STI Summit 2011 - Digital Worlds
 
ITWS Capstone Lecture (Spring 2013)
ITWS Capstone Lecture (Spring 2013)ITWS Capstone Lecture (Spring 2013)
ITWS Capstone Lecture (Spring 2013)
 
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology ConferenceA Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
 
May 2012 HUG: The Changing Big Data Landscape
May 2012 HUG: The Changing Big Data LandscapeMay 2012 HUG: The Changing Big Data Landscape
May 2012 HUG: The Changing Big Data Landscape
 





Big Data Engineering - Top 10 Pragmatics

  • 1. The road lies plain before me;--'tis a theme Single and of determined bounds; … - Wordsworth, The Prelude
      Krishna Sankar, http://doubleclix.wordpress.com
      EC4000–PhD Guest Seminar, Naval Post Graduate School
      April 27, 2012
  • 2. Agenda
      o To cover the broad picture
      o Touch upon instances of the technologies employed
      o Of the Big Data domain …
      Topics shown: What is Big Data?; Big Data to smart data; Big Data Pipeline; Analytics/Modeling - R; Analytic Algorithms; Cloud Architectures; Visualization; Processing - Hadoop; Storage - NOSQL
  • 3. Thanks to … The giants whose shoulders I am standing on
      Special thanks to:
      o Peter Ateshian, NPS
      o Prof Murali Tummala, NPS
      o Shirley Bailes, O'Reilly
      o Ed Dumbill, O'Reilly
      o Jeff Barr, AWS
      o Jenny Kohr Chynoweth, AWS
  • 4. Porcelain vs. Plumbing
      o The balance is always interesting …
      o This talk has both
      o Would be happy to dive deep into plumbing topics like Hadoop, R, MongoDB, Cassandra et al. …
  • 5. EBC322
      ① Volume
          o Scale
      ② Velocity
          o Data change rate vs. decision window
      ③ Variety
          o Different sources & formats
          o Structured vs. Unstructured
      ④ Variability
          o Breadth of interpretation
          o Depth of analytics
      http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/
      http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf
      http://www.quora.com/Business-Intelligence/What-is-the-future-of-business-intelligence
  • 9. EBC322
      ① Volume
          o Scale
      ② Velocity
          o Data change rate vs. decision window
      ③ Variety
          o Different sources & formats
          o Structured vs. Unstructured
      ④ Variability
          o Breadth of interpretation
          o Depth of analytics
      ⑤ Contextual
          o Dynamic variability
          o Recommendation
      ⑥ Connectedness
      http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/
      http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf
  • 10. "… they didn't need a genius, … but build the world's most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency" – Final Jeopardy by Stephen Baker
      o 15 TB memory, across 90 IBM 760 servers, in 10 racks
      o 1 TB of dataset
      o 200 Million pages processed by Hadoop
      o This is a good example of Connected data
          – Contextual w/ variability
          – Breadth of interpretation
          – Analytics depth
      http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/
      http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/
  • 12. Ref: http://goo.gl/Mm83k
      [Layered diagram] Layers, top to bottom: Infer-ability, Model, Context, Connectedness, Variety, Variability, Velocity, Volume
      Tools shown alongside the layers: internal dashboards, Tableau; hand coded programs, R, Mahout, …; SQL, BI Tools, Hadoop, Pig, Hive; SQL, .NET Dryad, NOSQL; Logs, XML, files, …; Scribe, Flume, Storm, …; HDFS, Hadoop; various other tools
      Decomplexify! Contextualize! Network! Reason! Infer!
  • 13. Twitter
      § 200 million tweets/day
      § Peak 10,000/second
      § How would you handle the fire hose for social network analytics?
      AWS – 900 Billion objects!
      Zynga
      § "Analytics company, not a gaming company!"
      § Harvests data: 15 TB/day
      § Test new features
      § Target advertising
      § 230 million players/month
      Storage
      § 4 U box = 40 TB, 1 PB = 25 boxes!
      http://goo.gl/dcBsQ
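The sizing figures on slide 13 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constants are the numbers quoted on the slide, while the function names and the use of decimal units (1 PB = 1000 TB) are assumptions made here.

```python
import math

# Figures quoted on slide 13.
TWEETS_PER_DAY = 200_000_000
PEAK_TWEETS_PER_SECOND = 10_000
BOX_CAPACITY_TB = 40          # one 4 U storage box
ZYNGA_TB_PER_DAY = 15

# Assumption: decimal units, so 1 PB = 1000 TB.
TB_PER_PB = 1000
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(events_per_day: int) -> float:
    """Average events per second for a daily total."""
    return events_per_day / SECONDS_PER_DAY


def boxes_needed(total_tb: float, box_tb: float = BOX_CAPACITY_TB) -> int:
    """Whole storage boxes required to hold total_tb terabytes."""
    return math.ceil(total_tb / box_tb)


avg_tweets_per_second = average_rate(TWEETS_PER_DAY)
peak_to_average = PEAK_TWEETS_PER_SECOND / avg_tweets_per_second
boxes_per_petabyte = boxes_needed(TB_PER_PB)
days_to_fill_one_box = BOX_CAPACITY_TB / ZYNGA_TB_PER_DAY

print(f"average tweets/second: {avg_tweets_per_second:.0f}")
print(f"peak is {peak_to_average:.1f}x the average")
print(f"boxes per PB: {boxes_per_petabyte}")
print(f"days for 15 TB/day to fill one box: {days_to_fill_one_box:.1f}")
```

The gap between the average and the peak rate is the reason a "fire hose" pipeline is usually sized for bursts rather than for the daily mean.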
  • 14.
      o 6 Billion Messages per day
      o 2 PB (w/compression) online
      o 6 PB w/ replication
      o 250 TB/Month growth
      o HBase Infrastructure
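The storage numbers on slide 14 imply a replication factor and a rough growth horizon. This is a back-of-the-envelope sketch only: it assumes the 250 TB/month figure refers to online (pre-replication) data, that growth is linear, and that 1 PB = 1000 TB; none of that is stated on the slide, and the function name is hypothetical.

```python
import math

# Figures quoted on slide 14.
ONLINE_PB = 2
REPLICATED_PB = 6
GROWTH_TB_PER_MONTH = 250

# Assumption: decimal units, so 1 PB = 1000 TB.
TB_PER_PB = 1000

# Copies of each byte kept across the cluster.
replication_factor = REPLICATED_PB / ONLINE_PB

# Assumption: growth is quoted before replication, so raw disk grows faster.
raw_growth_tb_per_month = GROWTH_TB_PER_MONTH * replication_factor


def months_until(target_pb: float, current_pb: float = ONLINE_PB) -> int:
    """Months of linear growth needed to reach target_pb of online data."""
    remaining_tb = (target_pb - current_pb) * TB_PER_PB
    return math.ceil(remaining_tb / GROWTH_TB_PER_MONTH)


print(f"replication factor: {replication_factor:.0f}x")
print(f"raw growth: {raw_growth_tb_per_month:.0f} TB/month")
print(f"months until online data doubles: {months_until(2 * ONLINE_PB)}")
```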
  • 15. eBay Extreme Analytics Architecture
      o 50 TB/Day
      o Very systematic
      o 240 nodes, 84 PB
      o Diagram speaks volumes!
      o Path Analysis
      o Teradata Installation
      o A/B Testing
      Ref: http://www.hpts.ws/sessions/2011HPTS-TomFastner.pdf
  • 16. [Pipeline diagram] Stages: Collect; Store; Transform & Analyze; Model & Reason; Predict, Recommend & Visualize
      Tools shown: D3.js, Tableau, R, Dashboard, Mahout, Hadoop, BI Tools, Pig/Hive, NOSQL, Cassandra, MongoDB, Splunk, HBase, Scribe, Neo4j, Flume, Storm
      When I think of my own native land,
      In a moment I seem to be there;
      But, alas! recollection at hand
      Soon hurries me back to despair.
      - Cowper, The Solitude Of Alexander Selkirk
  • 17. NOSQL
      Categories: Key Value (In-memory, Disk Based), Column, Document, Graph
      Products shown: SimpleDB, CouchDB, Neo4j, Memcached, Google BigTable, MongoDB, FlockDB, HBase, Lotus Domino, InfiniteGraph, Redis, Cassandra, Riak, Tokyo Cabinet, Dynamo, HyperTable, Voldemort, Azure TS
  • 18. MapReduce
      o Data parallelism
      o Large Installations (many ~5000 node clusters!)
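The data parallelism that slide 18 refers to can be sketched as a word count in plain Python. This is not Hadoop code: the function names and the sample chunks are invented for illustration, and everything runs in one process. The point is only that each map call touches a single chunk, so on a cluster the chunks could be processed on different nodes before the partial results are merged.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_phase(chunk: str) -> Counter:
    """Count words in one chunk; independent of every other chunk."""
    return Counter(chunk.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Merge two partial counts by summing per-word totals."""
    return left + right


def word_count(chunks: Iterable[str]) -> Counter:
    """Map every chunk, then fold the partial counts into one result."""
    partial_counts = map(map_phase, chunks)
    return reduce(reduce_phase, partial_counts, Counter())


# Hypothetical input split into two chunks, as a file might be split into blocks.
chunks = ["big data big pipeline", "data to smart data"]
totals = word_count(chunks)
print(totals.most_common(2))
```

Because the merge step is associative, partial counts can be combined in any grouping, which is what lets the reduce side scale out as well.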
  • 19.
      o Software As A Service
      o Platform As A Service
      o Infrastructure As A Service
  • 20.
  • 21. Amazon – Canonical Cloud
      o S3 – Blob storage
      o DynamoDB – NOSQL
      o EMR – Elastic Map Reduce
      o EC2 – Compute
      o 1% of Internet traffic
      "Scalability is about building wider roads, not about building faster cars" – Steve Swartz
      http://blog.deepfield.net/2012/04/18/how-big-is-amazons-cloud/
  • 23. [Diagram labels] EC2, EC2
      http://openclipart.org/detail/152311/internet-cloud-by-b.gaultier, http://openclipart.org/detail/17847
  • 24.
      o Social Network Analysis
      o Sentiment Analysis
      o Brand Strength
      o Citation/co-citation ≅ Followed by/Also Follows
      o Metrics
          – Network diameter,
          – Weak-ties,
          – Erdős-Rényi model &
          – Kronecker Graphs
      [Diagram labels] Tweets, Followers, Follow/Unfollow
      http://www.oscon.com/oscon2012/public/schedule/detail/23130
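Two of the metrics named on slide 24, the Erdős-Rényi model and network diameter, can be illustrated with the standard library alone. The sketch below assumes the G(n, p) variant of the model (each possible edge is included independently with probability p) and an undirected graph; the function names are hypothetical, and real follower graphs are directed and far larger.

```python
import random
from collections import deque
from typing import Dict, Optional, Set, Tuple

Graph = Dict[int, Set[int]]


def erdos_renyi(n: int, p: float, seed: int = 0) -> Graph:
    """Undirected G(n, p) random graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    adjacency: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return adjacency


def eccentricity(adjacency: Graph, source: int) -> Tuple[int, int]:
    """Breadth-first search: (longest hop count from source, nodes reached)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return max(distance.values()), len(distance)


def diameter(adjacency: Graph) -> Optional[int]:
    """Longest shortest path in the graph, or None if it is disconnected."""
    longest = 0
    for source in adjacency:
        hops, reached = eccentricity(adjacency, source)
        if reached != len(adjacency):
            return None
        longest = max(longest, hops)
    return longest


graph = erdos_renyi(n=50, p=0.2, seed=42)
print("diameter:", diameter(graph))
```

Running a BFS from every node is quadratic in graph size, which is fine for a sketch but is one reason graph metrics at social-network scale call for distributed or approximate methods.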
  • 25. Was it a vision, or a waking dream?
      Fled is that music:--do I wake or sleep?
      - Keats, Ode to a Nightingale