Oct. 20th, 2012 @ Rakuten Technology Conference 2012

Realtime deep analytics for BigData

Daisuke Okanohara
Preferred Infrastructure, Inc.
Co-founder, Vice President
hillbig@preferred.jp

Agenda

• Introduction of PFI
• Current condition of BigData analysis
• Jubatus: concept and characteristics
• Inside Jubatus: Update, Analyze, and Mix

Preferred Infrastructure (PFI)

• Founded: March 2006
• Location: Hongo, Tokyo
• Employees: 26
• Our mission: bring cutting-edge research advances to the real world
• Our products:
  • Sedue: “Modern search engine”
  • Bazil: “Machine learning for everyone”
  • Jubatus: “Realtime deep analytics for BigData”

Preferred Infrastructure (contd.)

• We are passionate about developing a wide range of computer science technologies:
  • machine learning
  • natural language processing
  • distributed systems
  • programming languages
  • data structures
  • algorithms, etc.
• Our team includes winners of various programming contests and red coders
• Very rapid prototyping and development of good software

BigData!

• We see BigData everywhere
  • 3V: “Volume”, “Velocity”, “Variety”
• We need tools for analyzing BigData

[Figure: <Data Types> Text, Log, Image, Voice, Vision, Signal, Finance, Bio; <Data Sources> People, PC, Mobile, Sensors, Cars, Factories, Web, Hospitals]

Case 1. SNS (Twitter, Facebook, etc.)

• Jubatus classifies each tweet from the stream (6,000 tps) into categories according to its content, using machine learning technologies

Case 2. Automobiles

• Services
  • Remote maintenance / security
  • Insurance: Pay As You Drive, Pay How You Drive
• Auto-driving cars
  • Equipped with sensors: radar, lidar (laser radar), GPS, cameras
  • E.g. Google driverless cars
    • In Aug. 2012, they completed a 480,000 km test drive

Case 2. Automobiles (contd.)

Navigation system based on real-time traffic updates: waze.com

Case 3. Infrastructures, factories

• Preventive maintenance for the NY City power grid
  • Learning a prioritization (supervised ranking or MTBF, mean time between failures) of candidates using approx. 300 summary features
  • The results are accurate enough to support decision making

[Figure: OA rate = outage rate]

“Machine Learning for the New York City Power Grid”, IEEE Trans. PAMI, 2012

Case 3. Infrastructures, factories (contd.)

[Figure: Benefit vs. cost for various replacement strategies, analyzed by machine learning]

“Machine Learning for the New York City Power Grid”, IEEE Trans. PAMI, 2012

Case 4. Genome Analysis

• Next-generation sequencers are bringing big changes
  • Human genome sequencing: $3 billion / 10 years in 2001 became $7,700 / 1 day in 2012
• GWAS (genome-wide association studies) are becoming popular
• Big impacts in many fields: healthcare, agriculture, medicine
  • 23andMe analyzes users’ DNA and provides information about their ancestry, health, and genetic traits

Increasing demand in BigData applications: higher necessity of deeper real-time analysis

• Current: simple aggregation and pre-defined rule processing on bigger data
  • CEP, Hadoop, DSMS
• Future: deeper analysis for rapid decisions and actions

[Figure: CEP, Hadoop, and Jubatus positioned along the axes of decision speed and depth of analysis]

References: http://web.mit.edu/rudin/www/TPAMIPreprint.pdf, http://www.computerworlduk.com/news/networking/3302464/

Jubatus: OSS platform for Big Data analytics

• Joint development by PFI and NTT laboratories
  • Project started in April 2011
• Released as open source software
  • You can download it from: http://github.com/jubatus/

Key technology: Machine learning

• We need rapid decisions under uncertainty
  • Anomaly detection from M2M sensor data
  • Energy demand forecast / smart grid optimization
  • Security monitoring on raw Internet traffic
• What is missing for fast & deep analytics on BigData?
  • An online/real-time machine learning platform + a scale-out distributed machine learning platform

[Figure: 1. Bigger data, 2. Real-time, 3. Deeper analysis]

Online machine learning

• Batch machine learning
  • Scans all data before building a model
  • Analysis is available only after all data has been prepared
• Online machine learning
  • The model is updated instantaneously by each data sample
  • Online models converge to the batch models
  • Convergence is very fast: approx. 100 times faster than batch (1 day -> 5 min.)

[Figure: batch learning builds the model after all data is collected; online learning updates the model as each sample arrives]

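The difference can be sketched in a few lines of Python, assuming a binary linear classifier over sparse feature dictionaries. The perceptron rule is used here only because it is the simplest online update; `OnlinePerceptron` and `train_on_stream` are hypothetical names, not part of Jubatus. Each call to `update` changes the model immediately, so predictions are available after every sample rather than after a pass over a stored dataset.

```python
from typing import Dict, Iterable, Tuple

Features = Dict[str, float]  # sparse feature vector: feature name -> value


class OnlinePerceptron:
    """Binary linear classifier (labels +1/-1) updated one sample at a time."""

    def __init__(self) -> None:
        self.w: Dict[str, float] = {}

    def score(self, x: Features) -> float:
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x: Features) -> int:
        return 1 if self.score(x) >= 0.0 else -1

    def update(self, x: Features, y: int) -> None:
        # Perceptron rule: change the weights only when the sample is misclassified.
        if y * self.score(x) <= 0.0:
            for k, v in x.items():
                self.w[k] = self.w.get(k, 0.0) + y * v


def train_on_stream(stream: Iterable[Tuple[Features, int]]) -> OnlinePerceptron:
    model = OnlinePerceptron()
    for x, y in stream:
        model.update(x, y)  # the model is usable right after this line; x is not stored
    return model


stream = [
    ({"goal": 1.0, "match": 1.0}, 1),      # e.g. sports
    ({"election": 1.0, "vote": 1.0}, -1),  # e.g. politics
    ({"match": 1.0, "team": 1.0}, 1),
    ({"vote": 1.0, "party": 1.0}, -1),
]
model = train_on_stream(stream)
print(model.predict({"team": 1.0, "goal": 1.0}))  # 1
print(model.predict({"vote": 1.0}))               # -1
```

Only the weight dictionary is kept between samples, which is what makes this style of learning memory-efficient on an unbounded stream.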
Jubatus employs the latest online machine learning

• Advantages: fast and memory-efficient
  • Low latency & high throughput
  • No need for large dataset storage
• E.g. online learning for linear classification
  • Perceptron (1958)
  • Passive Aggressive (2003)
  • Confidence Weighted Learning (2008)
  • AROW (2009)
  • Normal HERD (2010)
  • Soft Confidence Weighted Learning (2012)
  (Very recent progress: 2008 onward)

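To make one of the listed algorithms concrete, here is a minimal sketch of the PA-I variant of Passive Aggressive for binary classification, assuming sparse features stored as dictionaries and labels in {+1, -1}. It is a textbook rendering, not Jubatus's implementation; the class name and the parameter `c` are illustrative.

```python
from typing import Dict

Features = Dict[str, float]


class PassiveAggressive:
    """PA-I binary classifier (labels +1/-1) on sparse features."""

    def __init__(self, c: float = 1.0) -> None:
        self.c = c  # aggressiveness: upper bound on the step size tau
        self.w: Dict[str, float] = {}

    def score(self, x: Features) -> float:
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def update(self, x: Features, y: int) -> float:
        """Apply one PA-I step and return the step size tau (0.0 means passive)."""
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss with margin 1
        sq_norm = sum(v * v for v in x.values())
        if loss == 0.0 or sq_norm == 0.0:
            return 0.0  # passive: the margin is already satisfied
        # Aggressive: smallest step that removes the loss, capped at C.
        tau = min(self.c, loss / sq_norm)
        for k, v in x.items():
            self.w[k] = self.w.get(k, 0.0) + tau * y * v
        return tau


clf = PassiveAggressive(c=1.0)
print(clf.update({"a": 1.0, "b": 1.0}, 1))  # 0.5
print(clf.score({"a": 1.0, "b": 1.0}))      # 1.0
print(clf.update({"a": 1.0, "b": 1.0}, 1))  # 0.0
```

Compared with the perceptron, the step size adapts to how badly the sample is misclassified, and `c` limits how far a single noisy sample can move the model.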
Data analysis goes real-time/online and large scale

• Jubatus combines them into a unified computation framework

                    Small scale, stand-alone           Large scale, distributed/parallel
Real-time/online    Online ML algorithms:              Jubatus (2011-)
                    Structured Perceptron (2001),
                    PA (2003), CW (2008)
Batch               SPSS (1988-), WEKA (1993-)         Mahout (2006-)

What Jubatus currently supports

1. Classification (multi-class)
   • Perceptron / PA / CW / AROW
2. Regression
   • PA-based regression
3. Nearest neighbor
   • LSH / MinHash / Euclid LSH
4. Recommendation
   • Based on nearest neighbor
5. Anomaly detection
   • LOF based on nearest neighbor
6. Graph analysis
   • Shortest path / Centrality (PageRank)
7. Simple statistics

We support most machine learning/data mining technologies.

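Nearest-neighbor search (and the recommendation and LOF features built on it) relies on compact hash signatures. As an illustration of the MinHash idea only, not Jubatus's implementation, the sketch below assumes each sample is a set of tokens and uses salted BLAKE2b digests from Python's `hashlib` as a stand-in for a family of random hash functions; `minhash_signature` and `estimated_jaccard` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Set


def minhash_signature(items: Set[str], num_hashes: int = 128) -> List[int]:
    """For each of num_hashes salted hash functions, keep the minimum hash over the set."""
    if not items:
        raise ValueError("items must be non-empty")
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(), "big")
            for item in items
        ))
    return signature


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of agreeing minima; estimates the Jaccard similarity |A ∩ B| / |A ∪ B|."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


docs: Dict[str, Set[str]] = {
    "d1": {"big", "data", "stream", "analytics"},
    "d2": {"big", "data", "batch", "hadoop"},
    "d3": {"genome", "sequencing", "dna"},
}
index = {name: minhash_signature(words) for name, words in docs.items()}
query = minhash_signature({"big", "data", "stream", "analytics", "realtime"})
nearest = max(index, key=lambda name: estimated_jaccard(query, index[name]))
print(nearest)
```

Here the exact Jaccard similarities to the query are 0.8 (d1), about 0.29 (d2), and 0 (d3), so d1 should come out as the nearest neighbor. Only the fixed-length signatures need to be stored and compared, not the original sets.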
Hadoop and Mahout are not good for online learning

• Hadoop
  • Advantages
    • Many extensions for a variety of applications
    • Good for distributed data storage and aggregation
  • Disadvantages
    • No direct support for machine learning and online processing
• Mahout
  • Advantages
    • Popular machine learning algorithms are implemented
  • Disadvantages
    • Some implementations are less mature
    • Still not capable of online machine learning

Jubatus vs. Hadoop, RDB, and Storm: advantage in online AND distributed ML

• Only Jubatus satisfies both of them at the same time

                                    Jubatus   Hadoop       RDB              Storm
Storing BigData                     -         ✓✓ (HDFS)    ✓                -
Batch learning                      ✓         ✓ (Mahout)   ✓✓ (SPSS, etc.)  -
Stream processing                   ✓         -            -                ✓✓
Distributed learning                ✓✓        ✓ (Mahout)   -                -
Online learning (high importance)   ✓✓        -            -                -

Distributing an online learning algorithm is not trivial

[Figure: In batch learning, many Learn steps run in parallel and are followed by one model update, so it is easy to parallelize. In online learning, every Learn step is followed by a model update over time, so it is hard to parallelize due to frequent updates.]

• Online learning requires frequent model updates
• A naïve distributed architecture leads to too many synchronization operations

Solution: Loose model sharing

• Jubatus only shares the local models, in a loose manner
  • Fact: model size << data size
  • Data sets are not shared
  • A unique approach compared to existing frameworks
• Local models can differ between servers
  • Different models are gradually merged

[Figure: each server's model becomes a mixed model after sharing]

Three fundamental operations on Jubatus: UPDATE, ANALYZE, and MIX

1. UPDATE
   • Receive a sample, learn from it, and update the local model
2. ANALYZE
   • Receive a sample, apply the local model, and return the result
3. MIX (automatically executed in the backend)
   • Exchange and merge the local models between servers

• Cf. Map-Shuffle-Reduce operations on Hadoop
• Algorithms can be implemented independently of:
  • Distribution logic
  • Data sharing
  • Failover

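A minimal sketch of how the three operations could keep algorithm code separate from distribution logic. It assumes a hypothetical `MixableModel` interface whose `get_diff`, `merge`, and `put_diff` methods split MIX into gather, combine, and install steps; these names are illustrative and are not claimed to be Jubatus's API. The concrete model is a running mean, chosen only because its diff (sum and count since the last mix) merges exactly, and the `mix` function stands in for the backend.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Tuple, TypeVar

D = TypeVar("D")  # type of a model diff


class MixableModel(ABC, Generic[D]):
    """Hypothetical interface: the algorithm knows nothing about servers or networking."""

    @abstractmethod
    def update(self, sample: float) -> None:
        """UPDATE: learn from one sample, touching only the local model."""

    @abstractmethod
    def analyze(self) -> float:
        """ANALYZE: answer from the local model, with no communication."""

    @abstractmethod
    def get_diff(self) -> D:
        """MIX, step 1: report what changed locally since the last mix."""

    @staticmethod
    @abstractmethod
    def merge(a: D, b: D) -> D:
        """MIX, step 2: combine two diffs."""

    @abstractmethod
    def put_diff(self, merged: D) -> None:
        """MIX, step 3: install the merged diff and reset local changes."""


class MeanModel(MixableModel[Tuple[float, int]]):
    """Running mean of a numeric stream; its diff is (sum, count) since the last mix."""

    def __init__(self) -> None:
        self.total, self.count = 0.0, 0      # state agreed on at the last mix
        self.d_total, self.d_count = 0.0, 0  # local changes not yet mixed

    def update(self, sample: float) -> None:
        self.d_total += sample
        self.d_count += 1

    def analyze(self) -> float:
        n = self.count + self.d_count
        return (self.total + self.d_total) / n if n else 0.0

    def get_diff(self) -> Tuple[float, int]:
        return (self.d_total, self.d_count)

    @staticmethod
    def merge(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
        return (a[0] + b[0], a[1] + b[1])

    def put_diff(self, merged: Tuple[float, int]) -> None:
        self.total += merged[0]
        self.count += merged[1]
        self.d_total, self.d_count = 0.0, 0


def mix(servers: List[MixableModel]) -> None:
    """Backend driver: the distribution logic lives here, not in the algorithm."""
    merged = servers[0].get_diff()
    for server in servers[1:]:
        merged = server.merge(merged, server.get_diff())
    for server in servers:
        server.put_diff(merged)


a, b = MeanModel(), MeanModel()
for x in (1.0, 2.0, 3.0):
    a.update(x)  # samples routed to server a
b.update(10.0)   # sample routed to server b
print(a.analyze(), b.analyze())  # 2.0 10.0  (each server sees only its own data)
mix([a, b])
print(a.analyze(), b.analyze())  # 4.0 4.0  (both now reflect all four samples)
```

Swapping in a different algorithm means implementing the five methods; `mix` and any routing or failover code stay unchanged, which is the separation the slide describes.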
UPDATE

• Each data sample is sent to one (or two) server(s)
• Local models are updated based on the sample
• Data samples are NEVER shared

[Figure: samples are distributed randomly or consistently to servers; server 1 updates the initial model into local model 1, and server 2 updates the initial model into local model 2]

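The slide says samples are distributed “randomly or consistently”. One plausible reading of “consistently” is consistent hashing, where the same key always lands on the same server(s); the sketch below assumes that reading. The ring with virtual nodes, BLAKE2b as the hash, and the names `HashRing`, `lookup`, and `route_randomly` are illustrative assumptions, not Jubatus internals.

```python
import bisect
import hashlib
import random
from typing import List


def _hash(key: str) -> int:
    # 64-bit integer from a BLAKE2b digest; any well-mixed hash would do here.
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers: List[str], vnodes: int = 64) -> None:
        self._ring = sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._points = [h for h, _ in self._ring]
        self._num_servers = len(set(servers))

    def lookup(self, key: str, replicas: int = 1) -> List[str]:
        """Return the first `replicas` distinct servers clockwise from the key's hash."""
        replicas = min(replicas, self._num_servers)
        idx = bisect.bisect(self._points, _hash(key))
        chosen: List[str] = []
        while len(chosen) < replicas:
            server = self._ring[idx % len(self._ring)][1]
            if server not in chosen:
                chosen.append(server)
            idx += 1
        return chosen


def route_randomly(servers: List[str], rng: random.Random) -> str:
    """Random routing: any server may receive the sample."""
    return rng.choice(servers)


servers = ["server1", "server2", "server3"]
ring = HashRing(servers)
print(ring.lookup("user:42", replicas=2))  # two distinct servers, identical on every call
print(route_randomly(servers, random.Random(0)))
```

Random routing spreads load evenly when any server can learn from any sample; hash-based routing keeps samples with the same key together (and, with `replicas=2`, matches the “one (or two) server(s)” case), and removing a server only remaps the keys it owned.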
MIX

• Each server sends its model diff (difference)
• Model diffs are merged and distributed
• Only model diffs are transmitted

[Figure:
Local model 1 - Initial model = Model diff 1
Local model 2 - Initial model = Model diff 2
Model diff 1 + Model diff 2 = Merged diff
Merged diff + Initial model = Mixed model (on each server)]

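The arithmetic in the figure can be sketched for a sparse linear model, assuming each model is a dictionary of feature weights. The figure draws the merge as a plain sum of diffs; this sketch divides the sum by the number of servers (parameter averaging), which is one common choice for linear models and is an assumption here, not something the slide specifies. The function names are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, float]  # sparse linear model: feature -> weight


def model_diff(local: Weights, initial: Weights) -> Weights:
    """Model diff = local model - initial model, keeping only changed weights."""
    diff = {k: local.get(k, 0.0) - initial.get(k, 0.0) for k in set(local) | set(initial)}
    return {k: v for k, v in diff.items() if v != 0.0}


def merge_diffs(diffs: List[Weights]) -> Weights:
    """Merged diff: add the diffs, then divide by the number of servers (averaging)."""
    total: Weights = {}
    for d in diffs:
        for k, v in d.items():
            total[k] = total.get(k, 0.0) + v
    return {k: v / len(diffs) for k, v in total.items()}


def apply_diff(initial: Weights, merged: Weights) -> Weights:
    """Mixed model = initial model + merged diff."""
    mixed = dict(initial)
    for k, v in merged.items():
        mixed[k] = mixed.get(k, 0.0) + v
    return mixed


initial = {"goal": 0.5}
local1 = {"goal": 1.5, "match": 1.0}  # server 1 after some UPDATEs
local2 = {"goal": 0.5, "vote": -2.0}  # server 2 after some UPDATEs
d1 = model_diff(local1, initial)       # contents: goal 1.0, match 1.0
d2 = model_diff(local2, initial)       # contents: vote -2.0
merged = merge_diffs([d1, d2])         # contents: goal 0.5, match 0.5, vote -1.0
mixed = apply_diff(initial, merged)    # contents: goal 1.0, match 0.5, vote -1.0
```

Because both servers started from the same initial model, averaging the diffs makes the mixed model equal to the average of the two local models, yet only the sparse diffs (three weights here) need to cross the network.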
UPDATE (iteration)

• Each server starts updating from the mixed model
• The mixed model improves gradually thanks to all of the servers

[Figure: samples are distributed randomly or consistently to servers; server 1 updates the mixed model into local model 1, and server 2 updates the mixed model into local model 2]

ANALYZE

• For analysis, each sample is sent to a randomly chosen server
• The server applies its current mixed model to the sample
  • It uses only the model on the local server and does not communicate with other servers
• The results are returned to the client

[Figure: samples are distributed randomly; each server applies its mixed model and returns a prediction]

Why can Jubatus work in real time?

1. Focus on online machine learning
   • Make online machine learning algorithms distributed
2. Update locally
   • Online training without communicating with other servers
3. Mix only models
   • Small communication cost, low latency, good performance
   • An advantage over the costly Shuffle in MapReduce
4. Analyze locally
   • Each server has the mixed model and does not need to communicate
   • Low latency for making predictions
5. Everything in-memory
   • Process data on the fly

Summary

• Jubatus is the first OSS platform for online distributed machine learning on BigData streams.
• Download it from http://github.com/jubatus/
• We welcome your contributions and collaboration

[Figure: 1. Bigger data, 2. More in real-time, 3. Deep analysis]

Copyright © 2006-2012 Preferred Infrastructure. All Rights Reserved.

EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH...EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH...
EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH...
ijcsit
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things
PayamBarnaghi
 
Big Data and Predictive Analysis
Big Data and Predictive AnalysisBig Data and Predictive Analysis
Big Data and Predictive Analysis
Jongwook Woo
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
Vladimir Bacvanski, PhD
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
DATAVERSITY
 

Similaire à Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012 (20)

Deep Learning disruption
Deep Learning disruptionDeep Learning disruption
Deep Learning disruption
 
Big data analytics for transport
Big data analytics for transportBig data analytics for transport
Big data analytics for transport
 
Jubatus Invited Talk at XLDB Asia
Jubatus Invited Talk at XLDB AsiaJubatus Invited Talk at XLDB Asia
Jubatus Invited Talk at XLDB Asia
 
Io t research_arpanpal_iem
Io t research_arpanpal_iemIo t research_arpanpal_iem
Io t research_arpanpal_iem
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
 
Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Dat...
Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Dat...Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Dat...
Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Dat...
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the Future
 
Big Data - A Real Life Revolution
Big Data - A Real Life RevolutionBig Data - A Real Life Revolution
Big Data - A Real Life Revolution
 
2018 learning approach-digitaltrends
2018 learning approach-digitaltrends2018 learning approach-digitaltrends
2018 learning approach-digitaltrends
 
Accelerating Real-Time Analytics Insights Through Hadoop Open Source Ecosystem
Accelerating Real-Time Analytics Insights Through Hadoop Open Source EcosystemAccelerating Real-Time Analytics Insights Through Hadoop Open Source Ecosystem
Accelerating Real-Time Analytics Insights Through Hadoop Open Source Ecosystem
 
How Can We Answer the Really BIG Questions?
How Can We Answer the Really BIG Questions?How Can We Answer the Really BIG Questions?
How Can We Answer the Really BIG Questions?
 
Real time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationReal time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing application
 
Microsoft Dryad
Microsoft DryadMicrosoft Dryad
Microsoft Dryad
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big Decisions
 
Efficient Attack Detection in IoT Devices using Feature Engineering-Less Mach...
Efficient Attack Detection in IoT Devices using Feature Engineering-Less Mach...Efficient Attack Detection in IoT Devices using Feature Engineering-Less Mach...
Efficient Attack Detection in IoT Devices using Feature Engineering-Less Mach...
 
EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH...
EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH...EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH...
EFFICIENT ATTACK DETECTION IN IOT DEVICES USING FEATURE ENGINEERING-LESS MACH...
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things
 
Big Data and Predictive Analysis
Big Data and Predictive AnalysisBig Data and Predictive Analysis
Big Data and Predictive Analysis
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
 

Jubatus: Realtime deep analytics for BigData @ Rakuten Technology Conference 2012

  • 1. Oct. 20th 2012 @ Rakuten Technology Conference 2012
      Realtime deep analytics for BigData
      Daisuke Okanohara
      Preferred Infrastructure, Inc.
      co-founder, vice president
      hillbig@preferred.jp
  • 2. Agenda
      - Introduction of PFI
      - Current condition of BigData analysis
      - Jubatus: concept and characteristics
      - Inside Jubatus: Update, Analyze, and Mix
  • 3. Preferred Infrastructure (PFI)
      - Founded: March 2006
      - Location: Hongo, Tokyo
      - Employees: 26
      - Our mission: bring cutting-edge research advances to the real world
      - Our products:
        - Sedue: "Modern search engine"
        - Bazil: "Machine learning for everyone"
        - Jubatus: "Realtime deep analytics for BigData"
  • 4. Preferred Infrastructure (contd.)
      - We are passionate about developing a wide range of computer science technologies:
        - machine learning
        - natural language processing
        - distributed systems
        - programming languages
        - data structures
        - algorithms, etc.
      - Our team includes winners of various programming contests and red coders
      - Very rapid prototyping and development of good software
  • 5. Agenda
      - Introduction of PFI
      - Current condition of BigData analysis
      - Jubatus: concept and characteristics
      - Inside Jubatus: Update, Analyze, and Mix
  • 6. BigData!
      - We see BigData everywhere
      - 3V: "Volume", "Velocity", "Variety"
      - We need tools for analyzing BigData
      [Figure: data types (Text, Log, Image, Voice, Vision, Signal, Finance, Bio) and data sources (People, PC, Mobile, Sensors, Cars, Factories, Web, Hospitals)]
  • 7. Case 1. SNS (Twitter, Facebook, etc.)
      - Jubatus classifies each tweet from the stream (6,000 tweets per second) into categories according to its content, using machine learning technologies
  • 8. Case 2. Automobiles
      - Services
        - Remote maintenance / security
        - Insurance: Pay As You Drive, Pay How You Drive
      - Self-driving cars
        - Equipped sensors: radar, lidar (laser radar), GPS, cameras
        - E.g., Google driverless cars
        - In Aug. 2012, they completed 480,000 km of test driving
  • 9. Case 2. Automobiles (contd.)
      - Navigation system based on real-time traffic updates: waze.com
  • 10. Case 3. Infrastructures, factories
      - Preventive maintenance for the New York City power grid
      - Learns to prioritize candidates (supervised ranking or MTBF estimation) using approx. 300 summary features
      - The results are accurate enough to support decision making
      [Figure: OA rate (= outage rate)]
      "Machine Learning for the New York City Power Grid", IEEE Trans. PAMI, 2012
  • 11. Case 3. Infrastructures, factories (contd.)
      [Figure: benefit vs. cost for various replacement strategies, analyzed by machine learning]
      "Machine Learning for the New York City Power Grid", IEEE Trans. PAMI, 2012
  • 12. Case 4. Genome analysis
      - Next-generation sequencers are bringing big changes
        - Sequencing a human genome cost $3 billion and took 10 years in 2001; in 2012 it costs $7,700 and takes 1 day
      - GWAS (genome-wide association studies) are becoming popular
      - Big impact in many fields: healthcare, agriculture, medicine
      - 23andMe analyzes users' DNA to obtain information about their ancestry, health, and genetic traits
  • 13. Agenda
      - Introduction of PFI
      - Current condition of BigData analysis
      - Jubatus: concept and characteristics
      - Inside Jubatus: Update, Analyze, and Mix
  • 14. Increasing demand in BigData applications: a growing need for deeper real-time analysis
      - Current: simple aggregation and pre-defined rule processing on bigger data
        - CEP, Hadoop, DSMS
      - Future: deeper analysis for rapid decisions and actions
      [Figure: decision speed vs. depth of analysis, positioning CEP, Hadoop, and Jubatus]
      References: http://web.mit.edu/rudin/www/TPAMIPreprint.pdf, http://www.computerworlduk.com/news/networking/3302464/
  • 15. Jubatus: OSS platform for BigData analytics
      - Joint development by PFI and NTT laboratories
      - Project started in April 2011
      - Released as open source software
      - You can download it from: http://github.com/jubatus/
  • 16. Key technology: machine learning
      - We need rapid decisions under uncertainty
        - Anomaly detection from M2M sensor data
        - Energy demand forecasting / smart grid optimization
        - Security monitoring on raw Internet traffic
      - What is missing for fast and deep analytics on BigData?
        - An online/real-time machine learning platform
          + a scale-out distributed machine learning platform
      [Figure: 1. Bigger data, 2. Real-time, 3. Deeper analysis]
  • 17. Online machine learning
      - Batch machine learning
        - Scans all data before building a model
        - Analysis becomes available only after all data has been prepared
      - Online machine learning
        - The model is updated instantly by each data sample
        - Online models converge to the batch models
        - Convergence is very fast, approx. 100 times faster than batch (1 day -> 5 min.)
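A minimal sketch of the online side of this contrast, assuming sparse features stored as Python dicts: the classic Perceptron consumes one labeled sample at a time and updates its weights only when it makes a mistake, so no dataset has to be stored or rescanned. The function names `score` and `perceptron_update` are hypothetical, not part of the Jubatus API.

```python
from typing import Dict

Features = Dict[str, float]  # sparse vector: feature name -> value


def score(weights: Features, x: Features) -> float:
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(weights.get(k, 0.0) * v for k, v in x.items())


def perceptron_update(weights: Features, x: Features, y: int) -> None:
    """Online Perceptron step: on a mistake (y * score <= 0), add y * x to the weights.

    y must be +1 or -1. The weights change in place and the sample is not kept,
    so memory does not grow with the length of the stream.
    """
    if y * score(weights, x) <= 0:
        for k, v in x.items():
            weights[k] = weights.get(k, 0.0) + y * v


# Samples are consumed one at a time, as they would arrive from a stream.
stream = [
    ({"good": 1.0, "bias": 1.0}, +1),
    ({"bad": 1.0, "bias": 1.0}, -1),
    ({"good": 1.0, "great": 1.0, "bias": 1.0}, +1),
    ({"bad": 1.0, "awful": 1.0, "bias": 1.0}, -1),
]
w: Features = {}
for x, y in stream:
    perceptron_update(w, x, y)
```

A single pass over this toy stream already classifies all four samples correctly; on a real stream the same loop simply keeps running as data arrives.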
  • 18. Jubatus employs the latest online machine learning
      - Advantages: fast and memory-efficient
        - Low latency and high throughput
        - No need for large dataset storage
      - E.g., online learning for linear classification
        - Perceptron (1958)
        - Passive Aggressive (2003)
        - Very recent progress:
          - Confidence Weighted Learning (2008)
          - AROW (2009)
          - Normal HERD (2010)
          - Soft Confidence Weighted Learning (2012)
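To show how the newer algorithms refine the Perceptron, here is a sketch of a single Passive-Aggressive step in its PA-I form. It stays passive when the hinge loss is zero. Otherwise it moves just far enough to restore a margin of 1, with the step capped by an aggressiveness parameter `c`. The function name and the default `c = 1.0` are illustrative assumptions. The confidence-weighted family (CW, AROW, and so on) additionally keeps a per-feature variance, which this sketch omits.

```python
from typing import Dict

Features = Dict[str, float]  # sparse vector: feature name -> value


def pa1_update(weights: Features, x: Features, y: int, c: float = 1.0) -> float:
    """One Passive-Aggressive (PA-I) step on a sparse sample; y is +1 or -1.

    Returns the step size tau that was applied (0.0 means the step was passive).
    """
    margin = y * sum(weights.get(k, 0.0) * v for k, v in x.items())
    loss = max(0.0, 1.0 - margin)  # hinge loss
    sq_norm = sum(v * v for v in x.values())
    if loss == 0.0 or sq_norm == 0.0:
        return 0.0  # passive: the margin is already at least 1
    tau = min(c, loss / sq_norm)  # aggressive, but capped by c
    for k, v in x.items():
        weights[k] = weights.get(k, 0.0) + tau * y * v
    return tau


w: Features = {}
tau = pa1_update(w, {"a": 1.0, "b": 1.0}, +1)  # loss 1, ||x||^2 = 2, so tau = 0.5
```

After this call `w` is `{"a": 0.5, "b": 0.5}`; feeding the same sample again leaves it unchanged because its margin is now exactly 1.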
  • 19. Data analysis goes real-time/online and large-scale
      - Jubatus combines both into a unified computation framework
      [Figure: batch vs. real-time/online on one axis, small-scale stand-alone vs. large-scale distributed/parallel computing on the other. Batch, small scale: SPSS (1988-), WEKA (1993-). Batch, large scale: Mahout (2006-). Online, small scale: online ML algorithms (Structured Perceptron 2001, PA 2003, CW 2008). Online, large scale: Jubatus (2011-).]
  • 20. What Jubatus currently supports
      1. Classification (multi-class): Perceptron / PA / CW / AROW
      2. Regression: PA-based regression
      3. Nearest neighbor: LSH / MinHash / Euclid LSH
      4. Recommendation: based on nearest neighbor
      5. Anomaly detection: LOF based on nearest neighbor
      6. Graph analysis: shortest path / centrality (PageRank)
      7. Simple statistics
      We support most machine learning / data mining technologies.
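Nearest-neighbor search over high-dimensional sparse data usually relies on similarity-preserving hashes such as the MinHash listed above. The sketch below estimates the Jaccard similarity of two feature sets by comparing their MinHash signatures. The seeded-MD5 hash family and the signature length of 128 are illustrative assumptions, not necessarily what Jubatus uses internally.

```python
import hashlib
from typing import Iterable, List


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item` under hash function number `seed`."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: Iterable[str], num_hashes: int = 128) -> List[int]:
    """For each of num_hashes hash functions, keep the minimum hash over the set."""
    unique = set(items)
    if not unique:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(i, item) for item in unique) for i in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of positions where the minima agree approximates |A & B| / |A | B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


a = minhash_signature(["jubatus", "online", "learning", "mix"])
b = minhash_signature(["jubatus", "online", "learning", "hadoop"])
similarity = estimate_jaccard(a, b)  # true Jaccard similarity is 3 / 5
```

Signatures are fixed-size regardless of how many features a sample has, which is what makes them cheap to store and compare in memory.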
  • 21. Hadoop and Mahout are not good for online learning
      - Hadoop
        - Advantages
          - Many extensions for a variety of applications
          - Good for distributed data storage and aggregation
        - Disadvantages
          - No direct support for machine learning or online processing
      - Mahout
        - Advantages
          - Popular machine learning algorithms are implemented
        - Disadvantages
          - Some implementations are less mature
          - Still not capable of online machine learning
  • 22. Jubatus vs. Hadoop, RDB, and Storm: advantage in online AND distributed ML
      - Only Jubatus satisfies both at the same time

      Capability              Jubatus   Hadoop        RDB              Storm
      Storing BigData         -         ✓✓ (HDFS)     ✓                -
      Batch learning          ✓         ✓✓ (Mahout)   ✓ (SPSS, etc.)   -
      Stream processing       ✓         -             -                ✓✓
      Distributed learning    ✓         ✓✓ (Mahout)   -                -
      Online learning (*)     ✓✓        -             -                -
      (*) high importance
  • 23. Agenda
      - Introduction of PFI
      - Current condition of BigData analysis
      - Jubatus: concept and characteristics
      - Inside Jubatus: Update, Analyze, and Mix
  • 24. A distributed online learning algorithm is not trivial
      [Figure: batch learning runs "learn" once followed by a single "model update", so it is easy to parallelize; online learning alternates "learn" and "model update" over time, so it is hard to parallelize due to frequent updates]
      - Online learning requires frequent model updates
      - A naive distributed architecture leads to too many synchronization operations
  • 25. Solution: loose model sharing
      - Jubatus shares only the local models, and only loosely
        - Fact: model size << data size
        - Data sets are not shared
      - A unique approach compared to existing frameworks
        - Local models may differ between servers
        - The different models are gradually merged
      [Figure: each server's local model is combined into a shared mixed model]
  • 26. Three fundamental operations on Jubatus: UPDATE, ANALYZE, and MIX
      1. UPDATE
         - Receive a sample, learn from it, and update the local model
      2. ANALYZE
         - Receive a sample, apply the local model, and return the result
      3. MIX (automatically executed in the backend)
         - Exchange and merge the local models between servers
      - Cf. the Map-Shuffle-Reduce operations on Hadoop
      - Algorithms can be implemented independently of:
        - distribution logic
        - data sharing
        - failover
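The separation described above, where an algorithm only implements UPDATE, ANALYZE, and a way to produce and merge diffs while the framework owns MIX, can be sketched in-process as follows. The `Model` protocol, its method names, and the `mix` driver are hypothetical illustrations, not Jubatus's actual C++ interfaces. A per-key running mean stands in for the "simple statistics" feature because its diffs (sums and counts) merge exactly by addition.

```python
from functools import reduce
from typing import Any, Dict, List, Protocol, Tuple


class Model(Protocol):
    """What an algorithm implements; it knows nothing about networking or failover."""

    def update(self, key: str, value: float) -> None: ...  # UPDATE: learn locally
    def analyze(self, key: str) -> float: ...  # ANALYZE: local model only
    def get_diff(self) -> Any: ...  # changes since the last MIX
    def merge_diff(self, a: Any, b: Any) -> Any: ...  # combine two diffs
    def put_diff(self, merged: Any) -> None: ...  # adopt the merged diff


def mix(servers: List[Model]) -> None:
    """Generic MIX driver: gather every server's diff, merge them, push the result back."""
    merged = reduce(servers[0].merge_diff, [s.get_diff() for s in servers])
    for s in servers:
        s.put_diff(merged)


Stats = Dict[str, Tuple[float, int]]  # key -> (sum of values, number of samples)


class MeanModel:
    """Toy "simple statistics" algorithm: a running mean per key."""

    def __init__(self) -> None:
        self.mixed: Stats = {}  # statistics agreed on at the last MIX
        self.diff: Stats = {}  # samples seen locally since the last MIX

    def update(self, key: str, value: float) -> None:
        s, n = self.diff.get(key, (0.0, 0))
        self.diff[key] = (s + value, n + 1)

    def analyze(self, key: str) -> float:
        s0, n0 = self.mixed.get(key, (0.0, 0))
        s1, n1 = self.diff.get(key, (0.0, 0))
        return (s0 + s1) / (n0 + n1) if n0 + n1 else 0.0

    def get_diff(self) -> Stats:
        return dict(self.diff)

    def merge_diff(self, a: Stats, b: Stats) -> Stats:
        out = dict(a)
        for key, (s, n) in b.items():
            s0, n0 = out.get(key, (0.0, 0))
            out[key] = (s0 + s, n0 + n)
        return out

    def put_diff(self, merged: Stats) -> None:
        self.mixed = self.merge_diff(self.mixed, merged)
        self.diff = {}  # local changes are now part of the mixed state


# Two servers see different samples; after MIX both answer with the global mean.
a, b = MeanModel(), MeanModel()
a.update("temp", 10.0)
b.update("temp", 20.0)
b.update("temp", 30.0)
mix([a, b])
```

`mix` never looks inside a diff, so the same driver would serve any algorithm that provides these five methods; only `merge_diff` encodes algorithm-specific knowledge.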
  • 27. UPDATE
      - Each data sample is sent to one (or two) server(s)
      - Local models are updated based on the sample
      - Data samples are NEVER shared
      [Figure: samples are distributed randomly or consistently; each server updates its local model (1, 2) starting from the initial model]
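The "distributed randomly or consistently" step can be illustrated with a consistent-hash ring. A sample's key always maps to the same one or two servers, and removing a server only remaps the keys that server owned. This is a generic sketch: the `HashRing` class, the MD5-based ring positions, and the 100 virtual nodes per server are assumptions for illustration, not Jubatus's exact routing scheme. Random routing, the other option on the slide, would simply be `random.choice(servers)`.

```python
import bisect
import hashlib
from typing import List


def _pos(label: str) -> int:
    """Position on the ring: the first 8 bytes of MD5, as an unsigned integer."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, servers: List[str], vnodes: int = 100) -> None:
        # Each server appears at several ring positions to spread the load evenly.
        self._ring = sorted(
            (_pos(f"{server}#{i}"), server) for server in servers for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]
        self._servers = {s for _, s in self._ring}

    def route(self, key: str, replicas: int = 1) -> List[str]:
        """Walk clockwise from the key's position and collect distinct servers."""
        if replicas < 1:
            raise ValueError("replicas must be at least 1")
        wanted = min(replicas, len(self._servers))
        start = bisect.bisect(self._points, _pos(key))
        chosen: List[str] = []
        for offset in range(len(self._ring)):
            server = self._ring[(start + offset) % len(self._ring)][1]
            if server not in chosen:
                chosen.append(server)
                if len(chosen) == wanted:
                    break
        return chosen


ring = HashRing(["s1", "s2", "s3"])
targets = ring.route("user-42", replicas=2)  # the same two servers every time
```

Routing by key keeps all samples of one entity on the same server(s), which matters for models whose state is naturally keyed (for example per-user statistics).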
  • 28. MIX
      - Each server sends its model diff (difference)
      - Model diffs are merged and distributed
      - Only model diffs are transmitted
      [Figure: for each server i, model diff i = local model i - initial model; merged diff = model diff 1 + model diff 2; mixed model = initial model + merged diff]
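The diff arithmetic in the figure can be written out directly for linear weight vectors. In this sketch each server's diff is its local weights minus the initial weights, the merged diff is their element-wise sum (following the figure's "+"), and every server adopts initial + merged diff. Summing versus averaging the diffs is a design choice; averaging (dividing the merged diff by the number of servers) is the other common option, so read this as an illustration rather than Jubatus's exact mixing rule. All names and values are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, float]  # sparse linear model: feature name -> weight


def model_diff(local: Weights, initial: Weights) -> Weights:
    """Model diff = local model - initial model, over the union of features."""
    keys = set(local) | set(initial)
    return {k: local.get(k, 0.0) - initial.get(k, 0.0) for k in keys}


def merge(diffs: List[Weights]) -> Weights:
    """Merged diff = element-wise sum of every server's diff."""
    merged: Weights = {}
    for d in diffs:
        for k, v in d.items():
            merged[k] = merged.get(k, 0.0) + v
    return merged


def apply_diff(initial: Weights, merged: Weights) -> Weights:
    """Mixed model = initial model + merged diff."""
    keys = set(initial) | set(merged)
    return {k: initial.get(k, 0.0) + merged.get(k, 0.0) for k in keys}


# Two servers start from the same initial model and learn from different samples.
initial = {"good": 0.5}
local_1 = {"good": 1.5}  # server 1 saw positive examples
local_2 = {"good": 0.5, "bad": -1.0}  # server 2 saw a negative example
# Only these diffs travel over the network, never the samples themselves.
merged = merge([model_diff(local_1, initial), model_diff(local_2, initial)])
mixed = apply_diff(initial, merged)  # every server continues UPDATE from this model
```

Here the merged diff is `{"good": 1.0, "bad": -1.0}` and the mixed model is `{"good": 1.5, "bad": -1.0}`, so each server's learning is reflected in the model all servers share.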
  • 29. UPDATE (iteration)
      - Each server resumes updating from the mixed model
      - The mixed model improves gradually, thanks to all of the servers
      [Figure: samples are distributed randomly or consistently; each server updates its local model (1, 2) starting from the mixed model]
  • 30. ANALYZE
      - For analysis, each sample goes to a randomly chosen server
      - The server applies its current mixed model to the sample
        - It uses only the model on the local server and does not communicate with other servers
      - The results are returned to the client
      [Figure: samples are distributed randomly; each server applies its mixed model and returns a prediction]
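ANALYZE needs no coordination because, right after a MIX, every server holds the same mixed model. The sketch below picks a server at random and scores the sample using only that server's in-memory weights. The `Server` class, the linear scoring rule, and the example weights are illustrative assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List

Weights = Dict[str, float]  # sparse linear model: feature name -> weight


@dataclass
class Server:
    name: str
    model: Weights = field(default_factory=dict)  # this server's copy of the mixed model

    def analyze(self, x: Weights) -> int:
        """Apply the local model only; no call is made to any other server."""
        s = sum(self.model.get(k, 0.0) * v for k, v in x.items())
        return 1 if s > 0 else -1


def analyze(servers: List[Server], x: Weights, rng: random.Random) -> int:
    """Route the sample to a random server and return its prediction to the client."""
    return rng.choice(servers).analyze(x)


mixed = {"good": 1.5, "bad": -1.0}
servers = [Server("s1", dict(mixed)), Server("s2", dict(mixed))]
rng = random.Random(0)
label = analyze(servers, {"good": 1.0}, rng)  # same answer whichever server is picked
```

Between two MIX rounds the servers' copies can drift apart through local UPDATEs, so predictions from different servers may briefly disagree; the sketch starts right after a MIX, when the copies are identical.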
  • 31. Why can Jubatus work in real time?
      1. Focus on online machine learning
         - Make online machine learning algorithms distributed
      2. Update locally
         - Online training without communicating with other servers
      3. Mix only models
         - Small communication cost, low latency, good performance
         - An advantage over the costly Shuffle in MapReduce
      4. Analyze locally
         - Each server has the mixed model and does not need to communicate
         - Low latency for making predictions
      5. Everything in-memory
         - Process data on the fly
  • 32. Summary
      - Jubatus is the first OSS platform for online distributed machine learning on BigData streams
      - Download it from http://github.com/jubatus/
      - We welcome your contributions and collaboration
      [Figure: 1. Bigger data, 2. More in real time, 3. Deeper analysis]
  • 33. Copyright © 2006-2012 Preferred Infrastructure. All Rights Reserved.