Hive at Last.fm

Omar Ali - Data Developer
March 2012

Overview

•  Hadoop at Last.fm
•  Hive
•  Examples

What I want to show you:
•  How it fits with a Hadoop infrastructure
•  Typical workflow with Hive
•  Ease of use for experiments and prototypes

Hadoop
•  Brief overview of our infrastructure
•  How we use it

Hadoop
64 node cluster

Charts

Hive
•  What is Hive?
•  How does it fit in with the rest of our system?
•  Using existing data in Hive
•  Example query

What is Hive?

•  Data Warehouse
•  You see your data in the form of tables
•  Query language very similar to SQL

    hive> show tables like 'omar_charts_*';
    OK
    omar_charts_globaltags_album
    omar_charts_globaltags_artist
    omar_charts_globaltags_track
    omar_charts_tagcloud_album
    omar_charts_tagcloud_artist
    omar_charts_tagcloud_track

    hive> describe omar_charts_tagcloud_album;
    OK
    albumid    int
    tagid      int
    weight     double
  
What is a table?

              Standard                                  External

•  Metadata stored by Hive               •  Metadata stored by Hive

•  Table data stored by Hive             •  Table data referenced by Hive

•  Deleting the table deletes the data   •  Deleting the table only deletes the
   and the metadata                         metadata

            Database Tables                          Log Files
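
As a rough illustration of the two kinds of table, the HiveQL below creates one of each and notes what DROP TABLE does in each case. The table names, the columns, the tab delimiter and the /data/example_logs path are all hypothetical and not from the talk. Hive's own documentation calls standard tables "managed" tables.

```sql
-- Standard (managed) table: Hive stores the data under its warehouse directory.
CREATE TABLE example_managed (
  userid INT,
  plays  INT
);

-- External table: Hive records only the metadata; the files stay where they are.
-- The path and the tab delimiter are made-up examples.
CREATE EXTERNAL TABLE example_external (
  userid INT,
  plays  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/example_logs';

-- Removes the metadata and the table's data files.
DROP TABLE example_managed;

-- Removes only the metadata; the files under /data/example_logs are left in place.
DROP TABLE example_external;
```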
  
Example: scrobbles

Scrobble Log:

    13364451  30886670  217803052  358001787  0  0  0  1  0  0  1319068581
    42875138  1717      3776668    4641276    0  0  0  1  0  0  1319068445
    43108664  1003811   2237730    1019632    0  0  0  1  0  0  1319068783
    36107186  1033304   2393940    13409429   0  0  0  0  0  1  1319068524
    23842745  1261965   2349564    14091069   0  0  0  0  0  1  1319068594

Directory Structure:

    /data/submissions/2002/01/01
    ...
    /data/submissions/2012/03/20
    /data/submissions/2012/03/21
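
One way a date-based directory layout like this could be exposed to Hive is as an external table partitioned by date. This is only a sketch, not Last.fm's actual schema: the table name example_submissions is hypothetical, the column names are guesses based on the columns used by the queries in this talk, the order and meaning of the eleven log fields is assumed, and the flag_* names are placeholders. The tab delimiter is also an assumption.

```sql
-- Hypothetical schema: column names and their order are assumptions.
CREATE EXTERNAL TABLE example_submissions (
  userid   INT,
  artistid INT,
  albumid  INT,
  trackid  INT,
  flag_a   INT,
  flag_b   INT,
  flag_c   INT,
  scrobble INT,
  flag_d   INT,
  listen   INT,
  unixtime BIGINT
)
PARTITIONED BY (insertdate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/submissions';

-- Each day's directory is registered as one partition of the table.
ALTER TABLE example_submissions ADD PARTITION (insertdate = '2012-03-20')
  LOCATION '/data/submissions/2012/03/20';
ALTER TABLE example_submissions ADD PARTITION (insertdate = '2012-03-21')
  LOCATION '/data/submissions/2012/03/21';
```

With a layout like this, a filter such as insertdate = '2012-03-20' lets Hive read only that day's directory instead of the whole history.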
  
A Hive Query

    select
        track.title, size(collect_set(s.userid)) as reach
    from
        meta_track track
        join data_submissions s on (s.trackid = track.id)
    where
        s.insertdate = "2012-03-01" and (s.scrobble + s.listen > 0)
        and s.artistid = 57976724 -- Lana Del Rey
    group by
        track.title
    order by
        reach desc
    limit 5;

    Total MapReduce jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks not specified. Estimated from input data size: 52
    2012-03-19 23:28:58,613 Stage-1 map = 0%,  reduce = 0%
    2012-03-19 23:29:08,765 Stage-1 map = 3%,  reduce = 0%
    2012-03-19 23:29:10,794 Stage-1 map = 9%,  reduce = 0%

    Born to Die         10765
    Video Games          9382
    Off to the Races     6569
    Blue Jeans           6266
    National Anthem      5795

    ~300 seconds
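
Reach in this query is the number of distinct users per track: size(collect_set(...)) builds the set of user ids and takes its size. The same figure should also be obtainable with count(distinct ...). The variant below is an alternative sketch, not the form used in the talk; the table and column names are those of the query above.

```sql
-- Alternative sketch: count distinct users directly instead of
-- collecting them into a set and taking its size.
select
    track.title, count(distinct s.userid) as reach
from
    meta_track track
    join data_submissions s on (s.trackid = track.id)
where
    s.insertdate = '2012-03-01' and (s.scrobble + s.listen > 0)
    and s.artistid = 57976724
group by
    track.title
order by
    reach desc
limit 5;
```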
  
Examples
•  Trends in UK Listening
•  Hadoop User Group Charts

Trends in UK Listening

    select
      artistid, hourOfDay,
      meanPlays, stdPlays, meanReach, stdReach, hoursInExistence,
      meanPlays / sqrt(hoursInExistence) as stdErrPlays,
      meanReach / sqrt(hoursInExistence) as stdErrReach
    from
      (select
        artistCounts.artistid as artistid, artistCounts.hourOfDay,
        avg(artistCounts.plays) as meanPlays, stddev_samp(artistCounts.plays) as stdPlays,
        avg(artistCounts.reach) as meanReach, stddev_samp(artistCounts.reach) as stdReach,
        size(collect_set(concat(artistCounts.insertdate, hourOfDay))) as hoursInExistence
      from
        (select
          artistid, insertdate, hour(from_unixtime(unixtime)) as hourOfDay,
          count(*) as plays, size(collect_set(s.userid)) as reach
        from
          lookups_userid_geo g
          join data_submissions s on (g.userid = s.userid)
        where
          insertdate >= '2011-01-01' and insertdate < '2012-01-01'
          and (listen + scrobble) > 0
          and lower(g.countrycode) = 'gb'
        group by
          artistid, insertdate, hour(from_unixtime(unixtime))
        ) artistCounts
      group by
        artistCounts.artistid, artistCounts.hourOfDay
      ) artistStats
    where
      meanReach > 25;
  
So far

•  Test data: listening statistics for each artist, in each hour of the day
•  Base data: averaged hourly statistics for each artist

•  Next step: compare them
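
The comparison reads its test and base data from two tables, omar_uk_artist_hours and omar_uk_artist_base. One way to keep intermediate results like these around is CREATE TABLE ... AS SELECT. The sketch below is an assumption about how such a table could be materialised, not the procedure used in the talk: example_artist_hours is a hypothetical name, and the select is cut down to a single statistic without the UK filter.

```sql
-- Hypothetical: store per-artist, per-hour mean plays as a new table.
create table example_artist_hours as
select
  artistCounts.artistid as artistid,
  artistCounts.hourOfDay as hourOfDay,
  avg(artistCounts.plays) as meanPlays
from
  (select
    artistid, insertdate, hour(from_unixtime(unixtime)) as hourOfDay,
    count(*) as plays
  from
    data_submissions
  where
    insertdate >= '2011-01-01' and insertdate < '2012-01-01'
    and (listen + scrobble) > 0
  group by
    artistid, insertdate, hour(from_unixtime(unixtime))
  ) artistCounts
group by
  artistCounts.artistid, artistCounts.hourOfDay;
```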

Comparison

    select
      test.artistid,
      test.meanReach, base.meanReach,
      test.stdReach, base.stdReach,
      test.stdErrReach, base.stdErrReach,
      (test.meanReach - base.meanReach) / (base.stdReach) as zScore,
      (test.meanReach - base.meanReach) / (base.stdErrReach * test.stdErrReach) as deviation
    from
      omar_uk_artist_base base
      join omar_uk_artist_hours test on (base.artistid = test.artistid)
    where
      test.hourOfDay = 15
    order by
      deviation desc
    limit 5;
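
Written out, the two scores computed by this query are as follows, where $\bar{r}$ stands for meanReach, $s$ for stdReach and $e$ for stdErrReach, each taken from the test or base table. The symbols are introduced here only for illustration and do not appear in the talk.

```latex
z = \frac{\bar{r}_{\text{test}} - \bar{r}_{\text{base}}}{s_{\text{base}}},
\qquad
\text{deviation} = \frac{\bar{r}_{\text{test}} - \bar{r}_{\text{base}}}
                        {e_{\text{base}} \, e_{\text{test}}}
```

In the Trends in UK Listening query, stdErrReach is defined as meanReach / sqrt(hoursInExistence). The textbook standard error of a mean would instead divide the sample standard deviation (stdReach) by the square root of the number of observations, so the deviation score here is best read as the talk's own measure rather than a standard test statistic.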
  
Trends in UK Listening

Summary

•  Hive is easy to use
•  It sits comfortably on top of a Hadoop infrastructure
•  Familiar if you know SQL
•  Can ask big questions
•  Can ask wide-ranging questions
•  Allows analyses that would otherwise need a lot of preliminary work

HUG Charts

Any Questions?

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Dernier (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Hive at Last.fm

  • 1. Hive at Last.fm. Omar Ali, Data Developer. March 2012.
  • 2. Overview
      • Hadoop at Last.fm
      • Hive
      • Examples
    What I want to show you:
      • How Hive fits with a Hadoop infrastructure
      • Typical workflow with Hive
      • Ease of use for experiments and prototypes
  • 3. Hadoop
      • Brief overview of our infrastructure
      • How we use it
  • 5. Charts
  • 6.
  • 7. Hive
      • What is Hive?
      • How does it fit in with the rest of our system?
      • Using existing data in Hive
      • Example query
  • 8. What is Hive?
      • Data warehouse
      • You see your data in the form of tables
      • Query language very similar to SQL
    hive> show tables like 'omar_charts_*';
    OK
    omar_charts_globaltags_album
    omar_charts_globaltags_artist
    omar_charts_globaltags_track
    omar_charts_tagcloud_album
    omar_charts_tagcloud_artist
    omar_charts_tagcloud_track
    hive> describe omar_charts_tagcloud_album;
    OK
    albumid  int
    tagid    int
    weight   double
  • 9. What is a table?
    Standard:
      • Metadata stored by Hive
      • Table data stored by Hive
      • Deleting the table deletes the data and the metadata
    External:
      • Metadata stored by Hive
      • Table data referenced by Hive
      • Deleting the table only deletes the metadata
  • 10. What is a table?
    Standard: database tables
    External: log files
  • 11. Example: scrobbles
    Scrobble log:
      13364451  30886670  217803052  358001787  0  0  0  1  0  0  1319068581
      42875138  1717  3776668  4641276  0  0  0  1  0  0  1319068445
      43108664  1003811  2237730  1019632  0  0  0  1  0  0  1319068783
      36107186  1033304  2393940  13409429  0  0  0  0  0  1  1319068524
      23842745  1261965  2349564  14091069  0  0  0  0  0  1  1319068594
    Directory structure:
      /data/submissions/2002/01/01
      ...
      /data/submissions/2012/03/20
      /data/submissions/2012/03/21
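The directory listing on this slide suggests one directory per day under /data/submissions. The following is a minimal Python sketch of that layout, not Last.fm's actual loader. It assumes (the slide does not say so) that the last field of each log line is a Unix timestamp in seconds and that days are cut in UTC. The helper names `partition_path` and `parse_scrobble_line` are hypothetical.

```python
from datetime import date, datetime, timezone

# Root directory as shown on the slide.
SUBMISSIONS_ROOT = "/data/submissions"


def partition_path(day: date) -> str:
    """Return the per-day directory for a date, e.g. .../2012/03/21."""
    return f"{SUBMISSIONS_ROOT}/{day:%Y/%m/%d}"


def parse_scrobble_line(line: str) -> list[int]:
    """Split one whitespace-separated log line into integer fields.

    The slide does not name the columns, so they are left unnamed here.
    """
    return [int(field) for field in line.split()]


# First log line from the slide.
fields = parse_scrobble_line(
    "13364451 30886670 217803052 358001787 0 0 0 1 0 0 1319068581"
)

# Assumption: the last field is a Unix timestamp in seconds (UTC).
when = datetime.fromtimestamp(fields[-1], tz=timezone.utc)
print(partition_path(when.date()))
```

Keeping one directory per day is what lets a filter such as `insertdate = "2012-03-01"` read a single day of logs instead of the whole history, provided the table is partitioned on that column.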
  • 12. A Hive Query
    select
        track.title, size(collect_set(s.userid)) as reach
    from
        meta_track track
        join data_submissions s on (s.trackid = track.id)
    where
        s.insertdate = "2012-03-01" and (s.scrobble + s.listen > 0)
        and s.artistid = 57976724 -- Lana Del Rey
    group by
        track.title
    order by
        reach desc
    limit 5;
  • 13. A Hive Query
    Total MapReduce jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks not specified. Estimated from input data size: 52
    2012-03-19 23:28:58,613 Stage-1 map = 0%,  reduce = 0%
    2012-03-19 23:29:08,765 Stage-1 map = 3%,  reduce = 0%
    2012-03-19 23:29:10,794 Stage-1 map = 9%,  reduce = 0%
  • 14. A Hive Query
    Born to Die       10765
    Video Games        9382
    Off to the Races   6569
    Blue Jeans         6266
    National Anthem    5795
    ~300 seconds
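In the query, "reach" is `size(collect_set(s.userid))` grouped by track title, i.e. the number of distinct users who played each track. The sketch below reproduces that logic in plain Python so it can be tried without a cluster. The function name `top_tracks_by_reach` and the sample rows are made up for illustration; only the track titles come from the slide.

```python
from collections import defaultdict


def top_tracks_by_reach(plays, limit=5):
    """Rank tracks by the number of distinct users who played them.

    plays: iterable of (track_title, userid) pairs, one per play.
    """
    listeners = defaultdict(set)
    for title, userid in plays:
        # A set keeps each user once per track, like collect_set in the query.
        listeners[title].add(userid)
    reach = {title: len(users) for title, users in listeners.items()}
    # Equivalent of "order by reach desc limit 5".
    return sorted(reach.items(), key=lambda item: item[1], reverse=True)[:limit]


# Made-up sample: user 1 plays "Born to Die" twice but counts once.
sample_plays = [
    ("Born to Die", 1),
    ("Born to Die", 2),
    ("Born to Die", 1),
    ("Video Games", 3),
]
ranking = top_tracks_by_reach(sample_plays)
print(ranking)
```

Repeat plays by the same user do not raise a track's reach, which is why reach and play count are tracked separately in the later queries.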
  • 15. Examples
      • Trends in UK Listening
      • Hadoop User Group Charts
  • 16. Trends in UK Listening
  • 17. Trends in UK Listening
  • 18. Trends in UK Listening
  • 19.
    select
      artistid, hourOfDay,
      meanPlays, stdPlays, meanReach, stdReach, hoursInExistence,
      meanPlays / sqrt(hoursInExistence) as stdErrPlays,
      meanReach / sqrt(hoursInExistence) as stdErrReach
    from
      (select
        artistCounts.artistid as artistid, artistCounts.hourOfDay,
        avg(artistCounts.plays) as meanPlays, stddev_samp(artistCounts.plays) as stdPlays,
        avg(artistCounts.reach) as meanReach, stddev_samp(artistCounts.reach) as stdReach,
        size(collect_set(concat(artistCounts.insertdate, hourOfDay))) as hoursInExistence
      from
        (select
          artistid, insertdate, hour(from_unixtime(unixtime)) as hourOfDay,
          count(*) as plays, size(collect_set(s.userid)) as reach
        from
          lookups_userid_geo g
          join data_submissions s on (g.userid = s.userid)
        where
          insertdate >= '2011-01-01' and insertdate < '2012-01-01'
          and (listen + scrobble) > 0
          and lower(g.countrycode) = 'gb'
        group by
          artistid, insertdate, hour(from_unixtime(unixtime))
        ) artistCounts
      group by
        artistCounts.artistid, artistCounts.hourOfDay
      ) artistStats
    where
      meanReach > 25;
  • 22. So far
      • Test data: listening statistics for each artist, in each hour of the day
      • Base data: averaged hourly statistics for each artist
      • Next step: compare them
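The per-artist, per-hour statistics can be sketched in plain Python for one group of counts. This is an illustration under assumptions, not the author's code: the function name `hourly_stats` and the sample counts are made up. Note that the query on the slides divides the mean by sqrt(hoursInExistence) and calls the result stdErr, whereas the conventional standard error of the mean divides the sample standard deviation by sqrt(n). The sketch computes both so the difference is visible.

```python
import math
import statistics


def hourly_stats(counts):
    """Summarise one artist's counts for one hour of the day.

    counts: one value per day on which the artist was played in that hour,
    mirroring the rows grouped by (artistid, hourOfDay) in the query.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two values")
    mean = statistics.mean(counts)
    std = statistics.stdev(counts)  # sample standard deviation, like stddev_samp
    return {
        "mean": mean,
        "std": std,
        "n": n,
        # As written in the slide query: mean / sqrt(n).
        "slide_std_err": mean / math.sqrt(n),
        # Conventional standard error of the mean: std / sqrt(n).
        "textbook_std_err": std / math.sqrt(n),
    }


# Made-up reach counts for four days.
stats = hourly_stats([30, 40, 50, 40])
print(stats)
```

For this sample the two quantities differ by a factor of about five, so any threshold tuned against one of them would not carry over to the other.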
  • 23. Comparison
    select
      test.artistid,
      test.meanReach, base.meanReach,
      test.stdReach, base.stdReach,
      test.stdErrReach, base.stdErrReach,
      (test.meanReach - base.meanReach) / (base.stdReach) as zScore,
      (test.meanReach - base.meanReach) / (base.stdErrReach * test.stdErrReach) as deviation
    from
      omar_uk_artist_base base
      join omar_uk_artist_hours test on (base.artistid = test.artistid)
    where
      test.hourOfDay = 15
    order by
      deviation desc
    limit 5;
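The two scores in the comparison query can be checked by hand with a small Python sketch. It follows the formulas exactly as they appear in the query: zScore divides the difference in mean reach by the base standard deviation, and deviation divides it by the product of the two stdErr columns. The function names, dictionary keys, artist ids and numbers below are all made up for illustration.

```python
def compare(test, base):
    """Score one artist's test-hour reach against their baseline.

    test, base: dicts with keys mean_reach, std_reach and std_err_reach.
    """
    diff = test["mean_reach"] - base["mean_reach"]
    return {
        "z_score": diff / base["std_reach"],
        "deviation": diff / (base["std_err_reach"] * test["std_err_reach"]),
    }


def top_by_deviation(pairs, limit=5):
    """Equivalent of "order by deviation desc limit 5".

    pairs: dict mapping artistid -> (test_row, base_row).
    """
    scored = [(artistid, compare(t, b)) for artistid, (t, b) in pairs.items()]
    scored.sort(key=lambda item: item[1]["deviation"], reverse=True)
    return scored[:limit]


# Made-up rows for two hypothetical artists.
pairs = {
    101: (
        {"mean_reach": 60.0, "std_reach": 12.0, "std_err_reach": 4.0},
        {"mean_reach": 40.0, "std_reach": 10.0, "std_err_reach": 2.0},
    ),
    102: (
        {"mean_reach": 45.0, "std_reach": 9.0, "std_err_reach": 3.0},
        {"mean_reach": 40.0, "std_reach": 10.0, "std_err_reach": 2.0},
    ),
}
ranked = top_by_deviation(pairs)
print(ranked)
```

Artists whose reach in the chosen hour sits far above their own baseline rise to the top, regardless of how popular they are overall.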
  • 24. Trends in UK Listening
  • 25. Summary
      • Hive is easy to use
      • It sits comfortably on top of a Hadoop infrastructure
      • Familiar if you know SQL
      • Can ask big questions
      • Can ask wide-ranging questions
      • Allows analyses that would otherwise need a lot of preliminary work