[@IndeedEng] Imhotep Workshop

Link to video: http://youtu.be/LBDZFtqL-ck?list=UURVEh0SlyrZNTeIbEDwj3wQ

We are excited to announce the open source availability of Imhotep, the interactive data analytics platform that powers data-driven decision making at Indeed.

In a previous talk, we explained how we developed Imhotep, a distributed system for building decision trees for machine learning. We went on to describe how we build large-scale interactive analytics tools using the same platform. Next, we showed how our engineering and product organizations use Imhotep to focus on key metrics at scale. During this session, Product Manager Tom Bergman provided examples of valuable insights that can be gained by using Imhotep. After the presentation, attendees explored their own data in Imhotep. Product engineers were on hand to answer questions.

[@IndeedEng] Imhotep Workshop

  1. Imhotep Workshop http://go.indeed.com/iws
  2. engineering.indeed.com/talks
  3. @IndeedEng Workshop: Interactive Analytics with Imhotep
  4. Tom Bergman, Product Manager
  5. Is anybody there???
  6. Does this thing work???
  7. Are we winning???
  8. Harder questions???
  9. What is Imhotep? Imhotep is Indeed’s highly scalable open-source analytics platform.
  10. Imhotep open source includes: ● Imhotep Daemons ● Imhotep Query Language (IQL) ● IQL Web Client ● TSV/CSV Uploader
  11. What does Imhotep do? ● Easy upload & compression ● Fast, interactive queries
  12. Imhotep Philosophy
  13. Interactive ● Quickly refine your questions
  14. Time to the right question: SOME TIME LATER… Oh, bummer. Wrong question. Let’s try again. Nope. Nope. YES! Next question? (Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.)
  15. Ask all the questions! Cool! Really? Wow... Awesome Oh… Ah! INSIGHT! …
  16. Ground Truth ● Data should not be down-sampled
  17. Show me the data ● Web-based to facilitate sharing
  18. Cache Rules Everything Around Me ● Instantaneous sharing
  19. Easy Access ● Easily queryable
  20. Imhotep Data Structures ● Dataset > DB Table ● Document > Denormalized Row ● Field > Column ● Term > Value
  21. Imhotep Query Language (IQL)
  22. IQL - Imhotep Query Language: Expressive SQL-like language for aggregate analytics.
  23. IQL queries - requirements: Dataset, Date range
  24. IQL queries - optional: Dataset, Date range, Filters, Group by, Metrics
  25. IQL - Dataset: from searchresults ‘2013-12-05’ ‘2013-12-10’ where country=ie and jobagedays<1 group by time(1d) select clicked, count() (callout: Dataset)
  26. IQL - Date Range: from searchresults ‘2013-12-05’ ‘2013-12-10’ where country=ie and jobagedays<1 group by time(1d) select clicked, count() (callout: Date Range)
  27. IQL - Filters: from searchresults ‘2013-12-05’ ‘2013-12-10’ where country=ie and jobagedays<1 group by time(1d) select clicked, count() (callout: Filters)
  28. IQL - Group by: from searchresults ‘2013-12-05’ ‘2013-12-10’ where country=ie and jobagedays<1 group by time(1d) select clicked, count() (callout: Groups)
  29. IQL - Metrics: from searchresults ‘2013-12-05’ ‘2013-12-10’ where country=ie and jobagedays<1 group by time(1d) select clicked, count() (callout: Metrics)
  30. IQL - Example Results
  31. Extract Transform Load (ETL)
  32. Extract ● Up to your data schema
  33. Transform ● Denormalize data ● Formatting
  34. Example Datasets ● Jobsearch ● Ad Clicks ● Resume Contacts
  35. Load ● TSV/CSV Uploader
  36. Load ● TSV/CSV Uploader ● Java API
  37. Inverted Index ● Massive compression ● Fast Boolean search
  38. Open-source Package ● Run on any* computer(s) ● Can deploy to AWS using handy AWS CloudFormation script
  39. CloudFormation Setup ● Create S3 buckets ● Create EC2 Key Pair ● Run CloudFormation script
  40. DEMO
  41. Q&A
  42. Helpful Workshop Links http://go.indeed.com/iws
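
As a rough illustration of slides 20, 25-29 and 37, the Python sketch below builds a tiny in-memory inverted index (field > term > document ids) and uses it to answer something shaped like the example query: filter on country=ie and jobagedays<1, group by day, and report the clicked total and the document count per group. This is not Imhotep's implementation or API. The sample documents, the `day` field standing in for `time(1d)`, the function names, and the reading of `select clicked` as a per-group sum are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy documents (denormalized rows); not real Indeed data.
DOCS = [
    {"country": "ie", "jobagedays": 0, "day": "2013-12-05", "clicked": 1},
    {"country": "ie", "jobagedays": 3, "day": "2013-12-05", "clicked": 0},
    {"country": "us", "jobagedays": 0, "day": "2013-12-05", "clicked": 1},
    {"country": "ie", "jobagedays": 0, "day": "2013-12-06", "clicked": 0},
    {"country": "ie", "jobagedays": 0, "day": "2013-12-06", "clicked": 1},
]


def build_index(docs):
    """Map field -> term -> set of document ids (a toy inverted index)."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in enumerate(docs):
        for field, term in doc.items():
            index[field][term].add(doc_id)
    return index


def run_query(docs, index):
    """Toy analogue of: where country=ie and jobagedays<1
    group by time(1d) select clicked, count()."""
    # Filter country=ie: a single posting-list lookup.
    in_ie = index["country"]["ie"]
    # Filter jobagedays<1: union of posting lists for every term below 1.
    fresh = set()
    for term, ids in index["jobagedays"].items():
        if term < 1:
            fresh |= ids
    # Boolean AND of the two filters is a set intersection.
    matching = in_ie & fresh
    # Group by day, then compute (sum of clicked, count) per group.
    results = {}
    for day, ids in sorted(index["day"].items()):
        group = ids & matching
        if group:
            results[day] = (sum(docs[i]["clicked"] for i in group), len(group))
    return results


INDEX = build_index(DOCS)
RESULTS = run_query(DOCS, INDEX)
print(RESULTS)
```

The point of the sketch is that filters never scan documents: each filter resolves to sets of document ids, and combining filters is plain set arithmetic, which is one plausible reading of the "Fast Boolean search" bullet on slide 37.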
