
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop


  1. Extending the Enterprise Data Warehouse with Hadoop. Robert Lancaster and Jonathan Seidman. Chicago Data Summit, April 26, 2011.
  2. Who We Are
     • Robert Lancaster: Solutions Architect, Hotel Supply Team; [email_address]; @rob1lancaster
     • Jonathan Seidman: Lead Engineer, Business Intelligence/Big Data Team; co-founder/organizer of the Chicago Hadoop User Group (http://www.meetup.com/Chicago-area-Hadoop-User-Group-CHUG); [email_address]; @jseidman
  3. Launched: 2001, Chicago, IL.
  4. Why are we using Hadoop? Stop me if you’ve heard this before…
  5. On Orbitz alone we do millions of searches and transactions daily, which leads to hundreds of gigabytes of log data every day.
  6. Hadoop provides us with efficient, economical, scalable, and reliable storage and processing of these large amounts of data ($ per TB).
  7. And… Hadoop places no constraints on how data is processed.
  8. Before Hadoop
  9. With Hadoop
  10. Access to this non-transactional data enables a number of applications…
  11. Optimizing Hotel Search
  12. Recommendations
  13. Page Performance Tracking
  14. Cache Analysis: a small number of queries (3%) make up more than a third of search volume (see the sketch below).
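To make the cache-analysis figure concrete, here is a minimal sketch of the underlying aggregation, written as a local Python script rather than the MapReduce job that would run at production scale. The input format (one normalized search query per line) and the script itself are illustrative assumptions, not the actual Orbitz tooling.

    # cache_share.py: count query frequencies and report what share of
    # total search volume the most frequent 3% of distinct queries cover.
    # Hypothetical input: one normalized search query per line on stdin.
    import sys
    from collections import Counter

    counts = Counter(line.strip() for line in sys.stdin if line.strip())
    total = sum(counts.values())

    if total:
        top_n = max(1, int(len(counts) * 0.03))   # top 3% of distinct queries
        top_volume = sum(n for _, n in counts.most_common(top_n))
        print("distinct queries: %d, total searches: %d" % (len(counts), total))
        print("top 3%% of queries cover %.1f%% of volume"
              % (100.0 * top_volume / total))

Run it as: python cache_share.py < queries.txt. At log volumes of hundreds of gigabytes a day the same count-and-rank logic would be expressed as a MapReduce job, but the computation is unchanged.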
  15. User Segmentation
  16. All of this is great, but… Most of these efforts are driven by development teams. The challenge now is to unlock the value in this data by making it more available to the rest of the organization.
  17. “Given the ubiquity of data in modern organizations, a data warehouse can keep pace today only by being ‘magnetic’: attracting all the data sources that crop up within an organization regardless of data quality niceties.”* (*MAD Skills: New Analysis Practices for Big Data)
  18. In a better world…
  19. Integrating Hadoop with the Enterprise Data Warehouse. Robert Lancaster and Jonathan Seidman. Chicago Data Summit, April 26, 2011.
  20. The goal is a unified view of the data, allowing us to use the power of our existing tools for reporting and analysis.
  21. BI vendors are working on integration with Hadoop…
  22. And one more reporting tool…
  23. Example Processing Pipeline for Web Analytics Data
  24. Aggregating data for import into the Data Warehouse (see the sketch below).
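As a hedged illustration of this aggregation step, the sketch below is a Hadoop Streaming job (mapper and reducer in one Python file) that rolls raw web analytics records up into daily per-page counts, emitting tab-delimited rows a warehouse loader could then import. The input layout (tab-delimited, timestamp first, page second) is an assumption for illustration.

    # aggregate.py: Streaming mapper/reducer producing (date, page, count)
    # rows for DW import. Field positions are hypothetical.
    import sys

    def mapper():
        for line in sys.stdin:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue                      # skip malformed records
            date, page = fields[0][:10], fields[1]
            print("%s\t%s\t1" % (date, page))

    def reducer():
        # Assumes input sorted on the composite (date, page) key, e.g. with
        # stream.num.map.output.key.fields=2 set on the streaming job.
        key, count = None, 0
        for line in sys.stdin:
            date, page, n = line.rstrip("\n").split("\t")
            if (date, page) != key and key is not None:
                print("%s\t%s\t%d" % (key[0], key[1], count))
                count = 0
            key, count = (date, page), count + int(n)
        if key is not None:
            print("%s\t%s\t%d" % (key[0], key[1], count))

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()

The pipeline can be tested locally with: cat records.txt | python aggregate.py map | sort | python aggregate.py. The aggregated output lands in HDFS as plain delimited text, which tools such as Sqoop can then export into the warehouse.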
  25. Example Use Case: Beta Data Processing
  26. Example Use Case: Beta Data Processing
  27. Example Use Case: Beta Data Processing Output
  28. Example Use Case: RCDC Processing
  29. Example Use Case: RCDC Processing
  30. Example Use Case: Click Data Processing
  31. Click Data Processing, current DW processing (pipeline diagram): web servers → web server logs → ETL (3 hours) → DW → data cleansing via stored procedure (2 hours) → DW; the cleansed data is ~20% of the original size.
  32. Click Data Processing, new Hadoop processing (pipeline diagram): web servers → web server logs → HDFS → data cleansing via MapReduce → DW. A sketch of such a cleansing step follows below.
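As a sketch of what that MapReduce cleansing step could look like, here is a map-only Hadoop Streaming script in Python that drops malformed lines, bot traffic, and failed requests, keeping only the fields the warehouse needs; filtering of this kind is how raw click logs can shrink to a fraction of their original size before the DW load. The combined log format and the bot heuristics are illustrative assumptions, not the actual Orbitz pipeline.

    # cleanse.py: map-only cleansing pass over web server logs.
    # Assumes Apache combined log format; patterns are illustrative.
    import re
    import sys

    BOT = re.compile(r"bot|crawler|spider", re.IGNORECASE)
    LOG = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] '
                     r'"(\S+) (\S+)[^"]*" (\d{3}) \S+ '
                     r'"[^"]*" "([^"]*)"')

    for line in sys.stdin:
        m = LOG.match(line)
        if not m:
            continue                              # malformed record
        ip, ts, method, path, status, agent = m.groups()
        if BOT.search(agent) or method != "GET" or not status.startswith("2"):
            continue                              # bots, non-GETs, errors
        print("\t".join((ip, ts, path)))          # only the fields the DW needs

Because the job is map-only (no reducer), it parallelizes trivially across the log files in HDFS, and its output can feed the same DW load path as before.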
  33. Conclusions
     • The market is still immature, but Hadoop has already become a valuable business intelligence tool, and it will become an increasingly important part of BI infrastructures.
     • Hadoop won’t replace your EDW, but any organization with a large EDW should at least be exploring Hadoop as a complement to its BI infrastructure.
     • Use Hadoop to offload the time- and resource-intensive processing of large data sets so you can free up your data warehouse to serve user needs.
     • The challenge now is making Hadoop more accessible to non-developers. Vendors are addressing this, so expect rapid advances in Hadoop accessibility.
  34. Oh, and also… Orbitz is looking for a Lead Engineer for the BI/Big Data team. Go to http://careers.orbitz.com/ and search for IRC19035.
  35. References: Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph Hellerstein, and Caleb Welton, “MAD Skills: New Analysis Practices for Big Data,” 2009.
