The Role of Data Wrangling in Driving Hadoop Adoption

The Briefing Room with Mark Madsen and Trifacta
Live Webcast September 1, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=eb655874d04ba7d560be87a9d906dd2fd

Like all enterprise software solutions, Hadoop must deliver business value in order to succeed. Much of the innovation in the big data industry these days therefore addresses usability. While there will always be a technical side to the Hadoop equation, the demand for user-friendly data management tools will continue to center on business users. That's why self-service data preparation, or "data wrangling," is a serious and growing trend, one that promises to move Hadoop beyond the early adopter phase and into the mainstream of business.

Register for this episode of The Briefing Room to hear veteran analyst Mark Madsen of Third Nature explain why business users will play an increasingly important role in the evolution of big data. He'll be briefed by Trifacta's Will Davis and Alon Bartur, who will demonstrate how Trifacta's solution empowers business users to "wrangle" data of all shapes and sizes faster and more easily than ever before. They'll discuss why a new approach to accessing and preparing diverse data is required and how it can accelerate and broaden the use of big data within organizations.

Visit InsideAnalysis.com for more information.

The Role of Data Wrangling in Driving Hadoop Adoption

  1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  2. The Briefing Room: The Role of Data Wrangling in Driving Hadoop Adoption
  3. Twitter Tag: #briefr The Briefing Room. Welcome. Host: Eric Kavanagh, eric.kavanagh@bloorgroup.com, @eric_kavanagh
  4. Mission: Reveal the essential characteristics of enterprise software, good and bad; Provide a forum for detailed analysis of today's innovative technologies; Give vendors a chance to explain their product to savvy analysts; Allow audience members to pose serious questions... and get answers!
  5. Topics. September: HADOOP 2.0; October: DATA MANAGEMENT; November: ANALYTICS
  6. The Great Divide: Close the Gap; Empower Business Users; Shift Focus of IT; Developers are Third Leg
  7. Analyst: Mark Madsen. Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor to Forbes Online and on the O'Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
  8. Trifacta. Trifacta offers a platform for data transformation and preparation. The interface is rich in visualization and provides a productive data wrangling capability. The platform also includes access to raw data in Hadoop, providing analysts and data scientists with secure, governed data.
  9. Guests: Will Davis, Director of Product Marketing, Trifacta; Alon Bartur, Principal Product Manager, Trifacta
  10. Trifacta: The Role of Data Wrangling in Driving Hadoop Adoption
  11. Variety = Data is Messy
  12. When Data is Messy… Analysis is More Complicated (Question, Analysis, Insight)
  13. Messy Data Requires Data Wrangling (Question, Analyze, Insight). Data Wrangling: Discover, Structure, Clean, Enrich, Distill
  14. The Bottleneck: DATA PRODUCT (Simplicity), DATA SOURCE (Complexity)
  15. The Bottleneck on Hadoop. Ingestion, Storage, Processing (IT); Analysis & Consumption (LOB); Business System Data, Machine Generated Data, Third Party Data; Java, Python, R, Pig, etc. How do you move from here? To here? 80% of the work in any data project is preparing the data for analysis.
  16. Breakdown of Communication Between IT & LOB. LOB: "How can I access the data in Hadoop?" IT: "What do you want to analyze?" LOB: "I can't tell you until I see the data; let me see the data first." IT: "I can't just point you to the raw data; you'll need to tell me."
  17. Conventional Approaches Inhibit User Empowerment: Hand-Coding, Technical Workflow Mapping
  18. Bringing Hadoop to an Analyst's Fingertips. "I want direct access to the raw data so I can actually see the content of different datasets to define my analytic requirements." (John, Data Analyst) Wrangle Data Using This?
  19. Empowering Analysts Requires a New User Experience
  20. It's All About The Experience: Interact, Predict, Preview
  21. Demo
  22. Analyst Workflow on Hadoop: (1) Register Hadoop Data Sets in Trifacta (HDFS); (2) Visualize, Interact & Define Transformation Script; (3) Execute Script on Entirety of Data Set at Scale in Hadoop (Execution in Pig or Spark); (4) Select Transformation Output Format & Location (Hadoop: HDFS, Parquet or Avro, Table in HCatalog; Analytic Tools: Tableau, R, etc.)
  23. QUESTIONS?
  24. SIGN UP FOR A FREE TRIAL AT TRIFACTA.COM/TRIAL. THANK YOU!
  25. Perceptions & Questions. Analyst: Mark Madsen
  26. © Third Nature Inc. Analyst comments and questions
  27. Ideas about how we make data available are changing. Making data available is not the same as enabling its use.
  28. From scarcity to abundance. All the data. Common, typed, tabular data. The bottleneck is us.
  29. The old problem was access, the new problem is analysis
  30. Changed design assumption: analysis isn't read-only. The results of analysis can, and often do, feed back into the system from which they originate. Much of the data is being read, written and processed in real time. Our design point in IT was not changing tables and ephemeral patterns.
  31. In a reporting world data and processing are bounded. Schema. No consideration for feedback loops and change; Processing only happens here; Carefully controlled SQL-only access; Nobody creates new information; Sources few and well understood; Complex DI is controlled by IT; Schemas are few and designed; Tools are authorized, few in number and kind; One-way flow
  32. In an analysis world flow is unbounded and continuous. Feedback loops allowed; End-of-analysis dataset may be start of a BI dataset; Continuous data integration and delivery; Files are back as both input and storage; Minimal barrier of / control on collection; Areas of provisioned data; Any shape in, rectangles out
  33. The model and reality of ETL: one-way pipes. DI, Schema, BI. Our methods tell us that data integration and analysis are separate, and schema comes first as the point of synchronization between them.
  34. Data isn't just source or target, it's a continuum. Schema. Unusable data that needs engineering: ETL. Data that can be used: BI. Fuzzy areas of data that need engineering and / or composing: exploration, blending & discovery.
  35. Food supply chain: an analogy for data. Multiple contexts of use, differing quality levels
  36. Tools were designed with data model assumptions. Axes: source data model complexity (simple to complex) and target data model complexity (simple to complex). Blending: Selectively linking and changing data, producing a simpler data model as output. ETL: Multiple complex source models, large complex target model. Application integration: Basic movement of data from one place to another, minimal changes to data. Processing & Analytics: Deriving new data from a relatively simple dataset (like an event stream).
  37. Some questions to start discussion: (1) Who is this product aimed at: end users, analysts or the people who get and manage data for others? (2) Can you get data from places other than Hadoop? (3) How do you deal with WYSIWYG data preparation when the dataset is very large? (4) How well does it handle small datasets? (5) How do you take something from a one-time process to a repeatably executed process in a production environment? (6) What analysis tool integration is available? (7) What maintenance features are available?
  38. CC Image Attributions. Thanks to the people who supplied the Creative Commons licensed images used in this presentation: Tokyo forum - http://flickr.com/photos/fukagawa/2004106475/; klein_bottle_red.jpg - http://flickr.com/photos/sveinhal/2081201200/; donuts_4_views.jpg - http://www.flickr.com/photos/le_hibou/76718773/
  39. About the Presenter. Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor to Forbes Online and on the O'Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
  40. About Third Nature. Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, information strategy and data management. If your question is related to data, analytics, information strategy and technology infrastructure then you're at the right place. Our goal is to help organizations solve problems using data. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
  41. Twitter Tag: #briefr The Briefing Room
  42. Upcoming Topics (www.insideanalysis.com). September: HADOOP 2.0; October: DATA MANAGEMENT; November: ANALYTICS
  43. THANK YOU for your ATTENTION! Some images provided courtesy of Wikimedia Commons and "Grand Canyon view from Pima Point 2010" by Chensiyuan - Own work. Licensed under GFDL via Commons - https://commons.wikimedia.org/wiki/File:Grand_Canyon_view_from_Pima_Point_2010.jpg#/media/File:Grand_Canyon_view_from_Pima_Point_2010.jpg
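
The wrangling steps named on slide 13 (Discover, Structure, Clean, Enrich, Distill) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample log lines, the `STORE_REGION` lookup table, and every function name below are hypothetical, and nothing here represents Trifacta's product or API.

```python
import re
from collections import Counter

# Hypothetical raw log lines; the pipe-delimited format is invented for illustration.
RAW_LINES = [
    "2015-09-01|store=12|sku=A100|amount= 19.99 ",
    "2015-09-01|store=12|sku=a100|amount=5.00",
    "2015-09-02|store=07|sku=B200|amount=n/a",
    "malformed line without delimiters",
    "2015-09-02|store=07|sku=B200|amount=42.50",
]

# Hypothetical lookup table used in the Enrich step.
STORE_REGION = {"12": "West", "07": "East"}

EXPECTED_KEYS = {"date", "store", "sku", "amount"}


def discover(lines):
    """Discover: profile the raw lines by counting fields per line."""
    return Counter(len(line.split("|")) for line in lines)


def structure(line):
    """Structure: split one line into a flat record, or None if it does not fit."""
    parts = line.split("|")
    if len(parts) != 4:
        return None
    record = {"date": parts[0]}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        record[key] = value
    return record if set(record) == EXPECTED_KEYS else None


def clean(record):
    """Clean: normalize case and whitespace; drop rows with unparseable amounts."""
    amount = record["amount"].strip()
    if not re.fullmatch(r"\d+(\.\d+)?", amount):
        return None
    return {**record, "sku": record["sku"].upper(), "amount": float(amount)}


def enrich(record):
    """Enrich: add a region column from the lookup table."""
    return {**record, "region": STORE_REGION.get(record["store"], "Unknown")}


def distill(records):
    """Distill: aggregate to a total amount per region."""
    totals = {}
    for r in records:
        totals[r["region"]] = round(totals.get(r["region"], 0.0) + r["amount"], 2)
    return totals


def wrangle(lines):
    """Run the Structure, Clean, Enrich and Distill steps in order."""
    structured = [r for r in map(structure, lines) if r is not None]
    cleaned = [r for r in map(clean, structured) if r is not None]
    return distill([enrich(r) for r in cleaned])


if __name__ == "__main__":
    print(discover(RAW_LINES))  # field-count profile of the raw lines
    print(wrangle(RAW_LINES))   # total amount per region
```

Because `wrangle` is an ordinary function of its input lines, the same steps worked out on a small sample could in principle be rerun over a full data set, which loosely mirrors the slide 22 workflow of defining a transformation script interactively and then executing it at scale.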
