Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016


Joe Caserta, Founder and President of Caserta Concepts, gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and Spark.

For more information, visit http://casertaconcepts.com/


Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016

  1. Building New Data Ecosystem for Customer Analytics. Joe Caserta, President. September 30, 2016, Javits Center, New York City
  2. Caserta Timeline (years marked on the timeline: 1986, 1996, 2001, 2004, 2009, 2012, 2013, 2014, 2016): Launched Big Data practice; Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit (Wiley); Data Analysis, Data Warehousing and Business Intelligence since 1996; Began consulting in database programming and data modeling; 25+ years of hands-on experience building database solutions; Founded Caserta Concepts in NYC; Web log analytics solution published in Intelligent Enterprise magazine; Launched Data Science, Data Interaction and Cloud practices; Laser focus on extending Data Analytics with Big Data solutions; Dedicated to Data Governance Techniques on Big Data (Innovation); Awarded Top 20 Big Data Companies 2016; Top 20 Most Powerful Big Data consulting firms; Launched Big Data Warehousing (BDW) Meetup NYC: 2,000+ members; Awarded Fastest Growing Big Data Companies 2016; Established best practices for big data ecosystem implementations
  3. About Caserta Concepts: Consulting in Data Innovation and Modern Data Engineering. Award-winning company; internationally recognized workforce; Strategy, Architecture, Implementation, Governance. Innovation Partner: Strategic Consulting, Advanced Architecture, Build & Deploy. Leader in Enterprise Data Solutions: Big Data Analytics, Data Warehousing, Business Intelligence, Data Science, Cloud Computing, Data Governance
  4. Awards & Recognition: Top 10 Fastest Growing Big Data Companies 2016
  5. Our Partners
  6. Use Case Objectives: Cross-Channel Behavior Tracking / Identity Resolution; Access to Atomic Level Transaction Data; Blend Data Assets; Fast Data Onboarding and Discovery; Ensure Data Quality; Single Platform / Landscape Simplification; Understand Path-to-Purchase Customer Journey; Improve Customer Experience & Increase Sales
  7. The Recipe: Build a Dynamic Data Platform. OLD WAY: Structure Data → Ingest Data → Analyze Data; Fixed Capacity; Monolith. NEW WAY: Ingest Data → Analyze Data → Structure Data; Dynamic Capacity; Ecosystem. RECIPE: Cloud; Data Lake; Holistic Architecture & Framework
  8. Analytics: The Whole Brain Challenge (front/back quadrants). Analytics Oriented: Data Science, Research. Process Oriented: Data Governance, Compliance. Operations Oriented: Shared Services, Data Engineering. Revenue Oriented: Revenue Goals, Monetizing Data
  9. It Takes a Village! Chief Data Organization (Oversight). Vertical Business Area [Sales/Finance/Marketing/Operations/Customer Svc]: Product Owner; SCRUM Master; Development Team (Business Subject Matter Expertise, Data Librarian/Data Stewardship, Data Science/Statistical Skills, Data Engineering/Architecture, Presentation/BI Report Development Skills, Data Quality Assurance, DevOps). IT Organization (Oversight): Enterprise Data Architect, Solution Engineers, Data Integration Practice, User Experience Practice, QA Practice, Operations Practice. Advanced Analytics: Business Analysts, Data Analysts, Data Scientists, Statisticians, Data Engineers. Planning Organization: Project Managers. Data Organization: Data Gov Coordinator, Data Librarians, Data Stewards
  10. Unexpected Reaction to Change
  11. Moving the Status Quo (force-field analysis). Forces for Change: Global economics; Intensity of competition; Reduce costs; Move to cross-functional teams; New executive leadership; Speed of technical change; Social trends and changes. Forces Resisting Change: Period of time in present role; Status & perks of office/dept under threat; No apparent reasons for proposed changes; Lack of understanding of proposed changes; Fear of inability to cope with new technology; Concern over job security. Source: http://www.change-management-coach.com/force-field-analysis.html
  12. The Data Lake Paradigm. Technology: Scalable distributed storage (S3); Pluggable fit-for-purpose processing (EMR); Consistent extensible framework (Spark); Dimensional Data Warehouse (Redshift). Functional Capabilities: Remove barriers between data ingestion and analysis; Democratize Data with Just Enough Data Governance
  13. Why AWS?
  14. Why Spark? “Big Box” tools vs ROI? Prohibitively expensive and limited by licensing $$$; typically limited to the scalability of a single server. We Spark! Development local or distributed is identical; Beautiful high-level APIs; Full universe of Python modules; Open source and free; Blazing fast!; Databricks cloud makes it easier. Spark has become our default processing engine for a myriad of engineering & science problems
  15. Data Architecture is the new Data Model. Usage Pattern, by layer: Landing Area (Source Data in “Full Fidelity”): Ingest Raw Data. Data Lake (Integrated Sandbox): Organize, Define, Complete. Data Science Workspace: Munging, Blending, Machine Learning. Big Data Warehouse: Fully Governed (trusted), Arbitrary/Ad-hoc Queries and Reporting. Data Governance, applied across the layers: Metadata, ILM, Security; Data Catalog; Data Integration; Data Quality and Monitoring
  16. Data Lifecycle: Ingest, Transform, Analyze, Output. Steps: Understand Problem; Ingest Data; Explore and Understand Data; Clean and Shape Data; Evaluate Data; Create and Build Models; Communicate Results; Deliver & Deploy Model. Roles: Data Engineer (architect how data is organized & ensure operability); Data Scientist (deep analytics and modeling for hidden insights); Business Analyst (work with data to apply insights to business strategy); App Developer (integrates data & insights with existing or new applications)
  17. Path-to-Purchase Methods
        Single Touch: Assign the credit to the first or last exposure. Example: Ad-Click 100%. Comments: last touch only; ignores bulk of customer journey; undervalues other interactions and influencers.
        Rules-Based: Assign the credit to each interaction based on business rules. Example (Mailing, E-mail, Ad-Click): 33% / 33% / 33%. Comments: subjective; assigns arbitrary values to each interaction; lacks analytics rigor to determine weights.
        Statistically Driven: Assign the credit to interactions based on a data-driven model. Example (Mailing, E-mail, Ad-Click): 27% / 49% / 24%. Comments: ✓ looks at full behavior patterns; ✓ considers all touch points; ✓ can apply different models for best results; ✓ uses data to find correlations between touch points (winning combinations).
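The three attribution methods on slide 17 can be sketched in a few lines of Python. This is an illustrative sketch only, not the presenter's implementation: the function names are hypothetical, the channel order (Mailing, E-mail, Ad-Click) is read off the slide layout, and the "statistically driven" weights are simply passed in, whereas in practice they would come from a fitted model.

```python
from collections import defaultdict


def last_touch(path):
    """Single touch: give all credit to the final exposure in the path."""
    return {path[-1]: 1.0}


def equal_weight(path):
    """One possible business rule: split credit evenly across interactions."""
    credit = defaultdict(float)
    share = 1.0 / len(path)
    for channel in path:
        credit[channel] += share
    return dict(credit)


def model_weighted(path, weights):
    """Statistically driven: credit in proportion to model-derived weights.

    `weights` is assumed to come from some fitted model; here it is just
    a dict supplied by the caller.
    """
    total = sum(weights[channel] for channel in path)
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += weights[channel] / total
    return dict(credit)


# Hypothetical journey using the channels named on the slide.
path = ["Mailing", "E-mail", "Ad-Click"]
# Assumed mapping of the slide's 27% / 49% / 24% example to channels.
weights = {"Mailing": 0.27, "E-mail": 0.49, "Ad-Click": 0.24}

single = last_touch(path)
rules = equal_weight(path)
stats = model_weighted(path, weights)
```

The single-touch result ignores everything except the last channel, which is the weakness the slide calls out; the other two functions at least distribute credit over the whole journey.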
  18. Unifying the Customer Across Channels. Customer Data Integration (CDI): match and manage customer information from all available sources. Marketing channels: DMP, Salesforce, Adobe, Social, Direct Mail, Call Center, CRM. In other words… we need to figure out how to LINK people across systems!
  19. Mastering Master Data is Still MDM: Standardize, Match, Survivorship, Validate
  20. Standardization and Matching. Cleanse and Parse: Names (resolve nicknames; create deterministic hash and phonetic representation); Addresses; Emails; Phone Numbers. Matching: join based on combinations of cleansed and standardized data to create match results. Spark map operations for data cleansing, transformation, and standardization: Address Parsing (usaddress, postal-address, etc.); Name Hashing (fuzzy, etc.); Genderization (sexmachine, etc.)
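As a rough illustration of the cleanse-and-parse step on slide 20, the sketch below standardizes names, emails, and phone numbers using only the Python standard library. It does not use the packages named on the slide; the nickname table, the function names, and the US-only phone rule are all assumptions made for the example. Functions of this shape are the kind of thing that could be applied per record in a Spark map operation.

```python
import hashlib
import re

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"bob": "robert", "bill": "william", "joe": "joseph"}


def standardize_name(first, last):
    """Lowercase, strip non-letters, and resolve known nicknames."""
    first = re.sub(r"[^a-z]", "", first.lower())
    last = re.sub(r"[^a-z]", "", last.lower())
    first = NICKNAMES.get(first, first)
    return f"{first} {last}"


def name_hash(first, last):
    """Deterministic hash of the standardized name, usable as a join key."""
    canonical = standardize_name(first, last)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def standardize_email(email):
    """Trim whitespace and lowercase the address."""
    return email.strip().lower()


def standardize_phone(phone):
    """Keep digits only; assumes US numbers and drops a leading '1'."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits
```

With this in place, `name_hash("Joe", "Caserta")` and `name_hash(" JOSEPH", "caserta ")` produce the same key, so the two spellings can be joined. A phonetic representation, which the slide also mentions, is left out of this sketch.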
  21. Mastering Unmanageable Source Data. Reveal: wait for the customer to “reveal” themselves; create a link between the anonymous self and the known profile. Vector: may need behavioral statistical profiling; compare use vectors. Rebuild: recluster all prior activities; rebuild the graph
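Slide 21 says only "compare use vectors" and does not name a similarity measure. One common choice is cosine similarity, sketched below under that assumption; the sparse dict representation and the feature names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse usage vectors given as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical behavioral profiles: page-category visit counts.
anonymous = {"shoes": 5.0, "jackets": 1.0}
known = {"shoes": 10.0, "jackets": 2.0}
unrelated = {"garden": 4.0}

same_shape = cosine_similarity(anonymous, known)
no_overlap = cosine_similarity(anonymous, unrelated)
```

A score near 1 suggests the anonymous and known profiles behave alike; whatever threshold turns that into a link would have to be tuned on real data.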
  22. The Matching Process. The output of the matching process gives us the relationships between customers:
        xid   yid   match_type
        1234  4849  phone
        4849  5499  email
        5499  1235  address
        4849  7788  cookie
        5499  7788  cookie
        4849  1234  phone
      Great, but it is not very usable: you need to traverse the dataset to find out that 1234 and 1235 are the same person (and this is a trivial case). We also need to cluster the matches and identify our survivors (vertex).
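To make the shape of the slide 22 output concrete, here is a minimal sketch of how pairwise match rows could be produced from standardized records. The record layout, field names, and `match_edges` function are assumptions for illustration; the deck itself describes doing the matching as joins in Spark.

```python
from collections import defaultdict
from itertools import combinations


def match_edges(records, keys=("phone", "email", "address", "cookie")):
    """Emit (xid, yid, match_type) for every pair of records sharing a key value."""
    edges = []
    for key in keys:
        # Group record ids by the value they hold for this key.
        buckets = defaultdict(list)
        for rec in records:
            value = rec.get(key)
            if value:
                buckets[value].append(rec["id"])
        # Every pair inside a bucket is a match on this key.
        for ids in buckets.values():
            for xid, yid in combinations(sorted(ids), 2):
                edges.append((xid, yid, key))
    return edges


# Hypothetical, already-standardized records.
records = [
    {"id": 1234, "phone": "2125550100", "email": "a@example.com"},
    {"id": 4849, "phone": "2125550100", "email": "b@example.com"},
    {"id": 5499, "email": "b@example.com", "address": "1 main st"},
    {"id": 1235, "address": "1 main st"},
]

edges = match_edges(records)
```

No single row links 1234 to 1235; the link only appears by chaining phone, email, and address matches, which is the traversal problem the slide describes.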
  23. Graph to the Rescue. Don’t think table… think Graph! These matches are actually communities. We just need to import our edges into a graph and “dump” out communities. Graph vertices: 1234, 4849, 5499, 7788, 1235
  24. Connected Components. The Connected Components algorithm labels each connected component of the graph with the ID of its lowest-numbered vertex. This lowest-numbered vertex can serve as our “survivor” (not field survivorship).
        xid   yid
        1234  4849
        1234  5499
        1234  1235
        1234  7788
        1234  7788
        1234  1234
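The labeling described on slide 24 can be reproduced on the slide 22 edges with a small union-find. This is a plain-Python sketch, not the GraphFrames job the recap mentions; the function name is hypothetical, and the 9001/9002 pair is an invented second component added to show that separate communities get separate labels.

```python
def connected_components(edges):
    """Map each vertex to the lowest-numbered vertex in its component."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for x, y in edges:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            # Always attach the larger root under the smaller one, so the
            # root of every component is its lowest-numbered vertex.
            low, high = min(root_x, root_y), max(root_x, root_y)
            parent[high] = low

    return {v: find(v) for v in list(parent)}


# Edges from the matching-process table (match_type dropped),
# plus one invented pair forming a second community.
edges = [
    (1234, 4849), (4849, 5499), (5499, 1235),
    (4849, 7788), (5499, 7788), (4849, 1234),
    (9001, 9002),
]

labels = connected_components(edges)
```

Here 1234, 4849, 5499, 1235, and 7788 all receive the label 1234, matching the survivor shown on the slide, while 9001 and 9002 are labeled 9001.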
  25. Identity Resolution Process
  26. Data Ecosystem Architecture: ODS, ETL/ID Res
  27. The Notebook is the ETL Tool
  28. The Notebook is the Data Science Tool
  29. The Redshift DW is still Dimensional
  30. Use Graph for Data Lineage
  31. The Goal: Top Conversion Paths. Source: Accelerom AG, Zurich
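Slide 31 names the goal but the transcript gives no method for it. One simple reading, assumed here, is to count how often each ordered sequence of touch points ends in a purchase. The journey data and the `top_conversion_paths` function below are invented for illustration.

```python
from collections import Counter


def top_conversion_paths(journeys, n=3):
    """Return the n most frequent touch-point sequences among converted journeys."""
    counts = Counter(tuple(path) for path, converted in journeys if converted)
    return counts.most_common(n)


# Hypothetical journeys: (ordered touch points, did the customer purchase?)
journeys = [
    (["Ad-Click", "E-mail"], True),
    (["Mailing", "E-mail", "Ad-Click"], True),
    (["Ad-Click", "E-mail"], True),
    (["Mailing"], False),
    (["Mailing", "E-mail", "Ad-Click"], True),
    (["Mailing", "E-mail", "Ad-Click"], True),
]

top = top_conversion_paths(journeys, n=2)
```

On this toy data the Mailing, E-mail, Ad-Click path ranks first with three conversions. At scale the same count would be a group-by over resolved customer identities rather than an in-memory `Counter`.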
  32. Recap. Data Lake: S3 / NoSQL or SQL as needed. Identity Resolution: Spark / GraphFrames. ETL: Spark / Airflow. Path-to-Purchase: S3 / Spark / MLlib. Data Warehouse: Redshift. Shared Interface: Notebooks (Jupyter or Databricks). Business Intelligence: Tableau
  33. Thank You. Joe Caserta, President, Caserta Concepts. joe@casertaconcepts.com, @joe_Caserta. Award-winning company; Transformative Data Strategies; Modern Data Engineering; Advanced Architecture. Innovation Partner: Strategic Consulting; Advanced Technical Design; Build & Deploy Solutions. BDW Meetup: New York City; 3,000+ members; Knowledge sharing. Data is not important, it’s what you do with it that’s important!
