Satyam open analytics nyc

  1. BIG DATA ANALYTICS & PITFALLS TO AVOID. Dr. Satyam Priyadarshy. June 17, 2013, New York City. © Dr. Satyam Priyadarshy
  2. Agenda: The Big Data buzzword is creating a lot of confusion for companies. One needs to understand Big Data within one's own context, along with the 7 V's of Big Data and the KARMA score, to avoid some of the serious pitfalls in leveraging Big Data. A case study will show how to drive value out of Big Data in a meaningful manner.
  3. BIG DATA Buzz: Should Business Care?
     The future of Big Data is bright. Organizations that can effectively leverage Big Data without sinking into the Big Data Hole will realize additional business value, a loyal customer base and increased profits. 2.5 exabytes of new data are generated per day.
     What we know? A top business priority; big opportunities available; everyone is talking about it.
     But... Emerging technology helps; it definitely adds value; the definition and the leverage are not clear; big challenges for companies; the path to execute is less understood; realization is complex but getting easier; expertise is in demand but supply is short.
  4. BIG DATA: the 7 V's that describe it
     • VELOCITY: Moving away from batch processing to real-time addition of massive data for near-real-time analysis.
     • VARIETY: Structured and unstructured data, e.g. POS data, sensor data, transaction data, call center data, supply chain data, new media data, etc.
     • VERACITY: Reliability and predictability of 'not so' precise data types, e.g. sentiment data, weather data and its impact on business.
     • VOLUME: The ever-growing data, from terabytes to petabytes to zettabytes.
     • VALUE: Unless value is realized, Big Data is just a Big Hole.
     • VIRTUAL: Data resides in virtual environments, e.g. POS, private and public clouds, geo-located, inside and outside firewalls.
     • VARIATION: No single configuration of the other 6 V's fits everyone. There is variation for each business.
     The Big Data definition is evolving. The origin of the term dates back to 1990. Typically 4 V's defined Big Data, but I strongly recommend the 7 V's that describe Big Data. (Source: chiefknowledgeguru.com) 80% of data generated is unstructured.
  5. KARMA matters
     • Knowledge: Business, Technology, People Strategy; Big Data Sources, Lifecycle; Re-invest based on actions
     • Action: Scalable Architecture, Infrastructure, Tools & Technology, Resources; Mining the Big Data with a targeted and open mind to find Gold and other items
     • Recognition: Revenue by selling new insights; Increase profit margins; Add new features to products & services
     • Market: Grow share; Customer centricity
     • Advance: Innovate with the help of Big Analytics; Gather even more Big Data and keep going through this cycle
  6. KARMA SCORE is calculated using the maturity level of these capabilities:
     • Big Math and Big Analytics: Parallel Processing, API, Query, Reporting; Data Mining, Analytics, Pattern, Statistics; Machine Learning, Inference predictions; Tools, Technologies, Human Resources
     • Big Value, Big Actions: Service to support business (Data, Information, Knowledge, Process); Presentation (Visualization, Mobility, Collaboration, Exploration); Actions (Improve Product/Services, Grow Revenue/Profits, Agility)
     • Big Data: Collection of Raw Data, Structured & Unstructured, Discovery, Staging; Extract, Load, Transform; Data Connectors, Access, Use, Move; Data Storage (Hadoop, NoSQL, Key-value, MPP, In-memory, blobs, etc.)
     • Data Governance and Management: Policy, Privacy, Security, Metadata, Risk, Total cost of ownership, Access control; Data Lifecycle, Data Assets, SLA, ROI, ROA, Data Quality; Physical Store, Virtual Storage, Encryption, Masking, Archive, Disaster Recovery
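
The slide says the KARMA score is calculated from the maturity level of these capabilities, but it does not publish a scale or a formula. The following is only an illustrative sketch under stated assumptions: each capability is rated on a hypothetical 0 to 5 scale, each group is averaged, and the group averages are averaged again and scaled to 0 to 100. The ratings, the scale and the function names `karma_score` and `weakest_group` are all made up for illustration and are not the author's method.

```python
from statistics import mean

# Hypothetical 0-5 maturity ratings, one number per capability bullet.
# The group names come from the slide; the numbers are invented.
MATURITY = {
    "Data Governance and Management": [3, 2, 4],
    "Big Data": [4, 3, 3, 2],
    "Big Math and Big Analytics": [2, 1, 1, 3],
    "Big Value, Big Actions": [2, 3, 1],
}


def karma_score(maturity, max_level=5):
    """Assumed formula: mean of the per-group means, scaled to 0-100."""
    group_means = [mean(levels) for levels in maturity.values()]
    return 100 * mean(group_means) / max_level


def weakest_group(maturity):
    """Return the capability group with the lowest average maturity."""
    return min(maturity, key=lambda group: mean(maturity[group]))


if __name__ == "__main__":
    print(f"KARMA score (assumed formula): {karma_score(MATURITY):.2f}")
    print(f"Weakest group: {weakest_group(MATURITY)}")
```

Averaging per group first keeps a group with many bullets from dominating the result; a weighted scheme would be an equally reasonable assumption.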
  7. Whatever your KARMA score is, one can leverage Big Data eventually. The great enabler is the OPEN SOURCE revolution of the last decade or so.
  8. In a zoo vs. in an open environment: OPEN SOURCE creates a HAPPY, FLOURISHING environment.
  9. Open Source: Key Characteristics: FREE (*); NOT CAGED, NOT A BLACK BOX; MODIFICATIONS ALLOWED; MODIFIED VERSIONS REDISTRIBUTABLE; LIVES IN HARMONY WITH OTHERS.
  10. Open Source: BIG DATA PLAYERS. These tools enable you to dig the gold in Big Data. (This is not a comprehensive list of tools/technologies.)
  11. ACTION for finding the GOLD
     • Levels: PROBLEM SOLVING, OPERATIONAL, STRATEGIC – FUTURISTIC
     • Analytics: Basic Analytics, Advanced Analytics, Holistic Analytics
     • GO FOR THE GOLD
     • Addresses current concerns: reduce costs, eliminate issues
     • Addresses growth: customer centric, easily incorporate new data
     • Innovation related: emerging trends adoption
     • BIG DATA, BIG MATH, BIG ANALYTICS: descriptive statistics, inferential statistics
  12. THAT'S A GOLD MINE
  13. WHAT'S IN A GOLD MINE?
     • Gold Suite: Gold, Arsenic, Mercury, Tungsten, Silver
     • Base Suite: Copper, Lead, Zinc, Bismuth, Cadmium, Molybdenum, Silver
     • Iron-Manganese Suite: Iron, Manganese, Cobalt, Nickel, Yttrium
     To get gold, one has to dig deeper. If you found silver while digging for gold, what would you do?
  14. CASE STUDY: DDoS Attack
     • PROBLEM: DNS servers are persistently attacked to create DDoS attacks. Can we predict?
     • CHALLENGES: 7+ TB / day; varied formats based on request and type of attack
     • APPROACH (BIG ANALYTICS): Hadoop-based data storage; Hive / MapR queries and R for statistical analysis; interconnection of data with known "data" sources for identification; Tableau (and open source D3.js and Ploticus) for visualization; iteratively optimized queries for speed
     • THE GOLD (KNOWLEDGE, ACTIONS, RECOGNITION): Source of attacks identified, after integrating; distributed targets; multiple attack types; slow performance over binary data sets; a step closer to a solution, but it requires more work to get it near real-time for actionable insights; feedback loop to known datasets to enhance predictability and performance
     • 45 days later: it's science, not BI.
  15. CASE STUDY: DDoS Attack, Pattern-Based Study. [Figure: two scatter plots titled "Single Day - Outlier Events - 10K Size :: Zones Hit from Multiple Sources" and "Single Day - Outlier Events - 2K Size :: Zones Hit from Multiple Sources"; axes labeled Traffic Volume and Unique ZRatio; labels shown: ABC.TLD, ABC.TLD SB, GOLD.TLD; annotation: "AFTER DIGGING FURTHER".]
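
The case study describes counting traffic per zone and looking for single-day outlier events among zones hit from multiple sources. The deck does not show the actual queries, so the following is only a small sketch of one common way to flag such outliers, a modified z-score based on the median and the median absolute deviation. The sample data, the threshold of 3.5 and the function names `zone_stats` and `outlier_zones` are assumptions for illustration; the original work used Hive queries and R on 7+ TB per day, not in-memory Python.

```python
from collections import defaultdict
from statistics import median

# Hypothetical stand-in for parsed DNS query logs: (zone, source address).
QUERIES = (
    [("abc.tld", f"10.0.0.{i % 50}") for i in range(900)]
    + [("shop.tld", f"10.0.1.{i % 5}") for i in range(40)]
    + [("news.tld", f"10.0.2.{i % 4}") for i in range(35)]
    + [("mail.tld", f"10.0.3.{i % 3}") for i in range(30)]
    + [("blog.tld", f"10.0.4.{i % 2}") for i in range(25)]
)


def zone_stats(queries):
    """Return {zone: (query_count, unique_source_count)}."""
    counts = defaultdict(int)
    sources = defaultdict(set)
    for zone, source in queries:
        counts[zone] += 1
        sources[zone].add(source)
    return {zone: (counts[zone], len(sources[zone])) for zone in counts}


def outlier_zones(stats, threshold=3.5):
    """Flag zones whose volume has a modified z-score above the threshold."""
    volumes = [volume for volume, _ in stats.values()]
    med = median(volumes)
    mad = median(abs(v - med) for v in volumes)
    if mad == 0:
        return []  # no spread, nothing can be called an outlier
    return sorted(
        zone
        for zone, (volume, _) in stats.items()
        if 0.6745 * (volume - med) / mad > threshold
    )


if __name__ == "__main__":
    stats = zone_stats(QUERIES)
    for zone in outlier_zones(stats):
        volume, unique_sources = stats[zone]
        print(zone, volume, unique_sources)
```

The median-based score is used here because a single attacked zone would inflate an ordinary mean and standard deviation and could hide itself.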
  16. BIG DATA PITFALLS and HOW TO OVERCOME them
     • Lack of knowledge (tools, data science). How to overcome: Expert, Education, Execution.
     • Too much data; initially most of it was discarded. How to overcome: Deploy Hadoop clusters with cheap storage and store with the best possible compression.
     • Executives not sure. How to overcome: Education, best practices, and insights after mining and finding useful patterns initially.
     • Belief that Big Data has all the answers. How to overcome: The whole mine is NOT GOLD. Show insights and coach.
     Big Data can help MOST BUSINESSES.
  17. BIG DATA PITFALLS and HOW TO OVERCOME them (continued)
     • Silo culture. How to overcome: Devastating for companies. A single source of truth is key to success.
     • Multiple copies of the 'same' data in different formats. How to overcome: Keep raw data (along with a DR site) and transform during analysis.
     • Well-established enterprise data warehouse. How to overcome: Keep it for simple, operational analytics; augment with Big Data for innovation and future growth.
     • Intuition-based culture. How to overcome: If you can only focus on gold, then when you find silver and other precious metals you miss the mark. Show insights and move on to gold.
     Big Data can help MOST BUSINESSES.
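
The advice to keep raw data, store it compressed, and transform only during analysis can be sketched in a few lines. This is a minimal illustration under stated assumptions: the JSON record layout, the field names and the helper names `store_raw`, `read_raw` and `transform` are hypothetical, and gzip in memory stands in for whatever storage and codec a real Hadoop cluster would use.

```python
import gzip
import json

# Hypothetical raw log lines, kept exactly as received (including bad ones).
RAW_LINES = [
    '{"ts": "2013-06-17T10:00:00", "zone": "abc.tld", "bytes": "512"}',
    "not-json garbage kept as-is",
    '{"ts": "2013-06-17T10:00:01", "zone": "abc.tld", "bytes": "2048"}',
]


def store_raw(lines):
    """Compress the untouched raw lines into one blob."""
    return gzip.compress("\n".join(lines).encode("utf-8"))


def read_raw(blob):
    """Recover the raw lines byte-for-byte from the stored blob."""
    return gzip.decompress(blob).decode("utf-8").splitlines()


def transform(lines):
    """Parse and type-convert at analysis time; skip lines that do not parse."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw copy still holds this line for later study
        yield {"zone": record["zone"], "bytes": int(record["bytes"])}


if __name__ == "__main__":
    blob = store_raw(RAW_LINES)
    total = sum(r["bytes"] for r in transform(read_raw(blob)))
    print(f"{len(blob)} compressed bytes, {total} payload bytes parsed")
```

Because the stored copy is never rewritten, a later analysis can apply a different transform to the same raw lines instead of creating another derived copy in another format.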
  18. Simple way to see some Big Data Challenges
     • 1st: Data acquisition; Storage; Processing
     • 2nd: Data transport & dissemination; Data management & curation; Big Analytics (Tools, Technology, Know-How)
     • 3rd: Privacy, Security and Disaster Recovery; Technical/Scientific Talent; Cost of all of the above
  19. KARMA matters
     • Knowledge: Business, Technology, People Strategy; Big Data Sources, Lifecycle; Re-invest based on actions
     • Action: Scalable Architecture, Infrastructure, Tools & Technology, Resources; Mining the Big Data with a targeted and open mind to find Gold and other items
     • Recognition: Revenue by selling new insights; Increase profit margins; Add new features to products & services
     • Market: Grow share; Customer centricity
     • Advance: Innovate with the help of Big Analytics; Gather even more Big Data and keep going through this cycle
  20. THANK YOU. UNDERSTAND YOUR BIG DATA KARMA SCORE AND: understand the big picture and the direction, and LEAD; it helps build a strong foundation; focus on our most valued customers; increase profitability.
  21. Appendix
  22. The Pitfalls for Adopting Big Data
     • The Big Data definition of 4 V's (Velocity, Volume, Variety, Veracity) is incomplete.
     • The belief that Big Data solves everything for everyone.
     • Big Data abounds, but its dimensions are yet to be understood.
     • The Loudest Often Wins (LOW), or the highest paid person's opinion (HIPPO) prevails.
     • Accepting that a data-driven approach trumps intuition is a hard nut to crack. Really!!
     • Data for data's sake
     • Talent gap
     • Data, data everywhere
     • Infighting
     • Aiming too high
     Reference: Wall Street Journal, March 11, 2013, page R4
  23. Brief History of Analytics (timeline axis: 1890, 1920, 1950, 1980, 2010)
     • Time Management (by Frederick Winslow Taylor)
     • Zero Defects Analysis and Pacing of Assembly Line (Ford)
     • Statistical Process Control (Walter Shewhart)
     • Operational Research popularized (Royal Air Force)
     • Social Network Analysis
     • Business Intelligence term coined (H. P. Luhn)
     • Artificial Intelligence (John McCarthy)
     • Exploratory Data Analysis, visualization (John Tukey)
     • Business Intelligence popularized (Gartner)
     • Expert Systems (using AI)
     • The Visual Display of Quantitative Information (Edward Tufte)
     • Data Mining (part of AI) and Web analytics
     • Big Analytics
  24. DEFINITIONS of Analytics for Business
     • ANALYTICS: Any data-driven process that provides insights.
     • ADVANCED ANALYTICS: Helps in understanding cause-effect relationships, predicting future events and choosing the best possible action.
     • BIG ANALYTICS FOR BUSINESS: Relevant for the business, gives actionable insights for increasing revenue/profit, measures value and leverages "Big Data".
