Fast and Furious: From POC to an Enterprise Big Data Stack in 2014

  1. © 2014 MapR Technologies
  2. © 2014 MapR Technologies 2
  3. Bill Peterson @thebillp
  4. What we hope to accomplish today...
  5. What is driving the momentum for big data?
  6. The Growth of the Digital Universe is Accelerating: 2009: 0.8 ZB; 2013: 4.5 ZB; 2020: 40 ZB (Source: IDC Digital Universe Study 2013). Only 12%* of an enterprise’s data is currently being used. *Forrester Research Inc., Forrsights Business Intelligence Big Data Survey, Q3 2012
  7. Why do I need to invest in big data initiatives?
  8. “Please rank the following technologies according to their importance and investment within your firm?” Thank goodness executives and technology decision-makers are talking about data again. Source: Forrsights Software Survey, Q4 2013. Base: 2,074 IT executives and technology decision-makers. © 2013 Forrester Research, Inc. Reproduction Prohibited
  9. 70% of IT decision-makers say that Big Data analytics is a key priority now or in one year. “Which of the following time frames best completes the statement below?” Big Data analytics is/will be a key priority at our organization... Today: 41%; In a year: 29%; In 2 years: 22%; In 3 years: 6%; Don't know: 2%. Source: A commissioned study conducted by Forrester Consulting on behalf of Savvis, October 2013
  10. Where do I locate my big data solutions?
  11. SOCIAL MEDIA DATA EMAILS / DOCUMENTS MACHINE/ SENSOR DATA LOG DATA GEOSPATIAL DATA IMAGES VIDEO AUDIO TRANSACTIONS FREE-FORM TEXT Financial Planning Distribution Sales Marketing Supply Chain Customer Support
  12. How Do I Do It?
  13. The complete big data analytics lifecycle must be addressed. Source: A CenturyLink Technology Solutions adaptation from Forrester Research, Inc., The Future of Customer Data Management, March 6, 2013
  14. Common Big Data Use Cases: Data Lake / Data Refinery; Risk, Fraud, Compliance; Network Monitoring; Real-Time Recommendations / Offers; Sentiment and Social Graph Analysis; Machine Generated Data Analysis; Marketing Campaign Analysis; Customer Churn Analysis; Customer Experience Analysis
  15. A Basic Use Case is a Common Starting Point. Data Lake Analytics: Break Down Data Silos; Create Data Archive; Gain Business Insights
  16. The Enterprise Big Data Model. INFRASTRUCTURE LAYER: Compute; Storage; Network. DATA LAYER: Data Integration Tools; Hadoop; Enterprise Data Warehouse; DBMS; External Data Sources; NoSQL. INSIGHT LAYER: Data Discovery Tools; BI; Data Science; Visualization; Horizontal / Vertical Analytics; Marketing, Sales Execution, and Operations Apps; Real-Time Analytics; Streaming Applications. Across all layers: Monitoring, Management, Orchestration and Provisioning
  17. Big Data is Ideal for Managed Services • Standard services to reduce costs • Diverse range of use cases, data types • Add-on products and services • Flexible commercial models • Enterprise data integration. Hadoop; Network Services; Strategy and Professional Services; Infrastructure-as-a-Service; Big Data Environment Planning and Implementation; Hadoop Management; Analytics (Custom, OTS)
  18. Case Study: Agricultural Equipment Manufacturing Company. Business Challenge • Manage collection, storage and manipulation of data • Complexity and cost of big data as an in-house solution • Provide more value to customers. Benefits • Fast information transfer • Easy-to-use web-based information delivery • Customer can make business decisions based on new data sources • Customer increases efficiencies based on historical data
  19. Steve Wooledge @swooledge
  20. Big Data is Overwhelming Traditional Systems. Enterprise Data Architecture: ENTERPRISE USERS; OPERATIONAL SYSTEMS; ANALYTICAL SYSTEMS; OUTSIDE SOURCES. PRODUCTION REQUIREMENTS (operational systems) • Mission-critical reliability • Transaction guarantees • Deep security • Real-time performance • Backup and recovery. PRODUCTION REQUIREMENTS (analytical systems) • Interactive SQL • Rich analytics • Workload management • Data governance • Backup and recovery
  21. An Enterprise Big Data Stack Must Deliver Enterprise Functionality… OPERATIONAL SYSTEMS; ANALYTICAL SYSTEMS; ENTERPRISE USERS • Data staging • Archive • Data transformation • Data exploration • Streaming, interactions. Keys for Production Success: 1 Reliability and DR; 2 Interoperability; 3 High performance; 4 Supports transactions and analytics
  22. Companies With an Enterprise Big Data Stack are Leading Their Industry. More Data Beats Better Algorithms: Collecting interaction data from ecommerce, social media, offline, and call centers enables a “customer 360 view” and consumer intimacy. Competitive Advantage is Decided by 0.5%: Consumer financial services: 1% improvement in fraud detection means hundreds of millions of dollars. Advertising and retail: 0.5% improvement in lift means millions of dollars increase in profitability
  23. There are Lots of Technologies and Confusion
  24. There is Some Agreement on Key Functional Layers. The Functional Big Data Stack: Analytics Layer (Data Science; Reporting; Business Applications); Data Layer (RDBMS; Files; NoSQL); Storage Layer (Distributed File Systems / NFS / NAS / EXT); Infrastructure Layer (Network; Disk; O/S; Compute); Operations Layer; Web; Mobile; Security (LDAP/PAM/Kerberos)
  25. MapR Distribution for Hadoop: Open Data Platform. Real-time applications; NFS for file-based applications; Hadoop APIs for Hadoop applications; ODBC & JDBC for SQL-based applications; Mission critical and SLA dependent applications. Infrastructure Layer: Network; Disk; O/S; Compute
  26. MapR Distribution for Hadoop. Management. MapR Data Platform. APACHE HADOOP AND OSS ECOSYSTEM. EXECUTION ENGINES: Batch (MapReduce v1 & v2; Pig; Cascading; Spark; Tez*); Streaming (Spark Streaming; Storm*); NoSQL & Search (HBase; Solr; Accumulo*); ML, Graph (Mahout; MLLib; GraphX); SQL (Hive; Impala; Shark; Drill*); YARN. DATA GOVERNANCE AND OPERATIONS: Workflow & Data Governance (Oozie; Falcon*); Data Integration & Access (Sqoop; Flume; HttpFS; Hue); Provisioning & coordination (Juju; Savannah*; Whirr; ZooKeeper); Security (Sentry*; Knox*). * Certification/support planned for 2014
  27. MapR Distribution for Hadoop. Management. MapR Data Platform. APACHE HADOOP AND OSS ECOSYSTEM. EXECUTION ENGINES: Batch (MapReduce v1 & v2; Pig; Cascading; Spark; Tez*); Streaming (Spark Streaming; Storm*); NoSQL & Search (HBase; Solr; Accumulo*); ML, Graph (Mahout; MLLib; GraphX); SQL (Hive; Impala; Shark; Drill*); YARN. DATA GOVERNANCE AND OPERATIONS: Workflow & Data Governance (Oozie; Falcon*); Data Integration & Access (Sqoop; Flume; HttpFS; Hue); Provisioning & coordination (Juju; Savannah*; Whirr; ZooKeeper); Security (Sentry*; Knox*). * Certification/support planned for 2014 • High availability • Data protection • Disaster recovery • Standard file access • Standard database access • Pluggable services • Broad developer support • Enterprise security authorization • Wire-level authentication • Data governance • Ability to support predictive analytics, real-time database operations, and high arrival rate data • Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators • 2X to 7X higher performance • Consistent, low latency
  28. Dependable Operations: Lights Out, Data Center Ready. Reliable Compute • Automated stateful failover • Automated re-replication • Self-healing • Rolling upgrades • No lost jobs or data • Five 9s (99.999%) of uptime. Dependable Storage • Business continuity with snapshots and mirrors • Recover to a point in time • End-to-end checksumming • Strong consistency • Mirror across sites to meet recovery time objectives
  29. Production Success at Scale: 100B AD AUCTIONS per day; 20M SONGS; 45M SHOPPERS analyzed each month; Fortune 100 Retailer; 104M CARD MEMBERS; Fortune 100 Financial Services; 1.2B PEOPLE
  30. HP: Clickstream Analysis. HP optimizes customer experience on corporate website. OBJECTIVES • Increase conversion on website through real-time, relevant responses • Improve customer retention through interactive, personalized experiences. CHALLENGES • Needed to store and analyze 5 years of clickstream generated on hp.com • Required faster response times; queries took days with legacy RDBMS • Complex analytics were impossible because of diverse data formats. SOLUTION • MapR manages 5 PB of data on dual 46-node clusters with 20 TB/node • Clickstream data collected in Hadoop, analyzed in HP Vertica, direct query for business metrics • HP chose MapR for performance, high availability, disaster recovery, manageability, knowledge base and future road map. BUSINESS IMPACT • 10% increase in conversion of shoppers to buyers (over industry standard) • 40% increase in efficiency for analysts • Analyst queries that used to take 24 hours to process now take 15 seconds
  31. Cisco: Global Security Intelligence Operations (MSSP). Operational and analytical security applications on one platform. OBJECTIVES • To protect customer networks through early-warning intelligence & vulnerability analysis • To better react to evolving security threats in real-time. CHALLENGES • Collect additional telemetry data from customers' firewalls, intrusion prevention systems • Different analytical teams derived security intelligence in silos and lacked synergy • Inability to scale with existing infrastructure to a million events per second from nearly 100 different channels over tens of thousands of distributed sensors. SOLUTION • MapR M7: Central hub for all of the security analytics teams • Stream, interactive, graph and batch processing on MapR with the flexibility to perform closed-loop analytics across these functions in real time • Key Features: Scale, enterprise-grade, operational efficiency and high performance. BUSINESS IMPACT • All analytic teams leverage a common platform leading to operational efficiencies • Capability to scale: aggregating and analyzing millions of data points in real time • Update customer networks with new threat footprints within a 2 to 5 minute window
  32. Cisco SIO Hadoop Stack. Inputs from Globally Dispersed Datacenters: SENSOR DATA; FIREWALL LOGS; INTRUSION PROTECTION SYSTEM LOGS; SECURITY APPLIANCE LOGS (1 million events/sec., over 100 channels). MapR Distribution for Hadoop: Spark Streaming for known threats & aggregation; Batch Processing (Mahout, MLLib); SQL Queries and Reporting (Shark, Impala); Graph Processing (GraphX & TitanDB). Closed-Loop Operations: New Threat Footprint within 2-5 min. Benefits: Unified Platform for Analytics; Low Operational Costs; Faster Response Times; Better Algorithms
  33. Committed to our Customers’ Success. Professional Services: Core Hadoop Services; Data Engineering; Advanced Analytics; M7/HBase Practice; Data Science. Customer Support: Hadoop engineering experts provide 24x7x365 global coverage. Educational Services: Instructor-led courses & Web-based training for Hadoop cluster administration, HBase & MapReduce programming and more
  34. Getting Started: MapR Sandbox for Hadoop. The Fastest On-Ramp to Hadoop • Complete MapR distribution for Hadoop • Free download • Most advanced distribution • Tutorials and advanced user interfaces • MapR Control System (MCS) for administrators • Hadoop User Experience (HUE) for developers • Point-and-click tutorials • Fully configured in virtual machines • Supports VMware and VirtualBox • Drag-and-drop data movement
  35. Call To Action • Read our joint blog: http://www.mapr.com/blog/mapr-centurylink-technology-solutions-partnering-a • Check out our video: https://www.youtube.com/watch?v=BQMkLtOWqC4 • Try before you buy: http://www.mapr.com/products/mapr-sandbox-hadoop • Have us come in for a chat: contact Bill or Steve
  36. YOU’VE GOT questions? WE’VE GOT long rambling responses that sound like answers. @thebillp or william.peterson@savvis.com; @swooledge or swooledge@maprtech.com
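
Two of the figures quoted in the slides can be sanity-checked with simple arithmetic: the "five 9s" uptime claim on slide 28 and the IDC growth figures on slide 6. The sketch below is only an illustration. It assumes that "five 9s" means 99.999% availability measured over a 365-day year, and that the growth from 0.8 ZB (2009) to 40 ZB (2020) is smooth compound growth; neither assumption is stated in the deck, and the function names are made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumption)


def downtime_minutes_per_year(availability: float) -> float:
    """Return the downtime budget, in minutes per year, implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Slide 28: five 9s of uptime, read here as 99.999% availability.
    print(f"{downtime_minutes_per_year(0.99999):.2f} minutes of downtime per year")
    # Slide 6: 0.8 ZB in 2009 growing to 40 ZB in 2020 (IDC figures as quoted).
    print(f"{compound_annual_growth(0.8, 40.0, 2020 - 2009):.1%} growth per year")
```

Under these assumptions, five 9s allows a little over five minutes of downtime per year, and the quoted IDC figures imply data volumes growing by roughly 43% per year.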

Editor's Notes

  1. Thank goodness people, including CxOs, are talking about data again.
  2. Base: 51 North American enterprise IT decision-makers at firms that have adopted or are currently conducting a proof of concept of Big Data. Source: A commissioned study conducted by Forrester Consulting on behalf of Savvis, October 2013
  3. A second trend in enterprise architecture has been big data overwhelming the existing workload-specific systems that are in production. (list of requirements for each of these on the side in text) People started with mainframes or operational systems which run ERP, finance, CRM and other mission-critical applications. They require… (pick out attributes you want to stress on the left) You also have data warehouses, marts, data mining, and other analytical systems which pull data from these operational and other systems to provide insights to the business for decision making. The amount and variety of data have been overloading these systems. As you try to ingest new types of data, you reach a point where these systems are not cost-effective to scale to terabytes or petabytes of data.
  4. The first reality is that as people put Hadoop into production to relieve the pressure on other systems in their enterprise architecture, it needs to be reliable. Hadoop needs to be held to the same enterprise standards as your Oracle, SAP, Teradata, NetApp storage, or any other enterprise system. Many organizations are putting Hadoop into their data center to provide (list of use cases underneath) … it can do all of this and more, but for Hadoop to act as a system of record, it must provide the same guarantees for SLAs, performance, data protection, and more. Most importantly, Hadoop has the potential for both analytics AND operations. It can be used to optimize the data warehouse and provide batch data refining or storage. But Hadoop can also provide many operational analytics or database operations/jobs when done right.
  5. The first trend is that the industry leaders have shown how to use big data to compete and win in their markets. It’s no longer a nice-to-have; you need big data to compete. Google pioneered MapReduce processing on commodity hardware and used that to catapult itself into the leading search engine even though it was 19th in the market. Yahoo! leveraged these ideas to create Hadoop to keep up with Google, and many mainstream companies have followed with new data-driven applications such as “people you may know” (started by LinkedIn and now used by Facebook, Twitter, and every social application), product recommendation engines, contextual and personalized music services (Beats), measuring digital media effectiveness (comScore), serving more relevant/targeted ads (Comcast, Rubicon Project), fraud and risk detection, healthcare efficacy, and more. What makes the difference? A lot of attention is given to data science and developing sophisticated new algorithms, but in many cases just having more data beats better algorithms. (make point on collecting more consumer interaction as well as transaction data, as an example) In addition, competitive advantage is decided by very small percentages. Just a 1% improvement in fraud detection can mean hundreds of millions of dollars in savings. A 0.5% lift in advertising effectiveness means millions in new product sales and profitability. The same can be applied to customer churn, disease diagnosis, and more.
  6. The infrastructure layer is often overlooked, but is critical to supporting any successful big data stack. The storage layer is key to the functionality and performance of the data layer. The data layer brings together different technologies and data sources, but integration remains a pain point. The end user interacts with the analytics layer to analyze the data and extract business insights. Perfect time to tee up MapR as differentiator for operations. Hadoop doesn’t fit neatly in one layer of the stack. Hadoop is emerging as its own technology stack that spans analytics, data and storage. MapR Hadoop has the broadest span of any distribution: lowest-level support in storage, managing disk spindles directly to optimize speed; most comprehensive support for open source projects for analytics or data management; differentiated M7 tables functionality that improves the latency and stability of HBase.
  7. MapR’s innovations have also expanded the use cases that are possible with Hadoop. We support the full Hadoop API set, and more. MapR provides support for NFS, so any file-based application can access the cluster with no changes or rewrites required. MapR provides ODBC support, so any database application or SQL-based tool can access and manipulate data in a MapR cluster. MapR supports real-time streaming access. This greatly expands the applications that are possible with Hadoop, moving beyond a batch limitation. Finally, the full HA, DR and data protection capabilities of MapR allow mission-critical apps to be deployed safely and allow administrators to meet stringent SLA targets.
  8. The power of MapR begins with the power of open source innovation and community participation. In some cases MapR leads the community, in projects like Apache Mahout (machine learning) or Apache Drill (SQL on Hadoop). In other areas, MapR contributes to and integrates Apache and other open source software (OSS) projects into the MapR distribution, delivering a more reliable and performant system with lower overall TCO and easier system management. MapR releases a new version with the latest OSS innovations on a monthly basis. We add 2-4 new Apache projects annually, as new projects become production ready and based on customer demand.
  9. The power of MapR begins with the power of open source innovation and community participation. In some cases MapR leads the community, in projects like Apache Mahout (machine learning) or Apache Drill (SQL on Hadoop). In other areas, MapR contributes to and integrates Apache and other open source software (OSS) projects into the MapR distribution, delivering a more reliable and performant system with lower overall TCO and easier system management. MapR releases a new version with the latest OSS innovations on a monthly basis. We add 2-4 new Apache projects annually, as new projects become production ready and based on customer demand.
  10. With MapR, Hadoop is lights-out data center ready. MapR provides five 9s (99.999%) of availability, including support for rolling upgrades, self-healing and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. There is end-to-end checksumming, so data corruption is automatically detected and corrected with MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations. Every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
  11. HP.com has a case study dedicated to clickstream analytics. It talks about “Apache Hadoop”, which is actually MapR: http://www.vertica.com/wp-content/uploads/2013/02/HP_BigData_casestudy.pdf Objectives: how to make HP.com better and more sticky, to improve cross-sell and upsell; improved ability to identify and correct issues with website hardware or software, which reduces risks of degraded customer experience and lost sales; improved ability to deliver interactive, personalized website experience, which improves sales conversions and drives sales and revenue. “We capture 11 to 12 billion clicks per month,” Lormand says. To fully support trending and comparative analysis, HP must store around five years’ worth of clickstream data; analysts typically want to work with about 15 months’ worth at a time to perform year-over-year trend analysis. This allows the analysts to account for seasonality and show correlation to the previous year’s traffic. Now HP is better equipped to improve its website functionality and architecture. It can more easily correlate events across its server farms, for example, which will allow it to identify and isolate anomalies that will yield insights into how website functionality is affecting user interactions. “Our HP Vertica solution gives us a true, end-to-end picture of our environment,” says Lormand. “And because it gives us faster results, we can respond to issues more quickly.” HP will be able to better tailor its website interactivity to the needs of individual visitors, delivering a more precise and granular shopping experience. In the past, for example, the site guided visitors to information on the basis of broad categories. If the visitor seemed to fit the profile of a typical retail customer, that visitor would be guided to one set of solutions. Visitors fitting the profile of a home office user would be led to a different subset of products. But some visitors don’t fit neatly into these categories. Now, thanks to the insight gained via the HP Vertica solution, HP can build website functionality that ensures the site responds appropriately to all kinds of visitors. And this, in turn, will enhance visitor satisfaction and improve sales conversion rates. Full MapR HP case study at http://www.mapr.com/sites/default/files/mapr_case_study_hp_4.pdf HP leverages MapR as a low-cost, massive storage platform to integrate, consolidate, and analyze data from multiple sources. Its “data lake” has enabled the development of new solutions and client offerings, which are helping to improve the overall HP customer experience across all touch-points. After a comprehensive evaluation of Hadoop vendors, HP selected MapR as the clear choice for its performance, high availability, disaster recovery, manageability, and scalability.
  12. http://www.datanami.com/datanami/2014-02-21/a_peek_inside_cisco_s_hadoop_security_machine.html
  13. 20 TB per day
  14. At MapR, our main focus is not to sell you services, but to make your Hadoop and big data projects successful. This is before, during, and after you go into production. Before: Education services provide instructor-led courses in a variety of formats. We have 3-day training on Hadoop development and administration, but what is DIFFERENT is that we have web-based training (WBT) as well. Unlike other companies that want to make money on services and continuous education, we are primarily a product company. With WBT, you can get more people up to speed on Hadoop at their own pace, without high travel expenses. During: We also have both data science and data engineering teams to help with any phase of your project: use case discovery, implementation, data migration, data modeling, machine learning, HBase schema design, application analysis, and performance tuning. Again, our focus is not to build and manage your cluster, but to do knowledge transfer, help with the heavy lifting, and make you self-sustaining. After: Support. MapR is focused on raising the bar for product and support capability in the world of Apache Hadoop. MapR delivers highly available resources to assist in all aspects of product deployment and usage. We created both a breakthrough product and a support team to deliver the high level of mission-critical support you expect. MapR's support team offers: 24x7 community, phone and email support options, with staffing in San Jose, India, and Japan; on-demand patches and proactive update notification; online incident submission and response; license management; on-site installation and training; local language support.
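
Note 7 says that, with NFS support, any file-based application can access the cluster with no changes or rewrites. A minimal sketch of what that looks like from the application side, assuming the cluster has been NFS-mounted at some local directory: the program below uses only ordinary file I/O and has no Hadoop-specific code. The mount point, file name and `count_lines` helper are hypothetical; a temporary directory stands in for the mount so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def count_lines(mount_root: str, relative_path: str) -> int:
    """Count lines in a file using plain file I/O; nothing here is Hadoop-specific."""
    path = Path(mount_root) / relative_path
    with path.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    # A temporary directory stands in for the (hypothetical) NFS mount point.
    with tempfile.TemporaryDirectory() as fake_mount:
        sample = Path(fake_mount) / "clickstream.log"  # hypothetical file name
        sample.write_text("click 1\nclick 2\nclick 3\n", encoding="utf-8")
        print(count_lines(fake_mount, "clickstream.log"))  # prints 3
```

On a real deployment the only change would be passing the actual mount path instead of the temporary directory, which is the point the note is making about unmodified file-based applications.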