This document discusses the changing landscape of data management as the volume of data grows exponentially. It introduces the concept of "Total Data", which advocates a flexible approach to data management that processes all applicable data across operational databases, data warehouses, Hadoop and archives. The trends driving data management include a greater understanding of data's value, improved processing capabilities and the rise of machine-generated data. New approaches are needed to access and analyze large datasets virtually, at lower cost. RainStor provides a specialized database that can reduce, retain and retrieve large volumes of historical structured data at 10x lower cost than alternatives.
Smarter Management for Your Data Growth
1. Smarter Management for Your Data Growth: Retain Critical Data Online at a Fraction of the Cost (April 2011)
2. Agenda: Introductions; Changing Data Management Landscape & Trends; From Operational to Analytical; Cloud and Hadoop: Where Do They Fit?; RainStor and How It Works; Analytics; Data Retention Use-Case; Economics; Q&A. Speakers: Matt Aslett, The 451 Group; Deirdre Mahon, VP Marketing, RainStor; Ramon Chen, VP Product Management, RainStor.
4. The 451 Group: 451 Research is focused on the business of enterprise IT innovation. The company's analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments. Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research. The Uptime Institute is 'The Global Data Center Authority' and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities. TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide. ChangeWave Research is a research firm that identifies and quantifies 'change' in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
5. Overview: The changing data management landscape. One overarching trend, Total Data, impacting four technology areas: operational database, analytic database, data archiving and machine-generated data. The trends driving data management.
6. Trends driving data management: The volume, variety and velocity of data have never been greater, and they are still growing. The value of data has never been better understood. The capabilities for processing data have never been better: higher processor performance and density are enabling advanced processing on commodity hardware; software enhancements are designed to make the best use of processing performance and scalable architecture; and advanced, in-database analytics bring processing to the data, reducing latency and improving efficiency. The data deluge problem is also a big data opportunity.
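The point that in-database analytics "bring processing to the data" can be illustrated with a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in database (an assumption: the slide names no particular product). Both functions compute the same per-region totals; the first ships every row to the application and aggregates there, the second pushes the GROUP BY into the database so only one row per region comes back. The table name sales and its columns are hypothetical.

```python
import sqlite3


def build_sales_db() -> sqlite3.Connection:
    """Create a small in-memory table standing in for an operational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    rows = [("north", 100), ("south", 250), ("north", 50), ("east", 75), ("south", 25)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def totals_client_side(conn: sqlite3.Connection) -> dict:
    """Move every row to the application, then aggregate in Python."""
    totals: dict = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_in_database(conn: sqlite3.Connection) -> dict:
    """Push the aggregation into the database; only one row per region is returned."""
    cur = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(cur.fetchall())


conn = build_sales_db()
print(totals_client_side(conn) == totals_in_database(conn))  # True
```

On a toy table the difference is invisible, but the rows transferred by the client-side version grow with the table, while the in-database version returns only one row per group.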
7. Introducing Total Data: A concept defined by The 451 Group to describe new approaches to data management that go beyond restrictive silos. It reflects the changing data management landscape as pragmatic choices are made about data storage and analysis techniques: processing any data that might be applicable to analytics, whether in the operational database, data warehouse, Hadoop or archive; structured, semi-structured or unstructured; relational or non-relational; on-premise or in the cloud. Inspired by 'Total Football'.
8. Total Football meets Total Data: "You make space, you come into space. And if the ball doesn't come, you leave this space and another player will come into it." (Bernardus Hulshoff, Ajax 1966-77). Abandonment of restrictive (self-imposed) rules about individual roles and responsibility; enabled and relied on fluidity and flexibility to respond to changing requirements; reliant on, and exploited, improved performance levels.
13. Infrastructure primarily exists to support the data/application layer [Diagram: enterprise app, operational database, data cleansing/sampling/MDM, EDW and data archive on top of infrastructure]
17. Polyglot persistence: use the most appropriate data storage for the application [Diagram: enterprise app and reporting/BI over distributed data, several operational databases, data cleansing/sampling/MDM, EDW and data archive on infrastructure]
20. Data warehouse administrators are fighting a losing battle for control [Diagram: enterprise app, reporting/BI and reporting tools over distributed data, operational databases, several analytic databases, data cleansing/sampling/MDM, EDW and data archive on infrastructure]
22. Advanced in-database analytics bring processing to the data, reducing latency and improving efficiency [Diagram: enterprise app, reporting/BI and reporting tools over distributed data, operational databases, analytic databases, data cleansing/sampling/MDM, EDW and data archive on infrastructure]
24. Taking further advantage of hardware economics [Diagram: the same architecture with Hadoop added alongside the operational databases, analytic databases, EDW and data archive]
26. Greater acceptance that the EDW is part of a broader data analytics architecture [Diagram: enterprise app and reporting/BI over distributed data, Hadoop, operational databases, analytic databases, data cleansing/sampling/MDM, EDW and data archive on infrastructure]
27. Data location, data location, data location: This is not the end of the EDW, but the EDW is one of many sources of BI rather than the only source. The issue of data location becomes paramount. Choose the right storage technology, both software and hardware: EDW, Hadoop or archive; on-premise or in the cloud; memory, disk or SSD. Understand the requirements: the value and temperature of the data, whether the data can be queried using existing tools and skills, and cost.
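Choosing storage by the "value and temperature of the data" amounts to a placement rule. The Python sketch below is a hypothetical illustration, not a method from the slides: it treats query frequency as the temperature, and the thresholds, the DataSet fields and the tier labels are all assumptions chosen only to mirror the options listed above.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only to illustrate the idea.
HOT_QUERIES_PER_DAY = 100.0
WARM_QUERIES_PER_DAY = 1.0


@dataclass
class DataSet:
    name: str
    queries_per_day: float      # access frequency, used here as a proxy for "temperature"
    requires_sql_access: bool   # must it stay queryable with existing tools/skills?


def choose_tier(ds: DataSet) -> str:
    """Map a data set's temperature to one of the storage options named on the slide."""
    if ds.queries_per_day >= HOT_QUERIES_PER_DAY:
        return "EDW on memory/SSD"
    if ds.queries_per_day >= WARM_QUERIES_PER_DAY:
        return "EDW or Hadoop on disk"
    # Cold data: keep it online if it must still be queried, otherwise archive it.
    return "online repository" if ds.requires_sql_access else "offline archive"


for ds in [DataSet("current_quarter_sales", 500, True),
           DataSet("last_year_sales", 5, True),
           DataSet("call_records_2005", 0.1, True)]:
    print(ds.name, "->", choose_tier(ds))
```

In practice cost per TB would enter the rule as well; it is left out here to keep the sketch small.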
28. EDW requirements/characteristics: high-performance query/analysis response; ability to support multiple users concurrently; capacity for multi-terabyte storage and scale; fast data load and staging for data transformation; ability to operate with BI/analytics tools; security and governance; cost of $20k-$50k per TB. Alternatives: do nothing and suffer the consequences; deploy appliances and/or Hadoop for specific use-cases; or offload to an online repository.
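The $20k-$50k per TB figure scales linearly with volume, which is why offloading comes up as an alternative. A small worked example in Python; the 100 TB warehouse size is a hypothetical input, not a figure from the presentation.

```python
EDW_COST_PER_TB_LOW = 20_000   # USD per TB, low end of the slide's range
EDW_COST_PER_TB_HIGH = 50_000  # USD per TB, high end of the slide's range


def edw_cost_range(terabytes: float) -> tuple:
    """Return the (low, high) EDW cost for a given data volume."""
    return (terabytes * EDW_COST_PER_TB_LOW, terabytes * EDW_COST_PER_TB_HIGH)


low, high = edw_cost_range(100)  # hypothetical 100 TB warehouse
print(f"${low:,.0f} to ${high:,.0f}")  # $2,000,000 to $5,000,000
```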
31. Previously little need for querying/analytics [Diagram: enterprise app and reporting/BI over distributed data, Hadoop, operational databases, analytic databases, data cleansing/sampling/MDM, EDW and data archive on infrastructure]
33. Focus shifts to how to enable querying easily and cost-effectively
34. Becomes an online repository for historical data [Diagram: the same architecture, with the data archive now labeled as a data repository feeding additional reporting]
36. Machine-generated data: an untapped source of data [Diagram: enterprise app and reporting/BI over distributed data, Hadoop, operational databases, analytic databases, data cleansing/sampling/MDM, EDW and data repository on infrastructure]
38. Likely to transform into data-generating and data-processing infrastructure as analytics capabilities are applied directly to the data source [Diagram: the same architecture, with the infrastructure layer relabeled "Datastructure"]
41. Greater opportunities for business intelligence [Diagram: the datastructure architecture extended with a cloud infrastructure hosting Hadoop/DW, data archive and analytic databases, each with its own reporting/BI]
42. Data location, data location, data location: Avoid data movement and duplication, and retain governance. Virtual data marts and data clouds. Data virtualization to provide access to multiple data sources.
43. Data virtualization [Diagram: on-premise and cloud data sources (operational databases, Hadoop, analytic databases, EDW, data repository, Hadoop/DW, data archive) each serving separate reporting/BI]
44. Data virtualization [Diagram: a data virtualization layer between reporting/BI and the underlying on-premise and cloud sources, exposing them as virtual data marts]
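The idea of a virtual data mart, one queryable view over several stores without copying data, can be sketched with SQLite, which can attach a second database to one connection and define a TEMP view across both. This is a single-engine stand-in (an assumption; real data virtualization products federate across different engines), and the orders tables and quarter values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                      # stands in for the EDW
conn.execute("ATTACH DATABASE ':memory:' AS archive")   # stands in for the historical archive

conn.execute("CREATE TABLE main.orders (id INTEGER, quarter TEXT, amount INTEGER)")
conn.execute("CREATE TABLE archive.orders (id INTEGER, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?, ?)",
                 [(3, "2011Q1", 40), (4, "2011Q1", 60)])
conn.executemany("INSERT INTO archive.orders VALUES (?, ?, ?)",
                 [(1, "2009Q4", 10), (2, "2010Q2", 30)])

# A TEMP view may reference tables in any attached database: this is the "virtual data mart".
# No rows are copied; each query against the view reads both sources in place.
conn.execute("""
    CREATE TEMP VIEW all_orders AS
        SELECT id, quarter, amount, 'edw' AS source FROM main.orders
        UNION ALL
        SELECT id, quarter, amount, 'archive' AS source FROM archive.orders
""")

for source, total in conn.execute(
        "SELECT source, SUM(amount) FROM all_orders GROUP BY source ORDER BY source"):
    print(source, total)
```

The reporting tool only sees all_orders, so moving data between the EDW and the archive changes the view definition, not the reports.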
45. Who is RainStor? A specialized database for cost-effective reduction, retention and on-demand retrieval of historical structured data, at 10x less cost. OEM partner model. Cloud or on-premise.
59. ISS: Big data volumes need to be online and queryable. Found the needle, but where's the haystack? Volumes are rising, data is regulated, and infrastructure needs are reaching telco scale. Multi-billions of records; strict compliance; RDBMSs break; analytics required; tens of petabytes retained.
60. How Does RainStor Do It? Reduce: SIZE, with massive de-dupe (~97% savings in storage); HARDWARE, on commodity server/disk infrastructure; RESOURCES, without specialist DBA support. Retain: PRESERVED, massive record volumes in original form; IMMUTABLE, tamper-proofed with an audit trail; CONFIGURABLE, with retention and expiry policies. Retrieve: STANDARDS, SQL and BI tools via ODBC/JDBC; PERFORMANT, fast queries for large, complex data sets; FLEXIBLE, with schema evolution and point-in-time access.
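"Tamper-proofed with an audit trail" is commonly implemented with a hash chain, where each entry's hash covers the previous hash, so altering any stored record breaks verification from that point on. The slides do not say how RainStor does this; the Python sketch below is a generic, hypothetical illustration using SHA-256 from the standard library, and the record fields are invented.

```python
import hashlib
import json

GENESIS = "0" * 64  # hash used before the first entry


def record_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append-only write: each new entry is bound to everything before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "hash": record_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every hash; any edited record makes the chain inconsistent."""
    prev = GENESIS
    for entry in chain:
        if record_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


chain: list = []
append(chain, {"id": 1, "amount": 40000})
append(chain, {"id": 2, "amount": 35000})
print(verify(chain))              # True
chain[0]["record"]["amount"] = 1  # tamper with a stored record
print(verify(chain))              # False
```

A hash chain makes tampering detectable rather than impossible; a real system would also protect the latest hash (for example by storing it separately) so an attacker cannot simply rebuild the chain.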
64. Fast queries in stored format without re-inflation [Diagram: sample record values (Peter, Paul, John, Smith, Brown, Pharma, Finance, $40,000, $35,000) illustrating the de-duplicated stored format]
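Querying "without re-inflation" means evaluating predicates directly on the reduced representation. The sketch below uses plain dictionary encoding, a simpler, well-known technique swapped in for illustration; it is not RainStor's de-duplication algorithm, which the slides do not describe. Each distinct value is stored once, rows hold small integer codes, and an equality filter is answered by translating the search value to its code and scanning codes, never rebuilding the original strings.

```python
def dictionary_encode(values: list) -> tuple:
    """Store each distinct value once; rows hold integer codes instead of repeated values."""
    dictionary: dict = {}
    codes = []
    for v in values:
        code = dictionary.setdefault(v, len(dictionary))  # new values get the next code
        codes.append(code)
    return dictionary, codes


def count_matching(dictionary: dict, codes: list, value) -> int:
    """Answer an equality predicate on the encoded column without decoding any rows."""
    code = dictionary.get(value)
    if code is None:
        return 0  # value never stored, so no row can match
    return sum(1 for c in codes if c == code)


departments = ["Pharma", "Finance", "Pharma", "Pharma", "Finance"]
dictionary, codes = dictionary_encode(departments)
print(dictionary)                                   # {'Pharma': 0, 'Finance': 1}
print(codes)                                        # [0, 1, 0, 0, 1]
print(count_matching(dictionary, codes, "Pharma"))  # 3
```

The more values repeat, the more the dictionary stays small relative to the rows, which is where the storage savings of de-duplication come from.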
66. Run query on RainStor and import results to data warehouse
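The analytics pattern on this slide, run the heavy query where the history lives and load only the result into the warehouse, can be sketched with two separate SQLite connections standing in for the repository and the data warehouse (an assumption; the slides specify ODBC/JDBC access but no client code). The calls and calls_by_quarter tables are hypothetical.

```python
import sqlite3

archive = sqlite3.connect(":memory:")    # stands in for the historical repository
warehouse = sqlite3.connect(":memory:")  # stands in for the data warehouse

archive.execute("CREATE TABLE calls (quarter TEXT, minutes INTEGER)")
archive.executemany("INSERT INTO calls VALUES (?, ?)",
                    [("2009Q1", 120), ("2009Q1", 30), ("2009Q2", 45)])
warehouse.execute(
    "CREATE TABLE calls_by_quarter (quarter TEXT PRIMARY KEY, total_minutes INTEGER)")

# 1. Run the (potentially large) query on the repository, where the detail rows live.
summary = archive.execute(
    "SELECT quarter, SUM(minutes) FROM calls GROUP BY quarter ORDER BY quarter"
).fetchall()

# 2. Import only the small result set into the warehouse, committing on success.
with warehouse:
    warehouse.executemany("INSERT INTO calls_by_quarter VALUES (?, ?)", summary)

print(warehouse.execute("SELECT * FROM calls_by_quarter ORDER BY quarter").fetchall())
# [('2009Q1', 150), ('2009Q2', 45)]
```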
70. Add more data sources for broader analysis [Diagram: source DB (e.g. Oracle) and analytics/DW holding 5 quarters of data, alongside 50 quarters of retained history]
71. RainStor Cloud: (1) Compressed, de-duplicated data is sent to the cloud, resulting in quicker and cheaper uploads. (2) Encrypted data is stored in private containers, ensuring security and easy management. (3) Data is accessed on demand using standard SQL tools, leveraging the elasticity of the cloud. [Diagram: VM software appliance sending data to Amazon S3 for storage, with search via EC2 and ODBC/JDBC access]
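Why the order is "compress, then encrypt, then upload": well-encrypted output looks random, and random bytes do not compress. The Python sketch below shows the effect with the standard zlib module; os.urandom stands in for ciphertext (an assumption; no real encryption is performed, and the sample record line is invented). The actual upload to a cloud store is not shown.

```python
import os
import zlib


def prepare_for_upload(raw: bytes, level: int = 9) -> bytes:
    """Compress before any encryption step: ciphertext would no longer compress."""
    return zlib.compress(raw, level)


# Repetitive, record-like data compresses very well.
rows = b"".join(b"2011-04-01,Pharma,Smith,40000\n" for _ in range(10_000))
compressed = prepare_for_upload(rows)
print(len(rows), len(compressed))

# Random bytes (a stand-in for encrypted data) do not shrink at all.
noise = os.urandom(len(rows))
print(len(zlib.compress(noise, 9)) >= len(noise))  # True
```

Fewer bytes to send is what makes the upload quicker and cheaper; encryption applied afterwards protects the data without undoing the reduction.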
73. Quick summary: The growing volume, variety and velocity of data is a problem, but it is also an opportunity. It requires a broader approach to data management: deploy appliances and Hadoop for specific use-cases, and an online repository for historical data. 'Datastructure' will become increasingly valuable, not only as a source of data but also as a source of intelligence. Data location, and the role of data virtualization, will come into greater focus.
De-dupe and reduction; any storage/platform; cloud enabled; limitless data volumes; fast load (ingestion rates); high-performance SQL query; immutable, compliant store.
So if we take a look at Matt's earlier high-level architecture diagram, I think it's worth pointing out the key areas where RainStor technology can be applied. At the top, we have a RainStor repository, which can be deployed alongside the RDBMS, so that applications can be archived or retired, saving cost by compressing the data to a much smaller footprint. Our INFA partnership focuses predominantly on this area and retires a large number of applications, such as Oracle E-Business Suite. On the lower part of the screen, RainStor can be deployed as the leading repository to store long-term historical data for EDWs, and the same data sets can additionally be stored in the cloud.
Security industry: The combination of increasing cybercrime, changing regulations and public exposures is increasing the attention and resources dedicated to data security. Over the next three years, data security issues (and the related application security) are expected to account for over 60% of new enterprise security spending. This includes spending on new technologies and excludes maintenance of existing technologies such as firewalls and antivirus, which account for most current security costs. Data and business application security will drive most of the new growth of the security market over the next 3-5 years. Business network traffic for 2010: more than 3,800 PB per month, including more than 2,500 PB of internet traffic, more than 1,200 PB of WAN traffic and more than 58 PB of mobile traffic; Cisco forecasts a 20% CAGR. Data breaches are common: 95% of records were stolen externally, 90% involved malware, 70% were uncovered by outsiders, and 50% went unnoticed for months. CSPs: Global mobile data traffic will increase 26-fold between 2010 and 2015. Mobile data traffic will grow at a compound annual growth rate (CAGR) of 92 percent from 2010 to 2015, reaching 6.3 exabytes per month by 2015. Last year's mobile data traffic was three times the size of the entire global Internet in 2000.
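The two mobile forecasts above are consistent with each other: compounding 92% annual growth over the five years from 2010 to 2015 gives roughly a 26-fold increase. The Python check below also back-derives the implied 2010 baseline from the 6.3 exabytes per month figure; that baseline is arithmetic on the quoted numbers, not a separately sourced statistic.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1 + cagr) ** years


multiple = growth_multiple(0.92, 2015 - 2010)
print(round(multiple, 1))  # 26.1

# Implied 2010 mobile traffic, given 6.3 EB/month in 2015.
baseline_2010 = 6.3 / multiple
print(round(baseline_2010, 2))  # 0.24 (EB per month)
```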