2. Big Data Defined
Volume
• Datasets that grow too large to easily manage in traditional RDBMS
• TBs, PBs, ZBs
Velocity
• Large volume streaming data that can overwhelm traditional BI & ETL processes
Variety
• Data sources extraneous to traditional business systems that can be unstructured and require text analytics
Value
• Big Data can have a transformational effect on business when the proper systems and processes are put in place
3. Big Data vs. Classic BI
What is different between classic DW/BI and Big Data Analytics?
Businesses today treat data warehouse & business intelligence as a must-have reporting and
operational capability
Businesses that are not fully mature in the BI lifecycle may struggle with Big Data
Big Data projects look for untapped analytics, not BI dashboards
SCALE: Think Volume, Variety and Velocity
Yahoo! uses Microsoft SQL Server & Analysis Services, with Hadoop, Oracle & Tableau
38,000 machines distributed across 20 different clusters
2-petabyte Hadoop cluster that feeds 1.2 terabytes of raw data each day into Oracle RAC
Data is compressed and 135 gigabytes of data per day is sent to a SQL Server 2008 R2 Analysis
Services cube
Cube produces 24 terabytes of data each quarter
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000001707
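To put the Yahoo! figures above in proportion, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000 GB) and a 90-day quarter, neither of which is stated on the slide, and the function names are illustrative only. The slide attributes the drop from 1.2 TB to 135 GB to compression; the sketch simply reports the overall ratio without claiming how it is achieved.

```python
# Figures taken from the slide; unit convention and quarter length are assumptions.
RAW_GB_PER_DAY = 1200.0        # 1.2 TB/day fed from Hadoop into Oracle RAC (decimal TB assumed)
CUBE_FEED_GB_PER_DAY = 135.0   # GB/day sent to the Analysis Services cube
DAYS_PER_QUARTER = 90          # assumption: a 90-day quarter


def reduction_ratio(raw_gb: float, loaded_gb: float) -> float:
    """Return how many GB of raw data correspond to one GB loaded into the cube."""
    return raw_gb / loaded_gb


def quarterly_feed_tb(gb_per_day: float, days: int) -> float:
    """Return the total volume sent to the cube over a quarter, in TB."""
    return gb_per_day * days / 1000.0


if __name__ == "__main__":
    ratio = reduction_ratio(RAW_GB_PER_DAY, CUBE_FEED_GB_PER_DAY)
    feed_tb = quarterly_feed_tb(CUBE_FEED_GB_PER_DAY, DAYS_PER_QUARTER)
    print(f"Raw-to-cube reduction: about {ratio:.1f} to 1")
    print(f"Data sent to the cube per quarter: about {feed_tb:.2f} TB")
```

Under these assumptions the daily feed shrinks by roughly 9 to 1 before it reaches the cube, and about 12 TB per quarter goes in against the 24 TB per quarter the slide says the cube produces.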
5. Go Beyond Dashboards. Provide Advanced Analytics.
• Large number of data points adds new business value
• Big Data advanced analytics requires a tool that can sample complex data sources
• Must provide quick aggregations of large data sets that are easily consumed by the human eye
• Must provide “data discovery” for ad-hoc analysis
Example tools: Tableau, Microsoft Power View, QlikView
6. Marketing Samples
• Enhance marketing campaigns with Big Data
• Social analytics, customer analytics, targeted marketing, brand sentiment
• Big Data has proven transformational for marketing organizations (Razorfish, Yahoo!, NBC, [x+1])
Web Analytics from Google Analytics
7. Anexinet Big Data Offerings
Strategy Engagement
• Customer stakeholder interviews & interactive sessions
• Define Big Data Requirements
• Design Big Data Strategy
• Deliver Strategy & Roadmap Documents
Starter Solution
• Let Anexinet handle the hardest parts of a Big Data solution
* Getting started
* Collecting & processing data
* Uncovering business value from Big Data
Big Data Project Engagement
• End-to-end Big Data project
* Big Data Discovery
* Big Data Platform
* Big Data Analytics
* Big Data Visualizations
8. Partnerships
Big Data Platforms
• EMC Greenplum
• Hortonworks (OSS, MSFT, HP)
• Cloudera (OSS, Oracle, HP)
Big Data Databases
• HP Vertica
• EMC Greenplum
• Microsoft PDW
• Oracle Exalytics
• Oracle Big Data Appliance
Big Data Visualizations
• QlikView
• Tableau
• Microsoft PowerPivot
• Microsoft Power View
9. A Credible Partner to Deploy Big Data Solutions
Security
• Ensure privacy of PII
• Conform Big Data solution to your enterprise security standards
Integration
• ETL / ELT
• Integrate Hadoop into your DW & Analytics environments
• Integrate Big Data into your IT investments
Configuration
• Configure the Big Data environment to maximize throughput, performance and analytics to meet your stated SLA goals
Governance
• Ensure Data Quality
• MDM
• Process Governance
11. Big Data Buzzword Glossary
Big Data: Think 3 V’s, unstructured data, data that is not currently managed in the DW. This is the data that
companies need to do game-changing analytics.
Big Data Analytics: Business insights gained from mining Big Data to transform business processes
Columnar: Column-oriented databases that are used in Big Data scenarios because of their speed and
compression capabilities, e.g. HP Vertica, HBase
Hadoop: Apache open-source framework for Big Data processing. Made up of multiple components. The
leading Big Data platform. Marketed by Cloudera & Hortonworks.
In-memory DB: A database that resides fully in memory, eliminating IO bottlenecks. Very important in Big
Data Analytics systems, e.g. Microsoft PowerPivot, SSAS 2012, SAP HANA
MapReduce: Distributed data programming and processing framework. A key aspect of processing Big
Data is using a MapReduce framework across distributed clusters of commodity servers. Available as
open source in the Hadoop framework and in various Hadoop distribution flavors.
MPP: Massively Parallel Processing database engine, mostly used for data warehouse & BI workloads,
e.g. SQL Server PDW, IBM Netezza, Teradata
NoSQL: Schemaless data store (often key-value) built for quick, eventually consistent database writes rather
than full ACID transactions. Big Data systems will use these to store data coming in from sources that dump
large amounts of data quickly, e.g. Cassandra, MongoDB.
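To make the MapReduce entry above more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using a word count as the example. It does not use Hadoop or any of its APIs; the function names are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big value", "data velocity"]
    print(word_count(docs))
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split across commodity servers without the pieces needing to coordinate, which is the property the glossary entry refers to.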