1. Presented by
EDW Technology & Process Recommendation
Over the last few years, organizations across the public and private sectors have made a
strategic decision to turn big data into competitive advantage. The challenge of
extracting value from big data is similar in many ways to the age-old problem of
distilling business intelligence from transactional data. At the heart of this challenge
is the process used to extract data from multiple sources, transform it to fit your
analytical needs, and load it into an Enterprise Data Warehouse for subsequent
analysis. This process is known as “Extract, Transform & Load” (ETL), and Smartmonk
recommends the Apache Hadoop ecosystem for it.
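The three ETL steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the proposed Hadoop implementation: the sample data, the `orders` table, and the function names are assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or database export.
ORDERS_CSV = "order_id,customer,amount\n1,alice,10.50\n2,bob,4.25\n3,alice,7.00\n"

def extract(text):
    """Extract: parse raw CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize customer names and convert amounts to integer cents."""
    return [
        (int(r["order_id"]), r["customer"].strip().title(), round(float(r["amount"]) * 100))
        for r in rows
    ]

def load(conn, records):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl():
    """Run the three steps in order against an in-memory stand-in warehouse."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(ORDERS_CSV)))
    return conn
```

Once `run_etl()` has returned, the loaded table can be queried like any warehouse table, for example with `SELECT customer, SUM(amount_cents) FROM orders GROUP BY customer`.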
Big Data analytics and the Apache Hadoop open source
project are rapidly emerging as the preferred solution to
address the business and technology trends that are
disrupting traditional data management and processing.
Enterprises can gain a competitive advantage by
being early adopters of big data analytics.
Contact Name: B N Reddy | eMail: bnreddy@smartmonk.co | mobile: 0091-9160000748 | Website: www.smartmonk.co
2. Big Data is Different than Business Intelligence
BIG DATA | TRADITIONAL BI
Experimental, Ad Hoc | Repetitive
Mostly Semi-Structured | Structured
External + Operational | Operational
10s of TBs to 100s of PBs | GBs to 10s of TBs
3. Questions from Business will Vary
From past to future:
What happened? | Reporting, Dashboards
Why did it happen? | Forensics & Data Mining
What is happening? | Real-Time Analytics
Why is it happening? | Real-Time Data Mining
What is likely to happen? | Predictive Analytics
What should I do about it? | Prescriptive Analytics
4. Hadoop Adoption in the industry
[Figure: Hadoop adoption by year, 2007 2008 2009 2010]
5. Traditional EDW Architecture
6. Proposed Hadoop Architecture
LOGICAL ARCHITECTURE
Processing: MapReduce
Storage: HDFS
7. Proposed Hadoop Architecture
PROCESS FLOW
8. Proposed Hadoop Architecture
PHYSICAL ARCHITECTURE
9. Traditional ETL Architecture
11. MapReduce Provides
• Automatic parallelization and distribution
• Fault tolerance
• Status and monitoring tools
• A clean abstraction for programmers
• Google Technology Roundtable: MapReduce
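The "clean abstraction for programmers" can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use the Hadoop API; the function names are hypothetical. In a real cluster the framework would run many map and reduce tasks in parallel on different nodes and handle the shuffle, retries, and monitoring itself, so the programmer only supplies the map and reduce logic.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Chain map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases can be distributed across machines without changing the logic.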
12. Hadoop Vs RDBMS
Hadoop | RDBMS
Open source | Mostly proprietary
Ecosystem: a framework and suite of (mostly) Java-based projects | One project with multiple components
Designed to support a distributed architecture | Designed around a client-server architecture
Designed to run on commodity hardware | Heavy usage calls for high-end servers
Cost efficient | Costly
High fault tolerance | Legacy procedures
Based on a distributed file system such as GFS or HDFS | Relies on the OS file system
Very good support for unstructured data | Needs structured data
Flexible, evolvable and fast | Needs to follow defined constraints
Still evolving | Has many very good products, such as Oracle and SQL
Suitable for batch processing | Real-time read/write
Sequential write | Arbitrary insert and update
13. Comparing RDBMS and MapReduce
 | Traditional RDBMS | MapReduce
Data Size | Gigabytes (terabytes) | Petabytes (exabytes)
Access | Interactive and batch | Batch
Updates | Read/write many times | Write once, read many times
Structure | Static schema | Dynamic schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
DBA Ratio | 1:40 | 1:3000
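The "Structure" row above (static versus dynamic schema) can be made concrete with a small sketch. It is an illustration under stated assumptions, not part of the proposal: SQLite stands in for a traditional RDBMS, a list of JSON lines stands in for raw files on HDFS, and the table, field, and function names are hypothetical.

```python
import json
import sqlite3

# Static schema: the table layout is fixed before any data is written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (username TEXT NOT NULL, action TEXT NOT NULL)")
conn.execute("INSERT INTO events VALUES (?, ?)", ("alice", "login"))

def insert_with_extra_field():
    """Try to write a record with a column the schema does not define."""
    try:
        conn.execute(
            "INSERT INTO events (username, action, device) VALUES (?, ?, ?)",
            ("bob", "login", "mobile"),
        )
        return True
    except sqlite3.OperationalError:
        return False  # rejected: the table has no "device" column

# Dynamic schema: raw records are stored as-is and interpreted when read.
RAW_LOG = [
    '{"username": "alice", "action": "login"}',
    '{"username": "bob", "action": "login", "device": "mobile"}',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time; fields missing from a record become None."""
    return [tuple(json.loads(line).get(f) for f in fields) for line in lines]
```

With the static schema, the new `device` field is rejected until the table is altered. With the dynamic schema, the same field can be stored immediately and picked up later by any job that asks for it, at the cost of weaker integrity guarantees.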