This document provides an overview of open source data warehousing and business intelligence (DW/BI). It defines cloud computing and explains how open DW consists of pre-designed data warehouse architectures that are free to use. Open DW reduces costs and risks by shortening design and development time. While the architectures are free, vendors charge for services like customization, support, and maintenance. The document discusses the need for and benefits of open DW/BI, including faster deployment, lower costs, and mitigated risks through rapid development. It also outlines some popular open source databases, tools, and vendors in this space.
Open Source DWBI-A Primer
1. OPEN SOURCE DATA WAREHOUSE/BI: A PRIMER
Webinar session for TechGig.com
Presenter: Parthasarathi Doraisamy
Enterprise BIDI Solutions
2. CLOUD --WHAT DOES THIS MEAN?
UC Berkeley RAD Lab definition:
1. The illusion of infinite computing resources available on
demand, thereby eliminating the need for Cloud Computing
users to plan far ahead for provisioning
2. The elimination of an up-front commitment by Cloud users,
thereby allowing companies to start small and increase hardware
resources only when there is an increase in their needs; and
3. The ability to pay for use of computing resources on a short term
basis as needed (e.g., processors by the hour and storage
by the day) and release them as needed, thereby rewarding
conservation by letting machines and storage go when they are
no longer useful.
4. WHAT IS OPEN DW/BI?
Beware: "open" doesn't mean the product(s) are free!
Open DW consists of a pre-designed, pre-built data warehouse architecture that comes free.
It thereby reduces overall cost and risk by shortening design, development, and implementation time.
-> Reduces the consumer's initial development cost (DQ, ETL, BI & analytics, etc.)
But vendors charge for the related services: maintaining the DW solution, customizing it further to the exact business need, and supporting the system.
Mitigates risk through rapid development.
There are technical, social, and economic reasons that will move data warehousing, and perhaps all data models, toward "open" solutions.
5. NEED FOR OPEN DW/BI
Open data warehouse and BI development has progressed rapidly over the past few years, driven by the economic downturn.
Dynamic business changes demand faster deployment of the proposed solution.
Nowadays an "open source" product is available for almost every aspect of the BI/data warehouse stack, including architectures, which are picking up pace. (A few notable players: Talend, Pentaho, Jaspersoft, Birst, QlikView, etc.)
6. INDUSTRY STATS ON TRADITIONAL DWBI
The average cost of these projects was $2.2
million ($3.1 million today, adjusted for inflation).
The average payback period was 2.3
years, with over 30% experiencing a 5+ year
payback period.
The majority of respondents reported that their
data warehouses consumed enormous
resources and remained "works in progress" for
extended periods of time.
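As a rough reading of the cost and payback figures above, the sketch below assumes a simple undiscounted payback model (payback period = project cost / annual net benefit). The slide does not state which model the survey used, and the function name is hypothetical.

```python
def implied_annual_benefit(project_cost: float, payback_years: float) -> float:
    """Annual net benefit implied by a simple, undiscounted payback period.

    Assumption (not from the slide): payback_years = project_cost / annual_benefit.
    """
    if payback_years <= 0:
        raise ValueError("payback_years must be positive")
    return project_cost / payback_years


average_cost = 2.2e6     # average project cost quoted on the slide (USD)
average_payback = 2.3    # average payback period quoted on the slide (years)

annual_benefit = implied_annual_benefit(average_cost, average_payback)
print(f"Implied annual net benefit: ${annual_benefit:,.0f}")
```

Under that assumption, a project with a 5+ year payback would be returning well under half this amount per year for the same cost.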
7. NEED FOR OPEN DW/BI ….
Popular open source databases that support these open data warehouses include MySQL (and its ecosystem of add-ons), Ingres, and EnterpriseDB.
Hardware and software costs are further reduced by extending the open solution into a hosted SaaS environment.
8. ODW MODEL –A FRAMEWORK
The Open Data Warehouse Model (ODWM) provides a generic framework for delivering an open data warehouse.
This generic data warehouse model can be further fine-tuned to a specific industry.
Domain experts work on these industry-specific solutions just as in typical proprietary DW/BI solutions, but with critical differences such as the pre-design of the open DW/BI architecture (data model, ETL design, BI design) for the industry domains concerned.
9. ODW MODEL PRINCIPLE
The open data model consists of hundreds of potential dimension tables with thousands of fields, which form the "Foundation".
These open data warehouses are carefully designed to ensure the stability of the DW system and to facilitate the use of commercial ETL bridges/connectors (yet allow for interpretation through aggregation and by other means).
OLAP cubes and data marts can be constructed from the foundation as required by the business through similar bridges/connectors.
This is an opportunity for developers in their respective technology areas (i.e., ETL, BI & analytics) to come up with appropriate bridge solutions that turn the entire ODW & BI model into a functional data mart or enterprise data warehouse.
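To make the "foundation plus data mart" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are all hypothetical and far smaller than a real ODW foundation; the point is only that a data mart can be derived from dimension and fact tables by aggregation.

```python
import sqlite3

# In-memory database standing in for the warehouse; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A tiny "foundation": two dimension tables and one fact table.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date VALUES (10, 2009, 1), (11, 2009, 2);
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 10, 70.0);

    -- A data mart built from the foundation by aggregation.
    CREATE TABLE mart_sales_by_category AS
    SELECT p.category, d.year, SUM(f.amount) AS total_amount
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.year;
""")

mart_rows = conn.execute(
    "SELECT category, year, total_amount "
    "FROM mart_sales_by_category ORDER BY category"
).fetchall()
print(mart_rows)
```

In a real deployment the aggregation step would typically be handled by the ETL bridge/connector rather than hand-written SQL.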
10. ODW MODEL & ITS EXTENSIONS…..
They must allow for integration of multiple data sources of different granularity and should, in some manner, accommodate slowly changing dimensions.
Each baseline ODW database instance model can further yield a range of domain-specific (we can call them industry "slices") packaged solutions. These packages may comprise DQ, ETL, and BI solutions as outlined earlier.
These package solutions comprises of
Host the domain-specific ODW solution(s) in the cloud.
These hosted open DW/BI solutions lead us to packaged data warehouse/BI appliances.
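The slide asks that the model accommodate slowly changing dimensions without saying how. One widely used approach is a Type 2 slowly changing dimension, where each change closes the current row and appends a new version. The sketch below illustrates that technique only; the `CustomerRow` class, its fields, and `apply_scd2` are hypothetical names, not part of any ODW product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One version of a customer in a Type 2 dimension (hypothetical schema)."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[CustomerRow], customer_id: int,
               new_city: str, change_date: date) -> None:
    """Record a change by closing the current row and appending a new one."""
    current = next(
        (r for r in history
         if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return  # attribute unchanged: keep the existing version
        current.valid_to = change_date  # close the old version
    history.append(CustomerRow(customer_id, new_city, change_date))


history: List[CustomerRow] = []
apply_scd2(history, 1, "Chennai", date(2009, 1, 1))
apply_scd2(history, 1, "Bangalore", date(2010, 6, 1))
apply_scd2(history, 1, "Bangalore", date(2010, 7, 1))  # no change, no new row

for row in history:
    print(row)
```

Keeping the closed rows is what lets facts loaded before the change still join to the attribute values that were true at the time.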
12. OPEN DWBI APPLIANCES ……
The Open DWBI Appliance combines and
supports thousands of data warehouses, many
of those with hundreds of millions of records in a
scalable multi-tenant environment.
These appliances can generate complex data models and have complex algorithms built into their query engine.
Appliance vendors partner with hardware suppliers to build the appliance for maximum efficiency.
13. OPEN DWBI APPLIANCES ……
These appliances are designed to power an
on-demand software solution that needs to
support a large number of users
simultaneously and has the ability to quickly
increase capacity
Built on a shared-nothing architecture: no data is shared across nodes (servers).
Popular appliances include Netezza and Greenplum.
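The shared-nothing idea can be sketched in a few lines: rows are hash-partitioned by key so each node owns a disjoint slice of the data, each node aggregates only its own slice, and the partial results are combined at the end. This is an illustrative toy, not how any particular appliance is implemented; the node count, function names, and sample rows are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_NODES = 3  # hypothetical cluster size

Row = Tuple[str, float]  # (customer key, amount)


def node_for(key: str) -> int:
    """Deterministically map a key to the node that owns it."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def distribute(rows: List[Row]) -> List[List[Row]]:
    """Hash-partition rows so that no row lives on more than one node."""
    nodes: List[List[Row]] = [[] for _ in range(NUM_NODES)]
    for key, amount in rows:
        nodes[node_for(key)].append((key, amount))
    return nodes


def local_sum(partition: List[Row]) -> Dict[str, float]:
    """Aggregation a single node performs using only its own data."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


rows = [("acme", 10.0), ("globex", 5.0), ("acme", 2.5), ("initech", 7.0)]
partitions = distribute(rows)

merged: Dict[str, float] = {}
for partition in partitions:
    # Every key lives on exactly one node, so partial results never collide.
    merged.update(local_sum(partition))

print(merged)
```

Because all rows for a key land on the same node, adding nodes increases capacity without requiring the nodes to coordinate on shared storage.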
15. DWBI APPLIANCES –SALIENT FEATURES
High Availability and Failover Support
Designed for operation in a high-availability clustered Open DWBI
environment
Global Cache
Provides superior query performance via its massive-scale
caching capabilities
Simplified software Deployment and Upgrades in Place
Dramatically simplifies its deployment by freeing IT from having to
worry about resolving potentially complex OS compatibility
issues, library dependencies or undesirable interactions with
other applications.
16. DWBI APPLIANCES –SALIENT FEATURES….
Advanced ETL Services and a complete
analytical data warehouse with automated
warehouse generation
Cloud connectors for connecting to operational cloud applications, e.g. Salesforce.com and Google Analytics
These connectors allow for automatic uploading of data into the appliance from various sources
Live Access, which allows you to analyze data from on-premise data warehouses without uploading
18. SAAS –OPEN BI SOLUTION…..
Low-cost, open source solution.
End-to-end, integrated BI and ETL
capabilities.
Full enterprise-level support.
Flexibility of on-demand and on-premise
deployment.
Support for mobile devices as a BI platform.
Support for iterative IT and business-user
report generation process.
19. CLOUD --WHAT DOES THIS MEAN?
Depends upon how you slice it vertically
• IaaS - AWS, GoGrid, Mosso
• PaaS - Google App Engine, Microsoft Azure
• SaaS (BaaS) - Salesforce, Talend, Jaspersoft, Pentaho, BIRT, etc.
22. ODW -WHEN TO USE THE CLOUD?
Transient application lifespan or use
Quick start required
Budget pressure
Variable use/scale of application unknown
IT unavailable/unresponsive
24. KEY FINDINGS FOR BUSINESS TRANSITION TO CLOUD TECHNOLOGY (IN 2009)
By 2012, at least 50% of direct commercial revenue attributed to
open-source products or services will come from projects under a
single vendor's patronage.
Through 2011, less than 50% of Global 2000 IT organizations will
have implemented a formal open-source adoption and
management policy as part of an enterprise software asset
management strategy.
Through 2013, 50% of mainstream IT projects using open-source
software (OSS) will not achieve cost savings over closed-source
alternatives.
Through 2013, 90% of market-leading, cloud-computing providers
will depend on OSS to deliver products and services.
25. MOVING TO CLOUD-RECOMMENDATIONS
Expect vendors to play an increasing role in the governance of
many market-leading, open-source solutions during the next
several years.
Move aggressively to establish an effective enterprise adoption
policy, and bring OSS and hardware under asset management
controls.
Do not expect to automatically save money with OSS or any
technology without effective financial management. Do expect to
carefully manage open-source solutions in the appropriate
scenarios to realize total cost of ownership (TCO) advantages.
Manage cloud-based software strategies and open-source
strategies together for maximum effect. Look for synergies
between both, and the ability of OSS to move your workloads to
the cloud.
26. STRATEGIC PLANNING ASSUMPTION(S)
By 2012, at least 50% of direct commercial revenue
attributed to open-source products or services will
come from projects under a single vendor's
patronage.
Through 2011, less than 35% of Global 2000 IT
organizations will have implemented a formal open-
source adoption and management policy.
Through 2013, 50% of mainstream IT projects using
OSS will not achieve cost savings over closed-source
alternatives.
Through 2013, 90% of market-leading, cloud-
computing providers will depend on OSS to deliver
products and services.
31. HARDWARE ACCESS IN CLOUD OPEN DW/BI…
Secure access via web, RDC, VPN, or a combination
Customized server (choose your own CPU, RAM, disk space)
Scale up your capacity anytime
Level 2 and 3 server support, including 24x7 monitoring service
Application support on demand
Integrate with your local & Global IT groups
32. SECURITY ASPECTS IN CLOUD OPEN DW/BI…
Web, RDC, VPN, or a combination
Firewalls
Certified data center: SAS 70 Type II
NDA
Virus protection
33. MDM
MDM success for enterprise open source DWBI implementation:
High-quality master data is extremely valuable to enterprise business processes and analytics.
34. MDM-KEY CONSIDERATIONS
Some key considerations for creating a
master reference data source are outlined
below:
Central master reference data model
Mapping
Populating the master
Publish data
Access and provisioning
Ownership and process
35. MDM CHECKLIST
MDM gives the enterprise a "single version of truth" across its various applications (despite the disparity of source systems).
The following checklist provides functional requirements for implementing and deploying MDM in an enterprise environment:
37. MDM-ACTIVE DATA MODEL ….
Multi-Domain capability
Object-Oriented Data Modeling
Domain Templates
Basic Data Validations and Business Rules
Graphical Modeling Tool
Multiple Language Support
38. MDM-DOMAIN INTEGRATION
Complete Data Integration Functionality
Automated Services-Based Integration
Real-Time and Batch Integration
SOA Manager/Console
39. MDM-DQ INTEGRATION WITH ETL,BI
Data Profiling
Accurate Data Match and Merge
Data Bucketing and Blocking
Data Augmentation
Advanced Data Validations and Business Rules
Data Standardization
Data Cleansing
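Two of the checklist items above, "Data Bucketing and Blocking" and "Accurate Data Match and Merge", can be illustrated together. Blocking groups records by a cheap key so that the expensive fuzzy comparison runs only within each group. The sketch below uses Python's standard difflib for the comparison; the records, the choice of ZIP code as the blocking key, and the 0.6 threshold are all assumptions for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical customer records from different source systems.
records = [
    {"id": 1, "name": "Acme Corp", "zip": "60601"},
    {"id": 2, "name": "ACME Corporation", "zip": "60601"},
    {"id": 3, "name": "Globex", "zip": "10001"},
    {"id": 4, "name": "Acme Corp", "zip": "94105"},
]


def block_key(rec: Dict) -> str:
    """Cheap blocking key; only records sharing it are ever compared."""
    return rec["zip"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(recs: List[Dict],
                      threshold: float = 0.6) -> List[Tuple[int, int]]:
    """Return id pairs whose names are similar within the same block."""
    blocks: Dict[str, List[Dict]] = defaultdict(list)
    for rec in recs:
        blocks[block_key(rec)].append(rec)

    matches = []
    for block in blocks.values():
        for left, right in combinations(block, 2):
            if similarity(left["name"], right["name"]) >= threshold:
                matches.append((left["id"], right["id"]))
    return matches


matches = candidate_matches(records)
print(matches)
```

Note the trade-off: records 1 and 4 have identical names but different ZIP codes, so they are never compared. A production MDM tool would typically use several blocking keys to reduce such misses.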
40. MDM-DATA STEWARDSHIP & GOVERNANCE
Hierarchy Management – Multiple and Recursive
Hierarchies
Hierarchy Import and Overlays
Business Process Management (BPM) and Workflow
Automated Data Survivorship
Manual Resolution through intuitive GUI interface
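"Automated Data Survivorship" refers to choosing, field by field, which value survives when duplicate records are merged into one master record. The sketch below assumes one simple rule, "the most recently updated non-empty value wins"; real MDM products let stewards configure such rules, and the field names and sample data here are hypothetical.

```python
from datetime import date
from typing import Dict, List


def survive(duplicates: List[Dict]) -> Dict:
    """Build a golden record: the newest non-empty value wins for each field."""
    golden: Dict = {}
    # Process oldest first so later (newer) values overwrite earlier ones.
    for rec in sorted(duplicates, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if field != "updated" and value not in (None, ""):
                golden[field] = value
    return golden


duplicates = [
    {"name": "Acme Corp", "phone": "555-0100", "email": None,
     "updated": date(2009, 1, 5)},
    {"name": "ACME Corporation", "phone": "", "email": "info@acme.example",
     "updated": date(2009, 3, 1)},
]

golden = survive(duplicates)
print(golden)
```

Cases the rule cannot settle, such as two conflicting values with the same timestamp, are the ones that would fall through to manual resolution.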
41. MDM-ADMINISTRATION
Historical Views of Hub Data
Hub Versioning
Master Data Audit Trail Information
Roles-Based Security and Active Directory Integration
Versioning
42. TALEND MDM SOLUTION –OS PRODUCTS
IBM Eclipse; JBoss Application Server and Portal;
eXist Open database;
XSD / XML Schema for the XML data models;
XSLT for data transformation;
Object programming following the EJB 2.1 standard ("Enterprise JavaBeans") on JBoss server
XQuery for queries on the XML database;
Document/literal WS-I norm ("Web Services Interoperability") for web services
Bonita for business process management.
43. COST COMPARISON
E.g., total cost for a small project, comparing three approaches to data integration: open source, proprietary, and manual coding
46. ODW /BI --WHY IT WILL SUCCEED IN MARKET
ODW/BI has many (financial) winners:
Owners get low-cost, rapid entry into a data warehouse they can extend.
Developers get to create and sell new ETL/BI products in a new market (tool providers).
"Source" vendors can solve reporting problems and advance new ways to compete (source providers).
Consultants get a bigger market for their services (service providers).
Domain experts can participate by creating new open data warehouses using their deep industry knowledge (service providers).
47. ODW /BI --WHY IT WILL SUCCEED IN MARKET
Development licenses
Training curve
Development time
Run-time licenses
Deployment of hardware and operating
system licenses
IT operations
48. ODW /BI --WHY IT WILL SUCCEED IN MARKET
Maintenance/subscription
Maintenance time
Reliability and predictability of the data
integration processes