OPEN SOURCE DATA WAREHOUSE/BI - A PRIMER


 Webinar session for TechGig.com
 Presenter: Parthasarathi Doraisamy,
            Enterprise BIDI Solutions




CLOUD --WHAT DOES THIS MEAN?
UC Berkeley RAD Lab definition:

1. The illusion of infinite computing resources available on
   demand, thereby eliminating the need for Cloud Computing
   users to plan far ahead for provisioning

2. The elimination of an up-front commitment by Cloud users,
   thereby allowing companies to start small and increase hardware
   resources only when there is an increase in their needs; and

3. The ability to pay for use of computing resources on a short term
   basis as needed (e.g., processors by the hour and storage
   by the day) and release them as needed, thereby rewarding
   conservation by letting machines and storage go when they are
   no longer useful.
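
The pay-per-use point (item 3) comes down to simple arithmetic. The following is a minimal sketch, assuming purely hypothetical prices (`per_cpu_hour`, `per_gb_day`) that do not come from any real provider. It compares renting processors only for a short burst against holding them for a full month.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices, chosen only for illustration."""
    per_cpu_hour: float = 0.10  # assumed price of one processor for one hour
    per_gb_day: float = 0.005   # assumed price of storing one GB for one day


def usage_cost(cpu_hours: float, gb_days: float, rates: Rates = Rates()) -> float:
    """Charge only for what was consumed; released resources cost nothing."""
    return cpu_hours * rates.per_cpu_hour + gb_days * rates.per_gb_day


# Month-end load: 8 processors for 6 hours, with 500 GB stored for 30 days.
burst = usage_cost(cpu_hours=8 * 6, gb_days=500 * 30)

# The same 8 processors kept running for the whole 30-day month.
always_on = usage_cost(cpu_hours=8 * 24 * 30, gb_days=500 * 30)
```

Under these assumed rates the storage charge is identical in both cases. The whole difference comes from releasing the processors once the load has finished.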



REFERENCES/ACKNOWLEDGEMENT
• Talend
• Pentaho
• BIRT (Eclipse)
• Birst
• Jaspersoft
• Greenplum
• ASA – ODW model
• Gartner research analysis
• TDWI



WHAT IS OPEN DW/BI?
• Beware: "open" doesn't mean the product(s) are free!

• Open DW consists of a pre-designed, pre-built data warehouse architecture
  which comes free.

• It thereby reduces overall cost and risk by reducing design, development and
  implementation time.

    -> Reduces the consumer's initial development cost (DQ, ETL, BI & analytics etc.)

• But vendors charge for the related services: maintaining the DW solution,
  customizing it further to the exact business need, and support &
  maintenance of the system.

• Mitigates risk through rapid development.

• There are technical, social, and economic reasons that will move data
  warehousing, and perhaps all data models, toward "open" solutions.

NEED FOR OPEN DW/BI
• Open data warehouse and BI development has
  progressed rapidly over the past few years, driven
  by the economic downturn.
• Dynamic business changes demand faster
  deployment of the proposed solution.
• Nowadays an "open source" product is available
  for almost every layer of the BI/data
  warehouse stack, including architectures, and these
  are picking up pace. (A few noticeable players:
  Talend, Pentaho, Jaspersoft, Birst, QlikView etc.)

INDUSTRY STATS ON TRADITIONAL DWBI

• The average cost of these traditional DW/BI projects was $2.2
  million ($3.1 million today, adjusted for inflation).
• The average payback period was 2.3
  years, with over 30% experiencing a 5+ year
  payback period.
• The majority of respondents reported that their
  data warehouses consumed enormous
  resources and remained "works in progress" for
  extended periods of time.
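
To put the two averages side by side, here is a rough sketch assuming a simple, undiscounted payback (cost divided by yearly net benefit). The survey does not say how payback was measured, and combining two separate averages is only indicative, so the derived figure is an illustration and not a survey result.

```python
def payback_years(cost: float, annual_benefit: float) -> float:
    """Simple payback: years until cumulative benefit equals the cost."""
    if annual_benefit <= 0:
        raise ValueError("annual_benefit must be positive")
    return cost / annual_benefit


def implied_annual_benefit(cost: float, payback: float) -> float:
    """Yearly net benefit that would produce the given simple payback."""
    if payback <= 0:
        raise ValueError("payback must be positive")
    return cost / payback


avg_cost = 2.2e6      # average project cost quoted above
avg_payback = 2.3     # average payback period quoted above, in years

# Roughly 0.96 million per year under the simple-payback assumption.
benefit = implied_annual_benefit(avg_cost, avg_payback)

# A project with the same cost but a 5-year payback returns far less per year.
slow_benefit = implied_annual_benefit(avg_cost, 5.0)
```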

NEED FOR OPEN DW/BI ….

• Popular open source databases that support
  these open data warehouses are MySQL
  (and its ecosystem of add-ons), Ingres and EnterpriseDB.
• Hardware and software costs are
  reduced further by extending the open
  solution into a hosted SaaS environment.



ODW MODEL – A FRAMEWORK
• The Open Data Warehouse Model (ODWM)
  provides a generic framework for delivering an
  open data warehouse.
• This generic data warehouse model can be
  further fine-tuned to a specific industry.
• Domain experts work on these industry-specific
  solutions just as they did on typical proprietary
  DW/BI solutions, but with critical differences
  such as the pre-designed open DWBI
  architecture (data model, ETL design, BI design)
  for the industry domains concerned.

ODW MODEL PRINCIPLE
• The open data model consists of hundreds of potential dimension tables
  with thousands of fields, which form the "Foundation".

• These open data warehouses are carefully designed to ensure the stability of
  the DW system and to facilitate the use of commercial ETL
  bridges/connectors
  (yet allow for interpretation through aggregation and by other means).

• OLAP cubes and data marts can be constructed from the foundation as
  required by the business through similar bridges/connectors.

• This is an opportunity for developers in their respective
  technology areas (ETL, BI & analytics) to come up with appropriate bridge
  solutions that develop the entire ODW & BI model into a
  functional data mart or enterprise data warehouse.
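
Building a data mart from the foundation by aggregation can be sketched as follows. The tables `store_dim` and `sales_fact`, their columns, and the helper `build_mart` are all hypothetical and only show the idea of rolling detailed fact rows up to a coarser grain. A real ODW foundation would be far larger and would be loaded through ETL connectors.

```python
from collections import defaultdict

# Hypothetical foundation tables: one dimension and one fact table.
store_dim = {
    1: {"store": "S1", "region": "North"},
    2: {"store": "S2", "region": "North"},
    3: {"store": "S3", "region": "South"},
}
sales_fact = [
    {"store_key": 1, "year": 2009, "amount": 100.0},
    {"store_key": 2, "year": 2009, "amount": 50.0},
    {"store_key": 3, "year": 2009, "amount": 70.0},
    {"store_key": 1, "year": 2010, "amount": 120.0},
]


def build_mart(facts, dim, group_by):
    """Roll fact rows up to the grain named in group_by."""
    totals = defaultdict(float)
    for row in facts:
        # Join the fact row to its dimension row so that both fact
        # attributes (year) and dimension attributes (region) can be grouped.
        attrs = {**row, **dim[row["store_key"]]}
        key = tuple(attrs[name] for name in group_by)
        totals[key] += row["amount"]
    return dict(totals)


# A small "sales by region and year" mart derived from the foundation.
region_by_year = build_mart(sales_fact, store_dim, ["region", "year"])
```

The same function yields a different mart simply by changing `group_by`, for example `["store"]` for a per-store total.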


ODW MODEL & ITS EXTENSIONS…..
• They must allow for integration of multiple data
  sources of different granularity, and should in some
  manner accommodate slowly changing dimensions.
• Each baseline ODW database instance model can
  in turn yield a range of domain-specific packaged
  solutions (we can call each an industry "slice"). These
  packages may comprise DQ, ETL and BI solutions, as
  outlined earlier.
• These packaged solutions comprise
• Host the domain-specific ODW solution(s) in the
  cloud.
• These hosted open DWBI solutions lead us to
  packaged data warehouse/BI appliances.
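
The slides do not say how slowly changing dimensions should be accommodated. One widely used option is the "Type 2" approach, where each change closes the current row and adds a new version. The sketch below assumes that approach; the `CustomerRow` layout and the `apply_change` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str                  # natural key from the source system
    city: str                         # the tracked attribute
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_change(dim: List[CustomerRow], customer_id: str, city: str, as_of: date) -> None:
    """Type 2 update: expire the current row and append a new version."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == city:
            return  # attribute unchanged, keep the current version
        current.valid_to = as_of
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(CustomerRow(next_key, customer_id, city, as_of))


dim: List[CustomerRow] = []
apply_change(dim, "C42", "Chennai", date(2009, 1, 1))
apply_change(dim, "C42", "Bangalore", date(2010, 6, 1))
apply_change(dim, "C42", "Bangalore", date(2010, 7, 1))  # no change, no new row
```

History is preserved because facts loaded before June 2010 keep pointing at surrogate key 1, while later facts point at key 2.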
OPEN DATAWAREHOUSE/BI APPLIANCE




OPEN DWBI APPLIANCES ……
• The open DWBI appliance combines and
  supports thousands of data warehouses, many
  of them with hundreds of millions of records, in a
  scalable multi-tenant environment.
• These appliances have the capability to generate
  complex data models, with complex algorithms built
  into their query engine.
• Appliance vendors tie up with hardware
  suppliers to build the appliance so that
  it performs at maximum efficiency.

OPEN DWBI APPLIANCES ……

• These appliances are designed to power an
  on-demand software solution that needs to
  support a large number of users
  simultaneously and to increase capacity
  quickly.
• Built on a shared-nothing architecture: no
  data is shared across nodes (servers).
• Popular appliances are
  Netezza, Greenplum etc.
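
The shared-nothing idea can be illustrated with a toy partitioning scheme. This is a sketch under stated assumptions and does not describe how any particular appliance works: the cluster size `NODES`, the CRC32-based placement, and the helper names are all hypothetical.

```python
import zlib
from collections import defaultdict

NODES = 4  # assumed cluster size


def node_for(key: str, nodes: int = NODES) -> int:
    """Stable hash, so the same key always lands on the same node."""
    return zlib.crc32(key.encode("utf-8")) % nodes


def distribute(rows, key_field, nodes=NODES):
    """Give every row to exactly one node; no row is shared between nodes."""
    shards = defaultdict(list)
    for row in rows:
        shards[node_for(row[key_field], nodes)].append(row)
    return shards


def parallel_sum(shards, field):
    """Each node aggregates its local rows; only small partials are combined."""
    partials = [sum(r[field] for r in rows) for rows in shards.values()]
    return sum(partials)


rows = [{"customer": f"C{i}", "amount": i} for i in range(1, 21)]
shards = distribute(rows, "customer")
total = parallel_sum(shards, "amount")
```

Because each node owns its slice of the data outright, adding nodes adds both storage and processing capacity. That is what lets such a system increase capacity quickly.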
MULTIPLE APPLIANCES FOR ENTERPRISE NEED




DWBI APPLIANCES – SALIENT FEATURES
High Availability and Failover Support
• Designed for operation in a high-availability clustered open DWBI
  environment
Global Cache
• Provides superior query performance via its massive-scale
  caching capabilities

Simplified Software Deployment and Upgrades in Place

• Dramatically simplifies deployment by freeing IT from having to
  worry about resolving potentially complex OS compatibility
  issues, library dependencies or undesirable interactions with
  other applications.



DWBI APPLIANCES – SALIENT FEATURES….
• Advanced ETL services and a complete
  analytical data warehouse with automated
  warehouse generation
• Cloud connectors for connecting to operational
  cloud applications, e.g. Salesforce.com, Google
  Analytics
• These connectors allow for automatic uploading
  of data into the appliance from various sources
• Live access, which allows you to analyze data
  from on-premise data
  warehouses without uploading
SAAS BASED OPEN BI SOLUTION




SAAS – OPEN BI SOLUTION…..

• Low-cost, open source solution.
• End-to-end, integrated BI and ETL
  capabilities.
• Full enterprise-level support.

• Flexibility of on-demand and on-premise
  deployment.
• Support for mobile devices as a BI platform.

• Support for iterative IT and business-user
  report generation process.

CLOUD --WHAT DOES THIS MEAN?

  Depends upon how you slice it vertically
• IaaS - AWS, GoGrid, Mosso
• PaaS - Google App Engine, Microsoft Azure
• SaaS (BaaS) - Salesforce, Talend, Jaspersoft,
      Pentaho, BIRT etc.




AGILE BI - FASTER, CHEAPER, BETTER….




CLOUD --WHAT DOES THIS MEAN?




ODW - WHEN TO USE THE CLOUD?

• Transient application lifespan or use
• Quick start required

• Budget pressure

• Variable use/scale of application unknown

• IT unavailable/unresponsive




SAAS –OPEN DWBI




KEY FINDINGS FOR BUSINESS TRANSITION TO
CLOUD TECHNOLOGY (IN 2009)

• By 2012, at least 50% of direct commercial revenue attributed to
  open-source products or services will come from projects under a
  single vendor's patronage.
• Through 2011, less than 50% of Global 2000 IT organizations will
  have implemented a formal open-source adoption and
  management policy as part of an enterprise software asset
  management strategy.
• Through 2013, 50% of mainstream IT projects using open-source
  software (OSS) will not achieve cost savings over closed-source
  alternatives.
• Through 2013, 90% of market-leading cloud-computing providers
  will depend on OSS to deliver products and services.



MOVING TO CLOUD - RECOMMENDATIONS

• Expect vendors to play an increasing role in the governance of
  many market-leading, open-source solutions during the next
  several years.
• Move aggressively to establish an effective enterprise adoption
  policy, and bring OSS and hardware under asset management
  controls.
• Do not expect to automatically save money with OSS or any
  technology without effective financial management. Do expect to
  carefully manage open-source solutions in the appropriate
  scenarios to realize total cost of ownership (TCO) advantages.
• Manage cloud-based software strategies and open-source
  strategies together for maximum effect. Look for synergies
  between both, and the ability of OSS to move your workloads to
  the cloud.



STRATEGIC PLANNING ASSUMPTION(S)
• By 2012, at least 50% of direct commercial revenue
  attributed to open-source products or services will
  come from projects under a single vendor's
  patronage.
• Through 2011, less than 35% of Global 2000 IT
  organizations will have implemented a formal
  open-source adoption and management policy.
• Through 2013, 50% of mainstream IT projects using
  OSS will not achieve cost savings over closed-source
  alternatives.
• Through 2013, 90% of market-leading
  cloud-computing providers will depend on OSS to deliver
  products and services.
CLOUD USAGE BY VARIOUS ORGANIZATIONS..




OPENSOURCE BI TOOLS




TDWI RESEARCH STUDY…




SAAS BI PROCESS FLOW




HARDWARE ACCESS IN CLOUD OPEN DW/BI…

• Secure access via web, RDC, VPN or a combination
• Customized server (choose your own
  CPU, RAM, disk space)
• Scale up your capacity anytime

• Level 2 and 3 server support, including 24x7
  monitoring service
• Application support on demand

• Integrate with your local & global IT groups

SECURITY ASPECTS IN CLOUD OPEN DW/BI…

• Web, RDC, VPN or a combination
• Firewalls

• Certified data center – SAS 70 Type II

• NDA

• Virus protection




MDM

MDM success for enterprise open source
DWBI implementation:
• High-quality master data is extremely
  valuable to enterprise business
  processes and analytics

MDM - KEY CONSIDERATIONS
Some key considerations for creating a
master reference data source are outlined
below:
• Central master reference data model
• Mapping
• Populating the master
• Publish data
• Access and provisioning
• Ownership and process

MDM CHECKLIST

• MDM gives the enterprise a
  "single version of truth" across its various
  applications (despite the
  disparity of source systems).
• The following checklist provides functional
  requirements for implementing and deploying
  MDM in an enterprise environment:

MDM CHECKLIST – FUNCTIONALITY COVERED

• Profiling
• Modeling

• Data quality

• Data stewardship & governance - hierarchy
  management & security
• Workflow administration




MDM - ACTIVE DATA MODEL ….

• Multi-Domain capability

• Object-Oriented Data Modeling

• Domain Templates

• Basic Data Validations and Business Rules

• Graphical Modeling Tool

• Multiple Language Support



MDM - DOMAIN INTEGRATION


• Complete Data Integration Functionality

• Automated Services-Based Integration

• Real-Time and Batch Integration

• SOA Manager/Console

MDM - DQ INTEGRATION WITH ETL, BI

• Data Profiling

• Accurate Data Match and Merge

• Data Bucketing and Blocking

• Data Augmentation

• Advanced Data Validations and Business Rules

• Data Standardization

• Data Cleansing
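
Several of the items above (standardization, blocking, match and merge) fit together as one small pipeline. The sketch below is illustrative only: the sample `records`, the abbreviation table, the 0.85 similarity threshold and the "longest value wins" merge rule are assumptions and are not taken from any MDM product.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 1, "name": "Acme Corporation", "zip": "60601", "phone": ""},
    {"id": 2, "name": "ACME Corp.", "zip": "60601", "phone": "312-555-0100"},
    {"id": 3, "name": "Apex Traders", "zip": "60601", "phone": ""},
    {"id": 4, "name": "Acme Corporation", "zip": "94105", "phone": ""},
]

ABBREVIATIONS = {"corp": "corporation"}  # assumed standardization table


def standardize(name: str) -> str:
    """Lower-case, drop punctuation and expand known abbreviations."""
    tokens = name.lower().replace(".", "").replace(",", "").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def blocks(rows, key):
    """Blocking: only records sharing the block key are compared."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def is_match(a, b, threshold=0.85):
    """Fuzzy comparison of the standardized names."""
    score = SequenceMatcher(None, standardize(a["name"]), standardize(b["name"])).ratio()
    return score >= threshold


def merge(a, b):
    """Assumed merge rule: keep the longest value for each field."""
    return {f: max(a[f], b[f], key=len) for f in ("name", "zip", "phone")}


matches = [
    (a["id"], b["id"])
    for rows in blocks(records, "zip").values()
    for a, b in combinations(rows, 2)
    if is_match(a, b)
]
golden = merge(records[0], records[1])
```

Blocking by `zip` keeps the number of comparisons small, at the price of never comparing record 4 with record 1 even though their names are identical. Choosing the block key is therefore a trade-off between speed and missed matches.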


MDM - DATA STEWARDSHIP & GOVERNANCE

• Hierarchy Management – Multiple and Recursive
  Hierarchies

• Hierarchy Import and Overlays

• Business Process Management (BPM) and Workflow

• Automated Data Survivorship

• Manual Resolution through an intuitive GUI

MDM - ADMINISTRATION

• Historical Views of Hub Data

• Hub Versioning

• Master Data Audit Trail Information

• Roles-Based Security and Active Directory Integration

• Versioning


TALEND MDM SOLUTION – OS PRODUCTS
• IBM Eclipse
• JBoss Application Server and Portal
• eXist open database
• XSD / XML Schema for the XML data models
• XSLT for data transformation
• Object programming following the EJB 2.1 standard
  ("Enterprise JavaBeans") on the JBoss server
• XQuery for queries on the XML database
• Document/literal WS-I norm ("Web Services
  Interoperability") for web services
• Bonita for business process management


COST COMPARISON

  E.g.: total cost for a small project, comparing three approaches to
  data integration: open source, proprietary and manual coding


SUMMARISED COST-SMALL ETL PROJECT




SUMMARY COST FOR MEDIUM ETL PROJECT




ODW/BI -- WHY IT WILL SUCCEED IN THE MARKET

• ODW/BI has a lot of groups who win financially:
• Owners get low-cost, rapid entry into a data
  warehouse they can extend.
• Developers get to create/sell new ETL/BI products in
  a new market (tool providers).
• "Source" vendors can solve reporting problems and
  advance new ways to compete (source providers).
• Consultants get a bigger market for their services
  (service providers).
• Domain experts can participate by creating new open
  data warehouses using their deep industry
  knowledge (service providers).

ODW/BI -- WHY IT WILL SUCCEED IN THE MARKET

• Development licenses
• Training curve

• Development time

• Run-time licenses

• Deployment of hardware and operating
  system licenses
• IT operations


ODW/BI -- WHY IT WILL SUCCEED IN THE MARKET

• Maintenance/subscription
• Maintenance time

• Reliability and predictability of the data
  integration processes




QUESTIONS?

For any questions, please get in touch with me at

Partha.dorai@ebidisolutions.com

Skype: ebidisolutions




Thank You!





More Related Content

What's hot

Step 2: Back Up Less Datasheet
Step 2: Back Up Less DatasheetStep 2: Back Up Less Datasheet
Step 2: Back Up Less DatasheetHitachi Vantara
 
Five Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceFive Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceHitachi Vantara
 
Dynamic Hyper-Converged Future Proof Your Data Center
Dynamic Hyper-Converged Future Proof Your Data CenterDynamic Hyper-Converged Future Proof Your Data Center
Dynamic Hyper-Converged Future Proof Your Data CenterDataCore Software
 
Solve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperSolve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperHitachi Vantara
 
Advantages of Mainframe Replication With Hitachi VSP
Advantages of Mainframe Replication With Hitachi VSPAdvantages of Mainframe Replication With Hitachi VSP
Advantages of Mainframe Replication With Hitachi VSPHitachi Vantara
 
Hitachi Virtual Storage Platform Competitive Comparison Guide
Hitachi Virtual Storage Platform Competitive Comparison GuideHitachi Virtual Storage Platform Competitive Comparison Guide
Hitachi Virtual Storage Platform Competitive Comparison GuideHitachi Vantara
 
Hitachi Unified Storage VM Flash -- Datasheet
Hitachi Unified Storage VM Flash -- DatasheetHitachi Unified Storage VM Flash -- Datasheet
Hitachi Unified Storage VM Flash -- DatasheetHitachi Vantara
 
Storage Analytics: Transform Storage Infrastructure Into a Business Enabler
Storage Analytics: Transform Storage Infrastructure Into a Business EnablerStorage Analytics: Transform Storage Infrastructure Into a Business Enabler
Storage Analytics: Transform Storage Infrastructure Into a Business EnablerHitachi Vantara
 
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi Vantara
 
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingHitachi Vantara
 
G11.2014 magic quadrant for general-purpose disk
G11.2014   magic quadrant for general-purpose diskG11.2014   magic quadrant for general-purpose disk
G11.2014 magic quadrant for general-purpose diskSatya Harish
 
Preparing for next-generation cloud: Lessons learned and insights shared
Preparing for next-generation cloud: Lessons learned and insights sharedPreparing for next-generation cloud: Lessons learned and insights shared
Preparing for next-generation cloud: Lessons learned and insights sharedThe Economist Media Businesses
 
Hitachi Unified Compute Platform Select for SAP HANA -- Solution Profile
Hitachi Unified Compute Platform Select for SAP HANA -- Solution ProfileHitachi Unified Compute Platform Select for SAP HANA -- Solution Profile
Hitachi Unified Compute Platform Select for SAP HANA -- Solution ProfileHitachi Vantara
 
Maximize IT Overview Slidecast
Maximize IT Overview SlidecastMaximize IT Overview Slidecast
Maximize IT Overview SlidecastHitachi Vantara
 
Maximize Operational Efficiency in a Tiered Storage Environment
Maximize Operational Efficiency in a Tiered Storage EnvironmentMaximize Operational Efficiency in a Tiered Storage Environment
Maximize Operational Efficiency in a Tiered Storage EnvironmentHitachi Vantara
 
Face Data Challenges of Life Science Organizations With Next-Generation Hitac...
Face Data Challenges of Life Science Organizations With Next-Generation Hitac...Face Data Challenges of Life Science Organizations With Next-Generation Hitac...
Face Data Challenges of Life Science Organizations With Next-Generation Hitac...Hitachi Vantara
 
Red Hat and Microsoft Partnership
Red Hat and Microsoft PartnershipRed Hat and Microsoft Partnership
Red Hat and Microsoft PartnershipKevin McCauley
 
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...Hitachi Vantara
 
Insider's Guide- Building a Virtualized Storage Service
Insider's Guide- Building a Virtualized Storage ServiceInsider's Guide- Building a Virtualized Storage Service
Insider's Guide- Building a Virtualized Storage ServiceDataCore Software
 
The State of Software Defined Storage Survey 2015
The State of Software Defined Storage Survey 2015The State of Software Defined Storage Survey 2015
The State of Software Defined Storage Survey 2015DataCore Software
 

What's hot (20)

Step 2: Back Up Less Datasheet
Step 2: Back Up Less DatasheetStep 2: Back Up Less Datasheet
Step 2: Back Up Less Datasheet
 
Five Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceFive Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud Experience
 
Dynamic Hyper-Converged Future Proof Your Data Center
Dynamic Hyper-Converged Future Proof Your Data CenterDynamic Hyper-Converged Future Proof Your Data Center
Dynamic Hyper-Converged Future Proof Your Data Center
 
Solve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperSolve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White Paper
 
Advantages of Mainframe Replication With Hitachi VSP
Advantages of Mainframe Replication With Hitachi VSPAdvantages of Mainframe Replication With Hitachi VSP
Advantages of Mainframe Replication With Hitachi VSP
 
Hitachi Virtual Storage Platform Competitive Comparison Guide
Hitachi Virtual Storage Platform Competitive Comparison GuideHitachi Virtual Storage Platform Competitive Comparison Guide
Hitachi Virtual Storage Platform Competitive Comparison Guide
 
Hitachi Unified Storage VM Flash -- Datasheet
Hitachi Unified Storage VM Flash -- DatasheetHitachi Unified Storage VM Flash -- Datasheet
Hitachi Unified Storage VM Flash -- Datasheet
 
Storage Analytics: Transform Storage Infrastructure Into a Business Enabler
Storage Analytics: Transform Storage Infrastructure Into a Business EnablerStorage Analytics: Transform Storage Infrastructure Into a Business Enabler
Storage Analytics: Transform Storage Infrastructure Into a Business Enabler
 
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
 
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
 
G11.2014 magic quadrant for general-purpose disk
G11.2014   magic quadrant for general-purpose diskG11.2014   magic quadrant for general-purpose disk
G11.2014 magic quadrant for general-purpose disk
 
Preparing for next-generation cloud: Lessons learned and insights shared
Preparing for next-generation cloud: Lessons learned and insights sharedPreparing for next-generation cloud: Lessons learned and insights shared
Preparing for next-generation cloud: Lessons learned and insights shared
 
Hitachi Unified Compute Platform Select for SAP HANA -- Solution Profile
Hitachi Unified Compute Platform Select for SAP HANA -- Solution ProfileHitachi Unified Compute Platform Select for SAP HANA -- Solution Profile
Hitachi Unified Compute Platform Select for SAP HANA -- Solution Profile
 
Maximize IT Overview Slidecast
Maximize IT Overview SlidecastMaximize IT Overview Slidecast
Maximize IT Overview Slidecast
 
Maximize Operational Efficiency in a Tiered Storage Environment
Maximize Operational Efficiency in a Tiered Storage EnvironmentMaximize Operational Efficiency in a Tiered Storage Environment
Maximize Operational Efficiency in a Tiered Storage Environment
 
Face Data Challenges of Life Science Organizations With Next-Generation Hitac...
Face Data Challenges of Life Science Organizations With Next-Generation Hitac...Face Data Challenges of Life Science Organizations With Next-Generation Hitac...
Face Data Challenges of Life Science Organizations With Next-Generation Hitac...
 
Red Hat and Microsoft Partnership
Red Hat and Microsoft PartnershipRed Hat and Microsoft Partnership
Red Hat and Microsoft Partnership
 
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...
Achieve Higher Quality Decisions Faster for a Competitive Edge in the Oil and...
 
Insider's Guide- Building a Virtualized Storage Service
Insider's Guide- Building a Virtualized Storage ServiceInsider's Guide- Building a Virtualized Storage Service
Insider's Guide- Building a Virtualized Storage Service
 
The State of Software Defined Storage Survey 2015
The State of Software Defined Storage Survey 2015The State of Software Defined Storage Survey 2015
The State of Software Defined Storage Survey 2015
 

Viewers also liked

Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehousehadoopsphere
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsDavid Portnoy
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseDataWorks Summit
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsDavid Portnoy
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data WarehouseCaserta
 

Viewers also liked (6)

Apache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouseApache Tajo - An open source big data warehouse
Apache Tajo - An open source big data warehouse
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop Implementations
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
 

Similar to Open Source DWBI-A Primer

ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_SuiteRobin Fong 方俊强
 
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...
Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with D...Alluxio, Inc.
 
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...John Archer
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Denodo
 
Red Hat Summit 2015: Red Hat Storage Breakfast session

Open Source DWBI-A Primer

  • 1. OPEN SOURCE DATA WAREHOUSE/BI: A PRIMER
      • Webinar session for TechGig.com
      • Presenter: Parthasarathi Doraisamy, Enterprise BIDI Solutions
  • 2. CLOUD: WHAT DOES THIS MEAN?
      UC Berkeley RAD Lab definition:
      1. The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning;
      2. The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs; and
      3. The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
  • 3. REFERENCES/ACKNOWLEDGEMENT
      • Talend
      • Pentaho
      • BIRT (Eclipse)
      • Birst
      • Jaspersoft
      • Greenplum
      • ASA: ODW model
      • Gartner research analysis
      • TDWI
  • 4. WHAT IS OPEN DW/BI?
      • Beware: open does not mean the product(s) are free!
      • Open DW consists of a pre-designed, prebuilt data warehouse architecture that comes free.
      • It thereby reduces overall cost and risk by reducing design, development, and implementation time.
          -> Reduces the consumer's initial development cost (DQ, ETL, BI & analytics, etc.)
      • But vendors charge for the related services: maintaining the DW solution, customizing it further to the exact business need, and support & maintenance of the system.
      • Mitigates risk through rapid development.
      • There are technical, social, and economic reasons that will move data warehousing, and perhaps all data models, toward 'open' solutions.
  • 5. NEED FOR OPEN DW/BI
      • Open data warehouse and BI development has progressed rapidly over the past few years, driven by the economic downturn.
      • Dynamic business changes demand faster deployment of the proposed solution.
      • Nowadays an 'open source' product is available for almost every aspect of the BI/data warehouse stack, including architectures, and adoption is picking up pace. (A few noticeable players: Talend, Pentaho, Jaspersoft, Birst, QlikView, etc.)
  • 6. INDUSTRY STATS ON TRADITIONAL DWBI
      • The average cost of these projects was $2.2 million ($3.1 million today, adjusted for inflation).
      • The average payback period was 2.3 years, with over 30% experiencing a 5+ year payback period.
      • The majority of respondents reported that their data warehouses consumed enormous resources and remained "works in progress" for extended periods of time.
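As a rough back-of-envelope reading of the figures on slide 6, the sketch below derives the annual net benefit implied by the quoted cost and payback period. It assumes a simple (undiscounted) payback model with a uniform yearly benefit, which the slide does not state; the function name is illustrative only.

```python
def implied_annual_benefit(project_cost: float, payback_years: float) -> float:
    """Simple payback model: project_cost / annual_benefit = payback_years."""
    if payback_years <= 0:
        raise ValueError("payback_years must be positive")
    return project_cost / payback_years


avg_cost = 2_200_000   # average project cost quoted on the slide (USD)
avg_payback = 2.3      # average payback period quoted on the slide (years)

# Annual benefit a project would need to pay back in the average time
benefit = implied_annual_benefit(avg_cost, avg_payback)
# The same project with a 5-year payback implies a much smaller yearly benefit
slow_benefit = implied_annual_benefit(avg_cost, 5.0)

print(f"Implied annual net benefit at 2.3 years: ${benefit:,.0f}")
print(f"Implied annual net benefit at 5 years:   ${slow_benefit:,.0f}")
```

Under these assumptions, the 30% of projects with a 5+ year payback recover less than half as much per year as the average project.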
  • 7. NEED FOR OPEN DW/BI (CONTINUED)
      • Popular open source databases that support these open data warehouses include MySQL (and its ecosystem of add-ons), Ingres, and EnterpriseDB.
      • Hardware and software costs are reduced further by extending the open solution into a hosted SaaS environment.
  • 8. ODW MODEL: A FRAMEWORK
      • The Open Data Warehouse Model (ODWM) provides a generic framework for delivering an open data warehouse.
      • This generic data warehouse model can be fine-tuned further for a specific industry.
      • Domain experts work on these industry-specific solutions just as they did on typical proprietary DW/BI solutions, but the approach differs in critical aspects such as the pre-design of the open DWBI architecture (data model, ETL design, BI design) for the industry domains concerned.
  • 9. ODW MODEL PRINCIPLE
      • The open data model consists of hundreds of potential dimension tables with thousands of fields, which form the "Foundation".
      • These open data warehouses are carefully designed to ensure stability of the DW system and to facilitate the use of commercial ETL bridges/connectors (yet allow for interpretation through aggregation and by other means).
      • OLAP cubes and data marts can be constructed from the foundation, as required by the business, through similar bridges/connectors.
      • This is an opportunity for developers in their respective technology areas (ETL, BI & analytics) to come up with appropriate bridge solutions that seamlessly develop the entire ODW & BI model into a functional data mart or enterprise data warehouse.
  • 10. ODW MODEL & ITS EXTENSIONS
      • They must allow for integration of multiple data sources of different granularity and should, in some manner, accommodate slowly changing dimensions.
      • Each baseline ODW database instance model can in turn produce a range of domain-specific packaged solutions (we can call each an industry 'slice'). These packages may comprise DQ, ETL, and BI solutions as outlined earlier.
      • These packaged solutions consist of
      • Host the domain-specific ODW solution(s) in the cloud.
      • These hosted open DWBI solutions lead us to packaged data warehouse/BI appliances.
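Slide 10 asks the model to accommodate slowly changing dimensions without saying how. One common approach is "Type 2" history tracking, where a changed attribute closes the current row and opens a new one. The sketch below illustrates that idea in plain Python; the `CustomerRow` fields and the `apply_scd2` helper are hypothetical names chosen for the example and are not part of the ODW model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_id: int                  # natural (business) key
    city: str                         # attribute whose history is tracked
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(history: List[CustomerRow], customer_id: int,
               new_city: str, as_of: date) -> None:
    """Expire the current row and append a new one when the city changes."""
    current = next((r for r in history
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None and current.city == new_city:
        return                        # unchanged: keep the current version
    if current is not None:
        current.valid_to = as_of      # close the old version
    history.append(CustomerRow(customer_id, new_city, as_of))


history: List[CustomerRow] = []
apply_scd2(history, 1, "Chennai", date(2010, 1, 1))
apply_scd2(history, 1, "Bangalore", date(2011, 6, 1))
apply_scd2(history, 1, "Bangalore", date(2011, 7, 1))   # no change, no new row

for row in history:
    print(row)
```

Keeping both rows lets reports join facts to the version of the dimension that was valid when the fact occurred.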
  • 12. OPEN DWBI APPLIANCES
      • The open DWBI appliance combines and supports thousands of data warehouses, many of them with hundreds of millions of records, in a scalable multi-tenant environment.
      • These appliances can generate complex data models and have complex algorithms built into their query engine.
      • Appliance vendors tie up with hardware suppliers to build the appliance so that it performs at maximum efficiency.
  • 13. OPEN DWBI APPLIANCES (CONTINUED)
      • These appliances are designed to power an on-demand software solution that must support a large number of simultaneous users and be able to increase capacity quickly.
      • Built on a shared-nothing architecture: no data is shared across nodes (servers).
      • Popular appliances include Netezza and Greenplum.
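To make the shared-nothing idea on slide 13 concrete, the sketch below hash-partitions rows across nodes so that each key lives on exactly one node, lets every node aggregate only its own rows, and then combines the partial results. It is a single-process illustration under assumed names (`node_for`, `distribute`, `local_totals`, `total_by_key`) and does not reflect how any particular appliance is implemented.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]   # (customer key, sale amount): a hypothetical fact row


def node_for(key: str, node_count: int) -> int:
    """Deterministic hash partitioning: a key always maps to the same node."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows: List[Row], node_count: int) -> List[List[Row]]:
    """Place each row on the single node that owns its key."""
    nodes: List[List[Row]] = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[node_for(key, node_count)].append((key, amount))
    return nodes


def local_totals(node_rows: List[Row]) -> Dict[str, float]:
    """Work done by one node, touching only that node's rows."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in node_rows:
        totals[key] += amount
    return dict(totals)


def total_by_key(rows: List[Row], node_count: int) -> Dict[str, float]:
    """Coordinator step: merge per-node results; keys never overlap across nodes."""
    merged: Dict[str, float] = {}
    for node_rows in distribute(rows, node_count):
        merged.update(local_totals(node_rows))
    return merged


rows = [("acme", 10.0), ("globex", 5.0), ("acme", 2.5), ("initech", 1.0)]
result = total_by_key(rows, node_count=3)
print(result)
```

Because a key is owned by one node, the per-node totals can be merged without any cross-node reconciliation, which is what allows capacity to grow by adding nodes.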
  • 14. MULTIPLE APPLIANCES FOR ENTERPRISE NEED
  • 15. DWBI APPLIANCES: SALIENT FEATURES
      • High availability and failover support: designed for operation in a high-availability clustered open DWBI environment.
      • Global cache: provides superior query performance via its massive-scale caching capabilities.
      • Simplified software deployment and upgrades in place: dramatically simplifies deployment by freeing IT from having to resolve potentially complex OS compatibility issues, library dependencies, or undesirable interactions with other applications.
  • 16. DWBI APPLIANCES: SALIENT FEATURES (CONTINUED)
      • Advanced ETL services and a complete analytical data warehouse with automated warehouse generation.
      • Cloud connectors for connecting to operational cloud applications, e.g. Salesforce.com, Google Analytics.
      • These connectors allow for automatic uploading of data into the appliance from various sources.
      • Live access, which allows you to analyze data from on-premise data warehouses without uploading.
  • 17. SAAS BASED OPEN BI SOLUTION
  • 18. SAAS: OPEN BI SOLUTION
      • Low-cost, open source solution.
      • End-to-end, integrated BI and ETL capabilities.
      • Full enterprise-level support.
      • Flexibility of on-demand and on-premise deployment.
      • Support for mobile devices as a BI platform.
      • Support for an iterative IT and business-user report generation process.
  • 19. CLOUD: WHAT DOES THIS MEAN?
      • Depends upon how you slice it vertically:
          • IaaS: AWS, GoGrid, Mosso
          • PaaS: Google App Engine, Microsoft Azure
          • SaaS (BaaS): Salesforce, Talend, Jaspersoft, Pentaho, BIRT, etc.
  • 21. CLOUD: WHAT DOES THIS MEAN?
  • 22. ODW: WHEN TO USE THE CLOUD?
      • Transient application lifespan or use
      • Quick start required
      • Budget pressure
      • Variable use/scale of application unknown
      • IT unavailable/unresponsive
  • 24. KEY FINDINGS FOR BUSINESS TRANSITION TO CLOUD TECHNOLOGY (IN 2009)
      • By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.
      • Through 2011, less than 50% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy as part of an enterprise software asset management strategy.
      • Through 2013, 50% of mainstream IT projects using open-source software (OSS) will not achieve cost savings over closed-source alternatives.
      • Through 2013, 90% of market-leading cloud-computing providers will depend on OSS to deliver products and services.
  • 25. MOVING TO CLOUD: RECOMMENDATIONS
      • Expect vendors to play an increasing role in the governance of many market-leading open-source solutions during the next several years.
      • Move aggressively to establish an effective enterprise adoption policy, and bring OSS and hardware under asset management controls.
      • Do not expect to automatically save money with OSS or any technology without effective financial management. Do expect to carefully manage open-source solutions in the appropriate scenarios to realize total cost of ownership (TCO) advantages.
      • Manage cloud-based software strategies and open-source strategies together for maximum effect. Look for synergies between both, and for the ability of OSS to move your workloads to the cloud.
  • 26. STRATEGIC PLANNING ASSUMPTION(S)
      • By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.
      • Through 2011, less than 35% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy.
      • Through 2013, 50% of mainstream IT projects using OSS will not achieve cost savings over closed-source alternatives.
      • Through 2013, 90% of market-leading cloud-computing providers will depend on OSS to deliver products and services.
  • 27. CLOUD USAGE BY VARIOUS ORGANIZATIONS
  • 30. SAAS BI PROCESS FLOW
  • 31. HARDWARE ACCESS IN CLOUD OPEN DW/BI
      • Secure access via web, RDC, VPN, or a combination
      • Customized server (choose your own CPU, RAM, and disk space)
      • Scale up your capacity anytime
      • Level 2 and 3 server support, including 24 x 7 monitoring service
      • Application support on demand
      • Integrate with your local & global IT groups
  • 32. SECURITY ASPECTS IN CLOUD OPEN DW/BI
      • Web, RDC, VPN, or a combination
      • Firewalls
      • Certified data center: SAS 70 Type II
      • NDA
      • Virus protection
  • 33. MDM
      MDM success for enterprise open source DWBI implementation: high-quality master data is extremely valuable to enterprise business processes and analytics.
  • 34. MDM: KEY CONSIDERATIONS
      Some key considerations for creating a master reference data source are outlined below:
      • Central master reference data model
      • Mapping
      • Populating the master
      • Publish data
      • Access and provisioning
      • Ownership and process
  • 35. MDM CHECKLIST
      MDM helps the enterprise obtain a "single version of truth" across its various applications (despite the disparity of source systems).
      The following checklist provides functional requirements for implementing and deploying MDM in an enterprise environment.
  • 36. MDM CHECKLIST: FUNCTIONALITY COVERED
      • Profiling
      • Modeling
      • Data quality
      • Data stewardship & governance: hierarchy management & security
      • Workflow administration
  • 37. MDM: ACTIVE DATA MODEL
      • Multi-Domain Capability
      • Object-Oriented Data Modeling
      • Domain Templates
      • Basic Data Validations and Business Rules
      • Graphical Modeling Tool
      • Multiple Language Support
  • 38. MDM: DOMAIN INTEGRATION
      • Complete Data Integration Functionality
      • Automated Services-Based Integration
      • Real-Time and Batch Integration
      • SOA Manager/Console
  • 39. MDM: DQ INTEGRATION WITH ETL, BI
      • Data Profiling
      • Accurate Data Match and Merge
      • Data Bucketing and Blocking
      • Data Augmentation
      • Advanced Data Validations and Business Rules
      • Data Standardization
      • Data Cleansing
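Several items on slide 39 (standardization, blocking, match and merge) fit together as one pipeline. The sketch below is a minimal illustration using only the Python standard library: names are standardized, records are grouped by a blocking key so that only records in the same block are compared, and pairs above a similarity threshold are reported as matches. The sample records, the zip-code blocking key, the abbreviation rule, and the 0.9 threshold are all assumptions made for the example, not features of any MDM product.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical source records from two systems
records = [
    {"id": 1, "name": "Acme Corporation", "zip": "60601"},
    {"id": 2, "name": "ACME Corp.", "zip": "60601"},
    {"id": 3, "name": "Globex Ltd", "zip": "10001"},
]


def standardize(name: str) -> str:
    """Lower-case, drop punctuation, and expand one illustrative abbreviation."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = ["corporation" if w == "corp" else w for w in cleaned.split()]
    return " ".join(words)


def find_matches(rows: List[Dict], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Compare records only within the same block and return matching id pairs."""
    blocks: Dict[str, List[Dict]] = defaultdict(list)
    for row in rows:
        blocks[row["zip"]].append(row)          # blocking key: zip code
    matches: List[Tuple[int, int]] = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(
                None, standardize(a["name"]), standardize(b["name"])
            ).ratio()
            if score >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


matches = find_matches(records)
print(matches)
```

Blocking keeps the number of comparisons manageable: without it, every record would be compared with every other record in the data set.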
  • 40. MDM: DATA STEWARDSHIP & GOVERNANCE
      • Hierarchy Management: Multiple and Recursive Hierarchies
      • Hierarchy Import and Overlays
      • Business Process Management (BPM) and Workflow
      • Automated Data Survivorship
      • Manual Resolution through an intuitive GUI
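"Automated data survivorship" on slide 40 refers to rules that decide which value wins when duplicate records are merged into one master record. The sketch below shows one possible rule, assumed here purely for illustration: for each field, keep the most recently updated non-empty value. The `survive` function and the sample fields are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List


def survive(duplicates: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a golden record: per field, keep the newest non-empty value."""
    newest_first = sorted(duplicates, key=lambda r: r["updated"], reverse=True)
    fields = {k for r in duplicates for k in r if k != "updated"}
    golden: Dict[str, Any] = {}
    for field in sorted(fields):
        for rec in newest_first:
            value = rec.get(field)
            if value not in (None, ""):
                golden[field] = value   # first hit is the newest usable value
                break
    return golden


golden = survive([
    {"name": "Acme Corp", "phone": "", "city": "Chicago",
     "updated": date(2011, 3, 1)},
    {"name": "Acme Corporation", "phone": "555-0100", "city": None,
     "updated": date(2010, 1, 1)},
])
print(golden)
```

Cases that such a rule cannot settle are the ones the slide leaves to manual resolution by a data steward.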
  • 41. MDM: ADMINISTRATION
      • Historical Views of Hub Data
      • Hub Versioning
      • Master Data Audit Trail Information
      • Roles-Based Security and Active Directory Integration
      • Versioning
  • 42. TALEND MDM SOLUTION: OS PRODUCTS
      • IBM Eclipse; JBoss Application Server and Portal; eXist open database
      • XSD / XML Schema for the XML data models
      • XSLT for data transformation
      • Object programming following the EJB 2.1 ("Enterprise Java Beans") standard on the JBoss server
      • XQuery for queries on the XML database; document/literal WS-I ("Web Services Interoperability") norm for web services
      • Bonita for business process management
  • 43. COST COMPARISON
      E.g.: total cost for a small project, comparing three approaches to data integration: open source, proprietary, and manual coding.
  • 45. SUMMARY COST FOR MEDIUM ETL PROJECT
  • 46. ODW/BI: WHY IT WILL SUCCEED IN THE MARKET
      ODW/BI has many groups that stand to win financially:
      • Owners get low-cost, rapid entry into a data warehouse they can extend.
      • Developers get to create and sell new ETL/BI products in a new market (tool providers).
      • 'Source' vendors can solve reporting problems and advance new ways to compete (source providers).
      • Consultants get a bigger market for their services (service providers).
      • Domain experts can participate by creating new open data warehouses using their deep industry knowledge (service providers).
  • 47. ODW/BI: WHY IT WILL SUCCEED IN THE MARKET (CONTINUED)
      • Development licenses
      • Training curve
      • Development time
      • Run-time licenses
      • Deployment of hardware and operating system licenses
      • IT operations
  • 48. ODW/BI: WHY IT WILL SUCCEED IN THE MARKET (CONTINUED)
      • Maintenance/subscription
      • Maintenance time
      • Reliability and predictability of the data integration processes
  • 49. QUESTIONS?
      For any questions, please get in touch with me at Partha.dorai@ebidisolutions.com
      Skype: ebidisolutions