Open Source SOA in the Cloud: Data Analytics in the Cloud
Tom Plunkett   TomPlunkett@vt.edu
Michael Sick   michael.sick@serenesoftware.com


SOA World 2009

Overview

Data Analytics in the Cloud

Introductions
  • Who are we?
  • Baselines & definitions

Opportunity
  • Targeted Use Cases
  • Technical convergence & opportunities
  • Commercial opportunities & drivers

Technology & Standards
  • State of current technology
  • Commercial & FOSS solutions
  • Hadoop Focus

Challenges
  • Challenges to Meet Target Use Cases
  • Economic challenges & the role of “free”
  • Wide scale challenges in Cloud and data analytics

Questions
  • Questions
  • Contacts

This work is licensed under a Creative Commons Attribution 3.0 United States License. Tom Plunkett & Michael Sick

Data Analytics in the Cloud: Introductions

Tom Plunkett

• Extensive Federal Government Experience
• IBM Certified SOA Solution Designer
• Patents
• Teach OOP and Java for Virginia Tech




Michael Sick

• Commercial & Federal Enterprise Architect
• Owner: Serene Software Inc. – EA Services Firm
• Clients include: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture
• Fascinated by technology, 15 years running




Serene Software

• Serene is a boutique consulting company focusing on delivery of Enterprise Architecture services and solutions
• Service Areas
  – IT Governance
  – IT Strategy
  – IT Cost Containment
  – Service Oriented Architectures (SOA)
  – IT Solution Selection
  – IT Audit & Analysis
• Experience includes: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture, …
• Founded in 2003 (privately held, no debt) and headquartered in Jacksonville, FL



Draft NIST Definition of Cloud Computing

A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction

Essential Characteristics
• On-demand self-service
• Ubiquitous network access
• Location independent resource pooling
• Rapid elasticity
• Measured Service

Delivery Models
• Cloud Software as a Service (SaaS)
• Cloud Platform as a Service (PaaS)
• Cloud Infrastructure as a Service (IaaS)

Deployment Models
• Private cloud
• Community cloud
• Public cloud
• Hybrid cloud

Source: Draft NIST Definition of Cloud Computing, 06/2009

OSI Open Source Definition

• Free Redistribution
• Source Code
• Derived Works
• Integrity of The Author's Source Code
• No Discrimination Against Persons or Groups
• No Discrimination Against Fields of Endeavor
• Distribution of License
• License Must Not Be Specific to a Product
• License Must Not Restrict Other Software
• License Must Be Technology-Neutral

Source: http://www.opensource.org/docs/osd

The Open Group SOA Definition

Service-Oriented Architecture (SOA) is an architectural style that supports service orientation

Service orientation is a way of thinking in terms of services and service-based development and the outcomes of services

Source: http://www.opengroup.org/projects/soa/doc.tpl?gdid=10632

Data Clouds & Data Grids – What’s the difference?

The terms Data Cloud and Data Grid are often used interchangeably; we make the following distinctions

Data Grids
• Grid computing system optimized to share large amounts of distributed data
• Focus on technical capabilities
• Often combined with computational grid computing systems
• Data often moved to compute grid for use
• Often oriented towards highly structured scientific data computing applications

Data Clouds
• Focus on perception of infinite storage and computing capacity
• Focus on cost, virtualization & flexible capacity
• Enable scale-up/scale-down economics
• Data moved rarely; locality is a key feature
• Clouds thus far focus on column-oriented, massively scalable data stores

Sources: Wikipedia & [Grossman 1]

Definition: Mashups

• Web-available resource that combines data/functions from two or more external resources
• The idea behind mashup efforts is to reduce the cost of producing and consuming resources
• Integration should be fast and easy
• Often focuses on widely available formats/protocols like RSS or Atom over HTTP
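To make the mashup idea concrete, here is a minimal Python sketch that combines items from two RSS 2.0 feeds into one sorted list. The feed contents, the example.com links, and the function names `parse_items` and `mashup` are all hypothetical and are not taken from the presentation; a real mashup would fetch the feeds over HTTP instead of using inline strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feeds standing in for two external resources.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Hadoop release</title><link>http://example.com/a1</link></item>
<item><title>Cloud pricing</title><link>http://example.com/a2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Analytics at scale</title><link>http://example.com/b1</link></item>
</channel></rss>"""


def parse_items(rss_text):
    """Extract the items of one RSS 2.0 feed as plain dictionaries."""
    root = ET.fromstring(rss_text)
    source = root.findtext("channel/title")
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
        }
        for item in root.iter("item")
    ]


def mashup(*feeds):
    """Combine the items of several feeds into one list sorted by title."""
    items = [entry for feed in feeds for entry in parse_items(feed)]
    return sorted(items, key=lambda entry: entry["title"])


if __name__ == "__main__":
    for entry in mashup(FEED_A, FEED_B):
        print(entry["source"], "|", entry["title"], "|", entry["link"])
```

Because both sources use the same widely available format, the combining step needs only a standard XML parser, which is the low-cost integration the slide describes.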




Data Analytics in the Cloud: Opportunities


Use Case: Cloud Data Analytical Tools for Intelligence Community Field Analyst

Problem Statement: Analytical tools are obsolete on deployment; field analysts need timely, configurable data analytics. How does cloud-based DA meet the needs of IC analysts?

Customer Problem
• Traditional business intelligence tools require years to develop
• Field Analysts confront situations which are rapidly changing
• Petabytes of data require analysis

Cloud Analytical Tools Solution
• Recomposable Cloud Computing Data Analytical Tools
  – Apache Hadoop
  – Mashups
  – Service-Oriented Architecture

Customer Value
• Enabling field analysts to quickly build the analytical tool they need to analyze petabytes of data




Why the “Buzzword” Soup? Convergence of Capabilities

[Diagram labels: Free Open Source Software (FOSS), Virtualization, Cloud Computing, SaaS, Mashups, Data Analytics]

Convergence of capabilities
New opportunities in breadth and depth of DA services
• Big Data: Cloud disk and data storage engines make petabyte environments available to new clients
• Value Based Billing: Heavy use of FOSS in the cloud reduces costs directly & indirectly
• Capacity Scaling: Scaling up/down of capacity in pay-go fashion makes DA available to a wider audience
• Composable UIs: Capability to assemble DA results into various interfaces

Early Data Analytic Cloud Consumers/Providers

Cloud DA Opportunities

Profile                              Types                      Example Companies
Internet Scale                       Big Internet Companies     • Yahoo, Amazon – can build DA on inf.
Service Providers                    SaaS Companies             • Force.com – DA & Warehousing to SBA’s
                                     Social Platforms           • Facebook – sell DA access to anon. user info
Large data-centric Traditional Co’s  Insurers                   • BCBS – private clouds across consortium
                                     Healthcare & Biotech       • Kaiser Permanente – common DA services
                                     Rating Agencies            • S & P – open DA cloud to customers
Government Organizations             Intelligence Community     • CIA – private org-wide Cloud
                                     Defense Managed Services   • DISA – offer DA to .mil clients
                                     Healthcare                 • SSA – offer DA to fraud prevention analysts
DAaaS Providers                      DAaaS Infrastructure       • Cloudera – managed Hadoop instances
                                     SMB DAaaS Provider         • ?? – managed DAaaS, simplified, low cost

Data Analytics in the Cloud: Technology & Standards


Google MapReduce

Programming model for solving large problems on a cluster of nodes using a divide and conquer approach

The master node Maps the input into smaller sub-problems and distributes the work to the cluster. A worker node may in turn map its work to a further cluster of nodes. The worker nodes then process the smaller problems and return the answers to the master node

The master node then Reduces the set of answers into the answer to the original problem
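The map and reduce steps described above can be sketched on a single machine. The Python example below is an illustration only: it uses a thread pool in place of a cluster, counts words as the sample problem, and every function name (`split_input`, `map_chunk`, `reduce_pairs`, `word_count`) is an assumption made for this sketch. It is not Google's or Hadoop's implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_chunks):
    """Master step: divide the input into smaller sub-problems."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """Worker step: emit a (word, 1) pair for every word in one sub-problem."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_pairs(all_pairs):
    """Master step: combine the workers' answers into the final answer."""
    totals = defaultdict(int)
    for pairs in all_pairs:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


def word_count(lines, num_workers=3):
    """Run the map phase on a pool of workers, then reduce the results."""
    chunks = split_input(lines, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_pairs(mapped)


if __name__ == "__main__":
    docs = [
        "the cloud stores data",
        "the cloud analyzes data",
        "data in the cloud",
    ]
    print(word_count(docs))
```

The workers never share state; each returns its own list of pairs, and only the reduce step sees all of them. That independence is what lets the map phase be spread across many nodes.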




Apache Hadoop

• Open Source implementation of the MapReduce algorithms
• Hadoop can store and process petabytes of data
• Subprojects include HBase, Chukwa, Hive, Pig, and ZooKeeper
• Yahoo (more than 100,000 CPUs in >25,000 computers running Hadoop) and other companies make extensive use of Hadoop




As-Is Hadoop Simplified Reference Architecture

[Architecture diagram. Labeled components: Chukwa, HBase, Apache Hadoop, ZooKeeper, Pig, Hive, ETL, Business Intelligence, Structured Data, Unstructured Data]




Apache Hadoop Sub-projects




Hadoop Sub-project          Capabilities                                  Example Companies

Chukwa                      • Data collection system for monitoring and   • Yahoo
                              analyzing large distributed systems

HBase                       • Similar to Google’s BigTable                • Yahoo
                            • Distributed database for structured data
                            • Multi-dimensional sorted map

Hive                        • Data warehouse infrastructure for large     • Facebook
                              datasets
                            • HiveQL query language

Pig                         • High-level language for data analysis       • Yahoo
                            • Compiler for MapReduce programs

ZooKeeper                   • Configuration, naming, distributed          • Yahoo
                              synchronization, and group services
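
The "multi-dimensional sorted map" listed for HBase can be pictured with a small in-memory sketch. This is not the HBase client API. The class SortedCellMap and its methods are hypothetical names, and the key layout (row key, column, timestamp) is an assumption based on the BigTable-style model the table mentions. The sketch only shows why keeping keys sorted makes "latest version" lookups and row-range scans cheap.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

# Key layout assumed for this sketch: (row key, column, negated timestamp).
# Negating the timestamp makes the newest version of a cell sort first.
Key = Tuple[str, str, int]


class SortedCellMap:
    """Toy sorted map from (row, column, timestamp) to a byte value."""

    def __init__(self) -> None:
        self._keys: List[Key] = []          # kept in sorted order
        self._values: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest value stored for (row, column), or None."""
        # A 2-tuple sorts before every 3-tuple that shares its prefix.
        i = bisect.bisect_left(self._keys, (row, column))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row: str, stop_row: str) -> Iterator[Tuple[str, str, int, bytes]]:
        """Yield cells for rows in [start_row, stop_row) in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


if __name__ == "__main__":
    cells = SortedCellMap()
    cells.put("row1", "cf:a", 1, b"old")
    cells.put("row1", "cf:a", 2, b"new")
    cells.put("row2", "cf:a", 1, b"x")
    print(cells.get("row1", "cf:a"))
    print(list(cells.scan("row1", "row2")))
```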

Data Analytics in the Cloud: Challenges
To-Be Simplified Hadoop Architecture




[Architecture diagram. Components shown: REST API, SOAP API, Business Intelligence, Pig, Hive, ETL, Query Language, Algorithm Library, Apache Hadoop, HBase, Chukwa, ZooKeeper. Data shown: structured data, unstructured data.]

Key Challenges




Emerging Challenges

Infrastructure     Hardware                Speed of rack interconnects, multi-core
                   Parallelization         Core platform, data analytic components
                   Node Affinity           Make use of super nodes, XML I/O, encryption/decryption
Adoption           Cost                    “Brutally efficient” pricing, FOSS advantages
                   Cost Models             Accurate, open models of CapEx and OpEx costs
                   Migration Pain          Full warehouse migration, ETL
Administration     Ease of Admin.          Parallel current RDBMS, warehouse admin
                   Debugging               Distributed debugging, integration with provider
                   Flexible Provisioning   Multi-level provisioning: company, department, individual
                   System Reporting        Reporting, audit trails, view to DA system
Input & Analysis   ETL Integration         Interface, metadata optimized for ETL loading
                   Intuitive APIs          Declarative & programmatic, cross-language
                   Product Integration     BI, applications (SAP, Oracle Financial, Lawson)
Output             Data Visualization      Viewing & drill-down of very large data sets
                   Intuitive APIs          Declarative & programmatic, cross-language
                   Mashups/Dynamics        Easy discovery of data, functions & workflows

Solutions: Projected & In-Progress




Emerging Challenges

Infrastructure     Hardware                Interconnect costs dropping, hardware maturing
                   Parallelization         Platforms advance, market for components
                   Node Affinity           Discovery of capability, affinity into Hadoop, …
Adoption           Cost                    FOSS’s game to lose; small diff × a lot = a lot
                   Cost Models             Industry-standard ROI/IRR models for CC
                   Migration Pain          Migration toolkits for traditional DW products
Administration     Ease of Admin.          Integrated & extended admin packages
                   Debugging               Commercial distributed debugging
                   Flexible Provisioning   Multi-level provisioning: company, department, individual
                   System Reporting        Reporting, audit trails, view to DA system
Input & Analysis   ETL Integration         ETL interface, support of popular packages
                   Intuitive APIs          SQL-like interface in core, language bindings
                   Product Integration     Third-party adaptors, IWay et al.
Output             Data Visualization      Modeling, metadata, traceability, and new UIs
                   Intuitive APIs          SQL-like interface in core, language bindings
                   Mashups/Dynamics        Generic datatypes, discovery services
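
The Cost row's point, that a small per-unit difference multiplied by a large count becomes a large number, can be worked through with a simple open cost model. This is only a sketch. The CostModel class, its fields, and every dollar figure and node count below are hypothetical values invented for illustration, not figures from the presentation or from any vendor.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Per-node cost inputs. All values used below are hypothetical."""
    capex_per_node: float                # one-time hardware cost per node
    opex_per_node_year: float            # recurring operating cost per node per year
    license_per_node_year: float = 0.0   # recurring license cost; zero for FOSS

    def total_cost(self, nodes: int, years: int) -> float:
        """Total cost = nodes * (CapEx + yearly recurring costs * years)."""
        recurring = (self.opex_per_node_year + self.license_per_node_year) * years
        return nodes * (self.capex_per_node + recurring)


def cost_gap(a: CostModel, b: CostModel, nodes: int, years: int) -> float:
    """Difference in total cost between two models at the same scale."""
    return a.total_cost(nodes, years) - b.total_cost(nodes, years)


if __name__ == "__main__":
    # Identical hardware and operations; only the license line differs.
    licensed = CostModel(capex_per_node=2000.0, opex_per_node_year=500.0,
                         license_per_node_year=100.0)
    foss = CostModel(capex_per_node=2000.0, opex_per_node_year=500.0)

    for nodes in (10, 1000, 25000):
        gap = cost_gap(licensed, foss, nodes, years=3)
        print(f"{nodes:>6} nodes: gap over 3 years = {gap:,.0f}")
```

With these made-up inputs the gap grows linearly with node count, so a per-node difference that is negligible for a small cluster dominates the comparison for a very large one.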

Data Analytics in the Cloud: Questions
Questions & Contact Information




       Principal Architect / Partner            Cloud Computing Architect
       Michael A. Sick                          Tom Plunkett
       888.777.1847                             888.777.1847
       michael.sick@serenesoftware.com          TomPlunkett@vt.edu

       Address                                  Address
       Serene Software                          Serene Software
       116 19th Ave. North, Suite 503           116 19th Ave. North, Suite 503
       Jacksonville Beach, FL                   Jacksonville Beach, FL
       URL: www.serenesoftware.com              URL: www.serenesoftware.com




Data Analytics In The Cloud Soa World

  • 1. Open Source SOA in the Cloud: Data Analytics in the Cloud Tom Plunkett TomPlunkett@vt.edu Michael Sick michael.sick@serenesoftware.com SOA World 2009
  • 2. Overview • Who are we? Introductions • Baselines & definitions • Targeted Use Cases Opportunity • Technical convergence & opportunities • Commercial opportunities & drivers • State of current technology Data Analytics Technology & • Commercial & FOSS solutions in the Cloud Standards • Hadoop Focus • Challenges to Meet Target Use Cases Challenges • Economic challenges & the role of “free” • Wide scale challenges in Cloud and data analytics • Questions Questions • Contacts This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 2 License
  • 3. Introductions Data Analytics in the Cloud: Data Analytics in the Cloud Opportunity Technology & Standards Introductions Challenges Questions Introductions Opportunity Data Analytics Technology & in the Cloud Standards Challenges Questions This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 3 License
  • 4. Introductions Opportunity Tom Plunkett Data Analytics Technology & in the Cloud Standards Challenges Questions Extensive Federal Government Experience IBM Certified SOA Solution Designer Patents Teach OOP and Java for Virginia Tech This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 4 License
  • 5. Introductions Opportunity Michael Sick Data Analytics Technology & in the Cloud Standards Challenges Questions Commercial & Federal Enterprise Architect Owner: Serene Software Inc. – EA Services Firm Clients include: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture Fascinated by technology -15 years running This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 5 License
  • 6. Introductions Opportunity Serene Software Data Analytics Technology & in the Cloud Standards Challenges Questions • Serene is a boutique consulting company focusing on delivery of Enterprise Architecture services and solutions • Service Areas – IT Governance – IT Strategy – IT Cost Containment – Service Oriented Architectures (SOA) – IT Solution Selection – IT Audit & Analysis • Experience includes: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture, … • Founded in 2003 (privately held, no debt) and headquartered in Jacksonville, FL This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 6 License
  • 7. Draft NIST Definition of Cloud Computing: A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
      – Essential Characteristics: On-demand self-service; ubiquitous network access; location-independent resource pooling; rapid elasticity; measured service
      – Delivery Models: Cloud Software as a Service (SaaS); Cloud Platform as a Service (PaaS); Cloud Infrastructure as a Service (IaaS)
      – Deployment Models: Private cloud; community cloud; public cloud; hybrid cloud
      Source: Draft NIST Definition of Cloud Computing, 06/2009
  • 8. OSI Open Source Definition
      – Free Redistribution
      – Source Code
      – Derived Works
      – Integrity of The Author's Source Code
      – No Discrimination Against Persons or Groups
      – No Discrimination Against Fields of Endeavor
      – Distribution of License
      – License Must Not Be Specific to a Product
      – License Must Not Restrict Other Software
      – License Must Be Technology-Neutral
      Source: http://www.opensource.org/docs/osd
  • 9. The Open Group SOA Definition: Service-Oriented Architecture (SOA) is an architectural style that supports service orientation. Service orientation is a way of thinking in terms of services and service-based development and the outcomes of services. Source: http://www.opengroup.org/projects/soa/doc.tpl?gdid=10632
  • 10. Data Clouds & Data Grids: What's the difference? The terms “data cloud” and “data grid” are often used interchangeably; we make the following distinctions.
      – Data Grids: Grid computing system optimized to share large amounts of distributed data; focus on technical capabilities; often combined with computational grid computing systems; data often moved to the compute grid for use; often oriented towards highly structured scientific data computing applications
      – Data Clouds: Focus on the perception of infinite storage and computing capacity; focus on cost, virtualization & flexible capacity; enable scale-up/scale-down economics; data moved rarely, locality is a key feature; clouds thus far focus on column-oriented, massively scalable data stores
      Sources: Wikipedia & [Grossman 1]
  • 11. Definition: Mashups
      – A web-available resource that combines data/functions from two or more external resources
      – The idea of mashup efforts is to reduce the cost of producing and consuming resources
      – Integration should be fast and easy
      – Often focuses on widely available formats/protocols like RSS or Atom over HTTP
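The mashup definition above can be illustrated with a minimal Python sketch. It assumes two hypothetical RSS 2.0 feeds (`FEED_A`, `FEED_B`, with made-up titles and `example` URLs), inlined as strings so the example runs without network access; a real mashup would fetch them over HTTP. The function names `read_items` and `mashup` are illustrative only and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Two hypothetical RSS 2.0 feeds, inlined so the sketch runs offline.
FEED_A = """<rss version="2.0"><channel><title>Source A</title>
<item><title>Cluster status report</title><link>http://a.example/1</link></item>
<item><title>New dataset loaded</title><link>http://a.example/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Source B</title>
<item><title>Analyst summary</title><link>http://b.example/1</link></item>
</channel></rss>"""


def read_items(feed_xml):
    """Return (source, title, link) tuples for every <item> in an RSS document."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    return [
        (source, item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


def mashup(*feeds):
    """Combine items from several feeds into one list, sorted by title."""
    combined = []
    for feed_xml in feeds:
        combined.extend(read_items(feed_xml))
    return sorted(combined, key=lambda entry: entry[1])


if __name__ == "__main__":
    for source, title, link in mashup(FEED_A, FEED_B):
        print(f"{title} ({source}) -> {link}")
```

The combined list is itself a new resource built from two external ones, which is the essence of the definition; the low integration cost comes from both sources sharing a widely available format.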
  • 12. Data Analytics in the Cloud: Opportunities
  • 13. Use Case: Cloud Data Analytical Tools for Intelligence Community Field Analyst
      Problem Statement: Analytical tools are obsolete on deployment; field analysts need timely, configurable data analytics. How does cloud-based DA meet the needs of IC analysts?
      – Customer Problem: Traditional business intelligence tools require years to develop; field analysts confront situations which are rapidly changing; petabytes of data require analysis
      – Cloud Analytical Tools Solution: Recomposable cloud computing data analytical tools (Apache Hadoop, mashups, Service-Oriented Architecture)
      – Customer Value: Enabling field analysts to quickly build the analytical tool they need to analyze petabytes of data
  • 14. Why the “Buzzword” Soup? Convergence of Capabilities
      Convergence of capabilities: new opportunities in breadth and depth of DA services
      Diagram labels: Free Open Source Software (FOSS), Virtualization, Cloud Computing, SaaS, Data Analytics, Mashups
      – Big Data: Cloud disk and data storage engines make petabyte environments available to new clients
      – Value Based Billing: Heavy use of FOSS in the cloud reduces costs directly & indirectly
      – Capacity Scaling: Scaling up/down of capacity in pay-go fashion makes DA available to a wider audience
      – Composable UIs: Capability to assemble DA results into various interfaces
  • 15. Early Data Analytic Cloud Consumers/Providers (Cloud DA Opportunities; columns: Profile, Types, Example Companies)
      – Internet Scale Service Providers
          Big Internet Companies: Yahoo, Amazon – can build DA on inf. Services
          SaaS Companies: Force.com – DA & Warehousing to SBA’s
          Social Platforms: Facebook – sell DA access to anon. user info
      – Large data-centric Traditional Co’s
          Insurers: BCBS – private clouds across consortium Services
          Healthcare & Biotech: Kaiser Permanente – common DA services
          Rating Agencies: S & P – open DA cloud to customers
      – Government Organizations
          Intelligence Community: CIA – private org-wide Cloud Services
          Defense Managed Services: DISA – offer DA to .mil clients
          Healthcare Services: SSA – offer DA to fraud prevention analysts
      – DAaaS Providers
          DAaaS Infrastructure: Cloudera – managed Hadoop instances
          SMB DAaaS Provider: ?? – managed DAaaS, simplified, low cost
  • 16. Data Analytics in the Cloud: Technology & Standards
  • 17. Google MapReduce
      – Algorithm for solving distributed problems using a divide-and-conquer approach with a cluster of nodes
      – The master node Maps the input into smaller sub-problems and distributes the work to the cluster. A worker node may further map the work for a further cluster of nodes.
      – The worker nodes then process the smaller problems and return the answers to the master node
      – The master node then Reduces the set of answers into the answer to the original problem
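The Map and Reduce steps described above can be sketched in a few lines of Python. This is a single-process illustration using word counting as the example problem, not Google's or Hadoop's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are assumptions chosen for the sketch, and the grouping step (`shuffle`) is made explicit even though the slide does not name it.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every input split, then reduce each group of values."""
    intermediate = []
    for document in documents:  # on a cluster, each split would go to a worker
        intermediate.extend(map_phase(document))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(map_reduce(["the cloud", "the data cloud"]))
```

Because each call to `map_phase` depends only on its own input split, and each call to `reduce_phase` only on one key's values, both steps can be spread across many nodes, which is what makes the approach suitable for very large data sets.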
  • 18. Apache Hadoop
      – Open source implementation of the MapReduce algorithms
      – Hadoop can store and process petabytes of data
      – Subprojects include HBase, Chukwa, Hive, Pig, and ZooKeeper
      – Yahoo (more than 100,000 CPUs in >25,000 computers running Hadoop) and other companies make extensive use of Hadoop
  • 19. As-Is Hadoop Simplified Reference Architecture (diagram). Components shown: Apache Hadoop, Chukwa, HBase, ZooKeeper, Pig, Hive, Structured Data, Unstructured Data, ETL, Business Intelligence
  • 20. Apache Hadoop Sub-projects (columns: Sub-project, Capabilities, Example Companies)
      – Chukwa: Data collection system for monitoring and analyzing large distributed systems (Yahoo)
      – HBase: Similar to Google’s BigTable; distributed database for structured data; multi-dimensional sorted map (Yahoo)
      – Hive: Data warehouse infrastructure for large datasets; Hive QL query language (Facebook)
      – Pig: High-level language for data analysis; compiler for Map-Reduce programs (Yahoo)
      – ZooKeeper: Configuration, naming, distributed synchronization, and group services (Yahoo)
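The phrase “multi-dimensional sorted map” used for HBase can be made concrete with a toy Python sketch. It is loosely modeled on the BigTable-style data model, where a value is addressed by row key, column, and timestamp; it is not the HBase API, and the class `SortedCellMap` and its methods are hypothetical names invented for illustration.

```python
import bisect


class SortedCellMap:
    """Toy in-memory map keyed by (row, column, timestamp), kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> stored value

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value for a cell, or None if the cell is absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


if __name__ == "__main__":
    cells = SortedCellMap()
    cells.put("row1", "cf:a", 1, "old")
    cells.put("row1", "cf:a", 2, "new")
    cells.put("row2", "cf:a", 1, "x")
    print(cells.get("row1", "cf:a"))
    print(list(cells.scan("row1", "row2")))
```

Keeping the keys sorted is what makes range scans over adjacent rows cheap; in a distributed store, contiguous key ranges can then be assigned to different nodes.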
  • 21. Data Analytics in the Cloud: Challenges
  • 22. To-Be Simplified Hadoop Architecture (diagram). Components shown: Apache Hadoop, HBase, Chukwa, ZooKeeper, Pig, Hive, REST API, SOAP API, Query Language, Algorithm Library, Structured Data, Unstructured Data, ETL, Business Intelligence
  • 23. Key Challenges (Emerging Challenges)
      – Infrastructure
          Hardware: Speed of rack interconnects, multi-core
          Parallelization: Core platform, data analytic components
          Node Affinity: Make use of super nodes, XML I/O, en/de-crypt
      – Adoption
          Cost: “Brutally efficient” pricing, FOSS advantages
          Cost Models: Accurate, open models of CapEx, OpEx costs
          Migration Pain: Full warehouse migration, ETL
      – Administration
          Ease of Admin.: Parallel current RDBMS, warehouse admin
          Debugging: Distributed debugging, integration w/ provider
          Flexible Provisioning: Multi-level provisioning (co., dept, individual)
          System Reporting: Reporting, audit trails, view to DA system
      – Input & Analysis
          ETL Integration: Interface, metadata optimized for ETL loading
          Intuitive APIs: Declarative & programmatic, cross-language
          Product Integration: BI, applications (SAP, Oracle Financial, Lawson)
      – Output
          Data Visualization: Viewing & drill-down of very large data sets
          Intuitive APIs: Declarative & programmatic, cross-language
          Mashups/Dynamics: Easy discovery of data, functions & workflows
  • 24. Solutions: Projected & In-Progress (Emerging Challenges)
      – Infrastructure
          Hardware: Interconnect $$ dropping, hardware maturing
          Parallelization: Platforms advance, market for components
          Node Affinity: Discovery of capability, affinity into Hadoop, …
      – Adoption
          Cost: FOSS’s game to lose; small diff * a lot = a lot
          Cost Models: Industry-standard ROI/IRR models for CC
          Migration Pain: Migration toolkits for traditional DW products
      – Administration
          Ease of Admin.: Integrated & extended admin packages
          Debugging: Commercial distributed debugging
          Flexible Provisioning: Multi-level provisioning (co., dept, individual)
          System Reporting: Reporting, audit trails, view to DA system
      – Input & Analysis
          ETL Integration: ETL interface, support of popular packages
          Intuitive APIs: SQL-like interface in core, language bindings
          Product Integration: 3rd-party adaptors, IWay et al.
      – Output
          Data Visualization: Modeling, metadata, traceability, and new UIs
          Intuitive APIs: SQL-like interface in core, language bindings
          Mashups/Dynamics: Generic datatypes, discovery services
  • 25. Data Analytics in the Cloud: Questions
  • 26. Questions? & Contact Information
      – Michael A. Sick, Principal Architect / Partner: 888.777.1847, michael.sick@serenesoftware.com
      – Tom Plunkett, Cloud Computing Architect: 888.777.1847, TomPlunkett@vt.edu
      – Address (both): Serene Software, 116 19th Ave. North, Suite 503, Jacksonville Beach, FL
      – URL: www.serenesoftware.com