AGILE BIG DATA ANALYTICS DEVELOPMENT:
AN ARCHITECTURE-CENTRIC APPROACH
Prof. Hong-Mei Chen
IT Management, Shidler College of Business
University of Hawaii at Manoa, USA
Prof. Rick Kazman
University of Hawaii at Manoa
Software Engineering Institute, Carnegie Mellon University, USA
Serge Haziyev
SoftServe Inc.
Austin, TX, USA
HICSS49, Jan. 2016
OUTLINE
• Big Data Analytics
• An Architecture-Centric Approach
• Research Method
• Results: AABA Methodology
• Conclusions
2
Big Data: Big Hype
3
Gartner Hype Cycle 2015
Big Data Analytics: All the Rage
• Big data analytics is the process of
examining big data to uncover hidden
patterns, unknown correlations and
other useful info to provide “value”
• Big data analytics is predicted to be a
US$125 billion market
4
Challenges for Big Data Analytics
• Technical
- 5V requirements for big data management
- building a solid infrastructure that orchestrates
technology components
• Organizational
- how teams of data scientists can work with
software engineers
- rapid delivery of actionable predictions
discovered from big data
5
Agile Analytics: Paradigm Shifts
6
Data Lake (Hortonworks)
7
Paradigm Shift
8
Research Questions
• RQ1: How should a big data system be
designed and developed to support
advanced analytics effectively?
• RQ2: How should agile processes be
adapted for big data analytics
development?
9
Architecture-Centric Approach
1) For big data system development
2) For agile analytics development
10
Architecture Design is Critical and Complex
in Big Data System Development
I. Volume: Distributed and scalable architecture
II. Variety: Polyglot persistence architecture
III. Velocity: Complex event processing + Lambda
Architecture
IV. Veracity: Architecture design for understanding
the data sources and the cleanliness and validation
of each
V. Value: New architectures for
- hybrid, agile analytics; big data analytics cloud
- integrating the new and the old (EDW, ETL)
VI. Integration: Integrating separate architectures
addressing each of the 5V challenges
11
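To illustrate the Lambda Architecture named under Velocity: queries are answered by merging a batch view, recomputed from all raw data, with a speed view covering events that arrived since the last batch run. The following is a minimal sketch, assuming in-memory structures stand in for the batch, speed, and serving layers. The class and method names are hypothetical and do not come from the talk.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts events per key (hypothetical)."""

    def __init__(self):
        self.master_log = []         # append-only raw events
        self.batch_view = Counter()  # recomputed from the full log
        self.speed_view = Counter()  # events seen since the last batch run

    def ingest(self, key):
        # New events go to the master log and to the speed layer.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Batch layer: recompute the view from all raw data, then discard
        # the speed-layer state that the new batch view now covers.
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, key):
        # Serving layer: merge batch and real-time results.
        return self.batch_view[key] + self.speed_view[key]
```

In a real system the two views would live in separate stores (Case 4 below uses two HBase clusters for batch views and streaming data), but the merge-at-query idea is the same.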
Refined RQ1
• How can existing architectural methods be
extended to integrate polyglot data
modeling, architecture design, and
technology orchestration techniques
in a single effective and efficient big
data design method?
12
Architecture vs. Agile
• Architectural practices and agile
practices are actually well aligned,
but this position has not always
been universally accepted
13
Architecture vs. Agile (Continued)
• The creators of the Agile Manifesto
described 12 principles. 11 of these are fully
compatible with architectural practices.
• The one exception is the principle: “The best
architectures, requirements, and designs emerge
from self-organizing teams.”
• While this principle may have held true for
small and perhaps even medium-sized
projects, we are unaware of any cases
where it has been successful in large
projects, particularly those with complex
requirements and distributed development
such as big data analytics.
14
Architecture-centric Approach
To Agile Big Data Analytics
• Given rapid changes in big data technology,
there is a risk of doing too much too soon and
hence being locked into a solution that
will inevitably need to be modified, at
significant cost.
• An architecture-centric approach
facilitates future rapid iterations with less
disruption, as well as lower risk and cost,
by creating appropriate architecture
abstractions.
15
Refined RQ2
• How should agile principles be
combined with the architecture
design method to achieve effective
agile big data analytics?
16
AABA Methodology: Architecture-centric Agile Big data Analytics
17
Research Method
• Case study research is deemed suitable:
- System development, be it big or small data, cannot
be separated from its organizational and business
contexts
- “How” research questions
• Collaborative Practice Research (CPR)
- A particular kind of case study
- A collaborative company, SSV, in the outsourcing
industry
- SSV has successfully deployed a number of big
data projects that can be triangulated as multiple
(embedded) case studies
18
Collaborative Practice Research (CPR)
Steps in an Iteration
1) Appreciate problem situation
2) Study literature
3) Develop framework
4) Evolve Method
5) Action
6) Evaluate experiences
7) Exit
8) Assess usefulness
9) Elicit research results
19
Collaborative Practice Research (CPR)
[Figure: the nine steps of one CPR iteration, as listed above]
20
CPR Cycles
[Figure: three CPR iterations, covering Cases 1-4, Cases 3-6, and Cases 7-10]
21
CASES 1-3
22
Case 1: Network Security, Intrusion Prevention
US MNC IT corp. (Employees > 320,000)
Business goals:
• Provide ability for security analysts to improve intrusion detection techniques
• Observe traffic behavior and make infrastructure adjustments:
- Adjust company security policies
- Improve system performance
Start: Late 2010, 8.5 months
Big data:
• Machine-generated data: 7.5 billion event records per day collected from IPS devices
• Near real-time reporting
• Reports which “touch” billions of rows should be generated in < 1 min
Technologies:
• ETL: Talend
• Storage/DW: InfoBright EE, HP Vertica
• OLAP: Pentaho Mondrian
• BI: JasperServer Pro
Challenges:
• High throughput, different device data schemas (versions)
• Keep system performance at the required level when supporting IP/geography analysis: avoid joins
• Keep required performance for complex querying over billions of rows

Case 2: Anti-Spam Network Security System
US MNC networking equipment corp. (Employees > 74,000)
Business goals:
• Validation of the newly developed set of anti-spam rules against the large training set of known emails
• Detection of the best anti-spam rules in terms of performance and efficacy
Start: 2012-2013
Big data:
• 20K anti-spam rules
• 5M email training set
• 100+ nodes in Hadoop clusters
Technologies:
• Vanilla Apache Hadoop (HDFS, MapReduce, Oozie, ZooKeeper)
• Perl/Python
• SpamAssassin
• Perceptron
Challenges:
• MapReduce jobs were written in Python and run with Hadoop Streaming; the challenge was to optimize job performance
• Optimal Hadoop cluster configuration for maximizing performance and minimizing MapReduce processing time

Case 3: Online Coupon Web Analytics Platform
US MNC: world’s largest coupon site, 2014 revenue > US$200M
Business goals:
• In-house web analytics platform for conversion funnel analysis, marketing campaign optimization, user behavior analytics
• Clickstream analytics, platform feature usage analysis
Start: 2012, ongoing
Big data:
• 500 million visits a year
• 25TB+ HP Vertica data warehouse
• 50TB+ Hadoop cluster
• Near real-time analytics (15 minutes is supported for clickstream data)
Technologies:
• Data lake: (Amazon EMR)/Hive/Hue/MapReduce/Flume/Spark
• DW: HP Vertica, MySQL
• ETL/data integration: custom, using Python
• BI: R, Mahout, Tableau
Challenges:
• Minimize transformation time for semi-structured data
• Data quality and consistency
- complex data integration
- fast-growing data volumes
- performance issues with Hadoop MapReduce (moving to Spark)
CASES 4-6
23
Case 4: Social Marketing Analytical Platform
US MNC Internet marketing (user reviews), 2014 revenue > US$48M
Business goals:
• Build in-house analytics platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
• Provide analysis on how end-users are interacting with service content, products, and features
Start: 2012, ongoing
Big data:
• Volume: 45 TB
• Sources: JSON
• Throughput: > 20K/sec
• Latency: 1 hour for static/pre-defined reports, real-time for streaming data
Technologies:
• Lambda architecture
• Amazon AWS, S3
• Apache Kafka, Storm
• Hadoop: CDH 5, HDFS (raw data), MapReduce, Cloudera Manager, Oozie, ZooKeeper
• HBase (2 clusters: batch views, streaming data)
Challenges:
• Hadoop upgrade: CDH 4 to CDH 5
• Data integrity and data quality
• Very high data throughput caused a challenge with data loss prevention (introduced Apache Kafka as a solution)
• System performance for data discovery (introduced Redshift, considering Spark)
• Constraints: public cloud, multi-tenant

Case 5: Cloud-based Mobile App Development Platform
US private Internet co., funding > US$100M
Business goals:
• Provide visual environment for building custom mobile applications
• Charge customers by usage
• Analysis of platform feature usage by end-users and platform optimization
Start: 2013, 8 months
Big data:
• Data volume > 10 TB
• Sources: JSON
• Data throughput > 10K/sec
• Analytics: self-service, pre-defined reports, ad-hoc
• Data latency: 2 min
Technologies:
• Middleware: RabbitMQ, Amazon SQS, Celery
• DB: Amazon Redshift, RDS, S3
• Jaspersoft
• Elastic Beanstalk
• Integration: Python
• Aria Subscription Billing Platform
Challenges:
• Schema extensibility
• Minimize TCO
• Achieving high data compression without significant performance degradation was quite challenging
• Technology selection: performance benchmarks and price comparison of Redshift vs. HP Vertica vs. Amazon RDS

Case 6: Telecom E-tailing Platform
Russian mobile phone retailer, 2014 revenue: 108B rubles
Business goals:
• Build an omni-channel platform to improve sales and operations
• Analyze all enterprise data from multiple sources for real-time recommendation and sales
Start: End of 2013 (did only discovery)
Big data:
• Analytics on 90+ TB (30+ TB structured, 60+ TB unstructured and semi-structured data)
• Elasticity: through SDE principles
Technologies:
• Hadoop (HDFS, Hive, HBase)
• Cassandra
• HP Vertica/Teradata
• MicroStrategy/Tableau
Challenges:
• Data volume for real-time analytics
• Data variety: data science over data in different formats from multiple data sources
• Elasticity: private cloud, Hadoop as a service with auto-scale capabilities
CASES 7-10
24
Case 7: Social Relationship Marketing Platform
US private Internet co., funding > US$100M
Business goals:
• Build social relationship platform that allows enterprise brands and organizations to manage, monitor, and measure their social media programs
• Build an analytics module to analyze and measure results
Start: 2013, ongoing (redesign of 2009 system)
Big data:
• > one billion social connections across 84 countries
• 650 million pieces of social content per day
• MySQL (~11 TB), Cassandra (~6 TB), ETL (> 8 TB per day)
Technologies:
• Cassandra
• MySQL
• Elasticsearch
• SaaS BI platform: GoodData
• Clover ETL, custom in Java
• PHP, Amazon S3, Amazon SQS
• RabbitMQ
Challenges:
• Minimize data processing time (ETL)
• Implement incremental ETL, processing and uploading only the latest data

Case 8: Web Analytics & Marketing Optimization
US MNC IT consulting co. (Employees > 430,000)
Business goals:
• Optimization of all web, mobile, and social channels
• Optimization of recommendations for each visitor
• High return on online marketing investments
Start: 2014, ongoing (redesign of 2006-2010 system)
Big data:
• Data volume > 1 PB
• 5-10 GB per customer/day
• Data sources: clickstream data, webserver logs
Technologies:
• Vanilla Apache Hadoop (HDFS, MapReduce, Oozie, ZooKeeper)
• Hadoop/HBase
• Aster Data
• Oracle
• Java/Flex/JavaScript
Challenges:
• Hive performance for analytics queries; difficult to support real-time scenario for ad-hoc queries
• Data consistency between two layers: raw data in Hadoop and aggregated data in relational DW
• Complex data transformation jobs

Case 9: Network Monitoring & Management Platform
US OSS vendor, revenue > US$22M
Business goals:
• Build tool to monitor network availability, performance, events and configuration
• Integrate data storage and collection processes with one web-based user interface
• IT as a service
Start: 2014, ongoing (redesign of 2006 system)
Big data:
• Collect data in large datacenters (each: gigabytes to terabytes)
• Real-time data analysis and monitoring (< 1 minute)
• Types of devices: hundreds
Technologies:
• MySQL
• RRDtool
• HBase
• Elasticsearch
Challenges:
• High memory consumption of HBase when deployed in single-server mode

Case 10: Healthcare Insurance Operation Intelligence
US health plan provider, employees > 4,500, revenue > US$10B
Business goals:
• Operation cost optimization for 3.4 million members
• Track anomaly cases (e.g. control schedule 1 and 2 drugs, refill status control)
• Collaboration tool between 65,000 providers
Start: 2014, Phase 1: 8 months, ongoing
Big data:
• Velocity: 10K+ events per second
• Complex event processing: pattern detection, enrichment, projection, aggregation, join
• High scalability, high availability, fault tolerance
Technologies:
• AWS VPC
• Apache Mesos, Apache Marathon, Chronos
• Cassandra
• Apache Storm
• ELK (Elasticsearch, Logstash, Kibana)
• Netflix Exhibitor
• Chef
Challenges:
• Technology selection constraints by HIPAA compliance: SQS (selected) vs. Kafka
• Chef resource optimization: extending/fixing open source frameworks
• 90% utilization ratio
• Constraints: AWS, HIPAA
RESULTS:
ANSWERING RQ1
25
ADD
• ADD (Attribute-Driven Design) is an architecture
design method "driven" by quality attribute
concerns
• Most popular method in industry
- Version 1.0 released 2000 by SEI
- Version 2.0 released Nov. 2006 (on current SEI site)
- Version 2.5 published in 2013
- Version 3.0 to be published in 2016
• The method provides a detailed set of steps for
architecture design
- enables design to be performed in a systematic,
repeatable way
- leading to predictable outcomes
26
ADD 3.0: 2016
27
ADD 3.0
Focus of the iteration
- Architectural issues
- Architectural drivers
Selection of elements
Selection of design concept:
- Pattern / Tactic
- Reference architecture
- Deployment architecture
- Framework / technology
Use of driver fulfillment tables
Record design
decisions
28
BIG Data Design (BDD) Method
29
BDD (Big Data Design) Method
1. New development process
- Value discovery, innovation, experimental stages before design.
- Data-program independence undone
2. “Futuring”: big data scenario generation for innovation
- Eco-Arch method (Chen & Kazman, 2012).
3. Architecture design integrated with new big data
modeling techniques:
- Extended DFD, big data architecture template, transformation rules.
4. Extended architecture design method
- ADD 2.0 (by CMU SEI) to ADD 3.0, then to BDD.
5. Use of design concepts catalogues (reference architectures,
frameworks, platforms, architectural and deployment
patterns, tactics, data models) and a technology catalogue
with quality attribute ratings.
6. Adding architecture evaluation, BITAM (Business and IT
Alignment Model), for risk analysis and ensuring alignment
with business goals and innovation desires.
- BITAM (Chen et al. 2005, 2010) extended ATAM.
30
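One way a technology catalogue with quality attribute ratings could feed technology selection is a weighted score per candidate. The sketch below is an assumption about how such ratings might be used, not the authors' procedure. The function name, the candidate names, the 1-5 scale, and every rating and weight are placeholders invented for illustration.

```python
def rank_technologies(catalogue, weights):
    """Rank catalogue entries by the weighted sum of their attribute ratings."""
    scores = {}
    for tech, ratings in catalogue.items():
        # A missing rating counts as 0 so that gaps penalize a candidate.
        scores[tech] = sum(w * ratings.get(attr, 0) for attr, w in weights.items())
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder ratings on a 1-5 scale; NOT taken from the authors' catalogue.
catalogue = {
    "store_a": {"scalability": 5, "consistency": 2, "ad_hoc_query": 2},
    "store_b": {"scalability": 3, "consistency": 5, "ad_hoc_query": 4},
}
# Placeholder weights reflecting how much each quality attribute matters
# for one hypothetical scenario.
weights = {"scalability": 0.5, "consistency": 0.2, "ad_hoc_query": 0.3}

ranking = rank_technologies(catalogue, weights)
for tech, score in ranking:
    print(f"{tech}: {score:.2f}")
```

Changing the weights to match a different scenario's architectural drivers reorders the candidates without touching the catalogue itself.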
AABA Methodology: Architecture-centric Agile Big data Analytics
31
“Futuring”: big data scenario
generation for innovation
• Shift from “small” data to “big” data
thinking
• Tools for innovative thinking about new
business models
• New process of enterprise-wide idea
creation
• Utilizing Eco-Arch method (Chen & Kazman,
2012)
32
33
ECO-ARCH Method (Chen & Kazman, 2012)
34
ECO-ARCH Method (Chen & Kazman, 2012)
Big Data Architecture Design:
Data Element Template
1) Data sources: what data is used in the scenario, and where is it generated? Answer the questions below for
each source.
2) Data source quality: is this data trustworthy? How accurately does it represent the real-world element it describes
(for example, a temperature reading)?
3) Data content format: structured, semi-structured, unstructured? Specify subtypes.
4) Data velocity: at what speed and frequency is the data generated/ingested?
5) Data volume and frequency: what is the volume and frequency of the data?
6) Data Time To Live (TTL): how long will the data live during processing?
7) Data storage: what is the volume and frequency of the generated data that needs to be stored?
8) Data life: how long does the data need to be kept in storage? (Historical storage/time series or legal requirements.)
9) Data access type: OLTP (transactional), OLAP (aggregates-based), OLCP (advanced analytics)
10) Data queries/reports by whom: what questions are asked about the data, and by whom? What reports (real time, minutes, days,
monthly)?
11) Access pattern: read-heavy, write-heavy, or balanced?
12) Data read/write frequency: how often is the data read and written?
13) Data response requirements: how fast must data queries respond?
14) Data consistency and availability requirements: ACID or BASE (strong, medium, weak)?
• A scenario description includes 6 elements: source, stimuli, environment,
artifacts, response, response metrics.
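The template above can be captured as a typed record so that every data element in a scenario is described with the same fields. This is a minimal sketch covering a subset of the 14 questions. The field names, units, and the example values are assumptions made for illustration and are not part of the original template.

```python
from dataclasses import dataclass
from enum import Enum


class AccessType(Enum):
    OLTP = "transactional"
    OLAP = "aggregates-based"
    OLCP = "advanced analytics"


CONTENT_FORMATS = {"structured", "semi-structured", "unstructured"}


@dataclass
class DataElement:
    source: str                # 1) where the data is generated
    trustworthy: bool          # 2) data source quality
    content_format: str        # 3) one of CONTENT_FORMATS
    velocity_per_sec: float    # 4) events ingested per second
    volume_gb_per_day: float   # 5) and 7) daily volume to store
    ttl_hours: float           # 6) time to live during processing
    retention_days: int        # 8) how long data is kept in storage
    access_type: AccessType    # 9) OLTP, OLAP, or OLCP
    access_pattern: str        # 11) read-heavy, write-heavy, balanced
    max_response_sec: float    # 13) required query response time
    consistency: str           # 14) "ACID" or "BASE"

    def __post_init__(self):
        if self.content_format not in CONTENT_FORMATS:
            raise ValueError(f"unknown content format: {self.content_format}")

    def stored_volume_gb(self):
        # Rough storage estimate: daily volume times retention period.
        return self.volume_gb_per_day * self.retention_days


# Hypothetical example values, not taken from any of the ten cases.
clickstream = DataElement(
    source="web clickstream", trustworthy=True,
    content_format="semi-structured", velocity_per_sec=10_000,
    volume_gb_per_day=10.0, ttl_hours=1.0, retention_days=365,
    access_type=AccessType.OLAP, access_pattern="write-heavy",
    max_response_sec=120.0, consistency="BASE",
)
print(clickstream.stored_volume_gb())
```

Filling in one record per data source makes gaps visible early, for example a source with no stated retention period.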
Sample
35
Technology Catalogue: -> 2016
36
Ratings on Quality Attribute
37
Sample
Architecture Evaluation: BITAM
(Business-IT Alignment Model)
38
1) Business Model: drivers, strategies,
revenue streams, investments,
constraints, regulations
2) Business Architecture: applications,
business processes, workflow, data flow,
organization, skills
3) IT Architecture: hardware, software,
networks, components, interfaces,
platforms, standards
(Chen, Kazman, & Garg, 2005)
RESULTS:
ANSWERING RQ2
39
CPR Cycles
40
Lessons Learned: AA -> AAA 1.0
1. Need to include Data Analysts/Scientists early.
2. Continuous architecture support is required
for big data analytics.
3. Architecture-supported agile “spikes” are
necessary to address rapid technology changes
and emerging requirements.
4. The use of reference architectures increases
architecture agility.
5. Feedback loops need to be open.
- technical feedback about quality attribute requirements,
such as performance, availability, and security;
- business feedback about emerging requirements, e.g., the
business model might be changed, or new user-facing
features might be needed.
41
AAA 1.0 -> AAA 2.0
• AAA 1.0 started to connect to ADD 3.0, which
employs reference architectures as the first step
of architecture design. The creation and
utilization of the Design Concepts Catalog
reduced the cycle time for both spikes and main
development.
• Cases 3-4 were redeveloped applying the new
method, where automated testing and
deployment were essential to support rapid
cycle time.
• The lesson learned from this CPR cycle is that
automation must be supported by architecture
to be efficient and effective.
42
AAA 2.0: DevOps
• AAA 2.0 improved on AAA 1.0 by
focusing on architecture support for
continuous delivery, DevOps.
• Continuous deployment requires
architectural support in:
- deploying without requiring explicit
coordination among teams,
- allowing for different versions of the same
services to be simultaneously in production,
- rolling back a deployment in the event of
errors,
- allowing for various forms of live
testing.
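Running two versions of a service side by side and rolling back on errors can be sketched as a small traffic-splitting registry. This is an illustrative sketch only. The class name, the percentage-based split, and the request-id bucketing are assumptions, not a description of the platform used in the cases.

```python
class ServiceRegistry:
    """Tracks deployed versions of one service and their traffic share."""

    def __init__(self, stable_version):
        self.stable = stable_version
        self.versions = {stable_version: 100}  # version -> percent of traffic

    def deploy_canary(self, version, percent):
        # Both versions are in production at once; the new one gets a slice.
        if not 0 < percent < 100:
            raise ValueError("percent must be between 1 and 99")
        self.versions = {self.stable: 100 - percent, version: percent}

    def promote(self, version):
        # The canary proved healthy: it becomes the only live version.
        self.stable = version
        self.versions = {version: 100}

    def rollback(self):
        # Errors detected: send all traffic back to the stable version.
        self.versions = {self.stable: 100}

    def route(self, request_id):
        # Deterministic bucketing: the same request id always hits the
        # same version while the split is unchanged.
        bucket = request_id % 100
        upper = 0
        for version, percent in self.versions.items():
            upper += percent
            if bucket < upper:
                return version
        return self.stable
```

A typical sequence is `deploy_canary("v2", 10)`, observe error rates on the 10% slice, then either `promote("v2")` or `rollback()`. None of these steps requires coordination with teams owning other services.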
43
AAA 2.0 (Continued)
• While DevOps practices are not inherently tied to
architectural practices, if architects do not consider
DevOps as they design, build and evolve the
system, then critical activities such as continuous
build integration, automated test execution, and
operational support will be more challenging, more
error-prone, and less efficient.
- For example, a tightly coupled architecture can become a barrier to
continuous integration because small changes require a rebuild of the
entire system, which limits the number of builds possible in a day. To
fully automate testing the system needs to provide architectural
(system-wide) test capabilities such as interfaces to record, playback,
and control system state. To support high availability the system must be
self-monitoring, requiring architectural capabilities such as self-test,
ping/echo, heartbeat, monitor, hot spares, etc.
• For DevOps to be successful, an architectural approach
must be taken to ensure that system-wide requirements
are consistently realized.
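Of the self-monitoring capabilities mentioned above, the heartbeat tactic is the simplest to sketch: each node reports periodically, and a monitor flags nodes that have been silent for too long. The following is a minimal sketch under assumed names. The injectable clock is a design choice made here so the logic can be exercised without real waiting.

```python
import time


class HeartbeatMonitor:
    """Flags nodes whose last heartbeat is older than a timeout."""

    def __init__(self, timeout_sec, clock=time.monotonic):
        self.timeout_sec = timeout_sec
        self.clock = clock          # injectable so tests can control time
        self.last_seen = {}         # node name -> time of last heartbeat

    def beat(self, node):
        # Called by (or on behalf of) a node to signal that it is alive.
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        # A node is considered failed once it has been silent for longer
        # than the timeout; a hot spare could then be activated.
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_sec
        )
```

Because the monitor is a system-wide capability rather than a feature of any single component, it has to be planned at the architecture level, which is the point the slide makes.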
44
Manual vs. DevOps
Activity                   Manual (min)   Automated (min)
Build                      60             2
Create Demo Environment    240            15
Smoke Testing              120            20
Regression Testing         480            40
Add new VM to cluster      30             5
45
*Hong-Mei Chen, Rick Kazman, Serge Haziyev, Valentyn Kropov and Dmitri
Chtchourov. “Architectural Support for DevOps in a Neo-Metropolis BDaaS
Platform,” The Second International Workshop on Dependability and Security
of System Operation (DSSO 2015), Montreal, Quebec, Canada, Sept 28, 2015.
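The speedups implied by the table can be worked out directly. The short calculation below uses only the timings reported above. The variable names are illustrative.

```python
# (manual minutes, automated minutes) as reported in the table.
timings_min = {
    "Build": (60, 2),
    "Create Demo Environment": (240, 15),
    "Smoke Testing": (120, 20),
    "Regression Testing": (480, 40),
    "Add new VM to cluster": (30, 5),
}

# Per-activity speedup: manual time divided by automated time.
speedups = {name: manual / auto for name, (manual, auto) in timings_min.items()}

total_manual = sum(manual for manual, _ in timings_min.values())
total_automated = sum(auto for _, auto in timings_min.values())
overall_speedup = total_manual / total_automated

for name, factor in speedups.items():
    print(f"{name}: {factor:.0f}x faster")
print(f"All activities: {total_manual} min -> {total_automated} min "
      f"({overall_speedup:.1f}x)")
```

Summed over the five activities, the manual path takes 930 minutes and the automated path 82 minutes, roughly an 11-fold reduction, with the build step showing the largest gain at 30x.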
AABA Methodology: Architecture-centric Agile Big data Analytics
46
Architecture Spike
47
AABA Methodology
• AABA methodology, filling a methodological void,
addressed both the technical and organizational issues
of agile big data analytics development.
• It distinguishes itself from agile analytics through the
central role of software architecture as a key enabler
of agility.
• It integrates an architecture-centric big data design
method, BDD, and architecture-centric agile analytics
with architecture-supported DevOps, AAA.
• AABA provides a basis for reasoning about tradeoffs,
for value discovery with stakeholders, for planning and
estimating cost and schedule, for supporting
experimentation, and for supporting DevOps and
rapid, continuous delivery of “value.”
48
Architecture-centric Agile Big data Analytics
Conclusions
1. Existing agile analytics development methods have
no architecture support for big data analytics.
2. AABA was developed through 3 CPR cycles;
architecture agility, through integration of AAA
and BDD, has proven to be critical to the success of
SSV’s 10 big data analytics projects.
3. Agile architecture practices in AABA, including
reference architecture, design concepts
catalogues, architecture spikes, etc. help to tame
project complexity, reducing uncertainty and
hence reducing project risk.
4. An architecture-centric approach to DevOps was
critical to achieving strategic control over
continuous value delivery.
49

 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
The Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackThe Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management Stack
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
Accelerating Data Warehouse Modernization
Accelerating Data Warehouse ModernizationAccelerating Data Warehouse Modernization
Accelerating Data Warehouse Modernization
 
How to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT OperationsHow to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT Operations
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 

Plus de SoftServe

Approaching Quality in Digital Era
Approaching Quality in Digital EraApproaching Quality in Digital Era
Approaching Quality in Digital EraSoftServe
 
Digital Product Security
Digital Product SecurityDigital Product Security
Digital Product SecuritySoftServe
 
Testing Tools and Tips
Testing Tools and TipsTesting Tools and Tips
Testing Tools and TipsSoftServe
 
Android Mobile Application Testing: Human Interface Guideline, Tools
Android Mobile Application Testing: Human Interface Guideline, ToolsAndroid Mobile Application Testing: Human Interface Guideline, Tools
Android Mobile Application Testing: Human Interface Guideline, ToolsSoftServe
 
Android Mobile Application Testing: Specific Functional, Performance, Device ...
Android Mobile Application Testing: Specific Functional, Performance, Device ...Android Mobile Application Testing: Specific Functional, Performance, Device ...
Android Mobile Application Testing: Specific Functional, Performance, Device ...SoftServe
 
How to Reduce Time to Market Using Microsoft DevOps Solutions
How to Reduce Time to Market Using Microsoft DevOps SolutionsHow to Reduce Time to Market Using Microsoft DevOps Solutions
How to Reduce Time to Market Using Microsoft DevOps SolutionsSoftServe
 
Containerization: The DevOps Revolution
Containerization: The DevOps Revolution Containerization: The DevOps Revolution
Containerization: The DevOps Revolution SoftServe
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Rapid Prototyping for Big Data with AWS
Rapid Prototyping for Big Data with AWS Rapid Prototyping for Big Data with AWS
Rapid Prototyping for Big Data with AWS SoftServe
 
Implementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should KnowImplementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should KnowSoftServe
 
Using AWS Lambda for Infrastructure Automation and Beyond
Using AWS Lambda for Infrastructure Automation and BeyondUsing AWS Lambda for Infrastructure Automation and Beyond
Using AWS Lambda for Infrastructure Automation and BeyondSoftServe
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseSoftServe
 
Big Data as a Service: A Neo-Metropolis Model Approach for Innovation
Big Data as a Service: A Neo-Metropolis Model Approach for InnovationBig Data as a Service: A Neo-Metropolis Model Approach for Innovation
Big Data as a Service: A Neo-Metropolis Model Approach for InnovationSoftServe
 
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...SoftServe
 
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...SoftServe
 
Managing Requirements with Word and TFS by Max Markov
Managing Requirements with Word and TFS by Max MarkovManaging Requirements with Word and TFS by Max Markov
Managing Requirements with Word and TFS by Max MarkovSoftServe
 
How to Implement Hybrid Cloud Solutions Successfully
How to Implement Hybrid Cloud Solutions SuccessfullyHow to Implement Hybrid Cloud Solutions Successfully
How to Implement Hybrid Cloud Solutions SuccessfullySoftServe
 
Designing Big Data Systems Like a Pro
Designing Big Data Systems Like a ProDesigning Big Data Systems Like a Pro
Designing Big Data Systems Like a ProSoftServe
 
Product Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
Product Management in Outsourcing by Roman Kolodchak and Roman PavlyukProduct Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
Product Management in Outsourcing by Roman Kolodchak and Roman PavlyukSoftServe
 
From Sandbox to Production by Vadym Fedorov
From Sandbox to Production by Vadym FedorovFrom Sandbox to Production by Vadym Fedorov
From Sandbox to Production by Vadym FedorovSoftServe
 

Plus de SoftServe (20)

Approaching Quality in Digital Era
Approaching Quality in Digital EraApproaching Quality in Digital Era
Approaching Quality in Digital Era
 
Digital Product Security
Digital Product SecurityDigital Product Security
Digital Product Security
 
Testing Tools and Tips
Testing Tools and TipsTesting Tools and Tips
Testing Tools and Tips
 
Android Mobile Application Testing: Human Interface Guideline, Tools
Android Mobile Application Testing: Human Interface Guideline, ToolsAndroid Mobile Application Testing: Human Interface Guideline, Tools
Android Mobile Application Testing: Human Interface Guideline, Tools
 
Android Mobile Application Testing: Specific Functional, Performance, Device ...
Android Mobile Application Testing: Specific Functional, Performance, Device ...Android Mobile Application Testing: Specific Functional, Performance, Device ...
Android Mobile Application Testing: Specific Functional, Performance, Device ...
 
How to Reduce Time to Market Using Microsoft DevOps Solutions
How to Reduce Time to Market Using Microsoft DevOps SolutionsHow to Reduce Time to Market Using Microsoft DevOps Solutions
How to Reduce Time to Market Using Microsoft DevOps Solutions
 
Containerization: The DevOps Revolution
Containerization: The DevOps Revolution Containerization: The DevOps Revolution
Containerization: The DevOps Revolution
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Rapid Prototyping for Big Data with AWS
Rapid Prototyping for Big Data with AWS Rapid Prototyping for Big Data with AWS
Rapid Prototyping for Big Data with AWS
 
Implementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should KnowImplementing Test Automation: What a Manager Should Know
Implementing Test Automation: What a Manager Should Know
 
Using AWS Lambda for Infrastructure Automation and Beyond
Using AWS Lambda for Infrastructure Automation and BeyondUsing AWS Lambda for Infrastructure Automation and Beyond
Using AWS Lambda for Infrastructure Automation and Beyond
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
Big Data as a Service: A Neo-Metropolis Model Approach for Innovation
Big Data as a Service: A Neo-Metropolis Model Approach for InnovationBig Data as a Service: A Neo-Metropolis Model Approach for Innovation
Big Data as a Service: A Neo-Metropolis Model Approach for Innovation
 
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
Personalized Medicine in a Contemporary World by Eugene Borukhovich, SVP Heal...
 
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
Health 2.0 WinterTech: Will Artificial Intelligence change healthcare? by Eug...
 
Managing Requirements with Word and TFS by Max Markov
Managing Requirements with Word and TFS by Max MarkovManaging Requirements with Word and TFS by Max Markov
Managing Requirements with Word and TFS by Max Markov
 
How to Implement Hybrid Cloud Solutions Successfully
How to Implement Hybrid Cloud Solutions SuccessfullyHow to Implement Hybrid Cloud Solutions Successfully
How to Implement Hybrid Cloud Solutions Successfully
 
Designing Big Data Systems Like a Pro
Designing Big Data Systems Like a ProDesigning Big Data Systems Like a Pro
Designing Big Data Systems Like a Pro
 
Product Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
Product Management in Outsourcing by Roman Kolodchak and Roman PavlyukProduct Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
Product Management in Outsourcing by Roman Kolodchak and Roman Pavlyuk
 
From Sandbox to Production by Vadym Fedorov
From Sandbox to Production by Vadym FedorovFrom Sandbox to Production by Vadym Fedorov
From Sandbox to Production by Vadym Fedorov
 

Dernier

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 

Dernier (20)

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 

Agile Big Data Analytics Development: An Architecture-Centric Approach

  • 1. AGILE BIG DATA ANALYTICS DEVELOPMENT: AN ARCHITECTURE-CENTRIC APPROACH Prof. Hong-Mei Chen IT Management, Shidler College of Business University of Hawaii at Manoa, USA Prof. Rick Kazman University of Hawaii at Manoa Software Engineering Institute, Carnegie Mellon University, USA Serge Haziyev SoftServe Inc. Austin, TX, USA HICSS49, Jan. 2016
  • 2. OUTLINE • Big Data Analytics • An Architecture-Centric Approach • Research Method • Results: AABA Methodology • Conclusions 2
  • 3. Big Data: Big Hype 3 Gartner Hype Cycle 2015
  • 4. Big Data Analytics: All the Rage • Big data analytics is the process of examining big data to uncover hidden patterns, unknown correlations and other useful info to provide “value” • Big data analytics is predicted to be a US$125 billion market 4
  • 5. Challenges for Big Data Analytics • Technical - 5V requirements for big data management - building a solid infrastructure that orchestrates technology components • Organizational - how data science teams can work with software engineers - rapid delivery of actionable predictions discovered from big data 5
  • 9. Research Questions • RQ1: How should a big data system be designed and developed to support advanced analytics effectively? • RQ2: How should agile processes be adapted for big data analytics development? 9
  • 10. Architecture-Centric Approach 1) For big data system development 2) For agile analytics development 10
  • 11. Architecture Design is critical and complex in Big Data System Development I. Volume: Distributed and scalable architecture II. Variety: Polyglot persistence architecture III. Velocity: Complex Event Processing + Lambda Architecture IV. Veracity: Architecture design for understanding the data sources and the cleanliness, validation of each V. Value: New architectures for - hybrid, agile analytics, big data analytics cloud - integrating the new and the old (EDW, ETL) VI. Integration: Integrating separate architectures addressing each of the 5V challenges 11
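The Lambda Architecture named under Velocity can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not the authors' design: the class `LambdaCounter` and its method names are hypothetical, and in-memory Python structures stand in for the batch store, the speed layer, and the serving layer. It shows the core idea that a query merges a periodically recomputed batch view with an incrementally updated real-time view.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture: counts events per key (hypothetical example)."""

    def __init__(self):
        self.master_log = []         # append-only raw events (batch layer input)
        self.batch_view = Counter()  # recomputed from the whole log on each batch run
        self.speed_view = Counter()  # incremental counts for events since the last batch run

    def ingest(self, key):
        # Every event goes to both the immutable log and the speed layer.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch, then reset the speed layer,
        # because the batch view now covers everything seen so far.
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        # Serving layer: merge the batch and real-time views at query time.
        return self.batch_view[key] + self.speed_view[key]
```

In a real system the batch recomputation and the speed layer would run on separate technologies (the cases later in the deck mention Hadoop and Storm, for example); the point here is only the merge at query time.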
  • 12. Refined RQ1 • How to extend existing architectural methods and integrate polyglot data modeling, architecture design, and technology orchestration techniques in a single effective and efficient big data design method 12
  • 13. Architecture vs. Agile • Architectural practices and agile practices are actually well aligned, but this position has not always been universally accepted 13
  • 14. Architecture vs. Agile (Continued) • The creators of the Agile Manifesto described 12 principles. Eleven of these are fully compatible with architectural practices. • The one exception is: “The best architectures, requirements, and designs emerge from self-organizing teams.” • While this principle may have held true for small and perhaps even medium-sized projects, we are unaware of any cases where it has been successful in large projects, particularly those with complex requirements and distributed development such as big data analytics. 14
  • 15. Architecture-centric Approach To Agile Big Data Analytics • Given rapid changes in big data technology, there is a risk of doing too much too soon and hence being locked into a solution that will inevitably need to be modified, at significant cost. • An architecture-centric approach facilitates future rapid iterations with less disruption, as well as lower risk and cost, by creating appropriate architecture abstractions. 15
  • 16. Refined RQ2 • How should agile principles be combined with the architecture design method to achieve effective agile big data analytics? 16
  • 18. Research Method • Case study research is deemed suitable: - System development, be it big or small data, cannot be separated from its organizational and business contexts - The research questions are “how” questions • Collaborative Practice Research (CPR) - A particular kind of case study - Conducted with a collaborating company, SSV, in the outsourcing industry, which has successfully deployed a number of big data projects that can be triangulated as multiple (embedded) case studies 18
  • 19. Collaborative Practice Research (CPR) Steps in an Iteration 1) Appreciate problem situation 2) Study literature 3) Develop framework 4) Evolve Method 5) Action 6) Evaluate experiences 7) Exit 8) Assess usefulness 9) Elicit research results 19
  • 22. CASES 1-3
    Case 1: Network Security, Intrusion Prevention. US MNC IT corp. (employees > 320,000)
      Business goals: provide ability for security analysts to improve intrusion detection techniques; observe traffic behavior and make infrastructure adjustments (adjust company security policies, improve system performance)
      Start: late 2010, 8.5 months
      Big data: machine-generated data, 7.5 billion event records per day collected from IPS devices; near real-time reporting; reports that "touch" billions of rows should be generated in < 1 min
      Technologies: ETL: Talend; Storage/DW: InfoBright EE, HP Vertica; OLAP: Pentaho Mondrian; BI: JasperServer Pro
      Challenges: high throughput, different device data schemas (versions); keeping system performance at the required level when supporting IP/geography analysis (avoid joins); keeping the required performance for complex querying over billions of rows
    Case 2: Anti-Spam Network Security System. US MNC networking equipment corp. (employees > 74,000)
      Business goals: validation of the newly developed set of anti-spam rules against the large training set of known emails; detection of the best anti-spam rules in terms of performance and efficacy
      Start: 2012-2013
      Big data: 20K anti-spam rules; 5M email training set; 100+ nodes in Hadoop clusters
      Technologies: vanilla Apache Hadoop (HDFS, MapReduce, Oozie, ZooKeeper); Perl/Python; SpamAssassin; Perceptron
      Challenges: MapReduce jobs were written in Python and run with Hadoop Streaming, and the challenge was to optimize job performance; finding the optimal Hadoop cluster configuration to maximize performance and minimize MapReduce processing time
    Case 3: Online Coupon Web Analytics Platform. US MNC: world's largest coupon site, 2014 revenue > US$200M
      Business goals: in-house web analytics platform for conversion funnel analysis, marketing campaign optimization, user behavior analytics; clickstream analytics, platform feature usage analysis
      Start: 2012, ongoing
      Big data: 500 million visits a year; 25TB+ HP Vertica data warehouse; 50TB+ Hadoop cluster; near-real-time analytics (15 minutes is supported for clickstream data)
      Technologies: Data lake: (Amazon EMR)/Hive/Hue/MapReduce/Flume/Spark; DW: HP Vertica, MySQL; ETL/data integration: custom, using Python; BI: R, Mahout, Tableau
      Challenges: minimizing transformation time for semi-structured data; data quality and consistency; complex data integration; fast-growing data volumes; performance issues with Hadoop MapReduce (moving to Spark)
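Case 2 mentions MapReduce jobs written in Python and run through Hadoop Streaming. As a rough illustration of that style (not the project's actual code), the sketch below counts how many training emails each anti-spam rule fired on. The input layout (`email_id<TAB>rule,rule,...`) and the function names `mapper` and `reducer` are assumptions made for this example. Hadoop Streaming passes records as text lines on standard input and expects tab-separated key/value lines on standard output, with reducer input sorted by key; here the functions take any iterable of lines so they can be tried locally.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'rule<TAB>1' for every rule that fired on an email.

    Assumed (hypothetical) input line format: 'email_id<TAB>rule1,rule2,...'.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        _email_id, _, rules = line.partition("\t")
        for rule in rules.split(","):
            if rule:
                yield f"{rule}\t1"


def reducer(sorted_lines):
    """Sum the counts per rule; input must already be sorted by key,
    as it is between the map and reduce phases of a streaming job."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for rule, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{rule}\t{sum(int(count) for _, count in group)}"
```

In an actual streaming job each function would live in its own script that reads `sys.stdin` and prints its results; locally, `reducer(sorted(mapper(lines)))` mimics the shuffle-and-sort step.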
  • 23. CASES 4-6
    Case 4: Social Marketing Analytical Platform. US MNC Internet marketing (user reviews), 2014 revenue > US$48M
      Business goals: build in-house analytics platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform; provide analysis on how end users are interacting with service content, products, and features
      Start: 2012, ongoing
      Big data: volume: 45 TB; sources: JSON; throughput: > 20K/sec; latency: 1 hour for static/pre-defined reports, real-time for streaming data
      Technologies: Lambda architecture; Amazon AWS, S3; Apache Kafka, Storm; Hadoop: CDH 5, HDFS (raw data), MapReduce, Cloudera Manager, Oozie, ZooKeeper; HBase (2 clusters: batch views, streaming data)
      Challenges: Hadoop upgrade from CDH 4 to CDH 5; data integrity and data quality; very high data throughput caused a challenge with data loss prevention (introduced Apache Kafka as a solution); system performance for data discovery (introduced Redshift, considering Spark); constraints: public cloud, multi-tenant
    Case 5: Cloud-based Mobile App Development Platform. US private Internet co., funding > US$100M
      Business goals: provide visual environment for building custom mobile applications; charge customers by usage; analysis of platform feature usage by end users and platform optimization
      Start: 2013, 8 months
      Big data: data volume > 10 TB; sources: JSON; data throughput > 10K/sec; analytics: self-service, pre-defined reports, ad hoc; data latency: 2 min
      Technologies: Middleware: RabbitMQ, Amazon SQS, Celery; DB: Amazon Redshift, RDS, S3; Jaspersoft; Elastic Beanstalk; Integration: Python; Aria Subscription Billing Platform
      Challenges: schema extensibility; minimizing TCO; achieving high data compression without significant performance degradation; technology selection (performance benchmarks and price comparison of Redshift vs. HP Vertica vs. Amazon RDS)
    Case 6: Telecom E-tailing Platform. Russian mobile phone retailer, 2014 revenue: 108B rubles
      Business goals: build an omni-channel platform to improve sales and operations; analyze all enterprise data from multiple sources for real-time recommendation and sales
      Start: end of 2013 (did only discovery)
      Big data: analytics on 90+ TB (30+ TB structured, 60+ TB unstructured and semi-structured data); elasticity through SDE principles
      Technologies: Hadoop (HDFS, Hive, HBase); Cassandra; HP Vertica/Teradata; MicroStrategy/Tableau
      Challenges: data volume for real-time analytics; data variety: data science over data in different formats from multiple data sources; elasticity: private cloud, Hadoop as a service with auto-scale capabilities
  • 24. CASES 7-10
Case 7: Social Relationship Marketing Platform
Organization: US private Internet co.; funding > US$100M
Business goals:
• Build social relationship platform that allows enterprise brands and organizations to manage, monitor, and measure their social media programs
• Build an analytics module to analyze and measure results
Start: 2013, ongoing (redesign of 2009 system)
Big data:
• > one billion social connections across 84 countries
• 650 million pieces of social content per day
• MySQL (~11 TB), Cassandra (~6 TB), ETL (> 8 TB per day)
Technologies:
• Cassandra
• MySQL
• Elasticsearch
• SaaS BI platform: GoodData
• Clover ETL, custom in Java
• PHP, Amazon S3, Amazon SQS
• RabbitMQ
Challenges:
• Minimize data processing time (ETL)
• Implement incremental ETL, processing and uploading only the latest data

Case 8: Web Analytics & Marketing Optimization
Organization: US MNC IT consulting co. (employees > 430,000)
Business goals:
• Optimization of all web, mobile, and social channels
• Optimization of recommendations for each visitor
• High return on online marketing investments
Start: 2014, ongoing (redesign of 2006-2010 system)
Big data:
• Data volume > 1 PB
• 5-10 GB per customer/day
• Data sources: clickstream data, webserver logs
Technologies:
• Vanilla Apache Hadoop (HDFS, MapReduce, Oozie, ZooKeeper)
• Hadoop/HBase
• Aster Data
• Oracle
• Java/Flex/JavaScript
Challenges:
• Hive performance for analytics queries; difficult to support real-time scenario for ad-hoc queries
• Data consistency between two layers: raw data in Hadoop and aggregated data in relational DW
• Complex data transformation jobs

Case 9: Network Monitoring & Management Platform
Organization: US OSS vendor; revenue > US$22M
Business goals:
• Build tool to monitor network availability, performance, events, and configuration
• Integrate data storage and collection processes with one web-based user interface
• IT as a service
Start: 2014, ongoing (redesign of 2006 system)
Big data:
• Collect data in large datacenters (each: gigabytes to terabytes)
• Real-time data analysis and monitoring (< 1 minute)
• Types of devices: hundreds
Technologies:
• MySQL
• RRDtool
• HBase
• Elasticsearch
Challenges:
• High memory consumption of HBase when deployed in single-server mode

Case 10: Healthcare Insurance Operation Intelligence
Organization: US health plan provider; employees > 4,500; revenue > US$10B
Business goals:
• Operation cost optimization for 3.4 million members
• Track anomaly cases (e.g., control of schedule 1 and 2 drugs, refill status control)
• Collaboration tool between 65,000 providers
Start: 2014, Phase 1: 8 months, ongoing
Big data:
• Velocity: 10K+ events per second
• Complex event processing: pattern detection, enrichment, projection, aggregation, join
• High scalability, high availability, fault tolerance
Technologies:
• AWS VPC
• Apache Mesos, Apache Marathon, Chronos
• Cassandra
• Apache Storm
• ELK (Elasticsearch, Logstash, Kibana)
• Netflix Exhibitor
• Chef
Challenges:
• Technology selection constrained by HIPAA compliance: SQS (selected) vs. Kafka
• Chef resource optimization: extending/fixing open source frameworks
• 90% utilization ratio
• Constraints: AWS, HIPAA
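Case 7 lists incremental ETL ("processing and uploading only the latest data") as its answer to long processing times. The slides do not say how it was implemented; the sketch below shows one common way to do it, a high-water-mark filter, purely as an illustration. The `updated_at` field, the integer timestamps, and the function name are assumptions, not details from the case.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def incremental_extract(rows: List[Row], watermark: int) -> Tuple[List[Row], int]:
    """Return rows changed after `watermark` and the new watermark.

    Assumes every row carries a hypothetical integer `updated_at` timestamp.
    """
    fresh = [row for row in rows if row["updated_at"] > watermark]
    # If nothing changed, keep the old watermark so the next run starts from the same point.
    new_watermark = max((row["updated_at"] for row in fresh), default=watermark)
    return fresh, new_watermark


# Hypothetical source table with three rows.
source_rows = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 205},
    {"id": 3, "updated_at": 310},
]

# First run: only rows changed after timestamp 200 are processed.
first_batch, mark = incremental_extract(source_rows, watermark=200)
# Second run with no new changes: nothing is reprocessed.
second_batch, mark_after = incremental_extract(source_rows, watermark=mark)

print([row["id"] for row in first_batch], mark)
print(second_batch, mark_after)
```

The point of the design is that each run's cost scales with the amount of changed data rather than with the full data set, which is what reduces ETL time when daily volume is a small fraction of the total.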
  • 26. ADD
• ADD (Attribute-Driven Design) is an architecture design method "driven" by quality attribute concerns
• Most popular method in industry
• Version 1.0 released in 2000 by SEI
• Version 2.0 released Nov. 2006 (on current SEI site)
• Version 2.5 published in 2013
• Version 3.0 to be published in 2016
• The method provides a detailed set of steps for architecture design, enabling design to be performed in a systematic, repeatable way and leading to predictable outcomes
  • 28. ADD 3.0
• Focus of the iteration
  - Architectural issues
  - Architectural drivers
• Selection of elements
• Selection of design concept:
  - Pattern / Tactic
  - Reference architecture
  - Deployment architecture
  - Framework / technology
• Use of driver fulfillment tables
• Record design decisions
  • 29. BIG Data Design (BDD) Method 29
  • 30. BDD (Big Data Design) Method
1. New development process
  - Value discovery, innovation, and experimental stages before design.
  - Data-program independence undone.
2. “Futuring”: big data scenario generation for innovation
  - Eco-Arch method (Chen & Kazman, 2012).
3. Architecture design integrated with new big data modeling techniques:
  - Extended DFD, big data architecture template, transformation rules.
4. Extended architecture design method
  - ADD 2.0 (by CMU SEI) to ADD 3.0, then to BDD.
5. Use of design concepts catalogues (reference architectures, frameworks, platforms, architectural and deployment patterns, tactics, data models) and a technology catalogue with quality attribute ratings.
6. Adding architecture evaluation, BITAM (Business and IT Alignment Model), for risk analysis and for ensuring alignment with business goals and innovation desires.
  - BITAM (Chen et al., 2005, 2010) extended ATAM.
  • 32. “Futuring”: big data scenario generation for innovation
• Shift from “small” data to “big” data thinking
• Tools for innovative thinking about new business models
• New process of enterprise-wide idea creation
• Utilizing the Eco-Arch method (Chen & Kazman, 2012)
  • 33. 33 ECO-ARCH Method (Chen & Kazman, 2012)
  • 34. 34 ECO-ARCH Method (Chen & Kazman, 2012)
  • 35. Big Data Architecture Design: Data Element Template (sample)
1) Data sources: what data are used in the scenario, and where are they generated? Answer the questions below for each source.
2) Data source quality: is this data trustworthy? How accurately does it represent the real-world element it describes (e.g., a temperature reading)?
3) Data content format: structured, semi-structured, unstructured? Specify subtypes.
4) Data velocity: at what speed and frequency is the data generated/ingested?
5) Data volume and frequency: what is the volume and frequency of the data?
6) Data Time To Live (TTL): how long will the data live during processing?
7) Data storage: what is the volume and frequency of the generated data that needs to be stored?
8) Data life: how long does the data need to be kept in storage? (Historical storage/time series or legal requirements.)
9) Data access type: OLTP (transactional), OLAP (aggregates-based), OLCP (advanced analytics)
10) Data queries/reports and their users: what questions are asked about the data, and by whom? What reports (real time, minutes, days, monthly)?
11) Access pattern: read-heavy, write-heavy, or balanced?
12) Data read/write frequency: how often is the data read and written?
13) Data response requirements: how fast must data queries respond?
14) Data consistency and availability requirements: ACID or BASE (strong, medium, weak)?
• A scenario description includes 6 elements: source, stimuli, environment, artifacts, response, response metrics.
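One way to make the 14-question template checkable is to capture it as a record per data source. The sketch below is an illustration only: the class, field, and enum names are invented here to mirror the template items, and the example values are hypothetical rather than taken from any of the cases.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional


class ContentFormat(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class AccessType(Enum):
    OLTP = "transactional"
    OLAP = "aggregates-based"
    OLCP = "advanced analytics"


class AccessPattern(Enum):
    READ_HEAVY = "read-heavy"
    WRITE_HEAVY = "write-heavy"
    BALANCED = "balanced"


class Consistency(Enum):
    ACID = "ACID"
    BASE = "BASE"


@dataclass
class DataElement:
    """One entry of the data element template; field order follows items 1-14."""
    source: str
    source_quality: Optional[str] = None
    content_format: Optional[ContentFormat] = None
    velocity: Optional[str] = None
    volume_and_frequency: Optional[str] = None
    time_to_live: Optional[str] = None
    storage: Optional[str] = None
    data_life: Optional[str] = None
    access_type: Optional[AccessType] = None
    queries_and_reports: Optional[str] = None
    access_pattern: Optional[AccessPattern] = None
    read_write_frequency: Optional[str] = None
    response_requirement: Optional[str] = None
    consistency: Optional[Consistency] = None

    def unanswered(self) -> List[str]:
        """Names of template questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical, partially filled entry for a clickstream source.
clickstream = DataElement(
    source="web clickstream (hypothetical)",
    content_format=ContentFormat.SEMI_STRUCTURED,
    access_type=AccessType.OLAP,
    access_pattern=AccessPattern.WRITE_HEAVY,
    consistency=Consistency.BASE,
)
print(clickstream.unanswered())
```

Keeping unanswered questions visible per source gives the design team a simple completeness check before technology selection starts.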
  • 37. Ratings on Quality Attribute 37 Sample
  • 38. Architecture Evaluation: BITAM (Business-IT Alignment Model)
1) Business Model: drivers, strategies, revenue streams, investments, constraints, regulations
2) Business Architecture: applications, business processes, workflow, data flow, organization, skills
3) IT Architecture: hardware, software, networks, components, interfaces, platforms, standards
(Chen, Kazman, & Garg, 2005)
  • 41. Lessons Learned: AA -> AAA 1.0
1. Need to include data analysts/scientists early.
2. Continuous architecture support is required for big data analytics.
3. Architecture-supported agile “spikes” are necessary to address rapid technology changes and emerging requirements.
4. The use of reference architectures increases architecture agility.
5. Feedback loops need to be open:
  - technical feedback about quality attribute requirements, such as performance, availability, and security;
  - business feedback about emerging requirements, e.g., the business model might be changed, or new user-facing features might be needed.
  • 42. AAA 1.0 -> AAA 2.0
• AAA 1.0 started to connect to ADD 3.0, which employs reference architectures as the first step of architecture design. The creation and utilization of the Design Concepts Catalog reduced the cycle time for both spikes and main development.
• Cases 3-4 were redeveloped applying the new method, where automated testing and deployment were essential to support rapid cycle time.
• The lesson learned from this CPR cycle is that automation must be supported by architecture to be efficient and effective.
  • 43. AAA 2.0: DevOps
• AAA 2.0 improved on AAA 1.0 by focusing on architecture support for continuous delivery and DevOps.
• Continuous deployment requires architectural support in:
  - deploying without requiring explicit coordination among teams,
  - allowing for different versions of the same services to be simultaneously in production,
  - rolling back a deployment in the event of errors,
  - allowing for various forms of live testing.
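Two of the capabilities listed for continuous deployment, running different versions of a service side by side and rolling back on errors, can be sketched with a small in-process router. This is a toy illustration under stated assumptions, not the mechanism used in the cases: the `VersionedService` class, its methods, and the relative traffic weights are all hypothetical.

```python
import random
from typing import Callable, Dict, List

Handler = Callable[[str], str]


class VersionedService:
    """Routes requests across live versions and can undo the last deployment."""

    def __init__(self, seed: int = 0) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._weights: Dict[str, float] = {}
        self._history: List[Dict[str, float]] = []
        self._rng = random.Random(seed)

    def deploy(self, version: str, handler: Handler, weight: float) -> None:
        # Snapshot the current routing table so this deployment can be rolled back.
        self._history.append(dict(self._weights))
        self._handlers[version] = handler
        self._weights[version] = weight

    def rollback(self) -> None:
        if not self._history:
            raise RuntimeError("nothing to roll back")
        self._weights = self._history.pop()

    def live_versions(self) -> List[str]:
        return sorted(v for v, w in self._weights.items() if w > 0)

    def call(self, request: str) -> str:
        versions = self.live_versions()
        if not versions:
            raise RuntimeError("no live version")
        weights = [self._weights[v] for v in versions]
        chosen = self._rng.choices(versions, weights=weights, k=1)[0]
        return self._handlers[chosen](request)


svc = VersionedService()
svc.deploy("v1", lambda req: f"v1 handled {req}", weight=1.0)
# Canary: v2 receives a small share of traffic while v1 stays in production.
svc.deploy("v2", lambda req: f"v2 handled {req}", weight=0.1)
during_canary = svc.live_versions()

# Suppose v2 misbehaves: restore the previous routing table.
svc.rollback()
after_rollback = svc.live_versions()
print(during_canary, after_rollback, svc.call("ping"))
```

Because routing state is kept separate from the handlers, rolling back is a routing change rather than a redeployment, which is the kind of architectural support the slide refers to.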
  • 44. AAA 2.0 (Continued)
• While DevOps practices are not inherently tied to architectural practices, if architects do not consider DevOps as they design, build, and evolve the system, then critical activities such as continuous build integration, automated test execution, and operational support will be more challenging, more error-prone, and less efficient.
  - For example, a tightly coupled architecture can become a barrier to continuous integration because small changes require a rebuild of the entire system, which limits the number of builds possible in a day. To fully automate testing, the system needs to provide architectural (system-wide) test capabilities such as interfaces to record, play back, and control system state. To support high availability, the system must be self-monitoring, requiring architectural capabilities such as self-test, ping/echo, heartbeat, monitor, hot spares, etc.
• For DevOps to be successful, an architectural approach must be taken to ensure that system-wide requirements are consistently realized.
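Of the self-monitoring capabilities named on this slide, heartbeat is the simplest to illustrate: each node reports periodically, and a monitor flags nodes whose last report is older than a timeout. The sketch below is a minimal, single-process version; the `HeartbeatMonitor` class, the node names, and the 5-second timeout are assumptions made for the example, and the clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Flags nodes that have not sent a heartbeat within `timeout_s` seconds."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def beat(self, node: str) -> None:
        # Record the time of the latest heartbeat from this node.
        self._last_seen[node] = self._clock()

    def failed_nodes(self) -> List[str]:
        now = self._clock()
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        )


# A fake clock makes the example deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=5.0, clock=lambda: fake_now[0])

monitor.beat("node-a")
monitor.beat("node-b")
fake_now[0] = 4.0
monitor.beat("node-a")          # node-a keeps reporting, node-b goes silent
fake_now[0] = 8.0

print(monitor.failed_nodes())   # ['node-b']
```

In a real system the failed list would feed a failover step such as promoting a hot spare; that part is outside this sketch.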
  • 45. Manual vs. DevOps
Activity                   Manual (min)   Automated (min)
Build                      60             2
Create Demo Environment    240            15
Smoke Testing              120            20
Regression Testing         480            40
Add new VM to cluster      30             5
* Hong-Mei Chen, Rick Kazman, Serge Haziyev, Valentyn Kropov and Dmitri Chtchourov. “Architectural Support for DevOps in a Neo-Metropolis BDaaS Platform,” The Second International Workshop on Dependability and Security of System Operation (DSSO 2015), Montreal, Quebec, Canada, Sept 28, 2015.
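The timings in the Manual vs. DevOps comparison can be turned into per-activity speedups with a few lines of arithmetic. The figures below are copied from the slide; the totals assume, purely for illustration, that each activity runs once and sequentially, which the slide does not state.

```python
# (manual minutes, automated minutes) per activity, as reported on the slide.
timings_min = {
    "Build": (60, 2),
    "Create Demo Environment": (240, 15),
    "Smoke Testing": (120, 20),
    "Regression Testing": (480, 40),
    "Add new VM to cluster": (30, 5),
}

# Speedup factor = manual time / automated time.
speedups = {
    activity: manual / automated
    for activity, (manual, automated) in timings_min.items()
}
for activity, factor in speedups.items():
    print(f"{activity}: {factor:.0f}x faster")

# Illustrative totals under the run-once, sequential assumption.
total_manual = sum(manual for manual, _ in timings_min.values())
total_automated = sum(automated for _, automated in timings_min.values())
print(f"Total: {total_manual} min manual vs. {total_automated} min automated")
```

Build shows the largest factor (30x), while regression testing accounts for the largest absolute saving (440 minutes).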
  • 48. AABA Methodology
AABA: Architecture-centric Agile Big data Analytics
• The AABA methodology, filling a methodological void, addresses both the technical and organizational issues of agile big data analytics development.
• It distinguishes itself from agile analytics through the central role of software architecture as a key enabler of agility.
• It integrates an architecture-centric big data design method, BDD, and architecture-centric agile analytics with architecture-supported DevOps, AAA.
• AABA provides a basis for reasoning about tradeoffs, for value discovery with stakeholders, for planning and estimating cost and schedule, for supporting experimentation, and for supporting DevOps and rapid, continuous delivery of “value.”
  • 49. Conclusions
1. Existing agile analytics development methods have no architecture support for big data analytics.
2. AABA was developed through 3 CPR cycles; architecture agility, through integration of AAA and BDD, has proven to be critical to the success of the 10 SSV’s big data analytics projects.
3. Agile architecture practices in AABA, including reference architecture, design concepts catalogues, architecture spikes, etc., help to tame project complexity, reducing uncertainty and hence reducing project risk.
4. An architecture-centric approach to DevOps was critical to achieving strategic control over continuous value delivery.