Big Data Analytics
Strategy and Roadmap
Srinath Perera
Director, Research, WSO2
(srinath@wso2.com,
@srinath_perera)
•Once upon a time, there lived a wise boy
•The king, being unhappy with the boy, asked
him a “Big Data question”
•We have had Big Data problems throughout history,
although we could not solve them
•Early examples
–Census in Egypt (3000 BC)
–Census in Han China (AD 144) that counted 49.73
million people
A day in your life
• Think about a day in your life
–What is the best road to take?
–Would there be any bad weather?
–How to invest my money?
–How is my health?
• There are many decisions that you could
make better if only you could access the
data and process it.
Image: http://www.flickr.com/photos/kcolwell/5512461652/ (CC licence)
Data Avalanche (Moore’s law of data)
• We are now collecting large amounts of data and converting them to
digital form
• 90% of the data in the world today was created within the past two
years.
• The amount of data we have doubles very fast
Internet of Things
•Currently the physical world and
the software world are detached
•The Internet of Things promises to
bridge this gap
– It is about sensors and actuators
everywhere
– In your fridge, in your blanket, in
your chair, in your carpet... Yes,
even in your socks
– Google I/O pressure mats
What can we do with Big Data?
• Optimize
– A 1% saving in airplanes and turbines
can save more than $1B each year
(GE talk, Strata 2014). For comparison, Sri Lanka’s
total exports are about $9B a year
• Save lives
– Weather, Disease identification,
Personalized treatment
• Technology advancement
– Most high-tech work is done via
simulations
Big Data Reference Architecture
Why is Big Data hard?
• How to store? Assuming 1TB of storage per computer, it takes
1000 computers to store 1PB
• How to move? Assuming a 10Gb/s network running at full line rate, it takes
about 13 minutes to copy 1TB, or about 9 days to copy 1PB
• How to search? Assuming each record is 1KB
and one machine can process 1000 records per
second, it needs about 278 CPU hours (11.6 CPU days) to process 1TB
and about 32 CPU years to process 1PB
• How to process?
– Convert algorithms to work at large scale
– Create new algorithms
Image: http://www.susanica.com/photo/9
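The store, move, and search estimates above are back-of-envelope arithmetic, and a small sketch makes it easy to rerun them with different assumptions. The class and method names here are illustrative only. The sketch assumes decimal units (1KB = 1000 bytes, 1TB = 10^12 bytes) and a network that runs at its full line rate, which real links rarely sustain.

```java
public class BigDataEstimates {
    static final double TB = 1e12; // bytes, decimal units (assumption)
    static final double PB = 1e15;

    /** Machines needed to hold totalBytes when each machine stores bytesPerMachine. */
    static long machinesToStore(double totalBytes, double bytesPerMachine) {
        return (long) Math.ceil(totalBytes / bytesPerMachine);
    }

    /** Seconds to copy totalBytes over a link of the given speed in bits per second. */
    static double secondsToCopy(double totalBytes, double bitsPerSecond) {
        return totalBytes * 8 / bitsPerSecond;
    }

    /** CPU seconds for one machine to scan totalBytes of fixed-size records. */
    static double cpuSecondsToScan(double totalBytes, double recordBytes,
                                   double recordsPerSecond) {
        return totalBytes / recordBytes / recordsPerSecond;
    }

    public static void main(String[] args) {
        double link = 10e9; // 10 Gb/s, assumed fully utilized
        System.out.printf("Store 1PB on 1TB machines: %d machines%n",
                machinesToStore(PB, TB));
        System.out.printf("Copy 1TB: %.1f minutes%n", secondsToCopy(TB, link) / 60);
        System.out.printf("Copy 1PB: %.1f days%n", secondsToCopy(PB, link) / 86_400);
        System.out.printf("Scan 1TB: %.1f CPU hours%n",
                cpuSecondsToScan(TB, 1000, 1000) / 3600);
        System.out.printf("Scan 1PB: %.1f CPU years%n",
                cpuSecondsToScan(PB, 1000, 1000) / (365.0 * 86_400));
    }
}
```

Lowering the link speed to the throughput actually observed on a network gives a more realistic copy time.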
Big data Processing Technologies
Making Sense of Data
•To know what happened?
(hindsight + oversight)
– Basic analytics + visualizations
(min, max, average, histogram,
distribution)
– Interactive drill down
•To explain why? (Insight)
– Data mining, classifications,
building models, clustering
•To forecast (Foresight)
– Neural networks, decision models
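As a small illustration of the "basic analytics" level above (min, max, average, histogram), here is a sketch in plain Java. It is not part of any WSO2 product; the class name and the equal-width bucketing scheme are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

public class BasicAnalytics {
    /** Computes min, max, average, and count in a single pass. */
    static DoubleSummaryStatistics summarize(double[] values) {
        return Arrays.stream(values).summaryStatistics();
    }

    /** Equal-width histogram over [min, max]; the maximum lands in the last bucket. */
    static int[] histogram(double[] values, int buckets) {
        DoubleSummaryStatistics s = summarize(values);
        int[] counts = new int[buckets];
        double width = (s.getMax() - s.getMin()) / buckets;
        for (double v : values) {
            // When all values are equal the width is zero, so use bucket 0.
            int i = width == 0 ? 0 : (int) ((v - s.getMin()) / width);
            counts[Math.min(i, buckets - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] speeds = {1, 2, 2, 3, 10};
        DoubleSummaryStatistics s = summarize(speeds);
        System.out.println("min=" + s.getMin() + " max=" + s.getMax()
                + " avg=" + s.getAverage());
        System.out.println("histogram=" + Arrays.toString(histogram(speeds, 3)));
    }
}
```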
New Developments
•Internet of things (IoT)
–Building a bridge between
software and real world.
•Lambda Architecture
–Merging realtime and batch
processing in the same model
•Machine Learning
–Next Generation decisions (e.g.
Deep Learning)
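The Lambda Architecture idea mentioned above, answering a query from a batch result plus a realtime delta, can be sketched as follows. This is a toy model with hypothetical names. It assumes that each batch run covers every event seen so far, so the realtime counts can be discarded when a new batch view is installed; real deployments have to track exactly which events each batch run covers.

```java
import java.util.HashMap;
import java.util.Map;

public class LambdaViewSketch {
    private final Map<String, Long> batchView = new HashMap<>();
    private final Map<String, Long> realtimeView = new HashMap<>();

    /** Speed layer: count each event as it arrives. */
    void onEvent(String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    /**
     * Batch layer: install freshly recomputed counts and drop the realtime
     * deltas, which the batch run is assumed to cover.
     */
    void installBatchView(Map<String, Long> recomputed) {
        batchView.clear();
        batchView.putAll(recomputed);
        realtimeView.clear();
    }

    /** Serving layer: the answer is the batch result plus the realtime delta. */
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaViewSketch view = new LambdaViewSketch();
        view.onEvent("sensor-1");
        view.onEvent("sensor-1");
        System.out.println("before batch run: " + view.query("sensor-1"));

        Map<String, Long> recomputed = new HashMap<>();
        recomputed.put("sensor-1", 2L);
        view.installBatchView(recomputed);
        view.onEvent("sensor-1");
        System.out.println("after batch run plus one new event: " + view.query("sensor-1"));
    }
}
```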
WSO2 Big Data Platform
Data Collection
• Can receive events via
SOAP, HTTP, JMS, ..
• WSO2 Events is a highly
optimized option (400K
events per second)
• Default agents are provided, and you
can write custom
agents.
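// Create an agent and an asynchronous publisher bound to the receiver URL
Agent agent = new Agent(agentConfiguration);
AsyncDataPublisher publisher = new AsyncDataPublisher(
        "tcp://localhost:7612", .. );

// Describe the stream: its name, version, and payload fields
StreamDefinition definition =
        new StreamDefinition(STREAM_NAME, VERSION);
definition.addPayloadData("sid", STRING);
...
publisher.addStreamDefinition(definition);
...
// Build an event and publish it to the stream
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);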
Business Activity Monitor
Complex Event Processor
What is new?
CEP High Availability
ACM DEBS Grand Challenge 2014
• DEBS (Distributed Event Based Systems) is
a premier academic conference, which posts
a yearly event processing challenge
• Smart Home electricity data: 2000 sensors,
40 houses, 4 billion events
• The WSO2 CEP based solution is one of the four
finalists (the others are Dresden University of
Technology and Fraunhofer Institute
(Germany), and Imperial College London)
• We posted the fastest single-node solution
measured (400K events/sec) and close to
one million events/sec of distributed throughput.
Dashboard Wizard for BAM and CEP
•We have been asking you to write
a bit of code to get visualizations up
•But we have now added a wizard
that guides you through the process
– Think of it as a “New Servlet” menu; you can
customize what is generated.
•Already in the latest CEP and BAM
•Currently only DBs as data
sources, and simple graphs, but
that will grow!
Lambda Architecture with WSO2 Products
What is keeping
us busy?
Scaling Complex Event Processing
• “CEP vs. Stream Processing”
is like Hive vs. Hadoop.
The former lets users write SQL-like
queries without implementing
things from the ground up
• However, scaling is the main
challenge
• We have written a Siddhi bolt
for Storm. Now you can do
distributed processing by
connecting Siddhi bolts
together!
// Each bolt wraps a set of Siddhi queries
SiddhiBolt siddhiBolt1 = new SiddhiBolt( .. siddhi queries .. );
SiddhiBolt siddhiBolt2 = new SiddhiBolt( .. siddhi queries .. );

// Wire the spout and the bolts into a Storm topology
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("source", new PlayStream(), 1);
builder.setBolt("node1", siddhiBolt1, 1)
        .shuffleGrouping("source", "PlayStream1");
..
builder.setBolt("LeafEacho", new EchoBolt(), 1)
        .shuffleGrouping("node1", "LongAdvanceStream");
..
// Submit the finished topology to the cluster
cluster.submitTopology("word-count", conf,
        builder.createTopology());
CEP Query => Distributed Execution
• Extend the Siddhi language to include parallel constructs:
partitions, pipelines, distributed operators
• Compile queries to a Storm cluster running Siddhi bolts
• Assign each partition to a different node, and partition the
data accordingly
• Some scenarios need the results rearranged.
define partition on Player.sid {
    from Player#window(30s) select avg(v) as v insert into AvgSpeedByPlayer;
}
from AvgSpeedByPlayer select avg(v) as v insert into AvgSpeed;
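To make the intent of the partitioned query above concrete, here is a plain-Java sketch of one way to read it: keep a 30-second window per player (the partition key sid), emit that player's average on each event, and combine the per-player averages into an overall figure. This is not Siddhi code and does not use any Siddhi API. The class is hypothetical, events are assumed to arrive in timestamp order, old readings are evicted only when a new event arrives for that player, and "average of the current per-player averages" is just one possible reading of the second query.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class PartitionedAverage {
    private static final long WINDOW_MS = 30_000;

    private static final class Reading {
        final long ts;
        final double v;
        Reading(long ts, double v) { this.ts = ts; this.v = v; }
    }

    // One sliding window per partition key (player sid).
    private final Map<String, Deque<Reading>> windows = new HashMap<>();

    /** Adds a reading and returns that player's average over the last 30 s. */
    double onEvent(String sid, long timestampMs, double v) {
        Deque<Reading> w = windows.computeIfAbsent(sid, k -> new ArrayDeque<>());
        w.addLast(new Reading(timestampMs, v));
        // Evict readings that fell out of the window; the new one always stays.
        while (w.peekFirst().ts <= timestampMs - WINDOW_MS) {
            w.removeFirst();
        }
        return average(w);
    }

    /** Average of the current per-player averages. */
    double overallAverage() {
        return windows.values().stream()
                .mapToDouble(PartitionedAverage::average)
                .average().orElse(0);
    }

    private static double average(Deque<Reading> w) {
        return w.stream().mapToDouble(r -> r.v).average().orElse(0);
    }

    public static void main(String[] args) {
        PartitionedAverage p = new PartitionedAverage();
        p.onEvent("p1", 0, 10);
        p.onEvent("p1", 10_000, 20);
        p.onEvent("p2", 10_000, 30);
        System.out.println("overall average: " + p.overallAverage());
    }
}
```

Because each player's window is independent, the per-player part can run on different nodes, while the final combination step needs all the partial results in one place.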
Scaling CEP
• Think like MapReduce! Ask the user to define partitions: the parallel and
non-parallel parts of the computation.
• Each node runs as a Storm bolt; communication and HA are handled by Storm
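The partitioning idea above depends on every event with the same partition key reaching the same node. A minimal sketch of such routing is shown below. The class is hypothetical and only illustrates the principle, which is similar in spirit to a fields grouping in Storm. It assumes a fixed number of nodes and does not handle rebalancing when nodes join or leave.

```java
public class PartitionRouter {
    private final int nodes;

    PartitionRouter(int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        this.nodes = nodes;
    }

    /** The same key always maps to the same node index in [0, nodes). */
    int nodeFor(String partitionKey) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(partitionKey.hashCode(), nodes);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        for (String sid : new String[] {"player-1", "player-2", "player-1"}) {
            System.out.println(sid + " -> node " + router.nodeFor(sid));
        }
    }
}
```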
Machine Learning Team
•We are building a machine learning
team
•To give first-class support for
machine learning within the WSO2
platform, especially in Big Data
solutions
– The idea is to guide you through the process of
finding and applying the best model for your
dataset and scenario
•We will reuse the best open source tools
and create what is missing
Domain Toolboxes
•Time Series Toolbox
– Forecasts and outlier detection
with cycle support
•Fraud Detection
– A set of common fraud detection
pattern implementations, showing
how you can extend them
•GIS support
– Operations: within, inside, touches
– Geo Fencing
– Tracking
– Integration with GIS databases
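As an illustration of the "within" operation behind geo fencing, here is a sketch of the standard ray-casting point-in-polygon test. It is not the WSO2 toolbox implementation. The class name is hypothetical, coordinates are treated as planar (x, y) values, and the curvature of the earth is ignored, which is only reasonable for small fences.

```java
public class GeoFence {
    private final double[] xs;
    private final double[] ys;

    /** Polygon vertices in order; the last vertex connects back to the first. */
    GeoFence(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 3) {
            throw new IllegalArgumentException("need at least 3 matching vertices");
        }
        this.xs = xs.clone();
        this.ys = ys.clone();
    }

    /**
     * Ray casting: a point is inside if a horizontal ray from it crosses
     * the polygon boundary an odd number of times.
     */
    boolean within(double x, double y) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            boolean straddles = (ys[i] > y) != (ys[j] > y);
            if (straddles
                    && x < (xs[j] - xs[i]) * (y - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        GeoFence square = new GeoFence(
                new double[] {0, 10, 10, 0}, new double[] {0, 0, 10, 10});
        System.out.println("(5, 5) inside: " + square.within(5, 5));
        System.out.println("(15, 5) inside: " + square.within(15, 5));
    }
}
```

Points that lie exactly on the boundary may be classified either way by this test, so a production fence would need an explicit rule for them.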
Conclusion
•Introduction to Big Data, why and how?
•WSO2 Big Data platform
•What is new in the platform?
•What keeps us busy?
•Interested?
–All the software we discussed is open source under the
Apache License. Visit http://wso2.com/.
–Like to integrate with us, help, or join? Talk to us at the Big
Data booth or architecture@wso2.org
Thank You

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Big Data Analytics Strategy and Roadmap

  • 1. Big Data Analytics Strategy and Roadmap Srinath Perera Director, Research, WSO2 (srinath@wso2.com, @srinath_perera)
  • 2. • Once upon a time, there lived a wise boy • The king, being unhappy with the boy, asked him a “Big Data question” • We have had Big Data problems throughout history, although we could not solve them • Early examples – Census at Egypt (3000 BC) – Census at Egypt (AD 144) that counted 49.73 million
  • 3. A day in your life • Think about a day in your life: – What is the best road to take? – Will there be any bad weather? – How should I invest my money? – How is my health? • There are many decisions that you could make better if only you could access the data and process it. http://www.flickr.com/photos/kcolwell/5512461652/ CC licence
  • 4.
  • 5. Data Avalanche (Moore’s law of data) • We are now collecting large amounts of data and converting them to digital form • 90% of the data in the world today was created within the past two years • The amount of data we have doubles very quickly
  • 6. Internet of Things • Currently the physical world and the software world are detached • The Internet of Things promises to bridge this – It is about sensors and actuators everywhere – In your fridge, in your blanket, in your chair, in your carpet. Yes, even in your socks – Google IO pressure mats
  • 7. What can we do with Big Data? • Optimize – A 1% saving in airplanes and turbines can save more than $1B each year (GE talk, Strata 2014). For comparison, Sri Lanka’s total exports are about $9B a year • Save lives – Weather, disease identification, personalized treatment • Technology advancement – Most high-tech work is done via simulations
  • 8. Big Data Reference Architecture
  • 9. Why is Big Data hard? • How to store? Assuming 1 TB of storage per computer, it takes 1000 computers to store 1 PB • How to move? Assuming a 10 Gb network, it takes 2 hours to copy 1 TB, or 83 days to copy 1 PB • How to search? Assuming each record is 1 KB and one machine can process 1000 records per second, it takes about 277 CPU hours to process 1 TB and about 32 CPU years to process 1 PB • How to process? – Convert algorithms to work at large scale – Create new algorithms http://www.susanica.com/photo/9
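The storage, transfer, and search estimates can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope calculation under the slide's stated assumptions, plus two assumptions of its own: decimal units (1 TB = 10^12 bytes, 1 KB = 1000 bytes) and a network running at its full 10 Gb/s line rate. The class and method names are made up for illustration.

```java
// Back-of-envelope estimates for storing, moving, and scanning large datasets.
// Assumptions: decimal units, full line rate, one machine scanning 1000 records/s.
final class BigDataEstimates {
    static final double TB = 1e12; // bytes
    static final double PB = 1e15; // bytes

    // Seconds needed to copy `bytes` over a link carrying `bitsPerSecond`.
    static double copySeconds(double bytes, double bitsPerSecond) {
        return bytes * 8 / bitsPerSecond;
    }

    // CPU seconds needed to scan `bytes` of fixed-size records on one machine.
    static double scanSeconds(double bytes, double recordBytes, double recordsPerSecond) {
        return bytes / recordBytes / recordsPerSecond;
    }

    public static void main(String[] args) {
        double hour = 3600, day = 86400, year = 365 * day;
        System.out.printf("Machines for 1 PB at 1 TB each: %.0f%n", PB / TB);
        System.out.printf("Copy 1 TB at 10 Gb/s: %.1f minutes%n", copySeconds(TB, 10e9) / 60);
        System.out.printf("Copy 1 PB at 10 Gb/s: %.1f days%n", copySeconds(PB, 10e9) / day);
        System.out.printf("Scan 1 TB: %.1f CPU hours%n", scanSeconds(TB, 1000, 1000) / hour);
        System.out.printf("Scan 1 PB: %.1f CPU years%n", scanSeconds(PB, 1000, 1000) / year);
    }
}
```

At the full line rate the copy times come out shorter than 2 hours per terabyte. A 2-hour copy of 1 TB corresponds to an effective throughput of roughly 1.1 Gb/s, which may be closer to what a shared network delivers in practice. Scanning 1 TB at 1000 one-kilobyte records per second takes 10^6 CPU seconds, which is about 277.8 hours.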
  • 10. Big data Processing Technologies
  • 11. Making Sense of Data • To know what happened (hindsight + oversight) – Basic analytics + visualizations (min, max, average, histogram, distribution) – Interactive drill down • To explain why (insight) – Data mining, classification, building models, clustering • To forecast (foresight) – Neural networks, decision models
  • 12. New Developments • Internet of Things (IoT) – Building a bridge between software and the real world • Lambda Architecture – Merging realtime and batch processing in the same model • Machine Learning – Next-generation decisions (e.g. Deep Learning)
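As a small illustration of the "basic analytics" level (min, max, average, histogram), here is a minimal sketch in plain Java. It uses only the standard library, not any WSO2 API. The sample readings and the equal-width bucketing scheme are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hindsight-style analytics over an in-memory sample: summary statistics
// plus an equal-width histogram.
final class BasicAnalytics {
    // Counts values into `bins` equal-width buckets spanning [min, max].
    static int[] histogram(double[] values, int bins) {
        DoubleSummaryStatistics s = Arrays.stream(values).summaryStatistics();
        double width = (s.getMax() - s.getMin()) / bins;
        int[] counts = new int[bins];
        for (double v : values) {
            int i = width == 0 ? 0 : (int) ((v - s.getMin()) / width);
            counts[Math.min(i, bins - 1)]++; // the maximum falls into the last bucket
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] readings = {2.0, 4.0, 4.0, 5.0, 7.0, 10.0}; // made-up sample data
        DoubleSummaryStatistics s = Arrays.stream(readings).summaryStatistics();
        System.out.println("min=" + s.getMin() + " max=" + s.getMax()
                + " avg=" + s.getAverage());
        System.out.println("histogram=" + Arrays.toString(histogram(readings, 4)));
    }
}
```

At Big Data scale the same aggregates would be computed by a batch or streaming engine rather than in memory, but the definitions stay the same.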
  • 13. WSO2 Big Data Platform
  • 14. Data Collection • Can receive events via SOAP, HTTP, JMS, .. • WSO2 Events is a highly optimized version (400K events per second) • Default agents are provided, and you can write custom agents.
      Agent agent = new Agent(agentConfiguration);
      publisher = new AsyncDataPublisher("tcp://localhost:7612", .. );
      StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
      definition.addPayloadData("sid", STRING);
      ...
      publisher.addStreamDefinition(definition);
      ...
      Event event = new Event();
      event.setPayloadData(eventData);
      publisher.publish(STREAM_NAME, VERSION, event);
  • 18.
  • 20. ACM DEBS Grand Challenge 2014 • DEBS (Distributed Event Based Systems) is a premier academic conference, which posts a yearly event processing challenge • Smart home electricity data: 2000 sensors, 40 houses, 4 billion events • The WSO2 CEP based solution was one of the four finalists (the others were Dresden University of Technology, Fraunhofer Institute (Germany), and Imperial College London) • We posted the fastest single-node solution measured (400K events/sec) and close to one million events/sec in distributed throughput.
  • 21. Dashboard Wizard for BAM and CEP • We have been asking you to write a bit of code to get visualizations up • But we have now added a wizard that guides you through the process – Think of it as a “New Servlet” menu; you can customize what is generated • Already in the latest CEP and BAM • Currently only DBs as data sources, and simple graphs, but that will grow!
  • 22. Lambda Architecture with WSO2 Products
  • 23.
  • 25. Scaling Complex Event Processing • “CEP vs. Stream Processing” is like Hive vs. Hadoop: the former lets users write SQL-like queries without implementing things from the ground up • However, scaling is the main challenge • We have written a Siddhi bolt for Storm. Now you can do distributed processing by connecting Siddhi bolts together!
      SiddhiBolt siddhiBolt1 = new SiddhiBolt( .. siddhi queries .. );
      SiddhiBolt siddhiBolt2 = new SiddhiBolt( .. siddhi queries .. );
      TopologyBuilder builder = new TopologyBuilder();
      builder.setSpout("source", new PlayStream(), 1);
      builder.setBolt("node1", siddhiBolt1, 1)
          .shuffleGrouping("source", "PlayStream1");
      ..
      builder.setBolt("LeafEacho", new EchoBolt(), 1)
          .shuffleGrouping("node1", "LongAdvanceStream");
      ..
      cluster.submitTopology("word-count", conf, builder.createTopology());
  • 26. CEP Query => Distributed Execution • Extend the Siddhi language to include parallel constructs: partitions, pipelines, distributed operators • Compile queries to a Storm cluster running Siddhi bolts • Assign each partition to a different node, and partition the data accordingly • Some scenarios need results rearranged.
      define partition on Player.sid {
          from Player#window(30s)
          select avg(v) as v
          insert into AvgSpeedByPlayer;
      }
      from AvgSpeedByPlayer
      select avg(v)
      insert into AvgSpeed;
  • 27. Scaling CEP • Think like MapReduce! Ask the user to define partitions: the parallel and non-parallel parts of the computation • Each node runs as a Storm bolt; communication and HA are handled via Storm
  • 28. Machine Learning Team • We are building a machine learning team • To give first-class support for machine learning within the WSO2 platform, especially in Big Data solutions – The idea is to guide you through the process of finding and applying the best model for your dataset and scenario • We will reuse the best open source tools and create what is missing
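To make the two-stage shape of the partitioned query concrete, here is a minimal plain-Java sketch of the same computation. It is not Siddhi or Storm code. The `Reading` type, the class name, and the sample values are hypothetical, and the 30-second window is assumed to have already been collected into a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stage 1 computes one average per partition key (parallelizable per key);
// stage 2 combines the per-key results in a single non-parallel step.
final class PartitionedAverage {
    // Hypothetical event: sensor id `sid` and a speed reading `v`.
    static final class Reading {
        final String sid;
        final double v;
        Reading(String sid, double v) { this.sid = sid; this.v = v; }
    }

    // Stage 1: average per sid; each key could be assigned to its own node.
    static Map<String, Double> avgByPlayer(List<Reading> window) {
        Map<String, double[]> acc = new TreeMap<>(); // sid -> {sum, count}
        for (Reading r : window) {
            double[] a = acc.computeIfAbsent(r.sid, k -> new double[2]);
            a[0] += r.v;
            a[1] += 1;
        }
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            out.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return out;
    }

    // Stage 2: average of the per-player averages (NaN if there are none).
    static double avgSpeed(Map<String, Double> perPlayer) {
        double sum = 0;
        for (double v : perPlayer.values()) sum += v;
        return perPlayer.isEmpty() ? Double.NaN : sum / perPlayer.size();
    }

    public static void main(String[] args) {
        List<Reading> window = Arrays.asList(
                new Reading("p1", 4.0), new Reading("p1", 6.0), new Reading("p2", 10.0));
        Map<String, Double> perPlayer = avgByPlayer(window);
        System.out.println(perPlayer + " -> " + avgSpeed(perPlayer));
    }
}
```

Stage 2 is an average of averages, so it weights every player equally regardless of how many readings each produced. That differs from a global average over all readings, which is one reason some scenarios need the intermediate results rearranged before the final step.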
  • 29. Domain Toolboxes • Time Series Toolbox – Forecasts and outlier detection with cycle support • Fraud Detection – A set of common fraud detection pattern implementations, with pointers on how you can extend them • GIS support – Operations: within, inside, touches – Geo fencing – Tracking – Integration with GIS databases
  • 30. Conclusion • Introduction to Big Data: why and how? • WSO2 Big Data platform • What is new in the platform? • What keeps us busy? • Interested? – All the software we discussed is open source under the Apache License. Visit http://wso2.com/. – Would you like to integrate with us, help, or join? Talk to us at the Big Data booth or at architecture@wso2.org
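A geo-fencing "within" check can be as simple as testing whether a position lies inside a circle around a reference point. The sketch below is only an illustration of that idea using the haversine great-circle distance. It is not the toolbox's implementation. The class name, method names, and the circular fence shape are assumptions.

```java
// Circular geofence test based on the haversine great-circle distance.
final class GeoFence {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius in metres

    // Great-circle distance in metres between two (latitude, longitude) points in degrees.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // True when the point lies inside (or on) the circular fence.
    static boolean within(double lat, double lon,
                          double centerLat, double centerLon, double radiusM) {
        return haversineMeters(lat, lon, centerLat, centerLon) <= radiusM;
    }

    public static void main(String[] args) {
        // One degree of latitude is roughly 111 km, so this point is outside a 100 km fence.
        System.out.println(within(7.9271, 79.8612, 6.9271, 79.8612, 100_000));
        System.out.println(within(7.9271, 79.8612, 6.9271, 79.8612, 120_000));
    }
}
```

Polygon-shaped fences and operations such as "touches" need a proper geometry library or a GIS database; the circle is just the simplest case to reason about.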