Learn with WSO2 - Building your Big Data Solution
Srinath Perera
Director of Research
WSO2 Inc.
About WSO2
•  Providing the only complete open source componentized cloud platform
–  Dedicated to removing all the stumbling blocks to enterprise agility
–  Enabling you to focus on business logic and business value
•  Recognized by leading analyst firms as visionaries and leaders
–  Gartner cites WSO2 as visionaries in all 3 categories of application infrastructure
–  Forrester places WSO2 in top 2 for API Management
•  Global corporation with offices in USA, UK & Sri Lanka
–  200+ employees and growing
•  Business model of selling comprehensive support & maintenance for our products
–  150+ globally positioned support customers
Consider a day in your life
•  What is the best road to take?
•  Would there be any bad weather?
•  What is the best way to invest the money?
•  Should I take that loan?
•  Can I optimize my day?
•  Is there a way to do this faster?
•  What have others done in similar cases?
•  Which product should I buy?
People wanted to (through the ages)
•  To know (what happened?)
•  To explain (why it happened?)
•  To predict (what will happen?)
What is Big Data?
•  There is a lot of data available
–  E.g. Internet of Things
•  We have computing power
•  We have technology
•  The goal is the same
–  To know
–  To explain
–  To predict
•  The challenge is the full lifecycle
Drivers of Big Data
Data Avalanche / Moore’s law of data
•  We are now collecting large amounts of data and converting them to digital form
•  90% of the data in the world today was created within the past two years.
•  The amount of data we have doubles very fast
In real life, most data are Big
•  The Web sees millions of activities per second, so a large volume of server logs is created.
•  Social networks, e.g. Facebook: 800 million active users, 40 billion photos from its user base.
•  There are >4 billion phones and >25% are smartphones. There are billions of RFID tags.
•  Observational and sensor data
–  Weather radars, balloons
–  Environmental sensors
–  Telescopes
–  Complex physics simulations
Why is Big Data hard?
•  How to store? Assuming 1 TB of disk per computer, it takes 1,000 computers to store 1 PB
•  How to move? Assuming a 10 Gb/s network running at full line rate, it takes about 13 minutes to copy 1 TB, or about 9 days to copy 1 PB
•  How to search? Assuming each record is 1 KB and one machine can process 1,000 records per second, it needs about 277 CPU hours (roughly 11.6 CPU days) to process 1 TB and about 32 CPU years to process 1 PB
•  How to process?
–  How to convert algorithms to work at large scale
–  How to create new algorithms
http://www.susanica.com/photo/9
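The store, move and search estimates above are plain divisions, so they are easy to recompute under your own assumptions. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes), a single link running at its full nominal rate, and one pass over every record; the function names and default values are illustrative, not part of any WSO2 tool. Real transfers rarely reach line rate, so treat the results as order-of-magnitude figures.

```python
# Back-of-envelope estimates for storing, moving and scanning data.
# Assumptions (illustrative): decimal units, full nominal link rate.

TB = 10**12  # bytes
PB = 10**15  # bytes


def machines_to_store(total_bytes, disk_bytes_per_machine=TB):
    """Machines needed if each one contributes a fixed amount of disk."""
    return -(-total_bytes // disk_bytes_per_machine)  # ceiling division


def seconds_to_copy(total_bytes, link_bits_per_second=10 * 10**9):
    """Transfer time over one link running at its nominal rate."""
    return total_bytes * 8 / link_bits_per_second


def cpu_seconds_to_scan(total_bytes, record_bytes=1000, records_per_second=1000):
    """CPU time for one machine to process every record once."""
    return (total_bytes / record_bytes) / records_per_second


for label, size in (("1 TB", TB), ("1 PB", PB)):
    print(
        label,
        machines_to_store(size), "machines,",
        round(seconds_to_copy(size) / 3600, 2), "hours to copy,",
        round(cpu_seconds_to_scan(size) / 86400, 1), "CPU days to scan",
    )
```

Changing `link_bits_per_second` to the throughput you actually measure is usually the most informative tweak.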
  
Why it is hard (Contd.)?
•  Systems built of many computers
•  That handle lots of data
•  Running complex logic
•  This pushes us to the frontier of Distributed Systems and Databases
•  More data does not mean there is a simple model
•  Some models can be as complex as the system
http://www.flickr.com/photos/mariachily/5250487136, Licensed CC
Big Data Architecture
WSO2 Offerings
•  Two tools
–  WSO2 BAM for storing and processing
–  WSO2 CEP for real-time processing
•  These tools cover the whole processing life cycle for your Big Data, with the help of a few other products as needed.
–  WSO2 Storage Server
–  WSO2 User Experience Server
Big Data Architecture Implementation
Sensors
•  Built-in sensors in WSO2 products
•  Event logs
–  Click streams, emails, chat, search, tweets, transactions …
•  Custom sensors
–  Video surveillance, cash flows, traffic, surveillance, smart grid, production line, RFID (e.g. Walmart), GPS sensors, mobile phones, Internet of Things
http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, http://www.flickr.com/photos/eastcapital/4554220770/, http://www.flickr.com/photos/patdavid/4619331472/ by Pat David, copyright CC
Collecting Data
•  Data is collected at sensors and sent to the big data system via events or flat files
•  Event streams: we name the events by their content/originator
•  Get data through
–  Point to point
–  Event bus
•  E.g. Data Bridge
–  A Thrift-based transport we built that does about 400k events/sec
Storing Data
•  Historically we used databases
–  Scale is a challenge: replication, sharding
•  Scalable options
–  NoSQL (Cassandra, HBase) [if data is structured]
•  Column families gaining ground
–  Distributed file systems (e.g. HDFS) [if data is unstructured]
•  NewSQL
–  In-memory computing, VoltDB
•  Specialized data structures
–  Graph databases, data structure servers
http://www.flickr.com/photos/keso/363133967/
Storing Data (Contd.)
•  WSO2 offerings (WSO2 Storage Server)
–  Small structured data: keep in relational databases.
–  Large structured data: Cassandra
–  Large unstructured data: HDFS
Making Sense of Data
•  To know (what happened?)
–  Basic analytics + visualizations (min, max, average, histogram, distributions …)
–  Interactive drill down
•  To explain (why)
–  Data mining, classifications, building models, clustering
•  To forecast
–  Neural networks, decision models
Making Sense of Data (Contd.)
•  Batch processing – WSO2 BAM
–  Hive scripts
–  MapReduce jobs
•  Real-time processing – CEP
–  Event Query Language
•  The above two are the platform; you need to program your use case.
To know (what happened?)
•  Mainly analytics
–  Min, max, average, correlation, histograms
–  Might join or group data in many ways
•  Implemented with MapReduce or queries
•  Data is often presented with some visualizations
•  Examples
–  Forensics
–  Assessments
–  Historical data/reports/trends
http://www.flickr.com/photos/isriya/2967310333/
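The basic analytics listed above (min, max, average, histogram) are small enough to show on one machine before scaling them out with MapReduce or queries. A minimal sketch using only the Python standard library; the `summarize` helper, the bin width and the sample values are made up for illustration.

```python
import statistics
from collections import Counter


def summarize(values, bin_width):
    """Return min, max, average and a fixed-width histogram of the values."""
    # Each value falls into the bin that starts at the nearest lower multiple
    # of bin_width, e.g. 37 lands in the bin starting at 20 when bin_width=20.
    histogram = Counter((v // bin_width) * bin_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "average": statistics.mean(values),
        "histogram": dict(sorted(histogram.items())),
    }


response_times_ms = [12, 15, 22, 37, 41, 45, 58, 90]  # made-up sample data
print(summarize(response_times_ms, bin_width=20))
```

On a cluster the same aggregates would typically be computed per partition and then combined, which is why they map well onto MapReduce.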
  
To Explain (Patterns)
•  Correlation
–  Scatter plot, statistical correlation
•  Data mining (detecting patterns)
–  Clustering and classification
–  Finding similar items
–  Finding hubs and authorities in a graph
–  Finding frequent item sets
–  Making recommendations
•  Apache Mahout
http://www.flickr.com/photos/eriwst/2987739376/ and http://www.flickr.com/photos/focx/5035444779/
  	
  	
  	
  
To Predict: Forecasts and Models
•  Trying to build a model for the data
•  Theoretically or empirically
–  Analytical models (e.g. physics)
–  Neural networks
–  Reinforcement learning
–  Unsupervised learning (clustering, dimensionality reduction, kernel methods)
•  Examples
–  Translation
–  Weather forecast models
–  Building profiles of users
–  Traffic models
–  Economic models
•  Lots of domain-specific work
http://misterbijou.blogspot.com/2010_09_01_archive.html
Information Visualization
•  Presenting information
–  To end users
–  To decision makers
–  To scientists
•  Interactive exploration
•  Sending alerts
•  WSO2 UES
–  Jaggery based
•  BAM/CEP can work with most other UI tools
http://www.flickr.com/photos/stevefaeembra/3604686097/
WSO2 UES
•  Dashboards and Store
•  Build your own UIs with Jaggery
MapReduce/Hadoop
•  First introduced by Google, and used as the processing model for their architecture
•  Implemented by open source projects like Apache Hadoop and Spark
•  Users write two functions: map and reduce
•  The framework handles details like distributed processing, fault tolerance, load balancing, etc.
•  Widely used, and one of the catalysts of Big Data
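// map: emit (token, 1) for every token in the input value
void map(ctx, k, v){
    tokens = v.split();
    for t in tokens
        ctx.emit(t, 1);
}
// reduce: sum the counts emitted for one key
void reduce(ctx, k, values[]){
    count = 0;
    for v in values
        count = count + v;
    ctx.emit(k, count);
}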
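The word-count pseudocode above can be tried on a single machine. A minimal sketch in Python, assuming a toy in-memory driver (`run_mapreduce`, a hypothetical name) that stands in for the framework's shuffle step; it has none of Hadoop's distribution, fault tolerance or load balancing.

```python
from collections import defaultdict


def map_fn(key, value):
    """Emit (token, 1) for every whitespace-separated token in the line."""
    for token in value.split():
        yield token, 1


def reduce_fn(key, values):
    """Sum all counts emitted for one token."""
    yield key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy driver: map every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # the "shuffle" step
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            results[out_key] = out_value
    return results


lines = [(0, "big data is big"), (1, "data in motion")]
print(run_mapreduce(lines, map_fn, reduce_fn))
```

Because `map_fn` and `reduce_fn` only see one record or one key at a time, a real framework is free to run them on many machines in parallel.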
MapReduce (Contd.)
Data on the Move
•  The idea is to process data as it is received, in a streaming fashion
•  Used when we need
–  Very fast output
–  Lots of events (a few 100k to millions)
–  Processing without storing (e.g. too much data)
•  Two main technologies
–  Stream processing (e.g. Storm, http://storm-project.net/)
–  Complex Event Processing (CEP) http://wso2.com/products/complex-event-processor/
Complex Event Processing (CEP)
•  Sees inputs as event streams, queried with an SQL-like language
•  Supports filters, windows, joins, patterns and sequences
from p=PINChangeEvents#win.time(3600) join
t=TransactionEvents[p.custid=custid][amount>10000]
#win.time(3600)
return t.custid, t.amount;
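To make the windowed join in the query above concrete, here is a minimal Python sketch of the same idea: report large transactions that fall within the same time window as a PIN change by the same customer. It is an illustration of the technique, not how the WSO2 CEP engine is implemented. The event tuple layout, the function name and the assumption that the window length is in seconds are all hypothetical.

```python
from collections import deque

WINDOW = 3600  # window length; assumed here to be in seconds


def suspicious_transactions(events, window=WINDOW, min_amount=10000):
    """Join PIN changes with large transactions of the same customer.

    events: time-ordered iterable of (timestamp, kind, custid, amount),
    where kind is "pin" or "txn" (amount is ignored for PIN changes).
    Returns a list of (custid, amount) matches.
    """
    pins = deque()  # (timestamp, custid) still inside the window
    txns = deque()  # (timestamp, custid, amount), already filtered by amount
    matches = []
    for ts, kind, custid, amount in events:
        # Expire events that have fallen out of the time window.
        while pins and pins[0][0] <= ts - window:
            pins.popleft()
        while txns and txns[0][0] <= ts - window:
            txns.popleft()
        if kind == "pin":
            pins.append((ts, custid))
            matches.extend((c, a) for _, c, a in txns if c == custid)
        elif kind == "txn" and amount > min_amount:
            txns.append((ts, custid, amount))
            matches.extend((custid, amount) for _, c in pins if c == custid)
    return matches


events = [
    (0, "pin", "c1", 0),
    (100, "txn", "c1", 20000),   # large, same customer, inside the window
    (200, "txn", "c2", 50000),   # large, but no PIN change for c2
    (300, "txn", "c1", 500),     # too small to pass the filter
    (5000, "txn", "c1", 30000),  # PIN change has already expired
]
print(suspicious_transactions(events))
```

Only the events still inside the window are kept in memory, which is what lets this style of processing run without storing the full stream.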
Summary
Case Study 1: Tracing Business Process
•  A business process is built using many services
•  Track and trace each step, and analyze to understand how to optimize
•  E.g. sales pipeline
Some Queries
•  Conversion rate?
•  How many deals are in the pipeline each month?
•  Average size of the deals?
•  Average time a deal takes?
•  Can we spot large deals early?
•  Which is better: going for a few large ones or many small ones?
•  Were there any delays on our side?
Hive: Average Size of the Deal
•  Hive uses an SQL-like syntax.
•  Easy to understand and learn
hive> LOAD DATA ..
hive> SELECT month, avg(value) FROM LEAD_ACTIVITY
      WHERE action = 'closedWon' GROUP BY month;
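The query is ordinary SQL aggregation, so its shape can be tried locally before running it on Hive. A minimal sketch using Python's built-in `sqlite3` module as a stand-in for Hive; the `LEAD_ACTIVITY` column types and the sample rows are assumptions made for illustration, and SQLite is not a substitute for Hive at scale.

```python
import sqlite3


def average_deal_size(rows):
    """Average value of closed-won deals per month.

    rows: iterable of (month, action, value) tuples; this schema is assumed.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE LEAD_ACTIVITY (month TEXT, action TEXT, value REAL)"
        )
        conn.executemany("INSERT INTO LEAD_ACTIVITY VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT month, avg(value) FROM LEAD_ACTIVITY "
            "WHERE action = 'closedWon' GROUP BY month ORDER BY month"
        )
        return cursor.fetchall()
    finally:
        conn.close()


sample = [
    ("2013-01", "closedWon", 1000.0),
    ("2013-01", "closedWon", 3000.0),
    ("2013-01", "closedLost", 9000.0),  # excluded by the WHERE clause
    ("2013-02", "closedWon", 500.0),
]
print(average_deal_size(sample))
```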
MapReduce: How many deals in Pipeline?
How many deals in Pipeline? (Contd.)
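void map(ctx, k, v){
    Deals deal = parse(v);             // parse one input record into a deal
    int month = getMonth(deal.time);
    ctx.emit(month, 1);                // one count per deal, keyed by month
}
void reduce(ctx, k, values[]){
    count = 0;
    for v in values
        count = count + v;
    ctx.emit(k, count);                // total number of deals for month k
}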
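The same job can be sketched in Python to see the parse, map and reduce steps end to end. This is a single-machine illustration only: the record layout (`deal_id,ISO timestamp,value`) and all function names are assumptions, since the slides do not show what `parse` reads.

```python
from collections import defaultdict
from datetime import datetime


def map_deal(line):
    """Parse one record and emit (month, 1). Record layout is assumed."""
    deal_id, opened_at, value = line.split(",")
    month = datetime.fromisoformat(opened_at).strftime("%Y-%m")
    yield month, 1


def reduce_counts(month, counts):
    """Total number of deals seen for one month."""
    return month, sum(counts)


def deals_per_month(lines):
    """Toy driver: group the mapper output by month, then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for month, one in map_deal(line):
            groups[month].append(one)
    return dict(reduce_counts(m, c) for m, c in sorted(groups.items()))


records = [
    "d1,2013-01-05T10:00:00,1200",
    "d2,2013-01-20T09:30:00,800",
    "d3,2013-02-02T14:15:00,4300",
]
print(deals_per_month(records))
```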
Case Study 2: DEBS Challenge
•  Event processing challenge
•  Real football game, sensors in player shoes + ball
•  Events at 15 kHz
•  Event format
–  Sensor ID, TS, x, y, z, v, a
•  Queries
–  Running stats
–  Ball possession
–  Heat map of activity
–  Shots at goal
Example: Detect Ball Possession
•  Possession is the time from when a player hits the ball until someone else hits it or it goes out of the ground
from Ball#window.length(1) as b join
    Players#window.length(1) as p
    unidirectional
    on debs:getDistance(b.x, b.y, b.z, p.x, p.y, p.z) < 1000
    and b.a > 55
select ...
insert into hitStream

from old = hitStream,
    b = hitStream[old.pid != pid],
    n = hitStream[b.pid == pid]*,
    (e1 = hitStream[b.pid != pid]
    or e2 = ballLeavingHitStream)
select ...
insert into BallPossessionStream
http://www.flickr.com/photos/glennharper/146164820/
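The two queries above can be read as two steps: detect hits, then turn hits into possession intervals. A minimal Python sketch of that reading, not the CEP implementation. It assumes that `getDistance` is a Euclidean distance, reuses the thresholds 1000 and 55 from the query without knowing their units, and uses a made-up sample layout.

```python
import math

HIT_DISTANCE = 1000    # threshold from the query; unit not stated there
HIT_ACCELERATION = 55  # threshold from the query; unit not stated there


def detect_hits(samples):
    """Yield (ts, pid) when the ball accelerates sharply near a player.

    samples: time-ordered (ts, (x, y, z, a), {pid: (x, y, z)}) tuples.
    """
    for ts, (bx, by, bz, ba), players in samples:
        if ba <= HIT_ACCELERATION:
            continue
        for pid, position in players.items():
            if math.dist((bx, by, bz), position) < HIT_DISTANCE:
                yield ts, pid


def possession_intervals(hits, ball_out_times=()):
    """Turn hits into closed (pid, start, end) possession intervals.

    A possession ends when another player hits the ball or the ball leaves
    the ground. A possession still open at the end of the input is dropped.
    """
    events = [(ts, pid) for ts, pid in hits]
    events += [(ts, None) for ts in ball_out_times]  # None marks "ball out"
    events.sort(key=lambda event: event[0])
    intervals = []
    holder, start = None, None
    for ts, pid in events:
        if pid == holder:
            continue  # same player again, or ball out while nobody holds it
        if holder is not None:
            intervals.append((holder, start, ts))
        holder, start = pid, ts
    return intervals


hits = [(0, "A"), (5, "A"), (12, "B"), (30, "A")]
print(possession_intervals(hits, ball_out_times=[20]))
```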
  
Conclusions
•  What is Big Data?
•  Big Data Architecture
–  Collecting data
–  Storing data
–  Processing data
•  WSO2 offerings
•  Case studies
Questions?
Engage with WSO2
•  Helping you get the most out of your deployments
•  From project evaluation and inception to development
and going into production, WSO2 is your partner in
ensuring 100% project success
Building Your Big Data Solution with WSO2

Contenu connexe

Tendances

Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
SplunkLive! Customer Presentation - Staples
SplunkLive! Customer Presentation - StaplesSplunkLive! Customer Presentation - Staples
SplunkLive! Customer Presentation - StaplesSplunk
 
FP&A with Spreadsheets and Spark with Oscar Castaneda-Villagran
FP&A with Spreadsheets and Spark with Oscar Castaneda-VillagranFP&A with Spreadsheets and Spark with Oscar Castaneda-Villagran
FP&A with Spreadsheets and Spark with Oscar Castaneda-VillagranDatabricks
 
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonData Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonDatabricks
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleDatabricks
 
Splunk in Staples: IT Operations
Splunk in Staples: IT OperationsSplunk in Staples: IT Operations
Splunk in Staples: IT OperationsTimur Bagirov
 
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jUnified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jDeepak Chandramouli
 
On the Radar: SnapLogic
On the Radar: SnapLogicOn the Radar: SnapLogic
On the Radar: SnapLogicSnapLogic
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architectureSudheer Kondla
 
CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks
CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on DatabricksCI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks
CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on DatabricksDatabricks
 
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...SnapLogic
 
Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Lviv Startup Club
 
Spark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business OperationsSpark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business OperationsSAP Technology
 
Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceSamanthaBerlant
 
Learning to Rank Datasets for Search with Oscar Castaneda
Learning to Rank Datasets for Search with Oscar CastanedaLearning to Rank Datasets for Search with Oscar Castaneda
Learning to Rank Datasets for Search with Oscar CastanedaDatabricks
 

Tendances (20)

TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
SplunkLive! Customer Presentation - Staples
SplunkLive! Customer Presentation - StaplesSplunkLive! Customer Presentation - Staples
SplunkLive! Customer Presentation - Staples
 
FP&A with Spreadsheets and Spark with Oscar Castaneda-Villagran
FP&A with Spreadsheets and Spark with Oscar Castaneda-VillagranFP&A with Spreadsheets and Spark with Oscar Castaneda-Villagran
FP&A with Spreadsheets and Spark with Oscar Castaneda-Villagran
 
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonData Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
 
Splunk in Staples: IT Operations
Splunk in Staples: IT OperationsSplunk in Staples: IT Operations
Splunk in Staples: IT Operations
 
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4jUnified Data Catalog - Recommendations powered by Apache Spark & Neo4j
Unified Data Catalog - Recommendations powered by Apache Spark & Neo4j
 
On the Radar: SnapLogic
On the Radar: SnapLogicOn the Radar: SnapLogic
On the Radar: SnapLogic
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks
CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on DatabricksCI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks
CI/CD Templates: Continuous Delivery of ML-Enabled Data Pipelines on Databricks
 
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
 
Done oracle hcm cloud ppt (1)
Done  oracle hcm cloud ppt (1)Done  oracle hcm cloud ppt (1)
Done oracle hcm cloud ppt (1)
 
Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"
 
Spark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business OperationsSpark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business Operations
 
Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search Dojo
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
2020 | Metadata Day | LinkedIn
2020 | Metadata Day | LinkedIn2020 | Metadata Day | LinkedIn
2020 | Metadata Day | LinkedIn
 
Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
 
Learning to Rank Datasets for Search with Oscar Castaneda
Learning to Rank Datasets for Search with Oscar CastanedaLearning to Rank Datasets for Search with Oscar Castaneda
Learning to Rank Datasets for Search with Oscar Castaneda
 

En vedette

Big Data for Local Context
Big Data for Local ContextBig Data for Local Context
Big Data for Local ContextGeorge Percivall
 
Creating the golden record that makes every click personal
Creating the golden record that makes every click personalCreating the golden record that makes every click personal
Creating the golden record that makes every click personalJean-Michel Franco
 
The golden-record
The golden-recordThe golden-record
The golden-recordOri Levi
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Big Data Analytics for BI, BA and QA
Big Data Analytics for BI, BA and QABig Data Analytics for BI, BA and QA
Big Data Analytics for BI, BA and QADmitry Tolpeko
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big DataBernard Marr
 
Smart Citizens - Populating Smart Cities / IoTShifts
Smart Citizens - Populating Smart Cities / IoTShiftsSmart Citizens - Populating Smart Cities / IoTShifts
Smart Citizens - Populating Smart Cities / IoTShiftsVolker Hirsch
 
Case Study: SocialCops + Tata Trusts in Vijayawada
Case Study: SocialCops + Tata Trusts in VijayawadaCase Study: SocialCops + Tata Trusts in Vijayawada
Case Study: SocialCops + Tata Trusts in VijayawadaSocialCops
 
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...SocialCops
 

En vedette (10)

Big Data for Local Context
Big Data for Local ContextBig Data for Local Context
Big Data for Local Context
 
Creating the golden record that makes every click personal
Creating the golden record that makes every click personalCreating the golden record that makes every click personal
Creating the golden record that makes every click personal
 
The golden-record
The golden-recordThe golden-record
The golden-record
 
Big Data and E-Commerce
Big Data and E-CommerceBig Data and E-Commerce
Big Data and E-Commerce
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Big Data Analytics for BI, BA and QA
Big Data Analytics for BI, BA and QABig Data Analytics for BI, BA and QA
Big Data Analytics for BI, BA and QA
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big Data
 
Smart Citizens - Populating Smart Cities / IoTShifts
Smart Citizens - Populating Smart Cities / IoTShiftsSmart Citizens - Populating Smart Cities / IoTShifts
Smart Citizens - Populating Smart Cities / IoTShifts
 
Case Study: SocialCops + Tata Trusts in Vijayawada
Case Study: SocialCops + Tata Trusts in VijayawadaCase Study: SocialCops + Tata Trusts in Vijayawada
Case Study: SocialCops + Tata Trusts in Vijayawada
 
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
 

Similaire à Building Your Big Data Solution with WSO2

Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data Srinath Perera
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introductionamiyadash
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Srinath Perera
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesRukshan Batuwita
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfahmedibrahimghnnam01
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2RojaT4
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxAIMLSEMINARS
 
bigdataintro.pptx
bigdataintro.pptxbigdataintro.pptx
bigdataintro.pptxAlbert Alex
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 

Similaire à Building Your Big Data Solution with WSO2 (20)

Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introduction
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 
Big data
Big dataBig data
Big data
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdf
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
SKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSISSKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSIS
 
bigdataintro.pptx
bigdataintro.pptxbigdataintro.pptx
bigdataintro.pptx
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 

Building Your Big Data Solution with WSO2

  • 1. Learn with WSO2 - Building your Big Data Solution
      Srinath Perera
      Director of Research
      WSO2 Inc.
  • 2. About WSO2
      • Providing the only complete open source componentized cloud platform
          – Dedicated to removing all the stumbling blocks to enterprise agility
          – Enabling you to focus on business logic and business value
      • Recognized by leading analyst firms as visionaries and leaders
          – Gartner cites WSO2 as visionaries in all 3 categories of application infrastructure
          – Forrester places WSO2 in top 2 for API Management
      • Global corporation with offices in USA, UK & Sri Lanka
          – 200+ employees and growing
      • Business model of selling comprehensive support & maintenance for our products
  • 3. 150+ globally positioned support customers
  • 4. Consider a day in your life
      • What is the best road to take?
      • Would there be any bad weather?
      • What is the best way to invest the money?
      • Should I take that loan?
      • Can I optimize my day?
      • Is there a way to do this faster?
      • What have others done in similar cases?
      • Which product should I buy?
  • 5. People wanted to (through ages)
      • To know (what happened?)
      • To explain (why it happened)
      • To predict (what will happen?)
  • 6. What is Big Data?
      • There is a lot of data available
          – E.g. Internet of Things
      • We have computing power
      • We have technology
      • The goal is the same
          – To know
          – To explain
          – To predict
      • The challenge is the full lifecycle
  • 7. Drivers of Big Data
  • 8. Data Avalanche / Moore's law of data
      • We are now collecting and converting large amounts of data to digital forms
      • 90% of the data in the world today was created within the past two years.
      • The amount of data we have doubles very fast
  • 9. In real life, most data are Big
      • The Web does millions of activities per second, so a great many server logs are created.
      • Social networks, e.g. Facebook: 800 million active users, 40 billion photos from its user base.
      • There are >4 billion phones and >25% are smartphones. There are billions of RFID tags.
      • Observational and sensor data
          – Weather radars, balloons
          – Environmental sensors
          – Telescopes
          – Complex physics simulations
  • 10. Why is Big Data hard?
      • How to store? Assuming 1 TB of disk per computer, it takes 1,000 computers to store 1 PB
      • How to move? Assuming a 1 Gb/s network, it takes a little over 2 hours to copy 1 TB, or about 3 months to copy 1 PB
      • How to search? Assuming each record is 1 KB and one machine can process 1,000 records per second, it needs about 278 CPU hours (roughly 12 CPU days) to process 1 TB and about 32 CPU years to process 1 PB
      • How to process?
          – How to convert algorithms to work at large scale
          – How to create new algorithms
      Image: http://www.susanica.com/photo/9
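Estimates like these are simple division, so they are easy to redo with your own hardware numbers. The following is a rough sketch only: decimal units, a link that sustains its full nominal rate, and the function names are all assumptions made for illustration.

```python
# Back-of-envelope estimates. Decimal units (1 TB = 10**12 bytes) are an assumption.
TB = 10**12
PB = 10**15

def machines_needed(data_bytes, disk_bytes_per_machine):
    """Machines required to hold the data, rounding up."""
    return -(-data_bytes // disk_bytes_per_machine)

def copy_seconds(data_bytes, link_bits_per_sec):
    """Transfer time if the link sustains its full nominal rate."""
    return data_bytes * 8 / link_bits_per_sec

def scan_seconds(data_bytes, record_bytes, records_per_sec):
    """Time for one machine to touch every record once."""
    return (data_bytes / record_bytes) / records_per_sec

for label, size in (("1 TB", TB), ("1 PB", PB)):
    for gbps in (1, 10):
        hours = copy_seconds(size, gbps * 10**9) / 3600
        print(f"{label}: {hours:.1f} h to copy at {gbps} Gb/s")
    days = scan_seconds(size, 1000, 1000) / 86400
    print(f"{label}: {machines_needed(size, TB)} machine(s) of 1 TB, "
          f"{days:.1f} CPU days to scan")
```

Real transfers rarely reach line rate, so treat the copy times as lower bounds.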
  • 11. Why is it hard? (Contd.)
      • A system built of many computers
      • That handles lots of data
      • Running complex logic
      • This pushes us to the frontier of distributed systems and databases
      • More data does not mean there is a simple model
      • Some models can be as complex as the system
      Image: http://www.flickr.com/photos/mariachily/5250487136, Licensed CC
  • 13. WSO2 Offerings
      • Two tools
          – WSO2 BAM for storing and processing
          – WSO2 CEP for real-time processing
      • These tools cover the whole processing life cycle for your Big Data, with the help of a few other products as needed.
          – WSO2 Storage Server
          – WSO2 User Experience Server
  • 14. Big Data Architecture Implementation
  • 15. Sensors
      • Sensors built into WSO2 products
      • Event logs
          – Click streams, emails, chat, search, tweets, transactions …
      • Custom sensors
          – Video surveillance, cash flows, traffic, surveillance, smart grid, production line, RFID (e.g. Walmart), GPS sensors, mobile phones, Internet of Things
      Images: http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, http://www.flickr.com/photos/eastcapital/4554220770/, http://www.flickr.com/photos/patdavid/4619331472/ by Pat David, copyright CC
  • 16. Collecting Data
      • Data is collected at sensors and sent to the big data system via events or flat files
      • Event streams: events are named by their content/originator
      • Get data through
          – Point to point
          – Event bus
      • E.g. Data Bridge
          – a Thrift-based transport we built that handles about 400k events/sec
  • 17. Storing Data
      • Historically we used databases
          – Scale is a challenge: replication, sharding
      • Scalable options
          – NoSQL (Cassandra, HBase) [if data is structured]
              • Column families are gaining ground
          – Distributed file systems (e.g. HDFS) [if data is unstructured]
      • NewSQL
          – In-memory computing, VoltDB
      • Specialized data structures
          – Graph databases, data structure servers
      Image: http://www.flickr.com/photos/keso/363133967/
  • 18. Storing Data (Contd.)
      • WSO2 offerings (WSO2 Storage Server)
          – Small structured data: keep in relational databases
          – Large structured data: Cassandra
          – Large unstructured data: HDFS
  • 19. Making Sense of Data
      • To know (what happened?)
          – Basic analytics + visualizations (min, max, average, histogram, distributions …)
          – Interactive drill down
      • To explain (why)
          – Data mining, classification, building models, clustering
      • To forecast
          – Neural networks, decision models
  • 20. Making Sense of Data (Contd.)
      • Batch processing – WSO2 BAM
          – Hive scripts
          – MapReduce jobs
      • Real-time processing – CEP
          – Event Query Language
      • The two above are platforms; you still need to program your use case.
  • 21. To know (what happened?)
      • Mainly analytics
          – Min, max, average, correlation, histograms
          – Might join or group data in many ways
      • Implemented with MapReduce or queries
      • Data is often presented with some visualizations
      • Examples
          – Forensics
          – Assessments
          – Historical data / reports / trends
      Image: http://www.flickr.com/photos/isriya/2967310333/
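The basic analytics listed above can be sketched in a few lines. This is an illustrative sketch only: the function name, the bucket width, and the sample values are made up, and a real deployment would run the same aggregation as a Hive query or MapReduce job over far more data.

```python
from collections import Counter
from statistics import mean

def summarize(values, bucket_width):
    """Return min, max, average and a fixed-width histogram of the values."""
    # Each value is assigned to the bucket that starts at a multiple of bucket_width.
    histogram = Counter((v // bucket_width) * bucket_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "average": mean(values),
        "histogram": dict(sorted(histogram.items())),
    }

# Made-up response times in milliseconds, for illustration only.
latencies_ms = [12, 18, 25, 31, 38, 95]
print(summarize(latencies_ms, bucket_width=20))
```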
  • 22. To Explain (Patterns)
      • Correlation
          – Scatter plot, statistical correlation
      • Data mining (detecting patterns)
          – Clustering and classification
          – Finding similar items
          – Finding hubs and authorities in a graph
          – Finding frequent item sets
          – Making recommendations
      • Apache Mahout
      Images: http://www.flickr.com/photos/eriwst/2987739376/ and http://www.flickr.com/photos/focx/5035444779/
  • 23. To Predict: Forecasts and Models
      • Trying to build a model for the data
      • Theoretically or empirically
          – Analytical models (e.g. physics)
          – Neural networks
          – Reinforcement learning
          – Unsupervised learning (clustering, dimensionality reduction, kernel methods)
      • Examples
          – Translation
          – Weather forecast models
          – Building profiles of users
          – Traffic models
          – Economic models
      • A lot of domain-specific work
      Image: http://misterbijou.blogspot.com/2010_09_01_archive.html
  • 24. Information Visualization
      • Presenting information
          – To end users
          – To decision takers
          – To scientists
      • Interactive exploration
      • Sending alerts
      • WSO2 UES
          – Jaggery based
      • BAM/CEP can work with most other UI tools
      Image: http://www.flickr.com/photos/stevefaeembra/3604686097/
  • 25. WSO2 UES
      • Dashboards and Store
      • Build your own UIs with Jaggery
  • 26. MapReduce / Hadoop
      • First introduced by Google, and used as the processing model for their architecture
      • Implemented by open source projects like Apache Hadoop and Spark
      • Users write two functions: map and reduce
      • The framework handles details like distributed processing, fault tolerance, load balancing etc.
      • Widely used, and one of the catalysts of Big Data
      void map(ctx, k, v){
          tokens = v.split();
          for t in tokens
              ctx.emit(t, 1)
      }
      void reduce(ctx, k, values[]){
          count = 0;
          for v in values
              count = count + v;
          ctx.emit(k, count);
      }
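To make the division of labour concrete, here is a minimal single-process sketch of the word count pseudocode in Python. The `run_mapreduce` helper is a hypothetical stand-in for the framework, not Hadoop's API: it only shows the map, group-by-key, and reduce phases, with none of the distribution or fault tolerance.

```python
from collections import defaultdict

def map_fn(key, value):
    """Emit (word, 1) for every token in one input line."""
    for token in value.split():
        yield token, 1

def reduce_fn(key, values):
    """Sum the counts collected for one word."""
    yield key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # the "shuffle" step
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            results[out_key] = out_value
    return results

# Input records are (line number, line text) pairs, made up for illustration.
lines = [(0, "big data is big"), (1, "data is everywhere")]
print(run_mapreduce(lines, map_fn, reduce_fn))
```

Because the user code never touches the grouping step, the same `map_fn` and `reduce_fn` could in principle be run by a framework that spreads the groups across many machines.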
  • 28. Data on the Move
      • The idea is to process data as it is received, in streaming fashion
      • Used when we need
          – Very fast output
          – Lots of events (a few 100k to millions)
          – Processing without storing (e.g. too much data)
      • Two main technologies
          – Stream processing (e.g. Storm, http://storm-project.net/)
          – Complex Event Processing (CEP), http://wso2.com/products/complex-event-processor/
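"Processing without storing" means the computation keeps only a small running state rather than the events themselves. A minimal sketch, assuming numeric events and a hypothetical `running_average` helper:

```python
def running_average(events):
    """Yield the average so far after each event, keeping only two numbers."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count

# A short list stands in for an unbounded event source.
readings = iter([10, 20, 60])
for avg in running_average(readings):
    print(avg)
```

The memory used stays constant no matter how many events arrive, which is what makes this style workable when the stream is too large to store.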
  • 29. Complex Event Processing (CEP)
      • Treats inputs as event streams, which are queried with an SQL-like language
      • Supports Filters, Windows, Joins, Patterns and Sequences
      from p=PINChangeEvents#win.time(3600)
      join t=TransactionEvents[p.custid=custid][amount>10000] #win.time(3600)
      return t.custid, t.amount;
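The query above joins two sliding time windows: it reports a large transaction when the same customer also has a PIN change inside the window. The sketch below imitates that logic in plain Python. It is an approximation under stated assumptions: the window length is treated as seconds, events arrive in time order, and the event tuple layout and function name are hypothetical. A real CEP engine's join semantics may differ in detail.

```python
from collections import deque

WINDOW = 3600        # window length copied from the query; seconds is an assumption
MIN_AMOUNT = 10000   # amount threshold copied from the query

def suspicious_transactions(events):
    """events: time-ordered tuples (ts, kind, custid, amount).
    kind is "pin" or "txn"; amount is ignored for "pin" events."""
    pins, txns = deque(), deque()
    for ts, kind, custid, amount in events:
        # Expire events that have slid out of either time window.
        for window in (pins, txns):
            while window and ts - window[0][0] > WINDOW:
                window.popleft()
        if kind == "pin":
            pins.append((ts, custid))
            # A new PIN change is matched against transactions still in the window.
            for _, t_cust, t_amount in txns:
                if t_cust == custid:
                    yield t_cust, t_amount
        elif amount > MIN_AMOUNT:  # the [amount>10000] filter
            txns.append((ts, custid, amount))
            # A new large transaction is matched against PIN changes still in the window.
            for _, p_cust in pins:
                if p_cust == custid:
                    yield custid, amount

# Made-up events for illustration.
events = [
    (0, "pin", "c1", 0),
    (100, "txn", "c1", 25000),
    (200, "txn", "c2", 50000),
    (5000, "txn", "c1", 30000),
]
print(list(suspicious_transactions(events)))
```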
  • 31. Case Study 1: Tracing a Business Process
      • A business process is built using many services
      • Trace each step, and analyze to understand how to optimize
      • E.g. sales pipeline
  • 32. Some Queries
      • Conversion rate?
      • How many deals are in the pipeline each month?
      • Average size of the deals?
      • Average time a deal takes?
      • Can we spot large deals early?
      • Which is better: going for a few large ones or many small ones?
      • Were there any delays on our side?
  • 33. Hive: Average Size of the Deal
      • Hive uses an SQL-like syntax.
      • Easy to understand and learn
      hive> LOAD DATA ..
      hive> SELECT avg(value) FROM LEAD_ACTIVITY WHERE action = 'closedWon' GROUP BY month;
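To try the shape of this query without a Hive cluster, the sketch below runs an equivalent statement against an in-memory SQLite database from Python. SQLite stands in for Hive purely so the example can be run locally. The column types, the sample rows, and the `closedLost` action value are assumptions made for illustration, and `month` is added to the select list so each average is labelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LEAD_ACTIVITY (month TEXT, action TEXT, value REAL)")
conn.executemany(
    "INSERT INTO LEAD_ACTIVITY VALUES (?, ?, ?)",
    [
        ("2013-01", "closedWon", 1000.0),
        ("2013-01", "closedWon", 3000.0),
        ("2013-01", "closedLost", 9000.0),  # filtered out by the WHERE clause
        ("2013-02", "closedWon", 500.0),
    ],
)
# Average value of won deals, one row per month.
rows = conn.execute(
    "SELECT month, avg(value) FROM LEAD_ACTIVITY "
    "WHERE action = 'closedWon' GROUP BY month ORDER BY month"
).fetchall()
print(rows)
```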
  • 34. MapReduce: How many deals in the Pipeline?
  • 35. How many deals in the Pipeline? (Contd.)
      void map(ctx, k, v){
          Deals deal = parse(v);
          int month = getMonth(deal.time);
          ctx.emit(month, 1)
      }
      void reduce(ctx, k, values[]){
          count = 0;
          for v in values
              count = count + v;
          ctx.emit(k, count);
      }
  • 36. Case Study 2: DEBS Challenge
      • Event processing challenge
      • Real football game, sensors in player shoes + ball
      • Events at 15 kHz
      • Event format
          – Sensor ID, TS, x, y, z, v, a
      • Queries
          – Running stats
          – Ball possession
          – Heat map of activity
          – Shots at goal
  • 37. Example: Detect Ball Possession
      • Possession is the time from when a player hits the ball until someone else hits it or it goes out of the ground
      from Ball#window.length(1) as b
        join Players#window.length(1) as p unidirectional
        on debs:getDistance(b.x, b.y, b.z, p.x, p.y, p.z) < 1000 and b.a > 55
      select ...
      insert into hitStream

      from old = hitStream,
           b = hitStream[old.pid != pid],
           n = hitStream[b.pid == pid]*,
           (e1 = hitStream[b.pid != pid] or e2 = ballLeavingHitStream)
      select ...
      insert into BallPossessionStream
      Image: http://www.flickr.com/photos/glennharper/146164820/
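The two queries can be read as two steps: first decide whether a ball event counts as a hit by a nearby player, then turn the sequence of hits into possession intervals. The Python sketch below follows that reading. The thresholds are copied from the query and their units are not stated in the slide; the dictionary and tuple layouts, the function names, and the use of `None` to mark the ball leaving the field are assumptions for illustration.

```python
import math

HIT_DISTANCE = 1000  # distance threshold copied from the query; units not stated
HIT_ACCEL = 55       # acceleration threshold copied from the query; units not stated

def is_hit(ball, player):
    """ball: dict with x, y, z, a; player: dict with x, y, z."""
    distance = math.dist((ball["x"], ball["y"], ball["z"]),
                         (player["x"], player["y"], player["z"]))
    return distance < HIT_DISTANCE and ball["a"] > HIT_ACCEL

def possessions(hits):
    """hits: time-ordered (ts, pid) tuples; pid None means the ball left the field.
    Returns (pid, start_ts, end_ts) for each completed possession."""
    result = []
    owner, start = None, None
    for ts, pid in hits:
        if pid == owner:
            continue  # repeated touches by the same player extend the possession
        if owner is not None:
            result.append((owner, start, ts))  # possession ends at this event
        owner, start = pid, ts
    return result

# Made-up hit sequence: A touches twice, B takes over, the ball goes out, A restarts.
print(possessions([(0, "A"), (2, "A"), (5, "B"), (9, None), (12, "A")]))
```

A possession that is still open when the input ends is not reported, which mirrors the pattern query only firing once the ending event arrives.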
  • 38. Conclusions
      • What is Big Data?
      • Big Data architecture
          – Collecting data
          – Storing data
          – Processing data
      • WSO2 offerings
      • Case studies
  • 40. Engage with WSO2
      • Helping you get the most out of your deployments
      • From project evaluation and inception to development and going into production, WSO2 is your partner in ensuring 100% project success