Confidential
The Rise of Cascading
2015 Cascading User Survey Results
WHAT’S BEHIND THE RISE OF CASCADING?
Enterprise IT teams designing their big data platforms must choose from a daunting array of development frameworks and compute fabrics. On the one hand, they want a development framework that leverages existing skill sets. On the other hand, they want the flexibility to benefit from the performance gains of the latest and greatest compute fabrics.
Cascading is a robust framework with over 10,000 known production deployments and over 275,000 downloads per month. Twitter, Airbnb, Climate Corp, Apple, eBay and Netflix are just a few of the enterprises that have built their Hadoop practices with Cascading. The Cascading user group is a diverse, self-supporting community that is helping to advance Cascading’s scalability, portability, performance and value. In addition, the large number of open source projects contributed by mainstream enterprises such as Netflix, Commonwealth Bank of Australia and Expedia attests to the vibrancy of the Cascading ecosystem.
In this paper, we'll reveal what’s behind Cascading's growth by digging into the results of a new Cascading user survey. In general, Cascading users turn out to be extremely concerned about reliability and performance at scale. Many experimented with early Hadoop frameworks like Hive and Pig, but found Cascading to be a more scalable approach. And lately, the easy portability of Cascading applications between compute fabrics has generated a lot of excitement in the community.
Chart: What title best describes your role? (N=121) Response options: Head/VP of IT; Head of IT Infrastructure; Application Manager/Director; BI/EDW Manager/Director; CIO/SVP of IT; IT Specialist; Architect; IT Manager or Director; Developer/Engineer.
Photo: Liverpool Street station crowd blur, by David Sim.
CASCADING IS MOST POPULAR AMONG BUILDERS AND MANAGERS OF BIG DATA APPLICATIONS
CASCADING COMMUNITY MEMBERS ARE MATURE, PRODUCTION USERS
How long have you been using Hadoop? (N=69)
0-12 months: 8%
12-24 months: 26%
24-36 months: 25%
Over 3 years: 41%
The largest group of respondents (41%) have been using Hadoop for over 3 years. Assuming the sample is representative, the Cascading community largely consists of early Hadoop adopters.
Furthermore, the Cascading community isn’t just dabbling: over 84% have already put their Cascading applications into production or plan to do so.
As for why they went looking for a framework in the first place, many likely found out the hard way that developing directly on Hadoop was painful, tedious and poorly suited to scale.
Chart: What challenges did you have that made you look for an application development framework? Response options: Other; Poor integration into existing IT infrastructure; Lack of scalability; Lack of portability across compute fabrics; Difficult to integrate to existing systems; Poor troubleshooting capabilities; Lack of skilled Hadoop resources; High cost of development in existing platform; Slow development in existing platform.
THE PATH TO CASCADING: HIVE, PIG, AND GUI TOOLS
N=69
Given the maturity of Cascading users, it’s no surprise that many explored alternatives before settling on Cascading. The majority (51%) tried Hive or Pig, both of which were early abstraction layers for MapReduce. Today, many Pig applications run alongside Cascading, and many Hive applications run within Cascading.
Why didn’t they stick with Hive and Pig? Most organizations determined they could not scale with Hive and Pig, typically because both required scarce technical resources and because development in those frameworks was slow. Those who opted for other API frameworks found them not yet ready for the enterprise.
A smaller group experimented with GUI-based ETL tools. While these tools made it easy to leverage existing resources and skill sets, their capabilities were too limited. They also required building special scripts to achieve complex functionality, which negated their simplicity. Additionally, many users did not like being locked into a single-vendor solution.
Before selecting Cascading, what alternative solutions did you explore? (select all that apply)
Pig: 26%
Hive: 25%
Other API frameworks (Spark, Crunch): 22%
GUI-based ETL tools (Talend, Informatica, Pentaho): 19%
No other alternatives were explored: 8%
Chart: Which compute fabric(s) are you using or planning to use in the next 18 months? Response options: Other; Flink; Tez; Storm; Kafka; MapReduce; Spark.
PORTABILITY ACROSS FABRICS
N=69
New compute fabrics appear all the time, though not all are production-ready. The responses reflect high interest in Spark and a desire for true streaming (not micro-batches).
MapReduce isn’t going away any time soon, especially where reliability is a requirement. Still, many are experimenting with other compute fabrics. Because each fabric offers application-specific advantages, most organizations will likely wind up running multiple fabrics.
Cascading 3.0 supports Tez, MapReduce, and local/in-memory execution, so users can port applications from MapReduce to Tez simply by changing a few lines of code. This portability makes Cascading an ideal platform for moving from MapReduce to Tez without incurring the cost of rewriting applications. Soon, Cascading will support the same portability for Spark and Flink (Flink support will be community-contributed).
  
CASCADING BRIDGES OTHER DEVELOPMENT FRAMEWORKS
N=69
Despite their shortcomings, MapReduce, Hive and Pig are still widely used as development frameworks, largely because many early Hadoop applications were built with these interfaces. Not surprisingly, there is also a lot of excitement about Spark as a new development framework; many users are experimenting with developing directly in the Spark API.
Cascading will support Spark in an upcoming work-in-progress (WIP) release, adding an important framework option for Spark developers. Developers who build in Cascading will be able to port their applications from MapReduce to Spark without having to rewrite them in the Spark API.
In summary, there is no one-size-fits-all framework. Flexibility is key as organizations build out their big data strategies and platforms.
Chart: What data application development framework do you use? Response options: Cascalog; Scalding; Pig; Hive; MapReduce; Cascading; Spark.
“[Cascading] Best Hadoop API for enterprise data-intensive apps.” – Architect, Fortune 500 Healthcare Payer
COMMON USE CASES: ETL, ANALYTICS & DATA INTEGRATION
N=69
Most organizations rely on Hadoop for heavy processing steps within ETL, analytics or data integration flows. Some have moved their entire ETL processing to Hadoop, while others have moved only portions of their workflows.
For example, Airbnb uses Cascading for complicated infrastructure tasks such as data normalization and cleansing. Airbnb also leverages Cascading for reconstructing corrupted files and merging data. Alongside Cascading, analysts use Pig and Hive to run batch scripts for ad hoc analysis. With these tools, analysts can more easily study crucial metrics like click-through rates, page statistics, and drop-off rates.
Chart: What best describes the projects where you are using Cascading? Response options: Other; Search Optimization; Recommendation Engines; Data Quality; Machine Learning and Scoring; Data Integration; Analytics; ETL.
45%: Offloading ETL to Hadoop
40%: To support analytics/BI projects
33%: Data integration projects
How likely is it that you would recommend Cascading to a friend or colleague?
10 (extremely likely): 23%
9: 10%
8: 20%
7: 19%
6: 11%
5: 6%
4: 1%
3: 3%
2: 4%
0 (not at all likely): 3%
WHY THEY LOVE CASCADING: TDD, JAVA API, PORTABILITY
N=79
Top 3 Most Impactful Capabilities
- Test-Driven Development (49%): Efficiently test code and process local files before you deploy on a cluster with Cascading’s local or in-memory mode. Incorporate inline data assertions to define expected results at any point in your pipeline. Failed assertions are easily visible and available for analysis.
- Java API (44%): Cascading is a Java library and does not require installation. Cascading fits directly into a standard development process; all you have to do is code to the API.
- Application Portability (43%): When you compile a Cascading job, it automatically creates a run-time executable for your specified compute fabric. Simply by changing a few lines of code, you can test your application on multiple fabrics and choose the best one for your needs.
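The inline-assertion idea from the Test-Driven Development item can be sketched in plain Java. This is a hypothetical illustration, not Cascading's own assertion operations: `assertInline` checks each record as it flows through an in-memory pipeline and diverts failures into a list, tagged with a reason, instead of stopping the run. Because everything runs locally, the same code can serve as a unit test before anything is deployed to a cluster. The class, method names and the `user,amount` record format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of inline data assertions (not Cascading's assertion API).
class InlineAssertionSketch {

    // A record that failed an assertion, kept together with the reason for later analysis.
    static final class Failure {
        final String record;
        final String reason;

        Failure(String record, String reason) {
            this.record = record;
            this.reason = reason;
        }

        @Override
        public String toString() {
            return reason + ": " + record;
        }
    }

    // Inline assertion: passes records that satisfy the check, diverts the rest.
    static List<String> assertInline(List<String> records, Predicate<String> check,
                                     String reason, List<Failure> failures) {
        List<String> passed = new ArrayList<>();
        for (String record : records) {
            if (check.test(record)) {
                passed.add(record);
            } else {
                failures.add(new Failure(record, reason));
            }
        }
        return passed;
    }

    // A tiny pipeline over "user,amount" lines with assertions between steps.
    static long totalAmount(List<String> lines, List<Failure> failures) {
        List<String> wellFormed = assertInline(lines,
                line -> line.matches("[^,]+,-?\\d{1,18}"), "malformed line", failures);
        List<String> valid = assertInline(wellFormed,
                line -> Long.parseLong(line.split(",")[1]) >= 0, "negative amount", failures);
        return valid.stream().mapToLong(line -> Long.parseLong(line.split(",")[1])).sum();
    }

    // Runs entirely in memory, so it can act as a local test before cluster deployment.
    public static void main(String[] args) {
        List<Failure> failures = new ArrayList<>();
        long total = totalAmount(List.of("alice,10", "bob,-5", "garbage", "carol,7"), failures);
        System.out.println("total = " + total);
        failures.forEach(System.out::println);
    }
}
```

With the sample input the total is 17, and the two rejected records (`garbage` and `bob,-5`) remain available, each tagged with the assertion it failed.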
  
53% of respondents are promoters (rating 8 or higher out of 10)
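The 53% figure can be checked against the score distribution in the recommendation chart. The sketch below assumes, as the slide's arithmetic implies, that scores of 8, 9 and 10 were counted as promoters. For comparison, the conventional Net Promoter Score counts only 9 and 10 as promoters and 0 through 6 as detractors. The `PromoterShare` class and its method names are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

// Recomputes promoter figures from the recommendation chart (percent of respondents per score).
class PromoterShare {

    // Score -> percentage, as printed on the chart; score 1 does not appear there.
    static final Map<Integer, Integer> DISTRIBUTION = new TreeMap<>(Map.of(
            10, 23, 9, 10, 8, 20, 7, 19, 6, 11,
            5, 6, 4, 1, 3, 3, 2, 4, 0, 3));

    // Total percentage of respondents whose score lies in [low, high].
    static int share(int low, int high) {
        return DISTRIBUTION.entrySet().stream()
                .filter(e -> e.getKey() >= low && e.getKey() <= high)
                .mapToInt(Map.Entry::getValue)
                .sum();
    }

    public static void main(String[] args) {
        // The slide's definition: scores 8-10 count as promoters.
        System.out.println("Scores 8-10: " + share(8, 10) + "%");
        // Conventional NPS: % promoters (9-10) minus % detractors (0-6).
        System.out.println("Conventional NPS: " + (share(9, 10) - share(0, 6)));
    }
}
```

With the chart's values, scores 8 to 10 add up to 53%, while the conventional NPS (33% promoters minus 28% detractors) comes out at 5.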
CASCADING IMPROVES PRODUCTIVITY
N=79
What percentage would you estimate the productivity of your staff has improved?
Over 300%: 7%
Over 100%: 16%
80%-100%: 7%
60%-80%: 18%
40%-60%: 26%
20%-40%: 16%
Less than 20%: 10%
Most increased productivity by at least 40%.
CASCADING SLASHES TIME TO MARKET
N=79
Most improved time to market by at least 40%.
What percentage would you estimate your time to market has improved?
Over 300%: 5%
Over 100%: 17%
80%-100%: 12%
60%-80%: 18%
40%-60%: 17%
20%-40%: 18%
Less than 20%: 13%
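The "at least 40%" statements on the productivity and time-to-market slides can be checked with simple arithmetic. This sketch assumes that the extracted pie-chart values follow the same order as the listed response options; that is an assumption about the extraction, not something the slides state. The class and method names are illustrative.

```java
import java.util.Arrays;

// Checks the "at least 40%" claims on the productivity and time-to-market slides.
class ImprovementShares {

    // Response options in the order listed on both slides.
    static final String[] OPTIONS = {"Over 300%", "Over 100%", "80%-100%", "60%-80%",
            "40%-60%", "20%-40%", "Less than 20%"};

    // ASSUMPTION: extracted slice values (percent of respondents) follow OPTIONS order.
    static final int[] PRODUCTIVITY = {7, 16, 7, 18, 26, 16, 10};
    static final int[] TIME_TO_MARKET = {5, 17, 12, 18, 17, 18, 13};

    // Percent of respondents reporting an improvement of at least 40%,
    // i.e. every option from "Over 300%" down to and including "40%-60%".
    static int atLeast40(int[] slices) {
        int cutoff = Arrays.asList(OPTIONS).indexOf("40%-60%");
        return Arrays.stream(slices, 0, cutoff + 1).sum();
    }

    public static void main(String[] args) {
        System.out.println("Productivity, at least 40%: " + atLeast40(PRODUCTIVITY) + "%");
        System.out.println("Time to market, at least 40%: " + atLeast40(TIME_TO_MARKET) + "%");
    }
}
```

Under that ordering assumption, 74% of respondents reported a productivity gain of at least 40% and 69% reported a time-to-market gain of at least 40%, consistent with the "most" claims on both slides.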
N=69
Chart: What future challenges do you anticipate in managing your data applications? Response options: Other; Supporting chargeback models; Forecasting big data infrastructure needs; Monitoring SLAs for Hadoop applications; Identify and resolve Hadoop application issues faster; Optimizing application performance.
THE FUTURE: BETTER PERFORMANCE, DATA PIPELINE VISIBILITY
Application performance management is a top-of-mind concern for most respondents. While performance tuning happens on the operations side, optimizing applications to meet service-level commitments is usually a collaborative effort between development and operations teams.
Developers need better tools to visualize data pipelines and detect undesirable behavior before they promote applications to production. Operations teams need better tools to monitor, manage and optimize data delivery.
An important, though secondary, concern is tracking the rate of Hadoop resource consumption so clusters can be right-sized and costs distributed across divisions. This is particularly true as more of an organization’s departments and teams build and rely on big data applications, transforming their Hadoop cluster from a side project into core production IT infrastructure.
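One simple way to distribute cluster costs across divisions is to allocate the bill in proportion to each division's measured resource consumption. The sketch below is hypothetical: the `Chargeback` class, the division names and the usage figures are illustrative, and usage can be any additive measure (for example vcore-hours). It works in whole cents and hands out rounding leftovers by the largest-remainder method, so the allocations always add up exactly to the total bill.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical chargeback sketch: split a cluster bill across divisions
// in proportion to each division's measured resource usage.
class Chargeback {

    // Returns each division's charge in cents. Assumes totalCents * usage fits in a long
    // and that usage values are non-negative.
    static Map<String, Long> allocate(long totalCents, Map<String, Long> usage) {
        long totalUsage = usage.values().stream().mapToLong(Long::longValue).sum();
        if (totalUsage <= 0) {
            throw new IllegalArgumentException("no recorded usage");
        }
        Map<String, Long> charges = new LinkedHashMap<>();
        Map<String, Long> remainders = new LinkedHashMap<>();
        long assigned = 0;
        for (Map.Entry<String, Long> e : usage.entrySet()) {
            long numerator = totalCents * e.getValue();
            charges.put(e.getKey(), numerator / totalUsage);   // proportional share, rounded down
            remainders.put(e.getKey(), numerator % totalUsage);
            assigned += numerator / totalUsage;
        }
        // Largest-remainder method: give the cents lost to rounding to the
        // divisions whose shares were rounded down the most (ties keep input order).
        long leftover = totalCents - assigned;
        remainders.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(leftover)
                .forEach(e -> charges.merge(e.getKey(), 1L, Long::sum));
        return charges;
    }

    public static void main(String[] args) {
        // Illustrative monthly usage in vcore-hours (hypothetical figures).
        Map<String, Long> usage = new LinkedHashMap<>();
        usage.put("marketing", 420L);
        usage.put("finance", 310L);
        usage.put("data-science", 130L);
        System.out.println(allocate(1_234_567L, usage));  // a $12,345.67 cluster bill
    }
}
```

Proportional allocation is only one possible chargeback model; organizations may also weight storage, memory or peak usage differently.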
  
With new application performance management tools such as Driven, teams can visualize data pipelines and identify unwanted behavior more effectively. Tools like Driven also arm teams with the data necessary to pinpoint issues quickly and resolve them collaboratively.
APPENDIX
DISTRIBUTIONS
Chart: Distributions (N=69). Categories: Other (please specify); MapR; Hortonworks; Apache Hadoop; Amazon EMR; Cloudera.
NUMBER OF APPLICATIONS AND VOLUME
Chart: Average Number of Cascading Applications and Pipelines (N=69). Respondents grouped by number of pipelines (Less than 250; 250-500; 500-1,000; 1,000-2,500; 2,500-5,000; Over 5,000; Over 10,000) and by number of applications (1-5; 5-15; 15-30; 30-60; 60-100; Over 100).
PRODUCTION STATUS
Chart: Are you using your Cascading data applications in a production environment? (N=69) Response options: No and not planned; Not yet but planned; Yes.

 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Cascading 2015 User Survey Results

  • 1. Confidential The Rise of Cascading: 2015 Cascading User Survey Results
  • 2. WHAT’S BEHIND THE RISE OF CASCADING? Enterprise IT teams designing their big data platforms must choose from a daunting array of development frameworks and compute fabrics. On the one hand, they want a development framework that leverages existing skill sets. At the same time, they want the flexibility to benefit from the performance gains of the latest and greatest compute fabrics. Cascading is a robust framework with over 10,000 known production deployments and over 275,000 downloads per month. Twitter, Airbnb, Climate Corp, Apple, eBay and Netflix are a few of the enterprises that have built their Hadoop practices with Cascading. The Cascading user group is a diverse, self-supporting community that is helping to improve Cascading’s scalability, portability, performance and value. In addition, the large number of open source projects contributed by mainstream enterprises such as Netflix, Commonwealth Bank of Australia and Expedia attests to the vibrancy of the Cascading ecosystem. In this paper, we'll reveal what’s behind Cascading's growth by digging into the results of a new Cascading user survey. In general, Cascading users turn out to be extremely concerned about reliability and performance at scale. Many experimented with early Hadoop frameworks like Hive and Pig, but found Cascading to be a more scalable approach. And lately, the easy portability of Cascading applications between compute fabrics has generated a lot of excitement in the community.
  • 3. CASCADING IS MOST POPULAR AMONG BUILDERS AND MANAGERS OF BIG DATA APPLICATIONS [Chart: What title best describes your role? Responses: Developer/Engineer; IT Manager or Director; Architect; IT Specialist; CIO/SVP of IT; BI/EDW Manager/Director; Application Manager/Director; Head of IT Infrastructure; Head/VP of IT. N=121] Liverpool Street station crowd blur. Photo by David Sim.
  • 4. CASCADING COMMUNITY MEMBERS ARE MATURE, PRODUCTION USERS [Chart: How long have you been using Hadoop? 0-12 months: 8%; 12-24 months: 26%; 24-36 months: 25%; over 3 years: 41%. N=69] The largest group of respondents (41%) has been using Hadoop for over 3 years. Assuming the sample is representative, the Cascading community largely consists of early Hadoop adopters. Furthermore, the Cascading community isn’t just dabbling: over 84% have already put their Cascading applications into production or plan to do so. As for why they turned to a framework, many likely found out the hard way that developing directly on Hadoop was painful, tedious and poorly suited to scale. [Chart: What challenges did you have that made you look for an application development framework? Responses: Slow development in existing platform; High cost of development in existing platform; Lack of skilled Hadoop resources; Poor troubleshooting capabilities; Difficult to integrate to existing systems; Lack of portability across compute fabrics; Lack of scalability; Poor integration into existing IT infrastructure; Other.]
  • 5. THE PATH TO CASCADING: HIVE, PIG, AND GUI TOOLS [Chart: Before selecting Cascading, what alternative solutions did you explore? (select all that apply) Pig: 26%; Hive: 25%; Other API frameworks (Spark, Crunch): 22%; GUI-based ETL tools (Talend, Informatica, Pentaho): 19%; No other alternatives were explored: 8%. N=69] Given the maturity of Cascading users, it’s no surprise that many explored alternatives before settling on Cascading. Hive and Pig, both early abstraction layers for MapReduce, were the most common alternatives, together accounting for 51% of responses. Today, many Pig applications run alongside Cascading, and many Hive applications run within Cascading. Why didn’t they stick with Hive and Pig? Most organizations determined that they could not scale with Hive and Pig, typically because those frameworks required scarce technical resources and because development in them was slow. Those who opted for other API frameworks found them not yet ready for the enterprise. A smaller group experimented with GUI-based ETL tools. While these tools made it easy to leverage existing resources and skill sets, their capabilities were too limited. They also required building special scripts to achieve complex functionality, which negated the benefits of simplicity. Additionally, many users did not like being locked into a single-vendor solution.
  • 6. PORTABILITY ACROSS FABRICS [Chart: Which compute fabric(s) are you using or planning to use in the next 18 months? Responses: Spark; MapReduce; Kafka; Storm; Tez; Flink; Other. N=69] New compute fabrics appear all the time, though not all are production-ready. The responses reflect high interest in Spark and a desire for true streaming (not micro-batches). MapReduce isn’t going away any time soon, especially where reliability is a requirement. Still, many are experimenting with other compute fabrics. Because each fabric offers application-specific advantages, most organizations will likely wind up running multiple fabrics. Cascading 3.0 supports Tez, MapReduce, and local/in-memory execution, so users can port applications from MapReduce to Tez simply by changing a few lines of code. Easy portability makes Cascading an ideal platform for moving from MapReduce to Tez without incurring the cost of rewriting applications. Soon, Cascading will support the same portability for Spark and Flink (Flink support will be community-contributed).
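To make the portability idea concrete, here is a minimal, language-neutral sketch in Python; it is not Cascading's Java API. The pipeline's logic is defined once, and the execution backend is chosen by a single key. The backends `run_in_memory` and `run_in_batches` and the `FABRICS` registry are hypothetical stand-ins for compute fabrics; in Cascading itself, this separation is provided by its `FlowConnector` implementations (for example, `LocalFlowConnector` and `HadoopFlowConnector`).

```python
from typing import Callable, Dict, Iterable, List

# One processing step: takes a record, returns the transformed record.
Step = Callable[[str], str]

# Batch size for the hypothetical batch backend.
BATCH_SIZE = 2


def build_pipeline() -> List[Step]:
    """Business logic, defined once and independent of the execution backend."""
    return [str.strip, str.lower]


def run_in_memory(steps: List[Step], records: Iterable[str]) -> List[str]:
    """Hypothetical local 'fabric': apply every step to every record in order."""
    out = []
    for record in records:
        for step in steps:
            record = step(record)
        out.append(record)
    return out


def run_in_batches(steps: List[Step], records: Iterable[str]) -> List[str]:
    """Hypothetical second 'fabric': process records in fixed-size batches."""
    data = list(records)
    out: List[str] = []
    for start in range(0, len(data), BATCH_SIZE):
        out.extend(run_in_memory(steps, data[start:start + BATCH_SIZE]))
    return out


# Backend registry: porting an application means changing this key only.
FABRICS: Dict[str, Callable[[List[Step], Iterable[str]], List[str]]] = {
    "local": run_in_memory,
    "batch": run_in_batches,
}


def run(fabric: str, records: Iterable[str]) -> List[str]:
    """Run the same pipeline definition on whichever backend is named."""
    return FABRICS[fabric](build_pipeline(), records)
```

Switching `run("local", ...)` to `run("batch", ...)` leaves `build_pipeline()` untouched, which is the property the slide describes. Real fabrics differ in far more than batching (scheduling, shuffles, fault tolerance), so treat this only as an illustration of the separation of concerns.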
  • 7. CASCADING BRIDGES OTHER DEVELOPMENT FRAMEWORKS [Chart: What data application development framework do you use? Responses: Spark; Cascading; MapReduce; Hive; Pig; Scalding; Cascalog. N=69] Despite their shortcomings, MapReduce, Hive and Pig are still widely used as development frameworks, largely because many early Hadoop applications were built through these interfaces. Unsurprisingly, we also see a lot of excitement about Spark as a new development framework; many users are experimenting with developing directly in the Spark API. Cascading will support Spark in a future work-in-progress (WIP) release, adding an important framework option for Spark developers. Developers who build in Cascading will be able to port their applications from MapReduce to Spark without having to rewrite them in the Spark API. In summary, there is no one-size-fits-all framework. Flexibility is key as organizations build out their big data strategies and platforms. “[Cascading] Best Hadoop API for enterprise data-intensive apps.” – Architect, Fortune 500 Healthcare Payer
  • 8. COMMON USE CASES: ETL, ANALYTICS & DATA INTEGRATION [Chart: What best describes the projects where you are using Cascading? Responses: ETL; Analytics; Data Integration; Machine Learning and Scoring; Data Quality; Recommendation Engines; Search Optimization; Other. N=69] Callouts: 45% offloading ETL to Hadoop; 40% supporting analytics/BI projects; 33% data integration projects. Most organizations rely on Hadoop for heavy processing steps within ETL, analytics or data integration flows. Some have moved their entire ETL processing to Hadoop, while others have moved only portions of their workflows. For example, Airbnb uses Cascading for complicated infrastructure tasks such as data normalization and cleansing, as well as for reconstructing corrupted files and merging data. Alongside Cascading, analysts use Pig and Hive to run batch scripts for ad hoc analysis, which makes it easier for them to study crucial metrics like click-through rates, page statistics and drop-off rates.
  • 9. WHY THEY LOVE CASCADING: TDD, JAVA API, PORTABILITY [Chart: How likely is it that you would recommend Cascading to a friend or colleague? 10 (extremely likely): 23%; 9: 10%; 8: 20%; 7: 19%; 6: 11%; 5: 6%; 4: 1%; 3: 3%; 2: 4%; 0 (not at all likely): 3%. N=79] 53% of respondents are promoters (scores of 8 to 10). Top 3 most impactful capabilities: (1) Test-driven development (49%): Efficiently test code and process local files before you deploy on a cluster, using Cascading’s local or in-memory mode. Incorporate inline data assertions to define expected results at any point in your pipeline. Failed assertions are easily visible and available for analysis. (2) Java API (44%): Cascading is a Java library and does not require installation. It fits directly into a standard development process; all you have to do is code to the API. (3) Application portability (43%): When you compile a Cascading job, it automatically creates a run-time executable for your specified compute fabric. Simply by changing a few lines of code, you can test your application on multiple fabrics and choose the best one for your needs.
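The test-driven workflow described above (run locally on small in-memory data, with assertions placed inline in the pipeline) can be sketched as follows. This is a hypothetical Python illustration of the pattern, not Cascading's assertion API: `normalize`, `assert_step` and `run_local` are invented names, and failures are simply collected in a list for later inspection rather than following Cascading's own failure semantics.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Step = Callable[[Record], Record]


def normalize(record: Record) -> Record:
    """Example transformation: trim and lower-case the 'user' field."""
    return {**record, "user": str(record.get("user", "")).strip().lower()}


def assert_step(name: str, predicate: Callable[[Record], bool],
                failures: List[Tuple[str, Record]]) -> Step:
    """Build a pass-through step that logs (name, record) when predicate is false."""
    def step(record: Record) -> Record:
        if not predicate(record):
            failures.append((name, record))
        # Records flow on unchanged; failures stay available for analysis.
        return record
    return step


def run_local(records: Iterable[Record], steps: List[Step]) -> List[Record]:
    """Apply each step to each record in memory, as a quick local test would."""
    results = []
    for record in records:
        for step in steps:
            record = step(record)
        results.append(record)
    return results
```

For example, running `[normalize, assert_step("user_not_empty", lambda r: r["user"] != "", failures)]` over `[{"user": " Alice "}, {"user": "  "}]` leaves a single entry in `failures`, for the blank user, while both normalized records still reach the output.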
  • 10. CASCADING IMPROVES PRODUCTIVITY [Chart: What percentage would you estimate the productivity of your staff has improved? Reported shares: 7%, 16%, 7%, 18%, 26%, 16%, 10%; categories: over 300%, over 100%, 80%-100%, 60%-80%, 40%-60%, 20%-40%, less than 20%. N=79] Most respondents increased productivity by at least 40%.
  • 11. CASCADING SLASHES TIME TO MARKET [Chart: What percentage would you estimate your time to market has improved? Reported shares: 5%, 17%, 12%, 18%, 17%, 18%, 13%; categories: over 300%, over 100%, 80%-100%, 60%-80%, 40%-60%, 20%-40%, less than 20%. N=79] Most respondents improved time to market by at least 40%.
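The "at least 40%" claims on the productivity and time-to-market slides can be checked by summing the top buckets of each distribution. This sketch assumes that the extracted pie-chart percentages are listed in the same order as the legend (highest improvement first); that ordering is not stated explicitly in the source.

```python
from typing import List

# Improvement buckets in legend order, highest improvement first.
BUCKETS = ["Over 300%", "Over 100%", "80%-100%", "60%-80%",
           "40%-60%", "20%-40%", "Less than 20%"]

# Percent of respondents per bucket (N=79), as extracted from the two charts.
# ASSUMPTION: the values appear in the same order as BUCKETS.
PRODUCTIVITY = [7, 16, 7, 18, 26, 16, 10]
TIME_TO_MARKET = [5, 17, 12, 18, 17, 18, 13]


def share_at_least(shares: List[int], lowest_bucket: str) -> int:
    """Total share of respondents in lowest_bucket or any higher bucket."""
    if len(shares) != len(BUCKETS):
        raise ValueError("expected one share per bucket")
    cutoff = BUCKETS.index(lowest_bucket)
    # Buckets are ordered highest first, so take everything up to the cutoff.
    return sum(shares[: cutoff + 1])
```

Under that assumption, `share_at_least(PRODUCTIVITY, "40%-60%")` gives 74 and `share_at_least(TIME_TO_MARKET, "40%-60%")` gives 69, both consistent with the "most respondents" statements.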
  • 12. THE FUTURE: BETTER PERFORMANCE, DATA PIPELINE VISIBILITY [Chart: What future challenges do you anticipate in managing your data applications? Responses: Optimizing application performance; Identify and resolve Hadoop application issues faster; Monitoring SLAs for Hadoop applications; Forecasting big data infrastructure needs; Supporting chargeback models; Other. N=69] Application performance management is a top-of-mind concern for most respondents. While performance tuning happens on the operations side, optimizing applications to meet service-level commitments is usually a collaborative effort between development and operations teams. Developers need better tools to visualize data pipelines and detect undesirable behavior before they promote applications to production. Operations teams need better tools to monitor, manage and optimize data delivery. An important, though secondary, concern is tracking the rate of Hadoop resource consumption so that clusters can be right-sized and costs distributed across divisions. This is particularly true as more of an organization’s departments and teams build and rely on big data applications, transforming their Hadoop cluster from a side project into core production IT infrastructure. With new application performance management tools such as Driven, teams can visualize data pipelines and identify unwanted behavior more effectively. Tools like Driven also arm teams with the data necessary to pinpoint issues quickly and resolve them collaboratively.
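Chargeback, mentioned above as a secondary concern, usually means distributing cluster cost according to measured consumption. A minimal sketch, assuming cost is split in proportion to each team's resource-hours (one common model among several); the function name, team names and figures are hypothetical.

```python
from typing import Dict


def chargeback(total_cost: float, usage_hours: Dict[str, float]) -> Dict[str, float]:
    """Split total_cost across teams in proportion to their consumed resource-hours."""
    total_usage = sum(usage_hours.values())
    if total_usage <= 0:
        # Nothing was consumed, so there is no basis for a proportional split.
        raise ValueError("total usage must be positive to allocate cost")
    return {
        team: total_cost * hours / total_usage
        for team, hours in usage_hours.items()
    }
```

For instance, `chargeback(1000.0, {"marketing": 300, "finance": 100})` allocates 750.0 to marketing and 250.0 to finance. Proportional splitting is the simplest model; organizations may instead charge for reserved capacity or weight CPU, memory and storage differently.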
  • 14. DISTRIBUTIONS [Chart: Hadoop distributions used by respondents: Cloudera; Amazon EMR; Apache Hadoop; Hortonworks; MapR; Other (please specify). N=69]
  • 15. NUMBER OF APPLICATIONS AND VOLUME [Chart: Average Number of Cascading Applications and Pipelines. Respondent counts by pipeline volume: less than 250 pipelines: 39; 250-500 pipelines: 9; 500-1,000 pipelines: 8; 1,000-2,500 pipelines: 6; 2,500-5,000 pipelines: 2; over 5,000 pipelines: 1; over 10,000 pipelines: 4. Each band is further broken down by number of applications (1-5, 5-15, 15-30, 30-60, 60-100, over 100). N=69]
  • 16. PRODUCTION STATUS [Chart: Are you using your Cascading data applications in a production environment? Responses: Yes; Not yet but planned; No and not planned. N=69]