Apache Spark's success: Overhyped or preordained?
After an absence of about a year, which included a stint as Research Director at the now-defunct Gigaom Research, I've returned to ZDNet to cover Big Data. The year went by quickly, but a number of things have changed:
- SQL-on-Hadoop has become ubiquitous, to the point that almost every Hadoop and relational database vendor has its own solution.
- Industry consolidation has begun. Companies like Jaspersoft, Pentaho, Hadapt, RainStor and Revolution Analytics have been acquired or soon will be.
- YARN and Hadoop 2.x now have all the mindshare, and old-school MapReduce is in retreat.
But one change that has become especially noteworthy is the degree to which Apache Spark has captured the attention and excitement of the industry.
Spark can run independently of Hadoop or as a YARN application on a Hadoop cluster. In the latter configuration it can read data stored in the Hadoop Distributed File System (HDFS) and then run a range of workloads on that data: Spark SQL provides a HiveQL-compatible SQL execution environment, MLlib enables machine learning, Spark Streaming provides high-speed stream processing, and GraphX provides graph processing.
See Spark run
In addition to the familiarity that Spark SQL provides, Spark code can be written in Scala, Java and Python. Spark can (but does not have to) keep data in memory, distributed across the RAM of its cluster's nodes. Getting a sample application running in Spark is fairly straightforward. That, combined with its memory-based, non-batch processing capabilities, provides interactive experimentation and near-instant gratification, something that has not been the norm in the Hadoop world.
That relatively friction-free experience, even if only at the command line, can be intoxicating. And intoxicated the industry is. While Spark is still quite new, and several people have told me it's not ready for prime time, industry support for Spark is intense. Cloudera has promised to re-platform most of the Hadoop ecosystem components in its distribution onto Spark. MapR includes Spark in its distro, and Hortonworks, once a Spark holdout, has jumped on the bandwagon as well, including Spark in HDP (Hortonworks Data Platform), its own Hadoop distribution.
Getting started is easy
While neither Amazon's Elastic MapReduce nor Microsoft's Azure HDInsight, the two companies' cloud Hadoop services, includes Spark by default, both companies have enabled installation of Spark via custom script steps that simply require specifying a URL when a cluster is created. Both companies also provide samples and tutorials that make it easy to run quick-and-dirty Scala code or SQL queries.
And if none of that works for you, then Databricks, the company founded by Spark's creators, has its Databricks Cloud offering (something you might wish to call Spark as a Service, if that didn't overload an already well-worn acronym) in the wings.
Some companies, like Paxata and ClearStory Data, have built their products on Spark. Others, like Platfora, have deployed new product capabilities that depend on, and integrate with, the Apache Software Foundation project. Adoption of Spark in the enterprise may be low so far, but industry adoption is formidable.
So what happens next with Spark? Some in the industry have predicted that Spark's popularity, and its ability to run without Hadoop, mean it may overtake Hadoop. Others, myself included, are more skeptical, given that HDFS alone has become enough of a standard to keep Hadoop entrenched, and YARN allows challengers to run as applications on the Hadoop cluster rather than replace it.
Irrational exuberance?
In general, vendors seem so far ahead of customers on Spark that it's almost worrisome. If Spark isn't yet stable and robust enough for big enterprise production jobs, and if even the companies that have standardized on Spark say they have had to write their own enhancements to make it work for them (something important vendors in the Big Data space have told me), is Spark just hype?
Readiness is in the eye of the beholder. Robin Bloor of Bloor Research, a well-respected industry analyst firm, once told me this (and I'm paraphrasing): when platforms get beyond a certain critical mass of support, they eventually become what the hype has made them out to be. In other words, belief in the quality of a platform tends to self-fulfill. Once the industry commits to something, it creates an imperative around getting it stable and well-performing, even if the committers themselves have to pitch in.
We're now a bit more than three months into the year; I saw my first Mr. Softee truck yesterday, a sure sign that spring has finally come to New York. Before the big Christmas tree goes up in Rockefeller Center at the end of the year, Spark seems likely to achieve at least some of its own self-fulfilling maturation and reliability. There are plenty of shopping days left until then, so let's wait and see the outcome.
http://zdnet.com.feedsportal.com/c/35462/f/675847/s/4524d71f/sc/15/l/0L0Szdnet0N0Carticle0Csparks0Esuccess0Eoverhyped0Eor0Epreordained0C0Tftag0FRSSbaffb68/story01.htm
