Hadoop & Spark meetup - Altiscale

Mark Kerzner

Hadoop as a service by Ajay Jha

Altiscale
Big Data-as-a-Service
Paul Tibaldi RSD & Ajay Jha SA

• Market Background
• Who is Altiscale?
• Why are we different/better?
• Hadoop Admin
• Apache Hadoop Stack
• Platform/Access/Demo
• Q/A
Big Data As A Service

Market Background

Interest in Big Data is growing fast

Big Data in The Cloud is Accelerating
• On-Premises: 32%
• Cloud Only: 23%
• Cloud Plus On-Premises: 29%
Source: “Hadoop Expansion Boosts Cloud and Unsupported On-Premises Deployments,” Merv Adrian, Nick Heudecker, 3 September 2015

But the journey has dangers
Gartner: 70% of independent Big Data implementations will fail to meet revenue and cost objectives through 2018.

Who is Altiscale?

About Altiscale
• Altiscale Data Cloud GA in 2014
• Financed by top-tier technology investors
• Recognized innovator in Hadoop-as-a-Service

About Altiscale
Led by an experienced, renowned Hadoop team from Yahoo!
• Raymie Stata, CEO. Former Yahoo! CTO, well-known advocate of the Apache Software Foundation
• David Chaiken, CTO. Former Yahoo! Chief Architect
Built and managed by veterans of Big Data, SaaS, and enterprise software
• From Google, Netflix, LinkedIn, VMware, Oracle, and Yahoo!
40,000 nodes
500 PB
1,000 users
$ billions at stake
Raymie Stata, CEO
David Chaiken, CTO
Ricardo Jenez, VP of Engineering
Charles Wimmer, Head of Operations

Big data built for speed
Fast time to value—days not months
Easier, faster scalability—with elastic scaling
Operations support—so your jobs get done
Lower TCO—for fast investment payback

Unmatched Security
Altiscale is the only provider that delivers integrated security encompassing its Big Data platform offering

Complete best of breed

Big Data is complex.
It gets more complicated as you scale.

Big Data-as-a-Service

The Altiscale Data Cloud Core

Concurrency with Apache Versioning
Altiscale Data Cloud is 100% based on Apache open source.
Our current Altiscale Data Cloud 4.0 release is composed of the following Apache components and versions:
• Apache Hadoop 2.7.1
• Apache Spark 1.5*
• Apache Hive (& HCatalog) 1.2
• Apache Tez 0.7.0
• Apache Pig 0.15.1
• Apache Oozie 4.2.0
• Apache Flume 1.5.2
• Avro 1.7.4
• JDK/JRE 7 (Sun/Oracle version)
• HttpFS
In addition to the above, we also support the three latest versions of Spark for our customers. That gives customers the option of a conservative approach as well as the option to work with the “bleeding edge”, fast-moving Spark community.

Hadoop Administration
Hire an expert to take care of the cluster
• Hardware setup and cluster installation
• Address hardware failure
• Upgrade Hadoop stack
• Tune config parameters
  • yarn-site.xml, e.g. yarn.nodemanager.resource.memory-mb
  • mapred-site.xml, e.g. mapreduce.task.io.sort.mb
  • hdfs-site.xml, e.g. dfs.blocksize
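
To make the tuning bullets concrete, here is a minimal sketch of how the three named properties would appear in their respective site files. It only renders the XML. The values are the stock Apache Hadoop 2.x defaults, used purely for illustration; the slides give no values, and the helper name `render_site_xml` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative values only: these are the stock Apache Hadoop 2.x defaults,
# not Altiscale's tuned settings (the slides do not state any values).
SITE_SETTINGS = {
    "yarn-site.xml": {"yarn.nodemanager.resource.memory-mb": "8192"},
    "mapred-site.xml": {"mapreduce.task.io.sort.mb": "100"},
    "hdfs-site.xml": {"dfs.blocksize": "134217728"},
}


def render_site_xml(properties):
    """Render a {name: value} mapping as a Hadoop <configuration> document.

    Hypothetical helper for this sketch; Hadoop itself only reads the XML.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print one XML fragment per site file.
    for filename, properties in SITE_SETTINGS.items():
        print(filename)
        print(render_site_xml(properties))
```

In practice each property is sized to the node hardware and workload, which is the part the slide suggests handing to an expert.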

Accessing the cloud

Quick Spark Demo
Spark example
• Build the Spark code on a laptop using Maven
• Build the jar and copy it over to Altiscale’s workbench (gateway) node
• Launch the Spark job on YARN
• Monitor it using the Resource Manager
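
The demo steps can be sketched as a short command sequence. The snippet below only assembles and prints the commands; it does not run them. The jar path, main class, and workbench host are hypothetical placeholders, not values from the slides. Older Spark 1.x documentation also writes the YARN cluster-mode master as `--master yarn-cluster`, so treat the flags as an assumption to check against the installed Spark version.

```python
import shlex

# Hypothetical placeholders: none of these names come from the slides.
JAR = "target/spark-example-1.0.jar"
MAIN_CLASS = "com.example.SparkExample"
WORKBENCH = "user@workbench.example.com"


def demo_commands(jar=JAR, main_class=MAIN_CLASS, workbench=WORKBENCH):
    """Return the demo steps as argument lists, in the order they would run."""
    jar_name = jar.rsplit("/", 1)[-1]
    return [
        # 1. Build the jar on the laptop with Maven.
        ["mvn", "clean", "package"],
        # 2. Copy the jar to the workbench (gateway) node.
        ["scp", jar, f"{workbench}:{jar_name}"],
        # 3. On the workbench, submit the job to YARN in cluster mode.
        ["spark-submit", "--class", main_class,
         "--master", "yarn", "--deploy-mode", "cluster", jar_name],
        # 4. On the workbench, list YARN applications; the Resource Manager
        #    web UI shows the same information.
        ["yarn", "application", "-list"],
    ]


if __name__ == "__main__":
    for argv in demo_commands():
        print(shlex.join(argv))
```

Steps 3 and 4 are meant to be run from a shell on the workbench node after logging in, since that is where the cluster client configuration lives.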


