Page 1
Accelerate Big Data
Application Development with
Cascading and HDP
April 22, 2014
Page 2
Agenda
•  Take advantage of the latest Hadoop processing
frameworks like YARN and Tez in HDP 2.1
•  How developers can create future-proof, data-driven
applications built on Apache Hadoop with Cascading
•  How Cascading accelerates Hadoop application
development by abstracting the platforms underneath
Page 3
Speakers
Ajay Singh, Director of
Technical Channels,
Hortonworks
Supreet Oberoi, VP of
Field Engineering,
Concurrent
Page 4
Open
Leadership
Drive innovation in
the open exclusively
via the Apache
community-driven
open source process
Enterprise
Rigor
Engineer, test and
certify Apache Hadoop
with the enterprise in
mind
Ecosystem
Endorsement
Focus on deep
integration with
existing data center
technologies and
skills
Our Mission:
Enable your Modern Data Architecture
by delivering Enterprise Apache Hadoop
Reseller Partners:
Headquartered in Palo Alto, CA; 300+ employees and growing
Page 5
A data architecture under pressure
from new data
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM
REPOSITORIES: RDBMS, EDW, MPP
SOURCES
Existing Sources (CRM, ERP, Clickstream, Logs)
Source: IDC
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
OLTP, ERP, CRM Systems
Unstructured documents, emails
Clickstream
Server logs
Sentiment, Web Data
Sensor, Machine Data
Geolocation
Page 6
A Modern Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM
REPOSITORIES: RDBMS, EDW, MPP
ENTERPRISE HADOOP: Governance & Integration, Security, Operations, Data Access, Data Management
SOURCES
Existing Sources (CRM, ERP, Clickstream, Logs)
Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
Page 7
Clickstream
Capture and
analyze website
visitors’ data trails
and optimize your
website
Sensors
Discover
patterns in data
streaming
automatically
from remote
sensors and
machines
Server Logs
Research logs to
diagnose process
failures and
prevent security
breaches
Hadoop Value: New types of data
Sentiment
Understand how
your customers
feel about your
brand and
products –
right now
Geographic
Analyze
location-based
data to manage
operations
where they
occur
Unstructured
Understand patterns
in files across
millions of web
pages, emails, and
documents
Page 8
Enterprise Hadoop: Core
Foundation of Hadoop
Applications
Page 9
Core Capabilities of Enterprise Hadoop
GOVERNANCE & INTEGRATION: Load data and manage according to policy
OPERATIONS: Deploy and effectively manage the platform
DATA MANAGEMENT: Store and process all of your Corporate Data Assets
DATA ACCESS: Access your data simultaneously in multiple ways (batch, interactive, real-time)
SECURITY: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
PRESENTATION & APPLICATION: Enable both existing and new applications to provide value to the organization
ENTERPRISE MGMT & SECURITY: Empower existing operations and security tools to manage Hadoop
DEPLOYMENT OPTIONS: Provide deployment choice across physical, virtual, cloud
Page 10
HDP 2.1: Enterprise Hadoop
HDP 2.1
Hortonworks Data Platform
OPERATIONS
Provision, Manage & Monitor: Ambari, Zookeeper
Scheduling: Oozie
GOVERNANCE & INTEGRATION
Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS
DATA ACCESS
Batch: MapReduce
Script: Pig
SQL: Hive/Tez, HCatalog
NoSQL: HBase, Accumulo
Stream: Storm
Search: Solr
Others: In-Memory Analytics, ISV engines
YARN: Data Operating System
DATA MANAGEMENT
HDFS (Hadoop Distributed File System)
SECURITY
Authentication, Authorization, Accounting, Data Protection
Storage: HDFS
Resources: YARN
Access: Hive, …
Pipeline: Falcon
Cluster: Knox
Deployment Choice: Linux, Windows, On-Premise, Cloud
Page 11
Hadoop is wholly integrated
into the data center
APPLICATIONS: BusinessObjects BI
DATA SYSTEM: RDBMS, EDW, MPP, HANA
HDP 2.1: Governance & Integration, Security, Operations, Data Access, Data Management
SOURCES
Existing Sources (CRM, ERP, Clickstream, Logs)
Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
OPERATIONAL TOOLS
DEV & DATA TOOLS
INFRASTRUCTURE
Page 12
Developing Apps on Hadoop
•  Spring XD Framework
–  Consistent configuration & Java API across wide range of Hadoop ecosystem
projects
•  Microsoft .NET SDK For Hadoop
–  API access to HDP on Windows and the HDInsight service
–  LINQ libraries for accessing Hive
•  Cascading
–  Delivers an easy to use abstraction layer for developing Hadoop applications
–  Supports development in Scala & Clojure
–  Hortonworks to Certify, Support & Deliver Cascading SDK with Hortonworks Data
Platform
DRIVING INNOVATION THROUGH DATA
ACCELERATE BIG DATA APPLICATION DEVELOPMENT WITH CASCADING AND HDP
Supreet Oberoi | April 22, 2014
VP Field Engineering, Concurrent Inc
HORTONWORKS PARTNERS WITH CONCURRENT
• The Cascading SDK will now be integrated with the
Hortonworks Data Platform (HDP)
• Hortonworks will certify and support Cascading™
SDK with HDP
• Cascading will support Apache Tez; companies using
Cascading or domain-specific languages on
Cascading can seamlessly migrate to HDP with
Apache Tez support
The partnership benefits users by combining the power and simplicity of
Cascading with the reliability and stability of HDP.
Confidential
AGENDA
• Who is Concurrent
• What is Cascading
• Where is it used
• What problems does Cascading solve
• What is included in the Cascading kit
ABOUT CONCURRENT, INC.
GET TO KNOW CONCURRENT
Leader in Application Infrastructure for Big Data
• Building enterprise software to simplify Big Data application
development and management
Products and Technology
• CASCADING
The most widely used application infrastructure for building Big
Data applications with over 150,000 downloads each month
• DRIVEN
Enterprise Data Application management for Big Data apps
Proven - Simple, Reliable, Robust
• Thousands of enterprises rely on Concurrent to provide their
data application infrastructure.
Founded: 2008
HQ: San Francisco, CA
CEO: Gary Nakamura
CTO, Founder: Chris Wensel
www.concurrentinc.com
PRODUCTS AND TECHNOLOGY
Big Data Application Development
Simple, Reliable, Repeatable
Unmatched Application Insight
Visibility into your Data Applications
Open Source | Commercial
www.concurrentinc.com/products
Open Source Community
Focused on Data App Development
Project home of Cascading
Collection of sub-projects / tools
Data App Management
Realtime monitoring
Performance Management
Operational Control
Data Provenance
Compliance Governance
BUSINESSES DEPEND ON US
• Cascading Java API
• Data normalization and cleansing of search and click-through
logs for use by analytics tools, Hive analysts
• Easy to operationalize heavy lifting of data
BUSINESSES DEPEND ON US
• Cascalog (Clojure)
• Weather pattern modeling to protect growers against loss
• ETL against 20+ datasets daily
• Machine learning to create models
• Purchased by Monsanto for $930M US
BUSINESSES DEPEND ON US
TWITTER
• Scalding (Scala)
• Machine learning (linear algebra) to improve
‣ User experience
‣ Ad quality (matching users and ad effectiveness)
• All revenue applications are running on Cascading/Scalding
• IPO
BUSINESSES DEPEND ON US
• Estimate suicide risk from what people write online
• Cascading + Cassandra
• You can do more than optimize ad yields
• http://www.durkheimproject.org
CASCADING DEPLOYMENTS
DRIVING ADVANTAGE WITH DATA APPLICATIONS
Enterprise IT:
Extract Transform Load
Log File Analysis
Systems Integration
Operations Analysis
Corporate Apps:
HR Analytics
Employee Behavioral Analysis
Customer Support | eCRM
Business Reporting
Telecom:
Data processing of Open Data
Geospatial Indexing
Consumer Mobile Apps
Location based services
Marketing / Retail:
Mobile, Social, Search Analytics
Funnel analysis
Revenue attribution
Customer experiments
Ad Optimization
Retail recommenders
Consumer / Entertainment:
Music Recommendation
Comparison Shopping
Restaurant Rankings
Real Estate
Rental Listings
Travel Search & Forecast
Finance:
Fraud and Anomaly Detection
Fraud Experiments
Customer Analytics
Insurance Risk Metric
Health / Biotech:
Aggregate metrics for Govt
Person biometrics
Veterinary diagnostics
Next-Gen Genomics
Agronomics
Environmental Maps
BIG DATA: THE NEXT PHASE OF MATURITY
“It’s all about the Apps”
There needs to be a comprehensive solution for building, deploying, running and
managing this new class of enterprise applications
Business Strategy:
Loyalty and promotions analysis
Retention campaigns
Marketing campaign optimization
Fraud detection
Risk management
Scientific research
Remote monitoring and diagnosis
and more!
Data & Technology:
Your Data & Systems
Hadoop, EDW, Mainframe,
System Logs, NoSQL DBs, etc.
Challenges:
Leveraging existing skill sets,
existing systems, past investments
and existing business processes
Connecting Business and Data
PRODUCTS OVERVIEW
• Java API (alternative to Hadoop MapReduce)
• Separates business logic from integration
• Testable at every lifecycle stage
• Works with any JVM language
• Many integration adapters
CASCADING
Process Planner
Processing API Integration API
Scheduler API
Scheduler
Apache Hadoop
Cascading
Data Stores
Scripting
Scala, Clojure, JRuby, Jython, Groovy
Enterprise Java
KEY CASCADING CONCEPTS
Tap
Pipe
Flow
SOME COMMON PATTERNS
• Functions
• Filters
• Joins
‣ Inner / Outer / Mixed
‣ Asymmetrical / Symmetrical
• Merge (Union)
• Grouping
‣ Secondary Sorting
‣ Unique (Distinct)
• Aggregations
‣ Count, Average, etc
‣ Rolling windows
[Diagrams: a Pipeline passing data through a chain of filters and functions; a Topology combining Split, Join and Merge]
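To make three of the patterns listed above concrete (Merge, Unique and an inner Join), here is a minimal plain-Java sketch. It deliberately does not use the Cascading API: the class name `PatternSketch`, the method names and the `key:left:right` output format are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class PatternSketch {
    // Merge (Union): concatenate two streams that share the same fields.
    static List<String> merge(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Unique (Distinct): keep the first occurrence of each value, in order.
    static List<String> unique(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // Inner join: emit "key:left:right" only for keys present on both sides.
    static List<String> innerJoin(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> entry : left.entrySet()) {
            String match = right.get(entry.getKey());
            if (match != null) {
                out.add(entry.getKey() + ":" + entry.getValue() + ":" + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = merge(List.of("a", "b"), List.of("b", "c"));
        System.out.println(merged);
        System.out.println(unique(merged));
        System.out.println(innerJoin(Map.of("u1", "alice", "u2", "bob"), Map.of("u2", "click")));
    }
}
```

In Cascading these steps would be expressed as pipe assemblies and planned onto the cluster; the sketch only shows what each pattern does to the records.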
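WORD COUNT EXAMPLE
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class Main {
  public static void main( String[] args ) {
    // configuration: input/output paths and the Hadoop flow connector
    String docPath = args[ 0 ];
    String wcPath = args[ 1 ];
    Properties properties = new Properties();
    AppProps.setApplicationJarClass( properties, Main.class );
    HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );

    // integration: create source and sink taps (tab-delimited, with header)
    Tap docTap = new Hfs( new TextDelimited( true, "\t" ), docPath );
    Tap wcTap = new Hfs( new TextDelimited( true, "\t" ), wcPath );

    // processing: specify a regex to split "document" text lines into token stream
    Fields token = new Fields( "token" );
    Fields text = new Fields( "text" );
    RegexSplitGenerator splitter = new RegexSplitGenerator( token, "[ \\[\\]\\(\\),.]" );
    // only returns "token"
    Pipe docPipe = new Each( "token", text, splitter, Fields.RESULTS );
    // determine the word counts
    Pipe wcPipe = new Pipe( "wc", docPipe );
    wcPipe = new GroupBy( wcPipe, token );
    wcPipe = new Every( wcPipe, Fields.ALL, new Count(), Fields.ALL );

    // scheduling: connect the taps, pipes, etc., into a flow definition
    FlowDef flowDef = FlowDef.flowDef().setName( "wc" )
      .addSource( docPipe, docTap )
      .addTailSink( wcPipe, wcTap );
    // create the Flow
    Flow wcFlow = flowConnector.connect( flowDef ); // <<-- Unit of Work
    wcFlow.complete(); // <<-- Runs jobs on Cluster
  }
}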
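As a way to reason about what the word count flow above should produce before running it on a cluster, the same split, group and count steps can be sketched in plain Java. This is not Cascading code: `WordCountSketch` and its `count` method are hypothetical names, and dropping empty tokens is a choice made for this sketch rather than something the slide's flow is shown to do.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Delimiter set matching the slide's splitter regex:
    // space, square brackets, parentheses, comma and period.
    static final String DELIMITERS = "[ \\[\\](),.]";

    // Split each line into tokens, drop empty tokens, then group and count.
    static Map<String, Long> count(String... lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.split(DELIMITERS)))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // prints {rain=3, shadow=2}
        System.out.println(count("rain shadow (rain)", "shadow, rain."));
    }
}
```

The flatMap step plays the role of the `Each` with the splitter, and `groupingBy` with `counting` stands in for `GroupBy` followed by `Every` with `Count`.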
CASCADING OVERVIEW
www.cascading.org
Proven application development framework for building Data applications
Framework addresses:
Build Data Apps that are scale-free
Design principles ensure best practices at any scale
Test-Driven Development
Efficiently test code and process local files before you deploy on a cluster
Staffing Bottleneck
Use existing Java, SQL, modeling skill sets
Operational Complexity
Simple - Package up into one jar and hand to operations
Application Portability
Write once, then run on different computation fabrics.
Systems Integration
Hadoop never lives alone. Easily integrate to your existing systems
OPERATIONAL READINESS: DISCIPLINE & ABILITY TO MEASURE
• Visibility into app development
• Business SLA
• Balance & Controls
• Application testing
• Data quality
• Process to “productionalize” apps
• High fidelity execution analysis
• Real-time monitoring
• …
PRODUCTS AND TECHNOLOGY
LINGUAL: Simplifying Systems Integration
PATTERN: Enabling Machine Scoring Algorithms
Big Data Application Development
Simple, Reliable, Repeatable
Unmatched Application Insight
Visibility into your Data Applications
Open Source | Commercial
www.concurrentinc.com/products
CASCADING ECOSYSTEM IS MORE THAN CASCADING FRAMEWORK
Lingual, Pattern and other Dynamic
Programming Languages such as
Scalding are part of the Cascading
Ecosystem and are included as part
of the Cascading kit
http://www.cascading.org/extensions/
LINGUAL
• Lingual is an extension to Cascading that
executes ANSI SQL queries as Cascading
apps
• Supports integrating with any data source
that can be accessed through JDBC; a
Cascading Tap can be created for any
source supporting JDBC
• Great for migration of data and for integrating
with non-Big Data assets; extends the life
of existing IT assets in an organization
[Diagram: Lingual layers (CLI / Shell, Enterprise Java; Catalog; JDBC API, Lingual API, Provider API; Query Planner) on top of Cascading, Apache Hadoop and Data Stores]
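Since Lingual exposes a JDBC interface, querying it from Java should look like any other JDBC client. The sketch below uses only the standard `java.sql` API. The connection URL `jdbc:lingual:local` and the table `example.tokens` are assumptions for illustration and are not taken from the slides, so check the Lingual documentation for the exact URL format and add the Lingual JDBC driver to the classpath before relying on it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class LingualJdbcSketch {
    // Run a query over plain JDBC and return the first column of each row as text.
    static List<String> firstColumn(String jdbcUrl, String sql) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Connection connection = DriverManager.getConnection(jdbcUrl);
             Statement statement = connection.createStatement();
             ResultSet results = statement.executeQuery(sql)) {
            while (results.next()) {
                rows.add(results.getString(1));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // ASSUMPTION: placeholder URL and hypothetical table name.
        String url = args.length > 0 ? args[0] : "jdbc:lingual:local";
        String sql = "SELECT token FROM example.tokens";
        try {
            firstColumn(url, sql).forEach(System.out::println);
        } catch (SQLException e) {
            // Without a matching driver on the classpath, JDBC reports no suitable driver.
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```

Keeping the query behind a small helper like `firstColumn` means the same code can be pointed at another JDBC source by changing only the URL.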
SCALDING
• Scalding is a language binding to Cascading for Scala
- The name Scalding comes from combining SCALa and
cascaDING
• Scalding is great for Scala developers; can crisply write
constructs for matrix math…
• Scalding has very large commercial deployments at:
- Twitter - Use cases such as the revenue quality team, ad
targeting and traffic quality
- eBay - Use cases include search analytics and other production
data pipelines
DRIVEN OVERVIEW
What is Driven?
The first application performance management product for Big Data applications
Capabilities
Visualize your Data App
No more black box! Instantly visualize your running app in real-time
Diagnose App Failures
Identify where and how your app failed… all without sorting through logs!
Track App Performance
For all your apps, view and compare history of your app’s runtime performance
Insight into your Applications
At any moment, quickly understand what your app is doing on your cluster
Benefits
• Accelerate Time to Market
• Build Reliable Applications
• Optimize Application Performance
Key Features
• Application visualization
• Dashboard performance view
• Application performance history
• Insights for each application (workflow,
telemetry, error types)
• Team collaboration and management
Works with: LINGUAL, PATTERN, SCALDING, CASCALOG
www.cascading.io
Driven is free for developer use (cloud)
CASCADING AVAILABILITY
Cascading, Lingual and Pattern are open source projects freely available to the general public under Apache License 2.0
Availability: Cascading 2.5 (Available Now); Lingual 1.1 (Available Now); Pattern 1.0-WIP (WIP Available Now)
License: Apache License 2.0 for Cascading, Lingual and Pattern
Support: Community Forums & Mailing List, Enterprise Support for Cascading, Lingual and Pattern
Summary
• APM for Big Data | The first application performance management product for Big Data applications
• For Developers and Operators | Significantly improves developer productivity and operations control by providing an
unprecedented level of insight into building and managing enterprise-grade data applications
• Collaboration | Facilitates and encourages user collaboration to build enterprise data applications
• Community Integration | Driven is a free cloud service integrated with the Cascading open source community
• Licensing | Driven is free for development (cloud only) and licensable for production or on-premise deployments
• Deployment Options | Deploy in the cloud or on-premise
Accelerate Time to Market
Process visualization and monitoring
capabilities in a rich UI
Build Reliable Apps
Detailed insight into data processing
logic and algorithms
Optimize App Performance
Key application behavior metrics with
historical data to trend performance
GET STARTED WITH CASCADING ON HDP 2.1
1. Download HDP 2.1
2. Take Cascading for a spin
by running the Impatient
tutorial at http://docs.cascading.org/impatient/
CONTACT INFORMATION
Supreet Oberoi
supreet@concurrentinc.com
650-868-7675 (m)
@supreet_online
DRIVING INNOVATION THROUGH DATA
THANK YOU
Supreet Oberoi | April 18, 2014
Page 13
SAN JOSE
June 3-5
AMSTERDAM
April 2-3
•  6 tracks, 3 days, and 120+ sessions to choose from
•  Community Focused - Sessions voted on by the public and
selected by a committee of industry luminaries
•  Deep Dive Technical Content - Including a Committer track with
content presented by Apache committers
•  Business and Technical Topics
•  Community Activities - Hadoop Summit will host community meet-ups
and birds of a feather sessions
www.hadoopsummit.org
The Largest Hadoop Community Events in
Europe and North America
Page 14
Questions?
Use the Q/A panel to ask your questions
Download the Hortonworks Sandbox and Cascading
•  Cascading and HDP 2.1 Sandbox
•  Hortonworks Sandbox
•  Cascading Impatient Tutorial

Contenu connexe

Tendances

Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Hortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalHortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalHortonworks
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudHortonworks
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHortonworks
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 

Tendances (20)

Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Webinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_finalWebinar turbo charging_data_science_hawq_on_hdp_final
Webinar turbo charging_data_science_hawq_on_hdp_final
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 

En vedette

Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNHortonworks
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Hortonworks
 
Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks Data In Motion Webinar Series Pt. 2
Hortonworks Data In Motion Webinar Series Pt. 2Hortonworks
 
Hortonworks Technical Workshop: Apache Ambari
Hortonworks Technical Workshop:   Apache AmbariHortonworks Technical Workshop:   Apache Ambari
Hortonworks Technical Workshop: Apache AmbariHortonworks
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureVARUN SAXENA
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...The Cascading (big) data application framework - André Keple, Sr. Engineer, C...
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...Cascading
 
Extending Application Data In The Cloud
Extending Application Data In The CloudExtending Application Data In The Cloud
Extending Application Data In The CloudRonald Bradford
 
AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS Amazon Web Services
 
Big Data application - OSS / BSS
Big Data application - OSS / BSSBig Data application - OSS / BSS
Big Data application - OSS / BSSKeyur Thakore
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarHortonworks
 
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks Technical Workshop -  build a yarn ready application with apache ...Hortonworks Technical Workshop -  build a yarn ready application with apache ...
Hortonworks Technical Workshop - build a yarn ready application with apache ...Hortonworks
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Hortonworks
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
 

Accelerate Big Data Application Development with Cascading and HDP, Hortonworks and Concurrent webinar 4-22-2014

  • 1. Page 1 Accelerate Big Data Application Development with Cascading and HDP April 22, 2014
  • 2. Page 2 Agenda: Take advantage of the latest Hadoop processing frameworks like YARN and Tez in HDP 2.1; how developers can create future-proof, data-driven applications built on Apache Hadoop with Cascading; how Cascading accelerates Hadoop application development by abstracting the platforms underneath
  • 3. Page 3 Speakers Ajay Singh, Director of Technical Channels, Hortonworks Supreet Oberoi, VP of Field Engineering, Concurrent
  • 4. Page 4 Our Mission: Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop. Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process. Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind. Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills. Headquartered in Palo Alto, CA; 300+ employees and growing.
  • 5. Page 5 A data architecture under pressure from new data. Applications: Business Analytics, Custom Applications, Packaged Applications. Data System repositories: RDBMS, EDW, MPP. Sources: Existing Sources (CRM, ERP, Clickstream, Logs). New data: OLTP, ERP, CRM systems; unstructured documents, emails; clickstream; server logs; sentiment, web data; sensor, machine data; geolocation. Source: IDC. 2.8 ZB in 2012; 85% from new data types; 15x machine data by 2020; 40 ZB by 2020.
  • 6. Page 6 A Modern Data Architecture. Applications: Business Analytics, Custom Applications, Packaged Applications. Data System: Enterprise Hadoop (Governance & Integration, Security, Operations, Data Access, Data Management) alongside existing repositories (RDBMS, EDW, MPP). Sources: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured). Operational Tools: Manage & Monitor. Dev & Data Tools: Build & Test.
  • 7. Page 7 Hadoop Value: New types of data. Clickstream: Capture and analyze website visitors’ data trails and optimize your website. Sensors: Discover patterns in data streaming automatically from remote sensors and machines. Server Logs: Research logs to diagnose process failures and prevent security breaches. Sentiment: Understand how your customers feel about your brand and products, right now. Geographic: Analyze location-based data to manage operations where they occur. Unstructured: Understand patterns in files across millions of web pages, emails, and documents.
  • 8. Page 8 Enterprise Hadoop: Core Foundation of Hadoop Applications
  • 9. Page 9 Core Capabilities of Enterprise Hadoop. Load data and manage according to policy. Deploy and effectively manage the platform. Store and process all of your corporate data assets. Access your data simultaneously in multiple ways (batch, interactive, real-time). Provide a layered approach to security through Authentication, Authorization, Accounting, and Data Protection. Capability areas: Data Management, Security, Data Access, Governance & Integration, Operations. Presentation & Application: Enable both existing and new applications to provide value to the organization. Enterprise Mgmt & Security: Empower existing operations and security tools to manage Hadoop. Deployment Options: Provide deployment choice across physical, virtual, cloud.
  • 10. Page 10 HDP 2.1: Enterprise Hadoop. HDP 2.1 Hortonworks Data Platform. Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS). Data Access: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (In-Memory Analytics, ISV engines), all on YARN: Data Operating System. Data Management: HDFS (Hadoop Distributed File System), nodes 1 to N. Security: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS, Resources: YARN, Access: Hive, …, Pipeline: Falcon, Cluster: Knox. Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). Deployment Choice: Linux, Windows, On-Premise, Cloud.
  • 11. Page 11 Hadoop is wholly integrated into the data center. Layers: Applications, Data System, Sources, Operational Tools, Dev & Data Tools, Infrastructure. Data System: HDP 2.1 (Governance & Integration, Security, Operations, Data Access, Data Management) alongside RDBMS, EDW, MPP. Sources: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured). Products shown: HANA, BusinessObjects BI.
  • 12. Page 12 Developing Apps on Hadoop •  Spring XD Framework –  Consistent configuration & Java API across a wide range of Hadoop ecosystem projects •  Microsoft .NET SDK For Hadoop –  API access to HDP on Windows and the HDInsight service –  LINQ libraries for accessing Hive •  Cascading –  Delivers an easy-to-use abstraction layer for developing Hadoop applications –  Supports development in Scala & Clojure –  Hortonworks to Certify, Support & Deliver Cascading SDK with Hortonworks Data Platform
  • 14. HORTONWORKS PARTNERS WITH CONCURRENT • The Cascading SDK will now be integrated with the Hortonworks Data Platform (HDP) • Hortonworks will certify and support Cascading™ SDK with HDP • Cascading will support Apache Tez; companies using Cascading or domain-specific languages on Cascading can seamlessly migrate to HDP supporting Apache Tez. The partnership benefits users by combining the power and simplicity of Cascading with the reliability and stability of HDP.
  • 15. Confidential AGENDA • Who is Concurrent • What is Cascading • Where is it used • What problems does Cascading solve • What is included in the Cascading kit
  • 17. GET TO KNOW CONCURRENT. Leader in Application Infrastructure for Big Data: building enterprise software to simplify Big Data application development and management. Products and Technology: • CASCADING: The most widely used application infrastructure for building Big Data applications, with over 150,000 downloads each month • DRIVEN: Enterprise data application management for Big Data apps. Proven (Simple, Reliable, Robust): thousands of enterprises rely on Concurrent to provide their data application infrastructure. Founded: 2008. HQ: San Francisco, CA. CEO: Gary Nakamura. CTO, Founder: Chris Wensel. www.concurrentinc.com
  • 18. PRODUCTS AND TECHNOLOGY. Open Source: Big Data Application Development (Simple, Reliable, Repeatable). Open source community focused on data app development; project home of Cascading; collection of sub-projects / tools. Commercial: Unmatched Application Insight (visibility into your data applications). Data App Management: realtime monitoring, performance management, operational control, data provenance, compliance, governance. www.concurrentinc.com/products
  • 19. BUSINESSES DEPEND ON US • Cascading Java API • Data normalization and cleansing of search and click-through logs for use by analytics tools and Hive analysts • Easy to operationalize heavy lifting of data
  • 20. BUSINESSES DEPEND ON US • Cascalog (Clojure) • Weather pattern modeling to protect growers against loss • ETL against 20+ datasets daily • Machine learning to create models • Purchased by Monsanto for $930M US
  • 21. BUSINESSES DEPEND ON US • Scalding (Scala) • Machine learning (linear algebra) to improve • User experience • Ad quality (matching users and ad effectiveness) • All revenue applications are running on Cascading/Scalding • IPO TWITTER
  • 22. BUSINESSES DEPEND ON US • Estimate suicide risk from what people write online • Cascading + Cassandra • You can do more than optimize ad yields • http://www.durkheimproject.org
  • 24. DRIVING ADVANTAGE WITH DATA APPLICATIONS. Enterprise IT: Extract Transform Load; Log File Analysis; Systems Integration; Operations Analysis. Corporate Apps: HR Analytics; Employee Behavioral Analysis; Customer Support | eCRM; Business Reporting. Telecom: Data processing of Open Data; Geospatial Indexing; Consumer Mobile Apps; Location based services. Marketing / Retail: Mobile, Social, Search Analytics; Funnel analysis; Revenue attribution; Customer experiments; Ad Optimization; Retail recommenders. Consumer / Entertainment: Music Recommendation; Comparison Shopping; Restaurant Rankings; Real Estate; Rental Listings; Travel Search & Forecast. Finance: Fraud and Anomaly Detection; Fraud Experiments; Customer Analytics; Insurance Risk Metric. Health / Biotech: Aggregate metrics for Govt; Person biometrics; Veterinary diagnostics; Next-Gen Genomics; Agronomics; Environmental Maps.
  • 25. BIG DATA: THE NEXT PHASE OF MATURITY. “It’s all about the Apps”: there needs to be a comprehensive solution for building, deploying, running and managing this new class of enterprise applications. Connecting Business and Data. Business Strategy: loyalty and promotions analysis, retention campaigns, marketing campaign optimization, fraud detection, risk management, scientific research, remote monitoring and diagnosis, and more. Data & Technology: your data and systems (Hadoop, EDW, mainframe, system logs, NoSQL DBs, etc.). Challenges: leveraging existing skill sets, existing systems, past investments and existing business processes.
  • 27. CASCADING • Java API (alternative to Hadoop MapReduce) • Separates business logic from integration • Testable at every lifecycle stage • Works with any JVM language • Many integration adapters. Diagram labels: Enterprise Java; Scripting (Scala, Clojure, JRuby, Jython, Groovy); Cascading (Process Planner, Processing API, Integration API, Scheduler API); Scheduler; Apache Hadoop; Data Stores.
  • 30. SOME COMMON PATTERNS • Functions • Filters • Joins (Inner / Outer / Mixed; Asymmetrical / Symmetrical) • Merge (Union) • Grouping (Secondary Sorting; Unique (Distinct)) • Aggregations (Count, Average, etc.; Rolling windows). Diagram labels: Pipeline (data, filter, function); Topology (data, Split, Join, Merge).
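The filter, function, grouping and aggregation patterns named on this slide can be sketched outside Cascading with plain JDK streams. This is only an illustration of the pattern shapes, not Cascading code: the class `PipelinePatterns`, the `Hit` type and the method `averageMillisByPage` are hypothetical names invented for this sketch.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PipelinePatterns {
    /** Hypothetical record of one page hit; not part of any Cascading API. */
    public static final class Hit {
        final String page;
        final int millis;

        public Hit(String page, int millis) {
            this.page = page;
            this.millis = millis;
        }

        String page() { return page; }
        int millis() { return millis; }
    }

    /** filter -> function -> grouping -> aggregation (average), in that order. */
    public static Map<String, Double> averageMillisByPage(List<Hit> hits) {
        return hits.stream()
            // Filter: drop rows with an invalid (negative) duration.
            .filter(h -> h.millis() >= 0)
            // Function: normalize the page name so "/Home" and "/home" group together.
            .map(h -> new Hit(h.page().toLowerCase(Locale.ROOT), h.millis()))
            // Grouping + aggregation: average duration per page, sorted by page name.
            .collect(Collectors.groupingBy(Hit::page, TreeMap::new,
                Collectors.averagingInt(Hit::millis)));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("/Home", 100), new Hit("/home", 300),
            new Hit("/cart", -1), new Hit("/cart", 50));
        System.out.println(averageMillisByPage(hits));
    }
}
```

In Cascading the same steps would be expressed as pipes assembled into a flow rather than as a stream chain; the sketch only shows how the patterns compose.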
  • 31. WORD COUNT EXAMPLE
      // configuration
      String docPath = args[ 0 ];
      String wcPath = args[ 1 ];
      Properties properties = new Properties();
      AppProps.setApplicationJarClass( properties, Main.class );
      HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );

      // integration: create source and sink taps (tab-delimited text with a header row)
      Tap docTap = new Hfs( new TextDelimited( true, "\t" ), docPath );
      Tap wcTap = new Hfs( new TextDelimited( true, "\t" ), wcPath );

      // processing: specify a regex to split "document" text lines into a token stream
      Fields token = new Fields( "token" );
      Fields text = new Fields( "text" );
      RegexSplitGenerator splitter = new RegexSplitGenerator( token, "[ \\[\\]\\(\\),.]" );
      // only returns "token"
      Pipe docPipe = new Each( "token", text, splitter, Fields.RESULTS );
      // determine the word counts
      Pipe wcPipe = new Pipe( "wc", docPipe );
      wcPipe = new GroupBy( wcPipe, token );
      wcPipe = new Every( wcPipe, Fields.ALL, new Count(), Fields.ALL );

      // scheduling: connect the taps, pipes, etc., into a flow definition
      FlowDef flowDef = FlowDef.flowDef().setName( "wc" )
        .addSource( docPipe, docTap )
        .addTailSink( wcPipe, wcTap );
      // create the Flow
      Flow wcFlow = flowConnector.connect( flowDef ); // <<-- Unit of Work
      wcFlow.complete(); // <<-- Runs jobs on Cluster
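To sanity-check what the word count flow should produce on a small input, the same tokenize, group and count logic can be sketched with the JDK alone. This is a rough local equivalent, not the Cascading implementation: it assumes the splitter's delimiter set is space, square brackets, parentheses, comma and period, and the class name `WordCountSketch` is hypothetical. Dropping empty tokens is an added convenience here and is not something the slide's flow does.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Assumed delimiter set: space, [ ] ( ) , and period, written as a regex character class.
    static final String DELIMITERS = "[ \\[\\]\\(\\),.]";

    /** Splits the text into tokens, groups identical tokens, and counts each group. */
    public static Map<String, Long> count(String text) {
        return Arrays.stream(text.split(DELIMITERS))
            // Adjacent delimiters yield empty strings; this sketch discards them.
            .filter(t -> !t.isEmpty())
            // Group by the token itself and count occurrences, sorted by token.
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new,
                Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count("rain shadow (rain), a rain."));
    }
}
```

A helper like this can serve as the expected-value side of a unit test before the flow is run against files on a cluster.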
  • 32. CASCADING OVERVIEW. Proven application development framework for building data applications. The framework addresses: Scale-free data apps: design principles ensure best practices at any scale. Test-Driven Development: efficiently test code and process local files before you deploy on a cluster. Staffing Bottleneck: use existing Java, SQL, and modeling skill sets. Operational Complexity: simple; package up into one jar and hand to operations. Application Portability: write once, then run on different computation fabrics. Systems Integration: Hadoop never lives alone; easily integrate with your existing systems. www.cascading.org
  • 33. OPERATIONAL READINESS: DISCIPLINE & ABILITY TO MEASURE • Visibility into app development • Business SLA • Balance & Controls • Application testing • Data quality • Process to “productionalize” apps • High fidelity execution analysis • Real-time monitoring • …
  • 34. PRODUCTS AND TECHNOLOGY. Open Source: Big Data Application Development (Simple, Reliable, Repeatable), including LINGUAL (Simplifying Systems Integration) and PATTERN (Enabling Machine Scoring Algorithms). Commercial: Unmatched Application Insight (Visibility into your Data Applications). www.concurrentinc.com/products
  • 35. CASCADING ECOSYSTEM IS MORE THAN CASCADING FRAMEWORK. Lingual, Pattern and other Dynamic Programming Languages such as Scalding are part of the Cascading Ecosystem and are included as part of the Cascading kit. http://www.cascading.org/extensions/
  • 36. LINGUAL • Lingual is an extension to Cascading that executes ANSI SQL queries as Cascading apps • Supports integrating with any data source that can be accessed through JDBC; a Cascading Tap can be created for any source supporting JDBC • Great for migration of data and for integrating with non-Big Data assets; extends the life of existing IT assets in an organization. Diagram labels: CLI / Shell; Enterprise Java; Catalog; Lingual (Query Planner, JDBC API, Lingual API, Provider API); Cascading; Apache Hadoop; Data Stores.
  • 37. SCALDING • Scalding is a language binding to Cascading for Scala; the name combines SCALa and cascaDING • Scalding is great for Scala developers; it can crisply express constructs for matrix math… • Scalding has very large commercial deployments at: Twitter (use cases such as the revenue quality team, ad targeting and traffic quality) and eBay (use cases include search analytics and other production data pipelines)
  • 38. DRIVEN OVERVIEW. What is Driven? The first application performance management product for Big Data applications. Capabilities: Visualize your Data App: no more black box; instantly visualize your running app in real time. Diagnose App Failures: identify where and how your app failed, all without sorting through logs. Track App Performance: for all your apps, view and compare the history of your app’s runtime performance. Insight into your Applications: at any moment, quickly understand what your app is doing on your cluster. Benefits: • Accelerate Time to Market • Build Reliable Applications • Optimize Application Performance. Key Features: • Application visualization • Dashboard performance view • Application performance history • Insights for each application (workflow, telemetry, error types) • Team collaboration and management. Works with: LINGUAL, PATTERN, SCALDING, CASCALOG. www.cascading.io
  • 39. Driven is free for developer use (cloud)
  • 40. CASCADING AVAILABILITY. Cascading, Lingual and Pattern are open source projects freely available to the general public under Apache License 2.0. Availability: Cascading 2.5 (Available Now); Lingual 1.1 (Available Now); Pattern 1.0-WIP (WIP Available Now). License: Apache License 2.0 for all three. Support: Community Forums & Mailing List, and Enterprise Support, for all three.
  • 41. Summary • APM for Big Data | The first application performance management product for Big Data applications • For Developers and Operators | Significantly improves developer productivity and operations control by providing an unprecedented level of insight into building and managing enterprise-grade data applications • Collaboration | Facilitates and encourages user collaboration to build enterprise data applications • Community Integration | Driven is a free cloud service integrated with the Cascading open source community • Licensing | Driven is free for development (cloud only) and licensable for production or on-premise deployments • Deployment Options | Deploy in the cloud or on-premise. Accelerate Time to Market: process visualization and monitoring capabilities in a rich UI. Build Reliable Apps: detailed insight into data processing logic and algorithms. Optimize App Performance: key application behavior metrics with historical data to trend performance.
  • 42. GET STARTED WITH CASCADING ON HDP 2.1 1. Download HDP 2.1 2. Take Cascading for a spin by running the Impatient tutorial at http://docs.cascading.org/impatient/
  • 45. Page 13 The Largest Hadoop Community Events in Europe and North America. AMSTERDAM April 2-3; SAN JOSE June 3-5 •  6 tracks, 3 days, and 120+ sessions to choose from •  Community Focused - Sessions voted on by the public and selected by a committee of industry luminaries •  Deep Dive Technical Content - Including a Committer track with content presented by Apache committers •  Business and Technical Topics •  Community Activities - Hadoop Summit will host community meet-ups and birds of a feather sessions. www.hadoopsummit.org
  • 46. Page 14 Questions? Use the Q/A panel to ask your questions Download the Hortonworks Sandbox and Cascading •  Cascading and HDP 2.1 Sandbox •  Hortonworks Sandbox •  Cascading Impatient Tutorial