Tom Wheeler | Senior Curriculum Developer
April 2014
Introduction to Designing and
Building Big Data Applications
Agenda
• Cloudera's Learning Path for Developers
• Target Audience and Prerequisites
• Course Outline
• Short Presentation Based on Actual Course Material
• Question and Answer Session
Developer Training
• Learn to code and write MapReduce programs for production
• Master advanced API topics required for real-world data analysis
HBase Training
• Design schemas to minimize latency on massive data sets
• Scale hundreds of thousands of operations per second
Intro to Data Science
• Implement recommenders and data experiments
• Draw actionable insights from analysis of disparate data
Big Data Applications
• Build converged applications using multiple processing engines
• Develop enterprise solutions using components across the EDH
Learning Path: Developers
Create Powerful New Data Processing Tools
Aaron T. Myers
Software Engineer
• An engineer with Hadoop skills requires a min. salary premium of 25%
• Hadoop developers are now the top paid in tech, starting at $115K
• Compensation for a very senior Data Scientist opens at $300K
Sources: Business Insider, “10 Tech Skills That Will Instantly Net You A $100,000+ Salary,” 11 August 2012.
Business Insider, “30 Tech Skills That Will Instantly Net You A $100,000+ Salary,” 21 February 2013.
GigaOm, “Big Data Skills Bring Big Dough,” 17 February 2012.
Hadoop Professionals: Build or Buy?
Professional Certification Decreases Hiring Risk
Why Cloudera Training?
Aligned to Best Practices and the Pace of Change
1. Broadest Range of Courses: Developer, Admin, Analyst, HBase, Data Science
2. Most Experienced Instructors: More than 20,000 students trained since 2009
3. Leader in Certification: Over 8,000 accredited Cloudera professionals
4. Trusted Source for Training: 100,000+ people have attended online courses
5. State of the Art Curriculum: Courses updated as Hadoop evolves
6. Widest Geographic Coverage: Most classes offered: 50 cities worldwide plus online
7. Most Relevant Platform & Community: CDH deployed more than all other distributions combined
8. Depth of Training Material: Hands-on labs and VMs support live instruction
9. Ongoing Learning: Video tutorials and e-learning complement training
10. Commitment to Big Data Education: University partnerships to teach Hadoop in the classroom
Designing and Building
Big Data Applications
About the Course
• Intended for people who write code, such as
• Software Engineers
• Data Engineers
• ETL Developers
Target Audience
• Successful completion of our Developer course
• Or equivalent practical experience
• Intermediate-level Java skills
• Basic familiarity with Linux
• Knowledge of SQL or HiveQL is also helpful
Course Prerequisites
Example of Required Java Skill Level
    package com.cloudera.example;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class Example extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // ... remainder of the method is not shown
        }
    }
• Do you understand the following code? Could you write something similar?
Example of Required Linux Skill Level
• Are you comfortable editing text files on a Linux system?
• Are you familiar with the following commands?
$ mkdir -p /tmp/incoming/web_logs
$ cd /var/log/web
$ mv *.log /tmp/incoming/web_logs
• During this course, you will learn to
• Determine which Hadoop-related tools are appropriate for specific tasks
• Understand how file formats, serialization, and data compression affect
application compatibility and performance
• Design and evolve schemas in Apache Avro
• Create, populate, and access data sets with the Kite SDK
• Integrate with external systems using Apache Sqoop and Apache Flume
• Integrate Apache Flume with existing applications and develop custom
components to extend Flume’s capabilities
Course Objectives
• Create, package, and deploy Oozie jobs to manage processing workflows
• Develop Java-based data processing pipelines with Apache Crunch
• Implement user-defined functions for use in Apache Hive and Impala
• Index both static and streaming data sets with Cloudera Search
• Use Hue to build a Web-based interface for Search queries
• Integrate results from Impala and Cloudera Search into your
applications
Course Objectives (continued)
• Frequent hands-on exercises
• Based on a hypothetical but realistic scenario
• Each works towards building a working application
Scenario for Hands-On Exercises
Loudacre Mobile
Tools Used in Hands-On Exercises
• Ingest and Data Management: Sqoop, Flume, Kite SDK / Morphlines
• Interactive Queries: Impala, Search
• Batch Processing: MapReduce, Crunch, Hive
• Also shown: HDFS, HCatalog, Avro
Data Sources Used in Hands-On Exercises
Data sources: RDBMS, Telecom Switches, CRM System, Point of Sale Terminals, Web Servers
Destination: Enterprise Data Hub
Data sets:
• Equipment Records
• Customer Records
• Call Detail Records (Fixed-Width)
• Phone Activations (XML)
• Static Documents (HTML)
• Log Files (Text)
• Device Status (CSV and TSV)
• Chat Transcripts (JSON)
• Exercises use real-world development environment
• IDE (Eclipse)
• Unit testing library (JUnit)
• Build and configuration management tool (Maven)
Development Environment
• Introduction
• Application Architecture *
• Designing and Using Data Sets *
• Using the Kite SDK Data Module *
• Importing Relational Data with Apache Sqoop *
• Capturing Data with Apache Flume *
Course Outline
* This chapter contains a hands-on exercise
* This chapter contains multiple hands-on exercises
• Developing Custom Flume Components *
• Managing Workflows with Apache Oozie *
• Processing Data Pipelines with Apache Crunch *
• Working with Tables in Apache Hive *
• Developing User-Defined Functions *
• Executing Interactive Queries with Impala *
Course Outline (continued)
• Understanding Cloudera Search
• Indexing Data with Cloudera Search *
• Presenting Results to Users *
• Conclusion
Course Outline (continued)
• Based on chapter 3: Designing and Using Data Sets
Course Excerpt
• Define the concept of serialization
• Represents data as a series of bytes
• Allows us to store and transmit data
• There are many ways of serializing data
• How do you serialize the number 108125150?
• 4 bytes when stored as a Java int
• 9 bytes when stored as text
What is Data Serialization?
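The size difference quoted on this slide can be checked with a few lines of plain Java. This is a minimal sketch: the class and method names (SerializationSizes, asBinaryInt, asText) are made up for illustration and are not part of the course material.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SerializationSizes {

    /** Serializes the value as a fixed-width, big-endian 32-bit integer. */
    static byte[] asBinaryInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Serializes the value as its decimal text representation in UTF-8. */
    static byte[] asText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int value = 108125150;
        System.out.println("binary int: " + asBinaryInt(value).length + " bytes"); // 4 bytes
        System.out.println("text:       " + asText(value).length + " bytes");      // 9 bytes
    }
}
```

The same number occupies 4 bytes as a Java int and 9 bytes as text, one byte per decimal digit.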
• Affects performance and storage space
• Chosen method may limit portability
• java.io.Serializable is Java-specific
• Writables are Hadoop-specific
• May also limit backwards compatibility
• Often depends on specific version of class
• Avro was developed to address these challenges
Implications of Data Serialization
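To see why java.io.Serializable ties data to Java, it can help to look at what the built-in mechanism actually writes. The sketch below (class and method names are hypothetical) serializes a single Integer with ObjectOutputStream and checks that the fully qualified class name is embedded in the output, which is part of why the stream is larger than the raw 4-byte value and only meaningful to a Java reader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class JavaSerializationOverhead {

    /** Serializes an object with Java's built-in serialization mechanism. */
    static byte[] javaSerialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Returns true if the serialized bytes contain the given class name. */
    static boolean embedsClassName(byte[] bytes, String className) {
        // ISO-8859-1 maps every byte to one char, so no bytes are lost here.
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        return raw.contains(className);
    }

    public static void main(String[] args) {
        byte[] bytes = javaSerialize(Integer.valueOf(108125150));
        System.out.println("serialized size: " + bytes.length + " bytes");
        System.out.println("embeds class name: "
                + embedsClassName(bytes, "java.lang.Integer"));
    }
}
```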
• Avro is an open source data serialization framework
• Widely supported throughout Hadoop ecosystem
• Offers compatibility without sacrificing performance
• Data is serialized according to a schema you define
• Read and write from Java, C, C++, C#, Python, PHP, etc.
• Optimized binary encoding for efficient storage
• Defines rules for schema evolution
What is Apache Avro?
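One reason Avro's binary encoding is compact: the Avro specification writes int and long values using variable-length zig-zag coding, so small values take fewer bytes. The following is a hand-written sketch of that encoding for illustration only. It does not use the Avro library, and the class name ZigZagVarint is an assumption.

```java
import java.io.ByteArrayOutputStream;

final class ZigZagVarint {

    /** Encodes an int with zig-zag mapping followed by a base-128 varint. */
    static byte[] encodeInt(int n) {
        // Zig-zag maps small negative and positive values to small unsigned values:
        // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
        int zigzag = (n << 1) ^ (n >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits per byte, setting the high bit while more bytes follow.
        while ((zigzag & ~0x7F) != 0) {
            out.write((zigzag & 0x7F) | 0x80);
            zigzag >>>= 7;
        }
        out.write(zigzag);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 64, 2500, 108125150};
        for (int sample : samples) {
            System.out.println(sample + " -> " + encodeInt(sample).length + " byte(s)");
        }
    }
}
```

With this scheme a value such as 2500 takes 2 bytes, while 108125150 still fits in 4.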
• Avro schemas define the structure of your data
• Similar to a CREATE TABLE in SQL, but more flexible
• Defined using JSON syntax
Avro Schemas
Metadata:  id      name   title        bonus
Data:      108424  Alice  Salesperson  2500
           101837  Bob    Manager      3000
           107812  Chuck  President    9000
           105476  Dan    Accountant   3000
• These are among the simple (scalar) types in Avro
Simple Types in Avro Schemas
Name      Description
null      An absence of a value
boolean   A binary value
int       32-bit signed integer
long      64-bit signed integer
float     Single-precision floating point value
double    Double-precision floating point value
string    Sequence of Unicode characters
• These are the complex types in Avro
Complex Types in Avro Schemas
Name      Description
record    A user-defined type composed of one or more named fields
enum      A specified set of values
array     Zero or more values of the same type
map       Set of key-value pairs; key is string while value is of specified type
union     Exactly one value matching a specified set of types
fixed     A fixed number of 8-bit unsigned bytes
• SQL CREATE TABLE statement
Schema Example
CREATE TABLE employees
(id INT,
name VARCHAR(30),
title VARCHAR(20),
bonus INT);
• Equivalent Avro schema
Schema Example (Continued)
{"namespace": "com.loudacre.data",
"type": "record",
"name": "Employee",
"fields": [
{"name": "id", "type": "int"},
{"name": "name", "type": "string"},
{"name": "title", "type": "string"},
{"name": "bonus", "type": "int"}
]}
• Approaches for mapping a Java object to a schema
• Generic: Write code to map each field manually
• Reflect: Generate a schema from an existing class
• Specific: Generate a Java class from your schema
Mapping Avro Schema to Java Object
• Hadoop and its ecosystem support many file formats
• May ingest in one format and convert to another
• Format selection involves several considerations
• Ingest pattern
• Tool compatibility
• Expected lifetime
• Storage and performance requirements
Considerations for File Formats
• Each file format may also support compression
• Reduces amount of disk space required to store data
• Tradeoff between time and space
• Can greatly improve performance
• Many Hadoop jobs are I/O-bound
Data Compression
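The time-versus-space tradeoff can be illustrated with the JDK's built-in GZIP support. This is a small sketch, not a benchmark: the synthetic log lines and the class name CompressionTradeoff are invented for illustration, and real ratios depend on the data and the codec used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class CompressionTradeoff {

    /** Compresses the input with GZIP and returns the compressed bytes. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Builds repetitive, log-like text (made-up sample data). */
    static byte[] sampleLog(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("10.0.0.").append(i % 255)
              .append(" - - \"GET /index.html HTTP/1.1\" 200 ")
              .append(1000 + i % 50).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleLog(10_000);
        long start = System.nanoTime();
        byte[] compressed = gzip(raw);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("raw:        " + raw.length + " bytes");
        System.out.println("compressed: " + compressed.length + " bytes");
        System.out.println("CPU time spent compressing: " + micros + " us");
    }
}
```

Fewer bytes on disk means less data to read, which is where the benefit for I/O-bound jobs comes from; the cost is the CPU time spent compressing and decompressing.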
• Refers to organizing data according to access patterns
• Improves performance by limiting input
• Common partitioning schemes
• Customers: partition by state, province, or region
• Events: separate by year, month, and day
Data Partitioning
• Imagine that you store all Web server log files in HDFS
• Marketing runs monthly jobs for search engine optimization
• Security runs daily jobs to identify attempted exploits
Partitioning Example
2014
    March
    April
        01 02 03 04 05 06 07 08 09 10 11 12 13
    May
Input for monthly job
Input for daily job
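The partitioning scheme in this example can be sketched as a directory layout of year/month/day, where each job reads only the directories it needs. The base path /data/web_logs and the class name LogPartitions are assumptions made for illustration; they do not come from the course.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LogPartitions {

    // Hypothetical base directory for the partitioned web server logs.
    static final String BASE = "/data/web_logs";

    /** Input for a daily job: the single directory for that day. */
    static String dayPath(LocalDate day) {
        return String.format(Locale.ROOT, "%s/%04d/%02d/%02d",
                BASE, day.getYear(), day.getMonthValue(), day.getDayOfMonth());
    }

    /** Input for a monthly job: every day directory within that month. */
    static List<String> monthPaths(YearMonth month) {
        List<String> paths = new ArrayList<>();
        for (int d = 1; d <= month.lengthOfMonth(); d++) {
            paths.add(dayPath(month.atDay(d)));
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println("daily job input:   " + dayPath(LocalDate.of(2014, 4, 7)));
        System.out.println("monthly job input: "
                + monthPaths(YearMonth.of(2014, 4)).size() + " day directories");
    }
}
```

The security team's daily job touches one directory, while marketing's monthly job reads one month's worth; neither has to scan the full log history.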
Register for training and certification at
http://university.cloudera.com
Use discount code Apps10 to save 10%
on new enrollments in Big Data
Applications classes delivered by
Cloudera until July 4, 2014*
• Enter questions in the Q&A panel
• Follow Cloudera University: @ClouderaU
• Follow the Developer learning path:
http://university.cloudera.com/developers
• Learn about the enterprise data hub:
http://tinyurl.com/edh-webinar
• Join the Cloudera user community:
http://community.cloudera.com/
• Get Developer Certification:
http://university.cloudera.com/certification
• Explore Developer resources for Hadoop:
http://cloudera.com/content/dev-center/en/home.html
* Excludes classes sold or delivered by other partners

Contenu connexe

Tendances

Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?James Serra
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteAmr Awadallah
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014Eli Singer
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architectureMilos Milovanovic
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerDataWorks Summit
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes John Archer
 

Tendances (20)

Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
Introduction to Azure HDInsight
Introduction to Azure HDInsightIntroduction to Azure HDInsight
Introduction to Azure HDInsight
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014
 
Planing and optimizing data lake architecture
Planing and optimizing data lake architecturePlaning and optimizing data lake architecture
Planing and optimizing data lake architecture
 
Data Science with Hadoop: A Primer
Data Science with Hadoop: A PrimerData Science with Hadoop: A Primer
Data Science with Hadoop: A Primer
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 

En vedette

Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data ApplicationsRichard McDougall
 
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Jeffrey Breen
 
How to Profit from Factoring 2015
How to Profit from Factoring 2015How to Profit from Factoring 2015
How to Profit from Factoring 2015Michael Ponomarew
 
Fish Sticks by Stephen C Lundin, John Christensen and Harry Paul
Fish Sticks by Stephen C Lundin, John Christensen and Harry PaulFish Sticks by Stephen C Lundin, John Christensen and Harry Paul
Fish Sticks by Stephen C Lundin, John Christensen and Harry Paulvandananicky
 
What is system level analysis
What is system level analysisWhat is system level analysis
What is system level analysisCAST
 
Rate zonal centrifugation and Its applications
Rate zonal centrifugation and Its applicationsRate zonal centrifugation and Its applications
Rate zonal centrifugation and Its applicationsPaul singh
 
Top 10 team coordinator interview questions and answers
Top 10 team coordinator interview questions and answersTop 10 team coordinator interview questions and answers
Top 10 team coordinator interview questions and answersjanritari
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesDataWorks Summit
 
Financial aspects of marketing management
Financial aspects of marketing managementFinancial aspects of marketing management
Financial aspects of marketing managementBabasab Patil
 
Moving From a Selenium Grid to the Cloud - A Real Life Story
Moving From a Selenium Grid to the Cloud - A Real Life StoryMoving From a Selenium Grid to the Cloud - A Real Life Story
Moving From a Selenium Grid to the Cloud - A Real Life StorySauce Labs
 
IT Strategic Planning (Case Studies)
IT Strategic Planning (Case Studies)IT Strategic Planning (Case Studies)
IT Strategic Planning (Case Studies)Nurhazman Abdul Aziz
 
The purpose and Benefits of setting high standards for your work
The purpose and Benefits of setting high standards for your work The purpose and Benefits of setting high standards for your work
The purpose and Benefits of setting high standards for your work Cav1234
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
GRE Computer Raw Conversion Table
GRE Computer Raw Conversion TableGRE Computer Raw Conversion Table
GRE Computer Raw Conversion TableSuccess Prep
 
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...CA Technologies
 

En vedette (19)

Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
 
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)
 
How to Profit from Factoring 2015
How to Profit from Factoring 2015How to Profit from Factoring 2015
How to Profit from Factoring 2015
 
Fish Sticks by Stephen C Lundin, John Christensen and Harry Paul
Fish Sticks by Stephen C Lundin, John Christensen and Harry PaulFish Sticks by Stephen C Lundin, John Christensen and Harry Paul
Fish Sticks by Stephen C Lundin, John Christensen and Harry Paul
 
What is system level analysis
What is system level analysisWhat is system level analysis
What is system level analysis
 
Rate zonal centrifugation and Its applications
Rate zonal centrifugation and Its applicationsRate zonal centrifugation and Its applications
Rate zonal centrifugation and Its applications
 
Top 10 team coordinator interview questions and answers
Top 10 team coordinator interview questions and answersTop 10 team coordinator interview questions and answers
Top 10 team coordinator interview questions and answers
 
HW09 Hadoop Vaidya
HW09 Hadoop VaidyaHW09 Hadoop Vaidya
HW09 Hadoop Vaidya
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual Machines
 
Financial aspects of marketing management
Financial aspects of marketing managementFinancial aspects of marketing management
Financial aspects of marketing management
 
Moving From a Selenium Grid to the Cloud - A Real Life Story
Moving From a Selenium Grid to the Cloud - A Real Life StoryMoving From a Selenium Grid to the Cloud - A Real Life Story
Moving From a Selenium Grid to the Cloud - A Real Life Story
 
Progeny LIMS
Progeny LIMSProgeny LIMS
Progeny LIMS
 
Getting Past No
Getting Past NoGetting Past No
Getting Past No
 
IT Strategic Planning (Case Studies)
IT Strategic Planning (Case Studies)IT Strategic Planning (Case Studies)
IT Strategic Planning (Case Studies)
 
Matrix Effect
Matrix EffectMatrix Effect
Matrix Effect
 
The purpose and Benefits of setting high standards for your work
The purpose and Benefits of setting high standards for your work The purpose and Benefits of setting high standards for your work
The purpose and Benefits of setting high standards for your work
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
GRE Computer Raw Conversion Table
GRE Computer Raw Conversion TableGRE Computer Raw Conversion Table
GRE Computer Raw Conversion Table
 
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
Digital Assurance: Develop a Comprehensive Testing Strategy for Digital Trans...
 

Similaire à Introduction to Designing and Building Big Data Applications

Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital.AI
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Michael Rys
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine LearningMark Tabladillo
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Cloudera, Inc.
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowKaxil Naik
 
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkAlex Zeltov
 
Composable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldComposable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldDatabricks
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionSri Ambati
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaMopuru Babu
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AIJames Serra
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in AzureMostafa
 

Similaire à Introduction to Designing and Building Big Data Applications (20)

Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and SparkVital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Big Data Adavnced Analytics on Microsoft Azure
Introduction to Designing and Building Big Data Applications

  • 1. Introduction to Designing and Building Big Data Applications
      Tom Wheeler | Senior Curriculum Developer
      April 2014
  • 2. Agenda
      • Cloudera's Learning Path for Developers
      • Target Audience and Prerequisites
      • Course Outline
      • Short Presentation Based on Actual Course Material
      • Question and Answer Session
  • 3. Learning Path: Developers
      Create Powerful New Data Processing Tools
      • Developer Training
          • Learn to code and write MapReduce programs for production
          • Master advanced API topics required for real-world data analysis
      • HBase Training
          • Design schemas to minimize latency on massive data sets
          • Scale hundreds of thousands of operations per second
      • Intro to Data Science
          • Implement recommenders and data experiments
          • Draw actionable insights from analysis of disparate data
      • Big Data Applications
          • Build converged applications using multiple processing engines
          • Develop enterprise solutions using components across the EDH
      Aaron T. Myers, Software Engineer
  • 4. Hadoop Professionals: Build or Buy?
      Professional Certification Decreases Hiring Risk
      • 25%: An engineer with Hadoop skills requires a min. salary premium of 25%
      • $115K: Hadoop developers are now the top paid in tech, starting at $115K
      • $300K: Compensation for a very senior Data Scientist opens at $300K
      Sources:
      • Business Insider, “10 Tech Skills That Will Instantly Net You A $100,000+ Salary,” 11 August 2012.
      • Business Insider, “30 Tech Skills That Will Instantly Net You A $100,000+ Salary,” 21 February 2013.
      • GigaOm, “Big Data Skills Bring Big Dough,” 17 February 2012.
  • 5. Why Cloudera Training?
      Aligned to Best Practices and the Pace of Change
      1. Broadest Range of Courses: Developer, Admin, Analyst, HBase, Data Science
      2. Most Experienced Instructors: More than 20,000 students trained since 2009
      3. Leader in Certification: Over 8,000 accredited Cloudera professionals
      4. Trusted Source for Training: 100,000+ people have attended online courses
      5. State of the Art Curriculum: Courses updated as Hadoop evolves
      6. Widest Geographic Coverage: Most classes offered: 50 cities worldwide plus online
      7. Most Relevant Platform & Community: CDH deployed more than all other distributions combined
      8. Depth of Training Material: Hands-on labs and VMs support live instruction
      9. Ongoing Learning: Video tutorials and e-learning complement training
      10. Commitment to Big Data Education: University partnerships to teach Hadoop in the classroom
  • 6. Designing and Building Big Data Applications: About the Course
  • 7. Target Audience
      • Intended for people who write code, such as
          • Software Engineers
          • Data Engineers
          • ETL Developers
  • 8. Course Prerequisites
      • Successful completion of our Developer course
          • Or equivalent practical experience
      • Intermediate-level Java skills
      • Basic familiarity with Linux
      • Knowledge of SQL or HiveQL is also helpful
  • 9. Example of Required Java Skill Level
      • Do you understand the following code? Could you write something similar?

        package com.cloudera.example;

        import java.io.IOException;

        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        public class Example extends Mapper<LongWritable, Text, Text, IntWritable> {
            @Override
            public void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                // (method body is not shown on the slide)
            }
        }
  • 10. Example of Required Linux Skill Level
      • Are you comfortable editing text files on a Linux system?
      • Are you familiar with the following commands?

        $ mkdir -p /tmp/incoming/web_logs
        $ cd /var/log/web
        $ mv *.log /tmp/incoming/web_logs
  • 11. Course Objectives
      • During this course, you will learn to
          • Determine which Hadoop-related tools are appropriate for specific tasks
          • Understand how file formats, serialization, and data compression affect application compatibility and performance
          • Design and evolve schemas in Apache Avro
          • Create, populate, and access data sets with the Kite SDK
          • Integrate with external systems using Apache Sqoop and Apache Flume
          • Integrate Apache Flume with existing applications and develop custom components to extend Flume’s capabilities
  • 12. Course Objectives (continued)
      • Create, package, and deploy Oozie jobs to manage processing workflows
      • Develop Java-based data processing pipelines with Apache Crunch
      • Implement user-defined functions for use in Apache Hive and Impala
      • Index both static and streaming data sets with Cloudera Search
      • Use Hue to build a Web-based interface for Search queries
      • Integrate results from Impala and Cloudera Search into your applications
  • 13. Scenario for Hands-On Exercises
      • Frequent hands-on exercises
      • Based on a hypothetical but realistic scenario
      • Each works towards building a working application
      [Logo: Loudacre Mobile]
  • 14. Tools Used in Hands-On Exercises
      [Diagram: tools grouped under Ingest and Data Management, Interactive Queries, and Batch Processing]
      • Tools shown: HDFS, Sqoop, Flume, Kite SDK / Morphlines, HCatalog, Impala, Search, MapReduce, Crunch, Hive, Avro
  • 15. Data Sources Used in Hands-On Exercises
      [Diagram: source systems feeding the Enterprise Data Hub]
      • Source systems: RDBMS, Telecom Switches, CRM System, Point of Sale Terminals, Web Servers
      • Data sets: Equipment Records, Customer Records, Call Detail Records (Fixed-Width), Phone Activations (XML), Static Documents (HTML), Log Files (Text), Device Status (CSV and TSV), Chat Transcripts (JSON)
  • 16. Development Environment
      • Exercises use a real-world development environment
          • IDE (Eclipse)
          • Unit testing library (JUnit)
          • Build and configuration management tool (Maven)
  • 17. Course Outline
      • Introduction
      • Application Architecture *
      • Designing and Using Data Sets *
      • Using the Kite SDK Data Module *
      • Importing Relational Data with Apache Sqoop *
      • Capturing Data with Apache Flume *
      * This chapter contains a hands-on exercise
      * This chapter contains multiple hands-on exercises
  • 18. Course Outline (continued)
      • Developing Custom Flume Components *
      • Managing Workflows with Apache Oozie *
      • Processing Data Pipelines with Apache Crunch *
      • Working with Tables in Apache Hive *
      • Developing User-Defined Functions *
      • Executing Interactive Queries with Impala *
  • 19. Course Outline (continued)
      • Understanding Cloudera Search
      • Indexing Data with Cloudera Search *
      • Presenting Results to Users *
      • Conclusion
  • 20. Course Excerpt
      • Based on chapter 3: Designing and Using Data Sets
  • 21. What is Data Serialization?
      • Define the concept of serialization
          • Represents data as a series of bytes
          • Allows us to store and transmit data
      • There are many ways of serializing data
          • How do you serialize the number 108125150?
              • 4 bytes when stored as a Java int
              • 9 bytes when stored as text
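The two sizes quoted on this slide can be checked with a few lines of plain Java. This is only an illustrative sketch: the class and method names (`SerializationSizes`, `asBinary`, `asText`) are invented for this example and are not part of the course material, and UTF-8 is assumed for the text form.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written only to illustrate the slide.
final class SerializationSizes {

    /** Fixed-width binary form: a Java int always occupies 4 bytes. */
    static byte[] asBinary(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Text form: one byte per decimal digit when encoded as UTF-8. */
    static byte[] asText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    /** Reverses asBinary so the round trip can be verified. */
    static int fromBinary(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Reverses asText so the round trip can be verified. */
    static int fromText(byte[] bytes) {
        return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        int number = 108125150;
        System.out.println("binary: " + asBinary(number).length + " bytes");
        System.out.println("text:   " + asText(number).length + " bytes");
    }
}
```

The binary form stays at 4 bytes for every int, while the text form grows with the number of digits, which is one reason the choice of serialization affects storage space.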
  • 22. Implications of Data Serialization
      • Affects performance and storage space
      • Chosen method may limit portability
          • java.io.Serializable is Java-specific
          • Writables are Hadoop-specific
      • May also limit backwards compatibility
          • Often depends on specific version of class
      • Avro was developed to address these challenges
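To make the storage-space point concrete, the sketch below writes the same record twice using only the JDK: once with `java.io.Serializable` and once with a hand-written field-by-field encoding. The `Employee` class and both helper methods are hypothetical and exist only for this comparison. The hand-written encoding is not Avro's format; it merely stands in for a compact, schema-driven layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical example class; not taken from the course material.
final class SerializationOverhead {

    static final class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;
        final String title;
        final int bonus;

        Employee(int id, String name, String title, int bonus) {
            this.id = id;
            this.name = name;
            this.title = title;
            this.bonus = bonus;
        }
    }

    /** Java-specific serialization: the stream also carries class metadata. */
    static byte[] javaSerialized(Employee e) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(e);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    /** Hand-written encoding: only the field values, in a fixed order. */
    static byte[] handEncoded(Employee e) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(e.id);
            out.writeUTF(e.name);   // 2-byte length prefix plus the characters
            out.writeUTF(e.title);
            out.writeInt(e.bonus);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Employee alice = new Employee(108424, "Alice", "Salesperson", 2500);
        System.out.println("java.io.Serializable: " + javaSerialized(alice).length + " bytes");
        System.out.println("hand-written:         " + handEncoded(alice).length + " bytes");
    }
}
```

The Java-serialized form is larger because it embeds the class name and field descriptors, and it can only be read back by Java code that has a compatible version of the class. Those are the portability and compatibility limits the slide refers to.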
  • 23. What is Apache Avro?
      • Avro is an open source data serialization framework
      • Widely supported throughout Hadoop ecosystem
      • Offers compatibility without sacrificing performance
      • Data is serialized according to a schema you define
      • Read and write from Java, C, C++, C#, Python, PHP, etc.
      • Optimized binary encoding for efficient storage
      • Defines rules for schema evolution
  • 24. Avro Schemas
      • Avro schemas define the structure of your data
      • Similar to a CREATE TABLE in SQL, but more flexible
      • Defined using JSON syntax

        id      name   title        bonus
        108424  Alice  Salesperson  2500
        101837  Bob    Manager      3000
        107812  Chuck  President    9000
        105476  Dan    Accountant   3000

      (On the slide, the column names are labeled Metadata and the rows are labeled Data.)
  • 25. Simple Types in Avro Schemas
      • These are among the simple (scalar) types in Avro

        Name     Description
        null     An absence of a value
        boolean  A binary value
        int      32-bit signed integer
        long     64-bit signed integer
        float    Single-precision floating point value
        double   Double-precision floating point value
        string   Sequence of Unicode characters
  • 26. Complex Types in Avro Schemas
      • These are the complex types in Avro

        Name    Description
        record  A user-defined type composed of one or more named fields
        enum    A specified set of values
        array   Zero or more values of the same type
        map     Set of key-value pairs; key is string while value is of specified type
        union   Exactly one value matching a specified set of types
        fixed   A fixed number of 8-bit unsigned bytes
  • 27. Schema Example
      • SQL CREATE TABLE statement

        CREATE TABLE employees
          (id INT,
           name VARCHAR(30),
           title VARCHAR(20),
           bonus INT);
  • 28. Schema Example (Continued)
      • Equivalent Avro schema

        {"namespace": "com.loudacre.data",
         "type": "record",
         "name": "Employee",
         "fields": [
           {"name": "id", "type": "int"},
           {"name": "name", "type": "string"},
           {"name": "title", "type": "string"},
           {"name": "bonus", "type": "int"}
         ]}
  • 29. Mapping Avro Schema to Java Object
      • Approaches for mapping a Java object to a schema
          • Generic: Write code to map each field manually
          • Reflect: Generate a schema from an existing class
          • Specific: Generate a Java class from your schema
  • 30. Considerations for File Formats
      • Hadoop and its ecosystem support many file formats
          • May ingest in one format and convert to another
      • Format selection involves several considerations
          • Ingest pattern
          • Tool compatibility
          • Expected lifetime
          • Storage and performance requirements
  • 31. Data Compression
      • Each file format may also support compression
          • Reduces amount of disk space required to store data
      • Tradeoff between time and space
      • Can greatly improve performance
          • Many Hadoop jobs are I/O-bound
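The time-versus-space tradeoff can be seen with the JDK's built-in GZIP streams. The sketch below is only an illustration under stated assumptions: GZIP is just one of several codecs used with Hadoop, the log lines are invented sample data, and the class name `CompressionDemo` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class; the sample data is made up.
final class CompressionDemo {

    /** Spends CPU time to produce a smaller byte array. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores the original bytes from the compressed form. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int count;
            while ((count = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Builds repetitive, log-like text, which tends to compress well. */
    static byte[] sampleLog(int lines) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            text.append("10.0.0.").append(i % 256).append(" - - GET /index.html 200\n");
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleLog(1000);
        long start = System.nanoTime();
        byte[] packed = compress(raw);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes in " + micros + " us");
    }
}
```

For an I/O-bound job, reading fewer bytes from disk can outweigh the CPU time spent decompressing them, which is the performance gain the slide describes.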
  • 32. Data Partitioning
      • Refers to organizing data according to access patterns
      • Improves performance by limiting input
      • Common partitioning schemes
          • Customers: partition by state, province, or region
          • Events: separate by year, month, and day
  • 33. Partitioning Example
      • Imagine that you store all Web server log files in HDFS
          • Marketing runs monthly jobs for search engine optimization
          • Security runs daily jobs to identify attempted exploits
      [Figure: directory tree for 2014 with month directories (March, April, May) and numbered day directories (01 to 13); a month directory is marked "Input for monthly job" and a single day directory is marked "Input for daily job"]
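The year/month/day layout from this example can be sketched as a small path-building helper. Everything here is an assumption made for illustration: the root directory `/data/web_logs`, the class name `LogPartitions`, and the zero-padded directory names are not taken from the course material.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the root path and naming scheme are assumptions.
final class LogPartitions {

    static final String ROOT = "/data/web_logs";

    /** Directory holding one day of logs, e.g. ROOT/2014/04/07. */
    static String dailyPath(LocalDate day) {
        return String.format(Locale.ROOT, "%s/%04d/%02d/%02d",
                ROOT, day.getYear(), day.getMonthValue(), day.getDayOfMonth());
    }

    /** Directory holding a whole month of logs, e.g. ROOT/2014/04. */
    static String monthlyPath(YearMonth month) {
        return String.format(Locale.ROOT, "%s/%04d/%02d",
                ROOT, month.getYear(), month.getMonthValue());
    }

    /** Every daily directory under one month, for tools that need explicit inputs. */
    static List<String> dailyPathsIn(YearMonth month) {
        List<String> paths = new ArrayList<>();
        for (int day = 1; day <= month.lengthOfMonth(); day++) {
            paths.add(dailyPath(month.atDay(day)));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Security's daily job reads a single day directory.
        System.out.println(dailyPath(LocalDate.of(2014, 4, 7)));
        // Marketing's monthly job reads the whole month directory.
        System.out.println(monthlyPath(YearMonth.of(2014, 4)));
    }
}
```

With this layout, each job names only the directories it needs as input, so the daily job never scans the rest of the month and the monthly job never scans other months.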
  • 35. Register for training and certification at http://university.cloudera.com
      Use discount code Apps10 to save 10% on new enrollments in Big Data Applications classes delivered by Cloudera until July 4, 2014*
      • Enter questions in the Q&A panel
      • Follow Cloudera University: @ClouderaU
      • Follow the Developer learning path: http://university.cloudera.com/developers
      • Learn about the enterprise data hub: http://tinyurl.com/edh-webinar
      • Join the Cloudera user community: http://community.cloudera.com/
      • Get Developer Certification: http://university.cloudera.com/certification
      • Explore Developer resources for Hadoop: http://cloudera.com/content/dev-center/en/home.html
      * Excludes classes sold or delivered by other partners