Java BigData Full Stack
Development as is ...
Alexey Zinovyev, Java Trainer at EPAM
About
In IT since 2007
Working with Java since 2009
Working with Hadoop since 2012
At EPAM since 2015
3Java Big Data Full Stack Development
Contacts
E-mail : Alexey_Zinovyev@epam.com
Twitter : @zaleslaw @BigDataRussia
vk.com/big_data_russia Big Data Russia
vk.com/java_jvm Java & JVM langs
4Java Big Data Full Stack Development
The Good Old Days
5Java Big Data Full Stack Development
HRs & RMs are looking for Java developers
6Java Big Data Full Stack Development
Is the Java Dream Team waiting for you?
7Java Big Data Full Stack Development
Required Skills
• Advanced SQL
• Basic Linux
• Core Java & JVM
• Backend Development Experience
• Basic Computer Science Knowledge
8Java Big Data Full Stack Development
REAL WORLD
9Java Big Data Full Stack Development
Let’s just use JavaScript on the frontend ONLY
10Java Big Data Full Stack Development
On the frontend
ONLY?
11Java Big Data Full Stack Development
Cruel world
12Java Big Data Full Stack Development
Do you know any ML libraries for JS?
13Java Big Data Full Stack Development
Wild animals everywhere
14Java Big Data Full Stack Development
And what I tell you
16Java Big Data Full Stack Development
It’s time for a Java superhero, yeah!
17Java Big Data Full Stack Development
Before discovering patterns you should ...
• Select small pieces of the data
• Define default values for missing data
• Remove strange signals from the data
• Merge several tables into one if required
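
Two of the preparation steps above (default values for missing data, removing strange signals) can be sketched in plain Java. This is only an illustration under stated assumptions: the class `DataPrepSketch`, its method names, and the sample readings are hypothetical and do not come from the slides, and "strange signal" is simplified here to "value outside an expected range".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating two data preparation steps. */
class DataPrepSketch {

    /** Replaces missing readings (null) with the given default value. */
    static List<Double> fillMissing(List<Double> values, double defaultValue) {
        List<Double> result = new ArrayList<>();
        for (Double v : values) {
            result.add(v == null ? defaultValue : v);
        }
        return result;
    }

    /** Keeps only values inside [min, max]; everything else is dropped as a strange signal. */
    static List<Double> removeOutOfRange(List<Double> values, double min, double max) {
        List<Double> result = new ArrayList<>();
        for (Double v : values) {
            if (v != null && v >= min && v <= max) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample: one missing reading and one obvious glitch.
        List<Double> raw = new ArrayList<>();
        raw.add(36.6);
        raw.add(null);
        raw.add(999.0);
        raw.add(37.1);

        List<Double> filled = fillMissing(raw, 36.6);
        List<Double> cleaned = removeOutOfRange(filled, 30.0, 45.0);
        System.out.println(cleaned);
    }
}
```

In a real project the default value and the valid range would come from domain knowledge, not from constants in code.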
18Java Big Data Full Stack Development
How it really works
• Share your data with us
• Our magic manipulations
• Building an answering machine
• PROFIT!!!
19Java Big Data Full Stack Development
How to start?
20Java Big Data Full Stack Development
21Java Big Data Full Stack Development
WHAT IS BIG DATA?
22Java Big Data Full Stack Development
Joke about Excel
23Java Big Data Full Stack Development
5V
24Java Big Data Full Stack Development
Every 60 seconds…
25Java Big Data Full Stack Development
From Mobile Devices
26Java Big Data Full Stack Development
From Industry
27Java Big Data Full Stack Development
We started to keep and handle stupid new things!
28Java Big Data Full Stack Development
10^6 rows
in MySQL
29Java Big Data Full Stack Development
GB->TB->PB->?
30Java Big Data Full Stack Development
Is BigData about PBs?
32Java Big Data Full Stack Development
It’s hard to …
• .. store
• .. handle
• .. search in
• .. visualize
• .. send over the network
33Java Big Data Full Stack Development
Likes in Classmates: how do we count them?
34Java Big Data Full Stack Development
Crazy Zoo
2012
35Java Big Data Full Stack Development
Crazy Zoo
2016
36Java Big Data Full Stack Development
What this training will cover
37Java Big Data Full Stack Development
NOSQL
38Java Big Data Full Stack Development
What’s the problem with RDBMSs?
• Caching
• Master/Slave
• Cluster
• Table Partitioning
• Sharding
39Java Big Data Full Stack Development
Family
40Java Big Data Full Stack Development
Database
party
41Java Big Data Full Stack Development
Spring Data
42Java Big Data Full Stack Development
How to start?
43Java Big Data Full Stack Development
Java MongoDB Driver + Robomongo
44Java Big Data Full Stack Development
BIG DATA TOOL MASTER
VS
DATA SCIENTIST
45Java Big Data Full Stack Development
TRAIN
MODEL
46Java Big Data Full Stack Development
Datasets
• Facebook users, tweets
• Trade transactions
• Government
• Medicine (genomic data)
• Telecommunications
47Java Big Data Full Stack Development
Data Sources
• Relational Databases
• Data warehouses (Historical data)
• Files in CSV or in binary format
• Internet or e-mail
• Scientific and research tools (R, Octave, Matlab)
48Java Big Data Full Stack Development
Hey, man, predict something!
49Java Big Data Full Stack Development
Man or sofa?
53Java Big Data Full Stack Development
Typical questions for DM
• Which loan applicants are high-risk?
• How do we detect phone card fraud?
• What is the revenue prediction for next year?
• Can you recommend music for users?
54Java Big Data Full Stack Development
Is the green circle a blue square or a red
triangle? Let’s ask its neighbors!
kNN (k-nearest neighbors)
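
The "ask its neighbors" idea can be sketched in a few lines of plain Java. This is a minimal illustration, not the code from the talk: the class `KnnSketch`, the 2D points, and the labels are assumptions made for the example. The unknown point gets the label that wins the vote among its k nearest training points (Euclidean distance, ties resolved in favor of the closer neighbors).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical k-nearest-neighbors classifier for labeled 2D points. */
class KnnSketch {

    /** A labeled training point, e.g. a "blue square" at (x, y). */
    static final class Point {
        final double x;
        final double y;
        final String label;

        Point(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    static double squaredDistance(Point p, double x, double y) {
        double dx = p.x - x;
        double dy = p.y - y;
        return dx * dx + dy * dy;
    }

    /** Returns the majority label among the k training points closest to (x, y). */
    static String classify(List<Point> training, double x, double y, int k) {
        List<Point> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(p -> squaredDistance(p, x, y)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(p.label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = p.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> training = new ArrayList<>();
        training.add(new Point(1, 1, "blue square"));
        training.add(new Point(1, 2, "blue square"));
        training.add(new Point(2, 1, "blue square"));
        training.add(new Point(5, 5, "red triangle"));
        training.add(new Point(6, 5, "red triangle"));
        training.add(new Point(5, 6, "red triangle"));

        // The "green circle" at (2, 2) asks its 3 nearest neighbors.
        System.out.println(classify(training, 2, 2, 3));
    }
}
```

Sorting all points is fine for a toy dataset; for large datasets an index structure or a distributed implementation would be needed.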
55Java Big Data Full Stack Development
Collaborative Filtering
56Java Big Data Full Stack Development
Machine Learning vs Traditional Programming
57Java Big Data Full Stack Development
Data
Science
58Java Big Data Full Stack Development
Can a Java programmer be a Data Scientist?
59Java Big Data Full Stack Development
Sexy Data Scientist
60Java Big Data Full Stack Development
Real Data Scientist
61Java Big Data Full Stack Development
How to start?
62Java Big Data Full Stack Development
Weka
63Java Big Data Full Stack Development
HADOOP
64Java Big Data Full Stack Development
Hadoop and Data Knights
65Java Big Data Full Stack Development
Hadoop
66Java Big Data Full Stack Development
MapReduce in different languages
67Java Big Data Full Stack Development
MapReduce for WordCount
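
The WordCount idea can be shown without a cluster. The sketch below is plain Java and deliberately does not use the Hadoop API; the class `WordCountSketch` and its method names are assumptions made for illustration. It mimics the three phases: map emits (word, 1) pairs, shuffle groups the pairs by word, reduce sums the counts for each word.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process imitation of the MapReduce WordCount job. */
class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every word in a line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle phase: group all emitted values by their key. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the counts collected for one word. */
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs map, shuffle and reduce over all input lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("big data is big");
        lines.add("data is data");
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the map and reduce functions run on different machines and the framework performs the shuffle; only the per-record logic stays this small.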
68Java Big Data Full Stack Development
Hadoop
Jobs
69Java Big Data Full Stack Development
Hadoop frameworks
• Universal (MapReduce, Tez, RDD in Spark)
• Abstract (Pig, Spark Pipeline API)
• SQL-like (Hive, Impala, Spark SQL)
• Graph processing (Giraph, GraphX)
• Machine Learning (Mahout, MLlib)
• Stream processing (Spark Streaming, Storm)
70Java Big Data Full Stack Development
SPARK
71Java Big Data Full Stack Development
SPARK: the bloody son of MR
• MapReduce in memory
• Up to 50x faster than Hadoop MapReduce (depending on the workload)
• RDD is the basic building block (an immutable distributed collection of objects)
• Pipeline API (no need for Pig)
72Java Big Data Full Stack Development
Spark
Family
73Java Big Data Full Stack Development
MLlib supports
• Classification and regression
• Collaborative filtering
• Clustering
• Dimensionality reduction
• Optimization
74Java Big Data Full Stack Development
Code sample MLlib (K-Means)
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;

// Assumption: parsedData is a JavaRDD<Vector> of feature vectors
// and sc is a JavaSparkContext, both created earlier.

// Cluster the data into two classes using KMeans
int numClusters = 2;
int numIterations = 20;
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

// Evaluate clustering by computing Within Set Sum of Squared Errors
double WSSSE = clusters.computeCost(parsedData.rdd());
System.out.println("Within Set Sum of Squared Errors = " + WSSSE);

// Save and load model
clusters.save(sc.sc(), "myModelPath");
KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
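
The "Within Set Sum of Squared Errors" printed by the sample is the sum, over all points, of the squared distance from each point to its nearest cluster center. The sketch below computes that quantity in plain Java for a tiny made-up dataset. It is an illustration of the metric only, not MLlib's implementation; the class `WssseSketch` and the sample points and centers are assumptions.

```java
/** Hypothetical plain-Java computation of the Within Set Sum of Squared Errors. */
class WssseSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Sums, for every point, the squared distance to its closest center. */
    static double wssse(double[][] points, double[][] centers) {
        double total = 0.0;
        for (double[] p : points) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centers) {
                best = Math.min(best, squaredDistance(p, c));
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two obvious groups of points and one center per group.
        double[][] points = {{0, 0}, {0, 2}, {10, 0}, {10, 2}};
        double[][] centers = {{0, 1}, {10, 1}};
        System.out.println("Within Set Sum of Squared Errors = " + wssse(points, centers));
    }
}
```

A lower value means the points sit closer to their centers, which is why the metric is often compared across different numbers of clusters.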
75Java Big Data Full Stack Development
MLlib
• .. is inspired by scikit-learn (Python lib) and Mahout
• .. runs fully on Spark and supports Spark’s Pipeline API
• .. represents a dataset as Spark SQL’s SchemaRDD (renamed DataFrame in Spark 1.3)
• .. supports Hive as an external data source
• .. works well for large datasets and parallelized algorithms
76Java Big Data Full Stack Development
It solves all problems!
77Java Big Data Full Stack Development
How to start?
78Java Big Data Full Stack Development
HDP Zoo
79Java Big Data Full Stack Development
Ok, Google!
80Java Big Data Full Stack Development
AWS Amazon
81Java Big Data Full Stack Development
Infrastructure issues are waiting for YOU!
82Java Big Data Full Stack Development
DEEP LEARNING
83Java Big Data Full Stack Development
Deep Learning helps us build a NEW FUTURE
85Java Big Data Full Stack Development
HOW TO LEARN?
90Java Big Data Full Stack Development
DIFFERENT WAYS
1. Read books and write ‘pet’ projects
2. Become a mentee in a mentoring program
3. Take a MOOC
4. Take a training course
5. Visit conferences
91Java Big Data Full Stack Development
Recommended Books
92Java Big Data Full Stack Development
Contacts
E-mail : Alexey_Zinovyev@epam.com
Twitter : @zaleslaw @BigDataRussia
vk.com/big_data_russia Big Data Russia
vk.com/java_jvm Java & JVM langs

Java BigData Full Stack Development (version 2.0)

  • 1. Java BigData Full Stack Development as is ... Alexey Zinovyev, Java Trainer in EPAM
  • 2. About With IT since 2007 With Java since 2009 With Hadoop since 2012 With EPAM since 2015
  • 3. 3Java Big Data Full Stack Development Contacts E-mail : Alexey_Zinovyev@epam.com Twitter : @zaleslaw @BigDataRussia vk.com/big_data_russia Big Data Russia vk.com/java_jvm Java & JVM langs
  • 4. 4Java Big Data Full Stack Development The Good Old Days
  • 5. 5Java Big Data Full Stack Development HRs & RMs are looking for Java developers
  • 6. 6Java Big Data Full Stack Development Is Java Dream Team waiting You?
  • 7. 7Java Big Data Full Stack Development Required Skills • Advanced SQL • Basic Linux • Core Java & JVM • Backend Development Experience • Basic Computer Science Level
  • 8. 8Java Big Data Full Stack Development REAL WORLD
  • 9. 9Java Big Data Full Stack Development Let’s just use Javascript in frontend ONLY
  • 10. 10Java Big Data Full Stack Development In frontend ONLY?
  • 11. 11Java Big Data Full Stack Development Cruel world
  • 12. 12Java Big Data Full Stack Development Do you know ML JS library?
  • 13. 13Java Big Data Full Stack Development Wild animals everywhere
  • 14. 14Java Big Data Full Stack Development And what I tell you
  • 15. 15Java Big Data Full Stack Development And what I tell you
  • 16. 16Java Big Data Full Stack Development It’s Time for Java Superhero, yeah!
  • 17. 17Java Big Data Full Stack Development Before patterns discovering you should .. • Select small pieces • Define default values for missed data • Remove strange signals from data • Merge some tables in one if required
  • 18. 18Java Big Data Full Stack Development How it really works • Share your date with us • Our magic manipulations • Building an answering machine • PROFIT!!!
  • 19. 19Java Big Data Full Stack Development How to start?
  • 20. 20Java Big Data Full Stack Development
  • 21. 21Java Big Data Full Stack Development WHAT IS BIG DATA?
  • 22. 22Java Big Data Full Stack Development Joke about Excel
  • 23. 23Java Big Data Full Stack Development 5V
  • 24. 24Java Big Data Full Stack Development Every 60 seconds…
  • 25. 25Java Big Data Full Stack Development From Mobile Devices
  • 26. 26Java Big Data Full Stack Development From Industry
  • 27. 27Java Big Data Full Stack Development We started to keep and handle stupid new things!
  • 28. 28Java Big Data Full Stack Development 10^6 rows in MySQL
  • 29. 29Java Big Data Full Stack Development GB->TB->PB->?
  • 30. 30Java Big Data Full Stack Development Is BigData about PBs?
  • 31. 31Java Big Data Full Stack Development Is BigData about PBs?
  • 32. 32Java Big Data Full Stack Development It’s hard to … • .. store • .. handle • .. search in • .. visualize • .. send in network
  • 33. 33Java Big Data Full Stack Development Likes in Classmates: how to count?
  • 34. 34Java Big Data Full Stack Development Crazy Zoo 2012
  • 35. 35Java Big Data Full Stack Development Crazy Zoo 2016
  • 36. What this training will cover
  • 37. NOSQL
  • 38. What’s the problem with RDBMSs • Caching • Master/Slave • Cluster • Table Partitioning • Sharding
  • 39. Family
  • 40. Database party
  • 41. Spring Data
  • 42. How to start?
  • 43. Java MongoDB Driver + Robomongo
  • 44. BIG DATA TOOL MASTER VS DATA SCIENTIST
  • 45. TRAIN MODEL
  • 46. Datasets • Facebook users, tweets • Trade transactions • Government • Medicine (genomic data) • Telecommunications
  • 47. Data Sources • Relational databases • Data warehouses (historical data) • Files in CSV or binary format • Internet or email • Scientific and research tools (R, Octave, Matlab)
  • 48. Hey, man, predict something!
  • 49. Man or sofa?
  • 50. Typical questions for DM • Which loan applicants are high-risk?
  • 51. Typical questions for DM • Which loan applicants are high-risk? • How do we detect phone card fraud?
  • 52. Typical questions for DM • Which loan applicants are high-risk? • How do we detect phone card fraud? • What is the revenue prediction for next year?
  • 53. Typical questions for DM • Which loan applicants are high-risk? • How do we detect phone card fraud? • What is the revenue prediction for next year? • Can you recommend music for users?
  • 54. Is the green circle a blue square or a red triangle? Let’s ask its neighbors! kNN (k-nearest neighbors)
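The "ask its neighbors" idea can be sketched as a tiny kNN classifier in plain Java. This is a minimal illustration, assuming 2-D points, Euclidean distance, and a simple majority vote; the `Point` class and the label strings are hypothetical, and real libraries such as Weka expose their own APIs for this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KnnSketch {
    // Hypothetical labeled training point in 2-D.
    static final class Point {
        final double x;
        final double y;
        final String label;

        Point(double x, double y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }
    }

    // Squared Euclidean distance; the square root is not needed for ranking.
    static double squaredDistance(Point p, double x, double y) {
        double dx = p.x - x;
        double dy = p.y - y;
        return dx * dx + dy * dy;
    }

    // Classify (x, y) by majority vote among its k nearest training points.
    // On a tie, the label that reaches the top vote count first wins,
    // which favors closer neighbors.
    static String classify(List<Point> training, double x, double y, int k) {
        List<Point> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(p -> squaredDistance(p, x, y)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (Point p : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(p.label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = p.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> training = Arrays.asList(
                new Point(0, 0, "blue square"), new Point(0, 1, "blue square"),
                new Point(1, 0, "blue square"), new Point(5, 5, "red triangle"),
                new Point(5, 6, "red triangle"), new Point(6, 5, "red triangle"));
        // The "green circle" is the unlabeled query point.
        System.out.println(classify(training, 0.5, 0.5, 3));
    }
}
```

Sorting the whole training set costs O(n log n) per query, which is fine for a sketch but is exactly the part that needs indexing or distribution on large datasets.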
  • 55. Collaborative Filtering
  • 56. Machine Learning vs Traditional Programming
  • 57. Data Science
  • 58. Can a Java programmer be a Data Scientist?
  • 59. Sexy Data Scientist
  • 60. Real Data Scientist
  • 61. How to start?
  • 62. Weka
  • 63. HADOOP
  • 64. Hadoop and Data Knights
  • 65. Hadoop
  • 66. MapReduce in different languages
  • 67. MapReduce for WordCount
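The transcript keeps only the title of the WordCount slide, so the following is not the slide's code. It is a single-JVM sketch of the same map, shuffle, and reduce phases in plain Java, with no Hadoop classes involved; the method names are assumptions chosen to mirror the phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all counts collected for one word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: run map on every line, group pairs by key (the "shuffle"),
    // then run reduce on each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("big data", "Big Java data")));
    }
}
```

In a real Hadoop job the map and reduce functions run on different nodes and the framework performs the grouping step between them.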
  • 68. Hadoop Jobs
  • 69. Hadoop frameworks • Universal (MapReduce, Tez, RDD in Spark) • Abstract (Pig, Spark Pipeline) • SQL-like (Hive, Impala, Spark SQL) • Graph processing (Giraph, GraphX) • Machine Learning (Mahout, MLlib) • Stream processing (Spark Streaming, Storm)
  • 70. SPARK
  • 71. SPARK: the bloody son of MR • MapReduce in memory • Up to 50x faster than Hadoop • RDD is a basic building block (immutable distributed collections of objects) • Pipeline API (no need for Pig)
  • 72. Spark Family
  • 73. MLlib supports • Classification and regression • Collaborative filtering • Clustering • Dimensionality reduction • Optimization
  • 74. Code sample MLlib (K-Means)
        // Cluster the data into two classes using KMeans
        int numClusters = 2;
        int numIterations = 20;
        KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

        // Evaluate clustering by computing Within Set Sum of Squared Errors
        double WSSSE = clusters.computeCost(parsedData.rdd());
        System.out.println("Within Set Sum of Squared Errors = " + WSSSE);

        // Save and load model
        clusters.save(sc.sc(), "myModelPath");
        KMeansModel sameModel = KMeansModel.load(sc.sc(), "myModelPath");
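To make the training and cost steps in the sample less of a black box, here is a plain-Java sketch of Lloyd's k-means iteration and of the within-set sum of squared errors. It is not MLlib's implementation: it runs on a local array, and it seeds the centers with the first `numClusters` points, which is an assumption made to keep the example deterministic.

```java
import java.util.Arrays;

final class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the center closest to point p.
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(p, centers[c]) < squaredDistance(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Lloyd's algorithm: assign each point to its nearest center,
    // then move each center to the mean of its assigned points.
    static double[][] train(double[][] points, int numClusters, int numIterations) {
        int dim = points[0].length;
        double[][] centers = new double[numClusters][];
        for (int c = 0; c < numClusters; c++) {
            centers[c] = points[c].clone(); // assumption: seed with the first points
        }
        for (int it = 0; it < numIterations; it++) {
            double[][] sums = new double[numClusters][dim];
            int[] counts = new int[numClusters];
            for (double[] p : points) {
                int c = nearest(p, centers);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            for (int c = 0; c < numClusters; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int i = 0; i < dim; i++) {
                    centers[c][i] = sums[c][i] / counts[c];
                }
            }
        }
        return centers;
    }

    // Within Set Sum of Squared Errors: squared distance of every point
    // to its nearest center, summed over all points.
    static double computeCost(double[][] points, double[][] centers) {
        double cost = 0;
        for (double[] p : points) {
            cost += squaredDistance(p, centers[nearest(p, centers)]);
        }
        return cost;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 0}, {0, 2}, {10, 2}};
        double[][] centers = train(points, 2, 20);
        System.out.println(Arrays.deepToString(centers));
        System.out.println("Within Set Sum of Squared Errors = " + computeCost(points, centers));
    }
}
```

A lower WSSSE means tighter clusters, but the value always drops as the number of clusters grows, so it is mainly useful for comparing runs with the same `numClusters`.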
  • 75. MLlib • .. builds on ideas from scikit-learn (Python lib) and Mahout • .. runs fully on Spark and supports Spark’s Pipeline API • .. represents a dataset as Spark SQL’s SchemaRDD (renamed DataFrame in Spark 1.3) • .. supports Hive as an external data source • .. works well for large datasets and parallelized algorithms
  • 76. It solves all problems!
  • 77. How to start?
  • 78. HDP Zoo
  • 79. Ok, Google!
  • 80. AWS Amazon
  • 81. Infrastructure issues are waiting for YOU!
  • 82. DEEP LEARNING
  • 83. Deep Learning helps us build a NEW FUTURE
  • 84. Deep Learning helps us build a NEW FUTURE
  • 85. HOW TO LEARN?
  • 86. DIFFERENT WAYS 1. Read books and write ‘pet’ projects
  • 87. DIFFERENT WAYS 1. Read books and write ‘pet’ projects 2. Become a mentee in Mentoring Process
  • 88. DIFFERENT WAYS 1. Read books and write ‘pet’ projects 2. Become a mentee in Mentoring Process 3. MOOC
  • 89. DIFFERENT WAYS 1. Read books and write ‘pet’ projects 2. Become a mentee in Mentoring Process 3. MOOC 4. Take a training course
  • 90. DIFFERENT WAYS 1. Read books and write ‘pet’ projects 2. Become a mentee in Mentoring Process 3. MOOC 4. Take a training course 5. Visit conferences
  • 91. Recommended Books
  • 92. Contacts E-mail : Alexey_Zinovyev@epam.com Twitter : @zaleslaw @BigDataRussia vk.com/big_data_russia Big Data Russia vk.com/java_jvm Java & JVM langs