SlideShare une entreprise Scribd logo
1  sur  32
Télécharger pour lire hors ligne
Big Data
Dipl. Inform.(FH) Jony Sugianto, M. Comp. Sc.
Hp:0838-98355491
WA:0812-13086659
Email:jonysugianto@gmail.com
Agenda
● What is Big Data?
● Analytic
● Big Data Platforms
● Questions
What is Big Data?
● The Basic idea behind the phrase Big Data is that
everything we do is increasingly leaving a digital trace(or
data), which we(and others) can use and analyse
● Big Data therefore refers to our ability to make use of the
ever increasing volumes of data
● Big Data is not about the size of the data, it's about the
value within the data
Datafication of the world
● Activities
- Web Browser
- Credit Cards
- E-Commerce
● Conversations
- WhatsApp
- Email
- Twitter
● Photos/Videos
- Instagram
- You Tube
● Sensors
- Gps
● Etc...
Turning Big Data into Value
Datafication of
our world
● Activity
● Conversation
● Sensors
● Photo/Video
● Etc...
Analysing Big
Data
● Text Analytics
● Sentiment
Analysis
● Movement
Analytics
● Face/Voice
Recognition
● Etc...
Value
Webdata
● Log Data(all user)
- Anonymous ID from Cookie Data
- LoginID (if exist)
- ArticleId
- Kanal / Category
- Browser
- IP
- Etc...
● Registered User Data(10 %)
- Login ID
- Name
- Age
- Gender
- Education
- Etc...
Valuable data
● User activness
● User interest based on reading behaviour
● Personal Profile for all user
Compute
Compute
Why use the UA 2 ?
User Activness and User Interest
How to update the User activness?
How to update the User activness?
New-UA=w_history * UA-sofar + w_current * UA-Per-Minggu
w_history=0.75
w_current=0.25
Asigning personal profile
Final data
How to define the similarity?
● Linear |x1 – x2|
● Square (x1 – x2)^2
● Exponential 10^f(|x1-x2|)
Complexity Analysis
● Assume 30.000.000 click a day
● A week: 210.000.000 click
● Size log entry: 1 kb
● Total size: 210.000.000.000 byte = 210 Gb
Complexity Analysis
● All User : 10.000.000
● Loginuser: 1.000.000
● Comparison per second per CPU: 1.000.000
● Total Comparison: 9.000.000.000.000
● Total Time: 9.000.000 second=104 hari
Big Data Platforms
What is the different?
What is Hadoop?
● Hadoop:
an open-source framework that supports data-intensive
distributed applications, licensed under apache v2 license
● Goals:
- Abstract and facilitate the storage and processing of large
and/or rapidly growing data sets
- High scalability and availability
- Use commodity Hardware
- Fault-tolerance
- Move computation rather than data
Hadoop Components
● Hadoop Distributed File System(HDFS)
A distributed file system that provides high-throughput access to
application data
● Hadoop YARN
A framework for job scheduling and cluster resource
management
● Hadoop MapReduce
A Yarn-based system for parallel processing of large data sets
What is Hive?
● Hive is a data warehouse infrastructure built on top of
Hadoop
● Hive stored data in the HDFS
● Hive compile SQL Queries into MapReduce jobs
Example Hive Script
What is Pig?
● Pig is a platform for analyzing large data sets that consist
of a high-level language for expressing data analysis
programs
● Pig generates and compiles a MapReduce program on the
fly
Example Pig Script
What is Spark?
● Fast and general purpose cluster computing system
● 10x(on disk) – 100x(in-memory) faster than Hadoop
MapReduce
● Provides high level APIs in
-Scala
-Java
-Python
● Can be deployed through Apache Mesos, Apache Hadoop
via YARN, or Spark's cluster manager
Resilient Distributed Datasets
● Written in scala
● Fundamental Unit of data in spark
● Distributed collection of object
● Resilient-Ability to recompute missing partions(node failure)
● Distributed-Split across multiple partions
● Dataset-Can contains any type, Scala/Java/Python Object or User
defined object
● Operations
-Transformations(map, filter, groupBy,...)
-Actions(count, collect, save, ...)
Spark Example
// Spark wordcount
object WordCount {
def main(args: Array[String]) {
val env = new SparkContext("local","wordCount")
val data = List("hi","how are you","hi")
val dataSet = env.parallelize(data)
val words = dataSet.flatMap(value => value.split("s+"))
val mappedWords = words.map(value => (value,1))
val sum = mappedWords.reduceByKey(_+_)
println(sum.collect())
}
}
What is Flink?
● Written in java
● An open source platform for distributed stream and batch
data processing
● Several APIs in Java/Scala/Python
-DataSet API – Batch processing
-DataStream API – Stream processing
-Table API – Relational Queries
Flink Example
// Flink wordcount
object WordCount {
def main(args: Array[String]) {
val env = ExecutionEnvironment.getExecutionEnvironment
val data = List("hi","how are you","hi")
val dataSet = env.fromCollection(data)
val words = dataSet.flatMap(value => value.split("s+"))
val mappedWords = words.map(value => (value,1))
val grouped = mappedWords.groupBy(0)
val sum = grouped.sum(1)
println(sum.collect())
}
}
Question ?

Contenu connexe

Tendances

Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data AnalyticsS P Sajjan
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big AnalyticsAjay Ohri
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big DataLewis Crawford
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation17aroumougamh
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataHaluan Irsad
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleSpringPeople
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataVipin Batra
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analyticsNatalino Busa
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache HadoopSuman Saurabh
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsNguyen Cao
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP vinoth kumar
 

Tendances (20)

Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big Analytics
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big Data
 
Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big Data, Baby Steps
Big Data, Baby StepsBig Data, Baby Steps
Big Data, Baby Steps
 
Big Data Tutorial V4
Big Data Tutorial V4Big Data Tutorial V4
Big Data Tutorial V4
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeople
 
Big data 101
Big data 101Big data 101
Big data 101
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analytics
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 

En vedette

Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesCodePolitan
 
Brighten Your Future With IT : Why I Need to Start Learn Programming
Brighten Your Future With IT : Why I Need to Start Learn ProgrammingBrighten Your Future With IT : Why I Need to Start Learn Programming
Brighten Your Future With IT : Why I Need to Start Learn ProgrammingMuhammad Singgih Z.A
 
Memaksimalkan Non-Blocking IO pada Node.js
Memaksimalkan Non-Blocking IO pada Node.jsMemaksimalkan Non-Blocking IO pada Node.js
Memaksimalkan Non-Blocking IO pada Node.jsCodePolitan
 
How Kudo Elevates Undeserved Indonesians
How Kudo Elevates Undeserved IndonesiansHow Kudo Elevates Undeserved Indonesians
How Kudo Elevates Undeserved IndonesiansMuhammad Singgih Z.A
 
IoT Devices, Which One is Right for You to Learn
IoT Devices, Which One is Right for You to LearnIoT Devices, Which One is Right for You to Learn
IoT Devices, Which One is Right for You to LearnToni Haryanto
 
IoT Devices, Which One is Right for You to Learn?
IoT Devices, Which One is Right for You to Learn?IoT Devices, Which One is Right for You to Learn?
IoT Devices, Which One is Right for You to Learn?CodePolitan
 
Get in Touch with Internet of Things
Get in Touch with Internet of ThingsGet in Touch with Internet of Things
Get in Touch with Internet of ThingsCodePolitan
 
Rapid Android Development for Hackathon
Rapid Android Development for HackathonRapid Android Development for Hackathon
Rapid Android Development for HackathonCodePolitan
 
E-Magazine Codepolitan : Perkembangan Internet of Things
E-Magazine Codepolitan : Perkembangan Internet of ThingsE-Magazine Codepolitan : Perkembangan Internet of Things
E-Magazine Codepolitan : Perkembangan Internet of ThingsAbdul Fauzan
 
React Webinar With CodePolitan
React Webinar With CodePolitanReact Webinar With CodePolitan
React Webinar With CodePolitanRiza Fahmi
 
CodePolitan Media Partner SOP
CodePolitan Media Partner SOPCodePolitan Media Partner SOP
CodePolitan Media Partner SOPCodePolitan
 
Scaling tokopedia-past-present-future
Scaling tokopedia-past-present-futureScaling tokopedia-past-present-future
Scaling tokopedia-past-present-futureRein Mahatma
 
Rekayasa Web 1-Teknologi Website
Rekayasa Web 1-Teknologi WebsiteRekayasa Web 1-Teknologi Website
Rekayasa Web 1-Teknologi WebsiteKhaerul Anwar
 
Serverless Architecture
Serverless ArchitectureServerless Architecture
Serverless ArchitectureCodePolitan
 
Perkembangan Teknologi Informasi di Dunia Industri
Perkembangan Teknologi Informasi di Dunia IndustriPerkembangan Teknologi Informasi di Dunia Industri
Perkembangan Teknologi Informasi di Dunia IndustriKresna Galuh
 
Strategi Gaul di Sosial Media
Strategi Gaul di Sosial MediaStrategi Gaul di Sosial Media
Strategi Gaul di Sosial MediaKresna Galuh
 

En vedette (18)

Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 
Brighten Your Future With IT : Why I Need to Start Learn Programming
Brighten Your Future With IT : Why I Need to Start Learn ProgrammingBrighten Your Future With IT : Why I Need to Start Learn Programming
Brighten Your Future With IT : Why I Need to Start Learn Programming
 
Memaksimalkan Non-Blocking IO pada Node.js
Memaksimalkan Non-Blocking IO pada Node.jsMemaksimalkan Non-Blocking IO pada Node.js
Memaksimalkan Non-Blocking IO pada Node.js
 
Codepolitan profile 2016
Codepolitan profile 2016Codepolitan profile 2016
Codepolitan profile 2016
 
How Kudo Elevates Undeserved Indonesians
How Kudo Elevates Undeserved IndonesiansHow Kudo Elevates Undeserved Indonesians
How Kudo Elevates Undeserved Indonesians
 
IoT Devices, Which One is Right for You to Learn
IoT Devices, Which One is Right for You to LearnIoT Devices, Which One is Right for You to Learn
IoT Devices, Which One is Right for You to Learn
 
IoT Devices, Which One is Right for You to Learn?
IoT Devices, Which One is Right for You to Learn?IoT Devices, Which One is Right for You to Learn?
IoT Devices, Which One is Right for You to Learn?
 
Get in Touch with Internet of Things
Get in Touch with Internet of ThingsGet in Touch with Internet of Things
Get in Touch with Internet of Things
 
Rapid Android Development for Hackathon
Rapid Android Development for HackathonRapid Android Development for Hackathon
Rapid Android Development for Hackathon
 
E-Magazine Codepolitan : Perkembangan Internet of Things
E-Magazine Codepolitan : Perkembangan Internet of ThingsE-Magazine Codepolitan : Perkembangan Internet of Things
E-Magazine Codepolitan : Perkembangan Internet of Things
 
Technology Stack KUDO.co.id
Technology Stack KUDO.co.idTechnology Stack KUDO.co.id
Technology Stack KUDO.co.id
 
React Webinar With CodePolitan
React Webinar With CodePolitanReact Webinar With CodePolitan
React Webinar With CodePolitan
 
CodePolitan Media Partner SOP
CodePolitan Media Partner SOPCodePolitan Media Partner SOP
CodePolitan Media Partner SOP
 
Scaling tokopedia-past-present-future
Scaling tokopedia-past-present-futureScaling tokopedia-past-present-future
Scaling tokopedia-past-present-future
 
Rekayasa Web 1-Teknologi Website
Rekayasa Web 1-Teknologi WebsiteRekayasa Web 1-Teknologi Website
Rekayasa Web 1-Teknologi Website
 
Serverless Architecture
Serverless ArchitectureServerless Architecture
Serverless Architecture
 
Perkembangan Teknologi Informasi di Dunia Industri
Perkembangan Teknologi Informasi di Dunia IndustriPerkembangan Teknologi Informasi di Dunia Industri
Perkembangan Teknologi Informasi di Dunia Industri
 
Strategi Gaul di Sosial Media
Strategi Gaul di Sosial MediaStrategi Gaul di Sosial Media
Strategi Gaul di Sosial Media
 

Similaire à What is Big Data?

Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshersrajkamaltibacademy
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysDemi Ben-Ari
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big DataOmnia Safaan
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analyticsinoshg
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataRostislav Pashuto
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detikk4ndar
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)tsliwowicz
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 

Similaire à What is Big Data? (20)

Hadoop Training Tutorial for Freshers
Hadoop Training Tutorial for FreshersHadoop Training Tutorial for Freshers
Hadoop Training Tutorial for Freshers
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of data
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Large Data Analyze With PyTables
Large Data Analyze With PyTablesLarge Data Analyze With PyTables
Large Data Analyze With PyTables
 
PyTables
PyTablesPyTables
PyTables
 
Py tables
Py tablesPy tables
Py tables
 

Plus de CodePolitan

Pre-Order #2 CodePolitan Premium Member
Pre-Order #2 CodePolitan Premium MemberPre-Order #2 CodePolitan Premium Member
Pre-Order #2 CodePolitan Premium MemberCodePolitan
 
Materi devcussion 1.0
Materi devcussion 1.0Materi devcussion 1.0
Materi devcussion 1.0CodePolitan
 
Slides alexander-makarov
Slides alexander-makarovSlides alexander-makarov
Slides alexander-makarovCodePolitan
 
Slides galvin-widjaja
Slides galvin-widjajaSlides galvin-widjaja
Slides galvin-widjajaCodePolitan
 
Dev summit.io 2017 unlock your potential
Dev summit.io 2017 unlock your potentialDev summit.io 2017 unlock your potential
Dev summit.io 2017 unlock your potentialCodePolitan
 
Slides imanzah-hidayat
Slides imanzah-hidayatSlides imanzah-hidayat
Slides imanzah-hidayatCodePolitan
 
Ids johanes alexander
Ids   johanes alexanderIds   johanes alexander
Ids johanes alexanderCodePolitan
 
2017 10 28 angular in war - rev3
2017 10 28   angular in war - rev32017 10 28   angular in war - rev3
2017 10 28 angular in war - rev3CodePolitan
 

Plus de CodePolitan (11)

Pre-Order #2 CodePolitan Premium Member
Pre-Order #2 CodePolitan Premium MemberPre-Order #2 CodePolitan Premium Member
Pre-Order #2 CodePolitan Premium Member
 
Materi devcussion 1.0
Materi devcussion 1.0Materi devcussion 1.0
Materi devcussion 1.0
 
Slides alexander-makarov
Slides alexander-makarovSlides alexander-makarov
Slides alexander-makarov
 
Slides galvin-widjaja
Slides galvin-widjajaSlides galvin-widjaja
Slides galvin-widjaja
 
Dev summit.io 2017 unlock your potential
Dev summit.io 2017 unlock your potentialDev summit.io 2017 unlock your potential
Dev summit.io 2017 unlock your potential
 
Slides imanzah-hidayat
Slides imanzah-hidayatSlides imanzah-hidayat
Slides imanzah-hidayat
 
Ids johanes alexander
Ids   johanes alexanderIds   johanes alexander
Ids johanes alexander
 
Vison final
Vison   finalVison   final
Vison final
 
Tride
TrideTride
Tride
 
React ftw
React ftwReact ftw
React ftw
 
2017 10 28 angular in war - rev3
2017 10 28   angular in war - rev32017 10 28   angular in war - rev3
2017 10 28 angular in war - rev3
 

Dernier

Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfSayantanBiswas37
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 

Dernier (20)

Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 

What is Big Data?

  • 1. Big Data Dipl. Inform.(FH) Jony Sugianto, M. Comp. Sc. Hp:0838-98355491 WA:0812-13086659 Email:jonysugianto@gmail.com
  • 2. Agenda ● What is Big Data? ● Analytic ● Big Data Platforms ● Questions
  • 3. What is Big Data? ● The Basic idea behind the phrase Big Data is that everything we do is increasingly leaving a digital trace(or data), which we(and others) can use and analyse ● Big Data therefore refers to our ability to make use of the ever increasing volumes of data ● Big Data is not about the size of the data, it's about the value within the data
  • 4. Datafication of the world ● Activities - Web Browser - Credit Cards - E-Commerce ● Conversations - WhatsApp - Email - Twitter ● Photos/Videos - Instagram - You Tube ● Sensors - Gps ● Etc...
  • 5. Turning Big Data into Value Datafication of our world ● Activity ● Conversation ● Sensors ● Photo/Video ● Etc... Analysing Big Data ● Text Analytics ● Sentiment Analysis ● Movement Analytics ● Face/Voice Recognition ● Etc... Value
  • 6. Webdata ● Log Data(all user) - Anonymous ID from Cookie Data - LoginID (if exist) - ArticleId - Kanal / Category - Browser - IP - Etc... ● Registered User Data(10 %) - Login ID - Name - Age - Gender - Education - Etc...
  • 7. Valuable data ● User activness ● User interest based on reading behaviour ● Personal Profile for all user
  • 10. Why use the UA 2 ?
  • 11. User Activness and User Interest
  • 12. How to update the User activness?
  • 13. How to update the User activness? New-UA=w_history * UA-sofar + w_current * UA-Per-Minggu w_history=0.75 w_current=0.25
  • 16. How to define the similarity? ● Linear |x1 – x2| ● Square (x1 – x2)^2 ● Exponential 10^f(|x1-x2|)
  • 17. Complexity Analysis ● Assume 30.000.000 click a day ● A week: 210.000.000 click ● Size log entry: 1 kb ● Total size: 210.000.000.000 byte = 210 Gb
  • 18. Complexity Analysis ● All User : 10.000.000 ● Loginuser: 1.000.000 ● Comparison per second per CPU: 1.000.000 ● Total Comparison: 9.000.000.000.000 ● Total Time: 9.000.000 second=104 hari
  • 20. What is the different?
  • 21. What is Hadoop? ● Hadoop: an open-source framework that supports data-intensive distributed applications, licensed under apache v2 license ● Goals: - Abstract and facilitate the storage and processing of large and/or rapidly growing data sets - High scalability and availability - Use commodity Hardware - Fault-tolerance - Move computation rather than data
  • 22. Hadoop Components ● Hadoop Distributed File System(HDFS) A distributed file system that provides high-throughput access to application data ● Hadoop YARN A framework for job scheduling and cluster resource management ● Hadoop MapReduce A Yarn-based system for parallel processing of large data sets
  • 23. What is Hive? ● Hive is a data warehouse infrastructure built on top of Hadoop ● Hive stored data in the HDFS ● Hive compile SQL Queries into MapReduce jobs
  • 25. What is Pig? ● Pig is a platform for analyzing large data sets that consist of a high-level language for expressing data analysis programs ● Pig generates and compiles a MapReduce program on the fly
  • 27. What is Spark? ● Fast and general purpose cluster computing system ● 10x(on disk) – 100x(in-memory) faster than Hadoop MapReduce ● Provides high level APIs in -Scala -Java -Python ● Can be deployed through Apache Mesos, Apache Hadoop via YARN, or Spark's cluster manager
  • 28. Resilient Distributed Datasets ● Written in scala ● Fundamental Unit of data in spark ● Distributed collection of object ● Resilient-Ability to recompute missing partions(node failure) ● Distributed-Split across multiple partions ● Dataset-Can contains any type, Scala/Java/Python Object or User defined object ● Operations -Transformations(map, filter, groupBy,...) -Actions(count, collect, save, ...)
  • 29. Spark Example // Spark wordcount object WordCount { def main(args: Array[String]) { val env = new SparkContext("local","wordCount") val data = List("hi","how are you","hi") val dataSet = env.parallelize(data) val words = dataSet.flatMap(value => value.split("s+")) val mappedWords = words.map(value => (value,1)) val sum = mappedWords.reduceByKey(_+_) println(sum.collect()) } }
  • 30. What is Flink? ● Written in java ● An open source platform for distributed stream and batch data processing ● Several APIs in Java/Scala/Python -DataSet API – Batch processing -DataStream API – Stream processing -Table API – Relational Queries
  • 31. Flink Example // Flink wordcount object WordCount { def main(args: Array[String]) { val env = ExecutionEnvironment.getExecutionEnvironment val data = List("hi","how are you","hi") val dataSet = env.fromCollection(data) val words = dataSet.flatMap(value => value.split("s+")) val mappedWords = words.map(value => (value,1)) val grouped = mappedWords.groupBy(0) val sum = grouped.sum(1) println(sum.collect()) } }