Hadoop and IoT 
Darko Marjanović 
Đorđe Stepanić 
Miloš Milovanović
AGENDA 
BIG DATA 
HADOOP AND IOT MODEL 
HADOOP 
IOT 
HADOOP DATA PROCESSING 
HIVE 
STINGER INITIATIVE 
Q&A
BIG DATA 
Big Data describes collections of data sets so large and complex that they are 
difficult to capture, process, store, search and analyze using conventional 
database systems. 
Anything that Won't Fit in Excel. 
*Definition taken from (www.bigdata-startups.com)
BIG DATA DIMENSIONS 
1992 100GB/Day 
2002 100GB/Second 
2013 28,000GB/Second 
2018 50,000GB/Second
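To put the four figures on one scale, the sketch below converts each rate to GB per day and compares it with the 1992 value. It uses only the numbers listed above; the function and variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# (year, amount in GB, time unit) exactly as listed on the slide
rates = [
    (1992, 100, "day"),
    (2002, 100, "second"),
    (2013, 28_000, "second"),
    (2018, 50_000, "second"),
]

def gb_per_day(amount, unit):
    """Normalize a rate to GB per day."""
    return amount * SECONDS_PER_DAY if unit == "second" else amount

baseline = gb_per_day(100, "day")
growth = {year: gb_per_day(amount, unit) // baseline for year, amount, unit in rates}

for year, factor in growth.items():
    print(f"{year}: {factor:,}x the 1992 rate")
```

Moving from "per day" to "per second" alone is a factor of 86,400, which is why the 2002 figure is already several orders of magnitude above the 1992 one.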
HADOOP AND IOT
HADOOP 
Apache Hadoop is an open-source software framework for storage and large-scale 
processing of data-sets on clusters of commodity hardware. 
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. 
All the modules in Hadoop are designed with a fundamental assumption that 
hardware failures are common and thus should be automatically handled in software 
by the framework.
HADOOP COMPONENTS 
Hadoop common 
HDFS 
MapReduce 
YARN (Starting with Hadoop 2.x.x)
HADOOP HDFS 
The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system 
written in Java for the Hadoop framework.
HADOOP MAP REDUCE 
MapReduce is a programming model and an associated implementation for processing 
and generating large data sets with a parallel, distributed algorithm on a cluster.
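The programming model can be illustrated with a word count. The sketch below is a single-process simulation in plain Python, not the Hadoop API: on a cluster the map and reduce calls would run in parallel on different nodes, and the framework would perform the shuffle. All function names are illustrative.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["sensor on", "sensor off", "Sensor on"])
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be distributed without the calls needing to communicate with each other.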
HADOOP YARN 
Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management 
technology. YARN is now characterized as a large-scale, distributed operating 
system for big data applications.
HADOOP ECOSYSTEM 
The main groups of tools in the Hadoop ecosystem: 
Data Ingestion (Flume, Sqoop …) 
Data Processing (Pig, Hive, Storm …) 
Cluster Management (Ambari) 
Security (Knox)
DATA INGESTION 
Flume 
Flume is a distributed, reliable, and available service for efficiently collecting, 
aggregating, and moving large amounts of streaming event data. 
Sqoop 
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache 
Hadoop and structured datastores such as relational databases. 
WEB HDFS REST API
FLUME EXAMPLE
SQOOP AND WEB HDFS API EXAMPLE
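A minimal sketch of the WebHDFS side, assuming the documented URL layout `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The host name, port 50070, file paths and user name are hypothetical; the actual port depends on the cluster configuration. The code only builds URLs and sends no requests.

```python
from urllib.parse import quote, urlencode

def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS URL for an absolute HDFS path and an operation name."""
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

# Hypothetical NameNode address and paths, for illustration only.
list_url = webhdfs_url("namenode.example.com", 50070, "/iot/raw", "LISTSTATUS")
open_url = webhdfs_url(
    "namenode.example.com", 50070, "/iot/raw/readings.csv", "OPEN",
    params={"user.name": "hadoop"},
)
print(list_url)
print(open_url)
```

`LISTSTATUS` and `OPEN` are read operations issued with HTTP GET, so any HTTP client can be pointed at these URLs.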
IOT
UBIQUITOUS COMPUTING & INTERNET OF THINGS 
Ubiquitous computing - a trend (wave) in computing in which computers are 
spread throughout our everyday environment. 
Concept: one person - many computers 
Internet of Things - the network of physical objects accessed through the 
Internet that contain embedded technology to interact (sense and 
communicate) with their internal states or the external environment 
(Cisco definition).
INTERNET OF THINGS COMPONENTS
INTERNET OF THINGS AND BIG DATA
REAL-TIME DATA, STRUCTURED AND UNSTRUCTURED DATA GENERATED FROM INTERNET OF THINGS
INTERNET OF THINGS - FIELDS OF APPLICATION 
Production - energy savings, lower maintenance costs, prediction of 
machine failure, quality control etc. 
Logistics - efficient supply control, optimization of transport, 
environmental controls in the warehouse, JIT, lean logistics, better capacity 
utilization etc. 
Smart cities & environment - smart parking, traffic congestion, smart 
lighting, waste management, urban noise maps, air pollution etc. 
Smart agriculture 
eHealth 
and everything you can imagine...
HADOOP DATA PROCESSING 
Input: 
- Raw data files 
- No metadata 
- No schema 
Objective: 
- Perform analysis, run interactive queries 
- Explore, structure and analyze the data 
- Real-time processing (Apache Storm) 
- Visualization
HIVE 
Apache Hive is a data warehousing software that facilitates querying and 
managing large datasets residing in distributed storage. 
Hive provides: 
- Tools for ETL processes 
- A mechanism for imposing a structure on a variety of data formats 
- Access to files stored in HDFS or other storage systems 
- Query execution via MapReduce
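"Imposing a structure" means the raw files stay as they are and a schema is applied only when the data is read. The sketch below imitates that idea in plain Python; it is not Hive code, and the schema, field names and sample lines are hypothetical. Turning unparseable fields into `None` loosely mirrors a NULL value.

```python
# Hypothetical schema: column names and the type each raw field is cast to.
SCHEMA = [("device_id", str), ("ts", int), ("temperature", float)]

def apply_schema(raw_line, schema=SCHEMA, delimiter=","):
    """Turn one raw text line into a typed record; unparseable fields become None."""
    fields = raw_line.rstrip("\n").split(delimiter)
    record = {}
    for (name, cast), value in zip(schema, fields):
        try:
            record[name] = cast(value)
        except ValueError:
            record[name] = None
    return record

raw = ["dev-1,1416300000,21.5", "dev-2,1416300060,n/a"]
records = [apply_schema(line) for line in raw]
print(records)
```

The same raw lines could be read with a different schema later, which is what makes this approach a fit for input that arrives with no metadata.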
HIVE ARCHITECTURE 
Data Model: 
- Tables 
- Partitions 
- Buckets 
SERDEs 
Datatypes: 
- Common primitive data types (int, boolean, float, double, string, char, date, timestamp, …) 
- Complex data types (structs, maps, arrays) 
Components: 
- UI 
- Driver 
- Compiler 
- Metastore 
- Execution engine
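The data model entries (tables, partitions, buckets) map onto storage roughly as follows: a partition is a `key=value` subdirectory of the table directory, and a bucket is one of a fixed number of files chosen by hashing a column. The sketch below illustrates that mapping in plain Python. The paths and names are hypothetical, and CRC32 is only a stand-in, since Hive uses its own hash function.

```python
import zlib

def partition_dir(table_dir, partition):
    """Partitions map to nested key=value directories under the table directory."""
    parts = "/".join(f"{key}={value}" for key, value in partition.items())
    return f"{table_dir}/{parts}"

def bucket_id(value, num_buckets):
    """Pick a bucket by hashing the column value modulo the bucket count.

    CRC32 is a stand-in here; Hive uses its own hash function.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets

path = partition_dir("/warehouse/readings", {"year": 2014, "month": 11})
bucket = bucket_id("dev-1", 4)
print(path, bucket)
```

A query that filters on `year` and `month` only needs to read the matching directory, which is the main reason to partition a table.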
HIVE.NOW 
Hive defines a simple SQL-like query language, called HiveQL (HQL), that enables users 
familiar with SQL to query the data. 
Scalable and extensible. 
Most commonly used for: 
- Log analysis 
- Statistical analysis 
- Document indexing
HIVE SCRIPT EXAMPLE
STINGER INITIATIVE 
Stinger is the initiative to improve query execution time and increase SQL 
functionality for Apache Hive. Microsoft and Hortonworks worked actively in the 
Apache community towards completing Stinger. 
Announced in February 2013 
44 companies, 145 developers, 392,000 lines of Java code 
Hive 0.13 
Speed: Hive on Tez, vectorized query engine & cost-based optimizer 
Scale: dynamic partition loads and smaller hash tables 
SQL: CHAR & DECIMAL datatypes, subqueries for IN / NOT IN 
Improved Hive performance by up to 100x.
STINGER.NEXT 
Stinger.next is a continuation of the Stinger initiative to further improve speed, scale and SQL 
support in Hive, within the open Apache Hive community. 
Main goals: 
- transactions with ACID semantics 
- sub-second queries 
- SQL:2011 Analytics 
- usability improvements 
To be delivered in the next 18 months.
STINGER.NEXT 
*Photo taken from the official Hortonworks website (www.hortonworks.com)
HIVE ON SPARK 
Apache Spark is a fast and general engine for large-scale data processing. 
Spark powers a stack of high-level tools including Spark SQL, MLlib for machine 
learning, GraphX, and Spark Streaming. 
Hive-Spark Machine Learning Integration will allow Hive users to run machine 
learning models via Hive.
Q&A 
darko@thingsolver.com 
djordje@thingsolver.com 
milosmilovanovic@outlook.com 
hadoop-srbija.com
Please rate this lecture 
and win Windows Phone NOKIA Lumia 1320 
Help us choose the best Sinergija lecturer! 
Microsoft will award you – at the conference end, 
we’ll give one NOKIA Lumia 1320 to someone 
from the audience – randomly. 
Go to www.mssinergija.net, log in and cast your 
votes! 
You can rate only lectures that you attended, 
and each only once. The more lectures you rate, the more 
chances you have. 
Winner will be announced at the official Sinergija 
web portal, www.mssinergija.net
Hadoop and IoT Sinergija 2014

Hadoop and IoT Sinergija 2014

  • 1.
  • 2. Hadoop and IoT Darko Marjanović Đorđe Stepanić Miloš Milovanović
  • 3. AGENDA BIG DATA HADOOP AND IOT MODEL HADOOP IOT HADOOP DATA PROCESSING HIVE STINGER INITIATIVE Q&A
  • 4. BIG DATA Big Data describes collections of data sets so large and complex that they are difficult to capture, process, store, search and analyze using conventional database systems. Anything that won't fit in Excel. *Definition taken from (www.bigdata-startups.com)
  • 5. BIG DATA DIMENSIONS 1992: 100 GB/day; 2002: 100 GB/second; 2013: 28,000 GB/second; 2018: 50,000 GB/second
  • 7. HADOOP Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop was created by Doug Cutting and Mike Cafarella in 2005. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and should therefore be handled automatically in software by the framework.
  • 8. HADOOP COMPONENTS Hadoop Common, HDFS, MapReduce, YARN (starting with Hadoop 2.x.x)
  • 9. HADOOP HDFS The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
  • 10. HADOOP MAP REDUCE MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
  • 11. HADOOP YARN Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. YARN is now characterized as a large-scale, distributed operating system for big data applications.
  • 12. HADOOP ECOSYSTEM The main groups of tools in the Hadoop ecosystem: Data Ingestion (Flume, Sqoop …); Data Processing (Pig, Hive, Storm …); Cluster Management (Ambari); Security (Knox)
  • 13. DATA INGESTION Flume: a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data. Sqoop: a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. WebHDFS REST API
  • 16. SQOOP AND WEB HDFS API EXAMPLE
  • 17. IOT
  • 18. UBIQUITOUS COMPUTING & INTERNET OF THINGS Ubiquitous computing: a trend (wave) in computing in which computers are spread throughout our everyday environment. Concept: one person, many computers. Internet of Things: the network of physical objects accessed through the Internet that contain embedded technology to interact (sense and communicate) with their internal states or the external environment (Cisco definition).
  • 19. INTERNET OF THINGS COMPONENTS
  • 20. INTERNET OF THINGS AND BIG DATA
  • 21. REAL-TIME DATA, STRUCTURED AND UNSTRUCTURED DATA GENERATED FROM INTERNET OF THINGS
  • 22. INTERNET OF THINGS - FIELDS OF APPLICATION Production: energy savings, lower maintenance costs, prediction of machine failure, quality control, etc. Logistics: efficient supply control, optimization of transport, environmental controls in the warehouse, JIT, lean logistics, better capacity utilization, etc. Smart cities & environment: smart parking, traffic congestion, smart lighting, waste management, urban noise maps, air pollution, etc. Smart agriculture, eHealth and everything else you can imagine...
  • 23. HADOOP DATA PROCESSING Input: raw data files, no metadata, no schema. Objective: perform analysis and run interactive queries; explore, structure and analyze the data; real-time processing (Apache Storm); visualization.
  • 24. HIVE Apache Hive is data warehousing software that facilitates querying and managing large datasets residing in distributed storage. Hive provides: tools for ETL processes; a mechanism for imposing structure on a variety of data formats; access to files stored in HDFS or other storage systems; query execution via MapReduce.
  • 25. HIVE ARCHITECTURE Data model: tables, partitions, buckets. SerDes. Data types: common primitive data types (int, boolean, float, double, string, char, date, timestamp, …) plus complex data types (structs, maps, arrays). Components: UI, Driver, Compiler, Metastore, Execution engine.
  • 26. HIVE.NOW Hive defines a simple SQL-like query language, called HQL, that enables users familiar with SQL to query the data. Scalable and extensible. Most commonly used for: log analysis, statistical analysis, document indexing.
  • 28. STINGER INITIATIVE Stinger is the initiative to improve query execution time and increase SQL functionality for Apache Hive. Microsoft and Hortonworks worked actively in the Apache community towards completing Stinger. Announced in February 2013. 44 companies, 145 developers, 392,000 lines of Java code. Hive 0.13: Speed: Hive on Tez, vectorized query engine and cost-based optimizer. Scale: dynamic partition loads and smaller hash tables. SQL: CHAR and DECIMAL datatypes, subqueries for IN / NOT IN. Improved Hive performance up to 100x.
  • 29. STINGER.NEXT Stinger.next is a continuation of the Stinger initiative to further improve speed, scale and SQL support in Hive within the open Apache Hive community. Main goals: transactions with ACID semantics; sub-second queries; SQL:2011 Analytics; usability improvements. To be delivered in the next 18 months.
  • 30. STINGER.NEXT *Photo taken from the official Hortonworks website (www.hortonworks.com)
  • 31. HIVE ON SPARK Apache Spark is a fast and general engine for large-scale data processing. Spark powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. Hive-Spark Machine Learning Integration will allow Hive users to run machine learning models via Hive.
  • 32. Q&A darko@thingsolver.com djordje@thingsolver.com milosmilovanovic@outlook.com hadoop-srbija.com
  • 33. Please rate this lecture and win a NOKIA Lumia 1320 Windows Phone. Help us choose the best Sinergija lecturer! At the end of the conference, Microsoft will give one NOKIA Lumia 1320 to a randomly chosen member of the audience. Go to www.mssinergija.net, log in and cast your votes! You can rate only lectures that you attended, and each only once. The more lectures you rate, the better your chances. The winner will be announced on the official Sinergija web portal, www.mssinergija.net
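
The MapReduce programming model described on slide 10 can be illustrated with a minimal sketch in plain Python. This is not Hadoop code and does not use the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input: three short event messages.
counts = word_count(["sensor on", "sensor off", "sensor on"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different nodes, and the grouping step would be performed by the framework between the two phases.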

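The WebHDFS REST API mentioned on slides 13 and 16 addresses files through URLs of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs and makes no network calls. The host name, the port 50070 (the usual NameNode HTTP port in Hadoop 2.x; a given cluster may differ), the file paths and the helper `webhdfs_url` are all assumptions for illustration, not taken from the slides.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>&..."""
    query = urlencode({"op": op, **(params or {})})
    # quote() percent-encodes characters such as spaces but keeps "/" intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode address and HDFS paths.
list_url = webhdfs_url("namenode.example.com", 50070, "/iot/readings", "LISTSTATUS")
open_url = webhdfs_url("namenode.example.com", 50070, "/iot/readings/day 1.csv", "OPEN")
print(list_url)
print(open_url)
```

An HTTP GET on the first URL would list a directory and on the second would read a file, assuming the cluster has WebHDFS enabled and the paths exist.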
Editor's notes

  1. Agenda
  2. Microsoft and Hortonworks have a shared vision of open innovation in and around Apache Hadoop and a commitment to deliver that via a 100% open source platform.