Hadoop and IoT Sinergija 2014


  1. Hadoop and IoT. Darko Marjanović, Đorđe Stepanić, Miloš Milovanović
  2. AGENDA: Big Data; Hadoop and IoT model; Hadoop; IoT; Hadoop data processing; Hive; Stinger initiative; Q&A
  3. BIG DATA: Big Data describes collections of data sets so large and complex that they are difficult to capture, process, store, search and analyze using conventional database systems. "Anything that won't fit in Excel." *Definition taken from www.bigdata-startups.com
  4. BIG DATA DIMENSIONS: 1992: 100 GB/day; 2002: 100 GB/second; 2013: 28,000 GB/second; 2018: 50,000 GB/second
  5. HADOOP AND IOT
  6. HADOOP: Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop was created by Doug Cutting and Mike Cafarella in 2005. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and should therefore be handled automatically in software by the framework.
  7. HADOOP COMPONENTS: Hadoop Common; HDFS; MapReduce; YARN (starting with Hadoop 2.x.x)
  8. HADOOP HDFS: The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
  9. HADOOP MAPREDUCE: MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
  10. HADOOP YARN: Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. YARN is now characterized as a large-scale, distributed operating system for big data applications.
  11. HADOOP ECOSYSTEM: The main groups of tools in the Hadoop ecosystem: Data Ingestion (Flume, Sqoop, ...); Data Processing (Pig, Hive, Storm, ...); Cluster Management (Ambari); Security (Knox)
  12. DATA INGESTION: Flume: Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data. Sqoop: Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. WebHDFS REST API.
  13. FLUME EXAMPLE
  14. SQOOP AND WEBHDFS API EXAMPLE
  15. IOT
  16. UBIQUITOUS COMPUTING & INTERNET OF THINGS: Ubiquitous computing is a trend (wave) in computing in which computers are spread throughout our everyday environment. Concept: one person, many computers. The Internet of Things is the network of physical objects accessed through the Internet that contain embedded technology to interact (sense and communicate) with their internal states or the external environment (Cisco definition).
  17. INTERNET OF THINGS COMPONENTS
  18. INTERNET OF THINGS AND BIG DATA
  19. REAL-TIME DATA, STRUCTURED AND UNSTRUCTURED DATA GENERATED FROM INTERNET OF THINGS
  20. INTERNET OF THINGS - FIELDS OF APPLICATION: Production (energy savings, lower maintenance costs, prediction of machine failure, quality control, etc.); Logistics (efficient supply control, optimization of transport, environmental controls in the warehouse, JIT, lean logistics, better capacity utilization, etc.); Smart cities and environment (smart parking, traffic congestion, smart lighting, waste management, urban noise maps, air pollution, etc.); Smart agriculture; eHealth; and everything you can imagine...
  21. HADOOP DATA PROCESSING: Input: raw data files, no metadata, no schema. Objective: perform analysis and run interactive queries; explore, structure and analyze the data; real-time processing (Apache Storm); visualization.
  22. HIVE: Apache Hive is data warehousing software that facilitates querying and managing large datasets residing in distributed storage. Hive provides: tools for ETL processes; a mechanism for imposing a structure on a variety of data formats; access to files stored in HDFS or other storage systems; query execution via MapReduce.
  23. HIVE ARCHITECTURE: Data model: tables, partitions, buckets. SerDes. Datatypes: common primitive data types (int, boolean, float, double, string, char, date, timestamp, ...) plus complex data types (structs, maps, arrays). Components: UI, Driver, Compiler, Metastore, Execution engine.
  24. HIVE.NOW: Hive defines a simple SQL-like query language, called HQL, that enables users familiar with SQL to query the data. Scalable and extensible. Most commonly used for: log analysis; statistical analysis; document indexing.
  25. HIVE SCRIPT EXAMPLE
  26. STINGER INITIATIVE: Stinger is the initiative to improve query execution time and increase SQL functionality for Apache Hive. Microsoft and Hortonworks worked actively in the Apache community towards completing Stinger. Announced in February 2013. 44 companies, 145 developers, 392,000 lines of Java code. Hive 0.13: Speed: Hive on Tez, vectorized query engine and cost-based optimizer. Scale: dynamic partition loads and smaller hash tables. SQL: CHAR and DECIMAL datatypes, subqueries for IN / NOT IN. Improved Hive performance up to 100x.
  27. STINGER.NEXT: Stinger.next is a continuation of the Stinger initiative to further improve speed, scale and SQL in Hive in the open Apache Hive community. Main goals: transactions with ACID semantics; sub-second queries; SQL:2011 Analytics; usability improvements. To be delivered in the next 18 months.
  28. STINGER.NEXT *Photo taken from the official Hortonworks website (www.hortonworks.com)
  29. HIVE ON SPARK: Apache Spark is a fast and general engine for large-scale data processing. Spark powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. Hive-Spark Machine Learning Integration will allow Hive users to run machine learning models via Hive.
  30. Q&A: darko@thingsolver.com, djordje@thingsolver.com, milosmilovanovic@outlook.com, hadoop-srbija.com
  31. Please rate this lecture and win a Windows Phone NOKIA Lumia 1320. Help us choose the best Sinergija lecturer! At the end of the conference, Microsoft will award one NOKIA Lumia 1320 to a randomly chosen member of the audience. Go to www.mssinergija.net, log in and cast your votes! You can rate only lectures that you attended, and each only once. The more lectures you rate, the more chances you have. The winner will be announced on the official Sinergija web portal, www.mssinergija.net
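
As a minimal sketch of the MapReduce programming model described on slide 9, applied to the kind of sensor data discussed in the IoT slides, the plain-Python example below walks through the map, shuffle and reduce phases in a single process. It is not Hadoop code and does not come from the slides: the record format (`sensor_id,temperature`), the sample readings, and every function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw input: one "sensor_id,temperature_celsius" record per line.
RAW_READINGS = [
    "sensor-a,20.0",
    "sensor-b,30.0",
    "sensor-a,22.0",
    "sensor-b,34.0",
    "sensor-a,24.0",
]


def map_reading(line: str) -> Iterator[Tuple[str, float]]:
    """Map phase: parse one raw record and emit a (key, value) pair."""
    sensor_id, temperature = line.split(",")
    yield sensor_id, float(temperature)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle phase: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_average(sensor_id: str, temperatures: List[float]) -> Tuple[str, float]:
    """Reduce phase: collapse all values for one key into a single result."""
    return sensor_id, sum(temperatures) / len(temperatures)


def run_job(lines: Iterable[str]) -> Dict[str, float]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    mapped = (pair for line in lines for pair in map_reading(line))
    grouped = shuffle(mapped)
    return dict(reduce_average(key, values) for key, values in grouped.items())


averages = run_job(RAW_READINGS)
print(averages)  # {'sensor-a': 22.0, 'sensor-b': 32.0}
```

Because each call to `map_reading` depends only on its own input line, and each call to `reduce_average` depends only on the values for one key, both phases could be spread across many machines; that independence is what the model relies on for parallel execution.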

Editor's notes

  • Agenda
  • Microsoft and Hortonworks have a shared vision of open innovation in and around Apache Hadoop and a commitment to deliver that via a 100% open source platform.
