DWH & BigData – 
architecture approaches 
Odessa 
Vladimir Slobodianiuk 
Date: 2014 
www.luxoft.com
Agenda
1 Big Data – what is it
2 Hadoop vs RDBMS – pros and cons
3 Hadoop & Enterprise architecture
4 Hadoop as ETL engine
Big Data – what is it
Current state
• Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.
Limitations & Problems
• Big data is difficult to work with using most relational databases, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers
• eBay.com uses two data warehouses at 7.5 petabytes
• Walmart handles more than 1 million customer transactions every hour
• Facebook handles 50 billion photos from its user base
• In 2012, the Obama administration announced the Big Data Research and Development Initiative
Hadoop vs RDBMS 
CORE HADOOP - MapReduce
In 2004, Google published a paper on a process called MapReduce
• DISTRIBUTED COMPUTING FRAMEWORK
• Process large jobs in parallel across many nodes and combine the results
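The "process in parallel, then combine" idea can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps, not Hadoop's API: the function names, the sample chunks, and the use of threads as stand-ins for cluster nodes are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Threads stand in for cluster nodes here (an assumption of this sketch);
    # a real job would run map tasks on separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input, already split into chunks.
chunks = ["big data big", "data warehouse", "big warehouse"]
counts = word_count(chunks)
print(counts)
```

Each chunk is mapped independently, which is what lets the work spread across many nodes; only the shuffle and reduce steps need to see results from more than one chunk.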
Hadoop Structure
• HDFS is a distributed file system designed to run on commodity hardware
• HBase stores data rows in labelled tables (a sortable key and an arbitrary number of columns)
• Hive provides data summarization, query, and analysis (SQL-like interface)
• Pig is a platform for analyzing large data sets, built around a high-level language (Pig Latin)
Hadoop vs RDBMS
RDBMS
• Performance for relational data
• Machine query optimization
• Mature workload management
• High-concurrency interactive query processing
Hadoop
• Schema-less model
• Human query optimization
• Ability to create complex dataflows with multiple inputs and outputs
• Parallelize many analytic functions
How might this change in the future
• Query optimization improvements in Hive
  – Statistics, better join ordering, more join types, etc.
• Startup time improvements
  – Simpler query plans to pass out
• Runtime performance improvements
Hadoop & Enterprise architecture
Classic architecture approach 
Hadoop & Enterprise architecture 
Luxoft Big Data R&D
Hadoop as ETL Data Quality tool
BENEFITS
• Reduced TCO (commodity hardware usage)
• Traceability of all data quality issues
• Hadoop becomes a clean-data tool
PROBLEM
Traditional tools show poor performance in exception handling and data cleansing.
SOLUTION
Hadoop transforms the data into a single format and processes it using data cleansing workflows.
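A minimal sketch of what one step of such a cleansing workflow might look like, written in plain Python rather than against any Hadoop API. The record fields, the two date formats, and the function names are hypothetical. On a cluster, the `normalize` logic would typically sit inside a map task.

```python
from datetime import datetime

# Hypothetical date formats used by two upstream source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def normalize(raw):
    """Bring one raw record into a single target format, or raise ValueError."""
    customer_id = raw.get("customer_id", "").strip()
    if not customer_id:
        raise ValueError("missing customer_id")
    for fmt in DATE_FORMATS:
        try:
            date = datetime.strptime(raw.get("date", ""), fmt).date()
            break
        except ValueError:
            continue
    else:
        raise ValueError("unparseable date")
    # Accept both "10,50" and "10.50" as decimal notation.
    amount = float(raw.get("amount", "").replace(",", "."))
    return {"customer_id": customer_id, "date": date.isoformat(), "amount": amount}


def cleanse(records):
    """Split records into clean rows and a traceable list of rejects."""
    clean, rejects = [], []
    for position, raw in enumerate(records):
        try:
            clean.append(normalize(raw))
        except ValueError as error:
            # Keep the source position and reason so every issue can be traced.
            rejects.append({"position": position, "reason": str(error), "raw": raw})
    return clean, rejects


# Hypothetical input mixing two source formats and two bad records.
records = [
    {"customer_id": "C1", "date": "2014-03-01", "amount": "10,50"},
    {"customer_id": "C2", "date": "01.03.2014", "amount": "7.25"},
    {"customer_id": "", "date": "2014-03-02", "amount": "3"},
    {"customer_id": "C4", "date": "March 3", "amount": "1"},
]
clean, rejects = cleanse(records)
print(len(clean), "clean;", len(rejects), "rejected")
```

Rejected records are kept alongside the reason and their position in the input instead of being dropped, which is one simple way to get the traceability listed under BENEFITS.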
Summary
Big Data:
• Cutting edge of DI technologies
• State-of-the-art design approaches
• A bit more than simple development: it is something of an art, the art of data management
THANK YOU 

Vladimir Slobodianiuk, "DWH & BigData – architecture approaches"

  • 1. DWH & BigData – architecture approaches. Odessa. Vladimir Slobodianiuk. Date: 2014. www.luxoft.com
  • 2. Agenda: (1) Big Data – what is it; (2) Hadoop vs RDBMS – pros and cons; (3) Hadoop & Enterprise architecture; (4) Hadoop as ETL engine
  • 3. Big Data – what is it
  • 4. Current state: Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.
  • 5. Limitations & Problems
      • Big data is difficult to work with using most relational databases, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers
      • eBay.com uses two data warehouses at 7.5 petabytes
      • Walmart handles more than 1 million customer transactions every hour
      • Facebook handles 50 billion photos from its user base
      • In 2012, the Obama administration announced the Big Data Research and Development Initiative
  • 6. Hadoop vs RDBMS
  • 7. Core Hadoop: MapReduce
      • In 2004, Google published a paper on a process called MapReduce
      • Distributed computing framework: process large jobs in parallel across many nodes and combine the results
  • 8. Hadoop Structure
      • HDFS is a distributed file system designed to run on commodity hardware
      • HBase stores data rows in labelled tables (a sortable key and an arbitrary number of columns)
      • Hive provides data summarization, query, and analysis (SQL-like interface)
      • Pig is a platform for analyzing large data sets that consists of a high-level language
  • 9. Hadoop vs RDBMS
      • RDBMS:
          • Performance for relational data
          • Machine query optimization
          • Mature workload management
          • High-concurrency interactive query processing
      • Hadoop:
          • Schema-less model
          • Human query optimization
          • Ability to create complex dataflows with multiple inputs and outputs
          • Parallelizes many analytic functions
      • How might this change in the future:
          • Query optimization improvements in Hive: statistics, better join ordering, more join types, etc.
          • Startup time improvements: simpler query plans to pass out
          • Runtime performance improvements
  • 10. Hadoop & Enterprise architecture
  • 12. Hadoop & Enterprise architecture
  • 13. Luxoft Big Data R&D: Hadoop as ETL Data Quality tool
      • Benefits:
          • Reduced TCO (commodity hardware usage)
          • Traceability of all data quality issues
          • Hadoop becomes a clean-data tool
      • Problem: traditional tools show poor performance in exception handling and data cleansing.
      • Solution: Hadoop transforms the data into a single format and processes it using data cleansing workflows.
  • 14. Summary. Big Data:
      • Cutting edge of DI technologies
      • State-of-the-art design approaches
      • More than simple development: it is something of an art, the art of data management
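
The MapReduce idea on slide 7 (process large jobs in parallel across many nodes and combine the results) can be illustrated with a minimal single-process sketch in Python. This is not Hadoop code. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the word-count task is an assumed example that does not come from the slides. On a real cluster, each map and reduce call would be scheduled on a different node; here they simply run one after another.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (assumed example job)."""
    # Each line is mapped independently, which is what makes the map
    # step easy to distribute across nodes.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Each key is reduced independently, so reducers can also run in parallel.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs parallel processing", "Hadoop processes big data"]
    print(word_count(sample))
```

Because every line is mapped on its own and every key is reduced on its own, neither step needs shared state, which is the property that lets a framework spread the work over many machines.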