Introduction to Big Data Technologies & Applications


Big Data myths; current mainstream technologies for collecting, storing, computing, and stream processing data; and real-life experience with e-commerce businesses.


Big Data Technologies & Applications
Nguyen D. Cao
December 28, 2015

Agenda
● Big Data Myths
● Big Data Technologies
● Big Data Applications @123Mua

Big Data Myths
● People talk about Big Data all the time: the 3Vs
○ Volume
○ Variety
○ Velocity
● Business Value in Data
○ Customer Insights
○ Product Insights

Big Data Myths
VOLUME
● Data is BIG
● Hard-drive storage capacity has increased massively compared to access speed, so reading a full drive takes longer and longer
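A rough calculation makes the gap between capacity and access speed concrete. The drive size and throughput below are illustrative assumptions, not figures from the slides.

```python
# Illustrative assumptions (not measurements): a 1 TB drive read
# sequentially at 100 MB/s.
DRIVE_BYTES = 1_000_000_000_000      # 1 TB
THROUGHPUT_BPS = 100_000_000         # 100 MB/s


def full_scan_seconds(total_bytes: int, bytes_per_second: int, drives: int = 1) -> float:
    """Time to read total_bytes when the data is spread evenly over
    `drives` disks that are all read in parallel."""
    return total_bytes / (bytes_per_second * drives)


single = full_scan_seconds(DRIVE_BYTES, THROUGHPUT_BPS)
parallel = full_scan_seconds(DRIVE_BYTES, THROUGHPUT_BPS, drives=100)
print(f"one drive:  {single / 3600:.1f} hours")
print(f"100 drives: {parallel:.0f} seconds")
```

Under these assumptions, spreading the same data over many drives and reading them in parallel is what turns hours into seconds, which is the motivation for distributed storage and computation.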

Big Data Myths
VARIETY
● Different kinds of data
○ Structured
○ Semi-structured: self-describing information (JSON, XML, logs)
○ Unstructured
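The three kinds of data can be sketched with small samples. All records below are hypothetical and only meant to show how each kind is typically read.

```python
import csv
import io
import json

# Structured: fixed columns known in advance (hypothetical order row).
structured = "order_id,shop_id,amount\n1001,42,19.99\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: each record describes its own fields (hypothetical event).
semi_structured = '{"event": "add_to_cart", "user": "u7", "item": {"id": 5, "tags": ["shoes"]}}'
event = json.loads(semi_structured)

# Unstructured: free text with no schema; only generic processing applies.
unstructured = "Great shoes, fast delivery, will buy again!"
tokens = unstructured.lower().replace(",", " ").replace("!", " ").split()

print(rows[0]["amount"])        # field looked up by a known column name
print(event["item"]["tags"])    # nested field discovered from the record itself
print(len(tokens))              # free text reduced to word tokens
```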

Big Data Myths
VELOCITY
● Characteristics
○ How fast does data become available for processing?
○ How fast is the processing?
● Data accumulates at very high rates
○ Click streams
○ Supermarket transactions
○ Social media interactions
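One way to put a number on velocity is a sliding-window event rate. This is a minimal sketch; the class name, window size, and timestamps are hypothetical.

```python
from collections import deque


class RateMeter:
    """Counts events seen in the last `window` seconds."""

    def __init__(self, window: float):
        self.window = window
        self.times = deque()

    def record(self, timestamp: float) -> float:
        """Register an event and return the current rate in events/second."""
        self.times.append(timestamp)
        # Drop events that have fallen out of the window.
        while self.times and self.times[0] <= timestamp - self.window:
            self.times.popleft()
        return len(self.times) / self.window


meter = RateMeter(window=10.0)
clicks = [0.0, 1.0, 2.0, 11.5, 12.0]   # hypothetical click timestamps (seconds)
rates = [meter.record(t) for t in clicks]
print(rates)
```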

Big Data Technologies
● Technologies
○ Collecting
○ Storage
○ Computation
○ Stream Processing
○ Data Mining

Scribe
Big Data Collecting
● Scribe is a server for aggregating log data that is streamed in real time from clients.
● It was designed and developed by Facebook.
● It is no longer actively maintained.

Apache Kafka
Big Data Collecting
● Kafka is a distributed, partitioned, replicated commit log service.
● It provides the functionality of a messaging system: producers send messages over the network to the Kafka cluster, which in turn serves them to consumers.
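The idea of a partitioned commit log can be sketched with an in-memory toy. This is not the Kafka client API: the class, method names, and the CRC32 partitioning rule are assumptions made for illustration, and replication is left out.

```python
import zlib
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a partitioned commit log (not the Kafka API)."""

    def __init__(self, partitions: int):
        self.partitions = [[] for _ in range(partitions)]
        self.committed = defaultdict(int)   # (group, partition) -> next offset

    def send(self, key: str, value: str) -> tuple:
        """Append to the partition chosen by the key; return (partition, offset)."""
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def poll(self, group: str, partition: int) -> list:
        """Return unread messages for a consumer group and advance its offset."""
        start = self.committed[(group, partition)]
        batch = self.partitions[partition][start:]
        self.committed[(group, partition)] = start + len(batch)
        return batch


log = ToyLog(partitions=2)
p, first = log.send("user-1", "view:item-5")
_, second = log.send("user-1", "cart:item-5")   # same key, same partition
print(log.poll("dashboard", p))   # both messages, in write order
print(log.poll("dashboard", p))   # nothing new for this group
print(log.poll("ads", p))         # another group reads the log independently
```

Because messages are only appended and each consumer group tracks its own offset, several consumers can read the same data at their own pace.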

Hadoop Distributed File System (HDFS)
Big Data Storage
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
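A distributed file system copes with unreliable commodity machines by splitting files into blocks and storing each block on several nodes. The sketch below uses a simple round-robin rule, which is an assumption for illustration; real HDFS placement is rack-aware. The 128 MB block size and replication factor of 3 mirror common HDFS defaults, and the node names are hypothetical.

```python
def place_blocks(file_size: int, block_size: int, nodes: list, replication: int) -> dict:
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    block_count = -(-file_size // block_size)   # ceiling division
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(block_count)
    }


MB = 1024 * 1024
layout = place_blocks(300 * MB, 128 * MB, ["node-a", "node-b", "node-c", "node-d"], 3)
for block, holders in layout.items():
    print(block, holders)
```

With three copies of every block, losing any single node still leaves two replicas of each block it held.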

NoSQL Datastores
Big Data Storage
● NoSQL: next-generation databases that mostly address some of these points: being non-relational, distributed, open-source, and horizontally scalable.
● Types:
○ Key-Value Store
○ Document Store
○ Column Store
○ Graph Database
○ Content Delivery Network
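The four database models in the list can be contrasted with plain Python structures. This is only a sketch of the data shapes, with hypothetical records; it does not reflect any particular product's API.

```python
# Key-value store: opaque value looked up by key.
key_value = {"session:u7": "cart=5,9"}

# Document store: nested, self-describing records that can be queried by field.
documents = [
    {"_id": 1, "name": "Sneaker", "tags": ["shoes"], "price": 30},
    {"_id": 2, "name": "Sandal", "tags": ["shoes", "summer"], "price": 12},
]

# Column store: row key -> column family -> column -> value.
columns = {"shop:42": {"stats": {"views": 120, "orders": 3}, "info": {"name": "Demo"}}}

# Graph database: nodes connected by labeled edges.
edges = [("u7", "viewed", "item:1"), ("u7", "bought", "item:2"), ("u9", "viewed", "item:1")]

print(key_value["session:u7"])
print([d["name"] for d in documents if "summer" in d["tags"]])
print(columns["shop:42"]["stats"]["views"])
print(sorted(src for src, rel, dst in edges if rel == "viewed" and dst == "item:1"))
```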

HBase
Big Data Storage
A distributed, scalable, versioned, non-relational datastore on top of HDFS, modeled after Google's Bigtable.
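"Versioned" means that a cell can hold several timestamped values. The toy table below illustrates that idea only; the class and its methods are hypothetical and are not the HBase client API.

```python
from collections import defaultdict


class ToyVersionedTable:
    """Toy model of versioned cells; not the HBase client API."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row: str, column: str, value, timestamp: int) -> None:
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]      # keep only the newest versions

    def get(self, row: str, column: str, at=None):
        """Return the newest value, or the newest value at or before `at`."""
        for timestamp, value in self.cells.get((row, column), []):
            if at is None or timestamp <= at:
                return value
        return None


table = ToyVersionedTable(max_versions=2)
table.put("shop:42", "stats:views", 100, timestamp=1)
table.put("shop:42", "stats:views", 120, timestamp=2)
table.put("shop:42", "stats:views", 150, timestamp=3)
print(table.get("shop:42", "stats:views"))         # newest version
print(table.get("shop:42", "stats:views", at=2))   # value as of timestamp 2
print(table.get("shop:42", "stats:views", at=1))   # oldest version was evicted
```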

Hadoop MapReduce
Big Data Computation
● Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
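The programming model can be sketched in a few lines of plain Python. This runs in a single process and does not use the Hadoop API; it only shows the map, shuffle, and reduce steps that the framework distributes across a cluster. The function names and input lines are illustrative.

```python
from collections import defaultdict


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big value", "data in motion"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Since each input line is mapped independently and each key is reduced independently, both phases can run on many machines at once.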

Hadoop = HDFS + MapReduce
Big Data Computation

Apache Spark
Big Data Computation
● Fast and general engine for large-scale data processing.
● Suitable for iterative algorithms.
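An iterative algorithm reads the same data set again in every pass, which is why an engine that can keep the data in memory between passes suits it well. The example below is a plain-Python PageRank, not Spark code; the link graph and parameter values are hypothetical.

```python
def pagerank(links: dict, iterations: int = 20, damping: float = 0.85) -> dict:
    """Plain-Python PageRank over a small graph where every page has at
    least one outgoing link. `links` is read again in every iteration."""
    pages = list(links)
    ranks = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        incoming = {page: 0.0 for page in pages}
        for page, targets in links.items():
            share = ranks[page] / len(targets)
            for target in targets:
                incoming[target] += share
        ranks = {
            page: (1 - damping) / len(pages) + damping * incoming[page]
            for page in pages
        }
    return ranks


# Hypothetical link graph: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print({page: round(rank, 3) for page, rank in ranks.items()})
```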

Apache Samza
Big Data Stream Processing
● Apache Samza is a distributed stream processing framework.
● It uses Kafka to guarantee that messages are processed in the order they were written to a partition.
● Whenever a machine in the cluster fails, Samza works with Hadoop YARN to transparently migrate your tasks to another machine.
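Ordered processing and task migration can be sketched with a toy stateful task that reads one partition in order and saves a checkpoint. This is not the Samza API; the class, the checkpoint format, and the events are assumptions for illustration.

```python
class CountingTask:
    """Toy stream task: counts events per key, reading one partition in order."""

    def __init__(self, checkpoint=None):
        checkpoint = checkpoint or {"offset": 0, "counts": {}}
        self.offset = checkpoint["offset"]
        self.counts = dict(checkpoint["counts"])

    def run(self, partition: list, limit=None) -> None:
        """Process messages from the saved offset, at most `limit` of them."""
        end = len(partition) if limit is None else min(len(partition), self.offset + limit)
        while self.offset < end:
            key = partition[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1

    def checkpoint(self) -> dict:
        return {"offset": self.offset, "counts": dict(self.counts)}


partition = ["view", "view", "cart", "view", "cart"]   # hypothetical events

task = CountingTask()
task.run(partition, limit=3)          # the first machine handles three messages
saved = task.checkpoint()             # and checkpoints before it fails

replacement = CountingTask(saved)     # the task restarts on another machine
replacement.run(partition)
print(replacement.counts)
```

The replacement task resumes from the saved offset, so no message is counted twice and none is skipped in this sketch.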

Apache Mahout
Big Data Mining
Provides open-source implementations of distributed and scalable machine learning algorithms, focused primarily on these areas:
● Collaborative Filtering
● Classification
● Clustering
● Dimensionality Reduction
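Collaborative filtering, the first area in the list, can be illustrated with item-to-item cosine similarity. This is a single-machine sketch in plain Python, not Mahout code, and the interaction data is hypothetical.

```python
from math import sqrt

# Hypothetical implicit feedback: user -> set of items the user interacted with.
interactions = {
    "u1": {"sneaker", "sandal"},
    "u2": {"sneaker", "sandal", "boot"},
    "u3": {"boot", "scarf"},
}


def item_similarity(a: str, b: str) -> float:
    """Cosine similarity between two items over binary user vectors."""
    users_a = {u for u, items in interactions.items() if a in items}
    users_b = {u for u, items in interactions.items() if b in items}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))


def similar_items(item: str, top_n: int = 2) -> list:
    """Rank every other item by similarity to `item`."""
    catalog = set().union(*interactions.values()) - {item}
    scored = [(other, item_similarity(item, other)) for other in catalog]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


print(similar_items("sneaker"))
```

Items that tend to be viewed or bought by the same users score high, which is the basic signal behind a similar-product recommendation.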

Big Data Applications @123Mua.vn
● Shop Dashboard
● Similar Product Recommendation
● Personalized Product Recommendation
● CPC Ads Display
Overview
● ~10,000 active shops
● ~40 million pageviews/month
● ~8,000 add-to-cart events/day
● ~1,000 VIP shops
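The monthly and daily figures can be converted into average rates. This is back-of-the-envelope arithmetic on the slide's approximate numbers, assuming a 30-day month and an even load; real traffic has peaks.

```python
PAGEVIEWS_PER_MONTH = 40_000_000   # from the slide (approximate)
ADD_TO_CART_PER_DAY = 8_000        # from the slide (approximate)
DAYS_PER_MONTH = 30                # assumption

pageviews_per_second = PAGEVIEWS_PER_MONTH / (DAYS_PER_MONTH * 24 * 3600)
cart_adds_per_month = ADD_TO_CART_PER_DAY * DAYS_PER_MONTH
cart_adds_per_1000_views = 1000 * cart_adds_per_month / PAGEVIEWS_PER_MONTH

print(f"{pageviews_per_second:.1f} pageviews/second on average")
print(f"{cart_adds_per_1000_views:.0f} add-to-cart events per 1,000 pageviews")
```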

References
1. https://github.com/facebookarchive/scribe/wiki/Scribe-Overview
2. http://hadoop.apache.org/
3. http://nosql-database.org/
4. http://samza.apache.org/

Product Performance (slides 26-27, figures only)
Similar Product Sources (slide 28, figure only)
