This document discusses big data, Hadoop, NoSQL databases, and graph databases. It provides an overview of these topics and outlines potential uses for a telecommunications company, such as using big data to prevent customer churn, offer customer-specific campaigns, and get more customers. The document includes definitions and examples of key concepts like Hadoop, MapReduce, NoSQL databases, and the graph database Neo4j. It also summarizes trends in big data and provides examples of how telecom companies can analyze call detail records, model networks, and manage master customer data using these technologies.
1. Big Data – Hadoop - NoSQL and Graph Database
Ramazan FIRIN
20.11.2012
This document is intended for only AVEA İletişim Hizmetleri A.Ş.("AVEA"), its dealers, employees and/or others specifically authorised. The contents of this document are
confidential and any disclosure, copying, distribution and/or taking any action in reliance with the content of this document is prohibited. AVEA is not liable for the transmission
of this document in any manner to any third parties that are not authorised to receive.
2. AGENDA
• Big Data
• Hadoop
• NoSQL
• Graph DB and Neo4j
• Possible Usage in Telco
• Demo
3. Executive Summary
• Big Data is a new IT trend
• Hadoop and NoSQL can be used to process Big Data
• Possible usage areas in telco:
- Prevent churn
- Offer customer-specific campaigns
- Acquire more customers
AVEA R&D / MW Development
4. What is Big Data?
Datasets that are too awkward to work with using traditional,
hands-on database management tools.
6. Big Data Sources
1. Social network profiles - Facebook, LinkedIn, Yahoo, Google
2. Social influencers - blog comments, user forums, review sites
3. Activity-generated data - application logs, sensor data
4. Public - Wikipedia, IMDb, etc.
5. Data warehouse appliances - transactional data
6. Network and in-stream monitoring
7. Legacy documents
7. Big Data To Smart Data
Cover of The Economist
9. New Data Sources - Internet
• 2 billion internet users by 2011
• Twitter processes 7 terabytes of data every day
• Facebook processes 10 terabytes of data every day
• 4.6 billion mobile phones
• Google processes 24 petabytes of data every day
21. Gartner: Top 10 IT Trends for 2013
22. Gartner: 10 Critical IT Trends for the Next Five Years
• The third trend is bigger data and storage:
• By 2015, big data demand will generate 1 million jobs in the Global 1000,
• but only one-third of those jobs will be filled due to a shortage of talent.
• Analytics and pattern recognition are key.
• New specialized ARM-based servers are emerging for specialty analytics.
24. What is HADOOP?
The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets
across clusters of computers using simple programming models
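The simple programming model most often associated with Hadoop is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) in plain Python on a single machine. It does not use Hadoop itself, and all function names are illustrative assumptions rather than Hadoop APIs.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big graph data"]))
```

On a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle and any node failures.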
28. Hadoop Ecosystem
Pig - a data processing language that simplifies Hadoop programming
Hive - SQL-like queries
HBase - random read/write access to tables with billions of rows and millions of columns (NoSQL)
33. What is NoSQL?
• Stands for Not Only SQL
• Non-relational
• Cheap, easy to implement
• Scalability
– Vertically - add more resources to a single server
– Horizontally - add more servers
• No predefined schema
• No join operations
• Not ACID; trades guarantees according to the CAP theorem
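Horizontal scaling usually means spreading keys across several servers. A minimal sketch of that idea follows, assuming a fixed number of in-memory shards and simple modulo hashing; the class name, function name, and sample key are hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    digest is used to keep the mapping the same across runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Toy key-value store that spreads keys over several dictionaries."""

    def __init__(self, shard_count):
        # Each dictionary stands in for one server.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(4)
    store.put("subscriber:1001", {"plan": "prepaid"})
    print(store.get("subscriber:1001"))
```

Modulo hashing remaps most keys whenever the shard count changes, which is why production systems tend to use consistent hashing or range partitioning instead.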
34. NoSQL DB Types
1. Key-Value Stores
2. Document Databases
3. Column Family Stores
4. Graph Databases
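To make the four types concrete, the sketch below shapes the same hypothetical subscriber data in each style using plain Python structures. The field names and values are assumptions for illustration and do not correspond to any particular product's storage format.

```python
import json

# 1. Key-value store: an opaque value looked up by a single key.
key_value = {"subscriber:1001": json.dumps({"name": "Ada", "plan": "prepaid"})}

# 2. Document database: a nested, self-describing record.
document = {
    "_id": 1001,
    "name": "Ada",
    "plan": "prepaid",
    "devices": [{"type": "phone"}],
}

# 3. Column family store: row key -> column family -> column -> value.
column_family = {
    "1001": {
        "profile": {"name": "Ada"},
        "billing": {"plan": "prepaid"},
    }
}

# 4. Graph database: nodes plus typed relationships between them.
graph = {
    "nodes": {1001: {"name": "Ada"}, 1002: {"name": "Bob"}},
    "relationships": [(1001, "CALLS", 1002)],
}

if __name__ == "__main__":
    # The key-value store must decode the whole value to read one field.
    print(json.loads(key_value["subscriber:1001"])["plan"])
    # The column family store can address one column directly.
    print(column_family["1001"]["billing"]["plan"])
```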
39. RDBMS Support ACID
• Atomicity - a transaction is all or nothing
• Consistency - only valid data is written to the database
• Isolation - concurrent transactions behave as if they were executed serially
• Durability - once a transaction is committed, its changes survive failures
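Atomicity and consistency can be seen in a few lines with SQLite from the Python standard library. In this sketch the `account` table, its `CHECK` constraint, and the `transfer` helper are assumptions made up for the example: a transfer that would drive a balance negative fails on its second statement, and the first statement is rolled back with it.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move credit between two accounts inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            # Violates the CHECK constraint when the source has too little credit.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

if __name__ == "__main__":
    print(transfer(conn, 1, 2, 150), balances(conn))
```

The failed transfer leaves both balances untouched, even though the credit to the target account had already been applied inside the transaction.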
41. NoSQL Systems and the CAP Theorem
• Consistency - each client always has the same view of the data.
• Availability - all clients can always read and write.
• Partition tolerance - the system keeps working even when nodes fail or cannot communicate with each other
You can pick only two...
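In practice the choice shows up when a network partition occurs: the system must either refuse some requests (keeping consistency) or accept them and let replicas diverge (keeping availability). The toy model below illustrates that trade-off with two in-memory replicas. It is a sketch under simplifying assumptions, not a real replication protocol, and the class and mode names are hypothetical.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeRegister:
    """Toy register replicated on nodes 'a' and 'b'; mode is 'CP' or 'AP'."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True while the nodes cannot talk to each other

    def write(self, node, value):
        local, remote = (self.a, self.b) if node == "a" else (self.b, self.a)
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than diverge.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # Availability first: accept locally, replicas may now disagree.
            local.value = value
            return
        local.value = value
        remote.value = value

    def read(self, node):
        return (self.a if node == "a" else self.b).value


if __name__ == "__main__":
    ap = TwoNodeRegister("AP")
    ap.write("a", 1)
    ap.partitioned = True
    ap.write("a", 2)
    print(ap.read("a"), ap.read("b"))
```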
42. Visual Guide to NoSQL Systems
47. Graph DB
A graph database uses graph structures with nodes, edges, and properties
to represent and store data.
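Much of the appeal of this structure is that questions become traversals. The sketch below stores a small hypothetical graph as an adjacency list and uses a breadth-first traversal to find every node within a given number of hops, the kind of query behind "friends of friends" recommendations. The data and function name are assumptions for illustration.

```python
from collections import deque

# Hypothetical directed graph: each key points to the nodes it is connected to.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}


def within_hops(graph, start, max_hops):
    """Return each node reachable from start in at most max_hops, with its distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


if __name__ == "__main__":
    print(within_hops(edges, "alice", 2))
```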
48. Graph DB Usage Area
• Recommendations
• Business intelligence
• Social networking
• MDM (master data management)
• System management
• Time series data
• Product catalogue
• Web analytics
• Scientific computing
• Indexing your slow RDBMS
50. Neo4j
• Leading graph database
• Open source
• Transaction support (ACID)
• Traversal framework
• Indexing
• High performance (traverses 1,000,000+ relationships per second)
• Querying
• REST support
• Robust (in 24/7 operation since 2003)
• Disk based
• Massive scalability
51. Neo4j Data Model
Neo4j has nodes and relationships.
Nodes and relationships have properties.
Example: Node1 -[knows]-> Node2
• Relationship type: knows
• Relationship property: date of meeting
• Node1 properties: name, surname
• Node2 properties: name, surname
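The model on this slide can be mirrored in a few lines of plain Python. This is only a sketch of the data model, not the Neo4j API: the `Node` and `Relationship` classes and all property values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


# Two nodes, each carrying name and surname properties (made-up values).
node1 = Node({"name": "Ali", "surname": "Kaya"})
node2 = Node({"name": "Ayse", "surname": "Demir"})

# A typed relationship with its own property, as in the slide's example.
knows = Relationship(node1, node2, "knows", {"date_of_meeting": "2012-11-20"})

if __name__ == "__main__":
    print(knows.start.properties["name"], knows.type, knows.end.properties["name"])
```

Keeping properties on the relationship itself, rather than in a separate join table, is what lets a graph database answer "when did these two meet?" during the traversal.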
53. Who Uses Neo4j?
• Cisco - master data management
• Telenor Group - customer organization structure (203 million subscribers)
• Deutsche Telekom - social football site (150 million subscribers)