SlideShare une entreprise Scribd logo
1  sur  54
A Graph Service for Global Web
Entities Traversal and Reputation
Evaluation Based on HBase
Chris Huang, Scott Miao
2014/5/5
Who are we
• Scott Miao
• Developer, SPN, Trend Micro
• Worked on hadoop ecosystem since 2011
• Expertise in HDFS/MR/HBase
• Contributor for HBase/HDFS
• @takeshi.miao
• Chris Huang
• RD Manager, SPN, Trend Micro
• Hadoop Architect
• Worked on hadoop ecosystem since 2009
• Contributor for Bigtop
• @chenhsiu48
Our blog ‘Dumbo in TW’: http://dumbointaiwan.blogspot.tw/
Challenges We Faced
New Unique Malware Discovered
http://www.av-test.org/en/statistics/malware/
Social Engineering vs. Cyber Attacks
Trend Micro Defense Strategy
Layer of Protection
Exposure
Infection
Dynamic
Web, Emails
File Signatures
File Behaviors
See The Threat Entity
Connectivity
Connectivity From Different Data Sources
Threat
Connect
Sand-
box
File
Detecti
on
Threat
Web
Web
Reputa
tionFamily
Write-
up
TE
Virus
DB
APT KB
ThreatWeb: Threat Entities as a Graph
Threat Entities Relation Graph
F
D
I
DD
D
I
I
D
DF
F
D
I
I
I
E
E
E
E
E
E
I
I
ID
D
D
D
F
D
D F
I
I
F
D
E
File
IP
Domain
Email
Most Entity Reputations are Unknown
F
D
I
DD
D
I
I
D
DF
F
D
I
I
I
E
E
E
E
E
E
I
I
ID
D
D
D
F
D
D F
I
I
F
D
E
File
IP
Domain
Email
Security Solution Dilemma – Long Tail
Prevalence
Entities
Known good/bad
Traditional heuristic detection
Big Datacan help!
How can we detect the rest
effectively?
Inspired by PageRank
• Too many un-visited pages!
• Users browse pages through
links
• Let users’ clicks (BIG DATA) tell
us the rankings of those un-
visited pages!
Revised PageRank Algorithm
• Too many un-rated threat
entities!
• Malware activities interact with
threat entitles
• Let malware’s behaviors (BIG
DATA) tell us the reputations
of those un-rated threat entities!
F
D
I
DD
D
I
I
D
DF
F
D
II
I
E
E
E
E E
E
I
I
ID
D
D
D
F
D
D F
I
The Graph Problem
The Problems
• Store large size of Graph data
• Access large size of Graph data
• Process large size of Graph data
Data volume
• Dump ~450MB (150 bytes * 3,000,000 records) data
into Graph per day
– Extract from 3GB of data
• Keep it for 3 month
– ~450MB * 90 = ~40,500MB = ~39GB
– With Snappy compression
– ~20 - 22GB
• Dataset
– ~40,000,000 vertices and ~100,000,000 edges
• Data query volume about hundreds of thousands per
day
BIG
GRAPH
!!
http://news.gamme.com.tw/544510
Store
Property Graph Model
Name: Jack
Sex: Male
Marriage: true
Name: Merry
Sex: Female
Marriage: true
marries
Name: Vivian
Sex: Female
Marriage: false
Name: John
Sex: Male
Marriage: true
Date: 2010/5/5
Name: Emily
Sex: Female
Marriage: true
From a soap opera…
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-
Model
• We use HBase as a Graph Storage
– Google BigTable and PageRank
– HBaseCon2012
• Storing and manipulating graphs in HBase
The winner is…
Massive
scalable ?
Active
community ?
Analyzable ?
Use HBase to store Graph data (1/3)
• Tables
– create 'vertex', {NAME => 'property',
BLOOMFILTER => 'ROW', COMPRESSION
=> ‘SNAPPY', TTL => '7776000'}
– create 'edge', {NAME => 'property',
BLOOMFILTER => 'ROW', COMPRESSION
=> ‘SNAPPY', TTL => '7776000'}
Use HBase to store Graph data (2/3)
• Schema design
– Table: vertex
– Table: edge
‘<vertex-id>||<entity-type>’, ‘property:<property-key>@<property-value-type>’,
<property-value>
‘<vertex1-row-key>--><label>--><vertex2-row-key>’,
‘property:<property-key>@<property-value-type>’, <property-value>
Use HBase to store Graph data (3/3)
• Sample
– Table: vertex
– Table: edge
‘myapps-ups.com||domain’, ‘property:ip@String’, ‘…’
‘myapps-ups.com||domain’, ‘property:asn@String’, ‘…’
…
‘track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url’, ‘property:path@String’, ‘…’
‘track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url’, ‘property:parameter@String’, ‘…’
‘myapps-ups.com||domain-->host-->track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url’,
‘property:property1’, ‘…’
‘myapps-ups.com||domain-->host-->track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url’,
‘property:property2’, ‘…’
Keep your rowkey length short
• With long rowkey length
– It does not impact your query performance
– But it does impact your algorithm MR
• OutOfMemoryException
• Use something like HASH function to keep
your rowkey length short
– Use the hash value as rowkey
– Put the original value into a property
Overall Architecture
Src
table
snapshot
Clone
table
Clone
table
Graph
table
snapshot
Clone
table
Clone
table
HBase
Source
C
Source
B
source
A
HDFS
1. Collect data
2. reprocess
& dump data
3. Get Data
Client
4. Process Data
Algorithms
Intermediate data
On HDFS
Src
table
snapsho
t
Clone
table
Clone
table
Graph
table
snapsho
t
Clone
table
Clone
table
HBase
Source
C
Source
B
source
A
HDFS
reprocess &
dump data
Client
Algorithms
Intermediate data
On HDFS
Preprocess and Dump Data
• HBase schema design is simple and human-
readable
• It is easy to write your dumping tool if needed
– MR/Pig/Completebulkload
– Can write cron-job to clean up the broken-edge
data
– TTL can also help to retire old data
• We already have a lot practices for these
tasks
Access
Src
table
snapsho
t
Clone
table
Clone
table
Graph
table
snapsho
t
Clone
table
Clone
table
HBase
Source
C
Source
B
source
A
HDFS
Get Data
Client
Algorithms
Intermediate data
On HDFS
Get Data (1/2)
• A Graph API
• A better semantic for manipulating Graph data
– As a wrapper for HBase Client API
– Rather than use HBase Client API directly
• A malware exploring sample
Vertex vertex = this.graph.getVertex(“malware");
Vertex subVertex = null;
Iterable<Edge> edges =
vertex.getEdges(Direction.OUT, “connect", “infect", “trigger");
for(Edge edge : edges) {
subVertex = edge.getVertex(Direction.OUT);
...
}
Get Data (2/2)
• We implement blueprints API
– It provides interfaces as spec. for users to impl.
• 824 stars, 173 forks on github
– We can get more benefits from it
• plug-and-play different Blueprints-enabled graph
backends
– Traversal language, RESTful server, dataflow, etc
– http://www.tinkerpop.com/
– Currently basic query methods are implemented
Clients
• Real time Client
– Client systems
• they need associated Graph data for a specific entity via RESTful API
– Usually retrieve two levels of graph data
– Quick responsiveness supported by HBase
• With rowkey random access and appropriate schema design
• HTable.get(),Scan.setStartRow(), Scan.setStopRow()
• Batch client
– Threat experts
– Pick one entity and how many levels interested in, generate a graph file
format used by tools
• To visualize and navigate what whether users interested in
• Graph Exploring Tools
– Threat experts
– Find out sub-graphs by given criteria
• E.g. How many levels or associated vertices
Malware Exploring Performance (1/3)
• one request
– Use Malware exploring sample again
– 1 vertex with 2 levels associated instances (2 ~ 9 vertices)
• Dataset
– 42,133,610 vertices and 108,355,774 edges
• Total requests
– 31,764 requests * 100 clients = 3,176,400
Vertex vertex = this.graph.getVertex(“malware");
Vertex subVertex = null;
Iterable<Edge> edges =
vertex.getEdges(Direction.OUT, “connect", “infect", “trigger");
for(Edge edge : edges) {
subVertex = edge.getVertex(Direction.OUT);
...
}
Malware Exploring Performance (2/3)
Malware Exploring Performance (3/3)
• Some statistics
– Mean: 51.61 ms
– Standard Deviation: 653.57 ms
– Empirical rule: 68%, 95%, 99.7%
• 99.7% of requests below 2.1 seconds
• But response time variances still happen
– Use Cache layer between client and HBase
– Warm-up after new data come in
Process
Src
table
snapshot
Clone
table
Clone
table
Graph
table
snapshot
Clone
table
Clone
table
HBase
Source
C
Source
B
source
A
HDFS
Client
Process Data
Algorithms
Intermediate data
On HDFS
• Human-readable HBase schema design
– Write your own MR
– Write your own Pig/UDFs
• So we can write the algorithms to further
process our graph data
– To predict unknown reputation by known threats
– E.g. a revised PageRank algorithm
Data process flow
Src
table
snapshot
Clone
table Data on HDFS
Algorithms
(MR, Pig UDF)
Clone
table
Processed completed
Graph
table
snapshot
Clone
table
Clone
table
Intermediate data on HDFS
HBase
1. Dump daily
data
2. Take
snapshot
3. Clone
snapshot
4. Process data
iteratively
(takes hours)
4.1 generate
Intermediate
data
5. Process
complete 6. Dump
processed data
with timerange
A customized TableInputFormat (1/2)
• One Mapper for one region by default
– Each Mapper process too much data
• OutOfMemoryException
• Too long to process
– Use small split region size ?
• Will overload your HBase cluster !!
• Before: about ~40 Mappers
• After: about ~500 Mappers
A customized TableInputFormat (2/2)
Clone
Graph
Table
MR - Pick
Candidates
Combination
Candidates
list file
<encodedRegionName>t<startKey>t<endKey>
…
d3d1749f3486e850b263c7ecb2424dd3tstartKey_1tendKey_1
d3d1749f3486e850b263c7ecb2424dd3tstartKey_2tendKey_2
d3d1749f3486e850b263c7ecb2424dd3tstartKey_3tendKey_3
Cd91c08d656a19bdb180e0b7f8896575tstartKey_4tendKey_4
Cd91c08d656a19bdb180e0b7f8896575tstartKey_5tendKey_5
…
CustTableInp
utFormat
MR - algorithm
2. Scan table3. Output candidates
4. Load candidates
5. Run Algo. with
more Mappers
1. Run MR
HGraph
• A project is open and put on github
– https://github.com/trendmicro/HGraph
• A partial impl. released from our internal project
– Follow HBase schema design
– Read data via Blueprints API
– Process data with our pagerank default impl.
• Download or ‘git clone’ it
– Use ‘mvn clean package’
– Run on unix-like OS
• Use windows may encounter some errors
PageRank
Result
Experiment Result
• Testing Dataset
– 42,133,610 vertices and 108,355,774 edges
– 1 vertex usually associates 2 ~ 9 vertices
– 4.13% of the vertices are known bad
– 0.09% of the vertices are known good
– The rests are unknown
• Result
– Runs 34hrs for running 23 iterations.
– 1,291 unknown vertices are ranked out
– Top 200 has 99% accuracy (explain later)
Suspicious DGA Discovered
• 3nkp***cq-----x.esf.sinkdns.org
– 196 domains from Domain Generated Algorithms
https://www.virustotal.com/en/url/871004bd9a0fe27e61b0519ceb8457528ea00da0e7ffdc44d98e759ab3e3caa1/analysis/
Untested But Highly Malware Related IP
• 67.*.*.132
– Categorized as “Computers / Internet”, not tested
https://www.virustotal.com/en/ip-address/67.*.*.132/information/
Prevalence
Entities
Knowngood/bad
Traditionalheuristicdetection
IntelligencefromBigDataAnalyt
(90%)
Security in Old Days
Cannot Protect What
You Cannot See
Next Generation Security
Unleash the Power of Data
Discover What We Don’t Know
Q&A
Backups
Property Graph Model (2/2)
Property Graph Model Definition
• A property graph has these elements
– a set of vertices
• each vertex has a unique identifier.
• each vertex has a set of outgoing edges.
• each vertex has a set of incoming edges.
• each vertex has a collection of properties defined by a map from key to value.
– a set of edges
• each edge has a unique identifier.
• each edge has an outgoing tail vertex.
• each edge has an incoming head vertex.
• each edge has a label that denotes the type of relationship between its two vertices.
• each edge has a collection of properties defined by a map from key to value.
About regions
• Keep reasonable amount of regions for each
regionserver
• Notice your splitted regions from one table
– Dump data daily, cause regions splitting
– Make sure your regions scattered evenly on each
regionserver
<hbase.regionserver.global.memstore.upperLimit> / <hbase.hregion.memstore.flush.size> =
<active-regions-per-rs>
e.g. (10G * 0.4) / 128MB = 32 active regions HBase Sizing Notes by Lars George

Contenu connexe

Tendances

A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsDataWorks Summit
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidDataWorks Summit
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureDataWorks Summit
 
Practical Machine Learning: Innovations in Recommendation Workshop
Practical Machine Learning:  Innovations in Recommendation WorkshopPractical Machine Learning:  Innovations in Recommendation Workshop
Practical Machine Learning: Innovations in Recommendation WorkshopMapR Technologies
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareCarol McDonald
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Carol McDonald
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient DataCarol McDonald
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive TuningAdam Muise
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersDataWorks Summit/Hadoop Summit
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsDataWorks Summit/Hadoop Summit
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid DataWorks Summit
 
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSThe architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSTreasure Data, Inc.
 

Tendances (20)

A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 
Practical Machine Learning: Innovations in Recommendation Workshop
Practical Machine Learning:  Innovations in Recommendation WorkshopPractical Machine Learning:  Innovations in Recommendation Workshop
Practical Machine Learning: Innovations in Recommendation Workshop
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health Care
 
Hadoop to spark-v2
Hadoop to spark-v2Hadoop to spark-v2
Hadoop to spark-v2
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 
Hadoop to spark_v2
Hadoop to spark_v2Hadoop to spark_v2
Hadoop to spark_v2
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
Distributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop ClustersDistributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop Clusters
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid
 
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSThe architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWS
 

En vedette

從改寫後台 jQuery 開始的 Vue.js 宣告式渲染
從改寫後台 jQuery 開始的 Vue.js 宣告式渲染從改寫後台 jQuery 開始的 Vue.js 宣告式渲染
從改寫後台 jQuery 開始的 Vue.js 宣告式渲染Sheng-Han Su
 
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...Amazon Web Services
 
Neural Turing Machine Tutorial
Neural Turing Machine TutorialNeural Turing Machine Tutorial
Neural Turing Machine TutorialMark Chang
 
Stateless authentication with OAuth 2 and JWT - JavaZone 2015
Stateless authentication with OAuth 2 and JWT - JavaZone 2015Stateless authentication with OAuth 2 and JWT - JavaZone 2015
Stateless authentication with OAuth 2 and JWT - JavaZone 2015Alvaro Sanchez-Mariscal
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計鍾誠 陳鍾誠
 
Chat bot making process using Python 3 & TensorFlow
Chat bot making process using Python 3 & TensorFlowChat bot making process using Python 3 & TensorFlow
Chat bot making process using Python 3 & TensorFlowJeongkyu Shin
 
Lean Prototyping - A Practical Guide
Lean Prototyping - A Practical GuideLean Prototyping - A Practical Guide
Lean Prototyping - A Practical GuideFramebench
 
Lean Analytics for Intrapreneurs (Lean Startup Conf 2013)
Lean Analytics for Intrapreneurs (Lean Startup Conf 2013)Lean Analytics for Intrapreneurs (Lean Startup Conf 2013)
Lean Analytics for Intrapreneurs (Lean Startup Conf 2013)Lean Analytics
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAmazon Web Services
 
[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R台灣資料科學年會
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies台灣資料科學年會
 
[DSC 2016] 系列活動:李祈均 / 人類行為大數據分析
[DSC 2016] 系列活動:李祈均 / 人類行為大數據分析[DSC 2016] 系列活動:李祈均 / 人類行為大數據分析
[DSC 2016] 系列活動:李祈均 / 人類行為大數據分析台灣資料科學年會
 
[DSC 2016] 系列活動:許懷中 / R 語言資料探勘實務
[DSC 2016] 系列活動:許懷中 / R 語言資料探勘實務[DSC 2016] 系列活動:許懷中 / R 語言資料探勘實務
[DSC 2016] 系列活動:許懷中 / R 語言資料探勘實務台灣資料科學年會
 
[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務台灣資料科學年會
 
手把手教你 R 語言資料分析實務/張毓倫&陳柏亨
手把手教你 R 語言資料分析實務/張毓倫&陳柏亨手把手教你 R 語言資料分析實務/張毓倫&陳柏亨
手把手教你 R 語言資料分析實務/張毓倫&陳柏亨台灣資料科學年會
 
[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用台灣資料科學年會
 
[DSC 2016] 系列活動:吳牧恩、林佳緯 / 用 R 輕鬆做交易策略分析及自動下單
[DSC 2016] 系列活動:吳牧恩、林佳緯 / 用 R 輕鬆做交易策略分析及自動下單[DSC 2016] 系列活動:吳牧恩、林佳緯 / 用 R 輕鬆做交易策略分析及自動下單
[DSC 2016] 系列活動:吳牧恩、林佳緯 / 用 R 輕鬆做交易策略分析及自動下單台灣資料科學年會
 
[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123台灣資料科學年會
 

En vedette (20)

從改寫後台 jQuery 開始的 Vue.js 宣告式渲染
從改寫後台 jQuery 開始的 Vue.js 宣告式渲染從改寫後台 jQuery 開始的 Vue.js 宣告式渲染
從改寫後台 jQuery 開始的 Vue.js 宣告式渲染
 
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
 
Neural Turing Machine Tutorial
Neural Turing Machine TutorialNeural Turing Machine Tutorial
Neural Turing Machine Tutorial
 
Stateless authentication with OAuth 2 and JWT - JavaZone 2015
Stateless authentication with OAuth 2 and JWT - JavaZone 2015Stateless authentication with OAuth 2 and JWT - JavaZone 2015
Stateless authentication with OAuth 2 and JWT - JavaZone 2015
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計
 
Chat bot making process using Python 3 & TensorFlow
Chat bot making process using Python 3 & TensorFlowChat bot making process using Python 3 & TensorFlow
Chat bot making process using Python 3 & TensorFlow
 
Lean Prototyping - A Practical Guide
Lean Prototyping - A Practical GuideLean Prototyping - A Practical Guide
Lean Prototyping - A Practical Guide
 
Lean Analytics for Intrapreneurs (Lean Startup Conf 2013)
Lean Analytics for Intrapreneurs (Lean Startup Conf 2013)Lean Analytics for Intrapreneurs (Lean Startup Conf 2013)
Lean Analytics for Intrapreneurs (Lean Startup Conf 2013)
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
 
[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R[系列活動] Data exploration with modern R
[系列活動] Data exploration with modern R
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies
 
[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
 
[DSC 2016] 系列活動:李祈均 / 人類行為大數據分析
[DSC 2016] 系列活動:李祈均 / 人類行為大數據分析[DSC 2016] 系列活動:李祈均 / 人類行為大數據分析
[DSC 2016] 系列活動:李祈均 / 人類行為大數據分析
 
[DSC 2016] 系列活動:許懷中 / R 語言資料探勘實務
[DSC 2016] 系列活動:許懷中 / R 語言資料探勘實務[DSC 2016] 系列活動:許懷中 / R 語言資料探勘實務
[DSC 2016] 系列活動:許懷中 / R 語言資料探勘實務
 
[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務
 
手把手教你 R 語言資料分析實務/張毓倫&陳柏亨
手把手教你 R 語言資料分析實務/張毓倫&陳柏亨手把手教你 R 語言資料分析實務/張毓倫&陳柏亨
手把手教你 R 語言資料分析實務/張毓倫&陳柏亨
 
[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用[系列活動] 智慧城市中的時空大數據應用
[系列活動] 智慧城市中的時空大數據應用
 
[DSC 2016] 系列活動:吳牧恩、林佳緯 / 用 R 輕鬆做交易策略分析及自動下單
[DSC 2016] 系列活動:吳牧恩、林佳緯 / 用 R 輕鬆做交易策略分析及自動下單[DSC 2016] 系列活動:吳牧恩、林佳緯 / 用 R 輕鬆做交易策略分析及自動下單
[DSC 2016] 系列活動:吳牧恩、林佳緯 / 用 R 輕鬆做交易策略分析及自動下單
 
[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123[系列活動] 給工程師的統計學及資料分析 123
[系列活動] 給工程師的統計學及資料分析 123
 

Similaire à A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase

Attack on graph
Attack on graphAttack on graph
Attack on graphScott Miao
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewGreat Wide Open
 
Data Infrastructure for a World of Music
Data Infrastructure for a World of MusicData Infrastructure for a World of Music
Data Infrastructure for a World of MusicLars Albertsson
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionAdam Muise
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h basehdhappy001
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airshipdave_revell
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...Cloudera, Inc.
 
Hypercubes In Hbase
Hypercubes In HbaseHypercubes In Hbase
Hypercubes In HbaseGeorge Ang
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010Jonathan Seidman
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)VMware Tanzu
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Prabhakar_Hadoop_2 years Experience
Prabhakar_Hadoop_2 years ExperiencePrabhakar_Hadoop_2 years Experience
Prabhakar_Hadoop_2 years ExperiencePRABHAKAR T
 
Prabhakar_Hadoop_2 years Experience
Prabhakar_Hadoop_2 years ExperiencePrabhakar_Hadoop_2 years Experience
Prabhakar_Hadoop_2 years ExperiencePRABHAKAR T
 
Analyzing Large-Scale User Data with Hadoop and HBase
Analyzing Large-Scale User Data with Hadoop and HBaseAnalyzing Large-Scale User Data with Hadoop and HBase
Analyzing Large-Scale User Data with Hadoop and HBaseWibiData
 

Similaire à A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase (20)

Attack on graph
Attack on graphAttack on graph
Attack on graph
 
מיכאל
מיכאלמיכאל
מיכאל
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An Overview
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Data Infrastructure for a World of Music
Data Infrastructure for a World of MusicData Infrastructure for a World of Music
Data Infrastructure for a World of Music
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
 
Hypercubes In Hbase
Hypercubes In HbaseHypercubes In Hbase
Hypercubes In Hbase
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 
hadoop resume
hadoop resumehadoop resume
hadoop resume
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
 
Mar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBaseMar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBase
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Prabhakar_Hadoop_2 years Experience
Prabhakar_Hadoop_2 years ExperiencePrabhakar_Hadoop_2 years Experience
Prabhakar_Hadoop_2 years Experience
 
Prabhakar_Hadoop_2 years Experience
Prabhakar_Hadoop_2 years ExperiencePrabhakar_Hadoop_2 years Experience
Prabhakar_Hadoop_2 years Experience
 
Analyzing Large-Scale User Data with Hadoop and HBase
Analyzing Large-Scale User Data with Hadoop and HBaseAnalyzing Large-Scale User Data with Hadoop and HBase
Analyzing Large-Scale User Data with Hadoop and HBase
 

Plus de Chris Huang

Data compression, data security, and machine learning
Data compression, data security, and machine learningData compression, data security, and machine learning
Data compression, data security, and machine learningChris Huang
 
Kks sre book_ch10
Kks sre book_ch10Kks sre book_ch10
Kks sre book_ch10Chris Huang
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2Chris Huang
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorialChris Huang
 
Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Chris Huang
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
 
重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)Chris Huang
 
重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)Chris Huang
 
重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2Chris Huang
 
重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1Chris Huang
 
重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)Chris Huang
 
重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)Chris Huang
 
重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)Chris Huang
 
重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)Chris Huang
 
重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)Chris Huang
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsChris Huang
 
Hw5 my house in yong he
Hw5 my house in yong heHw5 my house in yong he
Hw5 my house in yong heChris Huang
 
Social English Class HW4
Social English Class HW4Social English Class HW4
Social English Class HW4Chris Huang
 

Plus de Chris Huang (20)

Data compression, data security, and machine learning
Data compression, data security, and machine learningData compression, data security, and machine learning
Data compression, data security, and machine learning
 
Kks sre book_ch10
Kks sre book_ch10Kks sre book_ch10
Kks sre book_ch10
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2
 
20130310 solr tuorial
20130310 solr tuorial20130310 solr tuorial
20130310 solr tuorial
 
Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...Applying Media Content Analysis to the Production of Musical Videos as Summar...
Applying Media Content Analysis to the Production of Musical Videos as Summar...
 
Wissbi osdc pdf
Wissbi osdc pdfWissbi osdc pdf
Wissbi osdc pdf
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)重構—改善既有程式的設計(chapter 12,13)
重構—改善既有程式的設計(chapter 12,13)
 
重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)重構—改善既有程式的設計(chapter 10)
重構—改善既有程式的設計(chapter 10)
 
重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)重構—改善既有程式的設計(chapter 9)
重構—改善既有程式的設計(chapter 9)
 
重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2重構—改善既有程式的設計(chapter 8)part 2
重構—改善既有程式的設計(chapter 8)part 2
 
重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1重構—改善既有程式的設計(chapter 8)part 1
重構—改善既有程式的設計(chapter 8)part 1
 
重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)重構—改善既有程式的設計(chapter 7)
重構—改善既有程式的設計(chapter 7)
 
重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)重構—改善既有程式的設計(chapter 6)
重構—改善既有程式的設計(chapter 6)
 
重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)重構—改善既有程式的設計(chapter 4,5)
重構—改善既有程式的設計(chapter 4,5)
 
重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)重構—改善既有程式的設計(chapter 2,3)
重構—改善既有程式的設計(chapter 2,3)
 
重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)重構—改善既有程式的設計(chapter 1)
重構—改善既有程式的設計(chapter 1)
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed Systems
 
Hw5 my house in yong he
Hw5 my house in yong heHw5 my house in yong he
Hw5 my house in yong he
 
Social English Class HW4
Social English Class HW4Social English Class HW4
Social English Class HW4
 

Dernier

%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburgmasabamasaba
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durbanmasabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 

Dernier (20)

%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase

  • 1. A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase Chris Huang, Scott Miao 2014/5/5
  • 2. Who are we • Scott Miao • Developer, SPN, Trend Micro • Worked on hadoop ecosystem since 2011 • Expertise in HDFS/MR/HBase • Contributor for HBase/HDFS • @takeshi.miao • Chris Huang • RD Manager, SPN, Trend Micro • Hadoop Architect • Worked on hadoop ecosystem since 2009 • Contributor for Bigtop • @chenhsiu48 Our blog ‘Dumbo in TW’: http://dumbointaiwan.blogspot.tw/
  • 4. New Unique Malware Discovered http://www.av-test.org/en/statistics/malware/
  • 5. Social Engineering vs. Cyber Attacks
  • 7. Layer of Protection Exposure Infection Dynamic Web, Emails File Signatures File Behaviors
  • 8. See The Threat Entity Connectivity
  • 9. Connectivity From Different Data Sources Threat Connect Sand- box File Detecti on Threat Web Web Reputa tionFamily Write- up TE Virus DB APT KB
  • 11. Threat Entities Relation Graph F D I DD D I I D DF F D I I I E E E E E E I I ID D D D F D D F I I F D E File IP Domain Email
  • 12. Most Entity Reputations are Unknown F D I DD D I I D DF F D I I I E E E E E E I I ID D D D F D D F I I F D E File IP Domain Email
  • 13. Security Solution Dilemma – Long Tail Prevalence Entities Known good/bad Traditional heuristic detection Big Datacan help! How can we detect the rest effectively?
  • 14. Inspired by PageRank • Too many un-visited pages! • Users browse pages through links • Let users’ clicks (BIG DATA) tell us the rankings of those un- visited pages!
  • 15. Revised PageRank Algorithm • Too many un-rated threat entities! • Malware activities interact with threat entitles • Let malware’s behaviors (BIG DATA) tell us the reputations of those un-rated threat entities! F D I DD D I I D DF F D II I E E E E E E I I ID D D D F D D F I
  • 17. The Problems • Store large size of Graph data • Access large size of Graph data • Process large size of Graph data
  • 18. Data volume • Dump ~450MB (150 bytes * 3,000,000 records) data into Graph per day – Extract from 3GB of data • Keep it for 3 month – ~450MB * 90 = ~40,500MB = ~39GB – With Snappy compression – ~20 - 22GB • Dataset – ~40,000,000 vertices and ~100,000,000 edges • Data query volume about hundreds of thousands per day
  • 20. Store
  • 21. Property Graph Model Name: Jack Sex: Male Marriage: true Name: Merry Sex: Female Marriage: true marries Name: Vivian Sex: Female Marriage: false Name: John Sex: Male Marriage: true Date: 2010/5/5 Name: Emily Sex: Female Marriage: true From a soap opera… https://github.com/tinkerpop/blueprints/wiki/Property-Graph- Model
  • 22. • We use HBase as a Graph Storage – Google BigTable and PageRank – HBaseCon2012 • Storing and manipulating graphs in HBase The winner is… Massive scalable ? Active community ? Analyzable ?
  • 23. Use HBase to store Graph data (1/3) • Tables – create 'vertex', {NAME => 'property', BLOOMFILTER => 'ROW', COMPRESSION => ‘SNAPPY', TTL => '7776000'} – create 'edge', {NAME => 'property', BLOOMFILTER => 'ROW', COMPRESSION => ‘SNAPPY', TTL => '7776000'}
  • 24. Use HBase to store Graph data (2/3) • Schema design – Table: vertex – Table: edge ‘<vertex-id>||<entity-type>’, ‘property:<property-key>@<property-value-type>’, <property-value> ‘<vertex1-row-key>--><label>--><vertex2-row-key>’, ‘property:<property-key>@<property-value-type>’, <property-value>
  • 25. Use HBase to store Graph data (3/3) • Sample – Table: vertex – Table: edge ‘myapps-ups.com||domain’, ‘property:ip@String’, ‘…’ ‘myapps-ups.com||domain’, ‘property:asn@String’, ‘…’ … ‘track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url’, ‘property:path@String’, ‘…’ ‘track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url’, ‘property:parameter@String’, ‘…’ ‘myapps-ups.com||domain-->host-->track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url’, ‘property:property1’, ‘…’ ‘myapps-ups.com||domain-->host-->track.muapps-ups.com/InvoiceA1423AC.JPG.exe||url’, ‘property:property2’, ‘…’
  • 26. Keep your rowkey length short • With long rowkey length – It does not impact your query performance – But it does impact your algorithm MR • OutOfMemoryException • Use something like HASH function to keep your rowkey length short – Use the hash value as rowkey – Put the original value into a property
  • 27. Overall Architecture Src table snapshot Clone table Clone table Graph table snapshot Clone table Clone table HBase Source C Source B source A HDFS 1. Collect data 2. reprocess & dump data 3. Get Data Client 4. Process Data Algorithms Intermediate data On HDFS
  • 29. Preprocess and Dump Data • HBase schema design is simple and human- readable • It is easy to write your dumping tool if needed – MR/Pig/Completebulkload – Can write cron-job to clean up the broken-edge data – TTL can also help to retire old data • We already have a lot practices for these tasks
  • 32. Get Data (1/2) • A Graph API • A better semantic for manipulating Graph data – As a wrapper for HBase Client API – Rather than use HBase Client API directly • A malware exploring sample Vertex vertex = this.graph.getVertex(“malware"); Vertex subVertex = null; Iterable<Edge> edges = vertex.getEdges(Direction.OUT, “connect", “infect", “trigger"); for(Edge edge : edges) { subVertex = edge.getVertex(Direction.OUT); ... }
  • 33. Get Data (2/2) • We implement blueprints API – It provides interfaces as spec. for users to impl. • 824 stars, 173 forks on github – We can get more benefits from it • plug-and-play different Blueprints-enabled graph backends – Traversal language, RESTful server, dataflow, etc – http://www.tinkerpop.com/ – Currently basic query methods are implemented
  • 34. Clients • Real time Client – Client systems • they need associated Graph data for a specific entity via RESTful API – Usually retrieve two levels of graph data – Quick responsiveness supported by HBase • With rowkey random access and appropriate schema design • HTable.get(),Scan.setStartRow(), Scan.setStopRow() • Batch client – Threat experts – Pick one entity and how many levels interested in, generate a graph file format used by tools • To visualize and navigate what whether users interested in • Graph Exploring Tools – Threat experts – Find out sub-graphs by given criteria • E.g. How many levels or associated vertices
  • 35. Malware Exploring Performance (1/3) • one request – Use Malware exploring sample again – 1 vertex with 2 levels associated instances (2 ~ 9 vertices) • Dataset – 42,133,610 vertices and 108,355,774 edges • Total requests – 31,764 requests * 100 clients = 3,176,400 Vertex vertex = this.graph.getVertex(“malware"); Vertex subVertex = null; Iterable<Edge> edges = vertex.getEdges(Direction.OUT, “connect", “infect", “trigger"); for(Edge edge : edges) { subVertex = edge.getVertex(Direction.OUT); ... }
  • 37. Malware Exploring Performance (3/3) • Some statistics – Mean: 51.61 ms – Standard Deviation: 653.57 ms – Empirical rule: 68%, 95%, 99.7% • 99.7% of requests below 2.1 seconds • But response time variances still happen – Use Cache layer between client and HBase – Warm-up after new data come in
  • 40. • Human-readable HBase schema design – Write your own MR – Write your own Pig/UDFs • So we can write the algorithms to further process our graph data – To predict unknown reputation by known threats – E.g. a revised PageRank algorithm
  • 41. Data process flow Src table snapshot Clone table Data on HDFS Algorithms (MR, Pig UDF) Clone table Processed completed Graph table snapshot Clone table Clone table Intermediate data on HDFS HBase 1. Dump daily data 2. Take snapshot 3. Clone snapshot 4. Process data iteratively (takes hours) 4.1 generate Intermediate data 5. Process complete 6. Dump processed data with timerange
  • 42. A customized TableInputFormat (1/2) • One Mapper for one region by default – Each Mapper process too much data • OutOfMemoryException • Too long to process – Use small split region size ? • Will overload your HBase cluster !! • Before: about ~40 Mappers • After: about ~500 Mappers
  • 43. A customized TableInputFormat (2/2) Clone Graph Table MR - Pick Candidates Combination Candidates list file <encodedRegionName>t<startKey>t<endKey> … d3d1749f3486e850b263c7ecb2424dd3tstartKey_1tendKey_1 d3d1749f3486e850b263c7ecb2424dd3tstartKey_2tendKey_2 d3d1749f3486e850b263c7ecb2424dd3tstartKey_3tendKey_3 Cd91c08d656a19bdb180e0b7f8896575tstartKey_4tendKey_4 Cd91c08d656a19bdb180e0b7f8896575tstartKey_5tendKey_5 … CustTableInp utFormat MR - algorithm 2. Scan table3. Output candidates 4. Load candidates 5. Run Algo. with more Mappers 1. Run MR
  • 44. HGraph • A project is open and put on github – https://github.com/trendmicro/HGraph • A partial impl. released from our internal project – Follow HBase schema design – Read data via Blueprints API – Process data with our pagerank default impl. • Download or ‘git clone’ it – Use ‘mvn clean package’ – Run on unix-like OS • Use windows may encounter some errors
  • 46. Experiment Result • Testing Dataset – 42,133,610 vertices and 108,355,774 edges – 1 vertex usually associates 2 ~ 9 vertices – 4.13% of the vertices are known bad – 0.09% of the vertices are known good – The rests are unknown • Result – Runs 34hrs for running 23 iterations. – 1,291 unknown vertices are ranked out – Top 200 has 99% accuracy (explain later)
  • 47. Suspicious DGA Discovered • 3nkp***cq-----x.esf.sinkdns.org – 196 domains from Domain Generated Algorithms https://www.virustotal.com/en/url/871004bd9a0fe27e61b0519ceb8457528ea00da0e7ffdc44d98e759ab3e3caa1/analysis/
  • 48. Untested But Highly Malware Related IP • 67.*.*.132 – Categorized as “Computers / Internet”, not tested https://www.virustotal.com/en/ip-address/67.*.*.132/information/
  • 49. Prevalence Entities Knowngood/bad Traditionalheuristicdetection IntelligencefromBigDataAnalyt (90%) Security in Old Days Cannot Protect What You Cannot See Next Generation Security Unleash the Power of Data Discover What We Don’t Know
  • 50. Q&A
  • 53. Property Graph Model Definition • A property graph has these elements – a set of vertices • each vertex has a unique identifier. • each vertex has a set of outgoing edges. • each vertex has a set of incoming edges. • each vertex has a collection of properties defined by a map from key to value. – a set of edges • each edge has a unique identifier. • each edge has an outgoing tail vertex. • each edge has an incoming head vertex. • each edge has a label that denotes the type of relationship between its two vertices. • each edge has a collection of properties defined by a map from key to value.
  • 54. About regions • Keep reasonable amount of regions for each regionserver • Notice your splitted regions from one table – Dump data daily, cause regions splitting – Make sure your regions scattered evenly on each regionserver <hbase.regionserver.global.memstore.upperLimit> / <hbase.hregion.memstore.flush.size> = <active-regions-per-rs> e.g. (10G * 0.4) / 128MB = 32 active regions HBase Sizing Notes by Lars George